Total: 50,615 skills. AI & Machine Learning has 8,484 skills.
Showing 12 of 8,484 skills
Provides guidance for enterprise-grade RL training using miles, a production-ready fork of slime. Use when training large MoE models with FP8/INT4, needing train-inference alignment, or requiring speculative RL for maximum throughput.
Shared optimization guidance plus cuTile Python DSL-specific overlays. Use when: (1) selecting optimizations for a cuTile Python DSL kernel, (2) checking cuTile-specific implementation traps, (3) deciding whether a profiling finding belongs in shared knowledge or a cuTile overlay, (4) updating cuTile Python DSL optimization docs, (5) reviewing how a shared pattern maps to cuTile.
Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with flexible infrastructure backends.
Distributed training orchestration across clusters. Scales PyTorch/TensorFlow/HuggingFace from laptop to 1000s of nodes. Built-in hyperparameter tuning with Ray Tune, fault tolerance, elastic scaling. Use when training massive models across multiple machines or running distributed hyperparameter sweeps.
Scalable data processing for ML workloads. Streaming execution across CPU/GPU, supports Parquet/CSV/JSON/images. Integrates with Ray Train, PyTorch, TensorFlow. Scales from single machine to 100s of nodes. Use for batch inference, data preprocessing, multi-modal data loading, or distributed ETL pipelines.
Workflow for learning CuTe Python DSL by reading, importing, profiling, and extracting reusable patterns from CUTLASS Blackwell example kernels. Use when: (1) studying CUTLASS CuTe DSL reference implementations, (2) importing CUTLASS examples into the project runtime infrastructure, (3) building CuTe DSL knowledge base entries from profiling experiments, (4) understanding CuTe DSL API patterns, TMA pipelining, warpgroup scheduling, or persistent kernel structure.
SSH into host `h100_sglang`, enter Docker container `sglang_bbuf`, work in `/data/bbuf/repos/sglang`, and use the ready H100 remote environment for SGLang **diffusion** development and validation. Use when a task needs diffusion model smoke tests, Triton/CUDA kernel validation, torch.compile diffusion checks, or a safe remote copy for diffusion-specific SGLang changes.
Prompting techniques for AI video generation models on Replicate. Use when writing prompts for video models or building video generation features.
Operate LM Studio's `lms` CLI and local/remote LM Studio servers for model discovery, server status checks, model loading, endpoint smoke tests, and downstream OpenAI-compatible wiring. Use when the user mentions LM Studio, `lms`, a local model server, `/v1/models`, or a remote LM Studio host, or wants to connect another tool to LM Studio, even if they only ask to test a local OpenAI-compatible endpoint or choose the correct loaded-model identifier. Triggers on: lmstudio, lm studio, lms, local model server, LM Studio API, LM Studio endpoint, /v1/models, connect Strix to LM Studio, load model in LM Studio.
Juicebro Content Aggregation Skill. Enables agents to query, aggregate, and navigate public content from "Juicebro" across 13 platforms, including Weibo, Xiaohongshu, Douyin, Bilibili, Xueqiu, Toutiao, WeChat Official Account, and Xiaoyuzhou. Supports search across all platforms, single-platform queries, topic filtering, content-type filtering, summary report generation, and platform navigation recommendations. Note: this skill is in the specification-first phase, and content acquisition depends on the agent's own public access capabilities.
Ultra-lightweight channel for feature workflows, with no design docs, checklists, or phased reviews. The AI writes code directly as it normally would, but before it starts, it is told where the project's CodeStable knowledge base is and how to search it, so the code it writes hits fewer pitfalls and stays more consistent with project conventions. Trigger scenarios: the user says "fast mode", "fastforward", "skip all those steps", "just start coding", or "help me make xxx" and the requirement is too small to go through the design process.
Unified LLM torch-profiler triage skill for `sglang`, `vllm`, and `TensorRT-LLM`. Use it to inspect an existing `trace.json(.gz)` or profile directory, or to drive live profiling against a running server and return one three-table report with kernel, overlap-opportunity, and fuse-pattern tables.