Found 2,253 Skills
Analyze your project's typography setup and identify issues
Read every docs/benchmarks/runs/*.json and surface drift in win rate, latency, escalation rate, and LLM-baseline cost over time
Analyze host/CPU overhead in TensorRT-LLM inference from nsys traces. Detect whether host overhead is the bottleneck using GPU idle ratio, host prep exposed ratio, and per-phase evidence. For regressions, isolate forward steps via allreduce/NVTX patterns, compare host operation breakdowns across versions, and identify scheduling or request-management overhead. Supports optional inter-kernel gap, eager-vs-graph, pattern mapping, and multi-rank straggler drill-down. Use standalone or within perf-analysis. Triggers: host overhead, inter-step gap, scheduling overhead, forward step isolation, nsys iteration analysis, NVTX breakdown, request management overhead, GPU idle, host bottleneck, host prep exposed, inter-kernel gap, bubble analysis, graph coverage, eager kernel, rank imbalance, straggler detection.
Bundle Size Analyzer - Auto-activating skill for Frontend Development. Triggers on: bundle size analyzer. Part of the Frontend Development skill category.
Review backend code for quality, security, maintainability, and best practices based on established checklist rules. Use when the user requests a review, analysis, or improvement of backend files (e.g., `.py`) under the `api/` directory. Do NOT use for frontend files (e.g., `.tsx`, `.ts`, `.js`). Supports review of pending changes, code snippets, and individual files.
ML API expert. Use for model serving, inference endpoints, FastAPI, and ML deployment.
Operational guide for enabling hierarchical context parallelism in Megatron-Bridge, including config knobs, code anchors, pitfalls, and verification.
Representative MoE training playbooks by hardware platform and model family. Summarizes rounded throughput bands, parallelism patterns, and common tuning stacks.
Expert cuTile programming assistant. Write high-performance GPU kernels using cuTile's tile-based programming model with proper validation and optimization. Supports deep agent orchestration for complex multi-kernel tasks.