Loading...
Loading...
Found 213 Skills
Use when you need to verify Java performance optimizations by comparing profiling results before and after refactoring — including baseline validation, post-refactoring report generation, quantitative before/after metrics comparison, side-by-side flamegraph analysis, regression detection, or creating profiling-comparison-analysis and profiling-final-results documentation. Part of the skills-for-java project
Audit and improve SwiftUI runtime performance from code review and architecture. Use for requests to diagnose slow rendering, janky scrolling, high CPU/memory usage, excessive view updates, or layout thrash in SwiftUI apps, and to provide guidance for user-run Instruments profiling when code review alone is insufficient.
C++ template skill for reading template errors and optimizing compile times. Use when deciphering template error stacks, setting -ftemplate-backtrace-limit, writing concepts and requires-clauses, understanding SFINAE vs concepts, or profiling template instantiation bottlenecks with Templight. Activates on queries about C++ templates, template error messages, concepts, requires expressions, SFINAE, template metaprogramming, or slow template compilation.
Use when app feels slow, memory grows, battery drains, or diagnosing ANY performance issue. Covers memory leaks, profiling, Instruments workflows, retain cycles, performance optimization.
Application performance profiling and bottleneck identification — Node.js profiling, Chrome DevTools, flame graphs, memory leak detection, CPU profiling, React rendering performance. Activate on "profiling", "performance bottleneck", "flame graph", "memory leak", "slow app", "CPU profiling", "heap snapshot", "React re-renders", "EXPLAIN ANALYZE", "event loop lag", "clinic.js", "Core Web Vitals". NOT for infrastructure monitoring or observability (use logging-observability), load testing (use a load-testing skill), or database schema optimization.
Workflow for learning CuTe Python DSL by reading, importing, profiling, and extracting reusable patterns from CUTLASS Blackwell example kernels. Use when: (1) studying CUTLASS CuTe DSL reference implementations, (2) importing CUTLASS examples into the project runtime infrastructure, (3) building CuTe DSL knowledge base entries from profiling experiments, (4) understanding CuTe DSL API patterns, TMA pipelining, warpgroup scheduling, or persistent kernel structure.
Use when writing automation tests, functional tests, or any test in Unreal Engine. Also use when the user asks about "UE_LOG", logging, log categories, assertion, check, ensure, verify, DrawDebug, debug draw, console command, profiling, Unreal Insights, stat commands, or debugging techniques. See ue-module-build-system for test module setup, and ue-cpp-foundations for general C++ logging patterns.
Full Sentry SDK setup for Node.js, Bun, and Deno. Use when asked to "add Sentry to Node.js", "add Sentry to Bun", "add Sentry to Deno", "install @sentry/node", "@sentry/bun", or "@sentry/deno", or configure error monitoring, tracing, logging, profiling, metrics, crons, or AI monitoring for server-side JavaScript/TypeScript runtimes.
Route Unity and Unreal frame-time complaints into one bottleneck-first profiling brief. Use when the main job is interpreting profiler screenshots, `stat unit` / `stat gpu` output, benchmark-route complaints, or Steam Deck / target-device review packets; choosing the smallest useful next capture; naming one primary bottleneck family; and deciding whether to stay with quick packets, move to an engine-native profiler, or escalate further. Route generic app/service tuning to `performance-optimization`, build/editor/package failures to `game-build-log-triage`, broader game-production coordination to `bmad-gds`, and mixed demo/community feedback to `game-demo-feedback-triage`.
Claude Code skill (trtllm-agent-toolkit): implement or extend TensorRT-LLM AutoDeploy fusion transforms under transform/library/ in a TensorRT-LLM checkout. Prefer existing kernels and custom ops; use Triton only when no viable existing-kernel path exists. Use ad-graph-dump for AD_DUMP_GRAPHS_DIR workflows. Covers TRT-LLM paths, registry, default.yaml registration, graph validation, tests, and a review checklist — without prescribing profiling tools or throughput targets.
Analyze ncu (NVIDIA Nsight Compute) profiling output: SOL% bottleneck classification, roofline analysis, occupancy diagnosis, memory hierarchy analysis, warp stall analysis, metric interpretation, and programmatic .ncu-rep report analysis. NOT for kernel writing or code generation, Nsight Systems (nsys), host-side profiling, or system-level profiling.
ONLY for OpenAI Triton (@triton.jit) kernel development. NEVER use for CUDA C++ kernels, TileIR, or profiling tools (ncu, nsys). The user's request must involve Triton explicitly. Covers Triton-specific patterns: fused elementwise, reductions (softmax, LayerNorm, RMSNorm), tiled GEMM with triton.autotune, and flash attention. Workflow: design, write, verify (with fast-path for explicit requests).