Search Results: profiling

Found 298 Skills

Tools & Utilities | absolutelyskilled/absolut...

debugging-tools

Use this skill when debugging applications using Chrome DevTools, lldb, strace, network tools, or memory profilers. Triggers on Chrome DevTools, debugger, breakpoints, network debugging, memory profiling, strace, ltrace, core dumps, and any task requiring systematic debugging with specialized tools.

Frontend Development | vercel/next.js

v8-jit

V8 JIT optimization patterns for writing high-performance JavaScript in Next.js server internals. Use when writing or reviewing hot-path code in app-render, stream-utils, routing, caching, or any per-request code path. Covers hidden classes / shapes, monomorphic call sites, inline caches, megamorphic deopt, closure allocation, array packing, and profiling with --trace-opt / --trace-deopt.

Code Quality | levnikolaevich/claude-cod...

ln-810-performance-optimizer

Multi-cycle performance optimization with profiling and bottleneck analysis. Use when optimizing application performance.

Backend Development | eduardo-sl/go-agent-skill...

go-performance-review

Detect performance anti-patterns and apply optimization techniques in Go. Covers allocations, string handling, slice/map preallocation, sync.Pool, benchmarking, and profiling with pprof. Use when checking performance, finding slow code, reducing allocations, profiling, or reviewing hot paths. Trigger examples: "check performance", "find slow code", "reduce allocations", "benchmark this", "profile", "optimize Go code". Do NOT use for concurrency correctness (use go-concurrency-review) or general code style (use go-coding-standards).

Code Quality | stackfox-labs/luau-skills

luau-performance

Use for Luau performance work focused on profiling hotspots, allocation-aware code structure, table and iteration costs, builtin and function-call fast paths, compiler/runtime optimization behavior, and environment constraints that change execution speed.

Tools & Utilities | mohitmishra786/low-level-...

intel-vtune-amd-uprof

Intel VTune and AMD uProf profiling skill for microarchitecture analysis. Use when analyzing hotspots, microarchitecture bottlenecks, memory access patterns, pipeline stalls, or using the roofline model. Covers VTune Community Edition (free) and AMD uProf as a free alternative. Activates on queries about VTune, uProf, microarchitecture analysis, pipeline stalls, memory bandwidth, roofline model, or hardware performance analysis.

Backend Development | bruce-lee-ly/cuda_auto_tu...

cuda-auto-tune

NCU-driven iterative optimization workflow for CUDA/CUTLASS/Triton/CuTe DSL kernels. MANDATORY: every optimization MUST start with NCU profiling, followed by multi-dimensional analysis, then targeted code modification, then re-profiling to verify. Supports roofline, memory hierarchy, warp stalls, instruction mix, occupancy, divergence analysis. Provides implementation-specific code modifications: Native CUDA (launch config, memory patterns, async copy, Tensor Core), CUTLASS (ThreadblockShape, stages, epilogue, schedule policy, alignment), Triton (autotune params, compiler hints, tl.* API patterns), CuTe DSL (threads_per_cta, elems_per_thread, tiled_copy, copy atom, shared memory, warp/cta reduce). Use when optimizing any CUDA kernel performance.

AI & Machine Learning | pepperu96/hyper-mla

profile-kernel

GPU kernel profiling workflow across supported kernel implementation languages. Provides commands for all 4 profiling modes (annotation, event, ncu, nsys), metric interpretation tables, bottleneck identification rules, and the output contract for returning compact results to the orchestrator. Use when: (1) profiling a kernel version, (2) interpreting profiling artifacts/reports, (3) comparing kernel versions, (4) identifying bottlenecks and optimization opportunities, (5) documenting performance in the development log.

AI & Machine Learning | bbuf/sglang-auto-driven-s...

llm-torch-profiler-analysis

Unified LLM torch-profiler triage skill for `sglang`, `vllm`, and `TensorRT-LLM`. Use it to inspect an existing `trace.json(.gz)` or profile directory, or to drive live profiling against a running server and return one three-table report with kernel, overlap-opportunity, and fuse-pattern tables.

Tools & Utilities | nvidia/skills

perf-nsight-compute-analysis

Analyze ncu (NVIDIA Nsight Compute) profiling output: SOL% bottleneck classification, roofline analysis, occupancy diagnosis, memory hierarchy analysis, warp stall analysis, metric interpretation, and programmatic .ncu-rep report analysis. NOT for kernel writing or code generation, Nsight Systems (nsys), host-side profiling, or system-level profiling.

Code Quality | donchitos/claude-code-gam...

perf-profile

Structured performance profiling workflow. Identifies bottlenecks, measures against budgets, and generates optimization recommendations with priority rankings.

Code Quality | dralgorhythm/claude-agent...

optimizing-code

Improve code performance without changing behavior. Use when code fails latency/throughput requirements. Covers profiling, caching, and algorithmic optimization.
