Found 246 Skills
Use when mapping code paths, entrypoints, and likely hot files before profiling.
Exploratory Data Analysis (EDA): profiling, visualization, correlation analysis, and data quality checks. Use when understanding dataset structure, distributions, relationships, or preparing for feature engineering and modeling.
Guide for Grafana Pyroscope continuous profiling. Use for Kubernetes Helm deployment, Go/Java/Python/.NET/Ruby/Node.js profiling, storage backends, trace-to-profile linking, and troubleshooting.
Use when profiling CPU/memory hot paths, generating flame graphs, or capturing JFR/perf evidence.
Optimizes Python library performance through profiling (cProfile, PyInstrument), memory analysis (memray, tracemalloc), benchmarking (pytest-benchmark), and optimization strategies. Use when analyzing performance bottlenecks, finding memory leaks, or setting up performance regression testing.
Full Sentry SDK setup for Python. Use when asked to "add Sentry to Python", "install sentry-sdk", "setup Sentry in Python", or configure error monitoring, tracing, profiling, logging, metrics, crons, or AI monitoring for Python applications. Supports Django, Flask, FastAPI, Celery, Starlette, AIOHTTP, Tornado, and more.
Flutter DevTools, Profiling, Logging & Memory Management
Use this skill when profiling application performance, debugging memory leaks, optimizing latency, benchmarking code, or reducing resource consumption. Triggers on CPU profiling, memory profiling, flame graphs, garbage collection tuning, load testing, P99 latency, throughput optimization, bundle size reduction, and any task requiring performance analysis or optimization.
Audit and improve SwiftUI runtime performance from code review and architecture. Use for requests to diagnose slow rendering, janky scrolling, high CPU/memory usage, excessive view updates, or layout thrash in SwiftUI apps, and to provide guidance for user-run Instruments profiling when code review alone is insufficient.
Use when an app feels slow, memory grows, battery drains, or when diagnosing any performance issue. Covers memory leaks, profiling, Instruments workflows, retain cycles, and performance optimization.
NCU-driven iterative optimization workflow for CUDA/CUTLASS/Triton/CuTe DSL kernels. MANDATORY: every optimization MUST start with NCU profiling, followed by multi-dimensional analysis, then targeted code modification, then re-profiling to verify. Supports roofline, memory hierarchy, warp stalls, instruction mix, occupancy, divergence analysis. Provides implementation-specific code modifications: Native CUDA (launch config, memory patterns, async copy, Tensor Core), CUTLASS (ThreadblockShape, stages, epilogue, schedule policy, alignment), Triton (autotune params, compiler hints, tl.* API patterns), CuTe DSL (threads_per_cta, elems_per_thread, tiled_copy, copy atom, shared memory, warp/cta reduce). Use when optimizing any CUDA kernel performance.
Workflow for learning CuTe Python DSL by reading, importing, profiling, and extracting reusable patterns from CUTLASS Blackwell example kernels. Use when: (1) studying CUTLASS CuTe DSL reference implementations, (2) importing CUTLASS examples into the project runtime infrastructure, (3) building CuTe DSL knowledge base entries from profiling experiments, (4) understanding CuTe DSL API patterns, TMA pipelining, warpgroup scheduling, or persistent kernel structure.