Found 298 Skills
Deep-dive data profiling for a specific table. Use when the user asks to profile a table, wants statistics about a dataset, asks about data quality, or needs to understand a table's structure and content. Requires a table name.
Performance profiling principles: measurement, analysis, and optimization techniques.
Use when you need to set up Java application profiling to detect and measure performance issues, including automated async-profiler v4.0 setup, problem-driven profiling (CPU, memory, threading, GC, I/O), interactive profiling scripts, JFR integration with Java 25 (JEP 518, JEP 520), or collecting profiling data with flamegraphs and JFR recordings. Part of the skills-for-java project.
Use when you need to verify Java performance optimizations by comparing profiling results before and after refactoring, including baseline validation, post-refactoring report generation, quantitative before/after metrics comparison, side-by-side flamegraph analysis, regression detection, or creating the profiling-comparison-analysis and profiling-final-results documentation. Part of the skills-for-java project.
Use when you need to analyze Java profiling data collected during the detection phase, including interpreting flamegraphs, memory allocation patterns, CPU hotspots, and threading issues; systematic problem categorization; evidence documentation in the profiling-problem-analysis and profiling-solutions Markdown files; or prioritizing fixes using Impact/Effort scoring. Part of the skills-for-java project.
JVM performance profiling with Java Flight Recorder (JFR), jcmd, and GC analysis. Use for identifying bottlenecks and memory issues. USE WHEN: the user mentions "Java profiling", "JFR", or "JVM performance", or asks about "Java Flight Recorder", "jcmd", "heap dump", "GC tuning", "thread dump", or "Java memory leak". DO NOT USE FOR: Node.js/Python profiling; use the respective skills instead.
Analyze Huawei Ascend NPU profiling data to discover hidden performance anomalies and produce a detailed model architecture report reverse-engineered from profiling. Trigger on Ascend profiling traces, NPU bottlenecks, device idle gaps, host-device issues, kernel_details.csv / trace_view.json / op_summary / communication.json. Also trigger on "profiling", "step time", "device bubble", "underfeed", "host bound", "device bound", "AICPU", "wait anchor", "kernel gap", "Ascend performance", "model architecture", "layer structure", "forward pass", "model structure". Runs anomaly discovery (bubble detection, wait-anchor, AICPU exposure) alongside model architecture analysis (layer classification, per-layer sub-structure, communication pipeline). Outputs a separate Markdown architecture report alongside anomaly analysis.
Python performance profiling with cProfile, tracemalloc, and line_profiler. Use for identifying bottlenecks and memory issues. USE WHEN: the user mentions "Python profiling", "cProfile", or "memory profiling", or asks about "Python performance", "tracemalloc", "line_profiler", "py-spy", "Python optimization", or "Python memory leak". DO NOT USE FOR: Java/Node.js profiling; use the respective skills instead.
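Ascend NPU profiling collection and performance analysis skill for AI for Science scenarios. Uses torch_npu.profiler on Huawei Ascend NPUs to collect L0, L1, and L2 level performance data, analyzes operator execution time, call stacks, memory usage, and bottlenecks in training or inference, and guides subsequent tuning.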
Code instrumentation for timing workloads. Two scenarios: (1) Training loop — inject manual timing to report per-iteration latency, throughput (samples/sec), and data load time. (2) Standalone kernel/op — write CUDA event timing code with warmup, per-iteration statistics, and anti-pattern avoidance. Also covers NVTX annotation for labeling profiler timelines. NOT for: running or analyzing profiler tools (nsys, ncu, Nsight Systems, Nsight Compute), writing kernels (Triton, CuTe, CUDA), applying optimizations (CUDA Graphs, gradient checkpointing, fusion), or interpreting roofline/SOL% metrics. Triggers: "measure throughput", "benchmark this function", "time my training loop", "samples per second", "NVTX annotate", "instrument my dataloader", "data load time", "kernel timing", "how do I time".
Use to help users get started with Nemo Gym reward profiling. Covers the basic ng_run, ng_collect_rollouts, and ng_reward_profile workflow, repeated rollouts, materialized inputs, rollout JSONL artifacts, task and rollout identity, output inspection, partial profiling, and rollout_infos. For failed jobs, prefer nemo-gym-debugging.
Performance optimization guide for Capacitor apps covering bundle size, rendering, memory, native bridge, and profiling. Use this skill when users need to optimize their app performance.