Loading...
Loading...
Found 1,934 Skills
Reduce the performance impact of third-party scripts on your site using optimized loading strategies.
Deep Performance Optimization Skill for Triton Operators on Ascend NPU, dedicated to achieving the Triton operator performance improvement required by users. Core technologies include but are not limited to Unified Buffer (UB) capacity planning, multi-Tokens parallel processing, MTE/Vector pipeline parallelism, mask optimization, etc. This Skill must be triggered when the user mentions the following: performance optimization of Vector-type Triton operators on Ascend NPU.
AscendC Operator Design Completion - Assist users in completing operator architecture design, interface definition, and performance planning. Use this skill when users mention operator design, operator development, tiling strategy, memory planning, AscendC kernel design, two-level tiling, inter-core splitting, or intra-core splitting.
Generate Triton operator requirement documents suitable for Ascend NPU. Used when users need to design new Triton operators, write operator requirement documents, or perform operator performance optimization design.
Analyze Huawei Ascend NPU profiling data to discover hidden performance anomalies and produce a detailed model architecture report reverse-engineered from profiling. Trigger on Ascend profiling traces, NPU bottlenecks, device idle gaps, host-device issues, kernel_details.csv / trace_view.json / op_summary / communication.json. Also trigger on "profiling", "step time", "device bubble", "underfeed", "host bound", "device bound", "AICPU", "wait anchor", "kernel gap", "Ascend performance", "model architecture", "layer structure", "forward pass", "model structure". Runs anomaly discovery (bubble detection, wait-anchor, AICPU exposure) alongside model architecture analysis (layer classification, per-layer sub-structure, communication pipeline). Outputs a separate Markdown architecture report alongside anomaly analysis.
Troubleshoot and optimize the performance of Ascend C operators. This skill is applicable when users develop, review or optimize Ascend C kernel operators, or triggered when users mention keywords such as Ascend C performance optimization, operator optimization, tiling, pipeline, data copy, memory optimization, NPU/Ascend.
Maintain JSONL-only profiler performance test cases under csrc/ops/<op>/test in ascend-kernel. Collect data using torch_npu.profiler (with fixed warmup=5 and active=5), aggregate the Total Time(us) from ASCEND_PROFILER_OUTPUT/op_statistic.csv, and output a unified Markdown comparison report (custom operator vs baseline) that includes a DType column. Do not generate perf_cases.json or *_profiler_results.json. Refer to examples/layer_norm_profiler_reference/ for the reference implementation.
Task Orchestration for Full-Process Development of Ascend Triton Operators. Used when users need to develop Triton Operators, covering the complete workflow of environment configuration → requirement design → code generation → static inspection → precision verification → performance evaluation → document generation → performance optimization.
Reviews iOS animation code for correctness, performance, accessibility, and Apple API best practices. Use when reviewing .swift files containing animation code — withAnimation, .animation(), PhaseAnimator, KeyframeAnimator, matchedGeometryEffect, navigationTransition, CABasicAnimation, CASpringAnimation, UIViewPropertyAnimator, UIDynamicAnimator, symbolEffect, scrollTransition, contentTransition, or custom Transition conformances.
Frontend full-chain performance optimization guide based on Web Vitals metrics. Provides metric thresholds, diagnostic methods, and optimization strategies for LCP, FCP, INP, CLS, TTFB, TBT. Use when optimizing frontend performance, analyzing Web Vitals, reducing page load time, fixing layout shifts, improving interaction responsiveness, or reviewing frontend code for performance issues.
Use when diagnosing service performance issues. Use when user says "service is slow", "high CPU", "out of memory", "check resource usage", "monitor service", or "why is my service lagging".
Use when analyzing optimal posting times on Xiaohongshu, studying audience activity patterns, determining when followers are most active, scheduling content for maximum reach, or measuring time-based performance