Loading...
Loading...
Found 298 Skills
Optimize multi-agent systems with coordinated profiling, workload distribution, and cost-aware orchestration. Use when improving agent performance, throughput, or reliability.
Performance optimization expert covering profiling, benchmarking, memory allocation, SIMD, cache optimization, false sharing, lock contention, and NUMA-aware programming.
Advanced sub-skill for PyTorch focused on deep research and production engineering. Covers custom Autograd functions, module hooks, advanced initialization, Distributed Data Parallel (DDP), and performance profiling.
When validating system performance under load, identifying bottlenecks through profiling, or optimizing application responsiveness. Covers load testing (k6, Locust), profiling (CPU, memory, I/O), and optimization strategies (caching, query optimization, Core Web Vitals). Use for capacity planning, regression detection, and establishing performance SLOs.
Profile datasets to understand schema, quality, and characteristics. Use when analyzing data files (CSV, JSON, Parquet), discovering dataset properties, assessing data quality, or when user mentions data profiling, schema detection, data analysis, or quality metrics. Provides basic and intermediate profiling including distributions, uniqueness, and pattern detection.
When validating system performance under load, identifying bottlenecks through profiling, or optimizing application responsiveness. Covers load testing (k6, Locust), profiling (CPU, memory, I/O), and optimization strategies (caching, query optimization, Core Web Vitals). Use for capacity planning, regression detection, and establishing performance SLOs.
Hunt performance bottlenecks with swift precision. Stalk the slow paths, pinpoint the prey, streamline the code, catch the gains, and celebrate the win. Use when optimizing performance, profiling code, or hunting for speed.
Use when app feels slow, memory grows, battery drains, or diagnosing ANY performance issue. Covers memory leaks, profiling, Instruments workflows, retain cycles, performance optimization.
Iteratively optimize cuTile kernel performance through systematic profiling, bottleneck analysis, IR comparison, and targeted tuning. Covers tile sizes, occupancy, autotune configs, TMA, latency hints, persistent scheduling, num_ctas, flush_to_zero, and IR-level debugging. Use when asked to "optimize cutile kernel", "improve kernel perf", "tune cutile performance", "make kernel faster", or iteratively benchmark and refine a cuTile GPU kernel in the TileGym project.
Optimize end-to-end application performance with profiling, observability, and backend/frontend tuning. Use when coordinating performance optimization across the stack.
Comprehensive testing and development workflow specialist combining DDD testing, characterization tests, performance profiling, code review, and quality assurance. Use when writing tests, measuring coverage, creating characterization tests, performing TDD, running CI/CD quality checks, or reviewing pull requests. Do NOT use for debugging runtime errors (use expert-debug agent instead) or code refactoring (use moai-workflow-ddd instead).
Use when automating Instruments profiling, running headless performance analysis, or integrating profiling into CI/CD - comprehensive xctrace CLI reference with record/export patterns