Found 569 Skills
Generates comprehensive synthetic fine-tuning datasets in ChatML format (JSONL) for use with Unsloth, Axolotl, and similar training frameworks. Gathers requirements, creates datasets with diverse examples, validates quality, and provides framework integration guidance.
Feishu Calendar (calendar): Provides comprehensive management of calendars and schedules (meetings). Core scenarios include viewing and searching schedules, creating and updating schedules, managing attendees, checking free/busy status, and recommending available time slots. For high-frequency operations, prefer the Shortcuts: +agenda (quick overview of today's or upcoming schedules), +create (create a schedule and invite attendees as needed), +freebusy (check the free/busy status and RSVP status on the user's primary calendar), +suggestion (propose several candidate time slots for scheduling requests that have no fixed time).
Audit and optimize Convex application performance, covering hot path reads, write contention, subscription cost, and function limits. Use when a Convex feature is slow, reads too much data, writes too often, has OCC conflicts, or needs performance investigation.
Profile and optimize Python code using cProfile, memory profilers, and performance best practices. Use when debugging slow Python code, optimizing bottlenecks, or improving application performance.
Turns AI-generated demo UIs into real usable product workflows. Use when building, reviewing, or finishing apps, dashboards, forms, CRUD flows, onboarding, checkout, settings, auth-like flows, or any interface that must work beyond a static mockup.
Improve a CLAUDE.md file using <important if> blocks to increase instruction adherence.
Write, debug, and optimize CUTLASS and CuTeDSL GPU kernels using local source code, examples, and header references. Use when the user mentions CUTLASS, CuTe, CuTeDSL, cute::Layout, cute::Tensor, TiledMMA, TiledCopy, CollectiveMainloop, CollectiveEpilogue, GEMM kernel, grouped GEMM, sparse GEMM, flash attention CUTLASS, blackwell GEMM, hopper GEMM, FP8 GEMM, blockwise scaling, MoE GEMM, StreamK, warp specialization CUTLASS, TMA CUTLASS, or asks about writing high-performance CUDA kernels with CUTLASS/CuTe templates.
End-to-end Stake game development workflow for math, RGS contract, frontend playback, and compliance gating. Use when building or updating Stake games, defining game modes and RTP targets, validating generated books/index metadata, validating event streams, integrating frontend event playback, implementing RGS communication and replay mode, or preparing publication checks including social-language and jurisdiction requirements.
Develop, debug, and optimize the SGLang LLM serving engine. Use when the user mentions SGLang, sglang, srt, sgl-kernel, LLM serving, model inference, KV cache, attention backend, FlashInfer, MLA, MoE routing, speculative decoding, disaggregated serving, TP/PP/EP, radix cache, continuous batching, chunked prefill, CUDA graph, model loading, quantization FP8/GPTQ/AWQ, JIT kernel, triton kernel SGLang, or asks about serving LLMs with SGLang.
Write, debug, and optimize Triton and Gluon GPU kernels using local source code, tutorials, and kernel references. Use when the user mentions Triton, Gluon, tl.load, tl.store, tl.dot, triton.jit, gluon.jit, wgmma, tcgen05, TMA, tensor descriptor, persistent kernel, warp specialization, fused attention, matmul kernel, kernel fusion, tl.program_id, triton autotune, MXFP, FP8, FP4, block-scaled matmul, SwiGLU, top-k, or asks about writing GPU kernels in Python.
Query local documentation for NVIDIA PTX ISA 9.1, CUDA Runtime API 13.1, Driver API 13.1, Programming Guide v13.1, the Best Practices Guide, Nsight Compute, and Nsight Systems. Debug and optimize GPU kernels with nsys/ncu/compute-sanitizer workflows. Use when writing, debugging, or optimizing CUDA code, GPU kernels, PTX instructions, inline PTX, TensorCore operations (WMMA, WGMMA, TMA, tcgen05), or when the user mentions CUDA API functions, error codes, device properties, memory management, profiling, GPU performance, compute capabilities, CUDA Graphs, Cooperative Groups, Unified Memory, dynamic parallelism, or CUDA programming model concepts.
Louis Rossmann's writing voice for general prose: testable-number density, high sentence-length variance, claim-then-proof structure, contractions, contempt shown through precision. Consult when writing in his voice.