Total 46,040 skills
Replay-first debug flow for SGLang serving problems. Use when a live or recent server shows health-check failures, latency or throughput regressions, queue growth, timeouts, distributed stalls, crash dumps, wrong outputs after deploys, or PD/EP/HiCache issues, and the goal is to turn the problem into a replay and pick the right next debug tool.
Shared optimization guidance plus CuTe Python DSL overlays. Use when: (1) selecting optimizations for a CuTe Python DSL kernel, (2) deciding whether a finding is shared or cute-dsl-specific, (3) recording CuTe Python DSL implementation notes, (4) reviewing the knowledge layout for cute-dsl work, (5) mapping shared patterns to a CuTe Python DSL implementation surface.
SSH into host `h100_sglang`, enter Docker container `sglang_bbuf`, work in `/data/bbuf/repos/sglang`, and use the ready H100 remote environment for SGLang **diffusion** development and validation. Use when a task needs diffusion model smoke tests, Triton/CUDA kernel validation, torch.compile diffusion checks, or a safe remote copy for diffusion-specific SGLang changes.
CuTe Python DSL API reference and implementation patterns for NVIDIA GPU kernel programming. Provides the execution model, a core API table, key constraints, common patterns, and a documentation index. Use when: (1) writing or modifying CuTe DSL kernel code, (2) looking up CuTe DSL API syntax, (3) implementing attention/GEMM/MLA patterns in CuTe DSL, (4) understanding the CuTe DSL execution model and compilation pipeline, (5) checking what CuTe DSL can and cannot do.
Find AI models on Replicate using search and curated collections.
Extracts the behavioral requirements, user experience (UX) flows, micro-interactions, and conditional visibility rules of a frontend component from its source code. Produces an experience.md file focused on "how it feels, behaves, and when it renders".
Standard Restaurant POS UI derived from the Restaurant POS redesign plan. Use for any restaurant POS screen to enforce the approved layout, components, accessibility, and speed workflow.
Search and progressively read open-access academic papers through DeepXiv. Use when the user wants layered paper access, section-level reading, trending papers, or DeepXiv-backed literature retrieval.
Cubox CLI is a callable personal reading-memory system. It lets you search, read, and use saved content; run semantic (RAG-based) queries; access articles, highlights, and metadata; save URLs; update content states; and retrieve annotations and structure such as folders and tags. Use this tool when a task depends on the user’s reading history or requires context from their Cubox library.
Get an external patent examiner review of a patent application. Use when the user says "专利审查" (patent examination), "patent review", "审查意见" (examiner opinions), "examiner review", or wants critical feedback on patent claims and the specification.
Run GPU workloads on Modal: training, fine-tuning, inference, batch processing. Zero-config serverless: no SSH, no Docker, automatic scale-to-zero. Use when the user says "modal run", "modal training", "modal inference", "deploy to modal", "need a GPU", "run on modal", "serverless GPU", or needs remote GPU compute.
Use when the main results pass result-to-claim (claim_supported=yes or partial) and ablation studies are needed for paper submission. Codex designs the ablations from a reviewer's perspective; CC reviews their feasibility and implements them.