Loading...
Loading...
Found 39 Skills
Framework for automated search over task-specific model harnesses — the code around a fixed base model that decides what to store, retrieve, and show while the model works.
Optimize and structure context for agents and LLMs by reducing noise, prioritizing relevance, organizing memory, defining constraints, and managing token budgets.
Run an autonomous Humanize-governed vLLM SOTA performance loop for one LLM model: first perform the fixed fair vLLM/SGLang/TensorRT-LLM deployment search and benchmark, then start one RLCR loop that repeatedly decides the gap, profiles the current bottleneck, runs layer/kernel pipeline analysis, patches vLLM code, optionally uses ncu-report-skill for kernel evidence, and revalidates until vLLM matches or beats the best observed framework under the same workload and SLA.
Ultra-compressed communication mode. Cuts token usage ~75% by speaking like caveman while keeping full technical accuracy. Supports intensity levels: lite, full (default), ultra. Use when user says "caveman mode", "talk like caveman", "use caveman", "less tokens", "be brief", or invokes /caveman. Also auto-triggers when token efficiency is requested. Integrated into Cavekit: enabled by default for build, inspect, and subagent phases via caveman_mode config. See scripts/bp-config.sh for caveman_mode and caveman_phases.
High-performance RLHF framework with Ray+vLLM acceleration. Use for PPO, GRPO, RLOO, DPO training of large models (7B-70B+). Built on Ray, vLLM, ZeRO-3. 2× faster than DeepSpeedChat with distributed architecture and GPU resource sharing.
Expert prompt optimization for LLMs and AI systems. Use PROACTIVELY when building AI features, improving agent performance, or crafting system prompts. Masters prompt patterns and techniques.
Convert various file formats (PDF, Office documents, images, audio, web content, structured data) to Markdown optimized for LLM processing. Use when converting documents to markdown, extracting text from PDFs/Office files, transcribing audio, performing OCR on images, extracting YouTube transcripts, or processing batches of files. Supports 20+ formats including DOCX, XLSX, PPTX, PDF, HTML, EPUB, CSV, JSON, images with OCR, and audio with transcription.
Push the LLM to reconsider, refine, and improve its recent output. Use when user asks for deeper critique or mentions a known deeper critique method, e.g. socratic, first principles, pre-mortem, red team.
Intelligent system governor that continuously shadow-tests APIs for performance while enforcing strict financial and security guardrails against runaway costs.
Systematic approach to exploring the TensorRT-LLM codebase before implementing new features or optimizations. Teaches how to discover existing infrastructure, trace code paths, and avoid reimplementing what already exists. Derived from real mistakes where ~250 lines of code were written and deleted because existing forward methods weren't discovered upfront. Use when starting any new feature, optimization, or code modification in TRT-LLM.
AI/LLM: Use when crafting system prompts, optimizing LLM outputs, or improving agent instructions. NOT for general coding.
System prompt toolkit that removes AI slop and makes any LLM respond like a normal person — concise, direct, no filler.