Found 1,288 Skills
Fast structured generation and serving for LLMs with RadixAttention prefix caching. Use for JSON/regex outputs, constrained decoding, agentic workflows with tool calls, or when prefix sharing can deliver large speedups over vLLM (up to 5× in reported benchmarks). Deployed on 300,000+ GPUs at xAI, AMD, NVIDIA, and LinkedIn.
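A minimal sketch of SGLang's constrained decoding via its Python frontend, assuming a local SGLang server on port 30000; the prompt and regex pattern are illustrative:

```python
import sglang as sgl

@sgl.function
def extract_city(s, text):
    s += "Extract the city from: " + text + "\n"
    # Constrain generation to match a simple regex (illustrative pattern).
    s += "City: " + sgl.gen("city", max_tokens=8, regex=r"[A-Z][a-z]+")

# Assumes a running SGLang server, e.g. launched with:
#   python -m sglang.launch_server --model-path <model> --port 30000
sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

state = extract_city.run(text="I flew to Paris last week.")
print(state["city"])
```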
Production-grade fault tolerance for distributed systems. Use when implementing circuit breakers, retries with exponential backoff, and bulkhead isolation patterns, or when building resilience into LLM API integrations.
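As a sketch of the retry pattern this skill covers, here is exponential backoff with full jitter in plain Python; the function name and default limits are hypothetical:

```python
import random
import time

def retry_with_backoff(fn, max_attempts=5, base_delay=0.5, max_delay=30.0):
    """Call fn(), retrying on exception with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Full jitter: sleep a random amount up to the capped exponential delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(random.uniform(0, delay))
```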
LLM observability platform for tracing, evaluation, prompt management, and cost tracking. Use when setting up Langfuse, monitoring LLM costs, tracking token usage, or implementing prompt versioning.
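A minimal tracing sketch using the Langfuse Python SDK's `@observe` decorator (the import path shown is the v2 SDK's, and it assumes LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY are set in the environment); the stubbed function stands in for a real LLM call:

```python
from langfuse.decorators import observe

@observe()
def answer(question: str) -> str:
    # The decorator records inputs, outputs, timing, and nesting as a trace.
    # A real implementation would call an LLM here; the stub keeps this runnable.
    return f"(stub answer to: {question})"

print(answer("How do I version prompts?"))
```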
Comprehensive guide for building production-grade LLM applications using LangChain's chains, agents, memory systems, RAG patterns, and advanced orchestration.
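For the chain pattern, a minimal LCEL sketch; it assumes `langchain-openai` is installed and OPENAI_API_KEY is set, and the model name is just an example:

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

# A minimal LCEL chain: prompt -> model -> string parser.
prompt = ChatPromptTemplate.from_template("Summarize in one sentence: {text}")
chain = prompt | ChatOpenAI(model="gpt-4o-mini") | StrOutputParser()

print(chain.invoke({"text": "RadixAttention caches shared prompt prefixes."}))
```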
Expert prompt engineering for LLM applications including prompt design, optimization, RAG systems, agent architectures, and AI product development.
Quickly test and compare LLMs via OpenRouter. Find the fastest or cheapest model and compare response quality. Trigger words: openrouter, test model, compare models, find fastest model, find cheapest model.
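OpenRouter exposes an OpenAI-compatible endpoint, so a quick speed comparison can reuse the standard `openai` client; the two model IDs below are only examples:

```python
import os
import time
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

# Time the same prompt against two models to compare speed and output.
for model in ["openai/gpt-4o-mini", "meta-llama/llama-3.1-8b-instruct"]:
    start = time.perf_counter()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Summarize RAG in one sentence."}],
    )
    elapsed = time.perf_counter() - start
    print(f"{model}: {elapsed:.2f}s -> {resp.choices[0].message.content}")
```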
Guides the agent through building LLM-powered applications with LangChain and stateful agent workflows with LangGraph. Triggered when the user asks to "create an AI agent", "build a LangChain chain", "create a LangGraph workflow", "implement tool calling", "build RAG pipeline", "create a multi-agent system", "define agent state", "add human-in-the-loop", "implement streaming", or mentions LangChain, LangGraph, chains, agents, tools, retrieval augmented generation, state graphs, or LLM orchestration.
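A minimal LangGraph state-graph sketch, with a stubbed node standing in for a real LLM call:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    question: str
    answer: str

def respond(state: State) -> dict:
    # Stand-in for an LLM call; returns a partial state update.
    return {"answer": f"You asked: {state['question']}"}

graph = StateGraph(State)
graph.add_node("respond", respond)
graph.add_edge(START, "respond")
graph.add_edge("respond", END)
app = graph.compile()

print(app.invoke({"question": "What is a state graph?"}))
```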
Use when building secure AI pipelines or hardening LLM integrations. Defense-in-depth implements 8 validation layers from edge to storage with no single point of failure.
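The skill's eight concrete layers aren't listed here, but the layering pattern itself looks like this hypothetical two-layer sketch, where each validator either normalizes its input or rejects it outright:

```python
from typing import Callable

Validator = Callable[[str], str]

def reject_control_chars(text: str) -> str:
    # Layer 1 (edge): reject raw control characters except newline/tab.
    if any(ord(c) < 32 and c not in "\n\t" for c in text):
        raise ValueError("control characters rejected")
    return text

def cap_length(text: str, limit: int = 8192) -> str:
    # Layer 2: enforce a hard input-size ceiling.
    if len(text) > limit:
        raise ValueError("input too long")
    return text

# Two of the layers; a full pipeline would add schema checks,
# prompt-injection heuristics, output filtering, and storage-side checks.
PIPELINE: list[Validator] = [reject_control_chars, cap_length]

def validate(text: str) -> str:
    for layer in PIPELINE:
        text = layer(text)
    return text
```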
Expert prompt optimization for LLMs and AI systems. Use PROACTIVELY when building AI features, improving agent performance, or crafting system prompts. Masters prompt patterns and techniques.
LLM and ML model deployment for inference. Use when serving models in production, building AI APIs, or optimizing inference. Covers vLLM (LLM serving), TensorRT-LLM (GPU optimization), Ollama (local), BentoML (ML deployment), Triton (multi-model), LangChain (orchestration), LlamaIndex (RAG), and streaming patterns.
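As one example from this stack, vLLM serves an OpenAI-compatible API, so streaming works with the standard `openai` client; the port and model name below must match however the server was launched:

```python
from openai import OpenAI

# vLLM exposes an OpenAI-compatible endpoint (default port 8000);
# the model name must match the one the server was started with.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

stream = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Explain KV caching briefly."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```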
Model Context Protocol expert for building MCP servers, tools, resources, and client integrations. Use when "mcp server", "model context protocol", "claude code extension", "building ai tools", "tool definition", "mcp transport", "stdio transport", "sse transport", "resource provider", "prompt template", or the tags mcp, model-context-protocol, claude-code, ai-tools, llm-integration, anthropic, server, protocol are mentioned.
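A minimal MCP server sketch using the official Python SDK's FastMCP helper; the tool and resource are toy examples:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-server")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

@mcp.resource("greeting://{name}")
def greeting(name: str) -> str:
    """A simple templated resource."""
    return f"Hello, {name}!"

if __name__ == "__main__":
    mcp.run()  # defaults to the stdio transport
```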
Expert in building Retrieval-Augmented Generation systems. Masters embedding models, vector databases, chunking strategies, and retrieval optimization for LLM applications. Use when building RAG, vector search, embeddings, semantic search, or document retrieval.
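A minimal retrieval sketch with sentence-transformers embeddings and cosine similarity; the model name and documents are illustrative:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small, widely used embedding model

docs = [
    "RadixAttention caches shared prefixes across requests.",
    "Exponential backoff spaces out retries after failures.",
    "LangGraph models agent workflows as state graphs.",
]
doc_vecs = model.encode(docs, normalize_embeddings=True)

query_vec = model.encode(["how does prefix caching work?"], normalize_embeddings=True)
scores = doc_vecs @ query_vec.T  # cosine similarity, since vectors are normalized
print(docs[int(np.argmax(scores))])
```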