Found 1,195 Skills
Configure LLM providers, use fallback models, handle streaming, and manage model settings in PydanticAI. Use when selecting models, implementing resilience, or optimizing API calls.
Expert skill for AI model quantization and optimization. Covers 4-bit/8-bit quantization, GGUF conversion, memory optimization, and quality-performance tradeoffs for deploying LLMs in resource-constrained JARVIS environments.
Master of LLM Economic Orchestration, specialized in Google GenAI (Gemini 3), Context Caching, and High-Fidelity Token Engineering.
Elite AI/ML Senior Engineer with 20+ years experience. Transforms Claude into a world-class AI researcher and engineer capable of building production-grade ML systems, LLMs, transformers, and computer vision solutions. Use when: (1) Building ML/DL models from scratch or fine-tuning, (2) Designing neural network architectures, (3) Implementing LLMs, transformers, attention mechanisms, (4) Computer vision tasks (object detection, segmentation, GANs), (5) NLP tasks (NER, sentiment, embeddings), (6) MLOps and production deployment, (7) Data preprocessing and feature engineering, (8) Model optimization and debugging, (9) Clean code review for ML projects, (10) Choosing optimal libraries and frameworks. Triggers: "ML", "AI", "deep learning", "neural network", "transformer", "LLM", "computer vision", "NLP", "TensorFlow", "PyTorch", "sklearn", "train model", "fine-tune", "embedding", "CNN", "RNN", "LSTM", "attention", "GPT", "BERT", "diffusion", "GAN", "object detection", "segmentation".
LangGraph tool calling patterns. Use when binding tools to LLMs, implementing ToolNode for execution, dynamic tool selection, or adding approval gates to tool calls.
You are an expert prompt engineer specializing in crafting effective prompts for LLMs through advanced techniques including constitutional AI, chain-of-thought reasoning, and model-specific optimizati
LLM gateway and routing configuration using OpenRouter and LiteLLM. Invoke when: setting up multi-model access (OpenRouter, LiteLLM); configuring model fallbacks and reliability; implementing cost-based or latency-based routing; A/B testing different models; self-hosting an LLM proxy. Keywords: openrouter, litellm, llm gateway, model routing, fallback, A/B testing
Use this skill when building MCP (Model Context Protocol) servers with TypeScript on Cloudflare Workers. This skill provides production-tested patterns for implementing tools, resources, and prompts using the official @modelcontextprotocol/sdk. It prevents 10+ common errors including export syntax issues, schema validation failures, memory leaks from unclosed transports, CORS misconfigurations, and authentication vulnerabilities. This skill should be used when developers need stateless MCP servers for API integrations, external tool exposure, or serverless edge deployments. For stateful agents with WebSockets and persistent storage, consider the Cloudflare Agents SDK instead. Supports multiple authentication methods (API keys, OAuth, Zero Trust), Cloudflare service integrations (D1, KV, R2, Vectorize), and comprehensive testing strategies. Production tested with token savings of ~70% vs manual implementation. Keywords: mcp, model context protocol, typescript mcp, cloudflare workers mcp, mcp server, mcp tools, mcp resources, mcp sdk, @modelcontextprotocol/sdk, hono mcp, streamablehttpservertransport, mcp authentication, mcp cloudflare, edge mcp server, serverless mcp, typescript mcp server, mcp api, llm tools, ai tools, cloudflare d1 mcp, cloudflare kv mcp, mcp testing, mcp deployment, wrangler mcp, export syntax error, schema validation error, memory leak mcp, cors mcp, rate limiting mcp
Complete knowledge domain for Cloudflare Workers AI: run AI models on serverless GPUs across Cloudflare's global network. Use when: implementing AI inference on Workers, running LLMs, generating text/images with AI, configuring Workers AI bindings, implementing AI streaming, using AI Gateway, integrating with embeddings/RAG systems, or encountering "AI_ERROR", rate limit errors, model not found, token limit exceeded, or neurons exceeded errors. Keywords: workers ai, cloudflare ai, ai bindings, llm workers, @cf/meta/llama, workers ai models, ai inference, cloudflare llm, ai streaming, text generation ai, ai embeddings, image generation ai, workers ai rag, ai gateway, llama workers, flux image generation, stable diffusion workers, vision models ai, ai chat completion, AI_ERROR, rate limit ai, model not found, token limit exceeded, neurons exceeded, ai quota exceeded, streaming failed, model unavailable, workers ai hono, ai gateway workers, vercel ai sdk workers, openai compatible workers, workers ai vectorize
Running and fine-tuning LLMs on Apple Silicon with MLX. Use when working with models locally on Mac, converting Hugging Face models to MLX format, fine-tuning with LoRA/QLoRA on Apple Silicon, or serving models via HTTP API.
Provides patterns to build Retrieval-Augmented Generation (RAG) systems for AI applications with vector databases and semantic search. Use when implementing knowledge-grounded AI, building document Q&A systems, or integrating LLMs with external knowledge bases.
Audit LLM token cost estimates against actual API usage. Activate on 'cost verification', 'token estimate accuracy', 'API cost audit', 'estimation variance'. NOT for pricing lookups, budget planning, or cost optimization strategies.