Found 1,564 Skills
Set up and maintain a persistent, LLM-managed knowledge base for a digital health project — turning clinical observations, papers, interviews, and planning docs into a compounding, interlinked wiki.
Debug and harden production LLM prompts — handle prompt injection, output format drift, instruction forgetting in long contexts, and cross-model portability issues. Use this skill when the user ships an LLM-powered feature to production and needs to diagnose why outputs are inconsistent, unsafe, or regressed after model updates — NOT for basic 'write a better prompt' questions.
Lossless DFlash speculative decoding for MLX on Apple Silicon — 1.7–4x faster LLM inference using block diffusion drafting with target model verification.
Expert skill for using Future AGI — the open-source end-to-end platform for evaluating, observing, and improving LLM and AI agent applications with tracing, evals, simulations, datasets, gateway, and guardrails.
Build GraphRAG retrieval pipelines on Neo4j using the neo4j-graphrag Python package (formerly neo4j-genai). Covers retriever selection (VectorRetriever, HybridRetriever, VectorCypherRetriever, HybridCypherRetriever, Text2CypherRetriever), retrieval_query Cypher fragments, query_params, pipeline wiring (GraphRAG + LLM), embedder setup, index creation, and LangChain/LlamaIndex integration. Does NOT handle KG construction from documents — use neo4j-document-import-skill. Does NOT handle plain vector search — use neo4j-vector-index-skill. Does NOT handle GDS analytics — use neo4j-gds-skill. Does NOT handle agent memory — use neo4j-agent-memory-skill.
Comprehensive testing doctrine for software and AI systems — covers positive patterns, anti-patterns, gates for coding agents writing tests, CI discipline, and an LLM/agent evaluation primer. Use when authoring or reviewing tests, adding mocks, deciding test placement, generating tests via agents, debugging flaky CI, designing eval suites for LLM features, or rebuilding a brittle test suite. Contains 12 positive patterns (selector hierarchy, table-driven, builders, real-system gates), 25 anti-patterns across Brittleness, Flakiness, Mock-misuse, Process, and AI-specific families, 7 mandatory gates for agents writing tests, flaky-test taxonomy with quarantine workflow, contract / property / mutation testing patterns, and an oracle-ladder primer for LLM-as-judge and agent eval. Language-agnostic — pseudo-code only. Don't use for general code review, library-specific debugging unrelated to tests, non-testing CI pipeline design, or production observability.
Run Claude Code CLI, VS Code, or JetBrains ACP through a local proxy that routes to NVIDIA NIM, Kimi, OpenRouter, DeepSeek, or local LLMs.
Evaluates accuracy of quantized or unquantized LLMs using NeMo Evaluator Launcher (NEL). Triggers on "evaluate model", "benchmark accuracy", "run MMLU", "evaluate quantized model", "accuracy drop", "run nel". Handles deployment, config generation, and evaluation execution. Not for quantizing models (use ptq) or deploying/serving models (use deployment).
Create custom LLM evaluation benchmarks using the BYOB decorator framework. Use when the user wants to (1) create a new benchmark from a dataset, (2) pick or write a scorer, (3) compile and run a BYOB benchmark, (4) containerize a benchmark, or (5) use LLM-as-Judge evaluation. Triggers on mentions of BYOB, custom benchmark, bring your own benchmark, scorer, or benchmark compilation.
Migrate an application with hardcoded LLM prompts to a full LaunchDarkly AgentControl implementation in five stages: audit the code, wrap the call, move the tools, add tracking, attach evaluators. Use when the user wants to externalize model/prompt configuration, move from direct provider calls (OpenAI, Anthropic, Bedrock, Gemini, Strands) to a managed config, or stage a full hardcoded-to-LaunchDarkly migration.
Expert prompt engineer specializing in advanced prompting techniques, LLM optimization, and AI system design. Masters chain-of-thought, constitutional AI, and production prompt strategies. Use when building AI features, improving agent performance, or crafting system prompts.
World-class ML engineering skill for productionizing ML models, MLOps, and building scalable ML systems. Expertise in PyTorch, TensorFlow, model deployment, feature stores, model monitoring, and ML infrastructure. Includes LLM integration, fine-tuning, RAG systems, and agentic AI. Use when deploying ML models, building ML platforms, implementing MLOps, or integrating LLMs into production systems.