Found 1,573 Skills
Build and run LLM-as-judge evaluation pipelines using Amazon Bedrock Evaluation Jobs with pre-computed inference datasets. Use when setting up automated model evaluation, designing test scenarios, collecting pre-computed responses, configuring custom metrics, creating AWS infrastructure, running evaluation jobs, parsing results, and iterating on findings.
Build AI agents with persistent threads, tool calling, and streaming on Convex. Use when implementing chat interfaces, AI assistants, multi-agent workflows, RAG systems, or any LLM-powered features with message history.
Expert prompt engineering for creating effective prompts for Claude, GPT, and other LLMs. Use when writing system prompts, user prompts, few-shot examples, or optimizing existing prompts for better performance.
Instruments Python and TypeScript code with MLflow Tracing for observability. Triggers on questions about adding tracing, instrumenting agents/LLM apps, getting started with MLflow tracing, or tracing specific frameworks (LangGraph, LangChain, OpenAI, DSPy, CrewAI, AutoGen). Examples - "How do I add tracing?", "How to instrument my agent?", "How to trace my LangChain app?", "Getting started with MLflow tracing", "Trace my TypeScript app"
Integrates Flowlines observability SDK into Python LLM applications. Use when adding Flowlines telemetry, instrumenting LLM providers, or setting up OpenTelemetry-based LLM monitoring.
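Guide for creating high-quality MCP (Model Context Protocol) servers that enable LLMs to interact with external services through well-designed tools. Use when building MCP servers to integrate external APIs or services, whether in Python (FastMCP) or Node/TypeScript (MCP SDK).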
Data format specialist covering TOON encoding, JSON/YAML optimization, serialization patterns, and data validation for modern applications. Use when optimizing data for LLM transmission, implementing high-performance serialization, validating data schemas, or converting between data formats.
Performs semantic code intelligence and token optimization through context engineering and automated context packing. Use when reducing token overhead for large codebases, creating repository digests with Gitingest, packaging code context with Repomix, or tracing cross-file dependencies with llm-tldr.
Expert-level AI implementation, deployment, LLM integration, and production AI systems
Expert guidance for fine-tuning LLMs with Axolotl - YAML configs, 100+ models, LoRA/QLoRA, DPO/KTO/ORPO/GRPO, multimodal support
Meta's 7-8B specialized moderation model for LLM input/output filtering. 6 safety categories - violence/hate, sexual content, weapons, substances, self-harm, criminal planning. 94-95% accuracy. Deploy with vLLM, Hugging Face, SageMaker. Integrates with NeMo Guardrails.
Design LLM-as-Judge evaluators for subjective criteria that code-based checks cannot handle. Use when a failure mode requires interpretation (tone, faithfulness, relevance, completeness). Do NOT use when the failure mode can be checked with code (regex, schema validation, execution tests). Do NOT use when you need to validate or calibrate the judge — use validate-evaluator instead.