Loading...
Loading...
Found 253 Skills
Build production-ready monitoring, logging, and tracing systems. Implements comprehensive observability strategies, SLI/SLO management, and incident response workflows. Use PROACTIVELY for monitoring infrastructure, performance optimization, or production reliability.
DeepEval evaluation workflow for AI agents and LLM applications. TRIGGER when the user wants to evaluate or improve an AI agent, tool-using workflow, multi-turn chatbot, RAG pipeline, or LLM app; add evals; generate datasets or goldens; use deepeval generate; use deepeval test run; add tracing or @observe; send results to Confident AI; monitor production; run online evals; inspect traces; or iterate on prompts, tools, retrieval, or agent behavior from eval failures. AI agents are the primary use case. Covers Python SDK, pytest eval suites, CLI generation, tracing, Confident AI reporting, and agent-driven improvement loops. DO NOT TRIGGER for unrelated generic pytest, non-AI test setup, or non-DeepEval observability work unless the user asks to compare or migrate to DeepEval.
Guidance for reverse engineering graphics rendering programs (ray tracers, path tracers) from binary executables. This skill should be used when tasked with recreating a program that generates images through ray/path tracing, particularly when the goal is to achieve pixel-perfect or near-pixel-perfect output matching. Applies to tasks requiring binary analysis, floating-point constant extraction, and systematic algorithm reconstruction.
Performs root cause analysis on DAG execution failures. Traces failure propagation, identifies systemic issues, and generates actionable remediation guidance. Activate on 'failure analysis', 'root cause', 'why did it fail', 'debug failure', 'error investigation'. NOT for execution tracing (use dag-execution-tracer) or performance issues (use dag-performance-profiler).
Use when encountering any bug, test failure, or unexpected behavior, before proposing fixes - four-phase framework with built-in backward tracing for deep-stack failures, ensuring root-cause understanding before implementation
Apply scientific debugging methodology through conversational investigation. Use when investigating bugs, forming hypotheses, tracing error causes, performing root cause analysis, or systematically diagnosing issues. Includes progressive disclosure patterns, observable actions principle, and user-controlled dialogue flow.
Full-stack observability with Datadog APM, logs, metrics, synthetics, and RUM. Use when implementing monitoring, tracing, alerting, or cost optimization for production systems.
Implement distributed tracing with Jaeger and Tempo to track requests across microservices and identify performance bottlenecks. Use when debugging microservices, analyzing request flows, or implementing observability for distributed systems.
AI-powered systematic codebase analysis. Combines mechanical structure extraction with Claude's semantic understanding to produce documentation that captures not just WHAT code does, but WHY it exists and HOW it fits into the system. Includes pattern recognition, red flag detection, flow tracing, and quality assessment. Use for codebase analysis, documentation generation, architecture understanding, or code review.
Expert guidance for emitting high-quality, cost-efficient OpenTelemetry telemetry. Use when instrumenting applications with traces, metrics, or logs. Triggers on requests for observability, telemetry, tracing, metrics collection, logging integration, or OTel setup.
Comprehensive logging and observability patterns for production systems including structured logging, distributed tracing, metrics collection, log aggregation, and alerting. Triggers for this skill - log, logging, logs, trace, tracing, traces, metrics, observability, OpenTelemetry, OTEL, Jaeger, Zipkin, structured logging, log level, debug, info, warn, error, fatal, correlation ID, span, spans, ELK, Elasticsearch, Loki, Datadog, Prometheus, Grafana, distributed tracing, log aggregation, alerting, monitoring, JSON logs, telemetry.
Observability and SRE expert. Use when setting up monitoring, logging, tracing, defining SLOs, or managing incidents. Covers Prometheus, Grafana, OpenTelemetry, and incident response best practices.