Found 220 Skills
Use when encountering any bug, test failure, or unexpected behavior, before proposing fixes. A four-phase framework with built-in backward tracing for deep-stack failures that ensures root-cause understanding before implementation.
Grafana Tempo distributed tracing backend. Covers TraceQL query language (span selectors, attribute scopes, pipeline operators, structural operators, metrics functions), trace ingestion via OTLP/Jaeger/Zipkin, Tempo architecture (distributor/ingester/compactor/querier/metrics-generator), full configuration reference with YAML, metrics-from-traces (span metrics, service graphs, TraceQL metrics), deployment modes (monolithic/microservices/Helm/Kubernetes), multi-tenancy, performance tuning, caching, and HTTP API. Use when working with distributed traces, writing TraceQL queries, deploying Tempo, configuring trace pipelines, or setting up Grafana-Tempo integrations (traces-to-logs, traces-to-metrics, traces-to-profiles).
Builds composable, pipeable function chains on the iii engine. Use when building functional pipelines, effect systems, or typed composition layers where each step is a pure function with distributed tracing and retry.
Set up orq.ai observability for LLM applications. Use when setting up tracing, adding the AI Router proxy, integrating OpenTelemetry, auditing existing instrumentation, or enriching traces with metadata.
Provides guidance for performing causal interventions on PyTorch models using pyvene's declarative intervention framework. Use when conducting causal tracing, activation patching, interchange intervention training, or testing causal hypotheses about model behavior.
Implement distributed tracing with correlation IDs, trace propagation, and span tracking across microservices. Use when debugging distributed systems, monitoring request flows, or implementing observability.
Monitoring and observability strategy, implementation, and troubleshooting. Use for designing metrics/logs/traces systems, setting up Prometheus/Grafana/Loki, creating alerts and dashboards, calculating SLOs and error budgets, analyzing performance issues, and comparing monitoring tools (Datadog, ELK, CloudWatch). Covers the Four Golden Signals, RED/USE methods, OpenTelemetry instrumentation, log aggregation patterns, and distributed tracing.
Build production-ready monitoring, logging, and tracing systems. Implements comprehensive observability strategies, SLI/SLO management, and incident response workflows. Use PROACTIVELY for monitoring infrastructure, performance optimization, or production reliability.
Observability guidelines for distributed systems using OpenTelemetry, tracing, metrics, and structured logging.
See exactly what your AI did on a specific request. Use when you need to debug a wrong answer, trace a specific AI request, profile slow AI pipelines, find which step failed, inspect LM calls, view token usage per request, build audit trails, or understand why a customer got a bad response. Covers DSPy inspection, per-step tracing, OpenTelemetry instrumentation, and trace viewer setup.
Idiomatic Go HTTP middleware patterns with context propagation, structured logging via slog, centralized error handling, and panic recovery. Use when writing middleware, adding request tracing, or implementing cross-cutting concerns.
Performs root cause analysis on DAG execution failures. Traces failure propagation, identifies systemic issues, and generates actionable remediation guidance. Activate on 'failure analysis', 'root cause', 'why did it fail', 'debug failure', 'error investigation'. NOT for execution tracing (use dag-execution-tracer) or performance issues (use dag-performance-profiler).