Found 302 Skills
You are an expert error analysis specialist with deep expertise in debugging distributed systems, analyzing production incidents, and implementing comprehensive observability solutions.
Cloudflare Workers observability with logging, Analytics Engine, Tail Workers, metrics, and alerting. Use for monitoring, debugging, or tracing, or when encountering log parsing, metric aggregation, or alert configuration errors.
High-performance structured JSON logging for Node.js. Use when building production APIs that need fast, structured logs for observability platforms (Datadog, ELK, CloudWatch). Provides request logging middleware, child loggers for context, and sensitive data redaction. Choose Pino over console.log for any production TypeScript backend.
Expert SRE incident responder specializing in rapid problem resolution, modern observability, and comprehensive incident management. Masters incident command, blameless post-mortems, error budget management, and system reliability patterns. Handles critical outages, communication strategies, and continuous improvement. Use IMMEDIATELY for production incidents or SRE practices.
Implement OpenTelemetry (OTEL) observability - Collector configuration, Kubernetes deployment, traces/metrics/logs pipelines, instrumentation, and troubleshooting. Use when working with OTEL Collector, telemetry pipelines, observability infrastructure, or Kubernetes monitoring.
Expert-level Istio service mesh management, traffic control, security, and observability for Kubernetes
Manage Workers/KV/R2/D1/Hyperdrive via Cloudflare MCP, and perform observability, build troubleshooting, audit, and container sandbox operations. Triggers: worker/KV/R2/D1/logs/build/deploy/screenshot/audit/sandbox. Three permission tiers: Diagnose (read-only), Change (write requires confirmation), Super Admin (isolated environment). Write operations must follow a read-first, user-confirmation, post-execution-verification sequence.
Query Prometheus and Loki billing metrics from Grafana. Use when discussing observability costs, active series, ingestion rates, storage usage, or cardinality analysis.
Comprehensive toolkit for generating best-practice PromQL (Prometheus Query Language) queries following current standards and conventions. Use this skill when creating new PromQL queries, implementing monitoring and alerting rules, or building observability dashboards.
Operate long-lived agent workloads with observability, security boundaries, and lifecycle management.
Use when building comprehensive monitoring and observability systems.
OpenTelemetry with Grafana stack. Covers OTel SDK instrumentation for Go/Java/Python/Node.js/.NET, OTLP protocol and endpoint configuration, sending telemetry to Grafana Cloud via OTLP endpoint, Grafana Alloy as OTel collector, sampling strategies, Kubernetes OTel Operator, and migration from other observability tools. Use when instrumenting apps with OTel, configuring OTLP endpoints, setting up collectors, or migrating to OpenTelemetry.