Loading...
Loading...
Found 28 Skills
Review existing Datadog dashboards for operational readiness. Audits alert threshold markers, threshold proximity to normal traffic, customer-facing section completeness, and zero-knowledge readability. Uses pup CLI to fetch dashboard definitions. Use when auditing dashboards before on-call handoff, after dashboard changes, or during operational reviews. Do not use for: (1) designing new dashboards from scratch, (2) monitor/alert rule design, (3) APM instrumentation or tracing setup, (4) log pipeline configuration.
Partition-first log analysis methodology. Use for log searches, error analysis, pattern finding across Datadog, CloudWatch, or Kubernetes logs.
Instrument web applications to send telemetry data to Azure Application Insights for observability and monitoring. USE FOR: instrument app with app insights, add appinsights instrumentation, configure application insights, set up telemetry monitoring, enable app insights auto-instrumentation, add observability to azure web app, instrument webapp to send data to app insights, configure telemetry for app service. DO NOT USE FOR: non-Azure monitoring (use CloudWatch for AWS, Datadog for third-party), log analysis (use azure-kusto), cost monitoring (use azure-cost-optimization), security monitoring (use azure-security).
Set up monitoring, logging, and observability for applications and infrastructure. Use when implementing health checks, metrics collection, log aggregation, or alerting systems. Handles Prometheus, Grafana, ELK Stack, Datadog, and monitoring best practices.
Create a new built-in evlog adapter to send wide events to an external observability platform. Use when adding a new drain adapter (e.g., for Datadog, Sentry, Loki, Elasticsearch, etc.) to the evlog package. Covers source code, build config, package exports, tests, and all documentation.
Comprehensive logging and observability patterns for production systems including structured logging, distributed tracing, metrics collection, log aggregation, and alerting. Triggers for this skill - log, logging, logs, trace, tracing, traces, metrics, observability, OpenTelemetry, OTEL, Jaeger, Zipkin, structured logging, log level, debug, info, warn, error, fatal, correlation ID, span, spans, ELK, Elasticsearch, Loki, Datadog, Prometheus, Grafana, distributed tracing, log aggregation, alerting, monitoring, JSON logs, telemetry.
Upgrade any Pulumi provider to a newer version and reconcile the resulting diff. Use when users want to upgrade or update a provider (including editing package.json, requirements.txt, pyproject.toml, go.mod, or Pulumi.yaml to bump a provider SDK), check for breaking changes before or during an upgrade, fix resources that broke after a provider upgrade, or resolve unexpected replacements, creates, or deletes in a post-upgrade preview. Applies to all providers (aws, azure-native, gcp, kubernetes, aws-native, cloudflare, datadog, etc.) — not just Tier 1. Do NOT use for querying which stacks use what package versions; use skill `package-usage` for cross-stack audits. Do NOT use for general infrastructure tasks.
Guide for implementing HolmesGPT - an AI agent for troubleshooting cloud-native environments. Use when investigating Kubernetes issues, analyzing alerts from Prometheus/AlertManager/PagerDuty, performing root cause analysis, configuring HolmesGPT installations (CLI/Helm/Docker), setting up AI providers (OpenAI/Anthropic/Azure), creating custom toolsets, or integrating with observability platforms (Grafana, Loki, Tempo, DataDog).
Use this skill when working with SigNoz - open-source observability platform for application monitoring, distributed tracing, log management, metrics, alerts, and dashboards. Triggers on SigNoz setup, OpenTelemetry instrumentation for SigNoz, sending traces/logs/metrics to SigNoz, creating SigNoz dashboards, configuring SigNoz alerts, exception monitoring, and migrating from Datadog/Grafana/New Relic to SigNoz.
Configures log and metric export for CockroachDB Cloud clusters to external monitoring services including AWS CloudWatch, GCP Cloud Logging, and Datadog. Use when setting up log export for audit compliance, configuring metric export for monitoring, or troubleshooting log delivery issues.
Error tracking and monitoring integration. Sentry, Datadog RUM, Bugsnag. Source maps, breadcrumbs, release tracking, performance monitoring, and alerting configuration. USE WHEN: user mentions "Sentry", "error tracking", "Bugsnag", "Datadog RUM", "crash reporting", "source maps", "release tracking", "error monitoring" DO NOT USE FOR: application logging - use logging skills; APM/tracing - use `opentelemetry`; structured error responses - use `error-handling`
Use this skill when implementing logging, metrics, distributed tracing, alerting, or defining SLOs. Triggers on structured logging, Prometheus, Grafana, OpenTelemetry, Datadog, distributed tracing, error tracking, dashboards, alert fatigue, SLIs, SLOs, error budgets, and any task requiring system observability or monitoring setup.