Found 93 Skills
Use when setting up monitoring systems, logging, metrics, tracing, or alerting. Invoke for dashboards, Prometheus/Grafana, load testing, profiling, and capacity planning.
Define and implement Service Level Indicators (SLIs) and Service Level Objectives (SLOs) with error budgets and alerting. Use when establishing reliability targets, implementing SRE practices, or measuring service performance.
Full-stack observability with Datadog APM, logs, metrics, synthetics, and RUM. Use when implementing monitoring, tracing, alerting, or cost optimization for production systems.
Expert-level monitoring and observability with Prometheus, Grafana, logging, and alerting.
Create and manage Kibana alerting rules via REST API or Terraform. Use when creating, updating, or managing rule lifecycle (enable, disable, mute, snooze) or rules-as-code workflows.
Implement comprehensive alert management with PagerDuty, escalation policies, and incident coordination. Use when setting up alerting systems, managing on-call schedules, or coordinating incident response.
Author monitoring resources: PrometheusRules, ServiceMonitors, PodMonitors, AlertmanagerConfig, Silence CRs, and canary-checker health checks. Use when: (1) Creating or modifying alert rules (PrometheusRule), (2) Adding scrape targets (ServiceMonitor/PodMonitor), (3) Configuring Alertmanager routing or silences, (4) Writing canary-checker health checks, (5) Creating recording rules, (6) Adding monitoring for a new application or platform component. Triggers: "create alert", "add alerting", "PrometheusRule", "ServiceMonitor", "PodMonitor", "AlertmanagerConfig", "silence alert", "canary check", "recording rule", "add monitoring", "scrape target", "alert rule", "prometheus rule", "health check canary"
Builds a structured vulnerability scanning workflow using tools like Nessus, Qualys, and OpenVAS to discover, prioritize, and track remediation of security vulnerabilities across infrastructure. Use when SOC teams need to establish recurring vulnerability assessment processes, integrate scan results with SIEM alerting, and build remediation tracking dashboards.
Deploy and use an LLM-powered public opinion analytics assistant that crawls 26 hot lists from 15 platforms, performs sentiment analysis and topic clustering, and sends multi-channel alerts.
Cloudflare Workers observability with logging, Analytics Engine, Tail Workers, metrics, and alerting. Use for monitoring, debugging, or tracing, or when encountering log parsing, metric aggregation, or alert configuration errors.
Comprehensive logging and observability patterns for production systems including structured logging, distributed tracing, metrics collection, log aggregation, and alerting. Triggers for this skill - log, logging, logs, trace, tracing, traces, metrics, observability, OpenTelemetry, OTEL, Jaeger, Zipkin, structured logging, log level, debug, info, warn, error, fatal, correlation ID, span, spans, ELK, Elasticsearch, Loki, Datadog, Prometheus, Grafana, distributed tracing, log aggregation, alerting, monitoring, JSON logs, telemetry.
Application monitoring and observability setup for Python/React projects. Use when configuring logging, metrics collection, health checks, alerting rules, or dashboard creation. Covers structured logging with structlog, Prometheus metrics for FastAPI, health check endpoints, alert threshold design, Grafana dashboard patterns, error tracking with Sentry, and uptime monitoring. Does NOT cover incident response procedures (use incident-response) or deployment (use deployment-pipeline).