Loading...
Loading...
Found 81 Skills
Set up comprehensive observability for Databricks with metrics, traces, and alerts. Use when implementing monitoring for Databricks jobs, setting up dashboards, or configuring alerting for pipeline health. Trigger with phrases like "databricks monitoring", "databricks metrics", "databricks observability", "monitor databricks", "databricks alerts", "databricks logging".
Implements error handling patterns, structured logging, retry strategies, circuit breakers, and graceful degradation. Use when designing error handling, setting up logging, implementing retries, adding error tracking, or when asked about error boundaries, log aggregation, alerting, or resilience patterns.
Validate, lint, audit, or fix PromQL queries and alerting rules; detects anti-patterns.
Defines database performance monitoring strategy with slow query detection, resource usage alerts, query execution thresholds, and automated alerting. Use for "database monitoring", "performance alerts", "slow queries", or "DB metrics".
SRE patterns for production service reliability: SLOs, error budgets, postmortems, and incident response. Use when defining reliability targets, writing postmortems, implementing SLO alerting, or establishing on-call practices. NOT for initial service development (use scaffolding skills instead).
Prometheus, Grafana, CloudWatch, Azure Monitor, Stackdriver, logging, alerting, and SRE practices
Set up comprehensive observability for Mistral AI integrations with metrics, traces, and alerts. Use when implementing monitoring for Mistral AI operations, setting up dashboards, or configuring alerting for Mistral AI integration health. Trigger with phrases like "mistral monitoring", "mistral metrics", "mistral observability", "monitor mistral", "mistral alerts", "mistral tracing".
Grafana Alerting, Incident Response Management (IRM), and SLOs. Covers Grafana-managed and data source-managed alert rules, notification policies, contact points (Slack/PagerDuty/email/webhook), silences, muting, on-call scheduling, incident management workflows, and SLO configuration with burn-rate alerts. Use when configuring alerts, debugging notification routing, setting up on-call rotations, managing incidents, defining SLOs, or provisioning alerting via YAML/API.
Grafana Cloud AI and ML features — Grafana Assistant (natural language queries, dashboard generation, incident investigations), Dynamic Alerting (ML forecasting and outlier detection), Sift (automated root cause analysis with 8 analysis types), Knowledge Graph (entity discovery and RCA Workbench), and the LLM Plugin (OpenAI/Anthropic/Azure integration). Use when setting up AI-powered alerting, using natural language to query metrics/logs, automating incident investigation, or integrating LLMs with Grafana panels and workflows.
This skill should be used when the user asks to "manage alerts", "create alert", "list alerts", "check alert status", "enable alert", "disable alert", "investigate firing alerts", "check which alerts are active", "find alerting rules", "set up an alert", "configure alerting", "mute an alert", "silence an alert", "see alert definitions", "check alert priority", or wants to manage Coralogix alert definitions using the cx CLI.
Comprehensive logging and observability patterns for production systems including structured logging, distributed tracing, metrics collection, log aggregation, and alerting. Triggers for this skill - log, logging, logs, trace, tracing, traces, metrics, observability, OpenTelemetry, OTEL, Jaeger, Zipkin, structured logging, log level, debug, info, warn, error, fatal, correlation ID, span, spans, ELK, Elasticsearch, Loki, Datadog, Prometheus, Grafana, distributed tracing, log aggregation, alerting, monitoring, JSON logs, telemetry.
Prometheus monitoring and alerting with PromQL. Use for metrics collection.