Found 87 Skills
Write, validate, and optimise PromQL queries for Prometheus and Grafana Cloud Metrics. Use when the user asks to query metrics, write a PromQL expression, calculate rates, aggregate across labels, build histogram quantiles, create recording rules, debug query performance, or understand metric cardinality. Triggers on phrases like "PromQL", "Prometheus query", "write a metric query", "calculate rate", "histogram_quantile", "recording rule", "metric cardinality", "sum by", "rate vs irate", "absent()", or "query is slow".
Guide for implementing Grafana Mimir - a horizontally scalable, highly available, multi-tenant TSDB for long-term storage of Prometheus metrics. Use when configuring Mimir on Kubernetes, setting up Azure/S3/GCS storage backends, troubleshooting authentication issues, or optimizing performance.
Prometheus/Grafana metrics analysis and PromQL queries. Use when investigating latency, error rates, resource usage, or any time-series metrics.
Configure Prometheus Alertmanager with routing trees, receivers (Slack, PagerDuty, email), inhibition rules, silences, and notification templates for actionable incident alerting. Use when implementing proactive monitoring with automated incident detection, routing alerts to the appropriate team by severity, reducing alert fatigue through grouping and deduplication, integrating with on-call systems like PagerDuty, or migrating from legacy alerting to Prometheus-based alerting.
Sending telemetry data to Grafana Cloud — metrics via Prometheus remote write or OTLP, logs via Loki push or Alloy, traces via OTLP to Tempo, profiles via Pyroscope. Covers Alloy-based pipelines, direct SDK/agent integrations, cloud integrations catalog, and credentials management. Use when connecting an application or infrastructure to Grafana Cloud, setting up data ingestion, configuring remote write, or choosing between ingestion methods.
Monitoring and observability with OpenTelemetry, Prometheus, Grafana dashboards, and structured logging.
Author monitoring resources: PrometheusRules, ServiceMonitors, PodMonitors, AlertmanagerConfig, Silence CRs, and canary-checker health checks. Use when: (1) Creating or modifying alert rules (PrometheusRule), (2) Adding scrape targets (ServiceMonitor/PodMonitor), (3) Configuring Alertmanager routing or silences, (4) Writing canary-checker health checks, (5) Creating recording rules, (6) Adding monitoring for a new application or platform component. Triggers: "create alert", "add alerting", "PrometheusRule", "ServiceMonitor", "PodMonitor", "AlertmanagerConfig", "silence alert", "canary check", "recording rule", "add monitoring", "scrape target", "alert rule", "prometheus rule", "health check canary"
Query Ethereum network data via ethpandaops CLI or MCP server. Use when analyzing blockchain data, block timing, attestations, validator performance, network health, or infrastructure metrics. Provides access to ClickHouse (blockchain data), Prometheus (metrics), Loki (logs), and Dora (explorer APIs).
Perses datasource lifecycle management: create, update, delete datasources at global, project, or dashboard scope. Supports Prometheus, Tempo, Loki, Pyroscope, ClickHouse, and VictoriaLogs. Uses MCP tools when available, percli CLI as fallback. Use for "perses datasource", "add datasource", "configure prometheus perses", "perses data source". Do NOT use for dashboard creation (use perses-dashboard-create).
Bridge between OMO Prometheus and TKT ticket lifecycle. Generates structured review context for Prometheus after bundle close, converts Prometheus plans into TKT bundle commands, writes review feedback into the Review Agent Inbox, and provides a structured question protocol for gathering information efficiently. Load this skill when you need to: (1) generate a review prompt for a completed ticket/bundle, (2) convert a Prometheus plan.md into TKT worker tickets, (3) write review feedback back into the ticket system, (4) ask structured questions using the question tool across all scenarios (requirements, decisions, review, planning).
Observability and monitoring for data pipelines using OpenTelemetry (traces) and Prometheus (metrics). Covers instrumentation, dashboards, and alerting.
Discover optimal autoscaling parameters for a Deco site by analyzing Prometheus metrics. Correlates CPU, concurrency, and latency to find the right scaling target and method.