Found 81 Skills
Bridge between OMO Prometheus and TKT ticket lifecycle. Generates structured review context for Prometheus after bundle close, converts Prometheus plans into TKT bundle commands, writes review feedback into the Review Agent Inbox, and provides a structured question protocol for gathering information efficiently. Load this skill when you need to: (1) generate a review prompt for a completed ticket/bundle, (2) convert a Prometheus plan.md into TKT worker tickets, (3) write review feedback back into the ticket system, (4) ask structured questions using the question tool across all scenarios (requirements, decisions, review, planning).
Sending telemetry data to Grafana Cloud — metrics via Prometheus remote write or OTLP, logs via Loki push or Alloy, traces via OTLP to Tempo, profiles via Pyroscope. Covers Alloy-based pipelines, direct SDK/agent integrations, cloud integrations catalog, and credentials management. Use when connecting an application or infrastructure to Grafana Cloud, setting up data ingestion, configuring remote write, or choosing between ingestion methods.
Expert evaluator for Prometheus label strategy. Audits, designs, and improves label schemas using cardinality scoring, access-pattern alignment, static vs. dynamic label rules, histogram bucket discipline, instrumentation hygiene, and source-side prevention via relabel_config / metric_relabel_configs. Use when the user asks to evaluate, audit, design, or improve Prometheus labels — or asks how to prevent high cardinality at the source. For post-ingest aggregation, see the adaptive-metrics skill. For "why is my Prometheus slow / expensive right now" triage, see prometheus-cardinality-troubleshooter.
Expert-level Prometheus monitoring, metrics collection, PromQL queries, alerting, and production operations
Monitoring and observability patterns for Prometheus metrics, Grafana dashboards, Langfuse LLM tracing, and drift detection. Use when adding logging, metrics, distributed tracing, LLM cost tracking, or quality drift monitoring.
Configure Prometheus Alertmanager with routing trees, receivers (Slack, PagerDuty, email), inhibition rules, silences, and notification templates for actionable incident alerting. Use when implementing proactive monitoring with automated incident detection, routing alerts to the appropriate team by severity, reducing alert fatigue through grouping and deduplication, integrating with on-call systems like PagerDuty, or migrating from legacy alerting to Prometheus-based alerting.
Production server monitoring stack covering Prometheus, Node Exporter, Grafana, Alertmanager, Loki, and Promtail on bare-metal or VM Linux hosts. USE WHEN: setting up monitoring for a new production server or VPS; configuring Prometheus scrape targets for application or system metrics; creating Grafana dashboards and datasource provisioning; writing Alertmanager routing rules with email/Slack notifications; implementing the PLG stack (Promtail + Loki + Grafana) for log aggregation; performing live system diagnostics with htop, iotop, nethogs, ss, vmstat, iostat; setting up uptime monitoring with UptimeRobot or healthchecks.io. DO NOT USE FOR: Kubernetes-native observability (use the kubernetes skill instead); application-level APM (for distributed tracing with Jaeger/Tempo, use the observability skill); cloud-managed monitoring (CloudWatch, GCP Monitoring, Azure Monitor); Windows Server monitoring.
Export cost-tracking telemetry in Prometheus textfile or webhook JSON formats — for external observability (Grafana, Datadog, custom dashboards)
Discover optimal autoscaling parameters for a Deco site by analyzing Prometheus metrics. Correlates CPU, concurrency, and latency to find the right scaling target and method.
Build FastAPI services with JWT auth, structlog, and Prometheus metrics. Use when creating or modifying a Python HTTP server, adding authentication, structured logging, or instrumentation to a FastAPI app.
Prometheus monitoring and alerting for cloud-native observability. USE WHEN: Writing PromQL queries, configuring Prometheus scrape targets, creating alerting rules, setting up recording rules, instrumenting applications with Prometheus metrics, configuring service discovery. DO NOT USE: For building dashboards (use /grafana), for log analysis (use /logging-observability), for general observability architecture (use senior-software-engineer with infrastructure focus). TRIGGERS: metrics, prometheus, promql, counter, gauge, histogram, summary, alert, alertmanager, alerting rule, recording rule, scrape, target, label, service discovery, relabeling, exporter, instrumentation, slo, error budget.
Query Ethereum network data via ethpandaops CLI or MCP server. Use when analyzing blockchain data, block timing, attestations, validator performance, network health, or infrastructure metrics. Provides access to ClickHouse (blockchain data), Prometheus (metrics), Loki (logs), and Dora (explorer APIs).