Loading...
Loading...
Found 64 Skills
Comprehensive logging and observability patterns for production systems including structured logging, distributed tracing, metrics collection, log aggregation, and alerting. Triggers for this skill - log, logging, logs, trace, tracing, traces, metrics, observability, OpenTelemetry, OTEL, Jaeger, Zipkin, structured logging, log level, debug, info, warn, error, fatal, correlation ID, span, spans, ELK, Elasticsearch, Loki, Datadog, Prometheus, Grafana, distributed tracing, log aggregation, alerting, monitoring, JSON logs, telemetry.
Observability visualization with Grafana and LGTM stack. Dashboard design, panel configuration, alerting, variables/templating, and data sources. USE WHEN: Creating Grafana dashboards, configuring panels and visualizations, writing LogQL/TraceQL queries, setting up Grafana data sources, configuring dashboard variables and templates, building Grafana alerts. DO NOT USE: For writing PromQL queries (use /prometheus), for alerting rule strategy (use /prometheus), for general observability architecture (use senior-software-engineer with infrastructure focus). TRIGGERS: grafana, dashboard, panel, visualization, logql, traceql, loki, tempo, mimir, data source, annotation, variable, template, row, stat, graph, table, heatmap, gauge, bar chart, pie chart, time series, logs panel, traces panel, LGTM stack.
Monitoring, logging, and tracing implementation using OpenTelemetry as the unified standard. Use when building production systems requiring visibility into performance, errors, and behavior. Covers OpenTelemetry (metrics, logs, traces), Prometheus, Grafana, Loki, Jaeger, Tempo, structured logging (structlog, tracing, slog, pino), and alerting.
Guide for implementing HolmesGPT - an AI agent for troubleshooting cloud-native environments. Use when investigating Kubernetes issues, analyzing alerts from Prometheus/AlertManager/PagerDuty, performing root cause analysis, configuring HolmesGPT installations (CLI/Helm/Docker), setting up AI providers (OpenAI/Anthropic/Azure), creating custom toolsets, or integrating with observability platforms (Grafana, Loki, Tempo, DataDog).
Automatically discover observability and monitoring skills when working with Prometheus, Grafana, distributed tracing, structured logging, metrics, alerting, dashboards, or monitoring. Activates for observability development tasks.
Expert guidance for building production-ready FastAPI applications with modular architecture where each business domain is an independent module with own routes, models, schemas, services, cache, and migrations. Uses UV + pyproject.toml for modern Python dependency management, project name subdirectory for clean workspace organization, structlog (JSON+colored logging), pydantic-settings configuration, auto-discovery module loader, async SQLAlchemy with PostgreSQL, per-module Alembic migrations, Redis/memory cache with module-specific namespaces, central httpx client, OpenTelemetry/Prometheus observability, conversation ID tracking (X-Conversation-ID header+cookie), conditional Keycloak/app-based RBAC authentication, DDD/clean code principles, and automation scripts for rapid module development. Use when user requests FastAPI project setup, modular architecture, independent module development, microservice architecture, async database operations, caching strategies, logging patterns, configuration management, authentication systems, observability implementation, or enterprise Python web services. Supports max 3-4 route nesting depth, cache invalidation patterns, inter-module communication via service layer, and comprehensive error handling workflows.
CI/CD pipeline design, containerization, and infrastructure management. Handles Docker, Kubernetes, monitoring setup (Prometheus/Grafana), and infrastructure-as-code (Terraform/Pulumi).
Use this skill when working on infrastructure, DevOps, CI/CD, Kubernetes, cloud deployment, observability, or cost optimization. Activates on mentions of Kubernetes, Docker, Terraform, Pulumi, OpenTofu, GitOps, Argo CD, Flux, CI/CD, GitHub Actions, observability, OpenTelemetry, Prometheus, Grafana, AWS, GCP, Azure, infrastructure as code, platform engineering, FinOps, or cloud costs.
Use when adding logging to services, setting up monitoring, creating alerts, debugging production issues, designing SLIs/SLOs, or implementing structured logging (Pino, Winston), metrics (Prometheus, DataDog, CloudWatch), or distributed tracing (OpenTelemetry).
Kubernetes Cluster API v1.12. Covers clusterctl CLI, ClusterClass, GitOps integration. Scripts for health checks, backup, migration, linting. Templates: clusters, DR, Prometheus. Keywords: CAPI, clusterctl, kubeadm, cluster lifecycle.
Three-lens code review using parallel subagents: Epimetheus (hindsight — bugs, debt, fragility), Metis (craft — clarity, idiom, fit-for-purpose), Prometheus (foresight — vision, extensibility, future-Claude). Triggers on /titans, /review, 'review this code', 'what did I miss', 'before I ship this'. Use after completing substantial work, before /close. (user)
Create and manage production Grafana dashboards for real-time visualization of system and application metrics. Use when building monitoring dashboards, visualizing metrics, or creating operational observability interfaces.