Found 456 Skills
Run post-deployment smoke checks with Makefile targets (`remote-status`, `remote-logs`) plus optional health URL checks. Use after deployment to verify runtime state before final acceptance.
Analyzes Claude Code session transcripts (JSONL files) to reveal context window content, token usage patterns, and decision-making processes using the `view_session_context.py` tool. Use when debugging Claude behavior, investigating token patterns, tracking agent delegation, or analyzing context exhaustion. Triggers on "why did Claude do X", "analyze session", "check session logs", "context window exhaustion", or "track agent delegation".
Manage Alibaba Cloud Performance Testing Service (PTS) via OpenAPI/SDK. Use for scene lifecycle operations, test start/stop control, report retrieval, and metadata-driven API discovery before production changes.
Investigates Google Cloud networking issues by analyzing logs, metrics, and diagnostics. Use when investigating VPC Flow Logs, NAT, firewall, or threat logs, querying latency and throughput metrics, or running Connectivity Tests for path diagnostics.
Structured JSON logging with correlation IDs, request context propagation across async boundaries, performance timing decorators, and worker metrics collection.
OpenTelemetry, structured logging, distributed tracing, alerting, and dashboards
Parses OpenTelemetry-formatted logs to reconstruct execution traces, extract errors with call chains, and provide AI-powered root cause analysis. Use when investigating errors, checking logs, debugging issues, viewing traces, or analyzing execution flow. Triggers on "check the logs", "analyze errors", "what's failing", "debug this issue", "show me the traces", or "investigate the error".
Expert DevOps troubleshooter specializing in rapid incident response, advanced debugging, and modern observability. Masters log analysis, distributed tracing, Kubernetes debugging, performance optimization, and root cause analysis. Handles production outages, system reliability, and preventive monitoring. Use PROACTIVELY for debugging, incident response, or system troubleshooting.
CLI for querying Prometheus and PromQL-compatible engines (Thanos, Cortex, VictoriaMetrics, Grafana Mimir, Grafana Tempo...): instant queries, range queries, metric discovery (metrics/labels/meta subcommands), and output formats (table/csv/json/graph). Apply when executing PromQL queries, troubleshooting performance issues in software that exposes observability data, investigating latency/error rates/saturation, or analyzing time series data.
Create and manage production Grafana dashboards for real-time visualization of system and application metrics. Use when building monitoring dashboards, visualizing metrics, or creating operational observability interfaces.
Use when building cloud-native apps. Keywords: kubernetes, k8s, docker, container, grpc, tonic, microservice, service mesh, observability, tracing, metrics, health check, cloud, deployment, cloud-native (云原生), microservices (微服务), containers (容器)
Set up Prometheus monitoring for applications with custom metrics, scraping configurations, and service discovery. Use when implementing time-series metrics collection, monitoring applications, or building observability infrastructure.