Loading...
Loading...
Found 390 Skills
Guidelines for building production-grade microservices with FastAPI/Python and Go, covering serverless patterns, clean architecture, observability, and resilience.
Application monitoring and observability setup for Python/React projects. Use when configuring logging, metrics collection, health checks, alerting rules, or dashboard creation. Covers structured logging with structlog, Prometheus metrics for FastAPI, health check endpoints, alert threshold design, Grafana dashboard patterns, error tracking with Sentry, and uptime monitoring. Does NOT cover incident response procedures (use incident-response) or deployment (use deployment-pipeline).
Expert-level monitoring and observability with Prometheus, Grafana, logging, and alerting
Complete reference for the Galileo AI platform TypeScript/JS SDK for evaluating, observing, and protecting GenAI applications. Use when building Node.js or TypeScript applications that need LLM evaluation, production observability, tracing, or runtime guardrails with Galileo.
Expert performance engineer specializing in modern observability, application optimization, and scalable system performance. Masters OpenTelemetry, distributed tracing, load testing, multi-tier caching, Core Web Vitals, and performance monitoring. Handles end-to-end optimization, real user monitoring, and scalability patterns. Use PROACTIVELY for performance optimization, observability, or scalability challenges.
You are an expert error analysis specialist with deep expertise in debugging distributed systems, analyzing production incidents, and implementing comprehensive observability solutions.
High-performance structured JSON logging for Node.js. Use when building production APIs that need fast, structured logs for observability platforms (Datadog, ELK, CloudWatch). Provides request logging middleware, child loggers for context, and sensitive data redaction. Choose Pino over console.log for any production TypeScript backend.
Expert SRE incident responder specializing in rapid problem resolution, modern observability, and comprehensive incident management. Masters incident command, blameless post-mortems, error budget management, and system reliability patterns. Handles critical outages, communication strategies, and continuous improvement. Use IMMEDIATELY for production incidents or SRE practices.
Implement OpenTelemetry (OTEL) observability - Collector configuration, Kubernetes deployment, traces/metrics/logs pipelines, instrumentation, and troubleshooting. Use when working with OTEL Collector, telemetry pipelines, observability infrastructure, or Kubernetes monitoring.
Expert-level Istio service mesh management, traffic control, security, and observability for Kubernetes
Structured observability with Pydantic Logfire and OpenTelemetry. Use when: (1) Adding traces/logs to Python APIs, (2) Instrumenting FastAPI, HTTPX, SQLAlchemy, or LLMs, (3) Setting up service metadata, (4) Configuring sampling or scrubbing sensitive data, (5) Testing observability code.
Manage Workers/KV/R2/D1/Hyperdrive via Cloudflare MCP, perform observability/build troubleshooting/audit/container sandbox operations. Triggers: worker/KV/R2/D1/logs/build/deploy/screenshot/audit/sandbox. Three permission tiers: Diagnose (read-only), Change (write requires confirmation), Super Admin (isolated environment). Write operations must follow read-first, user confirmation, post-execution verification.