Loading...
Loading...
Found 5 Skills
Implements reliability patterns including circuit breakers, retries, fallbacks, bulkheads, and SLO definitions. Provides failure mode analysis and incident response plans. Use for "SRE", "reliability", "resilience", or "failure handling".
Use this skill when implementing SRE practices, defining error budgets, reducing toil, planning capacity, or improving service reliability. Triggers on SRE, error budgets, SLOs, SLAs, toil automation, incident management, postmortems, on-call rotation, capacity planning, chaos engineering, and any task requiring reliability engineering decisions.
Software architecture and system design - scalability patterns, reliability engineering, and the art of making technical trade-offs that survive productionUse when "system design, architecture, scalability, how should we structure, distributed, microservices, monolith, high availability, design the system, component diagram, architecture, system-design, scalability, reliability, distributed, api, modeling, c4" mentioned.
Refactor Kubernetes configurations to improve security, reliability, and maintainability. This skill applies defense-in-depth security principles, proper resource constraints, and GitOps patterns using Kustomize or Helm. It addresses containers running as root, missing health probes, hardcoded configs, and duplicate YAML across environments. Apply when you notice security vulnerabilities, missing Pod Disruption Budgets, or :latest image tags in production.
Principal backend engineering intelligence for Python services and data systems. Actions: plan, design, build, implement, review, fix, optimize, refactor, debug, secure, scale backend code and architectures. Focus: correctness, reliability, performance, security, observability, scalability, operability, cost.