Found 7 Skills
Use when defining SLIs/SLOs, managing error budgets, or building reliable systems at scale. Invoke for incident management, chaos engineering, toil reduction, capacity planning.
Build production-ready systems with stability patterns: circuit breakers, bulkheads, timeouts, and retry logic. Use when the user mentions "production outage", "circuit breaker", "timeout strategy", "deployment pipeline", or "chaos engineering". Covers capacity planning, health checks, and anti-fragility patterns. For data systems, see ddia-systems. For system architecture, see system-design.
Use when building reliable and scalable distributed systems.
Design and implement disaster recovery strategies with RTO/RPO planning, database backups, Kubernetes DR, cross-region replication, and chaos engineering testing. Use when implementing backup systems, configuring point-in-time recovery, setting up multi-region failover, or validating DR procedures.
Expert-level site reliability engineering, SLOs, incident management, and operational excellence.
Testing in production with feature flags, canary deployments, synthetic monitoring, and chaos engineering. Use when implementing production observability or progressive delivery.
Advanced testing strategies and methodologies. Use when user asks to "design tests", "test coverage", "property-based testing", "mutation testing", "contract testing", "chaos engineering", "test pyramid", "testing strategy", "behavior-driven development", "acceptance testing", or mentions comprehensive testing approaches.