Loading...
Loading...
Found 6 Skills
Software architecture and system design - scalability patterns, reliability engineering, and the art of making technical trade-offs that survive productionUse when "system design, architecture, scalability, how should we structure, distributed, microservices, monolith, high availability, design the system, component diagram, architecture, system-design, scalability, reliability, distributed, api, modeling, c4" mentioned.
Principal backend engineering intelligence for Python services and data systems. Actions: plan, design, build, implement, review, fix, optimize, refactor, debug, secure, scale backend code and architectures. Focus: correctness, reliability, performance, security, observability, scalability, operability, cost.
Implements reliability patterns including circuit breakers, retries, fallbacks, bulkheads, and SLO definitions. Provides failure mode analysis and incident response plans. Use for "SRE", "reliability", "resilience", or "failure handling".
Use this skill when implementing SRE practices, defining error budgets, reducing toil, planning capacity, or improving service reliability. Triggers on SRE, error budgets, SLOs, SLAs, toil automation, incident management, postmortems, on-call rotation, capacity planning, chaos engineering, and any task requiring reliability engineering decisions.
Refactor Kubernetes configurations to improve security, reliability, and maintainability. This skill applies defense-in-depth security principles, proper resource constraints, and GitOps patterns using Kustomize or Helm. It addresses containers running as root, missing health probes, hardcoded configs, and duplicate YAML across environments. Apply when you notice security vulnerabilities, missing Pod Disruption Budgets, or :latest image tags in production.
When the user wants to optimize maintenance strategies, improve equipment reliability, reduce downtime, or implement predictive maintenance. Also use when the user mentions "preventive maintenance," "predictive maintenance," "TPM," "Total Productive Maintenance," "MTBF," "MTTR," "reliability analysis," "equipment maintenance," "condition monitoring," "CBM," "failure analysis," or "spare parts optimization." For quality improvements, see quality-management. For OEE, see lean-manufacturing.