Create operational runbooks and standard operating procedures. Document troubleshooting guides and recovery procedures. Use when documenting operational knowledge.
npx skill4agent add bagelhole/devops-security-agent-skills runbook-creation

# Runbook: [Service/Process Name]
## Overview
Brief description of the service and runbook purpose.
## Prerequisites
- Required access
- Tools needed
- Knowledge required
## Procedure
Step-by-step instructions with commands.
## Verification
How to confirm success.
## Rollback
Steps to undo if needed.
## Escalation
When and how to escalate.
## Related Runbooks
Links to related procedures.

# Runbook: Database Failover
## Overview
Procedure to fail over PostgreSQL from the primary to a replica.
## Prerequisites
- [ ] DBA access to primary and replica
- [ ] VPN connected
- [ ] Slack channel #db-ops open
## Procedure
### 1. Verify Replica Status
```bash
psql -h replica -c "SELECT pg_is_in_recovery();"
# Should return 't'
```
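The manual check above can be wrapped in a guard that stops the procedure when the target is not a standby. This is a minimal sketch, assuming the host alias `replica` from the step above; the function name `require_standby` is hypothetical and not part of any tool.

```bash
# Hypothetical helper: succeed only if the given host reports it is in recovery.
require_standby() {
  host="$1"
  # -t (tuples only) and -A (unaligned) reduce the output to a bare 't' or 'f'.
  state="$(psql -h "$host" -tAc "SELECT pg_is_in_recovery();")" || {
    echo "ERROR: could not query $host" >&2
    return 2
  }
  if [ "$state" = "t" ]; then
    echo "OK: $host is a standby"
  else
    echo "ABORT: $host is not in recovery (got '$state')" >&2
    return 1
  fi
}
```

A typical call would be `require_standby replica || exit 1` before moving on to the next step.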
### 2. Stop Application Writes
```bash
# Scale the application to zero so no writes reach the old primary
kubectl scale deployment app --replicas=0
```
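Scaling to zero is asynchronous, so it may be worth waiting until the Deployment reports no replicas before promoting. The sketch below polls the Deployment status; the function name `wait_for_scale_down` and its default attempt count and delay are assumptions. It only confirms what the Deployment reports, so pods that are still terminating may briefly hold connections.

```bash
# Hypothetical helper: poll until the Deployment reports zero replicas.
wait_for_scale_down() {
  deployment="$1"
  attempts="${2:-30}"   # number of polls before giving up
  delay="${3:-2}"       # seconds between polls
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # The field may be omitted (empty output) once no pods remain.
    count="$(kubectl get deployment "$deployment" -o jsonpath='{.status.replicas}')"
    if [ -z "$count" ] || [ "$count" = "0" ]; then
      echo "OK: $deployment reports zero replicas"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "ABORT: $deployment still reports $count replicas" >&2
  return 1
}
```

For example, `wait_for_scale_down app || exit 1` would block for up to about a minute with the defaults shown.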
### 3. Promote Replica
```bash
# pg_promote() requires PostgreSQL 12 or later; it returns 't' once promotion completes
psql -h replica -c "SELECT pg_promote();"
```
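`pg_promote()` returns a boolean, and by default it waits up to 60 seconds for promotion to finish, so the result can be checked rather than assumed. A minimal sketch of that check follows; the function name `promote_replica` is hypothetical.

```bash
# Hypothetical helper: promote the standby and fail loudly if it does not complete.
promote_replica() {
  host="$1"
  # Result is 't' when promotion completed, 'f' if the wait timed out.
  result="$(psql -h "$host" -tAc "SELECT pg_promote();")" || {
    echo "ERROR: could not run pg_promote() on $host" >&2
    return 2
  }
  if [ "$result" = "t" ]; then
    echo "OK: $host promoted"
  else
    echo "ABORT: promotion of $host did not complete (got '$result')" >&2
    return 1
  fi
}
```

Calling `promote_replica replica || exit 1` keeps the DNS change from running against a host that is still a standby.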
### 4. Update DNS
```bash
aws route53 change-resource-record-sets ...
```
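Once the record change has been submitted, the new answer can be checked from a client. The sketch below compares what `dig +short` returns with an expected address. The record name and address in the usage line are placeholders, not values from this runbook, and `check_dns` is a hypothetical name. It assumes a plain A record; a CNAME would return the target name first.

```bash
# Hypothetical helper: check that a DNS name resolves to the expected address.
check_dns() {
  name="$1"
  expected="$2"
  # Take the first answer only; resolvers may still serve a cached value until the TTL expires.
  actual="$(dig +short "$name" | head -n 1)"
  if [ "$actual" = "$expected" ]; then
    echo "OK: $name resolves to $expected"
  else
    echo "WAIT: $name resolves to '$actual', expected $expected" >&2
    return 1
  fi
}
```

Usage, with placeholder values: `check_dns db.example.internal 10.0.0.5`.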
## Verification
- [ ] Application connects to new primary
- [ ] No replication lag errors
- [ ] Transactions completing
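Part of this checklist can be scripted. The sketch below confirms that the promoted host has left recovery and accepts a write, using a temporary table so nothing persists. The function name `verify_new_primary` and the table name `failover_check` are hypothetical. The application-level checks assume the deployment scaled down in step 2 has been scaled back up; the replica count to restore is not given in this runbook.

```bash
# Hypothetical helper: confirm the promoted host is a writable primary.
verify_new_primary() {
  host="$1"
  state="$(psql -h "$host" -tAc "SELECT pg_is_in_recovery();")" || {
    echo "ERROR: could not query $host" >&2
    return 2
  }
  if [ "$state" != "f" ]; then
    echo "FAIL: $host is still in recovery" >&2
    return 1
  fi
  # A standby rejects this statement; the temp table vanishes when the session ends.
  if ! psql -h "$host" -tAc "CREATE TEMP TABLE failover_check (id int);" >/dev/null; then
    echo "FAIL: $host rejected a write" >&2
    return 1
  fi
  echo "OK: $host is primary and accepts writes"
}
```

For example, `verify_new_primary replica` covers the database side before the application checks are ticked off.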
## Escalation
If issues persist after 15 minutes, escalate to:
- Primary: @dba-lead
- Secondary: @platform-oncall
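The 15-minute rule is easier to follow under pressure if the start time is recorded when the failover begins. A small sketch, with hypothetical names throughout:

```bash
ESCALATE_AFTER_SECONDS=900   # 15 minutes, per the escalation rule above

# Hypothetical helper: succeed (exit 0) once the escalation window has elapsed.
should_escalate() {
  started="$1"              # epoch seconds when the failover began
  now="${2:-$(date +%s)}"   # defaults to the current time
  [ $((now - started)) -ge "$ESCALATE_AFTER_SECONDS" ]
}
```

Record `START=$(date +%s)` at the first step, then run `should_escalate "$START" && echo "Escalate to @dba-lead"` whenever a check fails.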