runbook-creation
Runbook Creation
Create effective operational runbooks and procedures.
Runbook Structure
Runbook: [Service/Process Name]
Overview
Brief description of the service and runbook purpose.
Prerequisites
- Required access
- Tools needed
- Knowledge required
Procedure
Step-by-step instructions with commands.
Verification
How to confirm that the procedure succeeded.
Rollback
Steps to undo the change if needed.
Escalation
When and how to escalate.
Related Runbooks
Links to related procedures.
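
One way to keep runbooks aligned with this structure is a small check that reports which template sections a file is missing. The sketch below assumes each section name sits on its own line, optionally preceded by Markdown heading markers; the function name `check_runbook_sections` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: report which template sections are missing from a runbook file.
# Assumption: each section name appears alone on a line, with or without
# leading "#" heading markers. The function name is illustrative only.

check_runbook_sections() {
  local file="$1"
  local missing=0
  local section
  for section in "Overview" "Prerequisites" "Procedure" "Verification" \
                 "Rollback" "Escalation" "Related Runbooks"; do
    # Match the section name on its own line, allowing optional "#" markers.
    if ! grep -Eq "^#* *${section}[[:space:]]*$" "$file"; then
      echo "missing section: ${section}"
      missing=1
    fi
  done
  # Returns 0 when every section is present, 1 otherwise.
  return "$missing"
}
```

Calling `check_runbook_sections path/to/runbook.md` from a pre-commit hook or CI job would be one way to apply it; the path is a placeholder.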
Example Runbook
Runbook: Database Failover
Overview
Procedure to fail over PostgreSQL from the primary to a replica.
Prerequisites
- DBA access to primary and replica
- VPN connected
- Slack channel #db-ops open
Procedure
1. Verify Replica Status
```bash
# Confirm the replica is still in recovery (running as a standby) before promoting it
psql -h replica -c "SELECT pg_is_in_recovery();"
```
Should return 't'
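
The check above can be turned into a guard so that later steps run only when the replica really is a standby. This is a minimal sketch: `is_standby` is a hypothetical helper, and the `-tA` flags are added here so that `psql` prints only the bare value.

```bash
#!/usr/bin/env bash
# Sketch: interpret the output of pg_is_in_recovery().
# is_standby is a hypothetical helper name, not part of psql.

is_standby() {
  local result
  # Strip whitespace so "t", " t" and "t\n" are all treated alike.
  result="$(printf '%s' "$1" | tr -d '[:space:]')"
  [ "$result" = "t" ]
}

# Usage (assumes psql can reach the host named "replica"):
#   if is_standby "$(psql -h replica -tAc 'SELECT pg_is_in_recovery();')"; then
#     echo "replica is a standby; safe to continue"
#   else
#     echo "replica is not in recovery; stop and investigate" >&2
#   fi
```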
2. Stop Application Writes
```bash
# Scale the application to zero pods so nothing writes to the old primary
kubectl scale deployment app --replicas=0
```
3. Promote Replica
```bash
# Promote the standby to primary; pg_promote() requires PostgreSQL 12 or later
psql -h replica -c "SELECT pg_promote();"
```
4. Update DNS
```bash
# Repoint the database DNS record at the new primary
aws route53 change-resource-record-sets ...
```
Verification
- Application connects to new primary
- No replication lag errors
- Transactions completing
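
Checks like these often need a few attempts while connections re-establish. The sketch below is a generic polling helper, not part of the original runbook; the name `retry` and its argument order are assumptions.

```bash
#!/usr/bin/env bash
# Sketch: run a command until it succeeds or the attempts run out.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# The helper name and argument order are illustrative assumptions.

retry() {
  local max_attempts="$1"
  local delay="$2"
  shift 2
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1   # give up: the command never succeeded
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Usage (assumes the promoted node is still reachable as "replica"):
#   retry 10 3 psql -h replica -tAc "SELECT 1;" >/dev/null \
#     || echo "new primary is not accepting connections" >&2
```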
Escalation
If issues persist after 15 minutes, escalate to:
- Primary: @dba-lead
- Secondary: @platform-oncall
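
The 15-minute rule is easier to follow under pressure when the start time is recorded up front. A minimal sketch, assuming timestamps in Unix epoch seconds; `should_escalate` and `ESCALATION_THRESHOLD_SECONDS` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: decide whether the 15-minute escalation threshold has passed.
# Names are illustrative; times are Unix epoch seconds.

ESCALATION_THRESHOLD_SECONDS=$((15 * 60))

# should_escalate STARTED_AT [NOW]
# Succeeds (exit 0) once at least 15 minutes have elapsed since STARTED_AT.
should_escalate() {
  local started_at="$1"
  local now="${2:-$(date +%s)}"
  [ $((now - started_at)) -ge "$ESCALATION_THRESHOLD_SECONDS" ]
}

# Usage:
#   started_at="$(date +%s)"      # record when the failover began
#   ...
#   if should_escalate "$started_at"; then
#     echo "Escalate to @dba-lead, then @platform-oncall"
#   fi
```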
Best Practices
- Keep procedures simple and clear
- Include verification steps
- Test runbooks regularly
- Version control runbooks
- Include troubleshooting tips