deployment-guide
Deployment Guide Creator
An expert in creating production-ready deployment documentation.
Core Principles
Structure & Organization
- Prerequisites listed first
- Environment-specific instructions
- Verification steps after each phase
- Rollback procedures documented
- Operational readiness covered
Documentation Standards
- Imperative tone for instructions
- Exact commands with expected outputs
- Version specifications for all tools
- Context explaining why each step matters
- Estimated execution times per phase
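One way to apply the "exact commands with expected outputs" standard is to wrap each documented step in a small helper that runs the command and checks its output against the pattern the guide promises. The sketch below is illustrative only: `run_step` and its arguments are hypothetical names, not part of any tool mentioned in this guide.

```shell
#!/usr/bin/env bash
# run_step is a hypothetical helper (an assumption of this sketch).
# Usage: run_step "<description>" "<expected regex>" <command> [args...]
run_step() {
  local description=$1 expected=$2
  shift 2
  local output status
  echo "==> ${description}"
  echo "    \$ $*"
  # Capture stdout and stderr so the failure message shows what happened
  output=$("$@" 2>&1)
  status=$?
  if [ "$status" -ne 0 ]; then
    echo "    FAILED (exit ${status}): ${output}"
    return 1
  fi
  # Compare the output with the documented expectation
  if ! grep -Eq -- "$expected" <<<"$output"; then
    echo "    UNEXPECTED OUTPUT: ${output}"
    return 1
  fi
  echo "    OK"
}

# Demo with a harmless command. In a real guide this might look like:
#   run_step "Verify cluster connectivity" "is running" kubectl cluster-info
run_step "Print a greeting" "^hello" echo "hello world"
```

Keeping the expected pattern next to the command means the guide and the check cannot drift apart silently.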
Standard Guide Structure
Deployment Guide: [Application Name]
Overview
- Application description
- Deployment strategy (blue-green, rolling, canary)
- Architecture diagram
- Key contacts
Prerequisites
System Requirements
- OS: Ubuntu 22.04 LTS
- RAM: 8GB minimum
- Disk: 50GB SSD
- Network: 100Mbps
Required Tools
| Tool | Version | Purpose |
|---|---|---|
| Docker | 24.0+ | Containerization |
| kubectl | 1.28+ | Kubernetes CLI |
| Helm | 3.12+ | Package management |
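The minimum versions in the table can be checked mechanically. The following is a minimal sketch, assuming GNU `sort -V` is available; `version_ge` and `check_tool` are hypothetical helper names, and extracting the installed version from each tool's own output (for example stripping a leading `v`) is deliberately left out.

```shell
#!/usr/bin/env bash
# version_ge A B: succeeds when version A >= version B.
# Relies on `sort -V` (GNU coreutils version sort).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# check_tool NAME INSTALLED MINIMUM: report one row of the tools table.
check_tool() {
  local name=$1 installed=$2 minimum=$3
  if version_ge "$installed" "$minimum"; then
    echo "OK   ${name} ${installed} (>= ${minimum})"
  else
    echo "FAIL ${name} ${installed} (< ${minimum})"
    return 1
  fi
}

# Installed versions are hard-coded sample values for illustration
check_tool Docker 24.0.7 24.0
check_tool kubectl 1.27.4 1.28 || echo "upgrade kubectl before continuing"
```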
Access Requirements
- SSH access to jump server
- Kubernetes cluster credentials
- Container registry credentials
- Secrets management access
Security Checklist
- VPN connection established
- MFA configured
- SSH keys rotated (< 90 days)
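The key-rotation item lends itself to an automated check. This sketch treats the key file's modification time as a proxy for its rotation date, which is an assumption and may not hold if keys are copied or restored; `key_is_fresh` and the key path are hypothetical.

```shell
#!/usr/bin/env bash
# key_is_fresh FILE [MAX_DAYS]: succeeds when FILE was last modified fewer
# than MAX_DAYS days ago (default 90). Returns 2 when FILE does not exist.
key_is_fresh() {
  local key=$1 max_days=${2:-90}
  [ -f "$key" ] || return 2
  # `find -mtime +N` matches files modified at least N+1 full days ago,
  # so N = max_days - 1 flags anything 90 or more days old.
  [ -z "$(find "$key" -mtime +"$((max_days - 1))" -print)" ]
}

key="${HOME}/.ssh/id_ed25519"   # hypothetical key path
if key_is_fresh "$key"; then
  echo "SSH key rotated within the last 90 days"
else
  echo "SSH key missing or 90+ days old: rotate before deploying"
fi
```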
Pre-Deployment Checklist
Code Readiness
- All tests passing in CI
- Code review approved
- Security scan completed
- Documentation updated
Environment Checks
- Target cluster healthy
- Database backups verified
- Monitoring alerts silenced
- Maintenance window scheduled
Rollback Preparation
- Previous version tagged
- Rollback procedure tested
- Data migration reversible
- Communication plan ready
Deployment Phases
Phase 1: Infrastructure Prep

    # Estimated time: 10 minutes
    # 1. Verify cluster connectivity
    kubectl cluster-info
    # Expected: Kubernetes control plane is running
    # 2. Check node readiness
    kubectl get nodes
    # Expected: All nodes in "Ready" state
    # 3. Verify namespace exists
    kubectl get namespace production
    # If it does not exist:
    kubectl create namespace production
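Step 3 above is a check followed by a conditional create, which can be folded into one idempotent function. This is a sketch rather than a prescribed procedure; `ensure_namespace` is a hypothetical name, and it only uses the two `kubectl` commands already shown in this phase.

```shell
#!/usr/bin/env bash
# ensure_namespace NAME: create the namespace only when it is missing,
# so the step is safe to re-run.
ensure_namespace() {
  local ns=$1
  kubectl get namespace "$ns" >/dev/null 2>&1 || kubectl create namespace "$ns"
}

# Usage (requires cluster access):
#   ensure_namespace production
```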
Phase 2: Application Deployment

    # Estimated time: 15 minutes
    # 1. Pull latest configuration
    git pull origin main
    cd deployment/kubernetes
    # 2. Update image tag in values
    export IMAGE_TAG=v1.2.3
    sed -i "s/tag: .*/tag: ${IMAGE_TAG}/" values.yaml
    # 3. Deploy with Helm
    helm upgrade --install myapp ./charts/myapp \
      --namespace production \
      --values values.yaml \
      --wait \
      --timeout 10m
    # Expected output:
    # Release "myapp" has been upgraded. Happy Helming!
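The `sed` expression in step 2 rewrites every line in `values.yaml` that contains `tag: `, which matters once the file holds more than one image. A narrower variant might look like the sketch below. It assumes a top-level `image:` key with an indented `tag:` beneath it and GNU `sed`; `set_image_tag` and the sample file contents are hypothetical.

```shell
#!/usr/bin/env bash
# set_image_tag FILE TAG: rewrite only the `tag:` nested under the
# top-level `image:` key (assumed layout; adjust for other charts).
set_image_tag() {
  local file=$1 tag=$2
  # The address range runs from `image:` to the next top-level key
  sed -i -E "/^image:/,/^[^[:space:]]/ s|^([[:space:]]+tag:).*|\1 ${tag}|" "$file"
}

# Demo on a throwaway file with invented contents
values=$(mktemp)
cat > "$values" <<'EOF'
image:
  repository: registry.example.com/myapp
  tag: v1.2.2
sidecar:
  tag: keep-me
EOF
set_image_tag "$values" v1.2.3
cat "$values"
rm -f "$values"
```

Passing the tag with `--set image.tag=...`, as the CI/CD workflow in this guide does, avoids editing the file at all.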
Phase 3: Database Migration

    # Estimated time: 5-30 minutes (depends on data size)
    # 1. Create backup before migration
    # The redirect runs locally, so the dump file is written on the machine running kubectl
    kubectl exec -n production deploy/myapp -- \
      pg_dump -Fc > "backup_$(date +%Y%m%d_%H%M%S).dump"
    # 2. Run migrations
    kubectl exec -n production deploy/myapp -- \
      npm run migrate
    # 3. Verify migration status
    kubectl exec -n production deploy/myapp -- \
      npm run migrate:status
Kubernetes Deployment Example

    # deployment.yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: myapp
      namespace: production
      labels:
        app: myapp
        version: v1.2.3
    spec:
      replicas: 3
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxSurge: 1
          maxUnavailable: 0
      selector:
        matchLabels:
          app: myapp
      template:
        metadata:
          labels:
            app: myapp
        spec:
          containers:
            - name: myapp
              image: registry.example.com/myapp:v1.2.3
              ports:
                - containerPort: 8080
              resources:
                requests:
                  memory: "256Mi"
                  cpu: "250m"
                limits:
                  memory: "512Mi"
                  cpu: "500m"
              livenessProbe:
                httpGet:
                  path: /health
                  port: 8080
                initialDelaySeconds: 30
                periodSeconds: 10
              readinessProbe:
                httpGet:
                  path: /ready
                  port: 8080
                initialDelaySeconds: 5
                periodSeconds: 5
              env:
                - name: NODE_ENV
                  value: "production"
                - name: DATABASE_URL
                  valueFrom:
                    secretKeyRef:
                      name: myapp-secrets
                      key: database-url
Post-Deployment Verification
Verification Checklist
Health Checks
- All pods running: `kubectl get pods -n production`
- Endpoints healthy: `curl -s https://api.example.com/health`
- Database connected: Check application logs
Performance Validation
- Response time < 200ms (p95)
- Error rate < 0.1%
- Memory usage stable
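The latency and error-rate thresholds above can be evaluated from raw samples. The sketch below assumes one latency value in milliseconds per line and uses the nearest-rank method for the percentile; `p95`, `error_rate_ok`, and the sample numbers are all hypothetical, and a real setup would more likely query the monitoring system.

```shell
#!/usr/bin/env bash
# p95: read one latency (ms) per line on stdin and print the 95th
# percentile using the nearest-rank method.
p95() {
  sort -n | awk '{ v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = int((NR * 95 + 99) / 100)   # ceil(0.95 * NR) in integer math
      print v[rank]
    }'
}

# error_rate_ok ERRORS TOTAL: succeeds when ERRORS/TOTAL is below 0.1%.
error_rate_ok() {
  awk -v e="$1" -v t="$2" 'BEGIN { exit !(t > 0 && e / t < 0.001) }'
}

# Sample latencies in ms, invented for illustration
latencies="120 95 180 140 110 130 150 105 170 210"
value=$(tr ' ' '\n' <<<"$latencies" | p95)
if [ "$value" -lt 200 ]; then
  echo "p95 ${value}ms: within target"
else
  echo "p95 ${value}ms: above the 200ms target"
fi
```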
Security Checks
- TLS certificates valid
- No sensitive data in logs
- Rate limiting active
Verification Script

    #!/bin/bash
    # verify-deployment.sh
    echo "=== Deployment Verification ==="
    # Check pod status
    echo "Checking pods..."
    READY_PODS=$(kubectl get pods -n production -l app=myapp \
      -o jsonpath='{.items[*].status.containerStatuses[0].ready}' | tr ' ' '\n' | grep -c true)
    TOTAL_PODS=$(kubectl get pods -n production -l app=myapp --no-headers | wc -l)
    # Require at least one pod so an empty selector match does not pass
    if [ "$TOTAL_PODS" -gt 0 ] && [ "$READY_PODS" -eq "$TOTAL_PODS" ]; then
      echo "✅ All $TOTAL_PODS pods ready"
    else
      echo "❌ Only $READY_PODS of $TOTAL_PODS pods ready"
      exit 1
    fi
    # Check endpoints
    echo "Checking health endpoint..."
    HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" https://api.example.com/health)
    if [ "$HTTP_CODE" -eq 200 ]; then
      echo "✅ Health endpoint returning 200"
    else
      echo "❌ Health endpoint returning $HTTP_CODE"
      exit 1
    fi
    # Check logs for errors
    echo "Checking for errors in logs..."
    # --tail=-1 reads all lines; with a label selector kubectl defaults to the last 10 per pod
    ERROR_COUNT=$(kubectl logs -n production -l app=myapp --since=5m --tail=-1 | grep -c "ERROR")
    if [ "$ERROR_COUNT" -lt 5 ]; then
      echo "✅ Error count acceptable: $ERROR_COUNT"
    else
      echo "⚠️ High error count: $ERROR_COUNT"
    fi
    echo "=== Verification Complete ==="
Rollback Procedures
Automatic Rollback Triggers
- Health check failures > 3 consecutive
- Error rate > 5% for 5 minutes
- P99 latency > 2 seconds for 5 minutes
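The three triggers above reduce to a single predicate. This sketch assumes the inputs are already aggregated by the monitoring system over the 5-minute window (collecting them is out of scope here); `should_rollback` and the sample values are hypothetical.

```shell
#!/usr/bin/env bash
# should_rollback FAILURES ERROR_RATE_PCT P99_SECONDS
# Succeeds (exit 0) when any trigger fires:
#   more than 3 consecutive health check failures,
#   error rate above 5%, or P99 latency above 2 seconds.
should_rollback() {
  awk -v f="$1" -v e="$2" -v p="$3" \
    'BEGIN { exit !(f > 3 || e > 5 || p > 2) }'
}

# Sample values, invented for illustration
if should_rollback 0 6.2 0.8; then
  echo "rollback: a trigger threshold was exceeded"
fi
```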
Manual Rollback Steps

    # Estimated time: 5 minutes
    # 1. Identify previous release
    helm history myapp -n production
    # 2. Rollback to previous version
    helm rollback myapp [REVISION] -n production --wait
    # 3. Verify rollback
    kubectl get pods -n production -l app=myapp
    curl -s https://api.example.com/health
    # 4. If database migration needs reversal
    kubectl exec -n production deploy/myapp -- \
      npm run migrate:down
Data Recovery

    # Restore from backup if needed
    # The dump file lives on the local machine, so stream it to pg_restore over stdin (-i)
    kubectl exec -i -n production deploy/myapp -- \
      pg_restore -d myapp_production < backup_20240101_120000.dump
Troubleshooting
Common Issues
Issue: Pods stuck in ImagePullBackOff
Symptoms:
- Pods show ImagePullBackOff status
- Events show "Failed to pull image"
Resolution:
- Verify image exists: `docker pull registry.example.com/myapp:v1.2.3`
- Check registry credentials: `kubectl get secret regcred -n production`
- Recreate secret if needed:

      kubectl create secret docker-registry regcred \
        --docker-server=registry.example.com \
        --docker-username=user \
        --docker-password=pass \
        -n production

Issue: Health checks failing
Symptoms:
- Pods restarting frequently
- Readiness probe failures in events
Resolution:
- Check application logs: `kubectl logs -n production deploy/myapp`
- Verify environment variables: `kubectl exec -n production deploy/myapp -- env`
- Test health endpoint manually: `kubectl port-forward -n production deploy/myapp 8080:8080`
- Increase probe timeouts if startup is slow
Log Locations
| Log Type | Location | Command |
|----------|----------|---------|
| Application | Pod stdout | `kubectl logs -n production deploy/myapp` |
| Ingress | Ingress controller | `kubectl logs -n ingress deploy/nginx` |
| Events | Kubernetes events | `kubectl get events -n production` |
| Audit | Cluster audit logs | `/var/log/kubernetes/audit.log` |
Emergency Contacts
| Role | Name | Contact |
|------|------|---------|
| On-call Engineer | PagerDuty | #ops-escalation |
| Database Admin | DBA Team | dba@example.com |
| Security | Security Team | security@example.com |
CI/CD Integration

    # .github/workflows/deploy.yml
    name: Deploy to Production
    on:
      push:
        tags:
          - 'v*'
    jobs:
      deploy:
        runs-on: ubuntu-latest
        environment: production
        steps:
          - uses: actions/checkout@v4
          - name: Configure kubectl
            uses: azure/k8s-set-context@v3
            with:
              kubeconfig: ${{ secrets.KUBE_CONFIG }}
          - name: Deploy with Helm
            run: |
              helm upgrade --install myapp ./charts/myapp \
                --namespace production \
                --set image.tag=${{ github.ref_name }} \
                --wait \
                --timeout 10m
          - name: Verify deployment
            run: ./scripts/verify-deployment.sh
          - name: Notify on failure
            if: failure()
            uses: slackapi/slack-github-action@v1
            with:
              payload: |
                {"text": "⚠️ Deployment failed for ${{ github.ref_name }}"}

Best Practices
- Test rollback: exercise the rollback procedures regularly
- Incremental deploys: start with a small percentage of traffic
- Feature flags: separate deploy from release
- Monitoring first: set up monitoring before deploying
- Document everything: every step must be reproducible
- Automate verification: use scripts instead of manual checks