deployment-guide

Deployment Guide Creator

Expert in creating production-ready deployment documentation.

Core Principles

Structure & Organization

  • Prerequisites listed first
  • Environment-specific instructions
  • Verification steps after each phase
  • Rollback procedures documented
  • Operational readiness covered

Documentation Standards

  • Imperative tone for instructions
  • Exact commands with expected outputs
  • Version specifications for all tools
  • Context explaining why each step matters
  • Estimated execution times per phase

Standard Guide Structure

Deployment Guide: [Application Name]

Overview

  • Application description
  • Deployment strategy (blue-green, rolling, canary)
  • Architecture diagram
  • Key contacts

Prerequisites

System Requirements

  • OS: Ubuntu 22.04 LTS
  • RAM: 8GB minimum
  • Disk: 50GB SSD
  • Network: 100Mbps

Required Tools

| Tool | Version | Purpose |
|------|---------|---------|
| Docker | 24.0+ | Containerization |
| kubectl | 1.28+ | Kubernetes CLI |
| Helm | 3.12+ | Package management |
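
The minimum versions in the table can be checked mechanically before a deployment starts. The sketch below is one way to do that, assuming GNU `sort -V` is available; `version_ge` and `check_tool` are hypothetical helper names introduced here, not part of any tool listed above.

```shell
#!/usr/bin/env bash
# version_ge ACTUAL MINIMUM: succeeds when ACTUAL >= MINIMUM.
# Assumes `sort -V` (version sort, GNU coreutils). A leading "v" is stripped.
version_ge() {
  local actual="${1#v}" minimum="${2#v}"
  # The lower version sorts first; if that is MINIMUM, ACTUAL is new enough.
  [ "$(printf '%s\n%s\n' "$minimum" "$actual" | sort -V | head -n 1)" = "$minimum" ]
}

# check_tool NAME ACTUAL MINIMUM: prints a one-line verdict and
# returns non-zero when the installed version is too old.
check_tool() {
  if version_ge "$2" "$3"; then
    echo "OK   $1 $2 (>= $3)"
  else
    echo "FAIL $1 $2 (< $3)"
    return 1
  fi
}
```

For example, `check_tool docker "$(docker version --format '{{.Client.Version}}')" 24.0` compares the installed Docker client against the minimum above. Each tool reports its version differently, so the caller supplies the version string.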

Access Requirements

  • SSH access to jump server
  • Kubernetes cluster credentials
  • Container registry credentials
  • Secrets management access

Security Checklist

  • VPN connection established
  • MFA configured
  • SSH keys rotated (< 90 days)
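
The "SSH keys rotated (< 90 days)" item can be approximated with a file-age check. This is a rough sketch that assumes the key file's modification time reflects its last rotation, which may not hold if the file was copied or touched afterwards. `file_age_days` and `key_is_fresh` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# file_age_days PATH: prints the whole number of days since PATH was last modified.
# Tries GNU `stat -c %Y` first, then falls back to BSD/macOS `stat -f %m`.
file_age_days() {
  local mtime now
  mtime=$(stat -c %Y "$1" 2>/dev/null || stat -f %m "$1") || return 2
  now=$(date +%s)
  echo $(( (now - mtime) / 86400 ))
}

# key_is_fresh PATH [MAX_DAYS]: succeeds when the file is younger than
# MAX_DAYS (default 90, matching the checklist item).
key_is_fresh() {
  local age
  age=$(file_age_days "$1") || return 2
  [ "$age" -lt "${2:-90}" ]
}
```

A possible use is `key_is_fresh ~/.ssh/id_ed25519 || echo "SSH key is due for rotation"`, where the key path is only an example.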

Pre-Deployment Checklist

Code Readiness

  • All tests passing in CI
  • Code review approved
  • Security scan completed
  • Documentation updated

Environment Checks

  • Target cluster healthy
  • Database backups verified
  • Monitoring alerts silenced
  • Maintenance window scheduled

Rollback Preparation

  • Previous version tagged
  • Rollback procedure tested
  • Data migration reversible
  • Communication plan ready

Deployment Phases

Phase 1: Infrastructure Prep

bash
# Estimated time: 10 minutes

# 1. Verify cluster connectivity
kubectl cluster-info
# Expected: Kubernetes control plane is running

# 2. Check node readiness
kubectl get nodes
# Expected: All nodes in "Ready" state

# 3. Verify namespace exists
kubectl get namespace production
# If it does not exist:
kubectl create namespace production

Phase 2: Application Deployment

bash
# Estimated time: 15 minutes

# 1. Pull latest configuration
git pull origin main
cd deployment/kubernetes

# 2. Update image tag in values
export IMAGE_TAG=v1.2.3
sed -i "s/tag: .*/tag: ${IMAGE_TAG}/" values.yaml

# 3. Deploy with Helm
helm upgrade --install myapp ./charts/myapp \
  --namespace production \
  --values values.yaml \
  --wait \
  --timeout 10m

# Expected output:
# Release "myapp" has been upgraded. Happy Helming!
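
Step 2 of this phase writes the tag into values.yaml without checking it first. A slightly more defensive sketch follows, assuming release tags follow the vMAJOR.MINOR.PATCH shape used in this guide (`v1.2.3`). `set_image_tag` is a hypothetical helper. It writes through a temporary file instead of `sed -i`, whose syntax differs between GNU and BSD sed.

```shell
#!/usr/bin/env bash
# set_image_tag FILE TAG (hypothetical helper): rewrite every "tag: ..." entry in FILE.
set_image_tag() {
  local file="$1" tag="$2" tmp status
  # Assumption: release tags look like v1.2.3. Anything else is rejected so that
  # characters special to sed (such as "/") never reach the substitution.
  if ! printf '%s\n' "$tag" | grep -Eqx 'v[0-9]+\.[0-9]+\.[0-9]+'; then
    echo "refusing tag '$tag': expected vMAJOR.MINOR.PATCH" >&2
    return 1
  fi
  tmp=$(mktemp) || return 1
  # Write to a temporary file first, then copy back, so FILE keeps its
  # permissions and is left untouched if sed fails.
  sed "s/tag: .*/tag: ${tag}/" "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

With this in place, step 2 could become `set_image_tag values.yaml "$IMAGE_TAG"`.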

Phase 3: Database Migration

bash
# Estimated time: 5-30 minutes (depends on data size)

# 1. Create backup before migration
kubectl exec -n production deploy/myapp -- \
  pg_dump -Fc > backup_$(date +%Y%m%d_%H%M%S).dump

# 2. Run migrations
kubectl exec -n production deploy/myapp -- \
  npm run migrate

# 3. Verify migration status
kubectl exec -n production deploy/myapp -- \
  npm run migrate:status

Kubernetes Deployment Example

yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: production
  labels:
    app: myapp
    version: v1.2.3
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: registry.example.com/myapp:v1.2.3
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
          env:
            - name: NODE_ENV
              value: "production"
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: myapp-secrets
                  key: database-url

Post-Deployment Verification

Verification Checklist

Health Checks

  • All pods running:
    kubectl get pods -n production
  • Endpoints healthy:
    curl -s https://api.example.com/health
  • Database connected: Check application logs

Performance Validation

  • Response time < 200ms (p95)
  • Error rate < 0.1%
  • Memory usage stable

Security Checks

  • TLS certificates valid
  • No sensitive data in logs
  • Rate limiting active
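
The Performance Validation thresholds can be evaluated from raw numbers. The helpers below are an illustrative sketch, assuming request counts and latency samples have already been exported from whatever monitoring system is in use. All function names are hypothetical, and `p95` uses the nearest-rank method.

```shell
#!/usr/bin/env bash
# error_rate_percent ERRORS TOTAL: prints the error rate as a percentage
# with three decimals. Zero traffic is reported as 0.000.
error_rate_percent() {
  awk -v e="$1" -v t="$2" 'BEGIN {
    if (t == 0) { print "0.000"; exit }
    printf "%.3f\n", 100 * e / t
  }'
}

# below_threshold VALUE MAX: succeeds when VALUE < MAX (floating-point compare).
below_threshold() {
  awk -v v="$1" -v max="$2" 'BEGIN { exit !(v < max) }'
}

# p95: reads one latency sample per line on stdin and prints the
# 95th percentile, taking the sample at rank ceil(0.95 * N).
p95() {
  sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      idx = int((NR * 95 + 99) / 100)  # integer ceiling of 0.95 * NR
      print v[idx]
    }'
}
```

A check against the thresholds above might read `below_threshold "$(error_rate_percent "$errors" "$total")" 0.1` and `below_threshold "$(p95 < latencies_ms.txt)" 200`, where `errors`, `total`, and `latencies_ms.txt` are placeholders for exported data.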

Verification Script

bash
#!/bin/bash
# verify-deployment.sh

echo "=== Deployment Verification ==="

# Check pod status
echo "Checking pods..."
READY_PODS=$(kubectl get pods -n production -l app=myapp \
  -o jsonpath='{.items[*].status.containerStatuses[0].ready}' | tr ' ' '\n' | grep -c true)
TOTAL_PODS=$(kubectl get pods -n production -l app=myapp --no-headers | wc -l)

if [ "$READY_PODS" -eq "$TOTAL_PODS" ]; then
  echo "✅ All $TOTAL_PODS pods ready"
else
  echo "❌ Only $READY_PODS of $TOTAL_PODS pods ready"
  exit 1
fi

# Check endpoints
echo "Checking health endpoint..."
HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" https://api.example.com/health)
if [ "$HTTP_CODE" -eq 200 ]; then
  echo "✅ Health endpoint returning 200"
else
  echo "❌ Health endpoint returning $HTTP_CODE"
  exit 1
fi

# Check logs for errors
echo "Checking for errors in logs..."
ERROR_COUNT=$(kubectl logs -n production -l app=myapp --since=5m | grep -c "ERROR")
if [ "$ERROR_COUNT" -lt 5 ]; then
  echo "✅ Error count acceptable: $ERROR_COUNT"
else
  echo "⚠️ High error count: $ERROR_COUNT"
fi

echo "=== Verification Complete ==="

Rollback Procedures

Automatic Rollback Triggers

  • Health check failures > 3 consecutive
  • Error rate > 5% for 5 minutes
  • P99 latency > 2 seconds for 5 minutes
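
The first trigger can be expressed as a small check over a series of probe results. This is a minimal sketch, assuming one HTTP status code per line on stdin and treating anything other than 200 as a failure. The helper names are hypothetical, and this is not how any particular platform implements automatic rollback.

```shell
#!/usr/bin/env bash
# max_consecutive_failures: reads one HTTP status code per line on stdin and
# prints the length of the longest run of non-200 codes (0 for empty input).
max_consecutive_failures() {
  awk '
    NF == 0 { next }                        # ignore blank lines
    { if ($1 == 200) run = 0; else run++ }  # a success resets the run
    run > max { max = run }
    END { print max + 0 }'
}

# should_roll_back [THRESHOLD]: succeeds when stdin contains MORE than
# THRESHOLD consecutive failures (default 3, matching "> 3 consecutive").
should_roll_back() {
  local threshold="${1:-3}" longest
  longest=$(max_consecutive_failures)
  [ "$longest" -gt "$threshold" ]
}
```

A polling loop could append the output of `curl -s -o /dev/null -w "%{http_code}\n" https://api.example.com/health` to a file and then run `should_roll_back < codes.txt` to decide whether to start the manual rollback steps. The file name `codes.txt` is a placeholder.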

Manual Rollback Steps

bash
# Estimated time: 5 minutes

# 1. Identify previous release
helm history myapp -n production

# 2. Rollback to previous version
helm rollback myapp [REVISION] -n production --wait

# 3. Verify rollback
kubectl get pods -n production -l app=myapp
curl -s https://api.example.com/health

# 4. If database migration needs reversal
kubectl exec -n production deploy/myapp -- \
  npm run migrate:down

Data Recovery

bash
# Restore from backup if needed.
# The dump from Phase 3 was written on the local machine, so stream it to
# pg_restore over stdin (-i) rather than passing a path inside the pod.
kubectl exec -i -n production deploy/myapp -- \
  pg_restore -d myapp_production < backup_20240101_120000.dump
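
The restore command names one specific dump file. When several dumps exist, the newest can be picked by name, because the `backup_%Y%m%d_%H%M%S.dump` pattern used in Phase 3 sorts chronologically. A small sketch with a hypothetical helper, `latest_backup`:

```shell
#!/usr/bin/env bash
# latest_backup [DIR]: prints the newest backup_*.dump file in DIR
# (default: current directory). Returns non-zero when none exist.
latest_backup() {
  local dir="${1:-.}" newest="" f
  # Pathname expansion yields matches in sorted order, and the zero-padded
  # timestamps sort chronologically, so the last existing match is the newest.
  for f in "$dir"/backup_*.dump; do
    [ -e "$f" ] && newest="$f"
  done
  [ -n "$newest" ] && printf '%s\n' "$newest"
}
```

The result could then be passed to the restore step, for example by redirecting `"$(latest_backup .)"` into `pg_restore` in place of the hard-coded file name.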

Troubleshooting

Common Issues

Issue: Pods stuck in ImagePullBackOff

Symptoms:
  • Pods show ImagePullBackOff status
  • Events show "Failed to pull image"
Resolution:
  1. Verify image exists:
    docker pull registry.example.com/myapp:v1.2.3
  2. Check registry credentials:
    kubectl get secret regcred -n production
  3. Recreate secret if needed:
    bash
    kubectl create secret docker-registry regcred \
      --docker-server=registry.example.com \
      --docker-username=user \
      --docker-password=pass \
      -n production

Issue: Health checks failing

Symptoms:
  • Pods restarting frequently
  • Readiness probe failures in events
Resolution:
  1. Check application logs:
    kubectl logs -n production deploy/myapp
  2. Verify environment variables:
    kubectl exec -n production deploy/myapp -- env
  3. Test health endpoint manually:
    kubectl port-forward -n production deploy/myapp 8080:8080
  4. Increase probe timeouts if startup is slow

Log Locations

markdown
| Log Type | Location | Command |
|----------|----------|---------|
| Application | Pod stdout | `kubectl logs -n production deploy/myapp` |
| Ingress | Ingress controller | `kubectl logs -n ingress deploy/nginx` |
| Events | Kubernetes events | `kubectl get events -n production` |
| Audit | Cluster audit logs | `/var/log/kubernetes/audit.log` |

Emergency Contacts

markdown
| Role | Name | Contact |
|------|------|---------|
| On-call Engineer | PagerDuty | #ops-escalation |
| Database Admin | DBA Team | dba@example.com |
| Security | Security Team | security@example.com |

CI/CD Integration

yaml
# .github/workflows/deploy.yml
name: Deploy to Production

on:
  push:
    tags:
      - 'v*'

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: production

    steps:
      - uses: actions/checkout@v4

      - name: Configure kubectl
        uses: azure/k8s-set-context@v3
        with:
          kubeconfig: ${{ secrets.KUBE_CONFIG }}

      - name: Deploy with Helm
        run: |
          helm upgrade --install myapp ./charts/myapp \
            --namespace production \
            --set image.tag=${{ github.ref_name }} \
            --wait \
            --timeout 10m

      - name: Verify deployment
        run: ./scripts/verify-deployment.sh

      - name: Notify on failure
        if: failure()
        uses: slackapi/slack-github-action@v1
        with:
          payload: |
            {"text": "⚠️ Deployment failed for ${{ github.ref_name }}"}

Best Practices

  1. Test rollback: exercise the rollback procedure regularly
  2. Incremental deploys: start with a small percentage of traffic
  3. Feature flags: separate deploy from release
  4. Monitoring first: set up monitoring before deploying
  5. Document everything: every step must be reproducible
  6. Automate verification: use scripts instead of manual checks