Deployment Guide Creator

Expert in creating production-ready deployment documentation.

Core Principles

Structure & Organization

Prerequisites listed first
Environment-specific instructions
Verification steps after each phase
Rollback procedures documented
Operational readiness covered

Documentation Standards

Imperative tone for instructions
Exact commands with expected outputs
Version specifications for all tools
Context explaining why each step matters
Estimated execution times per phase
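
The "exact commands with expected outputs" standard can also be checked mechanically. The following is a minimal sketch, assuming a hypothetical helper named `run_step` (not part of any tool mentioned in this guide) that runs a command and confirms its output contains an expected string.

```bash
#!/usr/bin/env bash
# run_step DESCRIPTION EXPECTED COMMAND [ARGS...]
# Hypothetical helper (an assumption for illustration): runs COMMAND and
# checks that its combined stdout/stderr contains the literal string EXPECTED.
run_step() {
  local description=$1 expected=$2
  shift 2
  local output
  # Capture output; a non-zero exit status fails the step immediately.
  if ! output=$("$@" 2>&1); then
    echo "FAIL: ${description} (command exited non-zero)"
    return 1
  fi
  # Quoting "$expected" makes the match literal rather than a glob pattern.
  if [[ $output == *"$expected"* ]]; then
    echo "OK: ${description}"
  else
    echo "FAIL: ${description} (expected output containing: ${expected})"
    return 1
  fi
}
```

A guide step could then be written as `run_step "Verify cluster connectivity" "is running" kubectl cluster-info`. Matching on a short fragment is deliberate here, since some CLIs may wrap parts of their output in color codes.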

Standard Guide Structure

Deployment Guide: [Application Name]

Overview

Application description
Deployment strategy (blue-green, rolling, canary)
Architecture diagram
Key contacts

Prerequisites

System Requirements

OS: Ubuntu 22.04 LTS
RAM: 8GB minimum
Disk: 50GB SSD
Network: 100Mbps

Required Tools

| Tool | Version | Purpose |
|------|---------|---------|
| Docker | 24.0+ | Containerization |
| kubectl | 1.28+ | Kubernetes CLI |
| Helm | 3.12+ | Package management |
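
The minimum versions in the table can be verified before the deployment starts. This is a minimal sketch, assuming GNU `sort` with `-V` (version ordering) is available, as it is on the Ubuntu 22.04 target listed above. The function names `version_ge` and `check_tool` are illustrative, not part of any listed tool.

```bash
#!/usr/bin/env bash
# version_ge ACTUAL MINIMUM
# Succeeds when ACTUAL >= MINIMUM under GNU sort's version ordering:
# if MINIMUM sorts first (or both are equal), ACTUAL is new enough.
version_ge() {
  local actual=$1 minimum=$2
  [ "$(printf '%s\n%s\n' "$minimum" "$actual" | sort -V | head -n1)" = "$minimum" ]
}

# check_tool NAME ACTUAL_VERSION MINIMUM_VERSION
# Prints a one-line result and returns non-zero when the tool is too old.
check_tool() {
  local name=$1 actual=$2 minimum=$3
  if version_ge "$actual" "$minimum"; then
    echo "OK: ${name} ${actual} (>= ${minimum})"
  else
    echo "FAIL: ${name} ${actual} is older than ${minimum}"
    return 1
  fi
}
```

For example, `check_tool Docker "$(docker version --format '{{.Client.Version}}')" 24.0` checks the local Docker client. Version strings from other tools may carry a prefix such as `v` that needs stripping first.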

Access Requirements

SSH access to jump server
Kubernetes cluster credentials
Container registry credentials
Secrets management access

Security Checklist

VPN connection established
MFA configured
SSH keys rotated (< 90 days)

Pre-Deployment Checklist

Code Readiness

All tests passing in CI
Code review approved
Security scan completed
Documentation updated

Environment Checks

Target cluster healthy
Database backups verified
Monitoring alerts silenced
Maintenance window scheduled

Rollback Preparation

Previous version tagged
Rollback procedure tested
Data migration reversible
Communication plan ready

Deployment Phases

Phase 1: Infrastructure Prep

```bash
# Estimated time: 10 minutes

# 1. Verify cluster connectivity
kubectl cluster-info
# Expected: Kubernetes control plane is running

# 2. Check node readiness
kubectl get nodes
# Expected: All nodes in "Ready" state

# 3. Verify namespace exists
kubectl get namespace production
# If it does not exist:
kubectl create namespace production
```

Phase 2: Application Deployment

```bash
# Estimated time: 15 minutes

# 1. Pull latest configuration
git pull origin main
cd deployment/kubernetes

# 2. Update image tag in values
export IMAGE_TAG=v1.2.3
sed -i "s/tag: .*/tag: ${IMAGE_TAG}/" values.yaml

# 3. Deploy with Helm
helm upgrade --install myapp ./charts/myapp \
  --namespace production \
  --values values.yaml \
  --wait \
  --timeout 10m

# Expected output:
# Release "myapp" has been upgraded. Happy Helming!
```
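
The image tag update in step 2 rewrites every `tag:` line without checking the value. A slightly more defensive variant might look like the sketch below. It assumes GNU `sed`, a `vMAJOR.MINOR.PATCH` tag convention (inferred from the `v1.2.3` example), and a hypothetical function name `set_image_tag`.

```bash
#!/usr/bin/env bash
# set_image_tag FILE TAG
# Hypothetical helper: validates TAG, then rewrites the value of each
# "tag:" key in FILE while preserving indentation. Requires GNU sed (-i).
set_image_tag() {
  local file=$1 tag=$2
  # Assumed convention: tags look like v1.2.3.
  if [[ ! $tag =~ ^v[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
    echo "Refusing to deploy: '${tag}' is not of the form vMAJOR.MINOR.PATCH" >&2
    return 1
  fi
  # Fail loudly instead of silently changing nothing.
  if ! grep -q '^[[:space:]]*tag:' "$file"; then
    echo "No 'tag:' key found in ${file}" >&2
    return 1
  fi
  sed -i "s/^\([[:space:]]*tag:\).*/\1 ${tag}/" "$file"
}
```

Usage would be `set_image_tag values.yaml "$IMAGE_TAG"`. Rejecting unexpected values such as `latest` keeps the deployed version traceable to a tagged release.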

Phase 3: Database Migration

```bash
# Estimated time: 5-30 minutes (depends on data size)

# 1. Create backup before migration
kubectl exec -n production deploy/myapp -- \
  pg_dump -Fc > backup_$(date +%Y%m%d_%H%M%S).dump

# 2. Run migrations
kubectl exec -n production deploy/myapp -- \
  npm run migrate

# 3. Verify migration status
kubectl exec -n production deploy/myapp -- \
  npm run migrate:status
```

Kubernetes Deployment Example

```yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: production
  labels:
    app: myapp
    version: v1.2.3
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: registry.example.com/myapp:v1.2.3
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
          env:
            - name: NODE_ENV
              value: "production"
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: myapp-secrets
                  key: database-url
```

Post-Deployment Verification

Verification Checklist

Health Checks

All pods running:
```
kubectl get pods -n production
```
Endpoints healthy:
```
curl -s https://api.example.com/health
```
Database connected: Check application logs

Performance Validation

Response time < 200ms (p95)
Error rate < 0.1%
Memory usage stable
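
The thresholds above can be evaluated from raw samples. The sketch below assumes latency samples in milliseconds arrive one per line and uses the nearest-rank percentile definition. The function names (`percentile`, `error_rate_percent`, `below`) are illustrative assumptions, as is the source of the samples.

```bash
#!/usr/bin/env bash
# percentile P
# Reads one numeric sample per line on stdin and prints the P-th
# percentile using the nearest-rank method (P is an integer, 1-100).
percentile() {
  local p=$1
  sort -n | awk -v p="$p" '
    { samples[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = int((p * NR + 99) / 100)   # ceil(p * NR / 100) for integer p
      if (rank < 1) rank = 1
      print samples[rank]
    }'
}

# error_rate_percent ERRORS TOTAL
# Prints the error rate as a percentage with three decimal places.
error_rate_percent() {
  awk -v e="$1" -v t="$2" 'BEGIN { if (t + 0 == 0) exit 1; printf "%.3f\n", 100 * e / t }'
}

# below VALUE LIMIT
# Succeeds when VALUE < LIMIT (floating-point comparison via awk).
below() {
  awk -v v="$1" -v limit="$2" 'BEGIN { exit !(v + 0 < limit + 0) }'
}
```

With these, the first two checks become `below "$(percentile 95 < latencies.txt)" 200` and `below "$(error_rate_percent "$ERRORS" "$TOTAL")" 0.1`, where `latencies.txt`, `ERRORS`, and `TOTAL` are placeholders for data exported from the monitoring system.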

Security Checks

TLS certificates valid
No sensitive data in logs
Rate limiting active

Verification Script

```bash
#!/bin/bash
# verify-deployment.sh

echo "=== Deployment Verification ==="

# Check pod status
echo "Checking pods..."
READY_PODS=$(kubectl get pods -n production -l app=myapp \
  -o jsonpath='{.items[*].status.containerStatuses[0].ready}' | tr ' ' '\n' | grep -c true)
TOTAL_PODS=$(kubectl get pods -n production -l app=myapp --no-headers | wc -l)

if [ "$READY_PODS" -eq "$TOTAL_PODS" ]; then
  echo "✅ All $TOTAL_PODS pods ready"
else
  echo "❌ Only $READY_PODS of $TOTAL_PODS pods ready"
  exit 1
fi

# Check endpoints
echo "Checking health endpoint..."
HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" https://api.example.com/health)
if [ "$HTTP_CODE" -eq 200 ]; then
  echo "✅ Health endpoint returning 200"
else
  echo "❌ Health endpoint returning $HTTP_CODE"
  exit 1
fi

# Check logs for errors
echo "Checking for errors in logs..."
ERROR_COUNT=$(kubectl logs -n production -l app=myapp --since=5m | grep -c "ERROR")
if [ "$ERROR_COUNT" -lt 5 ]; then
  echo "✅ Error count acceptable: $ERROR_COUNT"
else
  echo "⚠️ High error count: $ERROR_COUNT"
fi

echo "=== Verification Complete ==="
```
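
The script above fails on the first unhealthy response, which can be too strict immediately after a rollout. One possible refinement, sketched here under the assumption that a few retries are acceptable, wraps a check in a generic `retry` helper. Both `retry` and `health_ok` are hypothetical names introduced for illustration.

```bash
#!/usr/bin/env bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted, sleeping
# DELAY_SECONDS between attempts. Progress messages go to stderr.
retry() {
  local attempts=$1 delay=$2
  shift 2
  local n
  for ((n = 1; n <= attempts; n++)); do
    if "$@"; then
      return 0
    fi
    if ((n < attempts)); then
      echo "Attempt ${n}/${attempts} failed; retrying in ${delay}s..." >&2
      sleep "$delay"
    fi
  done
  echo "Giving up after ${attempts} attempts: $*" >&2
  return 1
}

# health_ok URL
# Succeeds only when URL answers with HTTP 200 (same curl flags as the script above).
health_ok() {
  [ "$(curl -s -o /dev/null -w '%{http_code}' "$1")" = "200" ]
}
```

For example, `retry 5 10 health_ok https://api.example.com/health` allows up to five attempts spaced ten seconds apart before the verification is treated as failed. The attempt count and delay are arbitrary starting points, not recommendations from the guide.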

Rollback Procedures

Automatic Rollback Triggers

Health check failures > 3 consecutive
Error rate > 5% for 5 minutes
P99 latency > 2 seconds for 5 minutes
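
The three triggers can be expressed as a single decision function. This is a minimal sketch: it assumes the error rate and P99 latency passed in have already been aggregated over the 5-minute window by the monitoring system, and `should_rollback` is a hypothetical name.

```bash
#!/usr/bin/env bash
# should_rollback CONSECUTIVE_FAILURES ERROR_RATE_PERCENT P99_LATENCY_SECONDS
# Prints the reason and returns 0 when any trigger fires; returns 1 otherwise.
# Assumption: the rate and latency values are 5-minute aggregates.
should_rollback() {
  local failures=$1 error_rate=$2 p99=$3
  # Trigger 1: more than 3 consecutive health check failures.
  if [ "$failures" -gt 3 ]; then
    echo "rollback: ${failures} consecutive health check failures"
    return 0
  fi
  # Trigger 2: error rate above 5% (floating-point comparison via awk).
  if awk -v r="$error_rate" 'BEGIN { exit !(r + 0 > 5) }'; then
    echo "rollback: error rate ${error_rate}% exceeds 5%"
    return 0
  fi
  # Trigger 3: P99 latency above 2 seconds.
  if awk -v l="$p99" 'BEGIN { exit !(l + 0 > 2) }'; then
    echo "rollback: P99 latency ${p99}s exceeds 2s"
    return 0
  fi
  return 1
}
```

A watcher loop could call `should_rollback "$FAILS" "$RATE" "$P99"` and, on success, run the manual rollback command from the next section. How the three inputs are collected is left open here.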

Manual Rollback Steps

```bash
# Estimated time: 5 minutes

# 1. Identify previous release
helm history myapp -n production

# 2. Rollback to previous version
helm rollback myapp [REVISION] -n production --wait

# 3. Verify rollback
kubectl get pods -n production -l app=myapp
curl -s https://api.example.com/health

# 4. If database migration needs reversal
kubectl exec -n production deploy/myapp -- \
  npm run migrate:down
```

Data Recovery

```bash
# Restore from backup if needed.
# The dump was written to the local machine, so stream it to pg_restore over stdin.
kubectl exec -i -n production deploy/myapp -- \
  pg_restore -d myapp_production < backup_20240101_120000.dump
```

Troubleshooting

Common Issues

Issue: Pods stuck in ImagePullBackOff

Symptoms:

Pods show ImagePullBackOff status
Events show "Failed to pull image"

Resolution:

Verify image exists:
```
docker pull registry.example.com/myapp:v1.2.3
```
Check registry credentials:
```
kubectl get secret regcred -n production
```
Recreate secret if needed:
```bash
kubectl create secret docker-registry regcred \
  --docker-server=registry.example.com \
  --docker-username=user \
  --docker-password=pass \
  -n production
```

Issue: Health checks failing

Symptoms:

Pods restarting frequently
Readiness probe failures in events

Resolution:

Check application logs:
```
kubectl logs -n production deploy/myapp
```
Verify environment variables:
```
kubectl exec -n production deploy/myapp -- env
```
Test health endpoint manually:
```
kubectl port-forward -n production deploy/myapp 8080:8080
```
Increase probe timeouts if startup is slow

Log Locations

| Log Type | Location | Command |
|----------|----------|---------|
| Application | Pod stdout | `kubectl logs -n production deploy/myapp` |
| Ingress | Ingress controller | `kubectl logs -n ingress deploy/nginx` |
| Events | Kubernetes events | `kubectl get events -n production` |
| Audit | Cluster audit logs | `/var/log/kubernetes/audit.log` |

Emergency Contacts

| Role | Name | Contact |
|------|------|---------|
| On-call Engineer | PagerDuty | #ops-escalation |
| Database Admin | DBA Team | dba@example.com |
| Security | Security Team | security@example.com |

CI/CD Integration

```yaml
# .github/workflows/deploy.yml
name: Deploy to Production

on:
  push:
    tags:
      - 'v*'

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: production

    steps:
      - uses: actions/checkout@v4

      - name: Configure kubectl
        uses: azure/k8s-set-context@v3
        with:
          kubeconfig: ${{ secrets.KUBE_CONFIG }}

      - name: Deploy with Helm
        run: |
          helm upgrade --install myapp ./charts/myapp \
            --namespace production \
            --set image.tag=${{ github.ref_name }} \
            --wait \
            --timeout 10m

      - name: Verify deployment
        run: ./scripts/verify-deployment.sh

      - name: Notify on failure
        if: failure()
        uses: slackapi/slack-github-action@v1
        with:
          payload: |
            {"text": "⚠️ Deployment failed for ${{ github.ref_name }}"}
```

Best Practices

Test rollback: regularly test rollback procedures
Incremental deploys: start with a small percentage of traffic
Feature flags: separate deploy from release
Monitoring first: set up monitoring before deploying
Document everything: every step must be reproducible
Automate verification: use scripts instead of manual checks
