ci-cd-pipelines

When this skill is activated, always start your first response with the 🧢 emoji.

CI/CD Pipelines

A practitioner's guide to continuous integration and continuous delivery for production systems. This skill covers pipeline design, GitHub Actions workflows, deployment strategies, and the operational patterns that keep software shipping safely at speed. The emphasis is on when to apply each pattern and why it matters, not just the YAML syntax.

CI/CD is not a tool configuration problem - it is a software delivery discipline. The pipeline is the product team's contract with production: every commit that passes is a candidate release, and the pipeline enforces that contract automatically.

When to use this skill

Trigger this skill when the user:

Creates or modifies a GitHub Actions, GitLab CI, or Jenkins pipeline
Implements PR checks, branch protection rules, or required status checks
Sets up deployment environments (staging, production) with promotion gates
Implements blue-green, canary, rolling, or recreate deployment strategies
Configures caching for dependencies or build artifacts to speed up pipelines
Sets up matrix builds to test across multiple Node versions or operating systems
Automates secrets injection, environment promotion, or rollback procedures
Diagnoses a slow pipeline and needs to find what to parallelize or cache

Do NOT trigger this skill for:

Infrastructure provisioning from scratch (use a Terraform/Kubernetes skill instead)
Application-level testing strategies unrelated to pipeline structure

Key principles

Fail fast - The pipeline should surface errors as early as possible. Run linting and type-checking before tests. Run unit tests before integration tests. A 30-second lint failure beats a 10-minute test run that tells you the same thing.
Cache aggressively - `node_modules`, Maven `.m2`, pip wheels, and Docker layer caches can turn a 12-minute pipeline into a 3-minute one. Cache by the lockfile hash so the cache busts exactly when dependencies change.
Keep pipelines under 10 minutes - Pipelines longer than 10 minutes cause developers to stop watching them, batch commits to avoid waiting, and skip running them locally. Parallelize jobs, split slow test suites, and move heavy analysis to scheduled runs.
Trunk-based development - Short-lived branches merged frequently (at least daily) are the prerequisite for effective CI. Long-lived branches turn CI into a lie - the code integrates in CI but not in reality.
Immutable artifacts - Build once, deploy everywhere. The same Docker image or archive that passed staging must be the thing that goes to production. Never rebuild from source at deploy time.
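
As a rough illustration of the 10-minute budget, a workflow can cap job runtime and cancel runs that a newer commit has superseded. This is a minimal sketch, not a recommended configuration: the job name `test` and the 10-minute limit are assumptions chosen for the example.

```yaml
# Cancel an in-flight run when a newer commit arrives on the same ref
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  test:                      # hypothetical job name
    runs-on: ubuntu-latest
    timeout-minutes: 10      # fail the job instead of letting it hang
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm test
```

Cancelling superseded runs suits pull request branches. For deploy workflows on `main`, you may prefer to let each run finish so no release candidate is skipped.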

Core concepts

Pipeline stages run in order and each must pass before the next begins:

build -> test -> deploy:staging -> approve -> deploy:production

Triggers determine when a pipeline runs:

`push` on any branch - run build and test
`pull_request` - run full check suite for the PR
`schedule` (cron) - run security scans or long test suites nightly
`workflow_dispatch` - manual trigger with optional inputs for on-demand deploys
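
The four triggers above could be combined in a single `on:` block, roughly as follows. The cron time and the `environment` input are assumptions made up for this sketch.

```yaml
on:
  push:                        # any branch
  pull_request:
  schedule:
    - cron: "0 3 * * *"        # nightly at 03:00 UTC (example time)
  workflow_dispatch:
    inputs:
      environment:             # hypothetical input name
        description: Target environment
        type: choice
        options: [staging, production]
        default: staging
```

In practice these usually live in separate workflow files, so that a nightly scan does not share jobs with the per-commit build.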

Environments are named targets (staging, production) with their own secrets, protection rules, and deployment history. GitHub Environments let you require manual approvals before promoting to production.

Secrets management - secrets live in GitHub Secrets or an external vault (Vault, AWS Secrets Manager). They are injected as environment variables at runtime. Never print them in logs. Rotate them on a schedule.
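
A minimal sketch of runtime injection, assuming a secret named `DEPLOY_API_KEY` and two hypothetical scripts. GitHub masks registered secrets in logs automatically; the `::add-mask::` workflow command extends that masking to a value derived during the run.

```yaml
steps:
  - uses: actions/checkout@v4

  - name: Fetch a short-lived token            # hypothetical step
    env:
      API_KEY: ${{ secrets.DEPLOY_API_KEY }}   # injected as an environment variable
    run: |
      TOKEN=$(./scripts/get-token.sh)          # hypothetical script that reads $API_KEY
      echo "::add-mask::$TOKEN"                # mask the derived value in later log output
      echo "DEPLOY_TOKEN=$TOKEN" >> "$GITHUB_ENV"

  - name: Deploy
    run: ./scripts/deploy.sh                   # reads $DEPLOY_TOKEN from the environment
```

Scoping `env` to the step that needs the secret, rather than to the whole job, limits how many processes can read it.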

Artifact storage - build outputs (compiled code, Docker images, test reports) are stored in GitHub Artifacts or a registry (GHCR, ECR, Docker Hub). Artifacts have a retention window; images are tagged with the commit SHA.
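
To illustrate "build once, deploy everywhere" with GitHub Artifacts, a later job can download what the build job uploaded instead of rebuilding. This is a sketch under assumptions: the artifact name `dist` and the script `./scripts/deploy.sh` are placeholders.

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm run build
      - uses: actions/upload-artifact@v4
        with:
          name: dist
          path: dist/
          retention-days: 7

  deploy:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4          # only for the deploy script
      - uses: actions/download-artifact@v4
        with:
          name: dist                       # the exact output built above
          path: dist/
      - run: ./scripts/deploy.sh           # hypothetical script; must not rebuild
```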

Common tasks

Set up GitHub Actions for Node.js

A standard Node.js pipeline with lint, test, and build, using dependency caching:

yaml

# .github/workflows/ci.yml
on:
  push:
    branches: [main]
  pull_request:

jobs:
  ci:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm            # caches ~/.npm by package-lock.json hash

      - run: npm ci             # clean install from lockfile

      - run: npm run lint

      - run: npm test -- --coverage

      - run: npm run build

      - uses: actions/upload-artifact@v4
        with:
          name: dist
          path: dist/
          retention-days: 7


> Use `npm ci` instead of `npm install` in CI. It is faster, deterministic,
> and will fail if `package-lock.json` is out of sync with `package.json`.

---

Implement PR checks

Require the CI workflow to pass before merging. Configure in GitHub Settings > Branches > Branch protection rules:

Enable "Require status checks to pass before merging"
Add the job name (`ci`) as a required check
Enable "Require branches to be up to date before merging"

yaml

# .github/workflows/pr-check.yml
on:
  pull_request:
    types: [opened, synchronize, reopened]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci
      - run: npm run lint

  test:
    runs-on: ubuntu-latest
    needs: lint               # only run tests if lint passes
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci
      - run: npm test

  typecheck:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci
      - run: npm run typecheck
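
When the PR workflow has several jobs, one way to keep a single required check is an aggregate job that depends on all of them. The sketch below assumes the job names `lint`, `test`, and `typecheck` from the workflow above and reuses the name `ci` for the aggregate. It runs with `if: always()` because a job that is skipped due to a failed dependency can otherwise be reported as passing.

```yaml
  ci:
    if: always()                           # run even when a dependency failed
    needs: [lint, test, typecheck]
    runs-on: ubuntu-latest
    steps:
      - name: Fail if any required job did not succeed
        if: >-
          contains(needs.*.result, 'failure') ||
          contains(needs.*.result, 'cancelled') ||
          contains(needs.*.result, 'skipped')
        run: exit 1
```

With this in place, branch protection only needs `ci` as a required check, and adding or renaming an upstream job does not require a settings change.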

---

Set up deployment environments with approvals

Use GitHub Environments to gate production deploys behind a manual approval:

yaml

# .github/workflows/deploy.yml
on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write          # lets GITHUB_TOKEN push to GHCR
    outputs:
      image-tag: ${{ steps.tag.outputs.tag }}
    steps:
      - uses: actions/checkout@v4
      - id: tag
        run: echo "tag=${{ github.sha }}" >> "$GITHUB_OUTPUT"
      # Authenticate before pushing; the token is read from stdin, not argv
      - run: echo "$GITHUB_TOKEN" | docker login ghcr.io -u "${{ github.actor }}" --password-stdin
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      # Build under the same name that is pushed
      - run: docker build -t ghcr.io/org/myapp:${{ github.sha }} .
      - run: docker push ghcr.io/org/myapp:${{ github.sha }}

  deploy-staging:
    needs: build
    runs-on: ubuntu-latest
    environment: staging       # uses staging secrets + URL
    steps:
      - uses: actions/checkout@v4   # makes ./scripts available on the runner
      - run: ./scripts/deploy.sh
        env:
          IMAGE_TAG: ${{ needs.build.outputs.image-tag }}
          DEPLOY_URL: ${{ vars.DEPLOY_URL }}
          API_KEY: ${{ secrets.DEPLOY_API_KEY }}

  deploy-production:
    needs: [build, deploy-staging]  # build is listed so needs.build.outputs resolves
    runs-on: ubuntu-latest
    environment: production    # requires manual approval in GitHub UI
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/deploy.sh
        env:
          IMAGE_TAG: ${{ needs.build.outputs.image-tag }}
          DEPLOY_URL: ${{ vars.DEPLOY_URL }}
          API_KEY: ${{ secrets.DEPLOY_API_KEY }}


Configure environment protection rules in GitHub Settings > Environments >
production > Required reviewers.

---

Implement blue-green deployment

Route traffic between two identical environments. Switch instantly; roll back by switching back:

yaml

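  deploy-blue-green:
    needs: build              # makes needs.build.outputs available to this job
    runs-on: ubuntu-latest
    environment: production
    env:
      IMAGE_TAG: ${{ needs.build.outputs.image-tag }}
    steps:
      - uses: actions/checkout@v4

      - name: Determine inactive slot
        id: slot
        run: |
          # -f makes curl fail on HTTP errors instead of silently defaulting to blue
          ACTIVE=$(curl -fsS https://api.example.com/active-slot)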
          if [ "$ACTIVE" = "blue" ]; then
            echo "target=green" >> $GITHUB_OUTPUT
          else
            echo "target=blue" >> $GITHUB_OUTPUT
          fi

      - name: Deploy to inactive slot
        run: ./scripts/deploy-slot.sh ${{ steps.slot.outputs.target }} $IMAGE_TAG

      - name: Run smoke tests against inactive slot
        run: ./scripts/smoke-test.sh ${{ steps.slot.outputs.target }}

      - name: Switch traffic to new slot
        run: ./scripts/switch-slot.sh ${{ steps.slot.outputs.target }}

      - name: Verify production is healthy
        run: ./scripts/health-check.sh production

      - name: Roll back on failure
        if: failure()
        run: ./scripts/switch-slot.sh ${{ steps.slot.outputs.target == 'blue' && 'green' || 'blue' }}

See `references/deployment-strategies.md` for a detailed comparison of blue-green vs canary vs rolling vs recreate.
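
Because a blue-green rollback is only a traffic switch, it can also be exposed as a manually triggered workflow for use after the deploy run has finished. This is a sketch: the file name and the `slot` input are assumptions, and it reuses the `./scripts/switch-slot.sh` and `./scripts/health-check.sh` scripts referenced above.

```yaml
# .github/workflows/rollback.yml (hypothetical file name)
on:
  workflow_dispatch:
    inputs:
      slot:                              # hypothetical input: slot to send traffic to
        description: Slot that should receive production traffic
        type: choice
        options: [blue, green]

jobs:
  rollback:
    runs-on: ubuntu-latest
    environment: production              # reuses the production approval gate
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/switch-slot.sh "${{ inputs.slot }}"
      - run: ./scripts/health-check.sh production
```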

Implement canary release with rollback

Route a small percentage of traffic to the new version before full rollout:

yaml

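  deploy-canary:
    needs: build              # makes needs.build.outputs available to this job
    runs-on: ubuntu-latest
    environment: production
    env:
      IMAGE_TAG: ${{ needs.build.outputs.image-tag }}   # read below as env.IMAGE_TAG
    steps: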
      - uses: actions/checkout@v4

      - name: Deploy canary (10% traffic)
        run: ./scripts/deploy-canary.sh ${{ env.IMAGE_TAG }} 10

      - name: Monitor canary for 5 minutes
        run: |
          for i in $(seq 1 10); do
            sleep 30
            ERROR_RATE=$(./scripts/get-error-rate.sh canary)
            echo "Canary error rate: $ERROR_RATE%"
            if (( $(echo "$ERROR_RATE > 1.0" | bc -l) )); then
              echo "Error rate too high. Failing so the rollback step below runs."
              exit 1
            fi
          done

      - name: Promote canary to 100%
        run: ./scripts/promote-canary.sh ${{ env.IMAGE_TAG }}

      - name: Roll back on any failure
        if: failure()
        run: ./scripts/rollback-canary.sh

Cache dependencies and build artifacts

Cache `node_modules` by lockfile hash. Always restore-then-save so partial installs don't get cached:

yaml

      - name: Cache node_modules
        id: cache-node-modules
        uses: actions/cache@v4
        with:
          path: node_modules
          key: node-modules-${{ runner.os }}-${{ hashFiles('package-lock.json') }}
          restore-keys: |
            node-modules-${{ runner.os }}-

      - name: Install dependencies
        if: steps.cache-node-modules.outputs.cache-hit != 'true'
        run: npm ci

      - name: Cache Next.js build
        uses: actions/cache@v4
        with:
          path: |
            .next/cache
          key: nextjs-${{ runner.os }}-${{ hashFiles('package-lock.json') }}-${{ hashFiles('**/*.ts', '**/*.tsx') }}
          restore-keys: |
            nextjs-${{ runner.os }}-${{ hashFiles('package-lock.json') }}-
            nextjs-${{ runner.os }}-

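Cache keys should go from most-specific to least-specific in `restore-keys`. A partial cache restore is almost always faster than a cold install. The benefit is largest for build caches such as `.next/cache`; `npm ci` removes any existing `node_modules` before installing, so a fallback restore helps less there.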
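
Docker layer caches, mentioned under the key principles, can be handled in a similar way. The sketch below assumes the Docker Buildx actions and the GitHub Actions cache backend (`type=gha`). The image name `myapp` is a placeholder and `push: false` keeps the example to a build only.

```yaml
      - uses: actions/checkout@v4

      - uses: docker/setup-buildx-action@v3    # Buildx is required for cache export

      - name: Build image with layer caching
        uses: docker/build-push-action@v6
        with:
          context: .
          push: false
          tags: myapp:${{ github.sha }}         # placeholder image name
          cache-from: type=gha                  # reuse layers from earlier runs
          cache-to: type=gha,mode=max           # also cache intermediate layers
```

Ordering the Dockerfile so that dependency installation comes before copying application source tends to make these cached layers reusable across most commits.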

Set up matrix builds

Test across multiple Node versions and operating systems in parallel:

yaml

  test-matrix:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false        # don't cancel other jobs if one fails
      matrix:
        node-version: [18, 20, 22]
        os: [ubuntu-latest, windows-latest, macos-latest]
        exclude:
          - os: windows-latest
            node-version: 18  # don't test EOL Node on Windows

    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}
          cache: npm
      - run: npm ci
      - run: npm test

Set `fail-fast: false` when the matrix combinations are independent. Use `fail-fast: true` (default) when any failure means the whole build is broken.
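
A related pattern is adding one extra combination with `include` and letting only that combination fail without failing the workflow. In this sketch the `experimental` key and the choice of Node 24 are assumptions for illustration.

```yaml
  test-matrix:
    runs-on: ${{ matrix.os }}
    # Only the combination flagged experimental may fail without breaking the run
    continue-on-error: ${{ matrix.experimental == true }}
    strategy:
      fail-fast: false
      matrix:
        node-version: [20, 22]
        os: [ubuntu-latest]
        include:
          - os: ubuntu-latest
            node-version: 24        # extra combination added to the matrix
            experimental: true      # hypothetical custom key
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}
          cache: npm
      - run: npm ci
      - run: npm test
```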

Error handling

| Failure | Likely cause | Fix |
| --- | --- | --- |
| `npm ci` fails with lockfile mismatch | `package.json` updated without re-running `npm install` | Run `npm install` locally and commit the updated `package-lock.json` |
| Cache miss on every run | Cache key includes volatile data (timestamps, random values) | Use only stable inputs in the cache key: lockfile hash, OS, Node version |
| Secrets not available in fork PR | GitHub does not expose secrets to workflows triggered by fork PRs | Use `pull_request_target` with caution, or require manual approval for external PRs |
| Workflow hangs with no output | Long-running process with no stdout, or a test runner left in watch mode | Add `timeout-minutes` to the job; pass `--ci` to Jest, or run Vitest as `vitest run` |
| Deploy fails but staging passed | Environment-specific secrets or config missing in the production environment | Verify all `vars` and `secrets` are configured in the production environment settings |
| Matrix job passes on one OS but fails on another | Path separators, line endings, or OS-specific tools diverge | Use `path.join()` in code; add `.gitattributes` for line endings; pin tool versions |

References

For detailed implementation guidance on specific deployment strategies:

`references/deployment-strategies.md` - blue-green, canary, rolling, recreate, A/B, and shadow deployments with ASCII diagrams and decision framework

Only load the references file when choosing or implementing a specific deployment strategy - it is detailed and will consume context.
