cloud-architecture

Cloud Architecture

<!-- dual-compat-start -->

Use When

  • Use when designing cloud deployments, Dockerising applications, laying out AWS or GCP environments, choosing a deployment pattern, or moving a workload from a single VM to a resilient multi-AZ topology.
  • The task needs reusable judgment, domain constraints, or a proven workflow rather than ad hoc advice.

Do Not Use When

  • The task is unrelated to `cloud-architecture` or would be better handled by a more specific companion skill.
  • The request only needs a trivial answer and none of this skill's constraints or references materially help.

Required Inputs

  • Gather relevant project context, constraints, and the concrete problem to solve; load `references` only as needed.
  • Confirm the desired deliverable: design, code, review, migration plan, audit, or documentation.

Workflow

  • Read this `SKILL.md` first, then load only the referenced deep-dive files that are necessary for the task.
  • Apply the ordered guidance, checklists, and decision rules in this skill instead of cherry-picking isolated snippets.
  • Produce the deliverable with assumptions, risks, and follow-up work made explicit when they matter.

Quality Standards

  • Keep outputs execution-oriented, concise, and aligned with the repository's baseline engineering standards.
  • Preserve compatibility with existing project conventions unless the skill explicitly requires a stronger standard.
  • Prefer deterministic, reviewable steps over vague advice or tool-specific magic.

Anti-Patterns

  • Treating examples as copy-paste truth without checking fit, constraints, or failure modes.
  • Loading every reference file by default instead of using progressive disclosure.

Outputs

  • A concrete result that fits the task: implementation guidance, review findings, architecture decisions, templates, or generated artifacts.
  • Clear assumptions, tradeoffs, or unresolved gaps when the task cannot be completed from available context alone.
  • References used, companion skills, or follow-up actions when they materially improve execution.

Evidence Produced

| Category | Artifact | Format | Example |
| --- | --- | --- | --- |
| Correctness | Cloud topology decision record | Markdown doc per `skill-composition-standards/references/adr-template.md` covering compute, storage, network, and IAM picks | `docs/cloud/topology-adr.md` |
| Security | Cloud account hardening checklist | Markdown doc covering root-account, IAM, network, and logging baseline | `docs/cloud/hardening-checklist.md` |

References

  • Use the `references/` directory for deep detail after reading the core workflow below.
<!-- dual-compat-end -->

Load Order

  1. Load `world-class-engineering` for the production bar.
  2. Load `system-architecture-design` for decomposition and contracts.
  3. Load this skill for the cloud runtime shape.
  4. Pair with `cicd-pipelines` for delivery, `cicd-devsecops` for gate policy, `observability-monitoring` for telemetry, `deployment-release-engineering` for rollout, and `reliability-engineering` for failure design.

Executable Outputs

For meaningful cloud architecture work produce: workload classification (stateless, stateful, async, batch, scheduled), chosen compute model with rationale, VPC + subnet + routing layout across AZs, Dockerfile (multi-stage, pinned base), `docker-compose.yml` mirroring production, IAM role inventory with least-privilege policies, deployment pattern choice and rollback runbook, cost posture (reserved/on-demand/spot split, Savings Plan assessment), and CDN/TLS/WAF/auto-scaling configuration.

Cloud Provider Selection

East African SaaS workloads (Uganda, Kenya, Tanzania) weigh four dimensions: latency to users, data-residency obligations under Uganda DPPA 2019, support hours overlapping EAT (UTC+3), and price-per-workload.
| Dimension | AWS | GCP | Azure |
| --- | --- | --- | --- |
| Closest region | `af-south-1` Cape Town (~30 ms) | `europe-west1` (~160 ms) | `southafricanorth` (~40 ms) |
| Data-residency fit | Strong (`af-south-1` + KMS) | Weak (no ZA region for many services) | Strong (ZA North + Customer Lockbox) |
| Support in EAT | 24/7 Business; EMEA TAM overlap | 24/7 Standard | 24/7 ProDirect; ZA partners |
| Managed services breadth | Widest | Data/ML led | Microsoft-stack integration |

Default to AWS `af-south-1` for Uganda workloads with S-tier DPPA 2019 data; use Azure `southafricanorth` only for .NET-heavy stacks with an existing EA licence; avoid GCP as primary for DPPA-scoped data until a ZA region is GA.
```bash
aws configure set region af-south-1 --profile ug-prod
aws ec2 describe-availability-zones --region af-south-1 --query "AvailabilityZones[].ZoneName"
```

Compute Model Decision Rules

  1. Single app, low traffic, one region → EC2 + Docker Compose, backed by RDS Multi-AZ and S3.
  2. Multiple services, scaling needs, no Kubernetes skill → ECS Fargate with ALB.
  3. Multiple services, platform-ready team, polyglot runtime, multi-tenant isolation → Kubernetes (defer to `kubernetes-platform`).
  4. Async fan-out, batch, or event pipeline → Lambda + SQS + EventBridge, with state in DynamoDB or RDS.
Kubernetes is a commitment, not a default.
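The four rules above can be sketched as a literal lookup function. The workload labels and fallback are illustrative only; real selection also weighs team skill, tenancy, and cost.

```bash
# Illustrative mapping of workload shape to compute model (labels hypothetical).
pick_compute() {
  case "$1" in
    single-app-low-traffic)  echo "EC2 + Docker Compose (+ RDS Multi-AZ, S3)" ;;
    multi-service-no-k8s)    echo "ECS Fargate + ALB" ;;
    multi-service-platform)  echo "Kubernetes (defer to kubernetes-platform)" ;;
    async-batch-event)       echo "Lambda + SQS + EventBridge" ;;
    *)                       echo "unclassified: classify the workload first" ;;
  esac
}
pick_compute multi-service-no-k8s
```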
  1. 单一应用、低流量、单区域 → EC2 + Docker Compose,搭配RDS多可用区和S3。
  2. 多服务、有扩缩容需求、无Kubernetes技能 → ECS Fargate + ALB。
  3. 多服务、团队具备平台化能力、多语言运行时、多租户隔离 → Kubernetes(参考
    kubernetes-platform
    技能)。
  4. 异步扇出、批处理或事件流水线 → Lambda + SQS + EventBridge,状态存储在DynamoDB或RDS中。
Kubernetes是一种承诺,而非默认选项。

Docker Fundamentals

Images are immutable, content-addressed layers. Containers are processes isolated by namespaces and cgroups. Disciplined Dockerfile authorship controls image size, cache behaviour, and attack surface.
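"Content-addressed" can be made concrete without Docker at all: a layer's identity is a digest over its bytes, so byte-identical content always yields the identical ID, and any change produces a new layer rather than mutating the old one.

```bash
# Identical bytes hash to the identical digest; a one-word change does not.
layer_a=$(printf 'FROM alpine:3.20\nRUN echo hi\n'  | sha256sum | cut -d' ' -f1)
layer_b=$(printf 'FROM alpine:3.20\nRUN echo hi\n'  | sha256sum | cut -d' ' -f1)
layer_c=$(printf 'FROM alpine:3.20\nRUN echo bye\n' | sha256sum | cut -d' ' -f1)
[ "$layer_a" = "$layer_b" ]  && echo "identical content -> identical layer ID"
[ "$layer_a" != "$layer_c" ] && echo "any change -> a new layer"
```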

Dockerfile Checklist

  • Multi-stage: compile/install in `builder`, copy only runtime artifacts to the final stage.
  • Pin base images by version and digest (`node:22.11.0-slim@sha256:...`).
  • Prefer distroless or `alpine` for runtime; target image ≤ 200 MB.
  • Run as non-root (`USER nonroot` or dedicated UID ≥ 10000). Set `WORKDIR`, `EXPOSE`, `HEALTHCHECK` explicitly.
  • Secrets via mounted files or orchestrator env — never baked in. `.dockerignore` excludes `.git`, `node_modules`, logs, fixtures, editor config.
  • Order `COPY` from least-changing (manifests) to most-changing (source) to preserve layer caching.

Production Node.js Dockerfile

```dockerfile
# syntax=docker/dockerfile:1.7
FROM node:22.11.0-slim@sha256:<digest> AS builder
WORKDIR /app
COPY package*.json ./
RUN --mount=type=cache,target=/root/.npm npm ci --include=dev
COPY . .
RUN npm run build && npm prune --omit=dev

FROM gcr.io/distroless/nodejs22-debian12:nonroot AS runtime
WORKDIR /app
ENV NODE_ENV=production
COPY --from=builder --chown=nonroot:nonroot /app/node_modules ./node_modules
COPY --from=builder --chown=nonroot:nonroot /app/dist ./dist
COPY --from=builder --chown=nonroot:nonroot /app/package.json ./
USER nonroot
EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=5s --start-period=20s --retries=3 \
  CMD ["node", "dist/healthcheck.js"]
CMD ["dist/server.js"]
```

Docker Compose

One `docker-compose.yml` in the repo root mirrors production. Named volumes for stateful services; never bind-mount databases. Declare `healthcheck` on every dependency and gate startup with `depends_on.condition: service_healthy`.
```yaml
name: saas-local
services:
  web:
    build: .
    env_file: .env
    ports: ["3000:3000"]
    depends_on:
      db: { condition: service_healthy }
      redis: { condition: service_healthy }
    healthcheck:
      test: ["CMD", "node", "dist/healthcheck.js"]
      interval: 30s
      timeout: 5s
      retries: 3
  db:
    image: postgres:16.4-alpine
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
      POSTGRES_DB: app
    volumes: ["db-data:/var/lib/postgresql/data"]
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app -d app"]
      interval: 10s
      timeout: 3s
      retries: 5
    secrets: [db_password]
  redis:
    image: redis:7.4-alpine
    command: ["redis-server", "--appendonly", "yes"]
    volumes: ["redis-data:/data"]
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 5
volumes:
  db-data: {}
  redis-data: {}
secrets:
  db_password: { file: ./.secrets/db_password }
```
Commit `.env.example`, ignore `.env`, and provide env through the orchestrator in production. See `references/docker-compose-patterns.md` for the full template.

AWS Core Services

Compute

Instance families: `t3`/`t4g` burstable (dev, low-traffic), `m6i`/`m7i` balanced production, `c6i`/`c7i` CPU-bound, `r6i`/`r7i` memory-bound, `i4i` NVMe-heavy. Place production instances in private subnets; expose only via ALB/NLB. Build AMIs with Packer or EC2 Image Builder; no manual console edits.
```yaml
LaunchTemplate:
  Type: AWS::EC2::LaunchTemplate
  Properties:
    LaunchTemplateName: app-prod-lt
    LaunchTemplateData:
      ImageId: ami-0123456789abcdef0
      InstanceType: m6i.large
      IamInstanceProfile: { Name: app-prod-instance-profile }
      SecurityGroupIds: [sg-app]
      MetadataOptions: { HttpTokens: required, HttpEndpoint: enabled }
AppASG:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    MinSize: 2
    MaxSize: 10
    DesiredCapacity: 3
    HealthCheckType: ELB
    HealthCheckGracePeriod: 120
    VPCZoneIdentifier: [subnet-priv-a, subnet-priv-b, subnet-priv-c]
    LaunchTemplate:
      LaunchTemplateId: !Ref LaunchTemplate
      Version: !GetAtt LaunchTemplate.LatestVersionNumber
    TargetGroupARNs: [!Ref AppTargetGroup]
```

Storage

Enable default encryption, block public access, and turn on versioning for any data you cannot reconstruct. Lifecycle: transition > 30 days to Standard-IA, > 90 days to Glacier Instant Retrieval, expire multipart uploads > 7 days. Use presigned URLs for customer uploads/downloads; never hand out credentials. Multipart upload threshold ≥ 100 MB; part size 8–16 MB.
```bash
aws s3 presign s3://app-prod-uploads/customer/42/invoice.pdf \
  --expires-in 900 --region af-south-1

aws configure set default.s3.multipart_threshold 100MB
aws configure set default.s3.multipart_chunksize 16MB
```
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyInsecureTransport",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": ["arn:aws:s3:::app-prod-uploads", "arn:aws:s3:::app-prod-uploads/*"],
      "Condition": { "Bool": { "aws:SecureTransport": "false" } }
    }
  ]
}
```
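The multipart thresholds above translate directly into part counts. As a rough sketch, with 16 MiB parts even a 1 GiB object uploads in a few dozen parts, well below S3's 10,000-part ceiling:

```bash
# Part count for a 1 GiB upload at a 16 MiB chunk size (ceiling division).
file_mib=1024; part_mib=16
parts=$(( (file_mib + part_mib - 1) / part_mib ))
echo "$parts parts"
[ "$parts" -le 10000 ] && echo "within the S3 part limit"
```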

Database

Multi-AZ for every production RDS MySQL/PostgreSQL; synchronous standby in a second AZ. Automated backups retention 7–35 days with PITR. Read replicas for read-heavy paths, never for durability. Parameter groups hold tunings; never edit defaults in place.
```bash
aws rds create-db-parameter-group --db-parameter-group-name app-pg16-prod \
  --db-parameter-group-family postgres16 --description "Prod PG16 params"
aws rds create-db-instance --db-instance-identifier app-prod \
  --engine postgres --engine-version 16.4 --db-instance-class db.m6i.large \
  --allocated-storage 200 --storage-type gp3 --storage-encrypted \
  --multi-az --backup-retention-period 14 --db-parameter-group-name app-pg16-prod \
  --monitoring-interval 60 --enable-performance-insights
```

Serverless

Lambda triggers: S3 object-created, SQS queue, API Gateway, EventBridge schedule, DynamoDB Streams. Cold-start mitigation: provisioned concurrency for latency-sensitive paths; a 5-minute EventBridge keep-warm rule as a low-cost fallback. Keep deployment package ≤ 50 MB zipped; container images only when native deps demand it.
```bash
aws lambda put-provisioned-concurrency-config \
  --function-name order-api --qualifier live \
  --provisioned-concurrent-executions 5
```

IAM

Roles, not users, for workloads — instance profiles on EC2, task roles on ECS. Policy statements scoped to specific ARNs and actions — no `*:*`. CI uses OIDC federation to assume role; no long-lived keys. MFA on every human account; root locked away with hardware MFA.
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AppReadUploads",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::app-prod-uploads/*"
    },
    {
      "Sid": "AppReadSecrets",
      "Effect": "Allow",
      "Action": "secretsmanager:GetSecretValue",
      "Resource": "arn:aws:secretsmanager:af-south-1:111122223333:secret:app/prod/*"
    }
  ]
}
```

Networking

Design the VPC across ≥ 3 AZs for production, 2 for non-production. Allocate a /16 and carve /20 public and /20 private subnets per AZ. One NAT gateway per AZ in production — single-AZ NAT is a SPOF and cross-AZ data charges bite.
| Layer | CIDR example | Routing |
| --- | --- | --- |
| Public subnets | 10.20.0.0/20 per AZ | IGW default route |
| Private app subnets | 10.20.32.0/20 per AZ | NAT gateway in same AZ |
| Private data subnets | 10.20.64.0/20 per AZ | No outbound route |
Security groups are stateful instance-level allow-lists — the primary tool. NACLs are stateless subnet-level deny/allow lists — use only for coarse boundaries (blocking known-bad CIDRs). Reserve ≥ /18 headroom for peering or Transit Gateway.
```bash
aws ec2 create-vpc --cidr-block 10.20.0.0/16 \
  --tag-specifications 'ResourceType=vpc,Tags=[{Key=Name,Value=ug-prod-vpc}]'
aws ec2 create-nat-gateway --subnet-id subnet-pub-a --allocation-id eipalloc-aaa
```
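A quick capacity check on the /16 → /20 carve-up above. AWS reserves 5 addresses in every subnet (network address, VPC router, DNS, one reserved, broadcast), so the usable count per /20 is slightly under 4,096:

```bash
# Address math for a /16 VPC carved into /20 subnets.
vpc_prefix=16; subnet_prefix=20
vpc_size=$(( 1 << (32 - vpc_prefix) ))
subnet_size=$(( 1 << (32 - subnet_prefix) ))
usable=$(( subnet_size - 5 ))               # 5 addresses reserved by AWS
max_subnets=$(( vpc_size / subnet_size ))
echo "/$subnet_prefix subnets in the /$vpc_prefix: $max_subnets, each with $usable usable IPs"
```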

Load Balancers

| Feature | ALB | NLB |
| --- | --- | --- |
| Layer | 7 (HTTP/HTTPS/gRPC) | 4 (TCP/UDP/TLS) |
| Routing | Host, path, header, query | Port-based |
| TLS termination | At ALB | Passthrough or at NLB |
| Sticky sessions | Cookie-based | Source-IP flow hash |
| Use case | Web APIs, microservices | High-throughput TCP, static IPs, PrivateLink |
Health checks hit a dedicated `/healthz` path on a dedicated port when feasible; verify dependencies shallowly — not deeply, or cascading failures evict healthy targets.
```bash
aws elbv2 create-target-group --name app-tg-blue --protocol HTTP --port 3000 \
  --vpc-id vpc-0abc --health-check-path /healthz --health-check-interval-seconds 15 \
  --healthy-threshold-count 2 --unhealthy-threshold-count 3 --matcher HttpCode=200
aws elbv2 create-listener --load-balancer-arn $ALB_ARN --protocol HTTPS --port 443 \
  --certificates CertificateArn=$ACM_ARN \
  --ssl-policy ELBSecurityPolicy-TLS13-1-2-2021-06 \
  --default-actions Type=forward,TargetGroupArn=$TG_BLUE
```

CDN

CloudFront or Cloudflare in front of every static asset and cacheable API response. Enable Origin Shield in a region close to the origin to cut origin fetches by 60–80%. Attach AWS WAF with the Managed Rules Core Rule Set plus Known Bad Inputs and IP-Reputation lists; add a rate-based rule at 2000 requests per 5 minutes per IP for unauthenticated endpoints.
```bash
aws cloudfront create-distribution --distribution-config file://cf-dist.json
aws wafv2 create-web-acl --name app-prod-waf --scope CLOUDFRONT --default-action Allow={} \
  --visibility-config SampledRequestsEnabled=true,CloudWatchMetricsEnabled=true,MetricName=app-prod-waf \
  --rules file://waf-managed-rules.json
```
Invalidate surgically — never `invalidate /*` on every deploy; use versioned asset paths (`/static/v=<build-sha>/`) and cache-bust only HTML.

SSL/TLS Automation

  • AWS ALB, CloudFront, API Gateway → ACM certificates: free, auto-renewed, DNS-validated via Route 53.
  • VPS or single host → Certbot + Let's Encrypt with the installer's systemd timer; nightly cron only when systemd is unavailable.
  • Kubernetes → `cert-manager` with a `ClusterIssuer` for Let's Encrypt ACME HTTP-01 or DNS-01.
```bash
aws acm request-certificate --domain-name app.example.co.ug \
  --subject-alternative-names "*.app.example.co.ug" \
  --validation-method DNS --key-algorithm RSA_2048
sudo certbot --nginx -d app.example.co.ug --deploy-hook "systemctl reload nginx"
kubectl apply -f cert-manager/letsencrypt-prod-issuer.yaml
```
TLS 1.2 minimum, prefer 1.3. Enable HSTS `max-age=31536000; includeSubDomains; preload` once the production cert path is stable.
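A quick check on that HSTS value: `31536000` seconds is exactly one non-leap year, which is the floor the browser preload list expects.

```bash
# Convert the HSTS max-age to days.
max_age=31536000
days=$(( max_age / 86400 ))
echo "$days days"
```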

Auto-Scaling

Target tracking first, step scaling second, predictive third. Scale on request count per target and P95 latency — not CPU alone.
```bash
aws application-autoscaling put-scaling-policy --service-namespace ecs \
  --scalable-dimension ecs:service:DesiredCount --resource-id service/app-cluster/app-svc \
  --policy-name tt-reqcount --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{
    "TargetValue": 1000,
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ALBRequestCountPerTarget",
      "ResourceLabel": "app/alb-arn/tg-arn"
    },
    "ScaleOutCooldown": 60, "ScaleInCooldown": 300
  }'
```
  • CPU target 70% for CPU-bound services; never below 40% (wastes capacity).
  • Scheduled scaling for predictable load (EAT business hours 07:00–19:00).
  • Predictive scaling requires ≥ 14 days of CloudWatch history and a regular daily/weekly pattern — otherwise predictions are noise.
  • Warm pools for slow-booting AMIs (> 3 min boot).
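The target-tracking math in miniature: desired capacity is roughly the current capacity scaled by the ratio of observed metric to `TargetValue`, rounded up. The numbers below are illustrative:

```bash
# desired ≈ ceil(current * observed / target), using integer ceiling division.
current=3
observed=1600    # ALBRequestCountPerTarget reported by CloudWatch
target=1000      # TargetValue from the policy above
desired=$(( (current * observed + target - 1) / target ))
echo "scale from $current to $desired tasks"
```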

Zero-Downtime Deployments

Blue-green via ALB target-group swap for stateful-client apps; ASG instance refresh for stateless fleets. Canary for risky changes (pull weight to zero to rollback); shadow for unproven services receiving mirrored traffic. Automatic rollback triggers on health-check failure, 5xx-rate regression > 0.5% over 5 min, or P95 latency regression beyond SLO budget.
Blue-green procedure: register green with `app-tg-green`, wait for all targets `healthy` via `aws elbv2 describe-target-health`, then swap the listener:
```bash
aws elbv2 modify-listener --listener-arn $LISTENER_ARN \
  --default-actions Type=forward,TargetGroupArn=$TG_GREEN
```
Hold blue for 30 minutes as a hot rollback target; deregister only after error-rate and latency SLOs hold. Rolling update via ASG instance refresh:
```bash
aws autoscaling start-instance-refresh --auto-scaling-group-name app-prod-asg \
  --strategy Rolling --preferences '{
    "MinHealthyPercentage": 90, "InstanceWarmup": 180,
    "CheckpointPercentages": [25, 50, 100], "CheckpointDelay": 600
  }'
```
Rollback: re-point the listener to `app-tg-blue` (blue-green), or `aws autoscaling cancel-instance-refresh` and roll forward with the prior Launch Template version. Schema migrations must be backwards-compatible across two application versions (expand → migrate → contract). Every deploy writes a signed record: who, what, when, artifact digest.

Backup & Disaster Recovery

Define RTO (how fast to recover) and RPO (how much data loss is tolerable) before picking tools. Typical production SaaS targets RTO ≤ 4 h, RPO ≤ 15 min.
  • RDS: automated backups retention 7–35 days with PITR; weekly manual snapshots retained 90 days; cross-region snapshot copy to `eu-west-1` as a sovereignty-preserving DR site.
  • S3: versioning on every data bucket; lifecycle moves non-current versions to Glacier Deep Archive after 60 days; Cross-Region Replication for critical buckets.
  • EBS: daily snapshots via AWS Backup with a 30-day retention plan.
```bash
aws rds copy-db-snapshot \
  --source-db-snapshot-identifier arn:aws:rds:af-south-1:111122223333:snapshot:app-prod-2026-04-15 \
  --target-db-snapshot-identifier app-prod-2026-04-15-dr \
  --kms-key-id alias/rds-dr --source-region af-south-1 --region eu-west-1
aws s3api put-bucket-versioning --bucket app-prod-uploads --versioning-configuration Status=Enabled
```
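A back-of-envelope check that the RPO ≤ 15 min target is achievable with PITR: worst-case loss is bounded by the transaction-log shipping interval, which for RDS is roughly every 5 minutes (treat that figure as an assumption to verify for your engine):

```bash
# Compare the log-shipping interval against the RPO budget.
log_interval_min=5
rpo_target_min=15
if [ "$log_interval_min" -le "$rpo_target_min" ]; then
  echo "RPO met: worst case ${log_interval_min} min <= ${rpo_target_min} min target"
else
  echo "RPO at risk: tighten log shipping or add a synchronous replica"
fi
```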
Rehearse restore quarterly — an untested backup is a hypothesis, not a backup.
选择工具前需定义RTO(恢复速度)和RPO(可容忍的数据丢失量)。典型生产SaaS的目标为RTO≤4小时,RPO≤15分钟。
  • RDS:自动备份保留7-35天并开启PITR;每周手动快照保留90天;跨区域快照复制到
    eu-west-1
    作为符合主权要求的灾难恢复站点。
  • S3:所有数据桶开启版本控制;生命周期策略将非当前版本在60天后迁移至Glacier深度归档;关键桶启用跨区域复制。
  • EBS:通过AWS Backup每日创建快照,保留30天。
bash
aws rds copy-db-snapshot \
  --source-db-snapshot-identifier arn:aws:rds:af-south-1:111122223333:snapshot:app-prod-2026-04-15 \
  --target-db-snapshot-identifier app-prod-2026-04-15-dr \
  --kms-key-id alias/rds-dr --source-region af-south-1 --region eu-west-1
aws s3api put-bucket-versioning --bucket app-prod-uploads --versioning-configuration Status=Enabled
每季度演练恢复流程——未测试的备份只是假设,而非有效备份。
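The RPO target above can be checked mechanically. A sketch, assuming backup timestamps are available as epoch seconds; the `rpo_ok` name is illustrative, and only the 15-minute constant comes from the target stated here:

```shell
# Hypothetical check: is the newest backup recent enough to meet a 15-minute RPO?
# Both inputs are epoch seconds; returns success (0) when the target is met.
rpo_ok() {
  local last_backup_epoch=$1 now_epoch=$2
  local rpo_seconds=$((15 * 60))
  (( now_epoch - last_backup_epoch <= rpo_seconds ))
}
```

In practice `last_backup_epoch` would be derived from `aws rds describe-db-snapshots` output; that wiring is left out here.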

Cost Optimisation

成本优化

  • Reserved Instances or Savings Plans for steady baseline (70–80% of average compute); on-demand for burst. Prefer Compute Savings Plans (1y no-upfront starting posture; 3y only when headcount and roadmap are certain) — they apply across EC2, Fargate, Lambda.
  • Spot for non-critical async workers and CI runners with a graceful shutdown handler for the 2-minute interruption notice.
  • S3 Intelligent-Tiering on buckets with unpredictable access; tag every resource with `Environment`, `Team`, `CostCenter`, `Project` and activate these as cost-allocation tags in Billing.
  • Cost Explorer, Cost Anomaly Detection, and per-environment budgets on from day one.
```bash
aws ce list-cost-allocation-tags --status Active --region us-east-1
aws budgets create-budget --account-id 111122223333 --budget '{
  "BudgetName": "ug-prod-monthly",
  "BudgetLimit": { "Amount": "5000", "Unit": "USD" },
  "TimeUnit": "MONTHLY", "BudgetType": "COST",
  "CostFilters": { "TagKeyValue": ["user:Environment$prod"] }
}'
```
  • 针对稳定基线计算(占平均计算量的70-80%)使用预留实例或Savings Plans;针对突发流量使用按需实例。优先选择Compute Savings Plans(1年期无预付金起步;仅当人员配置和路线图明确时选择3年期)——适用于EC2、Fargate、Lambda。
  • 针对非关键异步工作者和CI运行器使用竞价实例,并实现优雅关闭处理以应对2分钟中断通知。
  • 对访问模式不可预测的桶使用S3智能分层;为每个资源添加 `Environment`、`Team`、`CostCenter`、`Project` 标签,并在账单中激活这些标签作为成本分配标签。
  • 从第一天起启用Cost Explorer、成本异常检测和按环境预算。
```bash
aws ce list-cost-allocation-tags --status Active --region us-east-1
aws budgets create-budget --account-id 111122223333 --budget '{
  "BudgetName": "ug-prod-monthly",
  "BudgetLimit": { "Amount": "5000", "Unit": "USD" },
  "TimeUnit": "MONTHLY", "BudgetType": "COST",
  "CostFilters": { "TagKeyValue": ["user:Environment$prod"] }
}'
```
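The four-tag convention above lends itself to a pre-deploy lint. A sketch; the `missing_tags` name and the space-separated input format are illustrative:

```shell
# Hypothetical lint: report which of the four required cost-allocation tag keys
# are absent from a space-separated list of tag keys found on a resource.
required_tags=(Environment Team CostCenter Project)
missing_tags() {
  local have=" $1 " missing=()
  local t
  for t in "${required_tags[@]}"; do
    # Substring match against the padded list so "Team" cannot match "TeamX".
    [[ $have == *" $t "* ]] || missing+=("$t")
  done
  echo "${missing[*]}"
}
```

Feeding it the tag keys returned by `aws resourcegroupstaggingapi get-resources` gives a quick compliance report before activating cost-allocation tags.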

Multi-Region Considerations

多区域考量

  • Latency from East Africa: `af-south-1` ~ 30 ms; `eu-west-1` ~ 150 ms; `us-east-1` ~ 220 ms. Place user-facing tiers in `af-south-1` whenever available.
  • Data residency: Uganda DPPA 2019 requires personal data of Ugandan data subjects to be processed in a jurisdiction with adequate protection; `af-south-1` with KMS customer-managed keys is the low-friction default. Log the data-flow and cross-border transfer basis in `_context/compliance.md`.
  • Replication: active-passive (primary `af-south-1`, warm standby `eu-west-1`) is the common starting posture; active-active only when conflict resolution is designed in (DynamoDB Global Tables, Aurora Global Database with write forwarding). Route 53 health-checked failover records for DR, not client-side retry loops.
```bash
aws route53 create-health-check --caller-reference "ug-app-$(date +%s)" --health-check-config file://hc.json
aws dynamodb update-table --table-name orders --replica-updates '[{"Create": {"RegionName": "eu-west-1"}}]'
```
  • 东非区域延迟:`af-south-1` ~30毫秒;`eu-west-1` ~150毫秒;`us-east-1` ~220毫秒。只要可用,面向用户的层应部署在 `af-south-1`。
  • 数据驻留:乌干达DPPA 2019要求乌干达数据主体的个人数据在具备充分保护的司法管辖区处理;`af-south-1` 搭配KMS客户管理密钥是低摩擦的默认选择。在 `_context/compliance.md` 中记录数据流和跨境传输依据。
  • 复制:主备模式(主 `af-south-1`,温备 `eu-west-1`)是常见的初始配置;仅当设计了冲突解决机制时才使用多活模式(DynamoDB全局表、支持写入转发的Aurora全局数据库)。使用Route 53健康检查故障转移记录进行灾难恢复,而非客户端重试循环。
```bash
aws route53 create-health-check --caller-reference "ug-app-$(date +%s)" --health-check-config file://hc.json
aws dynamodb update-table --table-name orders --replica-updates '[{"Create": {"RegionName": "eu-west-1"}}]'
```
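The `create-health-check` call above reads its configuration from `hc.json`. A minimal example of what that file might contain; the domain, path, and thresholds are illustrative, and the field names follow the Route 53 `HealthCheckConfig` schema:

```shell
# Write a hypothetical hc.json for the create-health-check command above;
# every value here is an illustrative placeholder, not a recommendation.
cat > hc.json <<'EOF'
{
  "Type": "HTTPS",
  "FullyQualifiedDomainName": "app.example.com",
  "ResourcePath": "/healthz",
  "Port": 443,
  "RequestInterval": 30,
  "FailureThreshold": 3
}
EOF
```

With `RequestInterval` 30 and `FailureThreshold` 3, failover triggers after roughly 90 seconds of failed probes, which should be sized against the documented RTO.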

Security Baseline

安全基线

Enable these on the management account and every member account on day one. The commands are safe to re-run: the create-* calls fail with an already-exists error rather than duplicating resources, and the start-*/put-* calls are idempotent.
```bash
aws cloudtrail create-trail --name org-trail --s3-bucket-name org-cloudtrail-logs \
  --is-multi-region-trail --is-organization-trail --enable-log-file-validation \
  --kms-key-id alias/cloudtrail
aws cloudtrail start-logging --name org-trail
aws s3api put-bucket-versioning --bucket org-cloudtrail-logs --versioning-configuration Status=Enabled
aws configservice start-configuration-recorder --configuration-recorder-name default
aws guardduty create-detector --enable --finding-publishing-frequency FIFTEEN_MINUTES
aws securityhub enable-security-hub --enable-default-standards
aws accessanalyzer create-analyzer --analyzer-name org-analyzer --type ORGANIZATION
```
  • CloudTrail: all regions, S3 bucket with versioning, log-file validation, KMS-encrypted.
  • AWS Config: enable the AWS Foundational Security Best Practices conformance pack.
  • GuardDuty: detector in every region with S3 and EKS protection on.
  • Security Hub: aggregate findings in a delegated admin account; resolve Critical/High within team SLO. IAM Access Analyzer: organization-level, reviewed weekly.
在管理账户和所有成员账户启用以下功能,从第一天开始执行。这些命令可安全重复执行:create-* 调用在资源已存在时会报错而不会创建重复资源,start-*/put-* 调用本身是幂等的。
```bash
aws cloudtrail create-trail --name org-trail --s3-bucket-name org-cloudtrail-logs \
  --is-multi-region-trail --is-organization-trail --enable-log-file-validation \
  --kms-key-id alias/cloudtrail
aws cloudtrail start-logging --name org-trail
aws s3api put-bucket-versioning --bucket org-cloudtrail-logs --versioning-configuration Status=Enabled
aws configservice start-configuration-recorder --configuration-recorder-name default
aws guardduty create-detector --enable --finding-publishing-frequency FIFTEEN_MINUTES
aws securityhub enable-security-hub --enable-default-standards
aws accessanalyzer create-analyzer --analyzer-name org-analyzer --type ORGANIZATION
```
  • CloudTrail:覆盖所有区域,S3桶开启版本控制,日志文件验证,KMS加密。
  • AWS Config:启用AWS基础安全最佳实践一致性包。
  • GuardDuty:每个区域启用检测器,开启S3和EKS保护。
  • Security Hub:在委托管理账户聚合发现结果;按团队SLO解决严重/高危问题。IAM Access Analyzer:组织级,每周评审。

Review Checklist

评审检查清单

  • Workload classified; compute model justified in writing.
  • VPC spans ≥ 2 AZs; data stores Multi-AZ.
  • No credentials in images, committed files, or Git history; IAM uses roles + OIDC, not long-lived keys.
  • Deployment pattern chosen with rollback runbook validated; TLS, CDN, WAF posture documented.
  • Auto-scaling signal is request- or latency-driven, not CPU-only.
  • CloudTrail, Config, GuardDuty, Security Hub enabled across all regions; backups tested with a quarterly restore rehearsal (RTO/RPO documented); billing alerts active, Cost Explorer tags applied, Spot use paired with shutdown handling.
  • 已完成工作负载分类;计算模型有书面理由。
  • VPC覆盖≥2个可用区;数据存储启用多可用区。
  • 镜像、已提交文件或Git历史中无凭证;IAM使用角色+OIDC,而非长期密钥。
  • 已选择部署模式并验证回滚手册;TLS、CDN、WAF配置已记录。
  • 自动扩缩容信号基于请求数或延迟,而非仅基于CPU。
  • CloudTrail、Config、GuardDuty、Security Hub已在所有区域启用;备份已通过每季度恢复演练验证(RTO/RPO已记录);账单告警已激活,Cost Explorer标签已应用,竞价实例搭配关闭处理机制。

Platform Notes

平台说明

  • Claude Code: the `aws` CLI and `docker` CLI are the primary surface. Configure profiles with `aws configure sso`; use named profiles per environment.
  • Codex: treat every command as a patch candidate; keep commands in shell blocks so they stay portable.
  • Claude Code:`aws` CLI 和 `docker` CLI 是主要操作界面。使用 `aws configure sso` 设置配置文件;按环境使用命名配置文件(named profile)。
  • Codex:将每个命令视为补丁候选;将命令放在shell块中以保持可移植性。

References

参考资料

  • references/aws-core-services.md: EC2, S3, RDS, IAM, ALB, ASG, CloudFront CLI recipes.
  • references/docker-compose-patterns.md: Full local-parity stack template.
  • references/deployment-patterns.md: Blue-green and canary runbooks with rollback steps.
  • AWS Well-Architected Framework: aws.amazon.com/architecture/well-architected
  • Docker Deep Dive — Nigel Poulton (reading programme, Phase 01 priority 1).
  • references/aws-core-services.md:EC2、S3、RDS、IAM、ALB、ASG、CloudFront CLI示例。
  • references/docker-compose-patterns.md:完整本地一致栈模板。
  • references/deployment-patterns.md:蓝绿和金丝雀发布手册及回滚步骤。
  • AWS Well-Architected框架:aws.amazon.com/architecture/well-architected
  • Docker Deep Dive — Nigel Poulton(阅读计划,Phase 01优先级1)。