gcp-cloud-architect


GCP Cloud Architect

Design scalable, cost-effective Google Cloud architectures for startups and enterprises with infrastructure-as-code templates.

Workflow

Step 1: Gather Requirements

Collect application specifications:
- Application type (web app, mobile backend, data pipeline, SaaS)
- Expected users and requests per second
- Budget constraints (monthly spend limit)
- Team size and GCP experience level
- Compliance requirements (GDPR, HIPAA, SOC 2)
- Availability requirements (SLA, RPO/RTO)

Step 2: Design Architecture

Run the architecture designer to get pattern recommendations:

```bash
python scripts/architecture_designer.py --input requirements.json
```

Example output:

```json
{
  "recommended_pattern": "serverless_web",
  "service_stack": ["Cloud Storage", "Cloud CDN", "Cloud Run", "Firestore", "Identity Platform"],
  "estimated_monthly_cost_usd": 30,
  "pros": ["Low ops overhead", "Pay-per-use", "Auto-scaling", "No cold starts on Cloud Run min instances"],
  "cons": ["Vendor lock-in", "Regional limitations", "Eventual consistency with Firestore"]
}
```

Select from recommended patterns:
- Serverless Web: Cloud Storage + Cloud CDN + Cloud Run + Firestore
- Microservices on GKE: GKE Autopilot + Cloud SQL + Memorystore + Cloud Pub/Sub
- Serverless Data Pipeline: Pub/Sub + Dataflow + BigQuery + Looker
- ML Platform: Vertex AI + Cloud Storage + BigQuery + Cloud Functions

See `references/architecture_patterns.md` for detailed pattern specifications.

**Validation checkpoint:** Confirm the recommended pattern matches the team's operational maturity and compliance requirements before proceeding to Step 3.
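The validation checkpoint above could be partly automated. The following is a minimal sketch, not part of the provided scripts: the `PATTERNS_BY_EXPERIENCE` mapping, the function name `checkpoint_issues`, and every pattern identifier other than `serverless_web` are illustrative assumptions. Compliance fit still needs a human review.

```python
# Hypothetical mapping of experience level to patterns a team can reasonably
# operate. Only "serverless_web" appears in the example output; the other
# identifiers and the groupings are assumptions for illustration.
PATTERNS_BY_EXPERIENCE = {
    "beginner": {"serverless_web"},
    "intermediate": {"serverless_web", "serverless_data_pipeline"},
    "advanced": {
        "serverless_web",
        "serverless_data_pipeline",
        "gke_microservices",
        "ml_platform",
    },
}


def checkpoint_issues(design: dict, requirements: dict) -> list[str]:
    """Return human-readable concerns; an empty list means no flags raised."""
    issues = []
    level = requirements.get("gcp_experience", "beginner")
    pattern = design["recommended_pattern"]
    if pattern not in PATTERNS_BY_EXPERIENCE.get(level, set()):
        issues.append(
            f"pattern '{pattern}' may exceed a {level} team's operational maturity"
        )
    if design["estimated_monthly_cost_usd"] > requirements["budget_monthly_usd"]:
        issues.append("estimated cost exceeds the monthly budget")
    return issues


design = {"recommended_pattern": "serverless_web", "estimated_monthly_cost_usd": 30}
requirements = {"gcp_experience": "intermediate", "budget_monthly_usd": 500}
print(checkpoint_issues(design, requirements))  # prints []
```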

Step 3: Estimate Cost

Analyze estimated costs and optimization opportunities:

```bash
python scripts/cost_optimizer.py --resources current_setup.json --monthly-spend 2000
```

Example output:

```json
{
  "current_monthly_usd": 2000,
  "recommendations": [
    { "action": "Right-size Cloud SQL db-custom-4-16384 to db-custom-2-8192", "savings_usd": 380, "priority": "high" },
    { "action": "Purchase 1-yr committed use discount for GKE nodes", "savings_usd": 290, "priority": "high" },
    { "action": "Move Cloud Storage objects >90 days to Nearline", "savings_usd": 75, "priority": "medium" }
  ],
  "total_potential_savings_usd": 745
}
```

Output includes:
- Monthly cost breakdown by service
- Right-sizing recommendations
- Committed use discount opportunities
- Sustained use discount analysis
- Potential monthly savings

Use the GCP Pricing Calculator for detailed estimates.
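To sanity-check a report like the one above, the per-recommendation savings can be summed and compared against the stated total. This is a small sketch using the example figures from this step; the `summarize` helper is a hypothetical name, not part of `cost_optimizer.py`.

```python
import json

# Example report copied from the sample output in this step (actions shortened).
report = json.loads("""
{
  "current_monthly_usd": 2000,
  "recommendations": [
    {"action": "Right-size Cloud SQL", "savings_usd": 380, "priority": "high"},
    {"action": "1-yr committed use discount", "savings_usd": 290, "priority": "high"},
    {"action": "Move old objects to Nearline", "savings_usd": 75, "priority": "medium"}
  ],
  "total_potential_savings_usd": 745
}
""")


def summarize(report: dict) -> tuple[int, float]:
    """Return (total savings in USD, savings as a percentage of current spend)."""
    total = sum(rec["savings_usd"] for rec in report["recommendations"])
    percent = 100 * total / report["current_monthly_usd"]
    return total, percent


total, percent = summarize(report)
# Flag reports whose stated total does not match the itemized savings.
assert total == report["total_potential_savings_usd"]
print(f"${total}/month ({percent}% of current spend)")
```

For the sample figures, 380 + 290 + 75 = 745 USD, which is 37.25% of the 2000 USD monthly spend.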

Step 4: Generate IaC

Create infrastructure-as-code for the selected pattern:

```bash
python scripts/deployment_manager.py --app-name my-app --pattern serverless_web --region us-central1
```

Example Terraform HCL output (Cloud Run + Firestore):

```hcl
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
  }
}

provider "google" {
  project = var.project_id
  region  = var.region
}

variable "project_id" {
  description = "GCP project ID"
  type        = string
}

variable "region" {
  description = "GCP region"
  type        = string
  default     = "us-central1"
}

# Declared because the resources below reference var.app_name and var.environment.
variable "app_name" {
  description = "Application name used in resource names and the image path"
  type        = string
}

variable "environment" {
  description = "Deployment environment prefix (for example dev, staging, prod)"
  type        = string
}

# Cloud Run service for the API; scales to zero when idle.
resource "google_cloud_run_v2_service" "api" {
  name     = "${var.environment}-${var.app_name}-api"
  location = var.region

  template {
    containers {
      image = "gcr.io/${var.project_id}/${var.app_name}:latest"
      resources {
        limits = {
          cpu    = "1000m"
          memory = "512Mi"
        }
      }
      env {
        name  = "FIRESTORE_PROJECT"
        value = var.project_id
      }
    }
    scaling {
      min_instance_count = 0
      max_instance_count = 10
    }
  }
}

# Default Firestore database in Native mode.
resource "google_firestore_database" "default" {
  project     = var.project_id
  name        = "(default)"
  location_id = var.region
  type        = "FIRESTORE_NATIVE"
}
```
Example gcloud CLI deployment:

```bash
# Deploy Cloud Run service
gcloud run deploy my-app-api \
  --image gcr.io/$PROJECT_ID/my-app:latest \
  --region us-central1 \
  --platform managed \
  --allow-unauthenticated \
  --memory 512Mi \
  --cpu 1 \
  --min-instances 0 \
  --max-instances 10

# Create Firestore database
gcloud firestore databases create --location=us-central1
```

> Full templates including Cloud CDN, Identity Platform, IAM, and Cloud Monitoring are generated by `deployment_manager.py` and also available in `references/architecture_patterns.md`.
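When the same deployment has to be repeated for several apps or regions, the `gcloud run deploy` invocation shown in this step can be assembled programmatically. The sketch below is an illustration only: the function `cloud_run_deploy_command` and its parameter names are hypothetical and do not describe how `deployment_manager.py` works. It uses only the flags that appear in the command above.

```python
import shlex


def cloud_run_deploy_command(
    app_name: str,
    project_id: str,
    region: str = "us-central1",
    memory: str = "512Mi",
    cpu: int = 1,
    min_instances: int = 0,
    max_instances: int = 10,
    allow_unauthenticated: bool = True,
) -> list[str]:
    """Build the argument list for `gcloud run deploy` (hypothetical helper)."""
    cmd = [
        "gcloud", "run", "deploy", f"{app_name}-api",
        "--image", f"gcr.io/{project_id}/{app_name}:latest",
        "--region", region,
        "--platform", "managed",
        "--memory", memory,
        "--cpu", str(cpu),
        "--min-instances", str(min_instances),
        "--max-instances", str(max_instances),
    ]
    # Public access is opt-in so that internal services stay private.
    if allow_unauthenticated:
        cmd.append("--allow-unauthenticated")
    return cmd


# Print a shell-safe command line; pass the list to subprocess.run to execute it.
print(shlex.join(cloud_run_deploy_command("my-app", "my-project")))
```

Returning a list rather than a string avoids shell-quoting mistakes if the command is later run with `subprocess.run`.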

Step 5: Configure CI/CD

Set up automated deployment with Cloud Build or GitHub Actions:

```yaml
# cloudbuild.yaml
steps:
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 'gcr.io/$PROJECT_ID/my-app:$COMMIT_SHA', '.']
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push', 'gcr.io/$PROJECT_ID/my-app:$COMMIT_SHA']
  - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
    entrypoint: gcloud
    args:
      - 'run'
      - 'deploy'
      - 'my-app-api'
      - '--image=gcr.io/$PROJECT_ID/my-app:$COMMIT_SHA'
      - '--region=us-central1'
      - '--platform=managed'
images:
  - 'gcr.io/$PROJECT_ID/my-app:$COMMIT_SHA'
```

```bash
# Connect repo and create trigger
gcloud builds triggers create github \
  --repo-name=my-app \
  --repo-owner=my-org \
  --branch-pattern="^main$" \
  --build-config=cloudbuild.yaml
```
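The pipeline definition above has a regular shape (build, push, deploy), so it can be generated per application. The following sketch builds the same structure as a Python dictionary and prints it as JSON, which Cloud Build also accepts as a build config format. The helper name `cloud_build_config` is an assumption for illustration; `$PROJECT_ID` and `$COMMIT_SHA` are left as literal Cloud Build substitutions.

```python
import json


def cloud_build_config(app_name: str, region: str = "us-central1") -> dict:
    """Return a build/push/deploy pipeline mirroring the cloudbuild.yaml above."""
    # Substitution variables are resolved by Cloud Build at run time, not here.
    image = f"gcr.io/$PROJECT_ID/{app_name}:$COMMIT_SHA"
    return {
        "steps": [
            {
                "name": "gcr.io/cloud-builders/docker",
                "args": ["build", "-t", image, "."],
            },
            {
                "name": "gcr.io/cloud-builders/docker",
                "args": ["push", image],
            },
            {
                "name": "gcr.io/google.com/cloudsdktool/cloud-sdk",
                "entrypoint": "gcloud",
                "args": [
                    "run",
                    "deploy",
                    f"{app_name}-api",
                    f"--image={image}",
                    f"--region={region}",
                    "--platform=managed",
                ],
            },
        ],
        "images": [image],
    }


print(json.dumps(cloud_build_config("my-app"), indent=2))
```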

Step 6: Security Review

Verify security configuration:

```bash
# Review IAM bindings
gcloud projects get-iam-policy $PROJECT_ID --format=json

# Check service account permissions
gcloud iam service-accounts list --project=$PROJECT_ID

# Verify VPC Service Controls (if applicable)
gcloud access-context-manager perimeters list --policy=$POLICY_ID
```

**Security checklist:**
- IAM roles follow least privilege (prefer predefined roles over basic roles)
- Service accounts use Workload Identity for GKE
- VPC Service Controls configured for sensitive APIs
- Cloud KMS encryption keys for customer-managed encryption
- Cloud Audit Logs enabled for all admin activity
- Organization policies restrict public access
- Secret Manager used for all credentials

**If deployment fails:**

1. Check the failure reason:
   ```bash
   gcloud run services describe my-app-api --region us-central1
   gcloud logging read "resource.type=cloud_run_revision" --limit=20
   ```
2. Review Cloud Logging for application errors.
3. Fix the configuration or container image.
4. Redeploy:
   ```bash
   gcloud run deploy my-app-api --image gcr.io/$PROJECT_ID/my-app:latest --region us-central1
   ```

**Common failure causes:**
- IAM permission errors -- verify service account roles and the `--allow-unauthenticated` flag
- Quota exceeded -- request a quota increase via IAM & Admin > Quotas
- Container startup failure -- check container logs and health check configuration
- API not enabled -- enable the required APIs with `gcloud services enable`
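The first item of the security checklist (least privilege, no basic roles) can be checked against the JSON printed by `gcloud projects get-iam-policy $PROJECT_ID --format=json`. The sketch below assumes the standard IAM policy shape with a top-level `bindings` list of `role` and `members`; the function name `audit_iam_policy` and the wording of the findings are illustrative assumptions.

```python
# Basic (primitive) roles that the checklist recommends avoiding.
BASIC_ROLES = {"roles/owner", "roles/editor", "roles/viewer"}
# Special member identifiers that make a binding public.
PUBLIC_MEMBERS = {"allUsers", "allAuthenticatedUsers"}


def audit_iam_policy(policy: dict) -> list[str]:
    """Return findings for basic-role grants and publicly accessible bindings."""
    findings = []
    for binding in policy.get("bindings", []):
        role = binding["role"]
        members = binding.get("members", [])
        if role in BASIC_ROLES:
            findings.append(f"basic role {role} granted to {len(members)} member(s)")
        for member in members:
            if member in PUBLIC_MEMBERS:
                findings.append(f"{role} is granted to {member}")
    return findings


# Sample policy with made-up members, shaped like the gcloud JSON output.
sample_policy = {
    "bindings": [
        {"role": "roles/editor", "members": ["user:dev@example.com"]},
        {"role": "roles/run.invoker", "members": ["allUsers"]},
        {"role": "roles/datastore.user", "members": ["serviceAccount:api@example.iam.gserviceaccount.com"]},
    ]
}
for finding in audit_iam_policy(sample_policy):
    print(finding)
```

In practice the policy would be loaded with `json.load` from a file produced by the `gcloud` command rather than defined inline.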

Tools

architecture_designer.py

Recommends GCP services based on workload requirements.

```bash
python scripts/architecture_designer.py --input requirements.json --output design.json
```

**Input:** JSON with app type, scale, budget, compliance needs

**Output:** Recommended pattern, service stack, cost estimate, pros/cons

cost_optimizer.py

Analyzes GCP resources for cost savings.

```bash
python scripts/cost_optimizer.py --resources inventory.json --monthly-spend 5000
```

**Output:** Recommendations for:
- Idle resource removal
- Machine type right-sizing
- Committed use discounts
- Storage class transitions
- Network egress optimization
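A storage class transition recommendation is typically applied as a Cloud Storage lifecycle rule. The sketch below generates a lifecycle configuration that moves objects older than a given age to Nearline, using the `SetStorageClass` action and `age` condition from the Cloud Storage lifecycle format. The helper name `nearline_lifecycle_config` and the 90-day default are assumptions chosen to match the example recommendation in Step 3; this is not output of `cost_optimizer.py`.

```python
import json


def nearline_lifecycle_config(age_days: int = 90) -> dict:
    """Lifecycle config that moves objects older than age_days to Nearline."""
    return {
        "rule": [
            {
                "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
                "condition": {"age": age_days},
            }
        ]
    }


# Write the result to a file and apply it to a bucket with the lifecycle
# tooling of your choice (for example, gsutil lifecycle set).
print(json.dumps(nearline_lifecycle_config(), indent=2))
```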

deployment_manager.py

Generates gcloud CLI deployment scripts and Terraform configurations.

```bash
python scripts/deployment_manager.py --app-name my-app --pattern serverless_web --region us-central1
```

**Output:** Production-ready deployment scripts with:
- Cloud Run or GKE deployment
- Firestore or Cloud SQL setup
- Identity Platform configuration
- IAM roles with least privilege
- Cloud Monitoring and Logging

Quick Start

Web App on Cloud Run (< $100/month)

Ask: "Design a serverless web backend for a mobile app with 1000 users"

Result:
- Cloud Run for API (auto-scaling, no cold start with min instances)
- Firestore for data (pay-per-operation)
- Identity Platform for authentication
- Cloud Storage + Cloud CDN for static assets
- Estimated: $15-40/month

Microservices on GKE ($500-2000/month)

Ask: "Design a scalable architecture for a SaaS platform with 50k users"

Result:
- GKE Autopilot for containerized workloads
- Cloud SQL (PostgreSQL) with read replicas
- Memorystore (Redis) for session caching
- Cloud CDN for global delivery
- Cloud Build for CI/CD
- Multi-zone deployment

Serverless Data Pipeline

Ask: "Design a real-time analytics pipeline for event data"

Result:
- Pub/Sub for event ingestion
- Dataflow (Apache Beam) for stream processing
- BigQuery for analytics and warehousing
- Looker for dashboards
- Cloud Functions for lightweight transforms

ML Platform

Ask: "Design a machine learning platform for model training and serving"

Result:
- Vertex AI for training and prediction
- Cloud Storage for datasets and model artifacts
- BigQuery for feature store
- Cloud Functions for preprocessing triggers
- Cloud Monitoring for model drift detection


Input Requirements

Provide these details for architecture design:

| Requirement | Description | Example |
|---|---|---|
| Application type | What you're building | SaaS platform, mobile backend |
| Expected scale | Users, requests/sec | 10k users, 100 RPS |
| Budget | Monthly GCP limit | $500/month max |
| Team context | Size, GCP experience | 3 devs, intermediate |
| Compliance | Regulatory needs | HIPAA, GDPR, SOC 2 |
| Availability | Uptime requirements | 99.9% SLA, 1hr RPO |

JSON Format:

```json
{
  "application_type": "saas_platform",
  "expected_users": 10000,
  "requests_per_second": 100,
  "budget_monthly_usd": 500,
  "team_size": 3,
  "gcp_experience": "intermediate",
  "compliance": ["SOC2"],
  "availability_sla": "99.9%"
}
```
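Before passing a requirements file to the designer, it can help to confirm that every field from the JSON format above is present and has a plausible type. This is a minimal sketch: the `REQUIRED_FIELDS` table and `validate_requirements` function are hypothetical, and the accepted types are inferred from the single example above rather than from a published schema.

```python
# Field names come from the JSON format above; accepted types are inferred
# from that one example and are an assumption, not a documented schema.
REQUIRED_FIELDS = {
    "application_type": (str,),
    "expected_users": (int,),
    "requests_per_second": (int, float),
    "budget_monthly_usd": (int, float),
    "team_size": (int,),
    "gcp_experience": (str,),
    "compliance": (list,),
    "availability_sla": (str,),
}


def validate_requirements(data: dict) -> list[str]:
    """Return a list of problems; an empty list means the input looks usable."""
    errors = []
    for field, types in REQUIRED_FIELDS.items():
        if field not in data:
            errors.append(f"missing field: {field}")
        elif not isinstance(data[field], types):
            expected = " or ".join(t.__name__ for t in types)
            actual = type(data[field]).__name__
            errors.append(f"{field}: expected {expected}, got {actual}")
    return errors


example = {
    "application_type": "saas_platform",
    "expected_users": 10000,
    "requests_per_second": 100,
    "budget_monthly_usd": 500,
    "team_size": 3,
    "gcp_experience": "intermediate",
    "compliance": ["SOC2"],
    "availability_sla": "99.9%",
}
print(validate_requirements(example))  # prints []
```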


Output Formats

Architecture Design

- Pattern recommendation with rationale
- Service stack diagram (ASCII)
- Monthly cost estimate and trade-offs

IaC Templates

- Terraform HCL: Production-ready Google provider configs
- gcloud CLI: Scripted deployment commands
- Cloud Build YAML: CI/CD pipeline definitions

Cost Analysis

- Current spend breakdown with optimization recommendations
- Priority action list (high/medium/low) and implementation checklist
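A priority action list of this kind can be derived from the recommendation records by ordering on priority first and savings second. The sketch below assumes records shaped like the example cost report in Step 3 (`priority` and `savings_usd` keys); the `prioritized_actions` helper and the tie-breaking rule are illustrative choices, not documented behavior of the tooling.

```python
# Lower number sorts first; the three levels match high/medium/low above.
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def prioritized_actions(recommendations: list[dict]) -> list[dict]:
    """Sort by priority, then by largest savings within the same priority."""
    return sorted(
        recommendations,
        key=lambda rec: (PRIORITY_ORDER[rec["priority"]], -rec["savings_usd"]),
    )


recommendations = [
    {"action": "Move old objects to Nearline", "savings_usd": 75, "priority": "medium"},
    {"action": "1-yr committed use discount", "savings_usd": 290, "priority": "high"},
    {"action": "Right-size Cloud SQL", "savings_usd": 380, "priority": "high"},
]
for rec in prioritized_actions(recommendations):
    print(f"[{rec['priority']}] {rec['action']}: ${rec['savings_usd']}/month")
```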

Anti-Patterns

| Anti-Pattern | Why It Fails | Better Approach |
|---|---|---|
| Using default VPC for production | No isolation, shared firewall rules | Create custom VPC with private subnets |
| Over-provisioning GKE node pools | Wasted cost on idle capacity | Use GKE Autopilot or cluster autoscaler |
| Storing secrets in environment variables | Visible in Cloud Console, logs | Use Secret Manager with Workload Identity |
| Ignoring sustained use discounts | Missing 20-30% automatic savings | Right-size VMs for consistent baseline usage |
| Single-region deployment for SaaS | One region outage = full downtime | Multi-region with Cloud Load Balancing |
| BigQuery on-demand for heavy workloads | Unpredictable costs at scale | Use BigQuery slot reservations (capacity-based pricing) for consistent workloads |
| Running Cloud Functions for long tasks | 9-minute timeout, cold starts | Use Cloud Run for tasks > 60 seconds |


Cross-References

| Skill | Relationship |
|---|---|
| `engineering-team/aws-solution-architect` | AWS equivalent: same 6-step workflow, different services |
| `engineering-team/azure-cloud-architect` | Azure equivalent: completes the cloud trifecta |
| `engineering-team/senior-devops` | Broader DevOps scope: pipelines, monitoring, containerization |
| `engineering/terraform-patterns` | IaC implementation: use for Terraform modules targeting GCP |
| `engineering/ci-cd-pipeline-builder` | Pipeline construction: automates Cloud Build and deployment |


Reference Documentation

| Document | Contents |
|---|---|
| `references/architecture_patterns.md` | 6 patterns: serverless, GKE microservices, three-tier, data pipeline, ML platform, multi-region |
| `references/service_selection.md` | Decision matrices for compute, database, storage, messaging |
| `references/best_practices.md` | Naming, labels, IAM, networking, monitoring, disaster recovery |