gcp-cloud-architect
GCP Cloud Architect
Design scalable, cost-effective Google Cloud architectures for startups and enterprises with infrastructure-as-code templates.
Workflow
Step 1: Gather Requirements
Collect application specifications:
- Application type (web app, mobile backend, data pipeline, SaaS)
- Expected users and requests per second
- Budget constraints (monthly spend limit)
- Team size and GCP experience level
- Compliance requirements (GDPR, HIPAA, SOC 2)
- Availability requirements (SLA, RPO/RTO)
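When gathering the availability requirement, it can help to translate an SLA percentage into a concrete downtime budget. The sketch below is illustrative arithmetic only: the function name `allowed_downtime_minutes` and the fixed 30-day month are assumptions, not part of the skill's scripts.

```python
def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Return the downtime budget, in minutes, implied by an uptime SLA.

    Assumes a fixed-length period (30 days by default) for simplicity.
    """
    if not 0 < sla_percent <= 100:
        raise ValueError("sla_percent must be in (0, 100]")
    total_minutes = days * 24 * 60
    return total_minutes * (100 - sla_percent) / 100


if __name__ == "__main__":
    for sla in (99.0, 99.9, 99.99):
        minutes = allowed_downtime_minutes(sla)
        print(f"{sla}% SLA allows about {minutes:.1f} minutes of downtime per month")
```

Under these assumptions a 99.9% SLA leaves roughly 43 minutes of downtime per 30-day month, which is a useful number to compare against the stated RTO.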
Step 2: Design Architecture
Run the architecture designer to get pattern recommendations:
bash
python scripts/architecture_designer.py --input requirements.json
Example output:
json
{
  "recommended_pattern": "serverless_web",
  "service_stack": ["Cloud Storage", "Cloud CDN", "Cloud Run", "Firestore", "Identity Platform"],
  "estimated_monthly_cost_usd": 30,
  "pros": ["Low ops overhead", "Pay-per-use", "Auto-scaling", "No cold starts on Cloud Run min instances"],
  "cons": ["Vendor lock-in", "Regional limitations", "Eventual consistency with Firestore"]
}
Select from recommended patterns:
- Serverless Web: Cloud Storage + Cloud CDN + Cloud Run + Firestore
- Microservices on GKE: GKE Autopilot + Cloud SQL + Memorystore + Cloud Pub/Sub
- Serverless Data Pipeline: Pub/Sub + Dataflow + BigQuery + Looker
- ML Platform: Vertex AI + Cloud Storage + BigQuery + Cloud Functions
See references/architecture_patterns.md for detailed pattern specifications.
Validation checkpoint: Confirm the recommended pattern matches the team's operational maturity and compliance requirements before proceeding to Step 3.
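The validation checkpoint can be partly mechanized. The following is a minimal sketch, not the behavior of `architecture_designer.py`: the function `checkpoint_warnings`, the pattern key `microservices_gke`, and the experience value `beginner` are hypothetical, while the field names come from the example JSON in this document.

```python
# Hypothetical rule: patterns assumed to demand more operational maturity.
# Only "serverless_web" appears in the example output; this key is illustrative.
OPS_HEAVY_PATTERNS = {"microservices_gke"}


def checkpoint_warnings(design: dict, requirements: dict) -> list[str]:
    """Return human-readable warnings; an empty list means nothing was flagged."""
    warnings = []

    # Compare the design's cost estimate with the stated monthly budget.
    cost = design.get("estimated_monthly_cost_usd")
    budget = requirements.get("budget_monthly_usd")
    if cost is not None and budget is not None and cost > budget:
        warnings.append(f"Estimated cost ${cost} exceeds budget ${budget}")

    # Assumed maturity rule: flag ops-heavy patterns for inexperienced teams.
    if (design.get("recommended_pattern") in OPS_HEAVY_PATTERNS
            and requirements.get("gcp_experience") == "beginner"):
        warnings.append("Pattern may be too ops-heavy for a beginner team")

    # Compliance cannot be verified automatically here, so ask for manual review.
    if requirements.get("compliance"):
        warnings.append("Review manually against: " + ", ".join(requirements["compliance"]))
    return warnings


if __name__ == "__main__":
    design = {"recommended_pattern": "serverless_web", "estimated_monthly_cost_usd": 30}
    requirements = {"budget_monthly_usd": 500, "gcp_experience": "intermediate", "compliance": ["SOC2"]}
    for warning in checkpoint_warnings(design, requirements):
        print("WARNING:", warning)
```

An empty result does not replace human judgment; it only means none of these simple rules fired.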
Step 3: Estimate Cost
Analyze estimated costs and optimization opportunities:
bash
python scripts/cost_optimizer.py --resources current_setup.json --monthly-spend 2000
Example output:
json
{
  "current_monthly_usd": 2000,
  "recommendations": [
    { "action": "Right-size Cloud SQL db-custom-4-16384 to db-custom-2-8192", "savings_usd": 380, "priority": "high" },
    { "action": "Purchase 1-yr committed use discount for GKE nodes", "savings_usd": 290, "priority": "high" },
    { "action": "Move Cloud Storage objects >90 days to Nearline", "savings_usd": 75, "priority": "medium" }
  ],
  "total_potential_savings_usd": 745
}
Output includes:
- Monthly cost breakdown by service
- Right-sizing recommendations
- Committed use discount opportunities
- Sustained use discount analysis
- Potential monthly savings
Use the GCP Pricing Calculator for detailed estimates.
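As a quick sanity check on a cost report, the per-recommendation savings can be re-added and compared with the reported total. This is a small sketch that uses the example output above as input; the `summarize` helper is an assumption for illustration and is not part of `cost_optimizer.py`.

```python
import json

# The example report from this step, embedded so the sketch is self-contained.
EXAMPLE_OUTPUT = """
{
  "current_monthly_usd": 2000,
  "recommendations": [
    {"action": "Right-size Cloud SQL db-custom-4-16384 to db-custom-2-8192", "savings_usd": 380, "priority": "high"},
    {"action": "Purchase 1-yr committed use discount for GKE nodes", "savings_usd": 290, "priority": "high"},
    {"action": "Move Cloud Storage objects >90 days to Nearline", "savings_usd": 75, "priority": "medium"}
  ],
  "total_potential_savings_usd": 745
}
"""


def summarize(report: dict) -> dict:
    """Sum savings per priority and express the total as a share of current spend."""
    by_priority = {}
    for rec in report["recommendations"]:
        by_priority[rec["priority"]] = by_priority.get(rec["priority"], 0) + rec["savings_usd"]
    total = sum(by_priority.values())
    return {
        "by_priority": by_priority,
        "total": total,
        "matches_reported_total": total == report["total_potential_savings_usd"],
        "savings_percent": round(100 * total / report["current_monthly_usd"], 2),
    }


report = json.loads(EXAMPLE_OUTPUT)
summary = summarize(report)
print(summary)
```

For the example, the three recommendations add up to 380 + 290 + 75 = 745 USD, which matches the reported total and is 37.25% of the 2000 USD monthly spend.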
Step 4: Generate IaC
Create infrastructure-as-code for the selected pattern:
bash
python scripts/deployment_manager.py --app-name my-app --pattern serverless_web --region us-central1
Example Terraform HCL output (Cloud Run + Firestore):
hcl
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
  }
}

provider "google" {
  project = var.project_id
  region  = var.region
}

variable "project_id" {
  description = "GCP project ID"
  type        = string
}

variable "region" {
  description = "GCP region"
  type        = string
  default     = "us-central1"
}

# Declared here because the Cloud Run service below references them.
variable "app_name" {
  description = "Application name"
  type        = string
}

variable "environment" {
  description = "Deployment environment (for example, dev or prod)"
  type        = string
}

resource "google_cloud_run_v2_service" "api" {
  name     = "${var.environment}-${var.app_name}-api"
  location = var.region

  template {
    containers {
      image = "gcr.io/${var.project_id}/${var.app_name}:latest"

      resources {
        limits = {
          cpu    = "1000m"
          memory = "512Mi"
        }
      }

      env {
        name  = "FIRESTORE_PROJECT"
        value = var.project_id
      }
    }

    # Scale to zero when idle; cap at 10 instances.
    scaling {
      min_instance_count = 0
      max_instance_count = 10
    }
  }
}

resource "google_firestore_database" "default" {
  project     = var.project_id
  name        = "(default)"
  location_id = var.region
  type        = "FIRESTORE_NATIVE"
}
Example gcloud CLI deployment:
bash
# Deploy Cloud Run service
gcloud run deploy my-app-api \
  --image gcr.io/$PROJECT_ID/my-app:latest \
  --region us-central1 \
  --platform managed \
  --allow-unauthenticated \
  --memory 512Mi \
  --cpu 1 \
  --min-instances 0 \
  --max-instances 10

# Create Firestore database
gcloud firestore databases create --location=us-central1
> Full templates including Cloud CDN, Identity Platform, IAM, and Cloud Monitoring are generated by `deployment_manager.py` and are also available in `references/architecture_patterns.md`.
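To illustrate how a script might assemble the `gcloud run deploy` command from parameters, here is a minimal sketch. It is not the implementation of `deployment_manager.py`: the function `build_deploy_command` and its defaults are assumptions that mirror the flags shown in the example above, and the project ID `my-project` is a placeholder.

```python
import shlex


def build_deploy_command(app_name, project_id, region="us-central1",
                         memory="512Mi", cpu=1, min_instances=0,
                         max_instances=10, allow_unauthenticated=True):
    """Return a `gcloud run deploy` invocation as an argument list."""
    args = [
        "gcloud", "run", "deploy", f"{app_name}-api",
        "--image", f"gcr.io/{project_id}/{app_name}:latest",
        "--region", region,
        "--platform", "managed",
        "--memory", memory,
        "--cpu", str(cpu),
        "--min-instances", str(min_instances),
        "--max-instances", str(max_instances),
    ]
    # Public access is opt-in so private services are not exposed by accident.
    if allow_unauthenticated:
        args.append("--allow-unauthenticated")
    return args


cmd = build_deploy_command("my-app", "my-project")
# shlex.join quotes arguments safely for display or for writing to a script.
print(shlex.join(cmd))
```

Building the command as a list rather than a string keeps each argument separate, so it can be passed to `subprocess.run` without shell quoting issues.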
Step 5: Configure CI/CD
Set up automated deployment with Cloud Build or GitHub Actions:
yaml
# cloudbuild.yaml
steps:
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 'gcr.io/$PROJECT_ID/my-app:$COMMIT_SHA', '.']
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push', 'gcr.io/$PROJECT_ID/my-app:$COMMIT_SHA']
  - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
    entrypoint: gcloud
    args:
      - 'run'
      - 'deploy'
      - 'my-app-api'
      - '--image=gcr.io/$PROJECT_ID/my-app:$COMMIT_SHA'
      - '--region=us-central1'
      - '--platform=managed'
images:
  - 'gcr.io/$PROJECT_ID/my-app:$COMMIT_SHA'
bash
# Connect repo and create trigger
gcloud builds triggers create github \
  --repo-name=my-app \
  --repo-owner=my-org \
  --branch-pattern="^main$" \
  --build-config=cloudbuild.yaml
Step 6: Security Review
Verify security configuration:
bash
# Review IAM bindings
gcloud projects get-iam-policy $PROJECT_ID --format=json

# Check service account permissions
gcloud iam service-accounts list --project=$PROJECT_ID

# Verify VPC Service Controls (if applicable)
gcloud access-context-manager perimeters list --policy=$POLICY_ID
**Security checklist:**
- IAM roles follow least privilege (prefer predefined roles over basic roles)
- Service accounts use Workload Identity for GKE
- VPC Service Controls configured for sensitive APIs
- Cloud KMS encryption keys for customer-managed encryption
- Cloud Audit Logs enabled for all admin activity
- Organization policies restrict public access
- Secret Manager used for all credentials
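The first checklist item (prefer predefined roles over basic roles) can be checked against the JSON printed by `gcloud projects get-iam-policy`. The sketch below assumes the standard policy shape with a `bindings` list of `role` and `members`; the helper name `find_basic_role_bindings` and the sample members are made up for illustration.

```python
import json

# The three basic (formerly "primitive") roles that least privilege discourages.
BASIC_ROLES = {"roles/owner", "roles/editor", "roles/viewer"}


def find_basic_role_bindings(policy: dict) -> list[tuple[str, str]]:
    """Return (role, member) pairs for every binding that grants a basic role."""
    findings = []
    for binding in policy.get("bindings", []):
        if binding.get("role") in BASIC_ROLES:
            for member in binding.get("members", []):
                findings.append((binding["role"], member))
    return findings


# Sample data shaped like an IAM policy; the members are fictional.
sample_policy = json.loads("""
{
  "bindings": [
    {"role": "roles/editor",
     "members": ["user:dev@example.com", "serviceAccount:ci@example-project.iam.gserviceaccount.com"]},
    {"role": "roles/run.invoker", "members": ["allUsers"]}
  ],
  "version": 1
}
""")

findings = find_basic_role_bindings(sample_policy)
for role, member in findings:
    print(f"{member} holds basic role {role}")
```

To run it on a real project, one option is to redirect the `get-iam-policy` output to a file and load that file with `json.load` in place of the sample.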
**If deployment fails:**
1. Check the failure reason:
bash
gcloud run services describe my-app-api --region us-central1
gcloud logging read "resource.type=cloud_run_revision" --limit=20
2. Review Cloud Logging for application errors.
3. Fix the configuration or container image.
4. Redeploy:
bash
gcloud run deploy my-app-api --image gcr.io/$PROJECT_ID/my-app:latest --region us-central1
Common failure causes:
- IAM permission errors -- verify service account roles and the `--allow-unauthenticated` flag
- Quota exceeded -- request quota increase via IAM & Admin > Quotas
- Container startup failure -- check container logs and health check configuration
- Region not enabled -- enable the required APIs with `gcloud services enable`
Tools
architecture_designer.py
Recommends GCP services based on workload requirements.
bash
python scripts/architecture_designer.py --input requirements.json --output design.json
Input: JSON with app type, scale, budget, compliance needs
Output: Recommended pattern, service stack, cost estimate, pros/cons
cost_optimizer.py
Analyzes GCP resources for cost savings.
bash
python scripts/cost_optimizer.py --resources inventory.json --monthly-spend 5000
Output: Recommendations for:
- Idle resource removal
- Machine type right-sizing
- Committed use discounts
- Storage class transitions
- Network egress optimization
deployment_manager.py
Generates gcloud CLI deployment scripts and Terraform configurations.
bash
python scripts/deployment_manager.py --app-name my-app --pattern serverless_web --region us-central1
Output: Production-ready deployment scripts with:
- Cloud Run or GKE deployment
- Firestore or Cloud SQL setup
- Identity Platform configuration
- IAM roles with least privilege
- Cloud Monitoring and Logging
Quick Start
Web App on Cloud Run (< $100/month)
Ask: "Design a serverless web backend for a mobile app with 1000 users"
Result:
- Cloud Run for API (auto-scaling, no cold start with min instances)
- Firestore for data (pay-per-operation)
- Identity Platform for authentication
- Cloud Storage + Cloud CDN for static assets
- Estimated: $15-40/month
Microservices on GKE ($500-2000/month)
Ask: "Design a scalable architecture for a SaaS platform with 50k users"
Result:
- GKE Autopilot for containerized workloads
- Cloud SQL (PostgreSQL) with read replicas
- Memorystore (Redis) for session caching
- Cloud CDN for global delivery
- Cloud Build for CI/CD
- Multi-zone deployment
Serverless Data Pipeline
Ask: "Design a real-time analytics pipeline for event data"
Result:
- Pub/Sub for event ingestion
- Dataflow (Apache Beam) for stream processing
- BigQuery for analytics and warehousing
- Looker for dashboards
- Cloud Functions for lightweight transforms
ML Platform
Ask: "Design a machine learning platform for model training and serving"
Result:
- Vertex AI for training and prediction
- Cloud Storage for datasets and model artifacts
- BigQuery for feature store
- Cloud Functions for preprocessing triggers
- Cloud Monitoring for model drift detection
Input Requirements
Provide these details for architecture design:
| Requirement | Description | Example |
|---|---|---|
| Application type | What you're building | SaaS platform, mobile backend |
| Expected scale | Users, requests/sec | 10k users, 100 RPS |
| Budget | Monthly GCP limit | $500/month max |
| Team context | Size, GCP experience | 3 devs, intermediate |
| Compliance | Regulatory needs | HIPAA, GDPR, SOC 2 |
| Availability | Uptime requirements | 99.9% SLA, 1hr RPO |
JSON Format:
json
{
  "application_type": "saas_platform",
  "expected_users": 10000,
  "requests_per_second": 100,
  "budget_monthly_usd": 500,
  "team_size": 3,
  "gcp_experience": "intermediate",
  "compliance": ["SOC2"],
  "availability_sla": "99.9%"
}
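Before passing a requirements file to the designer, it may be worth checking that the expected fields are present and well-typed. The following is a minimal sketch based only on the JSON format above; treating every field as required, and the helper name `validate_requirements`, are assumptions rather than documented behavior of `architecture_designer.py`.

```python
# Expected fields and types, taken from the JSON format shown above.
# Assumption: every field is required.
REQUIRED_FIELDS = {
    "application_type": str,
    "expected_users": int,
    "requests_per_second": (int, float),
    "budget_monthly_usd": (int, float),
    "team_size": int,
    "gcp_experience": str,
    "compliance": list,
    "availability_sla": str,
}


def validate_requirements(data: dict) -> list[str]:
    """Return a list of problems; an empty list means the input looks valid."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in data:
            errors.append(f"missing field: {field}")
        elif not isinstance(data[field], expected):
            errors.append(f"wrong type for field: {field}")

    # The SLA is written as a percentage string such as "99.9%".
    sla = data.get("availability_sla")
    if isinstance(sla, str):
        try:
            value = float(sla.rstrip("%"))
        except ValueError:
            errors.append("availability_sla must look like '99.9%'")
        else:
            if not 0 < value <= 100:
                errors.append("availability_sla must be between 0 and 100")
    return errors


if __name__ == "__main__":
    example = {
        "application_type": "saas_platform",
        "expected_users": 10000,
        "requests_per_second": 100,
        "budget_monthly_usd": 500,
        "team_size": 3,
        "gcp_experience": "intermediate",
        "compliance": ["SOC2"],
        "availability_sla": "99.9%",
    }
    print(validate_requirements(example) or "requirements look valid")
```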
Output Formats
Architecture Design
- Pattern recommendation with rationale
- Service stack diagram (ASCII)
- Monthly cost estimate and trade-offs
IaC Templates
- Terraform HCL: Production-ready Google provider configs
- gcloud CLI: Scripted deployment commands
- Cloud Build YAML: CI/CD pipeline definitions
Cost Analysis
- Current spend breakdown with optimization recommendations
- Priority action list (high/medium/low) and implementation checklist
Anti-Patterns
| Anti-Pattern | Why It Fails | Better Approach |
|---|---|---|
| Using default VPC for production | No isolation, shared firewall rules | Create custom VPC with private subnets |
| Over-provisioning GKE node pools | Wasted cost on idle capacity | Use GKE Autopilot or cluster autoscaler |
| Storing secrets in environment variables | Visible in Cloud Console, logs | Use Secret Manager with Workload Identity |
| Ignoring sustained use discounts | Missing 20-30% automatic savings | Right-size VMs for consistent baseline usage |
| Single-region deployment for SaaS | One region outage = full downtime | Multi-region with Cloud Load Balancing |
| BigQuery on-demand for heavy workloads | Unpredictable costs at scale | Use BigQuery slot reservations (capacity-based pricing, formerly flat-rate) for consistent workloads |
| Running Cloud Functions for long tasks | 9-minute timeout, cold starts | Use Cloud Run for tasks > 60 seconds |
Cross-References
| Skill | Relationship |
|---|---|
| | AWS equivalent: same 6-step workflow, different services |
| | Azure equivalent: completes the cloud trifecta |
| | Broader DevOps scope: pipelines, monitoring, containerization |
| | IaC implementation: use for Terraform modules targeting GCP |
| | Pipeline construction: automates Cloud Build and deployment |
Reference Documentation
| Document | Contents |
|---|---|
| | 6 patterns: serverless, GKE microservices, three-tier, data pipeline, ML platform, multi-region |
| | Decision matrices for compute, database, storage, messaging |
| | Naming, labels, IAM, networking, monitoring, disaster recovery |