devops-iac-engineer
DevOps IaC Engineer
This skill provides expertise in designing and managing cloud infrastructure using Infrastructure as Code (IaC) and DevOps/SRE best practices.
When to Use
- Designing cloud architecture (AWS, GCP, Azure)
- Implementing or refactoring CI/CD pipelines
- Setting up observability (logging, metrics, tracing)
- Creating Kubernetes clusters and container orchestration strategies
- Implementing security controls and compliance checks
- Improving system reliability (SLO/SLA, Disaster Recovery)
Infrastructure as Code (IaC) Principles
- Declarative Code: Use Terraform/OpenTofu to define the desired state.
- GitOps: The code repository is the single source of truth. Changes are applied via PRs and automated pipelines.
- Immutable Infrastructure: Replace servers/containers rather than patching them in place.
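The declarative principle can be illustrated with a toy reconciliation step: given a desired state and the observed state, compute what must change. This is a minimal sketch, not how Terraform or OpenTofu is implemented; the `plan` function and the resource names are hypothetical and exist only for illustration.

```python
def plan(desired: dict, actual: dict) -> dict:
    """Compute the changes needed to move `actual` toward `desired`.

    Both arguments map a resource name to its configuration.
    """
    return {
        # In desired state but not yet present.
        "create": sorted(k for k in desired if k not in actual),
        # Present in both, but the configuration differs.
        "update": sorted(
            k for k in desired if k in actual and desired[k] != actual[k]
        ),
        # Present but no longer declared.
        "delete": sorted(k for k in actual if k not in desired),
    }


# Hypothetical resources, for illustration only.
desired = {"vm-web": {"size": "small"}, "bucket-logs": {"versioning": True}}
actual = {"vm-web": {"size": "large"}, "vm-old": {"size": "small"}}

changes = plan(desired, actual)
print(changes)
```

Under the immutable-infrastructure principle, an entry in `update` would typically be carried out by replacing the resource rather than patching it in place.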
Core Domains
1. Terraform & IaC
- Use modules for reusability.
- Separate state by environment (dev, stage, prod) and region.
- Automate `plan` and `apply` in CI/CD.
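One way to keep state separated is to derive a distinct state path for each environment and region pair, so that a pipeline run for one environment can never touch another's state. The sketch below assumes a hypothetical key layout (`project/environment/region/terraform.tfstate`); the layout, the `state_key` helper, and the project name are illustrative choices, not a Terraform convention.

```python
# Environments named in the list above.
ENVIRONMENTS = ("dev", "stage", "prod")


def state_key(project: str, environment: str, region: str) -> str:
    """Build a per-environment, per-region state path (hypothetical layout)."""
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    if not region:
        raise ValueError("region must not be empty")
    return f"{project}/{environment}/{region}/terraform.tfstate"


# Hypothetical project and regions, for illustration only.
for env in ENVIRONMENTS:
    print(state_key("shop", env, "us-east1"))
```

A CI/CD job could pass the resulting key to its backend configuration before running `plan` and `apply`, which keeps the blast radius of any single run limited to one environment and region.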
2. Kubernetes & Containers
- Build small, stateless containers.
- Use Helm or Kustomize for resource management.
- Implement resource limits and requests.
- Use namespaces for isolation.
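The resource limits and requests guideline can be enforced with a small check in CI. The sketch below assumes a Deployment manifest that has already been parsed into a Python dict (for example from YAML); the `containers_missing_resources` helper and the sample manifest are hypothetical.

```python
def containers_missing_resources(manifest: dict) -> list:
    """Return names of containers lacking resource requests or limits.

    Expects the Deployment shape: spec.template.spec.containers.
    """
    containers = manifest["spec"]["template"]["spec"]["containers"]
    missing = []
    for container in containers:
        resources = container.get("resources", {})
        if not resources.get("requests") or not resources.get("limits"):
            missing.append(container["name"])
    return missing


# Hypothetical manifest, for illustration only.
deployment = {
    "kind": "Deployment",
    "metadata": {"name": "web", "namespace": "shop"},
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "app",
                        "resources": {
                            "requests": {"cpu": "100m", "memory": "128Mi"},
                            "limits": {"cpu": "500m", "memory": "256Mi"},
                        },
                    },
                    {"name": "sidecar"},  # No resources declared.
                ]
            }
        }
    },
}

offenders = containers_missing_resources(deployment)
print(offenders)
```

A pipeline could fail the build whenever the returned list is non-empty.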
3. CI/CD Pipelines
- CI: Lint, test, build, and scan (security) on every commit.
- CD: Automated deployment to lower environments; manual approval for production.
- Use tools like GitHub Actions, Cloud Build, or ArgoCD.
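The CD rule above (automatic deployment to lower environments, manual approval for production) amounts to a simple gate. The following is a minimal sketch of that decision, assuming the production environment is called `prod`; the `can_deploy` function and its parameters are hypothetical and not tied to GitHub Actions, Cloud Build, or ArgoCD.

```python
from typing import Optional


def can_deploy(
    environment: str, checks_passed: bool, approved_by: Optional[str] = None
) -> bool:
    """Decide whether a deployment may proceed.

    checks_passed: True when lint, test, build, and security scan all succeeded.
    approved_by:   name of the human approver, required only for production.
    """
    if not checks_passed:
        return False  # Never deploy a build that failed CI.
    if environment == "prod":
        return approved_by is not None  # Production needs manual approval.
    return True  # Lower environments deploy automatically.


print(can_deploy("dev", checks_passed=True))
print(can_deploy("prod", checks_passed=True))
print(can_deploy("prod", checks_passed=True, approved_by="release-manager"))
```

In practice the approval is usually recorded by the pipeline tool itself; the point of the sketch is only that the gate depends on both the CI result and the target environment.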
4. Observability
- Logs: Centralized logging (e.g., Cloud Logging, ELK).
- Metrics: Prometheus/Grafana or Cloud Monitoring.
- Tracing: OpenTelemetry for distributed tracing.
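Centralized logging systems are generally easier to query when applications emit structured records. The sketch below uses only the Python standard library to write one JSON object per log line; the field names (`severity`, `logger`, `message`) are an assumption and should be matched to whatever the chosen backend expects.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Format each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps(
            {
                "severity": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
            }
        )


def build_logger(name: str, stream) -> logging.Logger:
    """Create a logger that writes JSON lines to `stream`."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]  # Replace handlers to avoid duplicate output.
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


# Write to an in-memory buffer here; a service would use stdout instead.
buffer = io.StringIO()
log = build_logger("checkout", buffer)
log.info("order %s placed", "A-123")
print(buffer.getvalue(), end="")
```

Writing to stdout and letting the platform's log agent ship the lines keeps the application independent of the specific logging backend.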
5. Security (DevSecOps)
- Scan IaC for misconfigurations (e.g., Checkov, Trivy).
- Manage secrets with Secret Manager or Vault; never store them in code.
- Grant least-privilege IAM roles.
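To show what a least-privilege check might look for, the sketch below flags `Allow` statements that use a bare `*` wildcard for actions or resources. It assumes an AWS-style policy document already parsed into a dict, and it is a toy check, not a substitute for Checkov or Trivy; the `overly_broad_statements` helper and the sample policy are hypothetical.

```python
def overly_broad_statements(policy: dict) -> list:
    """Return indexes of Allow statements that use a bare '*' wildcard."""
    findings = []
    for index, statement in enumerate(policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        resources = statement.get("Resource", [])
        # Both fields may be a single string or a list of strings.
        if isinstance(actions, str):
            actions = [actions]
        if isinstance(resources, str):
            resources = [resources]
        if "*" in actions or "*" in resources:
            findings.append(index)
    return findings


# Hypothetical policy, for illustration only.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-bucket/*",
        },
        {"Effect": "Allow", "Action": "*", "Resource": "*"},
    ],
}

print(overly_broad_statements(policy))
```

The check deliberately matches only the bare `*`; a scoped pattern such as the bucket path in the first statement is not flagged.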
SRE Practices
- SLI/SLO: Define Service Level Indicators and Objectives for critical user journeys.
- Error Budgets: Use error budgets to balance innovation and reliability.
- Post-Mortems: Conduct blameless post-mortems for incidents.
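The error budget follows directly from the SLO: it is the fraction of the window in which the service is allowed to fail. The sketch below assumes a time-based availability SLO; the function names and the 30-day window are illustrative choices.

```python
def error_budget_minutes(slo: float, window_days: int) -> float:
    """Allowed downtime in minutes for an availability SLO over a window."""
    if not 0 < slo < 1:
        raise ValueError("slo must be a fraction between 0 and 1, e.g. 0.999")
    return (1 - slo) * window_days * 24 * 60


def budget_remaining(slo: float, window_days: int, downtime_minutes: float) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget_minutes(slo, window_days)
    return (budget - downtime_minutes) / budget


# A 99.9% SLO over 30 days allows roughly 43.2 minutes of downtime.
budget = error_budget_minutes(0.999, 30)
print(round(budget, 1))

# After 10.8 minutes of downtime, about three quarters of the budget remains.
print(round(budget_remaining(0.999, 30, 10.8), 2))
```

A team might slow feature releases as the remaining fraction approaches zero and resume once the window rolls forward, which is how the budget balances innovation against reliability.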