autoscaling-configuration


Autoscaling Configuration

自动扩缩容配置

Overview

概述

Implement autoscaling strategies to automatically adjust resource capacity based on demand, ensuring cost efficiency while maintaining performance and availability.
实施自动扩缩容策略，根据需求自动调整资源容量，在保持性能和可用性的同时确保成本效益。

When to Use

适用场景

  • Traffic-driven workload scaling
  • Time-based scheduled scaling
  • Resource utilization optimization
  • Cost reduction
  • High-traffic event handling
  • Batch processing optimization
  • Database connection pooling
  • 流量驱动的工作负载扩缩容
  • 基于时间的调度式扩缩容
  • 资源利用率优化
  • 成本降低
  • 高流量事件处理
  • 批处理优化
  • 数据库连接池管理

Implementation Examples

实现示例

1. Kubernetes Horizontal Pod Autoscaler

hpa-configuration.yaml

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 15
      - type: Pods
        value: 2
        periodSeconds: 60
      selectPolicy: Min
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 15
      selectPolicy: Max
```
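The manifest drives scaling from three metrics at once. The controller's documented rule is `desired = ceil(current × observed / target)` per metric; the largest proposal wins and is then clamped to `minReplicas`/`maxReplicas`. A minimal Python sketch of that arithmetic (the 2/20 bounds mirror the manifest above; `desired_replicas` is an illustrative helper, not a Kubernetes API):

```python
import math

def desired_replicas(current_replicas, metrics, min_replicas=2, max_replicas=20):
    """HPA scaling rule sketch: each metric proposes
    ceil(current * observed / target); the largest proposal wins,
    clamped to [min_replicas, max_replicas]."""
    proposals = [
        math.ceil(current_replicas * observed / target)
        for observed, target in metrics
    ]
    return max(min_replicas, min(max_replicas, max(proposals)))

# CPU at 90% against the 70% target, memory at 60% against 80%:
# CPU proposes ceil(4 * 90/70) = 6, memory proposes 3, so we scale to 6.
print(desired_replicas(4, [(90, 70), (60, 80)]))  # -> 6
```

Because the most demanding metric always wins, adding a metric to an HPA can only scale you up sooner, never down later.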

Vertical Pod Autoscaler for resource optimization

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: myapp
      minAllowed:
        cpu: 50m
        memory: 64Mi
      maxAllowed:
        cpu: 1000m
        memory: 512Mi
      controlledResources:
      - cpu
      - memory
```
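The `resourcePolicy` bounds act as a clamp on whatever the VPA recommender proposes. A toy illustration of that clamping, with plain integers standing in for `50m`/`64Mi`-style quantities (`clamp_recommendation` is a hypothetical helper, not part of the VPA API):

```python
def clamp_recommendation(recommended, min_allowed, max_allowed):
    """resourcePolicy sketch: each recommended resource value is clamped
    into the [minAllowed, maxAllowed] range before being applied."""
    return {
        resource: max(min_allowed[resource],
                      min(max_allowed[resource], value))
        for resource, value in recommended.items()
    }

# Recommender proposes 1500m CPU / 256Mi memory; the policy caps CPU at 1000m:
print(clamp_recommendation(
    {"cpu": 1500, "memory": 256},   # recommendation (m / Mi as plain ints)
    {"cpu": 50, "memory": 64},      # minAllowed
    {"cpu": 1000, "memory": 512},   # maxAllowed
))  # -> {'cpu': 1000, 'memory': 256}
```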

2. AWS Auto Scaling

aws-autoscaling.yaml

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: autoscaling-config
  namespace: production
data:
  setup-asg.sh: |
    #!/bin/bash
    set -euo pipefail

    ASG_NAME="myapp-asg"
    MIN_SIZE=2
    MAX_SIZE=10
    DESIRED_CAPACITY=3
    TARGET_CPU=70
    TARGET_MEMORY=80

    echo "Creating Auto Scaling Group..."

    # Create launch template
    aws ec2 create-launch-template \
      --launch-template-name myapp-template \
      --version-description "Production version" \
      --launch-template-data '{
        "ImageId": "ami-0c55b159cbfafe1f0",
        "InstanceType": "t3.medium",
        "KeyName": "myapp-key",
        "SecurityGroupIds": ["sg-0123456789abcdef0"],
        "UserData": "#!/bin/bash\ncd /app && docker-compose up -d",
        "TagSpecifications": [{
          "ResourceType": "instance",
          "Tags": [{"Key": "Name", "Value": "myapp-instance"}]
        }]
      }' || true

    # Create Auto Scaling Group
    aws autoscaling create-auto-scaling-group \
      --auto-scaling-group-name "$ASG_NAME" \
      --launch-template LaunchTemplateName=myapp-template \
      --min-size "$MIN_SIZE" \
      --max-size "$MAX_SIZE" \
      --desired-capacity "$DESIRED_CAPACITY" \
      --availability-zones us-east-1a us-east-1b us-east-1c \
      --target-group-arns arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/myapp/abcdef123456 \
      --health-check-type ELB \
      --health-check-grace-period 300 \
      --tags "Key=Name,Value=myapp,PropagateAtLaunch=true"

    # Create CPU target-tracking scaling policy
    aws autoscaling put-scaling-policy \
      --auto-scaling-group-name "$ASG_NAME" \
      --policy-name myapp-cpu-scaling \
      --policy-type TargetTrackingScaling \
      --target-tracking-configuration '{
        "TargetValue": '"$TARGET_CPU"',
        "PredefinedMetricSpecification": {
          "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "ScaleOutCooldown": 60,
        "ScaleInCooldown": 300
      }'

    echo "Auto Scaling Group created: $ASG_NAME"
---
# Scheduled scaling: scale up to 10 instances at 8 AM on weekdays.
# Note: a CronJob has exactly one schedule, so scale-up and scale-down
# must be two separate CronJob objects.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scheduled-scale-up
  namespace: production
spec:
  schedule: "0 8 * * 1-5"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: autoscale
            image: amazon/aws-cli:latest
            command:
            - sh
            - -c
            - |
              aws autoscaling set-desired-capacity \
                --auto-scaling-group-name myapp-asg \
                --desired-capacity 10
          restartPolicy: OnFailure
---
# Scheduled scaling: scale down to 3 instances at 6 PM on weekdays
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scheduled-scale-down
  namespace: production
spec:
  schedule: "0 18 * * 1-5"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: autoscale
            image: amazon/aws-cli:latest
            command:
            - sh
            - -c
            - |
              aws autoscaling set-desired-capacity \
                --auto-scaling-group-name myapp-asg \
                --desired-capacity 3
          restartPolicy: OnFailure
```
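The two cron schedules implement a fixed time-of-day capacity plan. The same policy, sketched as a pure function (the weekday 08:00-18:00 window and the 10/3 capacities come from the schedules above; `scheduled_capacity` itself is illustrative):

```python
from datetime import datetime

def scheduled_capacity(now: datetime, peak: int = 10, off_peak: int = 3) -> int:
    """Mirror of the cron schedules above: weekdays between 08:00 and
    17:59 run at peak capacity; evenings and weekends fall back to the
    off-peak baseline."""
    if now.weekday() < 5 and 8 <= now.hour < 18:
        return peak
    return off_peak

print(scheduled_capacity(datetime(2024, 6, 3, 9, 30)))  # Monday 09:30 -> 10
print(scheduled_capacity(datetime(2024, 6, 8, 12, 0)))  # Saturday     -> 3
```

Expressing the plan this way also makes it easy to unit-test the schedule boundaries before touching the ASG.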

3. Custom Metrics Autoscaling

custom-metrics-hpa.yaml

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: custom-metrics-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1
  maxReplicas: 50
  metrics:
  # Queue depth from custom metrics
  - type: Pods
    pods:
      metric:
        name: job_queue_depth
      target:
        type: AverageValue
        averageValue: "100"
  # Request rate from custom metrics
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"
  # Custom business metric
  - type: Pods
    pods:
      metric:
        name: active_connections
      target:
        type: AverageValue
        averageValue: "500"
```

Prometheus ServiceMonitor for custom metrics

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp-metrics
  namespace: production
spec:
  selector:
    matchLabels:
      app: myapp
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics
```
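With a target of type `AverageValue`, the HPA aims for `sum(metric over pods) / replicas == target`, which works out to `desired = ceil(total / target)`. A sketch of that arithmetic against the 1000 req/s target above (`desired_from_average_value` is an illustrative helper):

```python
import math

def desired_from_average_value(pod_values, target_average):
    """AverageValue targets: the HPA wants the per-pod average at the
    target, so the replica count proposal is ceil(total / target)."""
    return math.ceil(sum(pod_values) / target_average)

# Three pods each serving roughly 600 req/s against the 1000 req/s target:
# total = 1800, so the HPA proposes ceil(1800 / 1000) = 2 replicas.
print(desired_from_average_value([600, 650, 550], 1000))  # -> 2
```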

4. Autoscaling Script

autoscaling-setup.sh - Complete autoscaling configuration

```bash
#!/bin/bash
set -euo pipefail

ENVIRONMENT="${1:-production}"
DEPLOYMENT="${2:-myapp}"

echo "Setting up autoscaling for $DEPLOYMENT in $ENVIRONMENT"

# Create HPA
cat <<EOF | kubectl apply -f -
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ${DEPLOYMENT}-hpa
  namespace: ${ENVIRONMENT}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ${DEPLOYMENT}
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 15
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
EOF

echo "HPA created successfully"

# Monitor autoscaling events
echo "Monitoring autoscaling events..."
kubectl get hpa "${DEPLOYMENT}-hpa" -n "$ENVIRONMENT" -w
```
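The `stabilizationWindowSeconds: 300` in the script's `scaleDown` block is what prevents flapping: the controller acts on the highest replica recommendation seen during the window, not the most recent one. A sketch of that rule (the `(age_seconds, desired)` tuple shape is an assumption for illustration, not the controller's actual data structure):

```python
def stabilized_desired(recommendations, window_seconds=300):
    """Scale-down stabilization sketch: act on the highest replica
    recommendation seen inside the window, so a brief dip in load does
    not immediately shrink the deployment.
    `recommendations` holds (age_seconds, desired_replicas) pairs."""
    recent = [desired for age, desired in recommendations
              if age <= window_seconds]
    return max(recent)

# Desired dropped to 3 recently, but was 6 four minutes ago -> hold at 6:
print(stabilized_desired([(240, 6), (120, 4), (30, 3)]))  # -> 6
```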

5. Monitoring Autoscaling

autoscaling-monitoring.yaml

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: autoscaling-alerts
  namespace: monitoring
data:
  alerts.yaml: |
    groups:
    - name: autoscaling
      rules:
      - alert: HpaMaxedOut
        expr: |
          kube_hpa_status_current_replicas == kube_hpa_status_desired_replicas
          and
          kube_hpa_status_desired_replicas == kube_hpa_spec_max_replicas
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "HPA {{ $labels.hpa }} is at maximum replicas"

      - alert: HpaMinedOut
        expr: |
          kube_hpa_status_current_replicas == kube_hpa_status_desired_replicas
          and
          kube_hpa_status_desired_replicas == kube_hpa_spec_min_replicas
        for: 30m
        labels:
          severity: info
        annotations:
          summary: "HPA {{ $labels.hpa }} is at minimum replicas"

      - alert: AsgCapacityLow
        expr: |
          aws_autoscaling_group_desired_capacity / aws_autoscaling_group_max_size < 0.2
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "ASG {{ $labels.auto_scaling_group_name }} has low capacity"
```
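The `HpaMaxedOut` expression reduces to a simple predicate over three gauges; restated in Python for clarity (the function name is illustrative):

```python
def hpa_maxed_out(current_replicas, desired_replicas, max_replicas):
    """Mirror of the HpaMaxedOut alert expression: the HPA has converged
    (current == desired) but is pinned at its configured ceiling."""
    return (current_replicas == desired_replicas
            and desired_replicas == max_replicas)

print(hpa_maxed_out(20, 20, 20))  # True: raise maxReplicas or optimize the app
print(hpa_maxed_out(12, 14, 20))  # False: still converging toward desired
```

Sustained `True` for the alert's 10-minute window means demand exceeds what the configured ceiling can absorb.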

Best Practices

最佳实践

✅ DO

✅ 建议

  • Set appropriate min/max replicas
  • Monitor metric aggregation window
  • Implement cooldown periods
  • Use multiple metrics
  • Test scaling behavior
  • Monitor scaling events
  • Plan for peak loads
  • Implement fallback strategies
  • 设置合适的最小/最大副本数
  • 监控指标聚合窗口
  • 实施冷却周期
  • 使用多种指标
  • 测试扩缩容行为
  • 监控扩缩容事件
  • 规划峰值负载
  • 实施回退策略

❌ DON'T

❌ 不建议

  • Set min replicas to 1
  • Scale too aggressively
  • Ignore cooldown periods
  • Use single metric only
  • Forget to test scaling
  • Scale below resource needs
  • Neglect monitoring
  • Deploy without capacity tests
  • 将最小副本数设置为1
  • 过于激进地扩缩容
  • 忽略冷却周期
  • 仅使用单一指标
  • 忘记测试扩缩容
  • 扩缩容低于资源需求
  • 忽视监控
  • 未进行容量测试就部署

Scaling Metrics

扩缩容指标

  • CPU Utilization: Most common metric
  • Memory Utilization: Heap-bound applications
  • Request Rate: API-driven scaling
  • Queue Depth: Async job processing
  • Custom Metrics: Business-specific indicators
  • CPU利用率: 最常用的指标
  • 内存利用率: 堆内存受限的应用
  • 请求速率: API驱动的扩缩容
  • 队列深度: 异步作业处理
  • 自定义指标: 业务特定的指示器

Resources

参考资源