autoscaling-configuration


Autoscaling Configuration

自动扩缩容配置

Overview

概述

Implement autoscaling strategies to automatically adjust resource capacity based on demand, ensuring cost efficiency while maintaining performance and availability.
实施自动扩缩容策略，根据需求自动调整资源容量，在保持性能和可用性的同时确保成本效益。

When to Use

适用场景

  • Traffic-driven workload scaling
  • Time-based scheduled scaling
  • Resource utilization optimization
  • Cost reduction
  • High-traffic event handling
  • Batch processing optimization
  • Database connection pooling
  • 流量驱动的工作负载扩缩容
  • 基于时间的调度式扩缩容
  • 资源利用率优化
  • 成本降低
  • 高流量事件处理
  • 批处理优化
  • 数据库连接池管理

Implementation Examples

实现示例

1. Kubernetes Horizontal Pod Autoscaler

hpa-configuration.yaml

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 15
      - type: Pods
        value: 2
        periodSeconds: 60
      selectPolicy: Min
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 15
      selectPolicy: Max
```
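The manifest drives scaling from three metrics at once. The controller's documented rule is `desired = ceil(current × observed / target)` per metric; the largest proposal wins and is then clamped to `minReplicas`/`maxReplicas`. A minimal Python sketch of that arithmetic (the 2/20 bounds mirror the manifest above; `desired_replicas` is an illustrative helper, not a Kubernetes API):

```python
import math

def desired_replicas(current_replicas, metrics, min_replicas=2, max_replicas=20):
    """HPA scaling rule sketch: each metric proposes
    ceil(current * observed / target); the largest proposal wins,
    clamped to [min_replicas, max_replicas]."""
    proposals = [
        math.ceil(current_replicas * observed / target)
        for observed, target in metrics
    ]
    return max(min_replicas, min(max_replicas, max(proposals)))

# CPU at 90% against the 70% target, memory at 60% against 80%:
# CPU proposes ceil(4 * 90/70) = 6, memory proposes 3, so we scale to 6.
print(desired_replicas(4, [(90, 70), (60, 80)]))  # -> 6
```

Because the most demanding metric always wins, adding a metric to an HPA can only scale you up sooner, never down later.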

Vertical Pod Autoscaler for resource optimization

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: myapp
      minAllowed:
        cpu: 50m
        memory: 64Mi
      maxAllowed:
        cpu: 1000m
        memory: 512Mi
      controlledResources:
      - cpu
      - memory
```
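The `resourcePolicy` bounds act as a clamp on whatever the VPA recommender proposes. A toy illustration of that clamping, with plain integers standing in for `50m`/`64Mi`-style quantities (`clamp_recommendation` is a hypothetical helper, not part of the VPA API):

```python
def clamp_recommendation(recommended, min_allowed, max_allowed):
    """resourcePolicy sketch: each recommended resource value is clamped
    into the [minAllowed, maxAllowed] range before being applied."""
    return {
        resource: max(min_allowed[resource],
                      min(max_allowed[resource], value))
        for resource, value in recommended.items()
    }

# Recommender proposes 1500m CPU / 256Mi memory; the policy caps CPU at 1000m:
print(clamp_recommendation(
    {"cpu": 1500, "memory": 256},   # recommendation (m / Mi as plain ints)
    {"cpu": 50, "memory": 64},      # minAllowed
    {"cpu": 1000, "memory": 512},   # maxAllowed
))  # -> {'cpu': 1000, 'memory': 256}
```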

2. AWS Auto Scaling

aws-autoscaling.yaml

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: autoscaling-config
  namespace: production
data:
  setup-asg.sh: |
    #!/bin/bash
    set -euo pipefail

    ASG_NAME="myapp-asg"
    MIN_SIZE=2
    MAX_SIZE=10
    DESIRED_CAPACITY=3
    TARGET_CPU=70
    TARGET_MEMORY=80

    echo "Creating Auto Scaling Group..."

    # Create launch template
    aws ec2 create-launch-template \
      --launch-template-name myapp-template \
      --version-description "Production version" \
      --launch-template-data '{
        "ImageId": "ami-0c55b159cbfafe1f0",
        "InstanceType": "t3.medium",
        "KeyName": "myapp-key",
        "SecurityGroupIds": ["sg-0123456789abcdef0"],
        "UserData": "#!/bin/bash\ncd /app && docker-compose up -d",
        "TagSpecifications": [{
          "ResourceType": "instance",
          "Tags": [{"Key": "Name", "Value": "myapp-instance"}]
        }]
      }' || true

    # Create Auto Scaling Group
    aws autoscaling create-auto-scaling-group \
      --auto-scaling-group-name "$ASG_NAME" \
      --launch-template LaunchTemplateName=myapp-template \
      --min-size "$MIN_SIZE" \
      --max-size "$MAX_SIZE" \
      --desired-capacity "$DESIRED_CAPACITY" \
      --availability-zones us-east-1a us-east-1b us-east-1c \
      --target-group-arns arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/myapp/abcdef123456 \
      --health-check-type ELB \
      --health-check-grace-period 300 \
      --tags "Key=Name,Value=myapp,PropagateAtLaunch=true"

    # Create CPU target-tracking scaling policy
    aws autoscaling put-scaling-policy \
      --auto-scaling-group-name "$ASG_NAME" \
      --policy-name myapp-cpu-scaling \
      --policy-type TargetTrackingScaling \
      --target-tracking-configuration '{
        "TargetValue": '"$TARGET_CPU"',
        "PredefinedMetricSpecification": {
          "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "ScaleOutCooldown": 60,
        "ScaleInCooldown": 300
      }'

    echo "Auto Scaling Group created: $ASG_NAME"
---
# Scheduled scaling: scale up to 10 instances at 8 AM on weekdays.
# Note: a CronJob has exactly one schedule, so scale-up and scale-down
# must be two separate CronJob objects.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scheduled-scale-up
  namespace: production
spec:
  schedule: "0 8 * * 1-5"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: autoscale
            image: amazon/aws-cli:latest
            command:
            - sh
            - -c
            - |
              aws autoscaling set-desired-capacity \
                --auto-scaling-group-name myapp-asg \
                --desired-capacity 10
          restartPolicy: OnFailure
---
# Scheduled scaling: scale down to 3 instances at 6 PM on weekdays
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scheduled-scale-down
  namespace: production
spec:
  schedule: "0 18 * * 1-5"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: autoscale
            image: amazon/aws-cli:latest
            command:
            - sh
            - -c
            - |
              aws autoscaling set-desired-capacity \
                --auto-scaling-group-name myapp-asg \
                --desired-capacity 3
          restartPolicy: OnFailure
```
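The two cron schedules implement a fixed time-of-day capacity plan. The same policy, sketched as a pure function (the weekday 08:00-18:00 window and the 10/3 capacities come from the schedules above; `scheduled_capacity` itself is illustrative):

```python
from datetime import datetime

def scheduled_capacity(now: datetime, peak: int = 10, off_peak: int = 3) -> int:
    """Mirror of the cron schedules above: weekdays between 08:00 and
    17:59 run at peak capacity; evenings and weekends fall back to the
    off-peak baseline."""
    if now.weekday() < 5 and 8 <= now.hour < 18:
        return peak
    return off_peak

print(scheduled_capacity(datetime(2024, 6, 3, 9, 30)))  # Monday 09:30 -> 10
print(scheduled_capacity(datetime(2024, 6, 8, 12, 0)))  # Saturday     -> 3
```

Expressing the plan this way also makes it easy to unit-test the schedule boundaries before touching the ASG.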

3. Custom Metrics Autoscaling

custom-metrics-hpa.yaml

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: custom-metrics-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1
  maxReplicas: 50
  metrics:
  # Queue depth from custom metrics
  - type: Pods
    pods:
      metric:
        name: job_queue_depth
      target:
        type: AverageValue
        averageValue: "100"
  # Request rate from custom metrics
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"
  # Custom business metric
  - type: Pods
    pods:
      metric:
        name: active_connections
      target:
        type: AverageValue
        averageValue: "500"
```

Prometheus ServiceMonitor for custom metrics

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp-metrics
  namespace: production
spec:
  selector:
    matchLabels:
      app: myapp
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics
```
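With a target of type `AverageValue`, the HPA aims for `sum(metric over pods) / replicas == target`, which works out to `desired = ceil(total / target)`. A sketch of that arithmetic against the 1000 req/s target above (`desired_from_average_value` is an illustrative helper):

```python
import math

def desired_from_average_value(pod_values, target_average):
    """AverageValue targets: the HPA wants the per-pod average at the
    target, so the replica count proposal is ceil(total / target)."""
    return math.ceil(sum(pod_values) / target_average)

# Three pods each serving roughly 600 req/s against the 1000 req/s target:
# total = 1800, so the HPA proposes ceil(1800 / 1000) = 2 replicas.
print(desired_from_average_value([600, 650, 550], 1000))  # -> 2
```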

4. Autoscaling Script

autoscaling-setup.sh - Complete autoscaling configuration

```bash
#!/bin/bash
set -euo pipefail

ENVIRONMENT="${1:-production}"
DEPLOYMENT="${2:-myapp}"

echo "Setting up autoscaling for $DEPLOYMENT in $ENVIRONMENT"

# Create HPA
cat <<EOF | kubectl apply -f -
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ${DEPLOYMENT}-hpa
  namespace: ${ENVIRONMENT}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ${DEPLOYMENT}
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 15
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
EOF

echo "HPA created successfully"

# Monitor autoscaling events
echo "Monitoring autoscaling events..."
kubectl get hpa "${DEPLOYMENT}-hpa" -n "$ENVIRONMENT" -w
```
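The `stabilizationWindowSeconds: 300` in the script's `scaleDown` block is what prevents flapping: the controller acts on the highest replica recommendation seen during the window, not the most recent one. A sketch of that rule (the `(age_seconds, desired)` tuple shape is an assumption for illustration, not the controller's actual data structure):

```python
def stabilized_desired(recommendations, window_seconds=300):
    """Scale-down stabilization sketch: act on the highest replica
    recommendation seen inside the window, so a brief dip in load does
    not immediately shrink the deployment.
    `recommendations` holds (age_seconds, desired_replicas) pairs."""
    recent = [desired for age, desired in recommendations
              if age <= window_seconds]
    return max(recent)

# Desired dropped to 3 recently, but was 6 four minutes ago -> hold at 6:
print(stabilized_desired([(240, 6), (120, 4), (30, 3)]))  # -> 6
```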

5. Monitoring Autoscaling

autoscaling-monitoring.yaml

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: autoscaling-alerts
  namespace: monitoring
data:
  alerts.yaml: |
    groups:
    - name: autoscaling
      rules:
      - alert: HpaMaxedOut
        expr: |
          kube_hpa_status_current_replicas == kube_hpa_status_desired_replicas
          and
          kube_hpa_status_desired_replicas == kube_hpa_spec_max_replicas
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "HPA {{ $labels.hpa }} is at maximum replicas"

      - alert: HpaMinedOut
        expr: |
          kube_hpa_status_current_replicas == kube_hpa_status_desired_replicas
          and
          kube_hpa_status_desired_replicas == kube_hpa_spec_min_replicas
        for: 30m
        labels:
          severity: info
        annotations:
          summary: "HPA {{ $labels.hpa }} is at minimum replicas"

      - alert: AsgCapacityLow
        expr: |
          aws_autoscaling_group_desired_capacity / aws_autoscaling_group_max_size < 0.2
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "ASG {{ $labels.auto_scaling_group_name }} has low capacity"
```
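The `HpaMaxedOut` expression reduces to a simple predicate over three gauges; restated in Python for clarity (the function name is illustrative):

```python
def hpa_maxed_out(current_replicas, desired_replicas, max_replicas):
    """Mirror of the HpaMaxedOut alert expression: the HPA has converged
    (current == desired) but is pinned at its configured ceiling."""
    return (current_replicas == desired_replicas
            and desired_replicas == max_replicas)

print(hpa_maxed_out(20, 20, 20))  # True: raise maxReplicas or optimize the app
print(hpa_maxed_out(12, 14, 20))  # False: still converging toward desired
```

Sustained `True` for the alert's 10-minute window means demand exceeds what the configured ceiling can absorb.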

Best Practices

最佳实践

✅ DO

✅ 建议

  • Set appropriate min/max replicas
  • Monitor metric aggregation window
  • Implement cooldown periods
  • Use multiple metrics
  • Test scaling behavior
  • Monitor scaling events
  • Plan for peak loads
  • Implement fallback strategies
  • 设置合适的最小/最大副本数
  • 监控指标聚合窗口
  • 实施冷却周期
  • 使用多种指标
  • 测试扩缩容行为
  • 监控扩缩容事件
  • 规划峰值负载
  • 实施回退策略

❌ DON'T

❌ 不建议

  • Set min replicas to 1
  • Scale too aggressively
  • Ignore cooldown periods
  • Use single metric only
  • Forget to test scaling
  • Scale below resource needs
  • Neglect monitoring
  • Deploy without capacity tests
  • 将最小副本数设置为1
  • 过于激进地扩缩容
  • 忽略冷却周期
  • 仅使用单一指标
  • 忘记测试扩缩容
  • 扩缩容低于资源需求
  • 忽视监控
  • 未进行容量测试就部署

Scaling Metrics

扩缩容指标

  • CPU Utilization: Most common metric
  • Memory Utilization: Heap-bound applications
  • Request Rate: API-driven scaling
  • Queue Depth: Async job processing
  • Custom Metrics: Business-specific indicators
  • CPU利用率: 最常用的指标
  • 内存利用率: 堆内存受限的应用
  • 请求速率: API驱动的扩缩容
  • 队列深度: 异步作业处理
  • 自定义指标: 业务特定的指示器

Resources

参考资源