infrastructure-cost-optimization

Infrastructure Cost Optimization


Overview


Reduce infrastructure costs through intelligent resource allocation, reserved instances, spot instances, and continuous optimization without sacrificing performance.


When to Use


Cloud cost reduction
Budget management and tracking
Resource utilization optimization
Multi-environment cost allocation
Waste identification and elimination
Reserved instance planning
Spot instance integration


Implementation Examples


1. AWS Cost Optimization Configuration


# cost-optimization-setup.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cost-optimization-scripts
  namespace: operations
data:
  analyze-costs.sh: |
    #!/bin/bash
    set -euo pipefail

    echo "=== AWS Cost Analysis ==="

    # Get daily cost trend (GNU date syntax).
    # No --group-by here: grouped results populate Groups and leave Total empty.
    echo "Daily costs for last 7 days:"
    aws ce get-cost-and-usage \
      --time-period Start=$(date -d '7 days ago' +%Y-%m-%d),End=$(date +%Y-%m-%d) \
      --granularity DAILY \
      --metrics "BlendedCost" \
      --query 'ResultsByTime[*].[TimePeriod.Start,Total.BlendedCost.Amount]' \
      --output table

    # Find unattached resources
    echo -e "\n=== Unattached EBS Volumes ==="
    aws ec2 describe-volumes \
      --filters Name=status,Values=available \
      --query 'Volumes[*].[VolumeId,Size,CreateTime]' \
      --output table

    # An Elastic IP is unattached when it has no AssociationId
    echo -e "\n=== Unattached Elastic IPs ==="
    aws ec2 describe-addresses \
      --query 'Addresses[?AssociationId==`null`].[PublicIp,AllocationId]' \
      --output table

    # Lists every running RDS instance; check utilization before calling one unused
    echo -e "\n=== RDS Instances (review for right-sizing) ==="
    aws rds describe-db-instances \
      --query 'DBInstances[?DBInstanceStatus==`available`].[DBInstanceIdentifier,DBInstanceClass,Engine,AllocatedStorage]' \
      --output table

    # Estimate savings with Reserved Instances
    echo -e "\n=== Reserved Instance Savings Potential ==="
    aws ce get-reservation-purchase-recommendation \
      --service "Amazon Elastic Compute Cloud - Compute" \
      --lookback-period-in-days THIRTY_DAYS \
      --query 'Recommendations[0].RecommendationSummary.[TotalEstimatedMonthlySavingsAmount,TotalEstimatedMonthlySavingsPercentage]' \
      --output table

  optimize-resources.sh: |
    #!/bin/bash
    set -euo pipefail

    echo "Starting resource optimization..."

    # Remove unattached volumes (destructive: review the analysis output first).
    # Text output is tab-separated on one line, so split it into one ID per line.
    echo "Removing unattached volumes..."
    aws ec2 describe-volumes \
      --filters Name=status,Values=available \
      --query 'Volumes[*].VolumeId' \
      --output text | tr '\t' '\n' | \
    while read -r volume_id; do
      [ -n "$volume_id" ] || continue
      echo "Deleting volume: $volume_id"
      aws ec2 delete-volume --volume-id "$volume_id" || echo "  Failed to delete $volume_id" >&2
    done

    # Release unused Elastic IPs
    echo "Releasing unused Elastic IPs..."
    aws ec2 describe-addresses \
      --query 'Addresses[?AssociationId==`null`].AllocationId' \
      --output text | tr '\t' '\n' | \
    while read -r alloc_id; do
      [ -n "$alloc_id" ] || continue
      echo "Releasing EIP: $alloc_id"
      aws ec2 release-address --allocation-id "$alloc_id" || echo "  Failed to release $alloc_id" >&2
    done

    # Modify RDS to smaller instances
    echo "Analyzing RDS for downsizing..."
    # Implement logic to check CloudWatch metrics and downsize if needed

    echo "Optimization complete"
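To decide whether the cleanup is worth running, it helps to put a rough monthly figure on the unattached volumes first. The following is a minimal sketch, not part of the original scripts: `unattached_volume_cost` is a hypothetical helper, the sample list only mimics the shape of the `Volumes` array returned by `aws ec2 describe-volumes`, and the `usd_per_gb_month` rate is an assumed placeholder that should be replaced with the actual price for your volume type and region.

```python
def unattached_volume_cost(volumes, usd_per_gb_month=0.08):
    """Return (volume_ids, estimated_monthly_usd) for volumes in the 'available' state.

    usd_per_gb_month is an assumed placeholder rate, not an authoritative price.
    """
    idle = [v for v in volumes if v.get("State") == "available"]
    total_gb = sum(v["Size"] for v in idle)
    return [v["VolumeId"] for v in idle], round(total_gb * usd_per_gb_month, 2)


# Hypothetical sample shaped like the "Volumes" list from describe-volumes
sample_volumes = [
    {"VolumeId": "vol-aaa", "Size": 100, "State": "available"},
    {"VolumeId": "vol-bbb", "Size": 50, "State": "in-use"},
    {"VolumeId": "vol-ccc", "Size": 250, "State": "available"},
]

idle_ids, monthly_usd = unattached_volume_cost(sample_volumes)
print(f"Unattached: {idle_ids}, estimated ${monthly_usd:.2f}/month")
```

Only the two `available` volumes (350 GB in total) count toward the estimate; the in-use volume is ignored.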


# Terraform cost optimization

# Use spot instances for non-critical workloads
resource "aws_instance" "spot" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.medium"

  instance_market_options {
    market_type = "spot"

    spot_options {
      max_price                      = "0.05" # Maximum hourly price in USD
      spot_instance_type             = "persistent"
      instance_interruption_behavior = "stop" # Persistent requests require "stop" or "hibernate"
      valid_until                    = "2025-12-31T23:59:59Z"
    }
  }

  tags = {
    Name       = "spot-instance"
    CostCenter = "engineering"
  }
}

# On-demand instance for baseline capacity. A purchased Reserved Instance is a
# billing discount that applies automatically when the instance attributes match;
# the tags below are for cost tracking only.
resource "aws_instance" "reserved" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.medium"

  tags = {
    Name            = "reserved-instance"
    ReservationType = "reserved"
  }
}

# Mixed on-demand and spot fleet (aws_launch_template.app is defined elsewhere)
resource "aws_ec2_fleet" "mixed" {
  type = "maintain"

  launch_template_config {
    launch_template_specification {
      launch_template_id = aws_launch_template.app.id
      version            = "$Latest"
    }

    # Lower priority numbers are preferred first. The split between on-demand
    # and spot comes from target_capacity_specification, not from priority.
    override {
      instance_type     = "t3.medium"
      weighted_capacity = 1
      priority          = 1
    }

    override {
      instance_type     = "t3.large"
      weighted_capacity = 2
      priority          = 2
    }

    override {
      instance_type     = "t3a.medium"
      weighted_capacity = 1
      priority          = 3
    }

    override {
      instance_type     = "t3a.large"
      weighted_capacity = 2
      priority          = 4
    }
  }

  target_capacity_specification {
    total_target_capacity        = 10
    on_demand_target_capacity    = 6
    spot_target_capacity         = 4
    default_target_capacity_type = "on-demand"
  }

  tags = {
    Name = "mixed-capacity"
  }
}
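The fleet targets are expressed in capacity units rather than instances, so the instance count depends on which weighted instance type fills each target. The sketch below works through that arithmetic for the targets above (6 on-demand units, 4 spot units). `instances_needed` is a hypothetical helper, and it assumes the whole target is filled by a single instance type and rounded up, which is a simplification of how a real fleet mixes types.

```python
import math


def instances_needed(target_capacity, weighted_capacity):
    """Instances of one type needed to meet or exceed a target in capacity units."""
    if weighted_capacity <= 0:
        raise ValueError("weighted_capacity must be positive")
    return math.ceil(target_capacity / weighted_capacity)


# Targets and weights taken from the fleet definition above
on_demand_medium = instances_needed(6, 1)  # t3.medium, weight 1
on_demand_large = instances_needed(6, 2)   # t3.large, weight 2
spot_large = instances_needed(4, 2)        # t3a.large, weight 2
odd_target = instances_needed(5, 2)        # rounds up, so the target is exceeded

print(on_demand_medium, on_demand_large, spot_large, odd_target)
```

When the target is not a multiple of the weight, the rounded-up count supplies slightly more capacity than requested, which is worth keeping in mind when estimating cost.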


2. Kubernetes Cost Optimization


# k8s-cost-optimization.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cost-optimization-policies
  namespace: kube-system
data:
  policies.yaml: |
    # Resource quotas per namespace
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: compute-quota
      namespace: production
    spec:
      hard:
        requests.cpu: "100"
        requests.memory: "200Gi"
        limits.cpu: "200"
        limits.memory: "400Gi"
        pods: "500"
      scopeSelector:
        matchExpressions:
          - operator: In
            scopeName: PriorityClass
            values: ["high", "medium"]
---
# Pod Disruption Budget for cost-effective scaling
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: cost-optimized-pdb
  namespace: production
spec:
  minAvailable: 1
  selector:
    matchLabels:
      tier: backend
---
# Prioritize spot instances with taints/tolerations.
# Taint and label keys are provider-specific: the taint key here is the GKE
# preemptible one, while karpenter.sh/capacity-type below is set by Karpenter.
# Use the keys your cluster actually applies.
apiVersion: v1
kind: Node
metadata:
  name: spot-node-1
spec:
  taints:
    - key: cloud.google.com/gke-preemptible
      value: "true"
      effect: NoSchedule
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cost-optimized-app
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      # Tolerate spot instances
      tolerations:
        - key: cloud.google.com/gke-preemptible
          operator: Equal
          value: "true"
          effect: NoSchedule

      # Prefer nodes with lower cost
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: karpenter.sh/capacity-type
                    operator: In
                    values: ["spot"]

      containers:
        - name: app
          image: myapp:latest
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
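To see how much of the namespace quota a workload consumes, the requests can be multiplied by the replica count and compared against the quota. The sketch below does this for the Deployment above (3 replicas requesting 100m CPU and 128Mi memory) against the `compute-quota` values. The helper names are hypothetical, and the parsers only handle the quantity forms used in this document (millicores and binary memory suffixes), not the full Kubernetes quantity grammar.

```python
from decimal import Decimal

# Binary suffixes only; decimal suffixes (k, M, G) are deliberately not handled
_MEMORY_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_cpu(quantity):
    """Convert a CPU quantity such as '100m' or '2' to cores."""
    if quantity.endswith("m"):
        return Decimal(quantity[:-1]) / 1000
    return Decimal(quantity)


def parse_memory(quantity):
    """Convert a memory quantity such as '128Mi' to bytes."""
    for suffix, factor in _MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: plain bytes


def quota_usage(replicas, cpu_request, mem_request, cpu_quota, mem_quota):
    """Return the fraction of the CPU and memory request quota used by a workload."""
    cpu_used = parse_cpu(cpu_request) * replicas
    mem_used = parse_memory(mem_request) * replicas
    return float(cpu_used / parse_cpu(cpu_quota)), mem_used / parse_memory(mem_quota)


cpu_fraction, mem_fraction = quota_usage(3, "100m", "128Mi", "100", "200Gi")
print(f"CPU: {cpu_fraction:.2%} of quota, memory: {mem_fraction:.4%} of quota")
```

A workload this small barely registers against the quota, which suggests the quota is sized for many such deployments rather than as a tight per-workload limit.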


3. Cost Monitoring Dashboard


# cost-monitoring.py
from datetime import datetime, timedelta, timezone

import boto3


class CostOptimizer:
    def __init__(self):
        self.ce_client = boto3.client('ce')
        self.ec2_client = boto3.client('ec2')
        self.rds_client = boto3.client('rds')

    def get_daily_costs(self, days=30):
        """Get daily costs for the past N days, grouped by service"""
        end_date = datetime.now(timezone.utc).date()
        start_date = end_date - timedelta(days=days)

        response = self.ce_client.get_cost_and_usage(
            TimePeriod={
                'Start': str(start_date),
                'End': str(end_date)
            },
            Granularity='DAILY',
            Metrics=['BlendedCost'],
            GroupBy=[
                {'Type': 'DIMENSION', 'Key': 'SERVICE'}
            ]
        )

        return response

    def find_underutilized_instances(self):
        """Find running EC2 instances with low CPU usage"""
        cloudwatch = boto3.client('cloudwatch')
        instances = []
        now = datetime.now(timezone.utc)

        # Paginate so accounts with many instances are fully covered
        paginator = self.ec2_client.get_paginator('describe_instances')
        pages = paginator.paginate(
            Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]
        )
        for page in pages:
            for reservation in page['Reservations']:
                for instance in reservation['Instances']:
                    instance_id = instance['InstanceId']

                    # Check CPU utilization over the last 7 days
                    response = cloudwatch.get_metric_statistics(
                        Namespace='AWS/EC2',
                        MetricName='CPUUtilization',
                        Dimensions=[{'Name': 'InstanceId', 'Value': instance_id}],
                        StartTime=now - timedelta(days=7),
                        EndTime=now,
                        Period=3600,
                        Statistics=['Average']
                    )

                    datapoints = response['Datapoints']
                    if datapoints:
                        avg_cpu = sum(d['Average'] for d in datapoints) / len(datapoints)
                        if avg_cpu < 10:  # Less than 10% average
                            instances.append({
                                'InstanceId': instance_id,
                                'Type': instance['InstanceType'],
                                'AverageCPU': avg_cpu,
                                'Recommendation': 'Downsize or terminate'
                            })

        return instances

    def estimate_reserved_instance_savings(self):
        """Estimate potential savings from reserved instances"""
        response = self.ce_client.get_reservation_purchase_recommendation(
            Service='Amazon Elastic Compute Cloud - Compute',
            LookbackPeriodInDays='THIRTY_DAYS',
            PageSize=100
        )

        total_savings = 0.0
        for recommendation in response.get('Recommendations', []):
            summary = recommendation.get('RecommendationSummary', {})
            total_savings += float(summary.get('TotalEstimatedMonthlySavingsAmount', 0))

        return total_savings

    def generate_report(self):
        """Generate comprehensive cost optimization report"""
        print("=== Cost Optimization Report ===\n")

        # Daily costs: with GroupBy set, amounts are in Groups and Total is empty
        print("Daily Costs:")
        costs = self.get_daily_costs(7)
        for result in costs['ResultsByTime']:
            date = result['TimePeriod']['Start']
            total = sum(
                float(group['Metrics']['BlendedCost']['Amount'])
                for group in result['Groups']
            )
            print(f"  {date}: ${total:.2f}")

        # Underutilized instances
        print("\nUnderutilized Instances:")
        underutilized = self.find_underutilized_instances()
        for instance in underutilized:
            print(f"  {instance['InstanceId']}: {instance['AverageCPU']:.1f}% CPU - {instance['Recommendation']}")

        # Reserved instance savings
        print("\nReserved Instance Savings Potential:")
        savings = self.estimate_reserved_instance_savings()
        print(f"  Estimated Monthly Savings: ${savings:.2f}")


# Usage
if __name__ == '__main__':
    optimizer = CostOptimizer()
    optimizer.generate_report()
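Because the daily costs are grouped by service, a per-service total over the whole period is often more useful than the raw response. The following is a minimal sketch of that aggregation; `total_by_service` is a hypothetical helper, and `sample_response` is made-up data that only mimics the `ResultsByTime` / `Groups` shape of a grouped Cost Explorer response, so it runs without AWS credentials.

```python
from collections import defaultdict


def total_by_service(response):
    """Sum BlendedCost per service across all days, highest spend first."""
    totals = defaultdict(float)
    for day in response["ResultsByTime"]:
        for group in day["Groups"]:
            service = group["Keys"][0]
            totals[service] += float(group["Metrics"]["BlendedCost"]["Amount"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def _group(service, amount):
    """Build one hypothetical group entry in the grouped-response shape."""
    return {"Keys": [service], "Metrics": {"BlendedCost": {"Amount": amount, "Unit": "USD"}}}


# Made-up two-day sample, not real billing data
sample_response = {
    "ResultsByTime": [
        {"TimePeriod": {"Start": "2024-01-01", "End": "2024-01-02"},
         "Groups": [_group("Amazon Simple Storage Service", "1.25"),
                    _group("Amazon Elastic Compute Cloud - Compute", "10.50")]},
        {"TimePeriod": {"Start": "2024-01-02", "End": "2024-01-03"},
         "Groups": [_group("Amazon Simple Storage Service", "1.25"),
                    _group("Amazon Elastic Compute Cloud - Compute", "12.00")]},
    ]
}

service_totals = total_by_service(sample_response)
for service, amount in service_totals:
    print(f"{service}: ${amount:.2f}")
```

With a real client, the same function could be applied to the return value of `get_daily_costs`, assuming the response keeps the grouped shape shown here.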


Cost Optimization Strategies


✅ DO


Use reserved instances for baseline
Leverage spot instances
Right-size resources
Monitor cost trends
Implement auto-scaling
Compare pricing across regions
Tag resources consistently
Schedule non-essential resources
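Scheduling non-essential resources usually comes down to a simple rule such as "run only during weekday working hours". The sketch below shows one way to express that rule and to estimate the hours saved per week. The function names and the default 08:00 to 19:00 window are assumptions chosen for illustration; actually starting or stopping resources would be handled by whatever scheduler or automation you already use.

```python
from datetime import datetime


def should_be_running(now, start_hour=8, stop_hour=19):
    """True during weekday working hours, False at night and on weekends."""
    is_weekday = now.weekday() < 5  # Monday=0 ... Friday=4
    return is_weekday and start_hour <= now.hour < stop_hour


def weekly_hours_saved(start_hour=8, stop_hour=19):
    """Hours per week a resource is stopped compared with running 24/7."""
    return 7 * 24 - 5 * (stop_hour - start_hour)


wednesday_morning = should_be_running(datetime(2024, 1, 3, 10, 0))
saturday_morning = should_be_running(datetime(2024, 1, 6, 10, 0))
wednesday_night = should_be_running(datetime(2024, 1, 3, 22, 0))
saved = weekly_hours_saved()

print(wednesday_morning, saturday_morning, wednesday_night)
print(f"Stopped {saved} of 168 hours per week ({saved / 168:.0%})")
```

Under these assumed hours, a development environment would be stopped for roughly two thirds of the week.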


❌ DON'T


Over-provision resources
Ignore unused resources
Neglect cost monitoring
Run all on-demand
Forget to release EIPs
Mix cost centers
Ignore savings opportunities
Deploy without budgets


Cost Saving Opportunities


Reserved Instances: 40-70% savings
Spot Instances: 70-90% savings
Committed Use Discounts: 25-55% savings
Right-sizing: 10-30% savings
Resource cleanup: 5-20% savings
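These percentages apply only to the portion of spend each technique covers, so the combined effect is a weighted sum rather than a simple addition. The sketch below works through that arithmetic using the lower bounds of the ranges above (40% for reserved, 70% for spot) as default discounts. `estimate_monthly_bill` and the example workload split are hypothetical; real discounts depend on term, payment option, region, and spot market conditions.

```python
def estimate_monthly_bill(on_demand_spend, reserved_share, spot_share,
                          reserved_discount=0.40, spot_discount=0.70):
    """Estimate the bill after moving shares of on-demand spend to reserved and spot.

    Shares are fractions of current spend; discounts are assumed, not quoted prices.
    """
    if reserved_share < 0 or spot_share < 0 or reserved_share + spot_share > 1:
        raise ValueError("shares must be non-negative and sum to at most 1")
    remaining_share = 1 - reserved_share - spot_share
    blended_rate = (
        reserved_share * (1 - reserved_discount)
        + spot_share * (1 - spot_discount)
        + remaining_share
    )
    return round(on_demand_spend * blended_rate, 2)


# Hypothetical split: 60% baseline on reserved, 20% on spot, 20% left on demand
current_spend = 10_000
new_bill = estimate_monthly_bill(current_spend, reserved_share=0.6, spot_share=0.2)
print(f"${current_spend} -> ${new_bill:.2f} ({1 - new_bill / current_spend:.0%} saved)")
```

Even at the conservative end of both ranges, this split reduces the bill by more than a third, while the 20% left on demand pays full price.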

