processing-s3-uploads-with-step-functions
Step Functions Workflow: Route S3 Uploads to Lambda or Fargate
Overview
This skill deploys an event-driven workflow using AWS CLI. When a file is uploaded to
an S3 bucket, EventBridge triggers a Step Functions state machine. The state machine
checks the file size and routes processing to either a Lambda function (files ≤ 6 MB)
or a Fargate task (files > 6 MB).
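The routing rule can be sketched in a few lines of Python. This is only an illustration of the decision the Choice state makes, not the state machine definition itself. It assumes "6 MB" means 6 * 1024 * 1024 bytes and that the size is read from the detail.object.size field of the S3 "Object Created" event; the function name choose_target is hypothetical.

```python
# Assumption: "6 MB" is interpreted as 6 * 1024 * 1024 bytes. The real
# threshold is whatever scripts/statemachine.asl.json compares against.
SIZE_THRESHOLD_BYTES = 6 * 1024 * 1024


def choose_target(event: dict) -> str:
    """Return "lambda" or "fargate" for an S3 "Object Created" event."""
    size = event["detail"]["object"]["size"]
    return "lambda" if size <= SIZE_THRESHOLD_BYTES else "fargate"


if __name__ == "__main__":
    small = {"detail": {"bucket": {"name": "example-bucket"},
                        "object": {"key": "a.txt", "size": 10}}}
    large = {"detail": {"bucket": {"name": "example-bucket"},
                        "object": {"key": "b.bin", "size": 50 * 1024 * 1024}}}
    print(choose_target(small))
    print(choose_target(large))
```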
The architecture includes:
- An S3 bucket with EventBridge notifications enabled
- An EventBridge rule that triggers Step Functions on S3 object creation
- A Step Functions state machine with a Choice state for routing
- A Lambda function for processing small files
- An ECS Fargate task for processing large files
- A VPC with two subnets, internet gateway, and security group
- An ECR repository for the Fargate container image
- Scoped IAM roles for Lambda, Step Functions, and ECS tasks
Use this skill when:
- You need to process S3 uploads with different compute based on file size
- You want a serverless workflow that can handle both small and large files
- You need Step Functions orchestration with Lambda and Fargate
Do not use this skill when:
- All files are small enough for Lambda (use S3 → Lambda directly)
- You need real-time streaming (use Kinesis)
- You don't need file-size-based routing
Prerequisites
- AWS CLI v2: installed and configured. Verify with:
    aws sts get-caller-identity
- Python 3.12: for the Lambda function runtime.
- Docker: for building and pushing the Fargate container image.
Parameters
- bucket_name (required): Name for the S3 bucket (globally unique, lowercase, 3-63 characters)
- region (required): AWS region for all resources
- ecr_repo_name (required): Name for the ECR repository
- state_machine_name (required): Name for the Step Functions state machine
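- kms_key_arn (optional): ARN of a KMS key for CloudWatch Logs encryption. If not provided, create one with the command below and use the Arn from the response. The key policy must allow the CloudWatch Logs service principal (logs.{region}.amazonaws.com) to use the key; otherwise associating the key with the log group in Step 7 fails:
    aws kms create-key --description "Key for CloudWatch Logs encryption" --region {region}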
Constraints for parameter acquisition:
- You MUST ask for all required parameters upfront in a single prompt
- You MUST support multiple input methods (direct input, file path, URL)
- You MUST confirm successful acquisition of all parameters before proceeding
- You MUST validate that bucket_name follows S3 naming rules
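One way to validate bucket_name before any AWS call is a local check like the sketch below. It covers only part of the S3 naming rules (length, allowed characters, first and last character, no consecutive dots, not shaped like an IPv4 address) and does not check reserved prefixes or suffixes. The function name is_valid_bucket_name is hypothetical.

```python
import re

# 3-63 characters, lowercase letters, digits, dots and hyphens,
# starting and ending with a letter or digit.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
_IPV4_RE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def is_valid_bucket_name(name: str) -> bool:
    """Check a subset of the S3 general-purpose bucket naming rules."""
    if not _BUCKET_RE.fullmatch(name):
        return False
    if ".." in name:
        return False  # adjacent periods are not allowed
    if _IPV4_RE.fullmatch(name):
        return False  # must not be formatted like an IP address
    return True
```

A name that passes this check can still be rejected by S3, for example because it is already taken, so the create-bucket response remains the final authority.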
Procedures
Step 0: Verify Dependencies
Constraints:
- You MUST verify the following tools are available: aws-cli, python3 (3.12+), docker
- You MUST inform the user about any missing tools with a clear message
- You MUST ask if the user wants to proceed despite missing tools
- You MUST respect the user's decision to abort at any point
- You MUST explain to the user what step is being executed, why, and which tool is being called
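The dependency check could look roughly like the following sketch. It assumes the tools are found under the executable names aws, python3, and docker. Note that python_is_new_enough inspects the interpreter running the sketch, which may not be the python3 on PATH. Both function names are hypothetical.

```python
import shutil
import sys

# Assumption: these are the executable names to look for on PATH.
REQUIRED_TOOLS = ("aws", "python3", "docker")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_is_new_enough(version_info=sys.version_info, minimum=(3, 12)):
    """Check the running interpreter against the minimum version."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools: " + ", ".join(missing))
    if not python_is_new_enough():
        print("Python 3.12 or newer is required")
```

The which parameter exists only so the lookup can be replaced in tests; in normal use the defaults are enough.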
Step 1: Retrieve AWS Account ID
Constraints:
- You MUST retrieve the account ID with:
    aws sts get-caller-identity --query 'Account' --output text
- You MUST store the result as {account_id} for use in all subsequent steps
- You MUST abort if credentials are not configured
Step 2: Get the Default VPC and Networking
Constraints:
- You MUST retrieve the default VPC ID with:
    aws ec2 describe-vpcs --filters Name=isDefault,Values=true --query 'Vpcs[0].VpcId' --output text --region {region}
- If no default VPC exists, inform the user they must create one with the following command or provide a VPC ID manually:
    aws ec2 create-default-vpc --region {region}
- You MUST retrieve two subnet IDs from the default VPC:
    aws ec2 describe-subnets --filters Name=vpc-id,Values={vpc_id} --query 'Subnets[0:2].SubnetId' --output text --region {region}
- You MUST create a security group in the default VPC:
    aws ec2 create-security-group --group-name fargate-sg --description "Security group for Fargate tasks" --vpc-id {vpc_id} --region {region}
- You MUST configure security group egress rules to allow only HTTPS and DNS outbound. First revoke the default allow-all egress rule:
    aws ec2 revoke-security-group-egress --group-id {sg_id} --ip-permissions IpProtocol=-1,IpRanges='[{CidrIp=0.0.0.0/0}]' --region {region}
  Then add scoped rules:
    aws ec2 authorize-security-group-egress --group-id {sg_id} --protocol tcp --port 443 --cidr 0.0.0.0/0 --region {region}
    aws ec2 authorize-security-group-egress --group-id {sg_id} --protocol udp --port 53 --cidr 0.0.0.0/0 --region {region}
- You MUST recommend VPC endpoints for S3 and CloudWatch Logs for production workloads to avoid internet-routed traffic and eliminate the need for broad egress rules
- You MUST capture {vpc_id}, {subnet1_id}, {subnet2_id}, and {sg_id} for use in later steps
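With --output text, the describe-subnets query prints the subnet IDs separated by whitespace on one line. A small helper, sketched below under that assumption, can split the output into {subnet1_id} and {subnet2_id} and fail early when the VPC has fewer than two subnets. The name parse_subnet_ids is hypothetical.

```python
def parse_subnet_ids(cli_output: str) -> tuple[str, str]:
    """Split whitespace-separated text output into two subnet IDs."""
    # Keep only tokens that look like subnet IDs; text output can
    # contain "None" when the query matches nothing.
    ids = [token for token in cli_output.split() if token.startswith("subnet-")]
    if len(ids) < 2:
        raise ValueError(f"expected two subnet IDs, got {len(ids)}")
    return ids[0], ids[1]
```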
Step 3: Create the ECR Repository
Constraints:
- You MUST create the repository with:
    aws ecr create-repository --repository-name {ecr_repo_name} --region {region}
- You MUST capture the repositoryUri from the response
Step 4: Build and Push the Container Image
Constraints:
- You MUST verify Docker is installed by running the command below. If Docker is not installed, instruct the user to install it from https://docs.docker.com/get-docker/ and abort until it is available:
    docker --version
- You MUST authenticate Docker with ECR:
    aws ecr get-login-password --region {region} | docker login --username AWS --password-stdin {account_id}.dkr.ecr.{region}.amazonaws.com
- The Dockerfile and processor code are in scripts/Dockerfile and scripts/fargate_processor.py
- You MUST build and push the image from the scripts directory:
    cd scripts
    docker build --platform linux/amd64 -t {ecr_repo_name} .
    docker tag {ecr_repo_name}:latest {account_id}.dkr.ecr.{region}.amazonaws.com/{ecr_repo_name}:latest
    docker push {account_id}.dkr.ecr.{region}.amazonaws.com/{ecr_repo_name}:latest
    cd ..
Step 5: Create IAM Roles
Follow the detailed instructions in references/iam-roles.md to create all IAM roles (Lambda, ECS task execution, ECS task, Step Functions, and EventBridge roles).
- You MUST wait at least 10 seconds for IAM role propagation before continuing
Step 6: Create the Lambda Function
Constraints:
- The function code is in scripts/lambda_function.py
- You MUST be in the skill root directory before packaging and creating the function
- You MUST package it with:
    python3 -c "import zipfile,io; z=io.BytesIO(); f=zipfile.ZipFile(z,'w'); f.writestr('lambda_function.py', open('scripts/lambda_function.py').read()); f.close(); open('/tmp/lambda_function.zip','wb').write(z.getvalue())"
- You MUST create the function with:
    aws lambda create-function \
      --function-name sfn-file-processor \
      --runtime python3.12 \
      --handler lambda_function.lambda_handler \
      --role arn:aws:iam::{account_id}:role/sfn-lambda-role \
      --zip-file fileb:///tmp/lambda_function.zip \
      --timeout 60 \
      --architectures x86_64 \
      --region {region}
- You MUST verify the function was created with:
    aws lambda get-function --function-name sfn-file-processor --region {region}
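For readers who find the packaging one-liner hard to follow, the sketch below does much the same thing in a more readable form: it writes a single source file into a zip archive under its bare file name, so the handler module sits at the archive root. It is an alternative illustration, not the skill's packaging command, and the function name package_lambda is hypothetical.

```python
import zipfile
from pathlib import Path


def package_lambda(source: Path, archive: Path) -> Path:
    """Write one source file into a zip archive at the archive root."""
    with zipfile.ZipFile(archive, "w") as bundle:
        # arcname drops the directory so the entry is "lambda_function.py",
        # matching the handler "lambda_function.lambda_handler".
        bundle.write(source, arcname=source.name)
    return archive
```

Run from the skill root, it could be called as package_lambda(Path("scripts/lambda_function.py"), Path("/tmp/lambda_function.zip")).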
Step 7: Create the CloudWatch Log Group
Constraints:
- You MUST create the log group for Fargate:
    aws logs create-log-group --log-group-name /StepFunctionFargateTask --region {region}
- You MUST encrypt the log group with a KMS key (the flag is --kms-key-id, and it accepts the key ARN):
    aws logs associate-kms-key --log-group-name /StepFunctionFargateTask --kms-key-id {kms_key_arn} --region {region}
Step 8: Create the ECS Cluster and Task Definition
Follow the detailed instructions in references/ecs-task-definition.md to create the ECS cluster and register the Fargate task definition.
- You MUST capture the task definition ARN from the response
Step 9: Create the S3 Bucket with EventBridge Notifications
Constraints:
- You MUST create the bucket with:
    aws s3api create-bucket --bucket {bucket_name} --region {region} --create-bucket-configuration LocationConstraint={region}
- You MUST NOT include --create-bucket-configuration if region is us-east-1
- You MUST enable EventBridge notifications on the bucket:
    aws s3api put-bucket-notification-configuration --bucket {bucket_name} --notification-configuration '{"EventBridgeConfiguration": {}}' --region {region}
- You MUST enable default encryption on the bucket:
    aws s3api put-bucket-encryption --bucket {bucket_name} --server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"aws:kms"}}]}' --region {region}
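The us-east-1 exception is easy to get wrong when the command is assembled by hand. The sketch below builds the create-bucket argument list and adds the location constraint only for other regions. The function name create_bucket_command is hypothetical, and the sketch only constructs the arguments; it does not call AWS.

```python
def create_bucket_command(bucket_name: str, region: str) -> list[str]:
    """Build the `aws s3api create-bucket` argument list for a region."""
    cmd = ["aws", "s3api", "create-bucket",
           "--bucket", bucket_name, "--region", region]
    # us-east-1 rejects an explicit LocationConstraint, so add the
    # configuration only for every other region.
    if region != "us-east-1":
        cmd += ["--create-bucket-configuration",
                f"LocationConstraint={region}"]
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True), which avoids shell quoting problems.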
Step 10: Create the Step Functions State Machine
Constraints:
- The state machine definition is in scripts/statemachine.asl.json
- You MUST create a working copy and replace all placeholders:
    sed -e 's|${LambdaFunction}|arn:aws:lambda:{region}:{account_id}:function:sfn-file-processor|g' \
        -e 's|${Cluster}|arn:aws:ecs:{region}:{account_id}:cluster/sfn-cluster|g' \
        -e 's|${TaskDefinition}|{task_definition_arn}|g' \
        -e 's|${Subnet1}|{subnet1_id}|g' \
        -e 's|${Subnet2}|{subnet2_id}|g' \
        -e 's|${SecurityGroup}|{sg_id}|g' \
        scripts/statemachine.asl.json > /tmp/statemachine.asl.json
- You MUST create the state machine with:
    aws stepfunctions create-state-machine \
      --name {state_machine_name} \
      --definition file:///tmp/statemachine.asl.json \
      --role-arn arn:aws:iam::{account_id}:role/sfn-state-machine-role \
      --type STANDARD \
      --region {region}
- You MUST capture the stateMachineArn from the response
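The placeholder substitution can also be sketched in Python, which makes it straightforward to detect a placeholder that was missed. The sketch uses plain string replacement rather than string.Template, because a state machine definition typically contains JSONPath expressions such as "$.detail" that Template would reject. The function name render_definition is hypothetical, and the contents of scripts/statemachine.asl.json are not assumed beyond the ${Name} placeholders listed in the sed command.

```python
import re


def render_definition(template: str, values: dict[str, str]) -> str:
    """Replace ${Name} placeholders and fail if any are left over."""
    rendered = template
    for name, value in values.items():
        rendered = rendered.replace("${" + name + "}", value)
    # JSONPath strings such as "$.detail" do not match this pattern,
    # so only genuine ${Name} placeholders are reported.
    leftover = sorted(set(re.findall(r"\$\{(\w+)\}", rendered)))
    if leftover:
        raise ValueError(f"unreplaced placeholders: {leftover}")
    return rendered
```

Raising on leftovers means a missing value stops the run locally, before an unrendered definition reaches create-state-machine.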
Step 11: Create the EventBridge Rule
Constraints:
- You MUST create the EventBridge rule to trigger on S3 object creation:
    aws events put-rule \
      --name s3-to-stepfunctions \
      --event-pattern '{
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": { "bucket": { "name": ["{bucket_name}"] } }
      }' \
      --region {region}
- You MUST add the state machine as a target:
    aws events put-targets \
      --rule s3-to-stepfunctions \
      --targets '[{
        "Id": "StepFunctionsTarget",
        "Arn": "{state_machine_arn}",
        "RoleArn": "arn:aws:iam::{account_id}:role/sfn-eventbridge-role"
      }]' \
      --region {region}
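Hand-written JSON inside shell quotes is a common source of errors. As a sketch, the event pattern can instead be generated with json.dumps so that the bucket name is always escaped correctly. The function name s3_object_created_pattern is hypothetical; the pattern it emits is the one used by the rule above.

```python
import json


def s3_object_created_pattern(bucket_name: str) -> str:
    """Return the EventBridge event pattern for one bucket as JSON text."""
    pattern = {
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {"bucket": {"name": [bucket_name]}},
    }
    return json.dumps(pattern)
```

The resulting string can be passed as the value of --event-pattern.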
Step 12: Configure Monitoring
Constraints:
- You MUST create a Dead Letter Queue for failed EventBridge invocations:
    aws sqs create-queue --queue-name s3-to-stepfunctions-dlq --region {region}
- You MUST update the EventBridge target to attach the DLQ:
    aws events put-targets \
      --rule s3-to-stepfunctions \
      --targets '[{
        "Id": "StepFunctionsTarget",
        "Arn": "{state_machine_arn}",
        "RoleArn": "arn:aws:iam::{account_id}:role/sfn-eventbridge-role",
        "DeadLetterConfig": { "Arn": "arn:aws:sqs:{region}:{account_id}:s3-to-stepfunctions-dlq" }
      }]' \
      --region {region}
- You MUST create a CloudWatch alarm for Step Functions execution failures:
    aws cloudwatch put-metric-alarm \
      --alarm-name sfn-execution-failures \
      --metric-name ExecutionsFailed \
      --namespace AWS/States \
      --statistic Sum \
      --period 300 \
      --threshold 1 \
      --comparison-operator GreaterThanOrEqualToThreshold \
      --evaluation-periods 1 \
      --dimensions Name=StateMachineArn,Value={state_machine_arn} \
      --region {region}
Step 13: Validate
Constraints:
- You MUST test with a small file (< 6 MB) to verify Lambda processing:
    echo 'test data' > /tmp/small-file.txt
    aws s3 cp /tmp/small-file.txt s3://{bucket_name}/small-file.txt --region {region}
- You MUST wait 15 seconds then check the Step Functions execution:
    aws stepfunctions list-executions --state-machine-arn {state_machine_arn} --region {region}
- You MUST verify the execution succeeded and routed to Lambda
- You MUST provide a summary of all created resources including: VPC ID, subnet IDs, security group ID, ECR repo URI, ECS cluster ARN, task definition ARN, Lambda function ARN, state machine ARN, bucket name, and EventBridge rule name
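Checking the execution result can be automated by parsing the JSON that list-executions prints. The sketch below assumes the default JSON output and relies on the API returning executions most recent first. The function name latest_execution_status is hypothetical. It reports only the status; confirming that the run was routed to Lambda still requires inspecting the execution history.

```python
import json


def latest_execution_status(list_executions_json: str):
    """Return the status of the newest execution, or None if there is none."""
    executions = json.loads(list_executions_json).get("executions", [])
    if not executions:
        return None
    # list-executions returns results sorted with the most recent first.
    return executions[0]["status"]
```

A None result after the 15-second wait suggests the EventBridge rule did not fire, which points to the first troubleshooting entry.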
Troubleshooting
EventBridge rule not triggering
- Verify EventBridge notifications are enabled on the bucket:
    aws s3api get-bucket-notification-configuration --bucket {bucket_name}
- Verify the rule exists:
    aws events describe-rule --name s3-to-stepfunctions --region {region}
- Check that the target has the correct state machine ARN and role
Step Functions execution fails at Fargate task
- Verify the container image exists in ECR:
    aws ecr describe-images --repository-name {ecr_repo_name} --region {region}
- Check that the subnets have internet access (route table with IGW)
- Verify the security group allows outbound HTTPS (TCP 443) and DNS (UDP 53) traffic
- Check CloudWatch Logs at /StepFunctionFargateTask
Lambda invocation fails
- Check CloudWatch Logs:
    aws logs tail /aws/lambda/sfn-file-processor --region {region}
- Verify the Step Functions role has the lambda:InvokeFunction permission
IAM PassRole errors
- The Step Functions role must have iam:PassRole for both the ECS execution role and task role ARNs
Fargate task stuck in PROVISIONING
- Verify the subnets have auto-assign public IP enabled
- Verify the internet gateway is attached and route table has 0.0.0.0/0 route
Security Considerations
- Fargate tasks with public IPs are exposed to the internet. Revoke the default allow-all egress rule:
    aws ec2 revoke-security-group-egress --group-id {sg_id} --ip-permissions IpProtocol=-1,IpRanges='[{CidrIp=0.0.0.0/0}]'
  Then add scoped egress rules for HTTPS and DNS:
    aws ec2 authorize-security-group-egress --group-id {sg_id} --protocol tcp --port 443 --cidr 0.0.0.0/0
    aws ec2 authorize-security-group-egress --group-id {sg_id} --protocol udp --port 53 --cidr 0.0.0.0/0
  For production, consider using VPC endpoints for S3 and CloudWatch Logs instead of internet-routed traffic.
- Scan container images for vulnerabilities before pushing to ECR. Enable ECR image scanning with:
    aws ecr put-image-scanning-configuration --repository-name {ecr_repo_name} --image-scanning-configuration scanOnPush=true --region {region}
- Use IAM roles for credentials; never hardcode access keys in container code.
- Enable encryption at rest for the S3 bucket:
    aws s3api put-bucket-encryption --bucket {bucket_name} --server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"aws:kms"}}]}'
- Enable CloudWatch Logs encryption for Fargate container logs (the flag is --kms-key-id, and it accepts the key ARN):
    aws logs associate-kms-key --log-group-name /StepFunctionFargateTask --kms-key-id {kms_key_arn}
- Configure a Dead Letter Queue on the EventBridge rule for failed invocations
- Set up CloudWatch alarms on Step Functions execution failures for operational visibility
Version information
- AWS CLI: 2.x
- Python runtime: 3.12
- Last validated: 2026-04-27