aws-lambda-managed-instances
AWS Lambda Managed Instances (LMI)
Runs Lambda functions on EC2 instances in the user's account while AWS manages provisioning, patching, scaling, routing, and load balancing. Combines Lambda's developer experience with EC2's pricing and hardware options.
Works best with the AWS MCP server for sandboxed CLI execution and audit logging. All guidance also works with standard AWS CLI or SAM CLI.
Note: Confirm regional availability, quotas, and instance type offerings against current AWS documentation before production deployment.
Quick Decision: Is LMI Right for This Workload?
| Signal | LMI is a strong fit | Standard Lambda is better |
|---|---|---|
| Traffic | Steady, predictable, 50M+ req/mo | Bursty, unpredictable, long periods of no traffic |
| Cost | Duration-heavy spend at scale | Low or sporadic invocations |
| Cold starts | Unacceptable (LMI eliminates them for provisioned capacity) | Tolerable |
| Compute | Latest CPUs, specific families, high network bandwidth, GPU requirements | Standard Lambda memory/CPU sufficient |
| Isolation | Dedicated EC2 instances in your account, full VPC control | Shared Firecracker micro-VMs acceptable |
| Scale-to-zero | Not needed (LMI does not scale to zero, though custom schedules can be built with AWS-provided solutions) | Required (pay nothing when idle) |
| Code readiness | Thread-safe (Node.js/Java/.NET) or any Python code | Non-thread-safe code, expensive to change |
Routing
Read ONLY the single reference file that matches the user's task. Do not preload multiple references.
| User need | Action |
|---|---|
| Cost comparison, pricing analysis, Savings Plans, Reserved Instances | Read cost-comparison.md |
| Instance types, memory sizing, vCPU ratios, scaling tuning, capacity provider config | Read configuration-guide.md |
| Thread safety, concurrency model, code review checklist, multi-concurrency readiness | Read thread-safety.md |
| Before/after code examples, runtime-specific migration, connection pooling | Read migration-patterns.md |
| IAM roles, VPC setup, CLI commands, SAM template, CDK example | Read infrastructure-setup.md |
| Errors, throttling, debugging, stuck deployments | Read troubleshooting.md |
Troubleshooting quick facts (always mention when diagnosing issues):
- Capacity provider stuck in CREATING → most common cause is private subnets missing a NAT gateway route (instances need outbound internet for image pull and Lambda service communication)
- Function not scaling → check that a version is published (PublishToLatestPublished: true)
- Memory errors → LMI minimum is 2048 MB
Workflow
Step 1: Assess the Workload
Gather these signals before recommending:
- Traffic pattern: Steady vs bursty? Requests per second?
- Current costs: Monthly Lambda spend? Existing Savings Plans?
- Runtime: Node.js, Java, .NET, or Python?
- Memory/CPU: How much memory? CPU-bound or I/O-bound?
- Execution duration: Average and P99?
- Concurrency readiness: Thread safety? Shared /tmp paths? Per-invocation DB connections?
- VPC: Already in a VPC? Private resource access needed?
When recommending LMI, ALWAYS mention: minimum 3 execution environments for AZ resiliency (cannot go below 3 in production).
Step 2: Build the Cost Comparison
REQUIRED: Present a cost comparison before recommending LMI.
Rule of thumb: LMI becomes cost-competitive at 50-100M+ req/month with steady traffic. Use the LMI Pricing Calculator for accurate comparisons.
Step 3: Configure the Deployment
- Instance families (400+ types, .large and up): C-series (compute), M-series (general), R-series (memory). ARM (Graviton) for best price-performance.
- When using Graviton instances, MUST set Architectures: [arm64] in the function configuration to match.
- Memory-to-vCPU ratios: 2:1 (compute), 4:1 (general, default), 8:1 (memory). Min 2 GB, max 32 GB.
- Multi-concurrency per-vCPU maximums: Node.js 64, Java 32, .NET 32, Python 16. These are system caps — the actual setting is PerExecutionEnvironmentMaxConcurrency (per execution environment, not per vCPU).
- For I/O-bound workloads: use the runtime default or higher PerExecutionEnvironmentMaxConcurrency (e.g., 10 for Node.js) since each request uses minimal CPU while waiting on network.
- For CPU-bound workloads: set PerExecutionEnvironmentMaxConcurrency to 1-2 per vCPU since each request saturates CPU.
- Scaling: MinExecutionEnvironments (default 3), MaxVCpuCount (optional, default 400 — set explicitly as best practice), TargetResourceUtilization.
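The ratio and concurrency bullets above can be combined into a quick sizing check. The sketch below is illustrative only: it assumes the vCPU count of an execution environment is its memory divided by the chosen ratio, and that the system cap is vCPUs times the per-vCPU maximum. The function names are hypothetical and not part of any AWS API.

```python
# Per-vCPU multi-concurrency caps as quoted in the list above.
PER_VCPU_MAX = {"nodejs": 64, "java": 32, "dotnet": 32, "python": 16}


def vcpus_for(memory_gb: float, ratio: int = 4) -> float:
    """vCPUs implied by a memory size and a memory-to-vCPU ratio (assumed model)."""
    if ratio not in (2, 4, 8):
        raise ValueError("ratio must be 2, 4, or 8")
    return memory_gb / ratio


def concurrency_cap(runtime: str, memory_gb: float, ratio: int = 4) -> int:
    """Assumed upper bound for PerExecutionEnvironmentMaxConcurrency."""
    return max(1, int(vcpus_for(memory_gb, ratio) * PER_VCPU_MAX[runtime]))


def cpu_bound_range(memory_gb: float, ratio: int = 4) -> tuple:
    """1-2 concurrent requests per vCPU for CPU-bound work, never below 1."""
    vcpus = vcpus_for(memory_gb, ratio)
    return max(1, int(vcpus)), max(1, int(2 * vcpus))


# An 8 GB environment at the default 4:1 ratio has 2 vCPUs under this model.
print(concurrency_cap("nodejs", 8, 4))  # 128
print(cpu_bound_range(8, 4))            # (2, 4)
```

For an I/O-bound Node.js function the cap leaves plenty of room above a starting value such as 10; for a CPU-bound function the second helper gives the range to start from.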
Step 4: Migrate the Code
Review code for concurrency safety. LMI runs multiple invocations concurrently per execution environment:
- Python: Process-based isolation — globals are NOT shared. No thread-safety changes needed. Focus on /tmp conflicts and memory sizing.
- Node.js: Worker threads — globals shared within a worker. Requires async safety.
- Java/.NET: OS threads/Tasks — handler shared across threads. Requires full thread safety.
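As a minimal sketch of the /tmp concern for Python, the handler below gives each invocation its own scratch directory instead of writing to a fixed path. The handler body and file name are hypothetical; `context.aws_request_id` is the standard Lambda context attribute, and `tempfile` places the directory under the system temp location (assumed to be /tmp in the Lambda environment).

```python
import os
import tempfile


def handler(event, context):
    # A fixed path such as /tmp/payload.txt could be overwritten by another
    # invocation running at the same time in the same execution environment.
    request_id = getattr(context, "aws_request_id", "local")

    # Private scratch directory per invocation, removed when the block exits.
    with tempfile.TemporaryDirectory(prefix=f"{request_id}-") as scratch:
        path = os.path.join(scratch, "payload.txt")
        with open(path, "w") as f:
            f.write(str(event.get("body", "")))
        with open(path) as f:
            return {"statusCode": 200, "body": f.read()}
```

The same idea applies to the other runtimes: derive scratch paths from the request ID (or a generated unique name) so concurrent invocations never share a file.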
Step 5: Set Up Infrastructure
- Create two IAM roles: execution role (for the function) and operator role (for capacity provider EC2 management)
- Configure VPC with subnets across 3+ AZs
- Create capacity provider with VPC config and scaling limits
- Create or update function with capacity provider attachment
- Publish a version (triggers instance provisioning)
Step 6: Validate and Cut Over
- Deploy to a non-production environment first
- Monitor CloudWatch: CPU utilization, memory, concurrency, throttle rate
- Gradual traffic shift with weighted aliases (10% → 50% → 100%)
- Compare costs after 1-2 weeks of production data
- Decommission standard Lambda once stable
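The weighted-alias shift above can be planned as data before any API call is made. This sketch builds the arguments for each stage in the shape accepted by the Lambda UpdateAlias API (`FunctionVersion` plus `RoutingConfig.AdditionalVersionWeights`). It assumes the standard and LMI deployments are two published versions behind one alias; the version numbers and helper names are hypothetical.

```python
def routing_config(new_version: str, weight: float) -> dict:
    """RoutingConfig that sends `weight` (0.0-1.0) of traffic to new_version."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0.0 and 1.0")
    return {"AdditionalVersionWeights": {new_version: weight}}


def cutover_plan(old_version: str, new_version: str,
                 stages=(0.10, 0.50, 1.00)) -> list:
    """One UpdateAlias argument set per stage of the gradual shift."""
    plan = []
    for weight in stages:
        if weight >= 1.0:
            # Final stage: point the alias at the new version and clear weights.
            plan.append({"FunctionVersion": new_version,
                         "RoutingConfig": {"AdditionalVersionWeights": {}}})
        else:
            plan.append({"FunctionVersion": old_version,
                         "RoutingConfig": routing_config(new_version, weight)})
    return plan


for step in cutover_plan("1", "2"):
    print(step)
```

Each dictionary could be passed as keyword arguments to boto3's `update_alias` along with `FunctionName` and `Name`, pausing between stages to watch the CloudWatch metrics listed above.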
Best Practices
Pricing (always mention when discussing costs)
- Three components: EC2 instance hours + 15% management fee + $0.20/1M requests
- Savings Plans: Compute Savings Plans apply to the EC2 portion (up to 60-72% discount)
- The 15% fee is charged on top of EC2 cost for AWS managing provisioning, patching, scaling, lifecycle
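The three pricing components can be turned into a rough monthly estimate. The sketch below is a back-of-the-envelope model, not a replacement for the LMI Pricing Calculator: the hourly instance price and the 730-hour month are placeholder assumptions, and Savings Plans discounts are deliberately not modeled.

```python
MANAGEMENT_FEE = 0.15             # 15% charged on top of EC2 cost
REQUEST_PRICE_PER_MILLION = 0.20  # USD per 1M requests


def lmi_monthly_cost(instance_hourly_usd: float, instance_count: int,
                     requests_per_month: int,
                     hours_per_month: float = 730.0) -> dict:
    """Break an LMI bill into EC2 hours, management fee, and request charges."""
    ec2 = instance_hourly_usd * instance_count * hours_per_month
    fee = ec2 * MANAGEMENT_FEE
    requests = requests_per_month / 1_000_000 * REQUEST_PRICE_PER_MILLION
    return {"ec2": ec2, "management_fee": fee,
            "requests": requests, "total": ec2 + fee + requests}


# Hypothetical input: 3 instances at $0.10/hour serving 100M requests a month.
estimate = lmi_monthly_cost(0.10, 3, 100_000_000)
print({name: round(value, 2) for name, value in estimate.items()})
```

Comparing `total` against the current standard Lambda bill for the same traffic gives a first-pass view of whether the workload is near the break-even range.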
Scaling (always mention when discussing scaling or traffic)
- LMI absorbs a 50% traffic spike immediately and doubles capacity within 5 minutes; if traffic more than doubles within that window, requests are throttled
- Standard Lambda bursts to 3000 instantly — LMI cannot match this
- Pre-warm with MinExecutionEnvironments before known spikes
- MaxVCpuCount (default 400) — set explicitly as a cost ceiling
- Shape: Reduce MinExecutionEnvironments to lower capacity during off-hours (minimum 3 for AZ resiliency)
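To reason about a planned spike, the scaling figures above can be reduced to simple arithmetic. The sketch assumes a deliberately coarse model: growth up to 1.5x is absorbed at once, and every further doubling of capacity takes a full 5 minutes. Real scaling behavior depends on TargetResourceUtilization and instance launch times, so treat the result as a planning hint only; the function names are hypothetical.

```python
import math


def min_ramp_minutes(baseline_rps: float, peak_rps: float) -> int:
    """Shortest ramp (in minutes) this coarse model handles without throttling."""
    if baseline_rps <= 0:
        raise ValueError("baseline_rps must be positive")
    growth = peak_rps / baseline_rps
    if growth <= 1.5:
        return 0  # within the immediately absorbed 50% headroom
    # One 5-minute step per doubling needed to reach the peak.
    return 5 * math.ceil(math.log2(growth))


def would_throttle(baseline_rps: float, peak_rps: float,
                   ramp_minutes: float) -> bool:
    """True if traffic climbs faster than the modeled capacity growth."""
    return ramp_minutes < min_ramp_minutes(baseline_rps, peak_rps)


print(min_ramp_minutes(100, 400))   # 10
print(would_throttle(100, 400, 5))  # True
```

When the check returns True for a known event, raising MinExecutionEnvironments ahead of time is the pre-warming step described above.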
Instance Sizing
- 1 vCPU + 1 GB reserved per instance for OS overhead (not available to your function)
- Usable capacity = total - overhead
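A worked example of the overhead rule, under stated assumptions: the instance shapes are hypothetical, and the packing estimate naively assumes execution environments fill an instance until either vCPU or memory runs out, which may not match how Lambda actually places them.

```python
OVERHEAD_VCPUS = 1       # reserved per instance for OS overhead
OVERHEAD_MEMORY_GB = 1   # reserved per instance for OS overhead


def usable_capacity(total_vcpus: float, total_memory_gb: float) -> tuple:
    """Usable capacity = total - overhead, for vCPUs and memory."""
    return total_vcpus - OVERHEAD_VCPUS, total_memory_gb - OVERHEAD_MEMORY_GB


def environments_per_instance(total_vcpus: float, total_memory_gb: float,
                              env_memory_gb: float, ratio: int = 4) -> int:
    """Naive count of execution environments that fit on one instance."""
    usable_vcpus, usable_memory = usable_capacity(total_vcpus, total_memory_gb)
    env_vcpus = env_memory_gb / ratio
    return int(min(usable_vcpus / env_vcpus, usable_memory / env_memory_gb))


print(usable_capacity(2, 8))                    # (1, 7)
print(environments_per_instance(8, 32, 4, 4))   # 7
```

The overhead is fixed per instance, so it takes a larger share of small instances: half the vCPUs on a 2-vCPU shape, but only one eighth on an 8-vCPU shape.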
Configuration
- Start with 4:1 ratio and runtime default concurrency
- Use ARM (Graviton) unless x86 dependencies exist
- Let Lambda choose instance types unless specific hardware needed
- Set MaxVCpuCount to control cost ceiling
- Never set MinExecutionEnvironments below 3 (breaks AZ resiliency)
Migration
- Start with I/O-heavy functions (benefit most from multi-concurrency)
- Review code for concurrency safety before attaching to capacity provider
- Use weighted aliases for gradual traffic shift
- Include request IDs in all log statements
- Initialize DB pools and SDK clients outside the handler
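Two of the practices above, request IDs in logs and initialization outside the handler, might look like this in Python. `create_pool` is a hypothetical stand-in for building a real DB pool or SDK client; only `context.aws_request_id` is a real Lambda context attribute.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("lmi-example")


def create_pool() -> dict:
    """Hypothetical stand-in for an expensive DB pool or SDK client setup."""
    create_pool.calls += 1
    return {"pool_id": create_pool.calls}


create_pool.calls = 0

# Runs once at import time, then reused by every invocation in this process.
POOL = create_pool()


def handler(event, context):
    request_id = getattr(context, "aws_request_id", "unknown")
    # With concurrent invocations, log lines interleave; the request ID is
    # what lets you reassemble the trace of a single request.
    logger.info("request_id=%s event_keys=%s", request_id, sorted(event))
    return {"request_id": request_id, "pool_id": POOL["pool_id"]}
```

Because the pool is created at module load, repeated invocations reuse it rather than opening a connection per request.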
Operations
- Set CloudWatch alarms on throttle rate > 1% and CPU > 80%
- Plan for 14-day instance rotation (automatic)
- Never manually terminate LMI EC2 instances (delete the capacity provider instead)
- Always publish a version — unpublished functions cannot run on LMI
Limits Quick Reference
| Resource | Limit |
|---|---|
| Memory | 2 GB min, 32 GB max |
| Execution environments | 3 minimum (MinExecutionEnvironments, AZ resiliency) |
| Instance lifespan | 14 days (auto-replaced) |
| Concurrency/vCPU | 64 (Node.js), 32 (Java/.NET), 16 (Python) |
| Runtimes | Node.js 22+, Java 21+, .NET 8+, Python 3.13+, Rust (provided.al2023) |
| Instance families | C, M, R (.large and up) |
| Scaling | Burst headroom equals unused capacity from TargetResourceUtilization; new instances launch within minutes |
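The limits in the table lend themselves to a pre-deployment check. The sketch below validates a few settings against the memory, execution-environment, and instance-family limits listed; the parameter names are hypothetical, and the limits themselves should be confirmed against current AWS documentation.

```python
MEMORY_MB_MIN = 2048       # 2 GB
MEMORY_MB_MAX = 32768      # 32 GB
MIN_ENVIRONMENTS = 3       # AZ resiliency floor
INSTANCE_FAMILIES = {"C", "M", "R"}


def validate_lmi_config(memory_mb: int, min_execution_environments: int,
                        instance_family: str) -> list:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    if not MEMORY_MB_MIN <= memory_mb <= MEMORY_MB_MAX:
        problems.append(
            f"memory {memory_mb} MB is outside {MEMORY_MB_MIN}-{MEMORY_MB_MAX} MB")
    if min_execution_environments < MIN_ENVIRONMENTS:
        problems.append(
            f"MinExecutionEnvironments {min_execution_environments} is below {MIN_ENVIRONMENTS}")
    if instance_family.upper() not in INSTANCE_FAMILIES:
        problems.append(f"instance family {instance_family!r} is not C, M, or R")
    return problems


print(validate_lmi_config(1024, 2, "t"))
print(validate_lmi_config(4096, 3, "m"))  # []
```

Running a check like this in CI catches the common memory error (below 2048 MB) before a deployment is attempted.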
Security Considerations
- Operator role scoping: Add aws:SourceAccount and aws:SourceArn conditions to trust policies to prevent confused deputy attacks.
- VPC egress: Scope security group egress to VPC endpoint security groups or AWS prefix lists rather than 0.0.0.0/0.
- Credentials: Use AWS Secrets Manager or Parameter Store for database credentials — never environment variables for secrets.
- Encryption: Enable SQS SSE, CloudWatch Logs encryption (KMS), and S3 default encryption for any data at rest.
- Logging: Set CloudWatch Log group retention policies. Avoid logging PII or credentials. Enable CloudTrail data events for Lambda.
- Instance rotation: The 14-day automatic rotation ensures security patches are applied without manual intervention.
- References: Lambda Security Best Practices, IAM Best Practices
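The operator role scoping item can be illustrated with a trust policy that carries both the aws:SourceAccount and aws:SourceArn conditions. This is a sketch of the general confused-deputy pattern only: the service principal, account ID, and ARN pattern below are placeholder assumptions, so take the exact values from infrastructure-setup.md or current AWS documentation.

```python
import json


def operator_trust_policy(account_id: str, source_arn_pattern: str,
                          service_principal: str = "lambda.amazonaws.com") -> dict:
    """Trust policy that only lets the service assume the role on behalf of
    resources in this account that match the given ARN pattern."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            # Assumed principal; confirm the correct one for the operator role.
            "Principal": {"Service": service_principal},
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {"aws:SourceAccount": account_id},
                "ArnLike": {"aws:SourceArn": source_arn_pattern},
            },
        }],
    }


# Placeholder account ID and ARN pattern, for illustration only.
policy = operator_trust_policy("123456789012",
                               "arn:aws:lambda:us-east-1:123456789012:*")
print(json.dumps(policy, indent=2))
```

Without the two conditions, any caller able to make the service act on its behalf could cause the role to be assumed; the conditions pin the role to one account and one set of resources.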
Files
| File | Content |
|---|---|
| cost-comparison.md | Pricing analysis, break-even calculations, Savings Plans/RI impact |
| configuration-guide.md | Instance selection, memory ratios, scaling tuning, capacity provider config |
| thread-safety.md | Concurrency model per runtime, code review checklist, Powertools compatibility |
| migration-patterns.md | Before/after code by runtime, connection pooling, gradual cutover |
| infrastructure-setup.md | IAM roles, VPC setup, SAM templates, CLI commands |
| troubleshooting.md | Common errors, throttling, debugging, stuck deployments |