# agents-harden
Prepare your AgentCore agent for production — security, reliability, and performance.
## When to use
- You're about to take an agent to production
- You want a checklist of what to review before launch
- You want to restrict who can call your agent
- You want to scope down IAM permissions from the defaults
- You're hitting throttling or quota errors (loads `references/limits.md`)
- You need to tune session lifecycle for your workload
- You're running long-running background work in your agent
## Input
No arguments required. The skill reads your project config and produces a checklist with specific findings for your project.
## Process
### Step 0: Verify CLI version
Run `agentcore --version`. This skill requires v0.9.0 or later. If the version is older, tell the developer to run `agentcore update` before proceeding.

### Step 1: Read the project
Read `agentcore/agentcore.json` to understand:
- What resources are configured (memory, gateway, credentials, evaluators)
- What framework is being used
- What network mode is configured (PUBLIC or VPC)
### Step 2: Run through the checklist
Work through each category and report findings specific to the project.
### IAM: Scope down permissions
The auto-created execution role has broad Bedrock access (`arn:aws:bedrock:*::foundation-model/*`). For production, scope it to the specific models your agent uses.

Check the current execution role:

```bash
agentcore status --json | jq -r '.runtimes[0].executionRoleArn'
```

Recommended production Bedrock policy:
```json
{
  "Effect": "Allow",
  "Action": [
    "bedrock:InvokeModel",
    "bedrock:InvokeModelWithResponseStream"
  ],
  "Resource": [
    "arn:aws:bedrock:<REGION>::foundation-model/anthropic.claude-sonnet-4-5-20250929-v1:0"
  ]
}
```

Replace the resource ARN with the specific model(s) your agent uses.
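If you script this scope-down, the policy above can be generated and attached with boto3. A minimal sketch; the role name, policy name, and `attach` helper are placeholders for illustration, not part of the AgentCore tooling:

```python
import json

def build_bedrock_policy(model_arns: list[str]) -> dict:
    """Least-privilege Bedrock invoke policy for the given model ARNs (sketch)."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream",
            ],
            "Resource": list(model_arns),
        }],
    }

def attach(role_name: str, policy: dict) -> None:
    """Attach as an inline policy on the execution role (names are placeholders)."""
    import boto3
    boto3.client("iam").put_role_policy(
        RoleName=role_name,
        PolicyName="ScopedBedrockInvoke",
        PolicyDocument=json.dumps(policy),
    )
```

Generate the document once per environment and review it before attaching.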
**ECR access:** Scope to your specific repository:

```json
{
  "Effect": "Allow",
  "Action": ["ecr:BatchGetImage", "ecr:GetDownloadUrlForLayer"],
  "Resource": "arn:aws:ecr:<REGION>:<YOUR_ACCOUNT_ID>:repository/bedrock-agentcore-<AGENT_NAME>-*"
}
```

**Trust policy:** Verify the execution role's trust policy is scoped to your account:
```json
{
  "Principal": {"Service": "bedrock-agentcore.amazonaws.com"},
  "Action": "sts:AssumeRole",
  "Condition": {
    "StringEquals": {"aws:SourceAccount": "<YOUR_ACCOUNT_ID>"},
    "ArnLike": {"aws:SourceArn": "arn:aws:bedrock-agentcore:<REGION>:<YOUR_ACCOUNT_ID>:*"}
  }
}
```

**Runtime resource-based policies (API-only):** For fine-grained control over which principals can invoke your runtime (beyond what IAM roles and JWT auth provide), use `PutAgentRuntimeResourcePolicy` via boto3. This is not exposed in the CLI or `agentcore.json`. Use the `awsknowledge` MCP server if available to look up the current API shape.
### Shell Access: Scope `InvokeAgentRuntimeCommand` separately
InvokeAgentRuntimeCommandIf your project uses (see ), audit its IAM permissions separately from . The two actions have different blast radii: is arbitrary shell execution inside a live microVM with the runtime's full execution role — callers can read/write the filesystem, reach any network resource the agent can reach, and access the execution role's credentials.
InvokeAgentRuntimeCommandagents-build/references/integrate.mdInvokeAgentRuntimeInvokeAgentRuntimeCommandCheck which principals have the permission:
```bash
# List customer-managed policies in your account, then inspect each for InvokeAgentRuntimeCommand
aws iam list-policies --scope Local \
  --query 'Policies[*].[PolicyName, Arn, DefaultVersionId]' \
  --output table
```
Then for each policy of interest:
```bash
aws iam get-policy-version \
  --policy-arn <POLICY_ARN> \
  --version-id <VERSION_ID> \
  --query 'PolicyVersion.Document'
```
Alternatively, use the IAM console: **IAM → Policies → Filter by type: Customer managed** → search for `InvokeAgentRuntimeCommand` in the policy JSON editor.
**Separate IAM policy for command callers** — keep this distinct from the policy granting `InvokeAgentRuntime`:
```json
{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Action": "bedrock-agentcore:InvokeAgentRuntimeCommand",
"Resource": "arn:aws:bedrock-agentcore:<REGION>:<YOUR_ACCOUNT_ID>:runtime/<RUNTIME_NAME>-*"
}]
}
```

**Enable CloudTrail alerting.** Create an EventBridge rule to notify your security team when `InvokeAgentRuntimeCommand` is called:

```bash
aws events put-rule \
  --name AgentCoreCommandExecution \
  --event-pattern '{"source":["aws.bedrock-agentcore"],"detail-type":["AWS API Call via CloudTrail"],"detail":{"eventName":["InvokeAgentRuntimeCommand"]}}' \
  --state ENABLED
```

**If commands are constructed from user input anywhere in calling code:** validate before passing. Reject strings containing `&&`, `;`, `$(...)`, backticks, `|`, or other shell metacharacters.
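A sketch of that validation in the calling code, assuming user input only ever supplies a simple argument (a filename or identifier), never a full command line:

```python
import re

# Characters that enable command chaining, piping, or substitution in a shell.
_SHELL_META = re.compile(r"[;&|`$<>\n\\]")

def validate_command_fragment(fragment: str) -> str:
    """Reject user input containing shell metacharacters before it reaches a command."""
    if _SHELL_META.search(fragment):
        raise ValueError("shell metacharacters are not allowed in user input")
    return fragment
```

Reject rather than escape: escaping rules differ by shell and are easy to get wrong.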
### Inbound auth: Control who can call your agent
By default, agents use AWS IAM (SigV4) for inbound auth. For production, verify this is configured correctly.
Check current auth config:
```bash
agentcore status --runtime <AgentName> --json | jq '.runtimes[0].authorizerConfig'
```

Options: `AWS_IAM` (the default) or `CUSTOM_JWT`. To configure JWT auth:

```bash
agentcore add agent \
  --name MyAgent \
  --authorizer-type CUSTOM_JWT \
  --discovery-url https://your-idp.example.com/.well-known/openid-configuration \
  --allowed-audience my-api \
  --allowed-clients my-client-id
```

> [!WARNING]
> Never use `--authorizer-type NONE` in production. It allows unauthenticated access to your agent: anyone with the endpoint URL can invoke it. Always use AWS_IAM or CUSTOM_JWT. If you see NONE in production, change it immediately.
#### Choosing `allowedClients` vs `allowedAudience`

This is the most common JWT misconfiguration. The right choice depends on what's inside the token your IdP issues.
Decode a sample token (at your IdP or with jwt.io) and look at the payload:
- Token has a `client_id` claim, no `aud` claim → configure `allowedClients` on the runtime
- Token has an `aud` claim → configure `allowedAudience` on the runtime
- Token has both → use `allowedAudience`. The `aud` claim is the standard OIDC audience field; use that as the primary check.
If you pick the wrong one, invocations return 403 even with a valid token — the runtime is validating against a claim the token doesn't have.
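The claim inspection can be scripted. A sketch that decodes the payload without verifying the signature (inspection only, never authentication); the helper names are illustrative:

```python
import base64
import json

def jwt_claims(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying the signature (inspection only)."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))

def pick_authorizer_field(claims: dict) -> str:
    """Map token claims to the runtime setting, per the rules above."""
    if "aud" in claims:
        return "allowedAudience"  # standard OIDC audience claim wins
    if "client_id" in claims:
        return "allowedClients"
    raise ValueError("token has neither aud nor client_id")
```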
#### Issuer ↔ discovery URL prefix requirement
AgentCore enforces the OIDC discovery spec (RFC 8414 §3): the `issuer` value in the discovery document must be a URL prefix of the discovery endpoint.

That means if your discovery URL is `https://qa.example.com/.well-known/openid-configuration`, the `issuer` field in that document must start with `https://qa.example.com`. If the document advertises an issuer like `https://example.com` (no subdomain), validation fails.

Some enterprise IdPs (PingFederate, Paylocity, some Keycloak setups) host the discovery endpoint on an environment-specific subdomain while advertising a production-level issuer. This pattern is incompatible with the RFC 8414 prefix rule.
Fix options:
- Align the IdP's discovery endpoint with its issuer — serve discovery from the same origin as the issuer.
- Point the runtime at the actual discovery URL domain — configure the runtime's discovery URL with the subdomain that matches the token's issuer.
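The prefix rule is easy to pre-check before deploying. A simplified sketch that treats the rule as plain string-prefix matching, which is how the failure manifests in practice:

```python
def issuer_matches_discovery(issuer: str, discovery_url: str) -> bool:
    """RFC 8414 §3 sanity check: issuer must be a URL prefix of the discovery endpoint."""
    return discovery_url.startswith(issuer.rstrip("/"))
```

Run it against the `issuer` field fetched from your live discovery document before pointing a runtime at it.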
#### Debugging JWT auth failures
When invocations fail with 403, narrow down which check is failing.
**Authorization method mismatch**
- The runtime is configured for `AWS_IAM` (or no authorizer) but the caller is sending a Bearer token → reconfigure the runtime for `CUSTOM_JWT`, or have the caller use SigV4.
- The runtime is configured for `CUSTOM_JWT` but the caller's request is being SigV4-signed → likely the SDK or environment is injecting SigV4 headers alongside the Bearer token. Check for `Authorization: AWS4-HMAC-SHA256`, `X-Amz-Date`, or `X-Amz-Security-Token` in the outbound request. Remove the SigV4 path and send only the Bearer token.

**Invalid inbound token**
- Issuer ↔ discovery URL prefix (above): verify the token's `iss` claim matches the discovery URL's origin
- `allowedClients` vs `allowedAudience`: is the runtime configured for the right claim for your token format?
- JWKS reachability: can AgentCore reach the `jwks_uri` listed in the discovery document? It must be publicly reachable.
- Token expired: decode the token, check `exp` against now
- Signing algorithm support: some IdPs sign with algorithms (PS256, ES384, etc.) that aren't universally supported. Check your IdP's supported algorithms and switch to RS256 if compatibility is the issue.
Only after ruling all of those out should you treat it as a service-side issue.
### Error handling: Fail gracefully
Check that your agent code handles errors without exposing internal details:
```python
from bedrock_agentcore.runtime import BedrockAgentCoreApp

app = BedrockAgentCoreApp()

@app.entrypoint
def invoke(payload, context):
    try:
        # your agent logic
        return {"response": result}
    except Exception as e:
        # Log the full error internally
        app.logger.error(f"Agent error: {e}", exc_info=True)
        # Return a safe message to the caller
        return {"error": "An error occurred. Please try again."}

if __name__ == "__main__":
    app.run()
```

**Check for:** bare `except` blocks that swallow errors silently, error messages that expose stack traces or internal details to callers, missing error handling in tool call code.
### Input validation and rate limiting
Agent entrypoints receive arbitrary payloads from callers. Validate inputs before processing:
```python
@app.entrypoint
def invoke(payload, context):
    prompt = payload.get("prompt", "")
    # Validate input
    if not prompt or not isinstance(prompt, str):
        return {"error": "Missing or invalid 'prompt' field"}
    if len(prompt) > 10000:
        return {"error": "Prompt exceeds maximum length (10,000 characters)"}
    # Sanitize — strip control characters, excessive whitespace
    prompt = " ".join(prompt.split())
    # Proceed with validated input
    result = agent(prompt)
    return {"response": str(result)}
```

What to validate:
- Required fields are present and have the expected type
- String inputs don't exceed reasonable length limits (prevents token-bombing the model)
- Numeric inputs are within expected ranges
- User-provided IDs (actor_id, session_id) match expected formats
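For the ID checks, a sketch; the pattern below is an assumption, so tighten it to whatever shape your application actually generates:

```python
import re

# Assumed ID shape: alphanumeric start, then up to 127 chars of [A-Za-z0-9_.-].
_ID_PATTERN = re.compile(r"^[A-Za-z0-9][A-Za-z0-9_.-]{0,127}$")

def validate_ids(actor_id: str, session_id: str) -> None:
    """Reject caller-supplied identifiers that don't match the expected format."""
    for name, value in (("actor_id", actor_id), ("session_id", session_id)):
        if not isinstance(value, str) or not _ID_PATTERN.match(value):
            raise ValueError(f"invalid {name}")
```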
**Rate limiting:** AgentCore Runtime has built-in invocation rate limits (default 25 TPS per agent; see `references/limits.md`). For application-level rate limiting (per-user, per-tenant), implement it in your calling application or API Gateway layer, not in the agent code itself. The agent should assume it's already been rate-limited by the time a request reaches it.
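If you need a starting point for that per-user layer, a minimal in-process token bucket; a real deployment would use a shared store (or API Gateway usage plans), so this is illustrative only:

```python
import time

class TokenBucket:
    """Minimal per-caller token bucket for the layer in front of the agent."""

    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec
        self.burst = burst
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill proportionally to elapsed time, capped at the burst size.
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Keep one bucket per user or tenant key, and reject (or queue) when `allow()` returns False.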
### Secrets: No credentials in code, no secrets in runtime env vars
Two failure modes to check for:
**1. Hardcoded secrets in agent code**
```bash
# Search for common secret patterns in agent code
grep -rE "sk-|api_key\s*=\s*['\"]" app/ --include="*.py"
grep -rE "password\s*=\s*['\"]" app/ --include="*.py"
```
**2. Secrets pulled from runtime environment variables**
AgentCore Runtime environment variables are not vault-backed. Anything a developer stuffs into the runtime's env (via CDK, boto3 `UpdateAgentRuntime`, or similar) is a plaintext config value, not a secret. Audit for the pattern:

```bash
# Flag any os.getenv / os.environ call whose name implies a secret
grep -rE "os\.(getenv|environ).*(TOKEN|SECRET|KEY|PASSWORD|CREDENTIAL)" app/ --include="*.py"
```
Non-secret identifiers injected by the platform are fine; keep an allowlist for them (e.g., `MEMORY_*_ID`, `AGENTCORE_GATEWAY_*_URL`, `AWS_REGION`, downstream agent ARNs). Review the remaining hits and confirm none are secrets.
**Correct pattern:** Register each outbound credential with `agentcore add credential`, then fetch it in code via the integrated credential providers:
```python
from bedrock_agentcore.identity.auth import requires_api_key, requires_access_token

@requires_api_key(provider_name="MyAPI")
def call_api(payload: dict, *, api_key: str) -> dict:
    ...

@requires_access_token(provider_name="MyOAuthProvider", scopes=["read"], auth_flow="M2M")
async def call_downstream(data: dict, *, access_token: str) -> dict:
    ...
```

The decorator fetches from Secrets Manager at call time and handles caching/refresh. Credentials registered this way are encrypted at rest and rotated without a redeploy.
**Local dev:** `agentcore/.env.local` (gitignored) is read by `agentcore dev` so the decorator resolves locally. This file is not uploaded to runtime on deploy; production credentials live in the credential provider.
### Tool surface: Prefer Gateway targets over direct HTTP in agent code
A related audit — for every external service the agent calls, ask whether it should be a Gateway target instead of a direct HTTP call buried in agent code. Gateway's credential providers inject auth at the edge (so the agent process never sees the secret), the tool catalog is policy-enforceable, and a leaked traceback/log line from agent code can't exfiltrate credentials that never reached it.
```bash
# Find direct outbound HTTP calls in agent code
grep -rEn 'httpx\.|requests\.|aiohttp\.' app/ --include="*.py"
```
For each hit, decide:
| Hit looks like | Action |
|---|---|
| Calls an external REST API the agent treats as a tool | Front as a Gateway target (`agentcore add gateway-target --type open-api-schema` or `api-gateway`). Load [`agents-connect/SKILL.md`](../agents-connect/SKILL.md) Path C. |
| Calls an MCP server directly | Front as a Gateway target (`--type mcp-server`). Load [`agents-connect/SKILL.md`](../agents-connect/SKILL.md) Path A. |
| Calls an AWS service (S3, DynamoDB, etc.) over raw HTTP | Migrate from `requests`/`httpx` to the `boto3` client, using the runtime's execution role for IAM. No credential needed. |
| Calls a streaming service (SSE-with-live-output, WebSocket, WebRTC) | OK to keep direct — Gateway doesn't front these yet. Confirm any auth uses `@requires_*`, not `os.getenv`. |
| Calls another agent via A2A | OK to keep direct — A2A is HTTP-by-design. Confirm it uses `@requires_access_token` for the bearer token. |
| Calls a measured latency hot path and the team chose it | OK, but confirm measurement exists and auth uses `@requires_*`. |
If the hit fits none of the "OK to keep direct" rows, open a ticket to convert it to a Gateway target. Gateway targets can be added without a code change in the agent for most framework integrations (MCP tool discovery handles binding).
---
### Observability: Verify tracing is enabled
AgentCore enables X-Ray tracing and CloudWatch logging automatically. Verify:
```bash
agentcore status --runtime <AgentName> --json | jq '.runtimes[0].observabilityConfig'
```

**CloudWatch dashboard:** AWS Console → CloudWatch → GenAI Observability → Bedrock AgentCore

**Log retention:** By default, logs are retained indefinitely. Set a retention policy for cost control:

```bash
aws logs put-retention-policy \
  --log-group-name /aws/bedrock-agentcore/runtimes/<AGENT_ID>-DEFAULT \
  --retention-in-days 30
```
### Evaluation baseline: Know your quality before launch
Before going to production, establish a quality baseline so you can detect regressions:
```bash
# Run a baseline eval
agentcore run eval \
  --evaluator "Builtin.Helpfulness" \
  --evaluator "Builtin.GoalSuccessRate"
```
Set up continuous monitoring:

```bash
agentcore add online-eval \
  --name production_monitor \
  --runtime <AgentName> \
  --evaluator "Builtin.Helpfulness" \
  --sampling-rate 5
agentcore deploy -y
```
Record the baseline scores. If scores drop significantly after a change, investigate before continuing.
---
### Network: VPC for private resources
If your agent accesses private AWS resources (RDS, internal APIs), configure VPC:
```bash
agentcore add agent \
  --name MyAgent \
  --network-mode VPC \
  --subnets subnet-abc,subnet-def \
  --security-groups sg-123
```

See `agents-build` (loads `references/vpc.md`) for full VPC configuration guidance.
### Initialization time: Optimize cold start performance
Slow agent initialization causes timeouts, 424 errors, and poor user experience — especially on first invocation after a period of inactivity. Everything the agent does before it's ready to handle a request adds to the time users wait.
#### Where cold start time actually goes
A typical cold start for a new environment takes around 20–30 seconds. The breakdown, roughly:
- Container image pull — dominates for Container builds. A 100 MB image takes a few seconds; a 500 MB image can take 15+ seconds.
- Application startup — your code's import time, framework init, module-level setup. Usually 5–10 seconds, can be much more if you're loading models or opening connections at import.
- Platform overhead (microVM boot, network attach, container start) — sub-second to a couple of seconds.
The two you control are image size and application startup. Optimizing either one directly reduces time to first response.
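To see which imports dominate your application startup, you can time them in isolation. A sketch; run it in a fresh interpreter so nothing is already cached, and substitute your own module names:

```python
import importlib
import time

def time_import(module_name: str) -> float:
    """Return seconds spent importing a module (meaningful on first import only)."""
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start
```

Anything that takes more than a fraction of a second at import is a candidate for the deferred-initialization pattern below.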
#### Session reuse is the highest-leverage optimization
Same-session requests route to an existing initialized environment — no cold start. The first request per session pays the cold-start cost; every subsequent request on that session is fast.
Concrete patterns:
- **Multi-turn conversations:** reuse the same `session_id` across turns. Don't generate a new UUID per turn.
- **Batch processing:** reuse the same `session_id` across items in the batch.
- **User-facing apps:** scope a session to a user interaction (e.g., one session per chat conversation), not one session per message.

**Cross-SDK note:** if you're using MCP, pass one session identifier, not both `runtimeSessionId` and `mcpSessionId` at once. Sending both can cause the platform to bind two separate environments to the same logical session, doubling cold-start cost.
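A sketch of the session-per-conversation pattern. The in-process dict stands in for wherever you persist conversation state, and the generated ID length is padded to stay above the runtime's minimum session-ID length (verify the current limit in the API docs):

```python
import uuid

# Maps a conversation key to its runtime session ID (persist this in a real app).
_sessions: dict[str, str] = {}

def session_for(conversation_id: str) -> str:
    """Return a stable runtime session ID for a conversation."""
    if conversation_id not in _sessions:
        # Two UUID4 hex strings give a 64-char ID, well above the minimum length.
        _sessions[conversation_id] = uuid.uuid4().hex + uuid.uuid4().hex
    return _sessions[conversation_id]
```

Pass the returned value as the runtime session identifier on every turn of that conversation so requests land on the same warm environment.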
#### Package size budget
Every MB of deployment package adds to cold-start time.
- **Target:** under 200 MB. Aim for under 100 MB if you can.
- **For Container builds:** multi-stage Dockerfiles, slim or distroless base images, remove build tools and test files, add a `.dockerignore`.
- **For CodeZip builds:** prune dev dependencies from `pyproject.toml`/`requirements.txt`. Don't ship `tests/`, `docs/`, `.git/`, or local caches.
- **Audit regularly:** `pip list` (Python) or `npm ls` (Node) will show you what's actually installed. Remove anything you're not using.
#### Defer heavy initialization
Don't load large models, connect to databases, or initialize MCP clients at module import time. Every second spent in module import is a second the agent can't respond to requests.
```python
# ❌ Slow — runs at import time, before the agent can handle requests
import heavy_library
client = heavy_library.Client(config)
```

```python
# ✅ Fast — defers until first request
_client = None

def get_client():
    global _client
    if _client is None:
        import heavy_library
        _client = heavy_library.Client(config)
    return _client
```
undefinedChoose deployment type based on traffic pattern, not by default
The skill previously recommended CodeZip over Container when possible. That's an oversimplification. Here's the real trade-off:
- **CodeZip:** simpler to iterate on, smaller surface area. Cold start includes code download + extract — a ~95 MB package adds around 1.3 seconds of platform download before application startup even begins.
- **Container:** you control the full image, needed for custom system dependencies. Larger images cost more per cold start, but you can optimize aggressively with multi-stage builds.
Neither wins universally. Both benefit the same way from session reuse and from keeping the package small. If your traffic pattern has lots of bursty cold sessions, invest in shrinking whichever deployment artifact you're using. If your traffic pattern reuses sessions, the deployment type matters much less.
#### For Lambda targets behind Gateway
Use provisioned concurrency on the Lambda function to eliminate Lambda cold starts. This is separate from Runtime initialization — it's the Lambda itself that adds latency on first invocation of a cold Lambda.
### Session lifecycle management
Session management is tightly linked to cost, performance, and the `maxVms` quota. Getting this right is often the difference between a smooth production launch and a quota-blocked one.
#### The default lifecycle
When a request arrives with a new session ID, the runtime initializes a fresh environment for it. That environment stays alive until one of:
- The session is explicitly stopped via `StopRuntimeSession`.
- The idle timeout expires: the runtime reclaims environments that haven't received a request for `idleRuntimeSessionTimeout` (default 900 seconds).
- The maximum lifetime is reached (`maxLifetime`, default 8 hours).

Idle environments count against your `maxVms` quota until they're reclaimed, even though they're not serving traffic. This is the #1 cause of unexpected `maxVms` errors.
- 会话被显式停止——通过。
StopRuntimeSession - 空闲超时到期——运行时会回收(默认900秒)内未收到请求的环境。
idleRuntimeSessionTimeout - 达到最大生命周期(,默认8小时)。
maxLifetime
空闲环境在被回收前会占用你的配额,即使它们没有处理流量。这是错误的最常见原因。
Pick timeouts by workload shape
Don't leave defaults for production. Pick values that match how your workload actually uses sessions:
| Workload | `idleRuntimeSessionTimeout` | `maxLifetime` | Reasoning |
|---|---|---|---|
| Interactive chat / support agent | 600–900s (default) | 3600–7200s | Users pause to read/think. Reclaim fast after they leave. |
| Request/reply API with no follow-up | 60–120s | 1800s | Each call is self-contained — release the VM quickly. |
| Batch processing, one session per job | 120s | match job length + buffer | Idle gap between items in the batch is small; reclaim aggressively between jobs. |
| Background / long-running tasks (use `add_async_task`) | 120–300s | up to 28800s (8h) | Async task API keeps the VM alive during tracked work; idle timeout applies between tasks. |
Trade-offs at a glance:
- Low idle timeout = more headroom under `maxVms`, lower cost. Risk: reclaim mid-conversation causing next turn to cold-start.
- High idle timeout = warm turns, lower latency. Risk: idle VMs consume quota; `maxVms` errors on bursts.
- Low max lifetime = predictable recycle, bounds memory leaks / stale state. Risk: active long sessions get killed mid-flow.
- High max lifetime = sticky sessions, big warm-state savings. Risk: drift, stale in-memory state, harder rollouts.
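One way to keep these choices explicit in code is a preset map you apply at deploy time — a sketch with illustrative values taken from the table, not official presets:

```python
# Hypothetical presets encoding the workload table; values in seconds.
LIFECYCLE_PRESETS = {
    "interactive_chat": {"idleRuntimeSessionTimeout": 900, "maxLifetime": 7200},
    "request_reply":    {"idleRuntimeSessionTimeout": 120, "maxLifetime": 1800},
    "batch_per_job":    {"idleRuntimeSessionTimeout": 120, "maxLifetime": 3600},   # job length + buffer
    "background_tasks": {"idleRuntimeSessionTimeout": 300, "maxLifetime": 28800},  # 8h hard ceiling
}

def lifecycle_for(workload):
    preset = LIFECYCLE_PRESETS[workload]
    # Idle timeout above maxLifetime would never fire; catch it early.
    assert preset["idleRuntimeSessionTimeout"] < preset["maxLifetime"]
    return preset
```

Writing the chosen preset into `lifecycleConfiguration` (shown below) keeps the reasoning reviewable next to the config.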
Best practices
**Call `StopRuntimeSession` when the work is done.** If your agent finishes a task and doesn't expect more requests on that session, explicitly stop it. This releases the environment immediately instead of waiting for idle timeout.
After your invocation logic completes and you know the session is done:
```python
client.stop_runtime_session(
    agentRuntimeArn=runtime_arn,
    runtimeSessionId=session_id,
)
```
**Reuse session IDs for related work.** A new session ID for every HTTP request means a new environment for every HTTP request. For multi-turn conversations, batch jobs, or user-facing interactions, use one session ID per conversation/batch/user-interaction and route all related requests to it.
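A minimal sketch of that pattern, assuming a data-plane client that exposes `invoke_agent_runtime` and `stop_runtime_session` (the stop call matches the snippet above; the invoke parameter names are an assumption to verify against your SDK):

```python
import uuid

class ConversationSession:
    """One runtime session per conversation: mint the ID once,
    reuse it for every turn, stop the session when done."""

    def __init__(self, client, runtime_arn):
        self.client = client
        self.runtime_arn = runtime_arn
        self.session_id = str(uuid.uuid4())  # minted once, reused every turn

    def send(self, payload):
        return self.client.invoke_agent_runtime(
            agentRuntimeArn=self.runtime_arn,
            runtimeSessionId=self.session_id,  # same ID => same warm environment
            payload=payload,
        )

    def close(self):
        # Release the VM immediately instead of waiting for idle timeout.
        self.client.stop_runtime_session(
            agentRuntimeArn=self.runtime_arn,
            runtimeSessionId=self.session_id,
        )
```

Every `send` on the same object hits the same warm environment; `close` frees the `maxVms` slot as soon as the conversation ends.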
**Tune `idleRuntimeSessionTimeout` to your workload.** The default 900 seconds is appropriate for interactive workloads where you expect quick follow-up requests. For request-reply workloads where sessions are short-lived, lower it.
Edit the runtime's entry in `agentcore/agentcore.json`:
```json
{
"runtimes": [
{
"name": "MyAgent",
"lifecycleConfiguration": {
"idleRuntimeSessionTimeout": 120,
"maxLifetime": 3600
}
}
]
}
```
Then run `agentcore deploy` to apply. The CLI and CDK handle the underlying `UpdateAgentRuntime` call for you.
If you prefer the CLI, `agentcore add agent ... --idle-timeout 120 --max-lifetime 3600` writes the same fields into `agentcore.json`. The file is the source of truth — every field in it has IDE autocomplete via the `$schema` URL at the top of the file (`https://schema.agentcore.aws.dev/v1/agentcore.json`).
Lower timeout = faster VM reclamation = more headroom under `maxVms`. Too low = environments get reclaimed mid-conversation, causing the next turn to cold-start.
Don't pass both `runtimeSessionId` and `mcpSessionId` together. For MCP agents, use one. Passing both can bind two separate VMs to the same logical session.
runtimeSessionIdmcpSessionIdDiagnosing maxVms
problems
If you hit `ServiceQuotaExceededException: maxVms limit exceeded`, don't request a quota increase first. CloudWatch's concurrent-sessions metric is not the same as live VM count — idle environments count against the quota until reclaimed.
Work through this order:
- Add `StopRuntimeSession` after each logical request completes
- Audit session-ID generation — are you creating a new ID per request that should reuse one?
- Lower `idleRuntimeSessionTimeout` if your sessions are short-lived
- Only then, if you've done all of the above and still hit the limit, request an increase
See `references/limits.md` for the increase-request workflow (via the Service Quotas console) and the justification template.
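The first mitigation can be enforced with a small context manager, so the stop call runs even when the handler raises (client construction and error reporting are left to you):

```python
from contextlib import contextmanager

@contextmanager
def runtime_session(client, runtime_arn, session_id):
    """Guarantee StopRuntimeSession fires after each logical request,
    releasing the VM immediately instead of holding a maxVms slot."""
    try:
        yield session_id
    finally:
        client.stop_runtime_session(
            agentRuntimeArn=runtime_arn,
            runtimeSessionId=session_id,
        )
```

Wrap each self-contained request in `with runtime_session(...)` and the `finally` block frees the slot on success and on failure alike.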
references/limits.mdLong-running background tasks
If your agent fires off work that outlives the `/invocations` response — background processing, async jobs, long tool chains — a fire-and-forget pattern isn't enough. The environment can be reclaimed at `idleRuntimeSessionTimeout` even while your background task is still running, because the runtime considers the session idle once the invocation response is sent.
Use the SDK's async task API to signal "still busy"
The bedrock-agentcore SDK provides task registration that keeps the environment alive while tracked work runs. In Python:
```python
from bedrock_agentcore.runtime import BedrockAgentCoreApp

app = BedrockAgentCoreApp()

@app.entrypoint
def invoke(payload, context):
    # Register the task BEFORE starting it
    task_id = app.add_async_task("background_work")
    # Kick off the work (in a thread, asyncio, etc.)
    start_background_work(task_id, payload)
    # Return the invocation response — the task is still tracked
    return {"status": "processing", "taskId": task_id}

def start_background_work(task_id, payload):
    try:
        # Long-running work here
        do_the_work(payload)
    finally:
        # Mark the task complete when done — this releases the "busy" signal
        app.complete_async_task(task_id)

if __name__ == "__main__":
    app.run()
```
While at least one registered task is active, the runtime sees the environment as busy and doesn't reclaim it at `idleRuntimeSessionTimeout`. `maxLifetime` (default 8 hours) still applies as a hard ceiling.
Check the bedrock-agentcore SDK docs for your language for the equivalent API — the TypeScript SDK has an analogous pattern.
Alternatives when async task API isn't an option
- Increase `idleRuntimeSessionTimeout` to match your expected task duration. If you know tasks run up to 10 minutes, set the timeout to 12 minutes. Keep it well under `maxLifetime`.
- Keep the HTTP connection open with a streaming response and emit periodic heartbeat events. Useful when you want the caller to wait for the result rather than polling. See the SSE keepalive pattern in `agents-debug/SKILL.md` ("Connection drops mid-stream" section).
- Split long work across multiple invocations on the same session. Each invocation resets the idle clock.
Quotas and limits
If you're hitting throttling, `ServiceQuotaExceededException`, or any other quota-related error — or you're about to launch and want to make sure quotas won't block you — load `references/limits.md`.
That reference covers:
- Which quota each error maps to
- Mitigations to try before requesting an increase (critical — most "quota" errors are actually session-lifecycle issues)
- How to request an increase through the Service Quotas console (the edge case where a direct Support case is needed is rare)
- A copy-paste justification template with everything a reviewer needs to approve
Production checklist summary
Generate a checklist specific to the project:
Production Readiness Checklist for <AgentName>
IAM
[ ] Execution role Bedrock access scoped to specific model ARNs
[ ] ECR access scoped to specific repository
[ ] Trust policy scoped to your account ID
Authentication
[ ] Inbound auth is AWS_IAM or CUSTOM_JWT (not NONE)
[ ] If CUSTOM_JWT: discovery URL, audience, and client IDs configured
Shell Access (if using InvokeAgentRuntimeCommand)
[ ] InvokeAgentRuntimeCommand permission granted only to identities that need it
[ ] Separate IAM policy from InvokeAgentRuntime policy
[ ] CloudTrail / EventBridge alert configured for InvokeAgentRuntimeCommand calls
[ ] If commands constructed from user input: shell injection validation implemented
Code quality
[ ] Error handling wraps all agent logic
[ ] Input validation on payload fields (type, length, format)
[ ] No secrets hardcoded in agent code
[ ] Credentials registered via agentcore add credential
Observability
[ ] X-Ray tracing enabled (auto-configured)
[ ] CloudWatch log retention policy set
[ ] Eval baseline established
Performance
[ ] Agent initialization time measured and optimized
[ ] Deployment package size under 200 MB (target under 100 MB)
[ ] Dependencies audited — no unused packages
[ ] Heavy initialization deferred to request time
[ ] Session reuse strategy chosen for multi-turn / batch workloads
[ ] `StopRuntimeSession` called after work completes where applicable
[ ] `idleRuntimeSessionTimeout` tuned to workload (default 900s)
[ ] For long-running background tasks: `add_async_task` / `complete_async_task` used
Resources
[ ] Memory strategies appropriate for use case (if using memory)
[ ] Gateway auth configured (if using gateway)
[ ] Policy engine attached (if restricting tool access)
Testing
[ ] Agent tested with production-representative inputs
[ ] Error cases tested (tool failures, model errors)
[ ] Memory cross-session tested (if using LTM)
Output
- Checklist with specific findings for the project
- Specific commands to fix any issues found
- Recommended IAM policy for the detected model and resources