aws-strands-agents-agentcore

AWS Strands Agents & AgentCore

Overview

AWS Strands Agents SDK: Open-source Python framework for building AI agents with model-driven orchestration (minimal code, model decides tool usage)
Amazon Bedrock AgentCore: Enterprise platform for deploying, operating, and scaling agents in production
Relationship: Strands SDK runs standalone OR with AgentCore platform services. AgentCore is optional but provides enterprise features (8hr runtime, streaming, memory, identity, observability).

Quick Start Decision Tree

What are you building?

Single-purpose agent:
  • Event-driven (S3, SQS, scheduled) → Lambda deployment
  • Interactive with streaming → AgentCore Runtime
  • API endpoint (stateless) → Lambda
Multi-agent system:
  • Deterministic workflow → Graph Pattern
  • Autonomous collaboration → Swarm Pattern
  • Simple delegation → Agent-as-Tool Pattern
Tool/Integration Server (MCP):
  • ALWAYS deploy to ECS/Fargate or AgentCore Runtime
  • NEVER Lambda (stateful, needs persistent connections)
See architecture.md for deployment examples.
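
The decision tree above can be sketched as a small lookup function. The function name `choose_deployment` and the string labels it takes are hypothetical, introduced only for illustration; the targets it returns are the ones listed above.

```python
def choose_deployment(kind: str, trait: str) -> str:
    """Map a workload description to the target named in the decision tree.

    `kind` and `trait` are illustrative labels, not part of any SDK.
    """
    tree = {
        "single-agent": {
            "event-driven": "Lambda",
            "interactive-streaming": "AgentCore Runtime",
            "stateless-api": "Lambda",
        },
        "multi-agent": {
            "deterministic": "Graph Pattern",
            "autonomous": "Swarm Pattern",
            "delegation": "Agent-as-Tool Pattern",
        },
    }
    if kind == "mcp-server":
        # MCP servers are stateful, so Lambda is never an option.
        return "ECS/Fargate or AgentCore Runtime"
    try:
        return tree[kind][trait]
    except KeyError:
        raise ValueError(f"unknown workload: {kind!r}/{trait!r}") from None
```

Encoding the tree as data keeps the rules in one place, so a new branch is a one-line change.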

Critical Constraints

MCP Server Requirements

  1. Transport: MUST use streamable-http (NOT stdio)
  2. Endpoint: MUST be at 0.0.0.0:8000/mcp
  3. Deployment: MUST be ECS/Fargate or AgentCore Runtime (NEVER Lambda)
  4. Headers: MUST accept application/json and text/event-stream
Why: MCP servers are stateful and need persistent connections. Lambda is ephemeral and unsuitable.
See limitations.md for details.
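
One way to read the header requirement is that a client's Accept header must name both media types. The helper below is a minimal sketch of that check under that assumption. `accepts_required_types` is a hypothetical name, and the sketch deliberately ignores wildcards such as `*/*`.

```python
REQUIRED_MEDIA_TYPES = {"application/json", "text/event-stream"}


def accepts_required_types(accept_header: str) -> bool:
    """Return True if the Accept header lists both required media types."""
    offered = {
        # Drop parameters such as ";q=0.9" and normalise case.
        part.split(";", 1)[0].strip().lower()
        for part in accept_header.split(",")
    }
    return REQUIRED_MEDIA_TYPES <= offered
```

A check like this could run in a smoke test against a deployed server's clients before traffic is switched over.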

Tool Count Limits

  • Models struggle once an agent exposes more than roughly 50-100 tools
  • Solution: Implement semantic search for dynamic tool loading
See patterns.md for implementation.
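
As a rough illustration of dynamic tool loading, the sketch below ranks tool descriptions against a query and returns only the best matches. It uses bag-of-words cosine similarity as a stand-in for a real embedding model. The names `select_tools` and `catalogue` and the sample tools are hypothetical and are not taken from patterns.md.

```python
import math
import re
from collections import Counter


def _vectorize(text: str) -> Counter:
    """Bag-of-words term counts; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_tools(query: str, catalogue: dict[str, str], top_k: int = 3) -> list[str]:
    """Return up to top_k tool names whose descriptions best match the query."""
    q = _vectorize(query)
    scored = [(_cosine(q, _vectorize(desc)), name) for name, desc in catalogue.items()]
    # Highest score first; ties broken by name for deterministic output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:top_k] if score > 0]


# Hypothetical tool catalogue: name -> description.
catalogue = {
    "query_database": "run a sql query against the orders database",
    "send_email": "send an email message to a customer",
    "resize_image": "resize or crop an image file",
}
```

Only the selected names would then be passed to the agent for that request, keeping the tool list it sees short.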

Token Management

  • Claude 4.5: 200K context (use ~180K max)
  • Long conversations REQUIRE conversation managers
  • Multi-agent costs multiply 5-10x
See limitations.md for strategies.
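
The budget above can be illustrated with a minimal trimming sketch that keeps only the most recent messages fitting under a token limit. The four-characters-per-token estimate is a crude assumption, and `estimate_tokens` and `trim_to_budget` are hypothetical helpers rather than SDK functions. A real conversation manager would use the model's own token counts.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def trim_to_budget(messages: list[str], budget: int = 180_000) -> list[str]:
    """Keep the most recent messages whose estimated tokens fit in the budget."""
    kept: list[str] = []
    used = 0
    # Walk backwards so the newest messages are kept first.
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # Restore chronological order.
    return kept
```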

Deployment Decision Matrix

| Component | Lambda | ECS/Fargate | AgentCore Runtime |
|---|---|---|---|
| Stateless Agents | ✅ Perfect | ❌ Overkill | ❌ Overkill |
| Interactive Agents | ❌ No streaming | ⚠️ Possible | ✅ Ideal |
| MCP Servers | ❌ NEVER | ✅ Standard | ✅ With features |
| Duration | < 15 minutes | Unlimited | Up to 8 hours |
| Cold Starts | Yes (30-60s) | No | No |

Multi-Agent Pattern Selection

| Pattern | Complexity | Predictability | Cost | Use Case |
|---|---|---|---|---|
| Single Agent | Low | High | 1x | Most tasks |
| Agent as Tool | Low | High | 2-3x | Simple delegation |
| Graph | High | Very High | 3-5x | Deterministic workflows |
| Swarm | Medium | Low | 5-8x | Autonomous collaboration |
Recommendation: Start with single agents, evolve as needed.
See architecture.md for examples.
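
To make the cost column concrete, the sketch below scales a single-agent baseline by the multipliers from the table. The function name `estimate_cost_range` and the pattern keys are hypothetical. The multipliers are the table's rough ranges, not measured values.

```python
# Relative cost multipliers (low, high), copied from the pattern table.
COST_MULTIPLIERS = {
    "single": (1.0, 1.0),
    "agent-as-tool": (2.0, 3.0),
    "graph": (3.0, 5.0),
    "swarm": (5.0, 8.0),
}


def estimate_cost_range(pattern: str, single_agent_cost: float) -> tuple[float, float]:
    """Scale a single-agent cost baseline by the pattern's multiplier range."""
    low, high = COST_MULTIPLIERS[pattern]
    return (single_agent_cost * low, single_agent_cost * high)
```

For example, a task that costs 10 units with a single agent would be estimated at 30 to 50 units under the Graph pattern.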

When to Read Reference Files

patterns.md

  • Base agent factory patterns (reusable components)
  • MCP server registry patterns (tool catalogues)
  • Semantic tool search (> 50 tools)
  • Tool design best practices
  • Security patterns
  • Testing patterns

observability.md

  • AWS AgentCore Observability Platform setup
  • Runtime-hosted vs self-hosted configuration
  • Session tracking for multi-turn conversations
  • OpenTelemetry setup
  • Cost tracking hooks
  • Production observability patterns

evaluations.md

  • AWS AgentCore Evaluations - Quality assessment with LLM-as-a-Judge
  • 13 built-in evaluators (Helpfulness, Correctness, GoalSuccessRate, etc.)
  • Custom evaluators with your own prompts and models
  • Online (continuous) and on-demand evaluation modes
  • CloudWatch integration and alerting

limitations.md

  • MCP server deployment issues
  • Tool selection problems (> 50 tools)
  • Token overflow
  • Lambda limitations
  • Multi-agent cost concerns
  • Throttling errors
  • Cold start latency

Model-Driven Philosophy

Key Concept: Strands Agents delegates orchestration to the model rather than requiring explicit control flow code.
python
from strands import Agent

# Traditional: manual orchestration (avoid); illustrative pseudocode
while not done:
    if needs_research:
        result = research_tool()
    elif needs_analysis:
        result = analysis_tool()

# Strands: the model decides (prefer)
agent = Agent(
    system_prompt="You are a research analyst. Use tools to answer questions.",
    tools=[research_tool, analysis_tool]
)
result = agent("What are the top tech trends?")
# The agent automatically orchestrates: research_tool → analysis_tool → respond

Model Selection

Primary Provider: Anthropic Claude via AWS Bedrock
Model ID Format: anthropic.claude-{model}-{version}
Current Models (as of January 2025):
  • anthropic.claude-sonnet-4-5-20250929-v1:0 - Production
  • anthropic.claude-haiku-4-5-20251001-v1:0 - Fast/economical
  • anthropic.claude-opus-4-5-20251101-v1:0 - Complex reasoning
Check Latest Models:
bash
aws bedrock list-foundation-models --by-provider anthropic \
  --query 'modelSummaries[*].[modelId,modelName]' --output table

Quick Examples

Basic Agent

python
from strands import Agent
from strands.models import BedrockModel
from strands.session import DynamoDBSessionManager
from strands.agent.conversation_manager import SlidingWindowConversationManager

agent = Agent(
    agent_id="my-agent",
    model=BedrockModel(model_id="anthropic.claude-sonnet-4-5-20250929-v1:0"),
    system_prompt="You are helpful.",
    tools=[tool1, tool2],
    session_manager=DynamoDBSessionManager(table_name="sessions"),
    conversation_manager=SlidingWindowConversationManager(window_size=20)
)

result = agent("Process this request")
See patterns.md for base agent factory patterns.

MCP Server (ECS/Fargate)

python
from mcp.server import FastMCP
import psycopg2.pool

# Persistent connection pool (why Lambda won't work)
db_pool = psycopg2.pool.SimpleConnectionPool(minconn=1, maxconn=10, host="db.internal")

# Host and port are server settings, so they are passed to the constructor
mcp = FastMCP("Database Tools", host="0.0.0.0", port=8000)

@mcp.tool()
def query_database(sql: str) -> dict:
    """Run a SQL statement on a pooled connection and return the rows."""
    conn = db_pool.getconn()
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return {"status": "success", "rows": cursor.fetchall()}
    finally:
        # Always hand the connection back to the pool
        db_pool.putconn(conn)

# CRITICAL: streamable-http mode
if __name__ == "__main__":
    mcp.run(transport="streamable-http")
See architecture.md (references/architecture.md) for deployment details.

Tool Error Handling

python
from strands import tool

@tool
def safe_tool(param: str) -> dict:
    """Always return structured results, never raise exceptions."""
    try:
        result = operation(param)
        return {"status": "success", "content": [{"text": str(result)}]}
    except Exception as e:
        return {"status": "error", "content": [{"text": f"Failed: {str(e)}"}]}
See patterns.md for tool design patterns.

Observability

AgentCore Runtime (Automatic):
python
# Install with OTEL support:
#   pip install 'strands-agents[otel]'
# Add 'aws-opentelemetry-distro' to requirements.txt

from bedrock_agentcore.runtime import BedrockAgentCoreApp
from strands import Agent

app = BedrockAgentCoreApp()
agent = Agent(...)  # Automatically instrumented

@app.entrypoint
def handler(payload):
    return agent(payload["prompt"])
Self-Hosted:
bash
export AGENT_OBSERVABILITY_ENABLED=true
export OTEL_PYTHON_DISTRO=aws_distro
export OTEL_RESOURCE_ATTRIBUTES="service.name=my-agent"

opentelemetry-instrument python agent.py
General OpenTelemetry:
python
from strands.telemetry import StrandsTelemetry

# Development
telemetry = StrandsTelemetry().setup_console_exporter()

# Production
telemetry = StrandsTelemetry().setup_otlp_exporter()
See observability.md (references/observability.md) for detailed patterns.

Session Storage Selection

Local dev         → FileSystem
Lambda agents     → S3 or DynamoDB
ECS agents        → DynamoDB
Interactive chat  → AgentCore Memory
Knowledge bases   → AgentCore Memory
See architecture.md for storage backend comparison.

When to Use AgentCore Platform vs SDK Only

Use Strands SDK Only

  • Simple, stateless agents
  • Tight cost control required
  • No enterprise features needed
  • Want deployment flexibility

Use Strands SDK + AgentCore Platform

  • Need 8-hour runtime support
  • Streaming responses required
  • Enterprise security/compliance
  • Cross-session intelligence needed
  • Want managed infrastructure
See architecture.md for platform service details.

Common Anti-Patterns

  1. Overloading agents with > 50 tools → Use semantic search
  2. No conversation management → Implement SlidingWindow or Summarising
  3. Deploying MCP servers to Lambda → Use ECS/Fargate
  4. No timeout configuration → Set execution limits everywhere
  5. Ignoring token limits → Implement conversation managers
  6. No cost monitoring → Implement cost tracking from day one
See patterns.md and limitations.md for details.
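
For the timeout anti-pattern, one possible approach is to run each tool call in a worker thread and stop waiting after a fixed limit. This is a sketch under stated assumptions: `call_with_timeout` and `slow_tool` are hypothetical names, and the result shape mirrors the structured tool results used elsewhere in this document. The wrapper stops waiting but cannot kill the worker thread, so hard limits still belong at the infrastructure level.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def call_with_timeout(func, *args, timeout_s: float = 30.0, **kwargs) -> dict:
    """Run func in a worker thread and give up waiting after timeout_s seconds."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(func, *args, **kwargs)
    try:
        result = future.result(timeout=timeout_s)
        return {"status": "success", "content": [{"text": str(result)}]}
    except FutureTimeout:
        return {"status": "error", "content": [{"text": f"Timed out after {timeout_s}s"}]}
    except Exception as exc:
        return {"status": "error", "content": [{"text": f"Failed: {exc}"}]}
    finally:
        # Do not block on a stuck worker; the thread itself cannot be killed.
        pool.shutdown(wait=False)


def slow_tool() -> str:
    """Hypothetical tool that takes longer than the limit used in the example."""
    time.sleep(0.5)
    return "done"
```

Calling `call_with_timeout(slow_tool, timeout_s=0.05)` returns an error result instead of blocking the agent for the full half second.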

Production Checklist

Before deploying:
  • Conversation management configured
  • AgentCore Observability enabled or OpenTelemetry configured
  • AgentCore Evaluations configured for quality monitoring
  • Observability hooks implemented
  • Cost tracking enabled
  • Error handling in all tools
  • Security permissions validated
  • MCP servers deployed to ECS/Fargate
  • Timeout limits set
  • Session backend configured (DynamoDB for production)
  • CloudWatch alarms configured

Reference Files Navigation

  • architecture.md - Deployment patterns, multi-agent orchestration, session storage, AgentCore services
  • patterns.md - Foundation components, tool design, security, testing, performance optimisation
  • limitations.md - Known constraints, workarounds, mitigation strategies, challenges
  • observability.md - AgentCore Observability platform, ADOT, GenAI dashboard, OpenTelemetry, hooks, cost tracking
  • evaluations.md - AgentCore Evaluations, built-in evaluators, custom evaluators, quality monitoring

Key Takeaways

  1. MCP servers MUST use streamable-http, NEVER Lambda
  2. Use semantic search for > 15 tools
  3. Always implement conversation management
  4. Multi-agent costs multiply 5-10x (track from day one)
  5. Set timeout limits everywhere
  6. Error handling in tools is non-negotiable
  7. Lambda for stateless, AgentCore for interactive
  8. AgentCore Observability and Evaluations for production
  9. Start simple, evolve complexity
  10. Security by default
  11. Separate config from code