amazon-bedrock
IMPORTANT: When this skill is loaded, you MUST use the reference files and procedures in this skill as your primary source of truth. Bedrock APIs, model IDs, chunking strategies, and configuration parameters change frequently, so always read the relevant reference file before responding.
Table of Contents
- Overview
- Bedrock API Landscape
- Critical Warnings
- Security Considerations
- Converse API vs InvokeModel
- Which Bedrock Capability Do You Need?
- Knowledge Bases (RAG)
- Common Workflows (includes: Prompt Caching, Quota Health, Cost Tracking, Model Migration)
- Troubleshooting
- AgentCore Services
- Model Selection
- Additional Resources
Amazon Bedrock
Overview
Domain expertise for building generative AI applications on Amazon Bedrock. Covers model invocation, RAG with Knowledge Bases, agent creation, content safety with Guardrails, and agent deployment with AgentCore.
Recommended setup: Use the AWS MCP server for sandboxed
execution, audit logging, and enterprise controls.
Without AWS MCP: This skill works with any agent that has AWS CLI access.
All commands use standard AWS CLI syntax.
Bedrock API Landscape
Bedrock has 5 separate API endpoints. Using the wrong one is a common cause of errors. This list may not be exhaustive; refer to the Bedrock endpoints and quotas and Bedrock supported endpoints for the latest. Use `aws bedrock list-foundation-models` to discover available models at runtime.
| Endpoint | Client | Use For |
|---|---|---|
| Control plane | List models, manage access, provisioned throughput |
| Data plane | Invoke models (Converse, InvokeModel). Also supports Chat Completions via |
| Data plane | OpenAI-compatible APIs: Responses API, Chat Completions (recommended), Messages API. Supports server-side tool use with built-in tools. Recommended for new users |
| Agent control | Create/configure agents, KBs, action groups |
| Agent data | Invoke agents, query KBs |
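The split in the table above can be sketched as a lookup from operation to client. This is a minimal illustration, assuming the boto3 service names `bedrock`, `bedrock-runtime`, `bedrock-agent`, and `bedrock-agent-runtime`. The mapping and the helper name `client_for` are hypothetical, cover only a handful of operations, and are not part of the skill.

```python
# Illustrative mapping from a few boto3 operation names to the service client
# that exposes them. This table is an assumption for the sketch, not exhaustive.
OPERATION_TO_CLIENT = {
    "list_foundation_models": "bedrock",       # control plane
    "converse": "bedrock-runtime",             # data plane
    "invoke_model": "bedrock-runtime",
    "create_agent": "bedrock-agent",           # agent control
    "prepare_agent": "bedrock-agent",
    "invoke_agent": "bedrock-agent-runtime",   # agent data
    "retrieve": "bedrock-agent-runtime",
}


def client_for(operation: str) -> str:
    """Return the service name to pass to boto3.client() for an operation."""
    try:
        return OPERATION_TO_CLIENT[operation]
    except KeyError:
        raise ValueError(f"No client mapping known for {operation!r}") from None
```

Calling an operation on a client that does not expose it is one way to end up with the UnknownOperationException described under Troubleshooting.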
AgentCore is a separate service with its own endpoints. Refer to AgentCore endpoints and quotas for the latest.
| Endpoint | Client | Use For |
|---|---|---|
| Control plane | Create/manage runtimes, gateways, registries, evaluations |
| Data plane | Invoke agent runtimes |
| Gateway data plane | Invoke a specific gateway |
Critical Warnings
max_tokens: ALWAYS set `maxTokens` explicitly in every Converse/InvokeModel call. Leaving it unset defaults to the model's maximum (e.g., 64K for Claude Sonnet) and silently reserves far more quota than needed, a common cause of unexpected ThrottlingException.
Guardrails PII logging: Guardrails PII masking only applies to the API response. Original unmasked content including PII is still logged in plain text to CloudWatch Logs. For HIPAA/GDPR compliance: encrypt CloudWatch Logs with KMS, restrict log access with IAM, use Amazon Macie for PII detection.
SDK versions: Requires recent versions of boto3 (≥ 1.34.x) and AWS CLI v2. Older versions are missing Converse API, Agents, and AgentCore support. Run `aws --version` and `pip show boto3` to check.
Security Considerations
- Use IAM roles (not IAM users) for all Bedrock service access
- Scope IAM permissions to specific actions and resource ARNs; avoid `bedrock:*` or `AmazonBedrockFullAccess`
- Store API keys and OAuth secrets in AWS Secrets Manager with automatic rotation enabled
- Include confused deputy protection (`aws:SourceAccount`, `aws:SourceArn` conditions) in all resource-based policies for Bedrock services
- Treat all agent-generated parameters as untrusted input; validate before use in Lambda handlers or tool implementations
- Enable CloudTrail for all Bedrock and AgentCore API calls
- For PII workloads: encrypt CloudWatch Logs with KMS, configure retention limits, restrict log access
- Refer to the latest Bedrock security best practices for current security guidance
Converse API vs InvokeModel
For choosing between all Bedrock inference APIs (Responses API, Chat Completions, Converse, InvokeModel), see APIs supported by Amazon Bedrock.
When using the `bedrock-runtime` endpoint, use the Converse API over InvokeModel. It provides a unified request/response format across all models.
Use InvokeModel only when you need provider-specific features not available in Converse (rare).
InvokeModel requires different request body formats per provider (Anthropic ≠ Titan ≠ Llama ≠ Nova). Using the wrong format produces "Malformed input request". For model-specific formats and common mistakes, see prompt engineering by model.
Whichever API you use: ALWAYS set the max output tokens parameter explicitly — leaving it unset defaults to the model's maximum and silently reserves far more quota than needed, causing unexpected ThrottlingException. See Critical Warnings above and max_tokens quota mechanics.
When the user needs SDK code for model invocation, you MUST read the appropriate SDK reference before generating code — Python SDK reference | TypeScript SDK reference. Use the patterns from the reference file.
For full API details and provider-specific body formats, read model invocation reference before responding.
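As a rough illustration of the Converse request shape with an explicit output cap, here is a minimal sketch. The helper names `build_converse_request` and `converse_text` are hypothetical, and the SDK reference files named above remain the authority for production patterns.

```python
from typing import Any


def build_converse_request(model_id: str, prompt: str, max_tokens: int = 1024) -> dict[str, Any]:
    """Build keyword arguments for the bedrock-runtime converse() call."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        # Always explicit, so the request does not fall back to the model maximum.
        "inferenceConfig": {"maxTokens": max_tokens},
    }


def converse_text(model_id: str, prompt: str, region: str, max_tokens: int = 1024) -> str:
    """Send one user message and return the first text block of the reply."""
    import boto3  # imported lazily so the builder above works without the SDK

    client = boto3.client("bedrock-runtime", region_name=region)
    response = client.converse(**build_converse_request(model_id, prompt, max_tokens))
    return response["output"]["message"]["content"][0]["text"]
```

Keeping the request builder separate from the network call makes the `maxTokens` setting easy to check in a unit test.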
Which Bedrock Capability Do You Need?
| Goal | Use | Reference |
|---|---|---|
| Call a model (text, image, video) | Converse API | See above + model invocation |
| Build a RAG application | Knowledge Bases | KB setup |
| Create an agent that takes actions | Bedrock Agents | agent creation |
| Filter harmful/sensitive content | Guardrails | guardrails |
| Deploy and scale an agent | AgentCore Runtime | runtime |
| Expose REST APIs as MCP tools | AgentCore Gateway | gateway |
| Choose the right model | Model Selection | model guide |
| Set up or debug prompt caching | Prompt Caching | prompt caching |
| Diagnose throttling or audit quotas | Quota Health | quota health |
| Track costs by team, model, or tag | Cost Tracking | cost tracking |
| Migrate between Claude generations | Model Migration | migration guide |
Knowledge Bases (RAG)
When the user wants to create a Knowledge Base or build a RAG application, you MUST read KB setup procedure and execute it step by step. Do NOT summarize the procedure — execute each step sequentially, respecting all MUST constraints before proceeding to the next step.
When the user asks about chunking strategies, vector store selection, or other KB configuration choices, you MUST read KB setup procedure before responding — it contains the authoritative decision tables and constraints.
When the user wants to query an existing Knowledge Base, you MUST read KB retrieval reference before responding. Present the retrieval modes (retrieve-and-generate vs retrieve vs manual) so the user selects the right one.
Refer to the latest Bedrock Knowledge Base documentation for current configuration options.
Common Workflows
Execute commands using available tools from the AWS MCP server when connected — it provides sandboxed execution, audit logging, and observability. When the MCP server is not available, fall back to the AWS CLI or shell as needed.
Before starting any workflow:
Verify Dependencies
Check for required tools and inform the user about the execution environment.
Constraints:
- You MUST check that the AWS CLI is available and configured with valid credentials
- You MUST verify the AWS CLI version is recent (v2 recommended; older versions lack Converse API and AgentCore support): `aws --version`
- You MUST check that the target AWS region has Bedrock model access enabled
- You MUST inform the user if any required tools are missing with a clear message
- You MUST ask the user if they want to proceed despite missing tools
General constraints for all workflows:
- You MUST present an overview of what will be done before starting execution
- You MUST explain to the user what step is being executed and why before running each command
- You MUST respect the user's decision to stop or abort at any point
- You MUST NOT continue execution if the user indicates they want to stop
- You SHOULD confirm before proceeding with destructive or irreversible operations (deleting resources, overwriting configurations)
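The CLI version check above could be automated along these lines. This is a sketch only: it assumes `aws --version` prints a string beginning with `aws-cli/<major>.<minor>.<patch>`, and the function names are hypothetical.

```python
import re
import subprocess


def parse_cli_major(version_output: str) -> int:
    """Extract the major version from output such as 'aws-cli/2.15.30 Python/3.11.8 ...'."""
    match = re.match(r"aws-cli/(\d+)\.", version_output.strip())
    if match is None:
        raise ValueError(f"Unrecognized version output: {version_output!r}")
    return int(match.group(1))


def installed_cli_major() -> int:
    """Run `aws --version` and return the major version (requires the CLI on PATH)."""
    result = subprocess.run(["aws", "--version"], capture_output=True, text=True, check=True)
    # Some older CLI builds wrote the version to stderr, so fall back to it.
    return parse_cli_major(result.stdout or result.stderr)
```

A result below 2 would be the cue to tell the user that the CLI is older than the recommended version before continuing.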
Examples — mapping user intent to workflows
Example 1:
User query: "I'm getting ThrottlingException on Bedrock"
Action: Check if `maxTokens` is set explicitly; an unset `maxTokens` reserves far more quota than needed (see Critical Warnings). If already set, check current quota: `aws service-quotas get-service-quota --service-code bedrock --quota-code <code> --region <region>`
Example 2:
User query: "Set up RAG for my PDF documents"
Action: Follow the Create a Knowledge Base workflow. Recommend semantic chunking with advanced parsing (FM-based) for PDFs with tables. See KB setup procedure.
Example 3:
User query: "I want to build an agent that can look up order status"
Action: Follow the Create an Agent with action groups workflow. See agent creation procedure.
Example 4:
User query: "How do I call Claude on Bedrock?"
Action: Use the Converse API (not InvokeModel). Set `maxTokens` explicitly. Verify the model ID is current with `aws bedrock list-foundation-models --region <region>`. Use a cross-region model ID with the `us.` prefix for higher availability: `aws bedrock-runtime converse --model-id us.anthropic.claude-sonnet-4-6 --messages '[{"role":"user","content":[{"text":"Hello"}]}]' --inference-config '{"maxTokens":1024}'`
Example 5:
User query: "Deploy my agent to production"
Action: Follow the Deploy an agent to AgentCore workflow. Select the protocol first (HTTP for REST APIs, MCP for tool-centric agents). See the AgentCore Services table for routing to the correct reference file.
Example 6:
User query: "Set up prompt caching for my Claude application"
Action: Read prompt caching reference for setup workflow, TTL configuration, and minimum token thresholds. Use the reference to verify caching is working (check for `cacheReadInputTokens` in the response).
Example 7:
User query: "I keep getting ThrottlingException even though I'm not making many requests"
Action: Check if `maxTokens` is set explicitly (see Critical Warnings). Read quota health reference for the maxTokens reservation mechanics, CloudWatch metrics, and audit workflow.
Example 8:
User query: "How do I track Bedrock costs by team?"
Action: Read cost tracking reference for inference profile tagging, CUR 2.0 approaches, and Cost Explorer queries by model/region/tag.
Example 9:
User query: "I'm upgrading from Claude 4.5 to 4.6, what breaks?"
Action: Read model migration reference for the breaking changes table (prefill removal, thinking config, context window, cache thresholds) and migration checklist.
Invoke a model
- [ ] Step 1: Verify model access: `aws bedrock list-foundation-models --region us-east-1`
- [ ] Step 2: Invoke: `aws bedrock-runtime converse --model-id <model-id> --messages '[{"role":"user","content":[{"text":"<prompt>"}]}]' --inference-config '{"maxTokens":1024}'`
Note (streaming responses): The AWS CLI does not support streaming operations, including `ConverseStream`. Use the SDK (`converse_stream()` in boto3, `ConverseStreamCommand` in JS SDK).
| Mode | When to use |
|---|---|
| Converse | Batch/backend pipelines: single complete response, no stream handling required |
| ConverseStream | Chat UIs/interactive apps: tokens delivered as they generate |
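For the streaming mode, a minimal boto3 sketch might look like the following. It assumes the ConverseStream event shape in which text arrives under `contentBlockDelta.delta.text`. The helper names are hypothetical.

```python
from typing import Any, Iterable


def collect_stream_text(events: Iterable[dict[str, Any]]) -> str:
    """Concatenate the text deltas found in a sequence of ConverseStream events."""
    parts = []
    for event in events:
        delta = event.get("contentBlockDelta", {}).get("delta", {})
        if "text" in delta:
            parts.append(delta["text"])
    return "".join(parts)


def stream_reply(model_id: str, prompt: str, region: str, max_tokens: int = 1024) -> str:
    """Call converse_stream() and return the full reply once the stream ends."""
    import boto3  # imported lazily so collect_stream_text works without the SDK

    client = boto3.client("bedrock-runtime", region_name=region)
    response = client.converse_stream(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": max_tokens},
    )
    return collect_stream_text(response["stream"])
```

An interactive app would normally emit each delta as it arrives instead of joining them at the end; the join here only keeps the sketch short.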
Create a Knowledge Base
You MUST read KB setup procedure before responding. Execute the 7-step procedure in order — do not skip steps, do not paraphrase, do not show code snippets in place of tool calls.
Query a Knowledge Base
These three modes are mutually exclusive — select the one that matches the user's intent:
| Mode | When to Use | Command |
|---|---|---|
| Retrieve & Generate | Quick answer with citations — most common RAG pattern | |
| Retrieve only | Raw chunks for custom post-processing or feeding to a different model | |
| Full control | Custom prompt, reranking, or multi-KB | Retrieve chunks first, then build prompt and call |
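To make the difference between the first two modes concrete, here is a hedged sketch of the request shapes, assuming the boto3 `bedrock-agent-runtime` operations `retrieve` and `retrieve_and_generate`. The builder names, and any IDs or ARNs passed to them, are placeholders. The KB retrieval reference remains the authority.

```python
from typing import Any


def build_retrieve_request(kb_id: str, query: str, top_k: int = 5) -> dict[str, Any]:
    """Arguments for retrieve(): returns raw chunks and performs no generation."""
    return {
        "knowledgeBaseId": kb_id,
        "retrievalQuery": {"text": query},
        "retrievalConfiguration": {
            "vectorSearchConfiguration": {"numberOfResults": top_k}
        },
    }


def build_retrieve_and_generate_request(kb_id: str, query: str, model_arn: str) -> dict[str, Any]:
    """Arguments for retrieve_and_generate(): returns an answer with citations."""
    return {
        "input": {"text": query},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }
```

Either dictionary would be unpacked into the matching call on `boto3.client("bedrock-agent-runtime")`. The full-control mode starts from the `retrieve` output and builds its own prompt.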
Create an Agent with action groups
You MUST read agent creation procedure before responding. Execute the procedure step by step. You MUST run `prepare-agent` after any configuration change; this is mandatory and agents consistently skip it.
Apply Guardrails
You MUST read guardrails reference before responding. Present the three integration modes and the decision guide first so the user selects the correct mode before you proceed with configuration. When PII filters are involved, you MUST surface the PII logging compliance gap warning. Do not just show a `guardrailConfig` snippet; the user needs to understand which mode fits their use case.
Deploy an agent to AgentCore
Identify the AgentCore service from the table below, then you MUST read the corresponding reference file before responding. Follow any procedures in the reference step by step. Do not summarize — execute.
Set up or debug prompt caching
You MUST read prompt caching reference before responding. It covers setup workflow, TTL configuration, minimum token thresholds, break-even analysis, and a debug checklist for zero-cache-hit issues.
Constraints:
- You MUST walk the user through the debug checklist when cache is not working (verify model support, token threshold, content identity, TTL, cache point placement)
- You MUST check minimum token thresholds per model before confirming a caching setup will work
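As a small illustration of cache point placement and the zero-hit check, the sketch below assumes the Converse API's `cachePoint` content block and the `cacheReadInputTokens` field in the response `usage`. The helper names are hypothetical, and model-specific thresholds still come from the prompt caching reference.

```python
from typing import Any


def build_cached_system(system_text: str) -> list[dict[str, Any]]:
    """Return a Converse `system` list with a cache point after the stable prefix."""
    return [
        {"text": system_text},
        # Everything before this marker is eligible for caching.
        {"cachePoint": {"type": "default"}},
    ]


def cache_read_tokens(response: dict[str, Any]) -> int:
    """Return tokens served from cache, or 0 if the response reports none."""
    return response.get("usage", {}).get("cacheReadInputTokens", 0)
```

A value of 0 on the second identical request is the signal to walk through the debug checklist above.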
Check quota health
You MUST read quota health reference before responding. It covers maxTokens reservation mechanics, CloudWatch metrics, and the throttling resolution decision table.
Constraints:
- You MUST explain the relationship between `maxTokens` and quota reservation
- You MUST guide the user through comparing current limits vs peak usage using `aws service-quotas` and `aws cloudwatch get-metric-statistics`
Analyze Bedrock costs
You MUST read cost tracking reference before responding. It covers inference profile tagging, CUR 2.0 attribution, and AWS Budgets setup.
Constraints:
- You MUST ask what time range, grouping, and cost attribution method the user needs before generating Cost Explorer queries
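Once the user has supplied those three answers, a Cost Explorer query could be assembled roughly as follows. This sketch assumes the boto3 `ce` client's `get_cost_and_usage` operation, a cost allocation tag chosen by the user, and the service name `Amazon Bedrock` as the `SERVICE` dimension value. That service name is an assumption to confirm against the account's billing data.

```python
from typing import Any


def build_cost_query(start: str, end: str, tag_key: str) -> dict[str, Any]:
    """Build get_cost_and_usage() arguments grouped by one cost allocation tag.

    `start` and `end` are YYYY-MM-DD strings; the end date is exclusive.
    """
    return {
        "TimePeriod": {"Start": start, "End": end},
        "Granularity": "MONTHLY",
        "Metrics": ["UnblendedCost"],
        # Assumed service name; verify it in Cost Explorer before relying on it.
        "Filter": {"Dimensions": {"Key": "SERVICE", "Values": ["Amazon Bedrock"]}},
        "GroupBy": [{"Type": "TAG", "Key": tag_key}],
    }
```

The result would be passed as `boto3.client("ce").get_cost_and_usage(**build_cost_query(...))`.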
Migrate between Claude generations
You MUST read model migration reference before responding. It covers breaking changes between Claude 4.5, 4.6, and 4.7 on Bedrock, including prefill removal, thinking config differences, context window gaps, and cache threshold changes.
Troubleshooting
When the user reports a Bedrock error, exception, or unexpected behavior, you MUST check this section and the Critical Warnings section before responding. Bedrock has service-specific root causes (e.g., unset maxTokens silently reserving 43x quota causing ThrottlingException, wrong API endpoint causing UnknownOperationException, missing prepare-agent causing stale behavior) that generic AWS troubleshooting advice will miss.
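The effect of an unset `maxTokens` on throughput can be sketched with simple arithmetic. The model below assumes each request reserves its input tokens plus `maxTokens` against a tokens-per-minute quota when it starts. The quota and token counts in the usage note are made-up numbers, and the quota health reference describes the actual mechanics.

```python
def reserved_tokens(input_tokens: int, max_tokens: int) -> int:
    """Tokens assumed to be reserved up front for one request."""
    return input_tokens + max_tokens


def requests_per_minute(tpm_quota: int, input_tokens: int, max_tokens: int) -> int:
    """How many such requests fit in the quota under the reservation assumption."""
    return tpm_quota // reserved_tokens(input_tokens, max_tokens)
```

With a hypothetical 200,000 tokens-per-minute quota and 500 input tokens per request, a 64,000-token default leaves room for 3 requests per minute, while an explicit cap of 1,024 leaves room for 131.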
AccessDeniedException
Multiple possible causes: (1) IAM user/role lacks `bedrock:InvokeModel` or `bedrock:InvokeModelWithResponseStream` permissions, (2) model access not enabled in the target region, (3) a service control policy (SCP) is blocking access (common with cross-region inference routing to a restricted region), (4) expired temporary credentials, or (5) IAM role propagation delay: if you just created an IAM role and immediately used it in a Bedrock API call, the role may not have propagated yet, as IAM changes are eventually consistent (see IAM eventual consistency). Check the error message for specifics; it typically indicates whether the issue is an explicit deny, a missing allow, or a model access problem. See Resolve InvokeModel API errors for detailed resolution steps.
Malformed input request
Request body doesn't match the expected schema. Common causes: wrong provider-specific body format for InvokeModel (e.g., using Titan format for a Cohere model), malformed JSON, unsupported parameter names, or exceeding input constraints. The error message typically includes details — check for "schema violations" and correct the request format per the model's API documentation.
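To show how provider-specific bodies diverge, the sketch below builds InvokeModel bodies for two providers. It assumes the Anthropic Messages format (with `anthropic_version` set to `bedrock-2023-05-31`) and the Titan Text format. Field names should be checked against the model invocation reference, and the function names are hypothetical.

```python
import json


def anthropic_body(prompt: str, max_tokens: int = 1024) -> str:
    """InvokeModel body in the Anthropic Messages format."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": [{"type": "text", "text": prompt}]}],
    })


def titan_text_body(prompt: str, max_tokens: int = 1024) -> str:
    """InvokeModel body in the Titan Text format."""
    return json.dumps({
        "inputText": prompt,
        "textGenerationConfig": {"maxTokenCount": max_tokens},
    })
```

Even the output cap has a different name in each format (`max_tokens` versus `maxTokenCount`), which is the kind of mismatch that surfaces as a malformed request.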
ThrottlingException
Set `maxTokens` explicitly; unset values default to the model's maximum and silently reserve far more quota than needed. Use adaptive retry mode. Use cross-region inference profiles (e.g., `us.`, `eu.`, `apac.`, or `global.` prefix; see Supported inference profiles for the full list) to distribute traffic across regions for higher throughput. Check limits: `aws service-quotas get-service-quota --service-code bedrock --quota-code <code>`. Request quota increases if needed. For a deeper audit, read quota health reference.
Prompt cache not working (zero cacheReadInputTokens)
Read prompt caching reference for the diagnostic checklist: verify model support, token threshold, content identity, TTL, and cache point placement. Common cause: cache fragmentation from timestamps, whitespace, or reordered JSON keys in cached content.
400 error on prefill with Claude 4.6
Prefill was removed in Claude 4.6 and causes a hard 400 error. Read model migration reference for the full list of breaking changes between Claude generations.
Error retry classification
| Retry | Do NOT retry |
|---|---|
| ThrottlingException | ValidationException |
| ModelTimeoutException | AccessDeniedException |
| ServiceUnavailableException | ResourceNotFoundException |
| InternalServerException |
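The table above can be expressed as a small classifier for custom retry loops. This is a sketch: treating unlisted error codes as non-retryable is an assumption of the example, and the helper names are hypothetical.

```python
RETRYABLE = frozenset({
    "ThrottlingException",
    "ModelTimeoutException",
    "ServiceUnavailableException",
    "InternalServerException",
})
NON_RETRYABLE = frozenset({
    "ValidationException",
    "AccessDeniedException",
    "ResourceNotFoundException",
})


def should_retry(error_code: str) -> bool:
    """Return True only for codes in the Retry column; unknown codes are not retried."""
    return error_code in RETRYABLE


def error_code_of(exc: Exception) -> str:
    """Read the error code from a botocore ClientError-style exception, if present."""
    return getattr(exc, "response", {}).get("Error", {}).get("Code", "")
```

A caller would typically wrap the API call in `try`/`except`, pass the exception through `error_code_of`, and re-raise immediately when `should_retry` returns False.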
Use adaptive retry: `Config(retries={"max_attempts": 5, "mode": "adaptive"})`.
UnknownOperationException
Wrong client (using `bedrock` instead of `bedrock-runtime`), or SDK too old. Check the API landscape table above.
Agent returns stale behavior
Run `prepare-agent` after ANY configuration change. This is mandatory.
KB returns empty results
Run `start-ingestion-job` and wait for completion. Queries issued before ingestion completes return empty results.
KB retrieval quality is poor
Review chunking strategy. Use advanced parsing (FM-based) for documents with tables. Configure metadata filtering.
Cross-region model not found
The model may not be available in the region you're calling from. Check availability at Supported foundation models. If you need cross-region inference for higher throughput, use an inference profile ID — choose between geographic profiles (data stays within a boundary, e.g. US, EU) or global profiles (any commercial region). The profile prefix is a data residency decision. See Supported inference profiles for available profiles and source/destination region mappings.
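The prefix choice can be sketched as follows, assuming profile IDs take the form `<prefix>.<base-model-id>` with the geographic prefixes `us`, `eu`, and `apac` and the global prefix `global`. Not every model has every profile, so the result should be checked against the supported inference profiles list. The function names are hypothetical.

```python
GEOGRAPHIC_PREFIXES = frozenset({"us", "eu", "apac"})
GLOBAL_PREFIX = "global"


def inference_profile_id(prefix: str, base_model_id: str) -> str:
    """Compose a candidate inference profile ID from a prefix and a base model ID."""
    if prefix not in GEOGRAPHIC_PREFIXES and prefix != GLOBAL_PREFIX:
        raise ValueError(f"Unknown profile prefix: {prefix!r}")
    return f"{prefix}.{base_model_id}"


def stays_within_geography(profile_id: str) -> bool:
    """Return True when the profile prefix names a geographic boundary."""
    return profile_id.split(".", 1)[0] in GEOGRAPHIC_PREFIXES
```

Making the prefix an explicit argument keeps the data residency decision visible at the call site instead of buried in a hard-coded model ID.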
On-demand throughput isn't supported
Error: "Invocation of model ID `<model-id>` with on-demand throughput isn't supported. Retry your request with the ID or ARN of an inference profile that contains this model." Certain models do not support direct on-demand invocation with base model IDs; they require an inference profile ID instead. Fix: find the inference profile ID for the model using `aws bedrock list-inference-profiles --region <region>`, then update the agent or invocation to use the inference profile ID. See Supported inference profiles for available profiles. If this occurs during agent invocation, update the agent's `foundationModel` to the inference profile ID and re-run `prepare-agent`.
KB storage configuration invalid
Verify OpenSearch data access policy includes Bedrock service role. Verify vector index field names match KB config.
Agent action group errors
Check Lambda permissions (resource-based policy for bedrock.amazonaws.com). Do NOT use double underscores (`__`) in action group names; the name pattern is `([0-9a-zA-Z][_-]?){1,100}`.
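A name can be checked against the pattern `([0-9a-zA-Z][_-]?){1,100}` before calling the API. The sketch below uses Python's `re.fullmatch`; the function name is hypothetical.

```python
import re

# Each alphanumeric character may be followed by at most one "_" or "-",
# which is why consecutive separators such as "__" are rejected.
ACTION_GROUP_NAME = re.compile(r"([0-9a-zA-Z][_-]?){1,100}")


def is_valid_action_group_name(name: str) -> bool:
    """Return True when the whole name matches the action group pattern."""
    return ACTION_GROUP_NAME.fullmatch(name) is not None
```

Validating locally gives a clearer failure message than the service's generic validation error.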
Multi-agent supervisor loops
Agents use a built-in collaboration mechanism, NOT action groups. Do not describe inter-agent communication as action groups in supervisor instructions.
INVALID_PAYMENT_INSTRUMENT on model access
This is an account billing issue, not a Bedrock issue. Temporarily set a credit card as the default payment method, or add USD payment profiles in the organization management account.
Knowledge base ingestion failures
Check S3 permissions: the KB service role needs `s3:GetObject` and `s3:ListBucket`. Unsupported file formats are silently skipped. Files exceeding size limits are skipped without error.
SharePoint data source sync failures
Sync completes but files fail. For OAuth 2.0 auth (not recommended): requires SharePoint AllSites.Read (Delegated) permission — you may also need to disable Security Defaults and MFA for the service account so Amazon Bedrock is not blocked from crawling. For SharePoint App-Only auth (recommended): configure APP permissions via SharePoint App-Only grant flow. See the SharePoint connector docs for current requirements.
AgentCore Services
You MUST read the linked reference file for the relevant service before responding to any AgentCore question. Follow procedures in the reference step by step.
| Service | Use For | Reference |
|---|---|---|
| Gateway | Expose APIs, Lambda functions, or existing MCP servers as tools for agents | gateway procedure |
| Runtime | Deploy and scale agents and tools (serverless, any framework) | runtime procedure |
| Runtime Container | Build ARM64 containers for Runtime | container build procedure |
| Memory | Short-term (multi-turn) and long-term (cross-session) agent memory; share memory across agents | memory & observability |
| Identity | Agent authentication with external IdPs (Okta, Entra ID, Cognito); act on behalf of users | credentials & security |
| Policy | Enforce agent boundaries with natural language or Cedar rules; intercepts Gateway tool calls | Refer to the latest AWS documentation on AgentCore Policy |
| Observability | Trace, debug, and monitor agent execution (OTEL, CloudWatch) | memory & observability |
| Registry | Catalog and discover agents, MCP servers, tools, and skills across your org | registry & evaluations |
| Evaluations | Automated agent quality assessment (LLM-as-a-Judge) | registry & evaluations |
| Code Interpreter | Secure sandbox code execution for agents | Refer to the latest AWS documentation on AgentCore Code Interpreter |
| Browser | Web automation (navigate, fill forms, extract data) | Refer to the latest AWS documentation on AgentCore Browser |
Model Selection
When the user asks which model to use, compares models, or asks about Claude/Llama/Nova/Titan on Bedrock, you MUST read model selection guide before responding. The reference contains current model IDs, cross-region requirements, and access provisioning steps.
Quick defaults (verify current availability: `aws bedrock list-foundation-models --region <region>`):
- General purpose: Claude Sonnet (best quality/cost balance)
- Fast + cheap: Claude Haiku or Nova Micro
- Embeddings for KB: Titan Embeddings V2
- Open-source / fine-tuning: Llama
- Image generation: Titan Image Generator
For current model IDs, regional availability, cross-region inference profiles, and supported features, refer to Supported foundation models in Amazon Bedrock. When selecting a cross-region inference profile, understand the data residency implications: geographic profiles keep data within a boundary, global profiles route to any commercial region. Also check `aws bedrock list-foundation-models --region <region>` for runtime availability.
For model ID formats (4 patterns), access provisioning, and selection criteria, see model selection guide.