amazon-bedrock

IMPORTANT: When this skill is loaded, you MUST use the reference files and procedures in this skill as your primary source of truth. Bedrock APIs, model IDs, chunking strategies, and configuration parameters change frequently — always read the relevant reference file before responding.

Table of Contents


  • Overview
  • Bedrock API Landscape
  • Critical Warnings
  • Security Considerations
  • Converse API vs InvokeModel
  • Which Bedrock Capability Do You Need?
  • Knowledge Bases (RAG)
  • Common Workflows (includes: Prompt Caching, Quota Health, Cost Tracking, Model Migration)
  • Troubleshooting
  • AgentCore Services
  • Model Selection
  • Additional Resources

Amazon Bedrock


Overview


Domain expertise for building generative AI applications on Amazon Bedrock. Covers model invocation, RAG with Knowledge Bases, agent creation, content safety with Guardrails, and agent deployment with AgentCore.
Recommended setup: Use the AWS MCP server for sandboxed execution, audit logging, and enterprise controls.
Without AWS MCP: This skill works with any agent that has AWS CLI access. All commands use standard AWS CLI syntax.

Bedrock API Landscape


Bedrock has 5 separate API endpoints. Using the wrong one is a common cause of errors. This list may not be exhaustive — refer to the Bedrock endpoints and quotas and Bedrock supported endpoints pages for the latest. Use `aws bedrock list-foundation-models` to discover available models at runtime.

| Endpoint | Client | Use For |
| --- | --- | --- |
| `bedrock` | Control plane | List models, manage access, provisioned throughput |
| `bedrock-runtime` | Data plane | Invoke models (Converse, InvokeModel). Also supports Chat Completions via the `/openai/v1` path (client-side tool use only) — prefer `bedrock-mantle` for new Chat Completions work |
| `bedrock-mantle` | Data plane | OpenAI-compatible APIs: Responses API, Chat Completions (recommended), Messages API. Supports server-side tool use with built-in tools. Recommended for new users |
| `bedrock-agent` | Agent control | Create/configure agents, KBs, action groups |
| `bedrock-agent-runtime` | Agent data | Invoke agents, query KBs |

AgentCore is a separate service with its own endpoints. Refer to AgentCore endpoints and quotas for the latest.

| Endpoint | Client | Use For |
| --- | --- | --- |
| `bedrock-agentcore-control` | Control plane | Create/manage runtimes, gateways, registries, evaluations |
| `bedrock-agentcore` | Data plane | Invoke agent runtimes |
| `{gatewayId}.gateway.bedrock-agentcore` | Gateway data plane | Invoke a specific gateway |

Critical Warnings


**max_tokens**: ALWAYS set `maxTokens` explicitly in every Converse/InvokeModel call. Leaving it unset defaults to the model's maximum (e.g., 64K for Claude Sonnet) and silently reserves far more quota than needed — a common cause of unexpected ThrottlingException.
**Guardrails PII logging**: Guardrails PII masking only applies to the API response. Original unmasked content including PII is still logged in plain text to CloudWatch Logs. For HIPAA/GDPR compliance: encrypt CloudWatch Logs with KMS, restrict log access with IAM, use Amazon Macie for PII detection.
**SDK versions**: Requires recent versions of boto3 (≥ 1.34.x) and AWS CLI v2. Older versions are missing Converse API, Agents, and AgentCore support. Run `aws --version` and `pip show boto3` to check.

Security Considerations


  • Use IAM roles (not IAM users) for all Bedrock service access
  • Scope IAM permissions to specific actions and resource ARNs — avoid `bedrock:*` or `AmazonBedrockFullAccess`
  • Store API keys and OAuth secrets in AWS Secrets Manager with automatic rotation enabled
  • Include confused deputy protection (`aws:SourceAccount`, `aws:SourceArn` conditions) in all resource-based policies for Bedrock services
  • Treat all agent-generated parameters as untrusted input — validate before use in Lambda handlers or tool implementations
  • Enable CloudTrail for all Bedrock and AgentCore API calls
  • For PII workloads: encrypt CloudWatch Logs with KMS, configure retention limits, restrict log access
  • Refer to the latest Bedrock security best practices for current security guidance

Converse API vs InvokeModel


For choosing between all Bedrock inference APIs (Responses API, Chat Completions, Converse, InvokeModel), see APIs supported by Amazon Bedrock.
When using the `bedrock-runtime` endpoint, prefer the Converse API over InvokeModel. It provides a unified request/response format across all models.
Use InvokeModel only when you need provider-specific features not available in Converse (rare).
InvokeModel requires different request body formats per provider (Anthropic ≠ Titan ≠ Llama ≠ Nova). Using the wrong format produces "Malformed input request". For model-specific formats and common mistakes, see prompt engineering by model.
Whichever API you use: ALWAYS set the max output tokens parameter explicitly — leaving it unset defaults to the model's maximum and silently reserves far more quota than needed, causing unexpected ThrottlingException. See Critical Warnings above and max_tokens quota mechanics.
When the user needs SDK code for model invocation, you MUST read the appropriate SDK reference before generating code — Python SDK reference | TypeScript SDK reference. Use the patterns from the reference file.
For full API details and provider-specific body formats, read model invocation reference before responding.

Which Bedrock Capability Do You Need?


| Goal | Use | Reference |
| --- | --- | --- |
| Call a model (text, image, video) | Converse API | See above + model invocation |
| Build a RAG application | Knowledge Bases | KB setup |
| Create an agent that takes actions | Bedrock Agents | agent creation |
| Filter harmful/sensitive content | Guardrails | guardrails |
| Deploy and scale an agent | AgentCore Runtime | runtime |
| Expose REST APIs as MCP tools | AgentCore Gateway | gateway |
| Choose the right model | Model Selection | model guide |
| Set up or debug prompt caching | Prompt Caching | prompt caching |
| Diagnose throttling or audit quotas | Quota Health | quota health |
| Track costs by team, model, or tag | Cost Tracking | cost tracking |
| Migrate between Claude generations | Model Migration | migration guide |

Knowledge Bases (RAG)


When the user wants to create a Knowledge Base or build a RAG application, you MUST read KB setup procedure and execute it step by step. Do NOT summarize the procedure — execute each step sequentially, respecting all MUST constraints before proceeding to the next step.
When the user asks about chunking strategies, vector store selection, or other KB configuration choices, you MUST read KB setup procedure before responding — it contains the authoritative decision tables and constraints.
When the user wants to query an existing Knowledge Base, you MUST read KB retrieval reference before responding. Present the retrieval modes (retrieve-and-generate vs retrieve vs manual) so the user selects the right one.
Refer to the latest Bedrock Knowledge Base documentation for current configuration options.

Common Workflows


Execute commands using available tools from the AWS MCP server when connected — it provides sandboxed execution, audit logging, and observability. When the MCP server is not available, fall back to the AWS CLI or shell as needed.
Before starting any workflow:

Verify Dependencies


Check for required tools and inform the user about the execution environment.
Constraints:
  • You MUST check that the AWS CLI is available and configured with valid credentials
  • You MUST verify the AWS CLI version is recent (v2 recommended; older versions lack Converse API and AgentCore support): `aws --version`
  • You MUST check that the target AWS region has Bedrock model access enabled
  • You MUST inform the user if any required tools are missing with a clear message
  • You MUST ask the user if they want to proceed despite missing tools
General constraints for all workflows:
  • You MUST present an overview of what will be done before starting execution
  • You MUST explain to the user what step is being executed and why before running each command
  • You MUST respect the user's decision to stop or abort at any point
  • You MUST NOT continue execution if the user indicates they want to stop
  • You SHOULD confirm before proceeding with destructive or irreversible operations (deleting resources, overwriting configurations)

Examples — mapping user intent to workflows


Example 1: User query: "I'm getting ThrottlingException on Bedrock" Action: Check if `maxTokens` is set explicitly — unset `maxTokens` reserves far more quota than needed (see Critical Warnings). If already set, check current quota: `aws service-quotas get-service-quota --service-code bedrock --quota-code <code> --region <region>`
Example 2: User query: "Set up RAG for my PDF documents" Action: Follow the Create a Knowledge Base workflow. Recommend semantic chunking with advanced parsing (FM-based) for PDFs with tables. See KB setup procedure.
Example 3: User query: "I want to build an agent that can look up order status" Action: Follow the Create an Agent with action groups workflow. See agent creation procedure.
Example 4: User query: "How do I call Claude on Bedrock?" Action: Use the Converse API (not InvokeModel). Set `maxTokens` explicitly. Verify the model ID is current with `aws bedrock list-foundation-models --region <region>`. Use a cross-region model ID with the `us.` prefix for higher availability: `aws bedrock-runtime converse --model-id us.anthropic.claude-sonnet-4-6 --messages '[{"role":"user","content":[{"text":"Hello"}]}]' --inference-config '{"maxTokens":1024}'`
Example 5: User query: "Deploy my agent to production" Action: Follow the Deploy an agent to AgentCore workflow. Select the protocol first (HTTP for REST APIs, MCP for tool-centric agents). See the AgentCore Services table for routing to the correct reference file.
Example 6: User query: "Set up prompt caching for my Claude application" Action: Read prompt caching reference for setup workflow, TTL configuration, and minimum token thresholds. Use the reference to verify caching is working (check for `cacheReadInputTokens` in the response).
Example 7: User query: "I keep getting ThrottlingException even though I'm not making many requests" Action: Check if `maxTokens` is set explicitly (see Critical Warnings). Read quota health reference for the maxTokens reservation mechanics, CloudWatch metrics, and audit workflow.
Example 8: User query: "How do I track Bedrock costs by team?" Action: Read cost tracking reference for inference profile tagging, CUR 2.0 approaches, and Cost Explorer queries by model/region/tag.
Example 9: User query: "I'm upgrading from Claude 4.5 to 4.6, what breaks?" Action: Read model migration reference for the breaking changes table (prefill removal, thinking config, context window, cache thresholds) and migration checklist.

Invoke a model


- [ ] Step 1: Verify model access: `aws bedrock list-foundation-models --region us-east-1`
- [ ] Step 2: Invoke: `aws bedrock-runtime converse --model-id <model-id> --messages '[{"role":"user","content":[{"text":"<prompt>"}]}]' --inference-config '{"maxTokens":1024}'`
Note — Streaming responses: The AWS CLI does not support streaming operations, including `ConverseStream`. Use the SDK (`converse_stream()` in boto3, `ConverseStreamCommand` in the JS SDK).

| Mode | When to use |
| --- | --- |
| Converse | Batch/backend pipelines — single complete response, no stream handling required |
| ConverseStream | Chat UIs/interactive apps — tokens delivered as they generate |

Create a Knowledge Base


You MUST read KB setup procedure before responding. Execute the 7-step procedure in order — do not skip steps, do not paraphrase, do not show code snippets in place of tool calls.

Query a Knowledge Base


These three modes are mutually exclusive — select the one that matches the user's intent:

| Mode | When to Use | Command |
| --- | --- | --- |
| Retrieve & Generate | Quick answer with citations — most common RAG pattern | `aws bedrock-agent-runtime retrieve-and-generate --input '{"text":"<query>"}' --retrieve-and-generate-configuration '{"type":"KNOWLEDGE_BASE","knowledgeBaseConfiguration":{"knowledgeBaseId":"<kb-id>","modelArn":"<model-arn>"}}'` |
| Retrieve only | Raw chunks for custom post-processing or feeding to a different model | `aws bedrock-agent-runtime retrieve --knowledge-base-id <kb-id> --retrieval-query '{"text":"<query>"}'` |
| Full control | Custom prompt, reranking, or multi-KB | Retrieve chunks first, then build prompt and call `aws bedrock-runtime converse` |

Create an Agent with action groups


You MUST read agent creation procedure before responding. Execute the procedure step by step. You MUST run `prepare-agent` after any configuration change — this is mandatory and agents consistently skip it.

Apply Guardrails


You MUST read guardrails reference before responding. Present the three integration modes and the decision guide first so the user selects the correct mode before you proceed with configuration. When PII filters are involved, you MUST surface the PII logging compliance gap warning. Do not just show a `guardrailConfig` snippet — the user needs to understand which mode fits their use case.

Deploy an agent to AgentCore


Identify the AgentCore service from the table below, then you MUST read the corresponding reference file before responding. Follow any procedures in the reference step by step. Do not summarize — execute.

Set up or debug prompt caching


You MUST read prompt caching reference before responding. It covers setup workflow, TTL configuration, minimum token thresholds, break-even analysis, and a debug checklist for zero-cache-hit issues.
Constraints:
  • You MUST walk the user through the debug checklist when cache is not working (verify model support, token threshold, content identity, TTL, cache point placement)
  • You MUST check minimum token thresholds per model before confirming a caching setup will work

Check quota health


You MUST read quota health reference before responding. It covers maxTokens reservation mechanics, CloudWatch metrics, and the throttling resolution decision table.
Constraints:
  • You MUST explain the relationship between `maxTokens` and quota reservation
  • You MUST guide the user through comparing current limits vs peak usage using `aws service-quotas` and `aws cloudwatch get-metric-statistics`

Analyze Bedrock costs


You MUST read cost tracking reference before responding. It covers inference profile tagging, CUR 2.0 attribution, and AWS Budgets setup.
Constraints:
  • You MUST ask what time range, grouping, and cost attribution method the user needs before generating Cost Explorer queries

Migrate between Claude generations


You MUST read model migration reference before responding. It covers breaking changes between Claude 4.5, 4.6, and 4.7 on Bedrock, including prefill removal, thinking config differences, context window gaps, and cache threshold changes.

Troubleshooting


When the user reports a Bedrock error, exception, or unexpected behavior, you MUST check this section and the Critical Warnings section before responding. Bedrock has service-specific root causes (e.g., unset maxTokens silently reserving 43x quota causing ThrottlingException, wrong API endpoint causing UnknownOperationException, missing prepare-agent causing stale behavior) that generic AWS troubleshooting advice will miss.

AccessDeniedException


Multiple possible causes: (1) the IAM user/role lacks `bedrock:InvokeModel` or `bedrock:InvokeModelWithResponseStream` permissions, (2) model access is not enabled in the target region, (3) a service control policy (SCP) is blocking access (common with cross-region inference routing to a restricted region), (4) expired temporary credentials, or (5) IAM role propagation delay — if you just created an IAM role and immediately used it in a Bedrock API call, the role may not have propagated yet, as IAM changes are eventually consistent (see IAM eventual consistency). Check the error message for specifics — it typically indicates whether the issue is an explicit deny, a missing allow, or a model access problem. See Resolve InvokeModel API errors for detailed resolution steps.

Malformed input request


Request body doesn't match the expected schema. Common causes: wrong provider-specific body format for InvokeModel (e.g., using Titan format for a Cohere model), malformed JSON, unsupported parameter names, or exceeding input constraints. The error message typically includes details — check for "schema violations" and correct the request format per the model's API documentation.

ThrottlingException


Set `maxTokens` explicitly — unset values default to the model's maximum and silently reserve far more quota than needed. Use adaptive retry mode. Use cross-region inference profiles (e.g., the `us.`, `eu.`, `apac.`, or `global.` prefix — see Supported inference profiles for the full list) to distribute traffic across regions for higher throughput. Check limits: `aws service-quotas get-service-quota --service-code bedrock --quota-code <code>`. Request quota increases if needed. For a deeper audit, read quota health reference.

Prompt cache not working (zero cacheReadInputTokens)


Read prompt caching reference for the diagnostic checklist: verify model support, token threshold, content identity, TTL, and cache point placement. Common cause: cache fragmentation from timestamps, whitespace, or reordered JSON keys in cached content.

400 error on prefill with Claude 4.6


Prefill was removed in Claude 4.6 and causes a hard 400 error. Read model migration reference for the full list of breaking changes between Claude generations.

Error retry classification


| Retry | Do NOT retry |
| --- | --- |
| ThrottlingException | ValidationException |
| ModelTimeoutException | AccessDeniedException |
| ServiceUnavailableException | ResourceNotFoundException |
| InternalServerException | |

Use adaptive retry: `Config(retries={"max_attempts": 5, "mode": "adaptive"})`.

UnknownOperationException


Wrong client (using `bedrock` instead of `bedrock-runtime`), or SDK too old. Check the API landscape table above.

Agent returns stale behavior


Run `prepare-agent` after ANY configuration change. This is mandatory.

KB returns empty results


Run `start-ingestion-job` and wait for completion. Querying before ingestion completes returns empty results.

KB retrieval quality is poor


Review chunking strategy. Use advanced parsing (FM-based) for documents with tables. Configure metadata filtering.

Cross-region model not found


The model may not be available in the region you're calling from. Check availability at Supported foundation models. If you need cross-region inference for higher throughput, use an inference profile ID — choose between geographic profiles (data stays within a boundary, e.g. US, EU) or global profiles (any commercial region). The profile prefix is a data residency decision. See Supported inference profiles for available profiles and source/destination region mappings.

On-demand throughput isn't supported


Error: "Invocation of model ID `<model-id>` with on-demand throughput isn't supported. Retry your request with the ID or ARN of an inference profile that contains this model."
Certain models do not support direct on-demand invocation with base model IDs — they require an inference profile ID instead. Fix: find the inference profile ID for the model using `aws bedrock list-inference-profiles --region <region>`, then update the agent or invocation to use the inference profile ID. See Supported inference profiles for available profiles. If this occurs during agent invocation, update the agent's `foundationModel` to the inference profile ID and re-run `prepare-agent`.

KB storage configuration invalid

KB存储配置无效

Verify OpenSearch data access policy includes Bedrock service role. Verify vector index field names match KB config.

Agent action group errors


Check Lambda permissions (resource-based policy for bedrock.amazonaws.com). Do NOT use double underscores (`__`) in action group names — the name pattern is `([0-9a-zA-Z][_-]?){1,100}`.

Multi-agent supervisor loops


Agents use built-in collaboration mechanism, NOT action groups. Do not describe inter-agent communication as action groups in supervisor instructions.

INVALID_PAYMENT_INSTRUMENT on model access


Account billing issue, not Bedrock. Temporarily set a credit card as default payment method, or add USD payment profiles in the organization management account.

Knowledge base ingestion failures


Check S3 permissions — the KB service role needs `s3:GetObject` and `s3:ListBucket`. Unsupported file formats are silently skipped. Files exceeding size limits are skipped without error.

SharePoint data source sync failures


Sync completes but individual files fail. For OAuth 2.0 auth (not recommended): requires the SharePoint AllSites.Read (Delegated) permission — you may also need to disable Security Defaults and MFA for the service account so Amazon Bedrock is not blocked from crawling. For SharePoint App-Only auth (recommended): configure app permissions via the SharePoint App-Only grant flow. See the SharePoint connector docs for current requirements.

AgentCore Services


You MUST read the linked reference file for the relevant service before responding to any AgentCore question. Follow procedures in the reference step by step.
| Service | Use For | Reference |
| --- | --- | --- |
| Gateway | Expose APIs, Lambda functions, or existing MCP servers as tools for agents | gateway procedure |
| Runtime | Deploy and scale agents and tools (serverless, any framework) | runtime procedure |
| Runtime Container | Build ARM64 containers for Runtime | container build procedure |
| Memory | Short-term (multi-turn) and long-term (cross-session) agent memory; share memory across agents | memory & observability |
| Identity | Agent authentication with external IdPs (Okta, Entra ID, Cognito); act on behalf of users | credentials & security |
| Policy | Enforce agent boundaries with natural language or Cedar rules; intercepts Gateway tool calls | Refer to the latest AWS documentation on AgentCore Policy |
| Observability | Trace, debug, and monitor agent execution (OTEL, CloudWatch) | memory & observability |
| Registry | Catalog and discover agents, MCP servers, tools, and skills across your org | registry & evaluations |
| Evaluations | Automated agent quality assessment (LLM-as-a-Judge) | registry & evaluations |
| Code Interpreter | Secure sandbox code execution for agents | Refer to the latest AWS documentation on AgentCore Code Interpreter |
| Browser | Web automation (navigate, fill forms, extract data) | Refer to the latest AWS documentation on AgentCore Browser |

Model Selection

模型选择

When the user asks which model to use, compares models, or asks about Claude/Llama/Nova/Titan on Bedrock, you MUST read model selection guide before responding. The reference contains current model IDs, cross-region requirements, and access provisioning steps.
Quick defaults (verify current availability: `aws bedrock list-foundation-models --region <region>`):
  • General purpose: Claude Sonnet (best quality/cost balance)
  • Fast + cheap: Claude Haiku or Nova Micro
  • Embeddings for KB: Titan Embeddings V2
  • Open-source / fine-tuning: Llama
  • Image generation: Titan Image Generator
For current model IDs, regional availability, cross-region inference profiles, and supported features, refer to Supported foundation models in Amazon Bedrock. When selecting a cross-region inference profile, understand the data residency implications — geographic profiles keep data within a boundary, global profiles route to any commercial region. Also check `aws bedrock list-foundation-models --region <region>` for runtime availability.
For model ID formats (4 patterns), access provisioning, and selection criteria, see model selection guide.

Additional Resources

额外资源