claude-api

Building LLM-Powered Applications with Claude

This skill helps you build LLM-powered applications with Claude. Choose the right surface based on your needs, detect the project language, then read the relevant language-specific documentation.

Before You Start

Scan the target file (or, if no target file, the prompt and project) for non-Anthropic provider markers — `import openai`, `from openai`, `langchain_openai`, `OpenAI(`, `gpt-4`, `gpt-5`, file names like `agent-openai.py` or `*-generic.py`, or any explicit instruction to keep the code provider-neutral. If you find any, stop and tell the user that this skill produces Claude/Anthropic SDK code; ask whether they want to switch the file to Claude or want a non-Claude implementation. Do not edit a non-Anthropic file with Anthropic SDK calls.
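The scan above can be sketched as a simple marker check. This is a minimal sketch: the helper name is illustrative, and the regex list is taken directly from the markers named in the text.

```python
import re

# Marker list from the text above; extend with project-specific patterns as needed.
NON_ANTHROPIC_MARKERS = [
    r"import openai",
    r"from openai",
    r"langchain_openai",
    r"OpenAI\(",
    r"gpt-4",
    r"gpt-5",
]

def has_non_anthropic_markers(source: str) -> bool:
    """Return True if the text contains any non-Anthropic provider marker."""
    return any(re.search(marker, source) for marker in NON_ANTHROPIC_MARKERS)
```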

Output Requirement

When the user asks you to add, modify, or implement a Claude feature, your code must call Claude through one of:
  1. The official Anthropic SDK for the project's language (`anthropic`, `@anthropic-ai/sdk`, `com.anthropic.*`, etc.). This is the default whenever a supported SDK exists for the project.
  2. Raw HTTP (`curl`, `requests`, `fetch`, `httpx`, etc.) — only when the user explicitly asks for cURL/REST/raw HTTP, the project is a shell/cURL project, or the language has no official SDK.
Never mix the two — don't reach for `requests`/`fetch` in a Python or TypeScript project just because it feels lighter. Never fall back to OpenAI-compatible shims.
Never guess SDK usage. Function names, class names, namespaces, method signatures, and import paths must come from explicit documentation — either the `{lang}/` files in this skill or the official SDK repositories and documentation links listed in `shared/live-sources.md`. If the binding you need is not explicitly documented in the skill files, WebFetch the relevant SDK repo from `shared/live-sources.md` before writing code. Do not infer Ruby/Java/Go/PHP/C# APIs from cURL shapes or from another language's SDK.

Defaults

Unless the user requests otherwise:
Use Claude Opus 4.7, via the exact model string `claude-opus-4-7`. Default to adaptive thinking (`thinking: {type: "adaptive"}`) for anything remotely complicated. Default to streaming for any request that may involve long input, long output, or high `max_tokens` — it prevents hitting request timeouts. Use the SDK's `.get_final_message()`/`.finalMessage()` helper to get the complete response if you don't need to handle individual stream events.
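As a hedged sketch, the defaults above translate to a request shape like this (Python). The prompt text and `max_tokens` value are placeholders; the streaming call is shown commented out because it needs a real API key.

```python
# Default request shape per the rules above. The model string, adaptive
# thinking, and the .get_final_message() helper are from this skill's text.
request_kwargs = dict(
    model="claude-opus-4-7",        # exact model string; no date suffix
    max_tokens=8192,                # placeholder ceiling; adjust per task
    thinking={"type": "adaptive"},  # default for anything remotely complicated
    messages=[{"role": "user", "content": "Summarize the attached report."}],
)

# Streaming by default for long input/output (requires ANTHROPIC_API_KEY):
# import anthropic
# with anthropic.Anthropic().messages.stream(**request_kwargs) as stream:
#     message = stream.get_final_message()
```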

Subcommands

If the User Request at the bottom of this prompt is a bare subcommand string (no prose), search every Subcommands table in this document — including any in sections appended below — and follow the matching Action column directly. This lets users invoke specific flows via `/claude-api <subcommand>`. If no table in the document matches, treat the request as normal prose.

Language Detection

Before reading code examples, determine which language the user is working in:
  1. Look at project files to infer the language:
    • `*.py`, `requirements.txt`, `pyproject.toml`, `setup.py`, `Pipfile` → Python — read from `python/`
    • `*.ts`, `*.tsx`, `package.json`, `tsconfig.json` → TypeScript — read from `typescript/`
    • `*.js`, `*.jsx` (no `.ts` files present) → TypeScript — JS uses the same SDK, read from `typescript/`
    • `*.java`, `pom.xml`, `build.gradle` → Java — read from `java/`
    • `*.kt`, `*.kts`, `build.gradle.kts` → Java — Kotlin uses the Java SDK, read from `java/`
    • `*.scala`, `build.sbt` → Java — Scala uses the Java SDK, read from `java/`
    • `*.go`, `go.mod` → Go — read from `go/`
    • `*.rb`, `Gemfile` → Ruby — read from `ruby/`
    • `*.cs`, `*.csproj` → C# — read from `csharp/`
    • `*.php`, `composer.json` → PHP — read from `php/`
  2. If multiple languages are detected (e.g., both Python and TypeScript files):
    • Check which language the user's current file or question relates to
    • If still ambiguous, ask: "I detected both Python and TypeScript files. Which language are you using for the Claude API integration?"
  3. If the language can't be inferred (empty project, no source files, or unsupported language):
    • Use AskUserQuestion with options: Python, TypeScript, Java, Go, Ruby, cURL/raw HTTP, C#, PHP
    • If AskUserQuestion is unavailable, default to Python examples and note: "Showing Python examples. Let me know if you need a different language."
  4. If an unsupported language is detected (Rust, Swift, C++, Elixir, etc.):
    • Suggest cURL/raw HTTP examples from `curl/` and note that community SDKs may exist
    • Offer to show Python or TypeScript examples as reference implementations
  5. If the user needs cURL/raw HTTP examples, read from `curl/`.
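Step 1's marker table can be sketched as a lookup. This is illustrative only: the `detect_languages` helper and the marker grouping are assumptions, not part of the skill files.

```python
from pathlib import Path

# Marker-to-language table from the list above. Kotlin/Scala fold into Java
# and JS folds into TypeScript, per the detection rules.
MARKERS = {
    "python": ["*.py", "requirements.txt", "pyproject.toml", "setup.py", "Pipfile"],
    "typescript": ["*.ts", "*.tsx", "*.js", "*.jsx", "package.json", "tsconfig.json"],
    "java": ["*.java", "pom.xml", "build.gradle", "*.kt", "*.kts", "*.scala", "build.sbt"],
    "go": ["*.go", "go.mod"],
    "ruby": ["*.rb", "Gemfile"],
    "csharp": ["*.cs", "*.csproj"],
    "php": ["*.php", "composer.json"],
}

def detect_languages(root: str) -> set[str]:
    """Return the set of languages whose markers appear under `root`."""
    root_path = Path(root)
    detected = set()
    for lang, patterns in MARKERS.items():
        if any(any(root_path.rglob(p)) for p in patterns):
            detected.add(lang)
    return detected
```

If the returned set has more than one entry, fall through to step 2 (disambiguate); if it is empty, fall through to step 3 (ask the user).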

Language-Specific Feature Support

| Language | Tool Runner | Managed Agents | Notes |
|---|---|---|---|
| Python | Yes (beta) | Yes (beta) | Full support — `@beta_tool` decorator |
| TypeScript | Yes (beta) | Yes (beta) | Full support — `betaZodTool` + Zod |
| Java | Yes (beta) | Yes (beta) | Beta tool use with annotated classes |
| Go | Yes (beta) | Yes (beta) | `BetaToolRunner` in `toolrunner` pkg |
| Ruby | Yes (beta) | Yes (beta) | `BaseTool` + `tool_runner` in beta |
| C# | No | No | Official SDK |
| PHP | Yes (beta) | Yes (beta) | `BetaRunnableTool` + `toolRunner()` |
| cURL | N/A | Yes (beta) | Raw HTTP, no SDK features |

Managed Agents code examples: dedicated language-specific READMEs are provided for Python, TypeScript, Go, Ruby, PHP, Java, and cURL (`{lang}/managed-agents/README.md`, `curl/managed-agents.md`). Read your language's README plus the language-agnostic `shared/managed-agents-*.md` concept files. Agents are persistent — create once, reference by ID. Store the agent ID returned by `agents.create` and pass it to every subsequent `sessions.create`; do not call `agents.create` in the request path. The Anthropic CLI is one convenient way to create agents and environments from version-controlled YAML — its URL is in `shared/live-sources.md`. If a binding you need isn't shown in the README, WebFetch the relevant entry from `shared/live-sources.md` rather than guess. C# does not currently have Managed Agents support; use cURL-style raw HTTP requests against the API.

Which Surface Should I Use?

Start simple. Default to the simplest tier that meets your needs. Single API calls and workflows handle most use cases — only reach for agents when the task genuinely requires open-ended, model-driven exploration.
| Use Case | Tier | Recommended Surface | Why |
|---|---|---|---|
| Classification, summarization, extraction, Q&A | Single LLM call | Claude API | One request, one response |
| Batch processing or embeddings | Single LLM call | Claude API | Specialized endpoints |
| Multi-step pipelines with code-controlled logic | Workflow | Claude API + tool use | You orchestrate the loop |
| Custom agent with your own tools | Agent | Claude API + tool use | Maximum flexibility |
| Server-managed stateful agent with workspace | Agent | Managed Agents | Anthropic runs the loop and hosts the tool-execution sandbox |
| Persisted, versioned agent configs | Agent | Managed Agents | Agents are stored objects; sessions pin to a version |
| Long-running multi-turn agent with file mounts | Agent | Managed Agents | Per-session containers, SSE event stream, Skills + MCP |
Note: Managed Agents is the right choice when you want Anthropic to run the agent loop and host the container where tools execute — file ops, bash, code execution all run in the per-session workspace. If you want to host the compute yourself or run your own custom tool runtime, Claude API + tool use is the right choice — use the tool runner for automatic loop handling, or the manual loop for fine-grained control (approval gates, custom logging, conditional execution).
Third-party providers (Amazon Bedrock, Google Vertex AI, Microsoft Foundry): Managed Agents is not available on Bedrock, Vertex, or Foundry. If you are deploying through any third-party provider, use Claude API + tool use for all use cases — including ones where Managed Agents would otherwise be the recommended surface.

Decision Tree

What does your application need?

0. Are you deploying through Amazon Bedrock, Google Vertex AI, or Microsoft Foundry?
   └── Yes → Claude API (+ tool use for agents) — Managed Agents is 1P only.
   No → continue.

1. Single LLM call (classification, summarization, extraction, Q&A)
   └── Claude API — one request, one response

2. Do you want Anthropic to run the agent loop and host a per-session
   container where Claude executes tools (bash, file ops, code)?
   └── Yes → Managed Agents — server-managed sessions, persisted agent configs,
       SSE event stream, Skills + MCP, file mounts.
       Examples: "stateful coding agent with a workspace per task",
                 "long-running research agent that streams events to a UI",
                 "agent with persisted, versioned config used across many sessions"

3. Workflow (multi-step, code-orchestrated, with your own tools)
   └── Claude API with tool use — you control the loop

4. Open-ended agent (model decides its own trajectory, your own tools, you host the compute)
   └── Claude API agentic loop (maximum flexibility)

Should I Build an Agent?

Before choosing the agent tier, check all four criteria:
  • Complexity — Is the task multi-step and hard to fully specify in advance? (e.g., "turn this design doc into a PR" vs. "extract the title from this PDF")
  • Value — Does the outcome justify higher cost and latency?
  • Viability — Is Claude capable at this task type?
  • Cost of error — Can errors be caught and recovered from? (tests, review, rollback)
If the answer is "no" to any of these, stay at a simpler tier (single call or workflow).

Architecture

Everything goes through `POST /v1/messages`. Tools and output constraints are features of this single endpoint — not separate APIs.
User-defined tools — You define tools (via decorators, Zod schemas, or raw JSON), and the SDK's tool runner handles calling the API, executing your functions, and looping until Claude is done. For full control, you can write the loop manually.
Server-side tools — Anthropic-hosted tools that run on Anthropic's infrastructure. Code execution is fully server-side (declare it in `tools`, Claude runs code automatically). Computer use can be server-hosted or self-hosted.
Structured outputs — Constrain the Messages API response format (`output_config.format`) and/or tool parameter validation (`strict: true`). The recommended approach is `client.messages.parse()`, which validates responses against your schema automatically. Note: the old `output_format` parameter is deprecated; use `output_config: {format: {...}}` on `messages.create()`.
Supporting endpoints — Batches (`POST /v1/messages/batches`), Files (`POST /v1/files`), Token Counting, and Models (`GET /v1/models`, `GET /v1/models/{id}` — live capability/context-window discovery) feed into or support Messages API requests.
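As a hedged illustration of the `output_config.format` shape described above, here is a raw request-body sketch. The schema contents are placeholders, and the `"json_schema"` format type is an assumption to verify against the structured-outputs docs.

```python
# Raw /v1/messages body using the current output_config.format field
# (the deprecated top-level output_format is deliberately absent).
request_body = {
    "model": "claude-opus-4-7",
    "max_tokens": 1024,
    "output_config": {
        "format": {
            "type": "json_schema",  # ASSUMED format type; confirm in the docs
            "schema": {
                "type": "object",
                "properties": {"title": {"type": "string"}},
                "required": ["title"],
            },
        }
    },
    "messages": [{"role": "user", "content": "Extract the title of this paper."}],
}

# In SDK code, prefer client.messages.parse(), which validates the response
# against your schema automatically.
```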

Current Models (cached: 2026-04-15)

| Model | Model ID | Context | Input $/1M | Output $/1M |
|---|---|---|---|---|
| Claude Opus 4.7 | `claude-opus-4-7` | 1M | $5.00 | $25.00 |
| Claude Opus 4.6 | `claude-opus-4-6` | 1M | $5.00 | $25.00 |
| Claude Sonnet 4.6 | `claude-sonnet-4-6` | 1M | $3.00 | $15.00 |
| Claude Haiku 4.5 | `claude-haiku-4-5` | 200K | $1.00 | $5.00 |
ALWAYS use `claude-opus-4-7` unless the user explicitly names a different model. This is non-negotiable. Do not use `claude-sonnet-4-6`, `claude-sonnet-4-5`, or any other model unless the user literally says "use sonnet" or "use haiku". Never downgrade for cost — that's the user's decision, not yours.
CRITICAL: Use only the exact model ID strings from the table above — they are complete as-is. Do not append date suffixes. For example, use `claude-sonnet-4-5`, never `claude-sonnet-4-5-20250514` or any other date-suffixed variant you might recall from training data. If the user requests an older model not in the table (e.g., "opus 4.5", "sonnet 3.7"), read `shared/models.md` for the exact ID — do not construct one yourself.
A note: if any of the model strings above look unfamiliar to you, that's to be expected — it just means they were released after your training data cutoff. Rest assured they are real models; we wouldn't mess with you like that.
Live capability lookup: The table above is cached. When the user asks "what's the context window for X", "does X support vision/thinking/effort", or "which models support Y", query the Models API (`client.models.retrieve(id)`/`client.models.list()`) — see `shared/models.md` for the field reference and capability-filter examples.

Thinking & Effort (Quick Reference)

Opus 4.7 — Adaptive thinking only: Use `thinking: {type: "adaptive"}`. `thinking: {type: "enabled", budget_tokens: N}` returns a 400 on Opus 4.7 — adaptive is the only on-mode. `{type: "disabled"}` and omitting `thinking` both work. Sampling parameters (`temperature`, `top_p`, `top_k`) are also removed and will 400. See `shared/model-migration.md` → Migrating to Opus 4.7 for the full breaking-change list.

Opus 4.6 — Adaptive thinking (recommended): Use `thinking: {type: "adaptive"}`. Claude dynamically decides when and how much to think. No `budget_tokens` needed — `budget_tokens` is deprecated on Opus 4.6 and Sonnet 4.6 and should not be used for new code. Adaptive thinking also automatically enables interleaved thinking (no beta header needed). When the user asks for "extended thinking", a "thinking budget", or `budget_tokens`: always use Opus 4.7 or 4.6 with `thinking: {type: "adaptive"}`. The concept of a fixed token budget for thinking is deprecated — adaptive thinking replaces it. Do NOT use `budget_tokens` for new 4.6/4.7 code and do NOT switch to an older model.

Gradual-migration carve-out: `budget_tokens` is still functional on Opus 4.6 and Sonnet 4.6 as a transitional escape hatch — if you're migrating existing code and need a hard token ceiling before you've tuned `effort`, see `shared/model-migration.md` → Transitional escape hatch. Note: this carve-out does not apply to Opus 4.7 — `budget_tokens` is fully removed there.

Effort parameter (GA, no beta header): Controls thinking depth and overall token spend via `output_config: {effort: "low"|"medium"|"high"|"max"}` (inside `output_config`, not top-level). Default is `high` (equivalent to omitting it). `max` is Opus-tier only (Opus 4.6 and later — not Sonnet or Haiku). Opus 4.7 adds `"xhigh"` (between `high` and `max`) — the best setting for most coding and agentic use cases on 4.7, and the default in Claude Code; use a minimum of `high` for most intelligence-sensitive work. Works on Opus 4.5, Opus 4.6, Opus 4.7, and Sonnet 4.6. Will error on Sonnet 4.5 / Haiku 4.5. On Opus 4.7, effort matters more than on any prior Opus — re-tune it when migrating. Combine with adaptive thinking for the best cost-quality tradeoffs. Lower effort means fewer and more-consolidated tool calls, less preamble, and terser confirmations — `high` is often the sweet spot balancing quality and token efficiency; use `max` when correctness matters more than cost; use `low` for subagents or simple tasks.

Opus 4.7 — thinking content omitted by default: `thinking` blocks still stream but their text is empty unless you opt in with `thinking: {type: "adaptive", display: "summarized"}` (the default is `"omitted"`). This is a silent change — no error. If you stream reasoning to users, the default looks like a long pause before output; set `"summarized"` to restore visible progress.

Task Budgets (beta, Opus 4.7): `output_config: {task_budget: {type: "tokens", total: N}}` tells the model how many tokens it has for a full agentic loop — it sees a running countdown and self-moderates (minimum 20,000; beta header `task-budgets-2026-03-13`). Distinct from `max_tokens`, which is an enforced per-response ceiling the model is not aware of. See `shared/model-migration.md` → Task Budgets.

Sonnet 4.6: Supports adaptive thinking (`thinking: {type: "adaptive"}`). `budget_tokens` is deprecated on Sonnet 4.6 — use adaptive thinking instead.

Older models (only if explicitly requested): If the user specifically asks for Sonnet 4.5 or another older model, use `thinking: {type: "enabled", budget_tokens: N}`. `budget_tokens` must be less than `max_tokens` (minimum 1024). Never choose an older model just because the user mentions `budget_tokens` — use Opus 4.7 with adaptive thinking instead.
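Putting the pieces above together, a hedged Opus 4.7 request sketch. The parameter names and values come from this section; the prompt text and token ceiling are placeholders.

```python
request_body = {
    "model": "claude-opus-4-7",
    "max_tokens": 16000,  # placeholder; an enforced per-response ceiling
    # Adaptive is the only on-mode on Opus 4.7. display defaults to "omitted",
    # so opt in to "summarized" if you surface reasoning to users.
    "thinking": {"type": "adaptive", "display": "summarized"},
    # effort lives inside output_config, not top-level; "xhigh" is 4.7-only.
    "output_config": {"effort": "xhigh"},
    "messages": [{"role": "user", "content": "Refactor this module for testability."}],
}

# Do NOT add temperature/top_p/top_k or budget_tokens on Opus 4.7; both 400.
```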

Compaction (Quick Reference)

Beta; available on Opus 4.7, Opus 4.6, and Sonnet 4.6. For long-running conversations that may exceed the 1M context window, enable server-side compaction. The API automatically summarizes earlier context when it approaches the trigger threshold (default: 150K tokens). Requires the beta header `compact-2026-01-12`.
Critical: Append `response.content` (not just the text) back to your messages on every turn. Compaction blocks in the response must be preserved — the API uses them to replace the compacted history on the next request. Extracting only the text string and appending that will silently lose the compaction state.
See `{lang}/claude-api/README.md` (Compaction section) for code examples. Full docs via WebFetch in `shared/live-sources.md`.
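The critical rule above can be sketched as a one-line helper (the `append_turn` name is illustrative; `response.content` is the full block list returned by the API):

```python
def append_turn(messages: list, response) -> list:
    """Append the assistant turn, preserving every content block.

    Appending response.content wholesale keeps compaction blocks intact;
    extracting only the text would silently lose the compaction state.
    """
    messages.append({"role": "assistant", "content": response.content})
    return messages
```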

Prompt Caching (Quick Reference)

Prefix match. Any byte change anywhere in the prefix invalidates everything after it. Render order is `tools` → `system` → `messages`. Keep stable content first (frozen system prompt, deterministic tool list), and put volatile content (timestamps, per-request IDs, varying questions) after the last `cache_control` breakpoint.
Top-level auto-caching (`cache_control: {type: "ephemeral"}` on `messages.create()`) is the simplest option when you don't need fine-grained placement. Max 4 breakpoints per request. The minimum cacheable prefix is ~1024 tokens — shorter prefixes silently won't cache.
Verify with `usage.cache_read_input_tokens` — if it's zero across repeated requests, a silent invalidator is at work (`datetime.now()` in the system prompt, unsorted JSON, a varying tool set).
For placement patterns, architectural guidance, and the silent-invalidator audit checklist, read `shared/prompt-caching.md`. Language-specific syntax: `{lang}/claude-api/README.md` (Prompt Caching section).
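A minimal placement sketch following the order above. The system prompt and question are placeholders; the `cache_control` fields are as described in this section.

```python
request_body = {
    "model": "claude-opus-4-7",
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "You are a contract-review assistant.",  # frozen, stable prefix
            "cache_control": {"type": "ephemeral"},  # breakpoint: stable content ends here
        }
    ],
    "messages": [
        # Volatile, per-request content goes after the last breakpoint.
        {"role": "user", "content": "Review clause 4.2 of the attached contract."}
    ],
}

# After the call, check usage.cache_read_input_tokens; zero across repeated
# identical-prefix requests means a silent invalidator is at work.
```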

Managed Agents (Beta)

Managed Agents is a third surface: server-managed stateful agents with Anthropic-hosted tool execution. You create a persisted, versioned Agent config (`POST /v1/agents`), then start Sessions that reference it. Each session provisions a container as the agent's workspace — bash, file ops, and code execution run there; the agent loop itself runs on Anthropic's orchestration layer and acts on the container via tools. The session streams events; you send messages and tool results back.
Managed Agents is first-party only. It is not available on Amazon Bedrock, Google Vertex AI, or Microsoft Foundry. For agents on third-party providers, use Claude API + tool use.
Mandatory flow: Agent (once) → Session (every run). `model`/`system`/`tools` live on the agent, never the session. See `shared/managed-agents-overview.md` for the full reading guide, beta headers, and pitfalls.
Beta headers: `managed-agents-2026-04-01` — the SDK sets this automatically for all `client.beta.{agents,environments,sessions,vaults}.*` calls. The Skills API uses `skills-2025-10-02` and the Files API uses `files-api-2025-04-14`, but you don't need to pass those explicitly for endpoints other than `/v1/skills` and `/v1/files`.
Subcommands — invoke directly with `/claude-api <subcommand>`:

| Subcommand | Action |
|---|---|
| `managed-agents-onboard` | Walk the user through setting up a Managed Agent from scratch. Read `shared/managed-agents-onboarding.md` **immediately** and follow its interview script: mental model → know-or-explore branch → template config → session setup → emit code. Do not summarize — run the interview. |

Reading guide: Start with `shared/managed-agents-overview.md`, then the topical `shared/managed-agents-*.md` files (core, environments, tools, events, client-patterns, onboarding, api-reference). For Python, TypeScript, Go, Ruby, PHP, and Java, read `{lang}/managed-agents/README.md` for code examples. For cURL, read `curl/managed-agents.md`. Agents are persistent — create once, reference by ID. Store the agent ID returned by `agents.create` and pass it to every subsequent `sessions.create`; do not call `agents.create` in the request path. The Anthropic CLI is one convenient way to create agents and environments from version-controlled YAML (URL in `shared/live-sources.md`). If a binding you need isn't shown in the language README, WebFetch the relevant entry from `shared/live-sources.md` rather than guess. C# does not currently have Managed Agents support; use raw HTTP from `curl/managed-agents.md` as a reference.
When the user wants to set up a Managed Agent from scratch (e.g., "how do I get started", "walk me through creating one", "set up a new agent"): read `shared/managed-agents-onboarding.md` and run its interview — the same flow as the `managed-agents-onboard` subcommand.
When the user asks "how do I write the client code for X": reach for `shared/managed-agents-client-patterns.md` — it covers lossless stream reconnect, the `processed_at` queued/processed gate, interrupts, the `tool_confirmation` round-trip, the correct idle/terminated break gate, the post-idle status race, stream-first ordering, file-mount gotchas, keeping credentials host-side via custom tools, and more.
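The create-once/reference-by-ID rule can be sketched as follows. The persist-to-file approach and the parameter names shown are illustrative assumptions; check your language's managed-agents README for the real bindings.

```python
import os

def get_or_create_agent(client, id_file=".agent_id"):
    """Create the agent once and reuse its stored ID on every later run."""
    if os.path.exists(id_file):  # agents are persistent: reference by ID
        with open(id_file) as f:
            return f.read().strip()
    agent = client.beta.agents.create(  # model/system/tools live on the agent
        model="claude-opus-4-7",
        system="You are a research agent.",
    )
    with open(id_file, "w") as f:
        f.write(agent.id)
    return agent.id

# Per run (the request path), reference the stored ID; never call
# agents.create here:
# session = client.beta.sessions.create(agent_id=get_or_create_agent(client))
```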

Managed Agents是第三种实现方式:服务器托管的有状态agent,带有Anthropic托管的工具执行。你创建一个持久化、版本化的Agent配置(
POST /v1/agents
),然后启动引用该配置的Sessions。每个会话会为agent的工作区分配一个容器——bash、文件操作和代码执行都在其中运行;agent循环本身运行在Anthropic的编排层,并通过工具对容器进行操作。会话会流式传输事件;你可发送消息和工具结果返回。
Managed Agents仅Anthropic官方可用。在Amazon Bedrock、Google Vertex AI或Microsoft Foundry上不可用。对于第三方服务商上的agents,请使用Claude API + 工具调用。
强制流程:Agent(创建一次)→ Session(每次运行)。
model
/
system
/
tools
属于agent,不属于session。请参阅
shared/managed-agents-overview.md
获取完整阅读指南、测试版标头和注意事项。
测试版标头
managed-agents-2026-04-01
——SDK会自动为所有
client.beta.{agents,environments,sessions,vaults}.*
调用设置此标头。Skills API使用
skills-2025-10-02
,Files API使用
files-api-2025-04-14
,但除
/v1/skills
/v1/files
端点外,你无需显式传递这些标头。
子命令 — 可通过
/claude-api <subcommand>
直接调用:
子命令操作
managed-agents-onboard
引导用户从头开始设置Managed Agent。**立即阅读
shared/managed-agents-onboarding.md
**并遵循其中的访谈脚本:心智模型 → 已知/探索分支 → 模板配置 → 会话设置 → 生成代码。不要总结——完整执行访谈流程。
阅读指南:先阅读
shared/managed-agents-overview.md
,然后阅读主题相关的
shared/managed-agents-*.md
文件(核心、环境、工具、事件、客户端模式、入门、API参考)。对于Python、TypeScript、Go、Ruby、PHP和Java,请阅读
{lang}/managed-agents/README.md
获取代码示例。对于cURL,请阅读
curl/managed-agents.md
Agents是持久化的——创建一次,通过ID引用。存储
agents.create
返回的agent ID,并在后续所有
sessions.create
调用中传入;不要在请求路径中调用
agents.create
。Anthropic CLI是通过版本控制的YAML创建agents和环境的便捷方式(URL在
shared/live-sources.md
中)。如果语言README中没有展示你需要的绑定,请从
shared/live-sources.md
获取相关内容,不要自行猜测。C#目前不支持Managed Agents;请使用
curl/managed-agents.md
中的原生HTTP作为参考。
当用户想要从头开始设置Managed Agent时(例如"如何入门"、"引导我创建一个"、"设置新agent"):阅读
shared/managed-agents-onboarding.md
并执行其中的访谈流程——与
managed-agents-onboard
子命令流程相同。
当用户询问"如何为X编写客户端代码"时:请查阅 shared/managed-agents-client-patterns.md ——涵盖无损流重连、processed_at 排队/处理门控、中断、tool_confirmation 往返、正确的空闲/终止中断门控、空闲后状态竞争、流优先排序、文件挂载注意事项、通过自定义工具将凭据保留在主机端等内容。
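The mandatory Agent (create once) → Session (per run) split described above can be sketched without the SDK. The payload field names below are assumptions for illustration, not the exact wire format; only the shape of the flow (model/system/tools on the agent, a stored agent ID on every session) is taken from the text.

```python
# Sketch: configuration that belongs to the persistent Agent vs. the
# per-run Session request. Field names are illustrative assumptions.

def make_agent_config(model: str, system: str, tools: list) -> dict:
    # model / system / tools belong to the agent, not the session
    return {"model": model, "system": system, "tools": tools}

def make_session_request(agent_id: str, first_message: str) -> dict:
    # Sessions only reference the stored agent ID; never re-create
    # the agent in the request path.
    return {
        "agent_id": agent_id,
        "messages": [{"role": "user", "content": first_message}],
    }

agent_cfg = make_agent_config("claude-example-model", "You are a build bot.", [])
# agent_id would come back once from POST /v1/agents; hardcoded for the sketch.
agent_id = "agent_abc123"
runs = [make_session_request(agent_id, f"run {i}") for i in range(3)]

assert all(r["agent_id"] == agent_id for r in runs)
assert "model" not in runs[0]  # model lives on the agent, not the session
```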

Reading Guide

阅读指南

After detecting the language, read the relevant files based on what the user needs:
检测语言后,根据用户需求阅读相关文件:

Quick Task Reference

快速任务参考

  • Single text classification/summarization/extraction/Q&A: → Read only {lang}/claude-api/README.md
  • Chat UI or real-time response display: → Read {lang}/claude-api/README.md + {lang}/claude-api/streaming.md
  • Long-running conversations (may exceed context window): → Read {lang}/claude-api/README.md — see Compaction section
  • Migrating to a newer model (Opus 4.7 / Opus 4.6 / Sonnet 4.6) or replacing a retired model: → Read shared/model-migration.md
  • Prompt caching / optimize caching / "why is my cache hit rate low": → Read shared/prompt-caching.md + {lang}/claude-api/README.md (Prompt Caching section)
  • Function calling / tool use / agents: → Read {lang}/claude-api/README.md + shared/tool-use-concepts.md + {lang}/claude-api/tool-use.md
  • Agent design (tool surface, context management, caching strategy): → Read shared/agent-design.md
  • Batch processing (non-latency-sensitive): → Read {lang}/claude-api/README.md + {lang}/claude-api/batches.md
  • File uploads across multiple requests: → Read {lang}/claude-api/README.md + {lang}/claude-api/files-api.md
  • Managed Agents (server-managed stateful agents with workspace): → Read shared/managed-agents-overview.md + the rest of the shared/managed-agents-*.md files. For Python, TypeScript, Go, Ruby, PHP, and Java, read {lang}/managed-agents/README.md for code examples. For cURL, read curl/managed-agents.md. Agents are persistent — create once, reference by ID. Store the agent ID returned by agents.create and pass it to every subsequent sessions.create; do not call agents.create in the request path. The Anthropic CLI is one convenient way to create agents and environments from version-controlled YAML (URL in shared/live-sources.md). If a binding you need isn't shown in the language README, WebFetch the relevant entry from shared/live-sources.md rather than guess. C# does not currently support Managed Agents — use raw HTTP from curl/managed-agents.md as a reference.
  • 单一文本分类/摘要/提取/问答: → 仅阅读 {lang}/claude-api/README.md
  • 聊天UI或实时响应显示: → 阅读 {lang}/claude-api/README.md + {lang}/claude-api/streaming.md
  • 长期对话(可能超过上下文窗口): → 阅读 {lang}/claude-api/README.md — 查看压缩部分
  • 迁移到更新的模型(Opus 4.7 / Opus 4.6 / Sonnet 4.6)或替换已停用模型: → 阅读 shared/model-migration.md
  • 提示缓存 / 优化缓存 / "为什么我的缓存命中率低": → 阅读 shared/prompt-caching.md + {lang}/claude-api/README.md(提示缓存部分)
  • 函数调用 / 工具使用 / agents: → 阅读 {lang}/claude-api/README.md + shared/tool-use-concepts.md + {lang}/claude-api/tool-use.md
  • Agent设计(工具集、上下文管理、缓存策略): → 阅读 shared/agent-design.md
  • 批量处理(对延迟不敏感): → 阅读 {lang}/claude-api/README.md + {lang}/claude-api/batches.md
  • 跨多个请求上传文件: → 阅读 {lang}/claude-api/README.md + {lang}/claude-api/files-api.md
  • Managed Agents(服务器托管的有状态agent,含工作区): → 阅读 shared/managed-agents-overview.md + 其他 shared/managed-agents-*.md 文件。对于Python、TypeScript、Go、Ruby、PHP和Java,请阅读 {lang}/managed-agents/README.md 获取代码示例。对于cURL,请阅读 curl/managed-agents.md。Agents是持久化的——创建一次,通过ID引用。存储 agents.create 返回的agent ID,并在后续所有 sessions.create 调用中传入;不要在请求路径中调用 agents.create。Anthropic CLI是通过版本控制的YAML创建agents和环境的便捷方式(URL在 shared/live-sources.md 中)。如果语言README中没有展示你需要的绑定,请从 shared/live-sources.md 获取相关内容,不要自行猜测。C#目前不支持Managed Agents;请使用 curl/managed-agents.md 中的原生HTTP作为参考。
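As a concrete illustration of the tool-use loop the files above document, here is a minimal sketch of dispatching one tool_use content block to a local handler and building the tool_result block to send back. The block shapes follow the Messages API; the get_weather tool and its return value are hypothetical.

```python
import json

# Hypothetical local tool registry; real tools would do real work.
TOOLS = {
    "get_weather": lambda city: {"city": city, "temp_c": 21},
}

def run_tool(block: dict) -> dict:
    """Map one tool_use block from a response to its tool_result block."""
    handler = TOOLS[block["name"]]
    result = handler(**block["input"])
    return {
        "type": "tool_result",
        "tool_use_id": block["id"],   # must echo the tool_use block's id
        "content": json.dumps(result),
    }

tool_use = {
    "type": "tool_use",
    "id": "toolu_01",
    "name": "get_weather",
    "input": {"city": "Paris"},
}
result_block = run_tool(tool_use)
assert result_block["tool_use_id"] == "toolu_01"
assert json.loads(result_block["content"])["temp_c"] == 21
```

In a real loop, result_block would be appended to the conversation as a user-role content block and the request re-sent; see the language-specific tool-use.md for the full runner.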

Claude API (Full File Reference)

Claude API(完整文件参考)

Read the language-specific Claude API folder (
{language}/claude-api/
):
  1. {language}/claude-api/README.md
    Read this first. Installation, quick start, common patterns, error handling.
  2. shared/tool-use-concepts.md
    — Read when the user needs function calling, code execution, memory, or structured outputs. Covers conceptual foundations.
  3. shared/agent-design.md
    — Read when designing an agent: bash vs. dedicated tools, programmatic tool calling, tool search/skills, context editing vs. compaction vs. memory, caching principles.
  4. {language}/claude-api/tool-use.md
    — Read for language-specific tool use code examples (tool runner, manual loop, code execution, memory, structured outputs).
  5. {language}/claude-api/streaming.md
    — Read when building chat UIs or interfaces that display responses incrementally.
  6. {language}/claude-api/batches.md
    — Read when processing many requests offline (not latency-sensitive). Runs asynchronously at 50% cost.
  7. {language}/claude-api/files-api.md
    — Read when sending the same file across multiple requests without re-uploading.
  8. shared/prompt-caching.md
    — Read when adding or optimizing prompt caching. Covers prefix-stability design, breakpoint placement, and anti-patterns that silently invalidate cache.
  9. shared/error-codes.md
    — Read when debugging HTTP errors or implementing error handling.
  10. shared/model-migration.md
    — Read when upgrading to newer models, replacing retired models, or translating
    budget_tokens
    / prefill patterns to the current API.
  11. shared/live-sources.md
    — WebFetch URLs for fetching the latest official documentation.
Note: For Java, Go, Ruby, C#, PHP, and cURL — these have a single file each covering all basics. Read that file plus
shared/tool-use-concepts.md
and
shared/error-codes.md
as needed.
Note: For the Managed Agents file reference, see the
## Managed Agents (Beta)
section above — it lists every
shared/managed-agents-*.md
file and the language-specific READMEs.

阅读语言专属Claude API文件夹
{language}/claude-api/
):
  1. {language}/claude-api/README.md
    首先阅读此文件。安装、快速入门、常见模式、错误处理。
  2. shared/tool-use-concepts.md
    — 当用户需要函数调用、代码执行、记忆或结构化输出时阅读。涵盖概念基础。
  3. shared/agent-design.md
    — 设计agent时阅读:bash vs 专用工具、程序化工具调用、工具搜索/skills、上下文编辑 vs 压缩 vs 记忆、缓存原则。
  4. {language}/claude-api/tool-use.md
    — 阅读语言专属工具使用代码示例(工具运行器、手动循环、代码执行、记忆、结构化输出)。
  5. {language}/claude-api/streaming.md
    — 构建聊天UI或增量显示响应的界面时阅读。
  6. {language}/claude-api/batches.md
    — 离线处理大量请求(对延迟不敏感)时阅读。异步运行,成本降低50%。
  7. {language}/claude-api/files-api.md
    — 无需重新上传即可跨多个请求发送同一文件时阅读。
  8. shared/prompt-caching.md
    — 添加或优化提示缓存时阅读。涵盖前缀稳定性设计、断点放置和静默失效的反模式。
  9. shared/error-codes.md
    — 调试HTTP错误或实现错误处理时阅读。
  10. shared/model-migration.md
    — 升级到更新模型、替换已停用模型,或将
    budget_tokens
    / 预填充模式转换为当前API时阅读。
  11. shared/live-sources.md
    — 获取最新官方文档的WebFetch URL。
注意:对于Java、Go、Ruby、C#、PHP和cURL——每种语言都有一个涵盖所有基础内容的单一文件。阅读该文件,并根据需要阅读
shared/tool-use-concepts.md
shared/error-codes.md
注意:Managed Agents的文件参考请参阅上方
## Managed Agents (Beta)
部分——其中列出了所有
shared/managed-agents-*.md
文件和语言专属README。

When to Use WebFetch

何时使用WebFetch

Use WebFetch to get the latest documentation when:
  • User asks for "latest" or "current" information
  • Cached data seems incorrect
  • User asks about features not covered here
Live documentation URLs are in
shared/live-sources.md
.
在以下情况下使用WebFetch获取最新文档:
  • 用户询问"最新"或"当前"信息
  • 缓存数据似乎不正确
  • 用户询问此处未涵盖的功能
实时文档URL在
shared/live-sources.md
中。

Common Pitfalls

常见陷阱

  • Don't truncate inputs when passing files or content to the API. If the content is too long to fit in the context window, notify the user and discuss options (chunking, summarization, etc.) rather than silently truncating.
  • Opus 4.7 thinking: Adaptive only.
    thinking: {type: "enabled", budget_tokens: N}
    returns 400 on Opus 4.7 —
    budget_tokens
    is fully removed there (along with
    temperature
    ,
    top_p
    ,
    top_k
    ). Use
    thinking: {type: "adaptive"}
    .
  • Opus 4.6 / Sonnet 4.6 thinking: Use
    thinking: {type: "adaptive"}
    — do NOT use
    budget_tokens
    for new 4.6 code (deprecated on both Opus 4.6 and Sonnet 4.6; for gradual migration of existing code, see the transitional escape hatch in
    shared/model-migration.md
    — note this carve-out does not apply to Opus 4.7). For older models,
    budget_tokens
    must be less than
    max_tokens
    (minimum 1024). This will throw an error if you get it wrong.
  • 4.6/4.7 family prefill removed: Assistant message prefills (last-assistant-turn prefills) return a 400 error on Opus 4.6, Opus 4.7, and Sonnet 4.6. Use structured outputs (
    output_config.format
    ) or system prompt instructions to control response format instead.
  • Confirm migration scope before editing: When a user asks to migrate code to a newer Claude model without naming a specific file, directory, or file list, ask which scope to apply first — the entire working directory, a specific subdirectory, or a specific set of files. Do not start editing until the user confirms. Imperative phrasings like "migrate my codebase", "move my project to X", "upgrade to Sonnet 4.6", or bare "migrate to Opus 4.7" are still ambiguous — they tell you what to do but not where, so ask. Proceed without asking only when the prompt names an exact file, a specific directory, or an explicit file list ("migrate
    app.py
    ", "migrate everything under
    services/
    ", "update
    a.py
    and
    b.py
    "). See
    shared/model-migration.md
    Step 0.
  • max_tokens
    defaults:
    Don't lowball
    max_tokens
    — hitting the cap truncates output mid-thought and requires a retry. For non-streaming requests, default to
    ~16000
    (keeps responses under SDK HTTP timeouts). For streaming requests, default to
    ~64000
    (timeouts aren't a concern, so give the model room). Only go lower when you have a hard reason: classification (
    ~256
    ), cost caps, or deliberately short outputs.
  • 128K output tokens: Opus 4.6 and Opus 4.7 support up to 128K
    max_tokens
    , but the SDKs require streaming for values that large to avoid HTTP timeouts. Use
    .stream()
    with
    .get_final_message()
    /
    .finalMessage()
    .
  • Tool call JSON parsing (4.6/4.7 family): Opus 4.6, Opus 4.7, and Sonnet 4.6 may produce different JSON string escaping in tool call
    input
    fields (e.g., Unicode or forward-slash escaping). Always parse tool inputs with
    json.loads()
    /
    JSON.parse()
    — never do raw string matching on the serialized input.
  • Structured outputs (all models): Use
    output_config: {format: {...}}
    instead of the deprecated
    output_format
    parameter on
    messages.create()
    . This is a general API change, not 4.6-specific.
  • Don't reimplement SDK functionality: The SDK provides high-level helpers — use them instead of building from scratch. Specifically: use
    stream.finalMessage()
    instead of wrapping
    .on()
    events in
    new Promise()
    ; use typed exception classes (
    Anthropic.RateLimitError
    , etc.) instead of string-matching error messages; use SDK types (
    Anthropic.MessageParam
    ,
    Anthropic.Tool
    ,
    Anthropic.Message
    , etc.) instead of redefining equivalent interfaces.
  • Don't define custom types for SDK data structures: The SDK exports types for all API objects. Use
    Anthropic.MessageParam
    for messages,
    Anthropic.Tool
    for tool definitions,
    Anthropic.ToolUseBlock
    /
    Anthropic.ToolResultBlockParam
    for tool results,
    Anthropic.Message
    for responses. Defining your own
    interface ChatMessage { role: string; content: unknown }
    duplicates what the SDK already provides and loses type safety.
  • Report and document output: For tasks that produce reports, documents, or visualizations, the code execution sandbox has
    python-docx
    ,
    python-pptx
    ,
    matplotlib
    ,
    pillow
    , and
    pypdf
    pre-installed. Claude can generate formatted files (DOCX, PDF, charts) and return them via the Files API — consider this for "report" or "document" type requests instead of plain stdout text.
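The thinking rules in the first two bullets can be condensed into a small guard. This is a sketch under stated assumptions: the substring model matching is a simplification for illustration, not the SDK's own logic, and the model names follow the text above.

```python
# Adaptive-only on the 4.6/4.7 family; budget_tokens (>= 1024 and below
# max_tokens) only on older models, per the pitfalls above.
ADAPTIVE_ONLY = ("opus-4.7", "opus-4.6", "sonnet-4.6")

def thinking_param(model: str, budget_tokens: int, max_tokens: int) -> dict:
    if any(tag in model for tag in ADAPTIVE_ONLY):
        # budget_tokens would return a 400 on Opus 4.7 and is deprecated on 4.6
        return {"type": "adaptive"}
    if not (1024 <= budget_tokens < max_tokens):
        raise ValueError("budget_tokens must be >= 1024 and < max_tokens")
    return {"type": "enabled", "budget_tokens": budget_tokens}

assert thinking_param("claude-opus-4.7", 8000, 16000) == {"type": "adaptive"}
older = thinking_param("claude-3-7-sonnet", 2048, 16000)
assert older["budget_tokens"] == 2048
```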
  • 向API传递文件或内容时不要截断输入。如果内容太长无法放入上下文窗口,请通知用户并讨论解决方案(分块、摘要等),不要静默截断。
  • Opus 4.7思考:仅支持自适应思考。
    thinking: {type: "enabled", budget_tokens: N}
    在Opus 4.7上会返回400错误——
    budget_tokens
    已完全移除(
    temperature
    top_p
    top_k
    也已移除)。请使用
    thinking: {type: "adaptive"}
  • Opus 4.6 / Sonnet 4.6思考:使用
    thinking: {type: "adaptive"}
    ——不要在新的4.6代码中使用
    budget_tokens
    (在Opus 4.6和Sonnet 4.6上已弃用;对于现有代码的逐步迁移,请参阅
    shared/model-migration.md
    中的过渡性解决方案——注意此例外不适用于Opus 4.7)。对于旧模型,
    budget_tokens
    必须小于
    max_tokens
    (最小值1024)。如果设置错误会抛出错误。
  • 4.6/4.7系列预填充移除:在Opus 4.6、Opus 4.7和Sonnet 4.6上,助手消息预填充(最后一个助手回合预填充)会返回400错误。请改用结构化输出(
    output_config.format
    )或系统提示指令控制响应格式。
  • 编辑前确认迁移范围:当用户要求将代码迁移到更新的Claude模型但未指定具体文件、目录或文件列表时,先询问要应用的范围——整个工作目录、特定子目录还是特定文件集。在用户确认之前不要开始编辑。诸如"迁移我的代码库"、"将我的项目迁移到X"、"升级到Sonnet 4.6"或单纯的"迁移到Opus 4.7"等命令式表述仍然模糊——它们告诉你要做什么,但没说在哪里做,所以请询问。只有当提示中指定了确切文件、特定目录或明确文件列表("迁移
    app.py
    "、"迁移
    services/
    下的所有内容"、"更新
    a.py
    b.py
    ")时,才可直接操作。请参阅
    shared/model-migration.md
    步骤0。
  • max_tokens 默认值:不要把 max_tokens 设得过低——达到上限会在输出中途截断,需要重试。对于非流式请求,默认设置为 ~16000(可将响应保持在SDK HTTP超时内)。对于流式请求,默认设置为 ~64000(无需担心超时,给模型足够空间)。只有在有充分理由时才降低值:分类(~256)、成本上限或刻意要求简短输出。
  • 128K输出令牌:Opus 4.6和Opus 4.7支持最高128K
    max_tokens
    ,但SDK要求对此类大值使用流式传输以避免HTTP超时。使用
    .stream()
    并结合
    .get_final_message()
    /
    .finalMessage()
  • 工具调用JSON解析(4.6/4.7系列):Opus 4.6、Opus 4.7和Sonnet 4.6在工具调用
    input
    字段中可能产生不同的JSON字符串转义(例如Unicode或正斜杠转义)。始终使用
    json.loads()
    /
    JSON.parse()
    解析工具输入——不要对序列化输入进行原始字符串匹配。
  • 结构化输出(所有模型):在
    messages.create()
    中使用
    output_config: {format: {...}}
    代替已弃用的
    output_format
    参数。这是通用API变更,并非仅针对4.6版本。
  • 不要重新实现SDK功能:SDK提供了高级助手——使用它们而非从头构建。具体来说:使用
    stream.finalMessage()
    代替将
    .on()
    事件包装在
    new Promise()
    中;使用类型化异常类(
    Anthropic.RateLimitError
    等)代替字符串匹配错误消息;使用SDK类型(
    Anthropic.MessageParam
    Anthropic.Tool
    Anthropic.Message
    等)代替重新定义等效接口。
  • 不要为SDK数据结构定义自定义类型:SDK导出了所有API对象的类型。使用
    Anthropic.MessageParam
    表示消息,
    Anthropic.Tool
    表示工具定义,
    Anthropic.ToolUseBlock
    /
    Anthropic.ToolResultBlockParam
    表示工具结果,
    Anthropic.Message
    表示响应。定义自己的
    interface ChatMessage { role: string; content: unknown }
    会重复SDK已提供的内容,并失去类型安全性。
  • 报告和文档输出:对于生成报告、文档或可视化的任务,代码执行沙箱预装了
    python-docx
    python-pptx
    matplotlib
    pillow
    pypdf
    。Claude可生成格式化文件(DOCX、PDF、图表)并通过Files API返回——对于"报告"或"文档"类请求,考虑使用此方式代替纯标准输出文本。
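The tool-call JSON-parsing pitfall is easy to demonstrate with the standard library alone: JSON permits several legal escapings of the same value, and json.loads normalizes all of them to identical objects, which is why parsed comparison is safe where raw string matching is not.

```python
import json

# Three legal serializations of the same tool input.
variants = [
    '{"path": "/tmp/caf\\u00e9"}',      # unicode-escaped
    '{"path": "\\/tmp\\/caf\\u00e9"}',  # forward-slash escaped as well
    '{"path": "/tmp/café"}',            # literal
]
parsed = [json.loads(v) for v in variants]

# Parsed values agree exactly...
assert parsed[0] == parsed[1] == parsed[2] == {"path": "/tmp/café"}
# ...while the raw serialized strings do not.
assert variants[0] != variants[1]
```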