context-fundamentals

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

Context Engineering Fundamentals

Context Engineering 基础

Context is the complete state available to a language model at inference time — system instructions, tool definitions, retrieved documents, message history, and tool outputs. Context engineering is the discipline of curating the smallest high-signal token set that maximizes the likelihood of desired outcomes. Every paragraph below earns its tokens by teaching a non-obvious technique or providing an actionable threshold.

Context是语言模型在推理时可用的完整状态——包括系统指令、工具定义、检索到的文档、消息历史和工具输出。Context engineering是一种精选最小高信号token集以最大化期望结果可能性的学科。以下每一段内容都通过传授非显而易见的技巧或提供可操作的阈值来体现其价值。

When to Activate

激活场景

Activate this skill when:

Designing new agent systems or modifying existing architectures
Debugging unexpected agent behavior that may relate to context
Optimizing context usage to reduce token costs or improve performance
Onboarding new team members to context engineering concepts
Reviewing context-related design decisions

在以下场景中激活此技能：

设计新的Agent系统或修改现有架构
调试可能与上下文相关的Agent异常行为
优化上下文使用以降低token成本或提升性能
向新团队成员介绍上下文工程概念
审查与上下文相关的设计决策

Core Concepts

核心概念

Treat context as a finite attention budget, not a storage bin. Every token added competes for the model's attention and depletes a budget that cannot be refilled mid-inference. The engineering problem is maximizing utility per token against three constraints: the hard token limit, the softer effective-capacity ceiling (typically 60-70% of the advertised window), and the U-shaped attention curve that penalizes information placed in the middle of context.

Apply four principles when assembling context:

Informativity over exhaustiveness — include only what matters for the current decision; design systems that can retrieve additional information on demand.
Position-aware placement — place critical constraints at the beginning and end of context, where recall accuracy runs 85-95%; the middle drops to 76-82% (the "lost-in-the-middle" effect).
Progressive disclosure — load skill names and summaries at startup; load full content only when a skill activates for a specific task.
Iterative curation — context engineering is not a one-time prompt-writing exercise but an ongoing discipline applied every time content is passed to the model.

将上下文视为有限的注意力预算，而非存储容器。每添加一个token都会争夺模型的注意力，并消耗一份在推理过程中无法补充的预算。工程问题在于在三个约束条件下最大化每token的效用：硬性token限制、较软的有效容量上限（通常为标称窗口的60-70%），以及对放置在上下文中间的信息进行惩罚的U型注意力曲线。

组装上下文时应用四大原则：

信息性优先于完备性——仅包含当前决策所需的内容；设计可按需检索额外信息的系统。
位置感知放置——将关键约束放在上下文的开头和结尾，这些位置的召回准确率可达85-95%；中间位置的准确率会降至76-82%（即“中间丢失”效应）。
渐进式披露——启动时仅加载技能名称和摘要；仅当技能针对特定任务激活时才加载完整内容。
迭代精选——上下文工程并非一次性的提示编写工作，而是每次向模型传递内容时都需应用的持续实践。

Detailed Topics

详细主题

The Anatomy of Context

上下文的构成

System Prompts Organize system prompts into distinct sections using XML tags or Markdown headers (background, instructions, tool guidance, output format). System prompts persist throughout the conversation, so place the most critical constraints at the beginning and end where attention is strongest.

Calibrate instruction altitude to balance two failure modes. Too-low altitude hardcodes brittle logic that breaks when conditions shift. Too-high altitude provides vague guidance that fails to give concrete signals for desired behavior. Aim for heuristic-driven instructions: specific enough to guide behavior, flexible enough to generalize — for example, numbered steps with room for judgment at each step.

Start minimal, then add instructions reactively based on observed failure modes rather than preemptively stuffing edge cases. Curate diverse, canonical few-shot examples that portray expected behavior instead of listing every possible scenario.

Tool Definitions Write tool descriptions that answer three questions: what the tool does, when to use it, and what it returns. Include usage context, parameter defaults, and error cases — agents cannot disambiguate tools that a human engineer cannot disambiguate either.

Keep the tool set minimal. Consolidate overlapping tools because bloated tool sets create ambiguous decision points and consume disproportionate context after JSON serialization (tool schemas typically inflate 2-3x compared to equivalent plain-text descriptions).

Retrieved Documents Maintain lightweight identifiers (file paths, stored queries, web links) and load data into context dynamically using just-in-time retrieval. This mirrors human cognition — maintain an index, not a copy. Strong identifiers (e.g.,

customer_pricing_rates.json

) let agents locate relevant files even without search tools; weak identifiers (e.g.,

data/file1.json

) force unnecessary loads.

When chunking large documents, split at natural semantic boundaries (section headers, paragraph breaks) rather than arbitrary character limits that sever mid-concept.

Message History Message history serves as the agent's scratchpad memory for tracking progress, maintaining task state, and preserving reasoning across turns. For long-running tasks, it can grow to dominate context usage — monitor and apply compaction before it crowds out active instructions.

Cyclically refine history: once a tool has been called deep in the conversation, the raw result rarely needs to remain verbatim. Replace stale tool outputs with compact summaries or references to reduce low-signal bulk.

Tool Outputs Tool outputs typically dominate context — research shows observations can reach 83.9% of total tokens in agent trajectories. Apply observation masking: replace verbose outputs with compact references once the agent has processed the result. Retain only the five most recently accessed file contents; compress or evict older ones.

系统提示词 使用XML标签或Markdown标题（背景、指令、工具指南、输出格式）将系统提示词组织为不同部分。系统提示词会在整个对话过程中保留，因此将最关键的约束放在注意力最强的开头和结尾位置。

调整指令粒度以平衡两种失效模式。粒度太细会硬编码脆弱的逻辑，当条件变化时容易崩溃。粒度太粗则提供模糊的指导，无法为期望行为提供具体信号。目标是启发式驱动的指令：足够具体以指导行为，同时足够灵活以实现泛化——例如，带有每个步骤判断空间的编号步骤。

从最小化内容开始，然后根据观察到的失效模式被动添加指令，而非预先填充边缘案例。精选多样的典型少样本示例来描绘期望行为，而非列出所有可能的场景。

工具定义 编写工具描述时需回答三个问题：工具的功能、使用场景以及返回内容。包含使用上下文、参数默认值和错误情况——人类工程师无法区分的工具，Agent也无法区分。

保持工具集精简。合并重叠工具，因为臃肿的工具集会产生模糊的决策点，并且在JSON序列化后会占用过多上下文（工具架构通常比等效的纯文本描述膨胀2-3倍）。

检索到的文档 维护轻量级标识符（文件路径、存储的查询、网页链接），并使用即时检索动态将数据加载到上下文中。这与人类认知模式相似——维护索引，而非副本。强标识符（如

customer_pricing_rates.json

）可让Agent无需搜索工具即可定位相关文件；弱标识符（如

data/file1.json

）会导致不必要的加载。

拆分大型文档时，按自然语义边界（章节标题、段落分隔符）拆分，而非按会切断概念的任意字符限制拆分。

消息历史 消息历史作为Agent的临时记忆，用于跟踪进度、维护任务状态以及保留多轮对话中的推理过程。对于长期运行的任务，消息历史可能会占据大部分上下文——需进行监控，并在它挤占活跃指令之前应用压缩。

循环优化历史：当工具在对话深处被调用后，原始结果很少需要完整保留。用简洁的摘要或引用替换过时的工具输出，以减少低信号内容。

工具输出 工具输出通常占据上下文的主要部分——研究表明，在Agent的任务轨迹中，观测结果可占总token的83.9%。应用观测掩码：一旦Agent处理完结果，就用简洁的引用替换冗长的输出。仅保留最近访问的五个文件内容；压缩或清除较旧的内容。

Context Windows and Attention Mechanics

上下文窗口与注意力机制

The Attention Budget For n tokens, the attention mechanism computes n-squared pairwise relationships. As context grows, the model's ability to maintain these relationships degrades — not as a hard cliff but as a performance gradient. Models trained predominantly on shorter sequences have fewer specialized parameters for context-wide dependencies, creating an effective ceiling well below the nominal window size.

Design for this gradient: assume effective capacity is 60-70% of the advertised window. A 200K-token model starts degrading around 120-140K tokens, and complex retrieval accuracy can drop to as low as 15% at extreme lengths.

Position Encoding Limits Position encoding interpolation extends sequence handling beyond training lengths but introduces degradation in positional precision. Expect reduced accuracy for information retrieval and long-range reasoning at extended contexts compared to performance on shorter inputs.

Progressive Disclosure in Practice Implement progressive disclosure at three levels:

Skill selection — load only names and descriptions at startup; activate full skill content on demand.
Document loading — load summaries first; fetch detail sections only when the task requires them.
Tool result retention — keep recent results in full; compress or evict older results.

Keep the boundary crisp: if a skill or document is activated, load it fully rather than partially — partial loads create confusing gaps that degrade reasoning quality.

注意力预算 对于n个token，注意力机制会计算n²个两两关系。随着上下文增长，模型维持这些关系的能力会下降——并非突然失效，而是性能逐渐降低。主要基于较短序列训练的模型用于上下文范围依赖关系的专用参数较少，导致有效容量远低于标称窗口大小。

针对这种梯度进行设计：假设有效容量为标称窗口的60-70%。一个200K-token的模型在120-140K token左右开始性能下降，在极端长度下，复杂检索的准确率可低至15%。

位置编码限制 位置编码插值可将序列处理扩展到训练长度之外，但会降低位置精度。与较短输入的性能相比，在扩展上下文下的信息检索和长程推理准确率会有所下降。

渐进式披露的实践 在三个层面实现渐进式披露：

技能选择——启动时仅加载名称和描述；按需激活完整技能内容。
文档加载——先加载摘要；仅当任务需要时才获取详细章节。
工具结果保留——完整保留最近的结果；压缩或清除较旧的结果。

保持边界清晰：如果技能或文档被激活，则完整加载而非部分加载——部分加载会造成混淆的信息缺口，降低推理质量。

Context Quality Versus Quantity

上下文质量与数量

Reject the assumption that larger context windows solve memory problems. Processing cost grows disproportionately with context length — not just linear cost scaling, but degraded model performance beyond effective capacity thresholds. Long inputs remain expensive even with prefix caching.

Apply the signal-density test: for each piece of context, ask whether removing it would change the model's output. If not, remove it. Redundant content does not merely waste tokens — it actively dilutes attention from high-signal content.

拒绝“更大的上下文窗口解决内存问题”的假设。处理成本随上下文长度不成比例地增长——不仅是线性成本增长，超过有效容量阈值后模型性能也会下降。即使使用前缀缓存，长输入仍然成本高昂。

应用信号密度测试：对于每一段上下文，询问移除它是否会改变模型的输出。如果不会，则移除它。冗余内容不仅浪费token，还会主动分散对高信号内容的注意力。

Practical Guidance

实践指导

File-System-Based Access

基于文件系统的访问

Agents with filesystem access implement progressive disclosure naturally. Store reference materials, documentation, and data externally. Load files only when the current task requires them. Leverage the filesystem's own structure as metadata: file sizes suggest complexity, naming conventions hint at purpose, timestamps serve as proxies for relevance.

拥有文件系统访问权限的Agent可自然实现渐进式披露。将参考资料、文档和数据存储在外部。仅当当前任务需要时才加载文件。利用文件系统自身的结构作为元数据：文件大小暗示复杂度，命名规则提示用途，时间戳作为相关性的代理。

Hybrid Context Strategies

混合上下文策略

Pre-load stable context for speed (CLAUDE.md files, project rules, core instructions) but enable autonomous exploration for dynamic content. The decision boundary depends on content volatility:

Low volatility (project conventions, team standards): pre-load at session start.
High volatility (code state, external data, user-specific info): retrieve just-in-time to avoid stale context.

For complex multi-hour tasks, maintain a structured notes file (e.g., NOTES.md) that the agent updates as it works. This enables coherence across context resets without keeping everything in the active window.

预加载稳定上下文以提高速度（CLAUDE.md文件、项目规则、核心指令），但允许自主探索动态内容。决策边界取决于内容的波动性：

低波动性（项目惯例、团队标准）：会话开始时预加载。
高波动性（代码状态、外部数据、用户特定信息）：即时检索以避免上下文过时。

对于复杂的多小时任务，维护一个结构化的笔记文件（如NOTES.md），Agent在工作时更新该文件。这可在上下文重置时保持连贯性，而无需将所有内容保留在活跃窗口中。

Context Budgeting

上下文预算

Allocate explicit budgets per component and monitor during development. Implement compaction triggers at 70-80% utilization — do not wait for the window to fill. Design systems that degrade gracefully: when compaction fires, preserve architectural decisions, unresolved bugs, and implementation details while discarding redundant outputs.

For sub-agent architectures, enforce a compression ratio: a sub-agent may explore using tens of thousands of tokens but must return a condensed summary of 1,000-2,000 tokens. This converts exploration breadth into context-efficient results.

为每个组件分配明确的预算，并在开发过程中进行监控。在利用率达到70-80%时触发压缩——不要等到窗口填满。设计可优雅降级的系统：当压缩触发时，保留架构决策、未解决的bug和实现细节，同时丢弃冗余输出。

对于子Agent架构，强制执行压缩比率：子Agent可能使用数万个token进行探索，但必须返回1000-2000个token的浓缩摘要。这将探索广度转换为上下文高效的结果。

Examples

示例

Example 1: Organizing System Prompts

markdown

<BACKGROUND_INFORMATION>
You are a Python expert helping a development team.
Current project: Data processing pipeline in Python 3.9+
</BACKGROUND_INFORMATION>

<INSTRUCTIONS>
- Write clean, idiomatic Python code
- Include type hints for function signatures
- Add docstrings for public functions
- Follow PEP 8 style guidelines
</INSTRUCTIONS>

<TOOL_GUIDANCE>
Use bash for shell operations, python for code tasks.
File operations should use pathlib for cross-platform compatibility.
</TOOL_GUIDANCE>

<OUTPUT_DESCRIPTION>
Provide code blocks with syntax highlighting.
Explain non-obvious decisions in comments.
</OUTPUT_DESCRIPTION>

Example 2: Progressive Document Loading

markdown

undefined

示例1：组织系统提示词

markdown

<BACKGROUND_INFORMATION>
你是一名帮助开发团队的Python专家。
当前项目：基于Python 3.9+的数据处理管道
</BACKGROUND_INFORMATION>

<INSTRUCTIONS>
- 编写简洁、符合Python风格的代码
- 为函数签名添加类型提示
- 为公共函数添加文档字符串
- 遵循PEP 8风格指南
</INSTRUCTIONS>

<TOOL_GUIDANCE>
使用bash执行shell操作，使用python处理代码任务。
文件操作应使用pathlib以实现跨平台兼容性。
</TOOL_GUIDANCE>

<OUTPUT_DESCRIPTION>
提供带有语法高亮的代码块。
在注释中解释非显而易见的决策。
</OUTPUT_DESCRIPTION>

示例2：渐进式文档加载

markdown

undefined

Instead of loading all documentation at once:

不要一次性加载所有文档：

Step 1: Load summary

步骤1：加载摘要

docs/api_summary.md # Lightweight overview

docs/api_summary.md # 轻量级概述

Step 2: Load specific section as needed

步骤2：按需加载特定章节

docs/api/endpoints.md # Only when API calls needed docs/api/authentication.md # Only when auth context needed

undefined

docs/api/endpoints.md # 仅当需要进行API调用时加载 docs/api/authentication.md # 仅当需要认证上下文时加载

undefined

Guidelines

指南

Treat context as a finite resource with diminishing returns
Place critical information at attention-favored positions (beginning and end)
Use progressive disclosure to defer loading until needed
Organize system prompts with clear section boundaries
Monitor context usage during development
Implement compaction triggers at 70-80% utilization
Design for context degradation rather than hoping to avoid it
Prefer smaller high-signal context over larger low-signal context

将上下文视为具有边际效益递减特性的有限资源
将关键信息放在注意力偏好位置（开头和结尾）
使用渐进式披露延迟加载，直到需要时再加载
使用清晰的章节边界组织系统提示词
在开发过程中监控上下文使用情况
在利用率达到70-80%时触发压缩
针对上下文降级进行设计，而非期望避免降级
优先选择小体积高信号的上下文，而非大体积低信号的上下文

Gotchas

注意事项

Nominal window is not effective capacity: A model advertising 200K tokens begins degrading around 120-140K. Budget for 60-70% of the nominal window as usable capacity. Exceeding this threshold causes sudden accuracy drops, not gradual degradation — test at realistic context sizes, not toy examples.
Character-based token estimates silently drift: The ~4 characters/token heuristic for English prose breaks down for code (2-3 chars/token), URLs and file paths (each slash, dot, and colon is a separate token), and non-English text (often 1-2 chars/token). Use the provider's actual tokenizer (e.g., tiktoken for OpenAI models, Anthropic's token counting API) for any budget-critical calculation.
Tool schemas inflate 2-3x after JSON serialization: A tool definition that looks compact in source code expands significantly when serialized — brackets, quotes, colons, and commas each consume tokens. Ten tools with moderate schemas can consume 5,000-8,000 tokens before a single message is sent. Audit serialized tool token counts, not source-code line counts.
Message history balloons silently in agentic loops: Each tool call adds both the request and the full response to history. After 20-30 iterations, history can consume 70-80% of the window while the agent shows no visible symptoms until reasoning quality collapses. Set a hard token ceiling on history and trigger compaction proactively.
Critical instructions in the middle get lost: The U-shaped attention curve means the middle of context receives 10-40% less recall accuracy than the beginning and end. Never place safety constraints, output format requirements, or behavioral guardrails in the middle of a long system prompt — anchor them at the top or bottom.
Progressive disclosure that loads too eagerly defeats its purpose: Loading every "potentially relevant" skill or document at the first hint of relevance recreates the context-stuffing problem. Set strict activation thresholds — a skill should load only when the task explicitly matches its trigger conditions, not when the topic is merely adjacent.
Mixing instruction altitudes causes inconsistent behavior: Combining hyper-specific rules ("always use exactly 3 bullet points") with vague directives ("be helpful") in the same prompt creates conflicting signals. Group instructions by altitude level and keep each section internally consistent — either heuristic-driven or prescriptive, not both interleaved.

标称窗口不等于有效容量：标称200K token的模型在120-140K token左右开始性能下降。将标称窗口的60-70%作为可用容量进行预算。超过此阈值会导致准确率突然下降，而非逐渐降低——需在真实上下文大小下测试，而非使用示例场景。
基于字符的token估算会悄然偏差：英文散文约4字符/token的经验法则不适用于代码（2-3字符/token）、URL和文件路径（每个斜杠、点和冒号都是单独的token）以及非英文文本（通常1-2字符/token）。对于任何预算关键的计算，使用提供商的实际tokenizer（如OpenAI模型的tiktoken，Anthropic的token计数API）。
工具架构在JSON序列化后膨胀2-3倍：源代码中看似简洁的工具定义在序列化后会显著膨胀——括号、引号、冒号和逗号都会消耗token。10个带有中等架构的工具在发送第一条消息之前就可能消耗5000-8000个token。审核序列化后的工具token计数，而非源代码行数。
消息历史在Agent循环中悄然膨胀：每次工具调用都会将请求和完整响应添加到历史中。经过20-30次迭代后，历史可占据窗口的70-80%，而Agent在推理质量崩溃前不会表现出明显症状。为历史设置硬性token上限，并主动触发压缩。
中间位置的关键指令会丢失：U型注意力曲线意味着上下文中间位置的召回准确率比开头和结尾低10-40%。切勿将安全约束、输出格式要求或行为防护措施放在长系统提示词的中间——将它们固定在顶部或底部。
过于急切的渐进式披露会适得其反：在第一次提示相关性时加载所有“潜在相关”的技能或文档，会重新造成上下文填充问题。设置严格的激活阈值——仅当任务明确匹配其触发条件时才加载技能，而非仅当主题相关时。
混合指令粒度会导致行为不一致：在同一提示词中结合超具体规则（“始终使用恰好3个项目符号”）和模糊指令（“提供帮助”）会产生冲突信号。按粒度级别对指令进行分组，并保持每个部分内部一致——要么是启发式驱动，要么是规定性的，不要混合使用。

Integration

集成

This skill provides foundational context that all other skills build upon. It should be studied first before exploring:

context-degradation - Understanding how context fails
context-optimization - Techniques for extending context capacity
multi-agent-patterns - How context isolation enables multi-agent systems
tool-design - How tool definitions interact with context

此技能提供所有其他技能构建的基础上下文。在探索以下内容之前，应首先学习此技能：

context-degradation - 理解上下文如何失效
context-optimization - 扩展上下文容量的技巧
multi-agent-patterns - 上下文隔离如何支持多Agent系统
tool-design - 工具定义如何与上下文交互

References

参考资料

Internal reference:

Context Components Reference - Read when: debugging a specific context component (system prompts, tool definitions, message history, tool outputs) or implementing chunking, observation masking, or budget allocation tables

Related skills in this collection:

context-degradation - Read when: agent performance drops as conversations grow or context fills beyond 60% capacity
context-optimization - Read when: token costs are too high or compaction/compression strategies are needed

External resources:

Anthropic's "Effective Context Engineering for AI Agents" — production patterns for compaction, sub-agents, and hybrid retrieval
Research on transformer attention mechanisms and the lost-in-the-middle effect
Tokenomics research on agentic software engineering token distribution

内部参考：

上下文组件参考 - 当调试特定上下文组件（系统提示词、工具定义、消息历史、工具输出）或实现拆分、观测掩码或预算分配表时阅读

此集合中的相关技能：

context-degradation - 当Agent性能随对话增长或上下文填充超过60%容量而下降时阅读
context-optimization - 当token成本过高或需要压缩/压缩策略时阅读

外部资源：

Anthropic的《AI Agent有效上下文工程》——关于压缩、子Agent和混合检索的生产模式
关于Transformer注意力机制和“中间丢失”效应的研究
关于Agent化软件工程token分布的Tokenomics研究

Skill Metadata

技能元数据

Created: 2025-12-20 Last Updated: 2026-03-17 Author: Agent Skills for Context Engineering Contributors Version: 2.0.0

创建时间: 2025-12-20 最后更新时间: 2026-03-17 作者: Agent Skills for Context Engineering Contributors 版本: 2.0.0