Context Engineering

Context engineering is the practice of curating the smallest set of high-signal tokens an LLM needs for a task. The goal is to maximize reasoning quality while minimizing token usage.

When to Activate

Designing/debugging agent systems
Working within context limits that constrain performance
Optimizing cost/latency
Building multi-agent coordination
Implementing memory systems
Evaluating agent performance
Developing LLM-powered pipelines

Core Principles

Context quality > quantity - High-signal tokens beat exhaustive content
Attention is finite - A U-shaped attention curve favors the beginning and end of the context
Progressive disclosure - Load information just-in-time
Isolation prevents degradation - Partition work across sub-agents
Measure before optimizing - Know your baseline
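
A minimal Python sketch of the "Attention is finite" principle, assuming each context chunk carries a caller-assigned priority score. The `Chunk` type and `order_for_u_shaped_attention` are hypothetical names for illustration, not part of any library. The function deals the most critical chunks to the two edges of the context and pushes the least critical toward the middle.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    priority: float  # hypothetical caller-assigned score; higher means more critical


def order_for_u_shaped_attention(chunks: list[Chunk]) -> list[Chunk]:
    """Put the highest-priority chunks at the edges and the lowest in the middle."""
    ranked = sorted(chunks, key=lambda c: c.priority, reverse=True)
    front: list[Chunk] = []  # fills from the start of the context inward
    back: list[Chunk] = []   # fills from the end of the context inward
    for i, chunk in enumerate(ranked):
        (front if i % 2 == 0 else back).append(chunk)
    # `back` was filled edge-first, so reverse it to read middle-to-end.
    return front + back[::-1]


chunks = [
    Chunk("task instructions", 5),
    Chunk("old tool log", 1),
    Chunk("key constraint", 4),
    Chunk("background notes", 2),
    Chunk("worked examples", 3),
]
print([c.text for c in order_for_u_shaped_attention(chunks)])
# ['task instructions', 'worked examples', 'old tool log', 'background notes', 'key constraint']
```

Alternating between the two ends means the second-most critical chunk lands last rather than second, so both high-attention positions hold high-priority material while the lowest-priority chunk sits in the middle.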

Quick Reference

Topic	When to Use	Reference
Fundamentals	Understanding context anatomy, attention mechanics	context-fundamentals.md
Degradation	Debugging failures, lost-in-middle, poisoning	context-degradation.md
Optimization	Compaction, masking, caching, partitioning	context-optimization.md
Compression	Long sessions, summarization strategies	context-compression.md
Memory	Cross-session persistence, knowledge graphs	memory-systems.md
Multi-Agent	Coordination patterns, context isolation	multi-agent-patterns.md
Evaluation	Testing agents, LLM-as-Judge, metrics	evaluation.md
Tool Design	Tool consolidation, description engineering	tool-design.md
Pipelines	Project development, batch processing	project-development.md

Key Metrics

Token utilization (share of the context window in use): warning at 70%, trigger optimization at 80%
Token usage: differences in token usage explain ~80% of agent performance variance
Multi-agent cost: ~15x the single-agent baseline
Compaction target: 50-70% token reduction with <5% quality loss
Cache hit target: 70%+ for stable workloads
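
The thresholds above can be turned into simple checks. This is a sketch under stated assumptions: the constants come from the Key Metrics list, the 50-70% compaction range is read as "at least 50%" (more reduction is not treated as a failure), quality is any higher-is-better evaluation score you already have, and cache hit rate is measured as the fraction of input tokens served from cache. All function names are hypothetical.

```python
from enum import Enum

WARN_AT = 0.70          # Key Metrics: warning at 70% utilization
OPTIMIZE_AT = 0.80      # Key Metrics: trigger optimization at 80%
MIN_REDUCTION = 0.50    # lower bound of the 50-70% compaction target (assumed pass mark)
MAX_QUALITY_LOSS = 0.05
CACHE_HIT_TARGET = 0.70


class ContextAction(Enum):
    OK = "ok"
    WARN = "warn"
    OPTIMIZE = "optimize"


def utilization_action(tokens_used: int, context_window: int) -> ContextAction:
    """Map context-window utilization onto the warn/optimize thresholds."""
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    utilization = tokens_used / context_window
    if utilization >= OPTIMIZE_AT:
        return ContextAction.OPTIMIZE
    if utilization >= WARN_AT:
        return ContextAction.WARN
    return ContextAction.OK


def compaction_meets_target(tokens_before: int, tokens_after: int,
                            quality_before: float, quality_after: float) -> bool:
    """True if compaction removed at least 50% of tokens with <5% relative quality loss.

    `quality_before`/`quality_after` are hypothetical higher-is-better eval scores.
    """
    if tokens_before <= 0 or quality_before <= 0:
        raise ValueError("baseline tokens and quality must be positive")
    reduction = 1 - tokens_after / tokens_before
    relative_quality_loss = (quality_before - quality_after) / quality_before
    return reduction >= MIN_REDUCTION and relative_quality_loss < MAX_QUALITY_LOSS


def cache_hit_rate(cached_input_tokens: int, total_input_tokens: int) -> float:
    """Fraction of input tokens served from cache (0.0 when nothing was sent)."""
    return cached_input_tokens / total_input_tokens if total_input_tokens else 0.0


print(utilization_action(75_000, 100_000))                 # ContextAction.WARN
print(compaction_meets_target(10_000, 4_000, 0.90, 0.87))  # True
print(cache_hit_rate(7_000, 10_000) >= CACHE_HIT_TARGET)   # True
```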

Four-Bucket Strategy

Write: Save context externally (scratchpads, files)
Select: Pull only relevant context (retrieval, filtering)
Compress: Reduce tokens while preserving info (summarization)
Isolate: Split across sub-agents (partitioning)
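
The four buckets can be sketched in a few lines of Python. Every piece here is a deliberately simple stand-in, and all names are hypothetical: a dict plays the role of scratchpad files, keyword overlap replaces real retrieval, and `compress` inserts a placeholder line where a real system would call an LLM to summarize older turns.

```python
import re


class Scratchpad:
    """Write: keep notes outside the prompt; a dict stands in for files here."""

    def __init__(self) -> None:
        self._notes: dict[str, str] = {}

    def write(self, key: str, text: str) -> None:
        self._notes[key] = text

    def read(self, key: str) -> str:
        return self._notes[key]


def _terms(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Select: return up to k documents sharing the most terms with the query."""
    query_terms = _terms(query)
    scored = [(len(query_terms & _terms(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [doc for score, doc in scored[:k] if score > 0]


def compress(messages: list[str], keep_last: int = 3) -> list[str]:
    """Compress: keep recent turns verbatim and collapse older ones into one line.

    A real system would summarize the older turns with an LLM; this stand-in
    only records how many were collapsed.
    """
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    if len(messages) <= keep_last:
        return list(messages)
    older, recent = messages[:-keep_last], messages[-keep_last:]
    return [f"[summary of {len(older)} earlier messages]"] + recent


def isolate(subtasks: list[str], n_agents: int) -> list[list[str]]:
    """Isolate: deal subtasks round-robin so each sub-agent sees only its share."""
    if n_agents < 1:
        raise ValueError("n_agents must be at least 1")
    return [subtasks[i::n_agents] for i in range(n_agents)]


pad = Scratchpad()
pad.write("plan", "1) gather sources 2) draft summary")
docs = ["KV-cache hit rates in production", "lunch menu for Friday", "cache eviction policy notes"]
print(select("how to raise cache hit rates", docs))
# ['KV-cache hit rates in production', 'cache eviction policy notes']
print(compress(["m1", "m2", "m3", "m4", "m5"], keep_last=2))
# ['[summary of 3 earlier messages]', 'm4', 'm5']
print(isolate(["a", "b", "c", "d", "e"], n_agents=2))
# [['a', 'c', 'e'], ['b', 'd']]
```

Each function touches only the context it returns, which is the common thread across the buckets: the model sees the selected, compressed, or partitioned slice rather than everything that was ever produced.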

Anti-Patterns

Exhaustive context over curated context
Critical info in middle positions
No compaction trigger set before context limits are reached
Single agent for parallelizable tasks
Tools without clear descriptions
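
One way to catch the last anti-pattern is to lint tool definitions against the 4-question framework named in the Guidelines (what, when, inputs, returns). This is a sketch assuming a hypothetical `ToolSpec` shape whose fields mirror those four questions; the word-count threshold is a crude, assumed proxy for "clear", not a measure of description quality.

```python
from dataclasses import dataclass, field

MIN_WORDS = 4  # hypothetical bar for a non-trivial answer; tune to taste


@dataclass
class ToolSpec:
    """Hypothetical tool definition whose fields mirror the 4-question framework."""
    name: str
    what: str = ""     # what the tool does
    when: str = ""     # when the agent should use it
    inputs: dict[str, str] = field(default_factory=dict)  # parameter -> description
    returns: str = ""  # what the tool returns


def description_problems(tool: ToolSpec) -> list[str]:
    """List the framework questions a tool definition leaves unanswered."""
    problems = []
    for question, answer in (("what", tool.what), ("when", tool.when), ("returns", tool.returns)):
        if len(answer.split()) < MIN_WORDS:
            problems.append(f"{tool.name}: '{question}' is missing or too short")
    for param, description in tool.inputs.items():
        if not description.strip():
            problems.append(f"{tool.name}: input '{param}' is undocumented")
    return problems


vague = ToolSpec("search", what="Searches.", inputs={"q": ""})
clear = ToolSpec(
    "search_docs",
    what="Full-text search over the project documentation.",
    when="Use when the answer likely lives in the docs, not the code.",
    inputs={"query": "Plain-language search terms."},
    returns="Up to ten matching passages with file paths.",
)
print(description_problems(vague))  # four problems: what, when, returns, input 'q'
print(description_problems(clear))  # []
```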

Guidelines

Place critical info at beginning/end of context
Trigger compaction at 70-80% context utilization
Use sub-agents for context isolation, not role-play
Design tools with the 4-question framework: what it does, when to use it, what inputs it takes, what it returns
Optimize for tokens-per-task, not tokens-per-request
Validate with probe-based evaluation
Monitor KV-cache hit rates in production
Start minimal, add complexity only when proven necessary
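
The tokens-per-task guideline is easiest to see with numbers. The sketch below uses made-up, illustrative request logs (not measurements): a lean context that forces six round trips per task looks cheaper per request but costs more per task than a richer context that finishes in two. `Request`, `tokens_per_request`, and `tokens_per_task` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Request:
    task_id: str
    input_tokens: int
    output_tokens: int

    @property
    def total(self) -> int:
        return self.input_tokens + self.output_tokens


def tokens_per_request(requests: list[Request]) -> float:
    """Average tokens per individual model call."""
    if not requests:
        raise ValueError("no requests")
    return sum(r.total for r in requests) / len(requests)


def tokens_per_task(requests: list[Request]) -> float:
    """Sum tokens over every request belonging to a task, then average across tasks."""
    if not requests:
        raise ValueError("no requests")
    per_task: dict[str, int] = {}
    for r in requests:
        per_task[r.task_id] = per_task.get(r.task_id, 0) + r.total
    return sum(per_task.values()) / len(per_task)


# Illustrative numbers only: six lean round trips versus two richer ones.
lean = [Request("task-1", 1_500, 500) for _ in range(6)]
rich = [Request("task-1", 4_500, 500) for _ in range(2)]
for name, log in (("lean", lean), ("rich", rich)):
    print(name, tokens_per_request(log), tokens_per_task(log))
# lean 2000.0 12000.0
# rich 5000.0 10000.0
```

Optimizing the first number alone would pick the lean strategy; the per-task total shows it is the more expensive one.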

Scripts

context_analyzer.py - Context health analysis, degradation detection
compression_evaluator.py - Compression quality evaluation
