context-management


Context Management Skill


Overview


This skill teaches you how to proactively manage your conversation context in AMCP to avoid LLM API errors caused by exceeding context window limits. Context management is critical when:
  • Working on long coding sessions with many files
  • Processing large tool outputs (e.g., `grep` results, file reads)
  • Running multi-step debugging sessions
  • Reviewing or refactoring large codebases

Understanding Context Windows


Different LLM models have different context window sizes:
| Model Family | Context Window |
|---|---|
| GPT-4 Turbo / GPT-4o | 128,000 tokens |
| GPT-4.1 | 1,000,000 tokens |
| Claude 3.5 Sonnet | 200,000 tokens |
| DeepSeek V3 | 64,000 tokens |
| Gemini 2.0 Flash | 1,000,000 tokens |
| Qwen 2.5 | 128,000 tokens |
AMCP automatically detects most model context windows. For unknown models, it uses 32,000 tokens as the default.

Key Metrics


When managing context, track:
  • Current tokens: Estimated size of current conversation history
  • Threshold tokens: When compaction should trigger (default: 70% of context)
  • Target tokens: What to aim for after compaction (default: 30% of context)
  • Safety margin: Reserved for response generation (default: 10%)
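For a concrete 128,000-token window, the defaults above work out as follows (a quick sketch of the arithmetic, not AMCP internals):

```python
context_window = 128_000

threshold_tokens = int(context_window * 0.7)      # compaction triggers here
target_tokens = int(context_window * 0.3)         # aim for this after compaction
usable_tokens = int(context_window * (1 - 0.1))   # 10% safety margin reserved

print(threshold_tokens, target_tokens, usable_tokens)  # 89600 38400 115200
```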

How AMCP Compacts Context


AMCP's `SmartCompactor` automatically compresses context when it exceeds the threshold. In `agent.py` (lines 500-505):

```python
compactor = SmartCompactor(client, model)
if compactor.should_compact(history_to_add):
    history_to_add, _ = compactor.compact(history_to_add)
```

**This happens automatically during conversation!** You don't need to trigger it manually.
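The decision itself is simple to reason about: compare the current token estimate with the threshold, skipping tiny histories. A minimal sketch of that check (illustrative only; not AMCP's actual `should_compact` implementation):

```python
def should_compact(current_tokens, context_window,
                   threshold_ratio=0.7, min_tokens=5000):
    """Compact when usage crosses the threshold, but never for tiny contexts."""
    if current_tokens < min_tokens:
        return False
    return current_tokens >= context_window * threshold_ratio

print(should_compact(90_000, 128_000))  # -> True  (above 70% of 128k)
print(should_compact(4_000, 128_000))   # -> False (below the min_tokens floor)
```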

Compaction Strategies


AMCP supports four strategies (configurable via `CompactionConfig`):
  1. SUMMARY (default): Uses LLM to create intelligent summary of old messages
    • Best for: Long sessions where earlier context is important
    • Preserves: Errors, working solutions, current task state, file paths
  2. TRUNCATE: Simple removal of old messages, keeping first and last few
    • Best for: Fast compaction when context is less important
    • Fastest option
  3. SLIDING_WINDOW: Keeps only the most recent messages that fit in target
    • Best for: Sessions where only recent context matters
    • Very efficient
  4. HYBRID: Combines sliding window with summary of removed content
    • Best for: Balance between summary quality and speed
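To make the SLIDING_WINDOW idea concrete, here is a toy sketch (not AMCP's implementation) of keeping only the most recent messages that fit a target budget:

```python
def sliding_window(messages, target_tokens, estimate):
    """Keep the most recent messages whose combined estimate fits target_tokens."""
    kept, total = [], 0
    for msg in reversed(messages):        # walk newest -> oldest
        cost = estimate(msg)
        if total + cost > target_tokens:
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))           # restore chronological order

# Toy estimate: ~4 characters per token; each message below costs ~100 tokens,
# so a 250-token budget keeps only the two most recent messages.
msgs = ["a" * 400, "b" * 400, "c" * 400]
recent = sliding_window(msgs, target_tokens=250, estimate=lambda m: len(m) // 4)
```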

Best Practices for Context Management


1. Read Files Selectively


Instead of reading entire large files:

```python
# BAD: Reads entire file
read_file(path="src/large_module.py")

# GOOD: Read specific sections
read_file(path="src/large_module.py", mode="indentation", offset=100, limit=50)

# GOOD: Read specific line ranges
read_file(path="src/large_module.py", mode="slice", ranges=["1-50", "200-250"])
```

**Use `read_file` in indentation mode** - it intelligently captures code blocks around your target, providing context without excessive content.

2. Use Grep with Context Limits


```python
# BAD: Returns all matches with full context
grep(pattern="function", paths=["src/"])

# GOOD: Limited context
grep(pattern="function", paths=["src/"], context=2)
```

3. Iterate Over Small Batches


When processing multiple files, instead of processing all files at once:

```python
for file in files:
    # Process one file at a time
    result = process_file(file)
    # Save intermediate results
```

4. Clear Conversation When Starting New Tasks


After completing a complex task, consider suggesting:
"The conversation history is getting long. If you're starting a new, unrelated task, consider clearing the history or starting a new session to reduce context."

5. Be Strategic with Tool Calls


  • Avoid redundant tool calls (e.g., reading the same file multiple times)
  • Use `grep` to find what you need before reading files
  • Process results incrementally rather than all at once

Monitoring Context Usage


You can check context usage programmatically:

```python
from amcp import SmartCompactor

# Create compactor
compactor = SmartCompactor(client, model="gpt-4-turbo")

# Get detailed usage info
usage = compactor.get_token_usage(messages)
print(f"Current: {usage['current_tokens']:,} tokens")
print(f"Usage: {usage['usage_ratio']:.1%} of context")
print(f"Headroom: {usage['headroom_tokens']:,} tokens")
print(f"Should compact: {usage['should_compact']}")
```

Configuration


Context compaction is configured via `CompactionConfig`:

```python
from amcp import SmartCompactor, CompactionConfig, CompactionStrategy

config = CompactionConfig(
    strategy=CompactionStrategy.SUMMARY,
    threshold_ratio=0.7,  # Compact at 70% usage
    target_ratio=0.3,     # Aim for 30% after compaction
    preserve_last=6,      # Keep last 6 user/assistant messages
    preserve_tool_results=True,  # Preserve recent tool results
    max_tool_results=10,  # Max tool results to preserve
    min_tokens_to_compact=5000,  # Don't compact tiny contexts
    safety_margin=0.1,    # 10% margin for responses
)
```

You can configure compaction in `~/.config/amcp/config.toml`:

```toml
[chat]
model = "deepseek-chat"

[compaction]
strategy = "summary"  # summary, truncate, sliding_window, hybrid
threshold_ratio = 0.7
target_ratio = 0.3
```

Automatic Summary Structure


When using SUMMARY or HYBRID strategies, the summary follows this structure:

```xml
<current_task>
What we're working on now - be specific about files and goals
</current_task>

<completed>
- Task 1: Brief outcome + key changes made
- Task 2: Brief outcome + key changes made
</completed>

<code_state>
Key files and their current state - signatures + key logic only
Include file paths that were modified
</code_state>

<important>
Any crucial context: errors, decisions made, constraints, blockers
</important>
```

Token Estimation


AMCP provides accurate token estimation:

```python
from amcp import estimate_tokens

tokens = estimate_tokens(messages)
```
  • Uses the `tiktoken` library when available (recommended)
  • Falls back to character-based estimation (4 chars ≈ 1 token)
  • Accounts for message role overhead
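The character-based fallback is easy to picture. A toy sketch of the heuristic (illustrative; `rough_token_estimate` is not an AMCP function):

```python
def rough_token_estimate(messages, chars_per_token=4, role_overhead=4):
    """Approximate token count: ~4 characters per token, plus a small
    per-message allowance for role/formatting overhead."""
    total = 0
    for msg in messages:
        total += len(msg.get("content", "")) // chars_per_token
        total += role_overhead
    return total

msgs = [{"role": "user", "content": "x" * 400}]
print(rough_token_estimate(msgs))  # -> 104  (100 content tokens + 4 overhead)
```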

Common Issues and Solutions


Issue 1: "Context length exceeded" API Error


Symptom: LLM API returns an error about the context window being exceeded.
Causes:
  • Large tool outputs (e.g., `grep` or `find` results)
  • Reading many large files
  • Long multi-step task conversations
Solutions:
  1. Let AMCP's auto-compaction handle it (it will compact automatically)
  2. Reduce tool output size (use `grep --limit`, `read_file` with ranges)
  3. Process files in batches, not all at once
  4. Suggest clearing conversation history if starting a new task

Issue 2: Context Losing Important Information


Symptom: After compaction, the agent forgets critical details.
Causes:
  • SUMMARY strategy missed important context
  • Old critical messages were removed
Solutions:
  1. Use `preserve_last` to keep more recent messages
  2. Consider the HYBRID strategy (keeps a sliding window + summary)
  3. Manually restate critical context in your prompt

Issue 3: Summary Losing Code Details


Symptom: The agent forgets specific code changes after compaction.
Solutions:
  1. Use `preserve_tool_results=True` (the default) to keep recent tool outputs
  2. Increase `max_tool_results` (default: 10)
  3. Review the summary and add missing details

Progressive Disclosure


AMCP uses progressive disclosure to manage skill instructions. In `skills.py` (line 297):

```python
skills_summary = skill_manager.build_skills_summary()
```

This means:
- You only get a compact summary of all skills initially
- Full skill content is available when needed
- Reduces initial context overhead

Memory System vs Context


AMCP separates:
  1. Conversation Context: The messages sent to the LLM (limited by the context window)
  2. Persistent Memory: Long-term storage in `.amcp/memory/` (unlimited)
Use the memory system to store important information that should persist long-term:

```python
memory(action="write", content="# Project Notes\n- Uses PostgreSQL database")
```

This information is NOT in the conversation context - it's stored separately and retrieved when relevant.

Practical Workflow


For complex tasks with large context:
  1. Start: Use `grep` or `find` to locate relevant files
     ```python
     grep(pattern="class User", paths=["src/"])
     ```
  2. Explore: Read specific sections using indentation mode
     ```python
     read_file(path="src/models/user.py", mode="indentation", offset=1)
     ```
  3. Process: Work incrementally, one file at a time
  4. Monitor: If context gets large, compaction happens automatically
  5. Persist: Save important findings to memory if needed for future sessions
     ```python
     memory(action="append", content="Found authentication bug in src/auth.py:45")
     ```

Event Monitoring


AMCP emits events when compaction occurs:

```python
from amcp import get_event_bus, EventType

@get_event_bus().on(EventType.CONTEXT_COMPACTED)
async def on_compaction(event):
    data = event.data
    print(f"Context compacted: {data['original_tokens']} -> {data['compacted_tokens']}")
```

Key Takeaways


  1. Auto-compaction happens automatically - you don't need to trigger it
  2. Read files selectively - use indentation mode or ranges
  3. Monitor context usage - be aware of large tool outputs
  4. Persist important info to memory - for long-term retention
  5. Process incrementally - handle large tasks in steps
  6. Configure as needed - adjust `CompactionConfig` for your use case

When to Suggest Clearing Context


If you notice:
  • The conversation is very long (>50 messages)
  • You're starting a completely different task
  • Context from earlier sessions is no longer relevant
Then suggest:
"The conversation history is getting long. If you're starting a new, unrelated task, consider clearing the history or starting a new session to reduce context."
This is proactive context management!