multi-agent-patterns

Multi-Agent Architecture Patterns for Claude Code

Multi-agent architectures distribute work across multiple agent invocations, each with its own focused context. When designed well, this distribution enables capabilities beyond single-agent limits. When designed poorly, it introduces coordination overhead that negates benefits. The critical insight is that sub-agents exist primarily to isolate context, not to anthropomorphize role division.

Core Concepts

Multi-agent systems address single-agent context limitations through distribution. Three dominant patterns exist: supervisor/orchestrator for centralized control, peer-to-peer/swarm for flexible handoffs, and hierarchical for layered abstraction. The critical design principle is context isolation—sub-agents exist primarily to partition context rather than to simulate organizational roles.

Effective multi-agent systems require explicit coordination protocols, consensus mechanisms that avoid sycophancy, and careful attention to failure modes including bottlenecks, divergence, and error propagation.

Why Multi-Agent Architectures

The Context Bottleneck

Single agents face inherent ceilings in reasoning capability, context management, and tool coordination. As tasks grow more complex, context windows fill with accumulated history, retrieved documents, and tool outputs. Performance degrades according to predictable patterns: the lost-in-middle effect, attention scarcity, and context poisoning.

Multi-agent architectures address these limitations by partitioning work across multiple context windows. Each agent operates in a clean context focused on its subtask. Results aggregate at a coordination layer without any single context bearing the full burden.

The Parallelization Argument

Many tasks contain parallelizable subtasks that a single agent must execute sequentially. A research task might require searching multiple independent sources, analyzing different documents, or comparing competing approaches. A single agent processes these sequentially, accumulating context with each step.

Multi-agent architectures assign each subtask to a dedicated agent with a fresh context. All agents work simultaneously, then return results to a coordinator. The total real-world time approaches the duration of the longest subtask rather than the sum of all subtasks.
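The timing claim can be illustrated with a minimal sketch. It assumes each sub-agent call can be modeled as an awaitable; `run_subagent` and `coordinate` are hypothetical names, and `asyncio.sleep` stands in for real agent work.

```python
import asyncio
import time


async def run_subagent(name: str, seconds: float) -> str:
    """Hypothetical stand-in for one sub-agent call; sleeps instead of working."""
    await asyncio.sleep(seconds)
    return f"{name}: done"


async def coordinate(subtasks: dict[str, float]) -> tuple[list[str], float]:
    """Run all subtasks concurrently; return results and wall-clock time."""
    start = time.perf_counter()
    results = await asyncio.gather(
        *(run_subagent(name, secs) for name, secs in subtasks.items())
    )
    return list(results), time.perf_counter() - start


# Simulated durations in seconds (illustrative values only).
subtasks = {"search": 0.2, "analyze": 0.3, "compare": 0.1}
results, elapsed = asyncio.run(coordinate(subtasks))
print(results, f"{elapsed:.2f}s")
```

With these simulated durations the elapsed time should land near the longest subtask (0.3s) rather than the 0.6s sum, although real agent calls add dispatch and aggregation overhead.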

The Specialization Argument

Different tasks benefit from different agent configurations: different system prompts, different tool sets, different context structures. A general-purpose agent must carry all possible configurations in context. Specialized agents carry only what they need.

Multi-agent architectures enable specialization without combinatorial explosion. The coordinator routes to specialized agents; each agent operates with lean context optimized for its domain.

Architectural Patterns

Pattern 1: Supervisor/Orchestrator

The supervisor pattern places a central agent in control, delegating to specialists and synthesizing results. The supervisor maintains global state and trajectory, decomposes user objectives into subtasks, and routes to appropriate workers.

User Request -> Supervisor -> [Specialist A, Specialist B, Specialist C] -> Aggregation -> Final Output

When to use: Complex tasks with clear decomposition, tasks requiring coordination across domains, tasks where human oversight is important.

Advantages: Strict control over workflow, easier to implement human-in-the-loop interventions, ensures adherence to predefined plans.

Disadvantages: Supervisor context becomes bottleneck, supervisor failures cascade to all workers, "telephone game" problem where supervisors paraphrase sub-agent responses incorrectly.

Claude Code Implementation: Create a main command that orchestrates by calling specialized subagents using the Task tool. The supervisor command contains the coordination logic and calls subagents for specialized work.

```markdown
<!-- Example supervisor command structure -->
1. Analyze the user request and decompose into subtasks
2. For each subtask, dispatch to appropriate specialist:
   - Use Task tool to spawn subagent with focused context
   - Pass only relevant context to each subagent
3. Collect and synthesize results from all subagents
4. Return unified response to user
```

The Telephone Game Problem: Supervisor architectures can perform worse when supervisors paraphrase sub-agent responses incorrectly, losing fidelity. The fix: allow sub-agents to pass responses directly when synthesis would lose important details. In Claude Code, this means letting subagents write directly to shared files or return their output verbatim rather than having the supervisor rewrite everything.
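One way to picture the verbatim pass-through is the sketch below. It is an illustration under assumptions, not Claude Code's API: each specialist is modeled as a plain Python callable (in Claude Code this would be a Task tool dispatch), and `supervise` is a hypothetical name.

```python
from typing import Callable, Dict

# Hypothetical stand-in: each callable represents one sub-agent dispatch.
Specialist = Callable[[str], str]


def supervise(request: str, specialists: Dict[str, Specialist]) -> str:
    """Dispatch the request to each specialist and assemble a report.

    Specialist output is embedded verbatim under its own heading, so the
    supervisor adds structure without paraphrasing.
    """
    sections = []
    for name, specialist in specialists.items():
        output = specialist(request)  # each specialist sees only the request
        sections.append(f"## {name}\n{output}")
    return "\n\n".join(sections)


specialists = {
    "security": lambda req: "SQL built by string concatenation in login().",
    "performance": lambda req: "N+1 query in list_orders().",
}
report = supervise("Review auth.py", specialists)
print(report)
```

The supervisor here only adds headings. Any summary it writes would sit alongside the verbatim sections rather than replace them.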

Pattern 2: Peer-to-Peer/Swarm

The peer-to-peer pattern removes central control, allowing agents to communicate directly based on predefined protocols. Any agent can transfer control to any other through explicit handoff mechanisms.

When to use: Tasks requiring flexible exploration, tasks where rigid planning is counterproductive, tasks with emergent requirements that defy upfront decomposition.

Advantages: No single point of failure, scales effectively for breadth-first exploration, enables emergent problem-solving behaviors.

Disadvantages: Coordination complexity increases with agent count, risk of divergence without central state keeper, requires robust convergence constraints.

Claude Code Implementation: Create commands that can invoke other commands based on discovered needs. Use shared files (like task lists or state files) as the coordination mechanism.

```markdown
<!-- Example peer handoff structure -->
1. Analyze current state from shared context file
2. Determine if this agent can complete the task
3. If specialized help needed:
   - Write current findings to shared state
   - Invoke appropriate peer command/skill
4. Continue until task complete or hand off
```
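The handoff steps above can be sketched in Python with a JSON file as the shared state. Everything named here is hypothetical: the `PEERS` registry, the state fields, and the `max_hops` limit are assumptions chosen for illustration, with the hop limit acting as a simple convergence constraint.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical peer registry: which tasks each peer handles and where it escalates.
PEERS = {
    "generalist": {"handles": {"docs"}, "escalates_to": "db-expert"},
    "db-expert": {"handles": {"sql-tuning"}, "escalates_to": "generalist"},
}


def run_peer(state_path: Path) -> dict:
    """One peer turn: read shared state, finish the task or hand it off."""
    state = json.loads(state_path.read_text())
    agent = state["next_agent"]
    peer = PEERS[agent]
    if state["task"] in peer["handles"]:
        state["findings"].append(f"{agent}: completed {state['task']}")
        state["status"] = "done"
    else:
        state["findings"].append(f"{agent}: handing off {state['task']}")
        state["next_agent"] = peer["escalates_to"]
    state_path.write_text(json.dumps(state, indent=2))
    return state


def run_until_done(state_path: Path, max_hops: int = 5) -> dict:
    """Let peers take turns until the task is done or the hop limit is hit."""
    state = json.loads(state_path.read_text())
    for _ in range(max_hops):
        if state["status"] == "done":
            break
        state = run_peer(state_path)
    return state


with tempfile.TemporaryDirectory() as tmp:
    state_path = Path(tmp) / "state.json"
    initial = {"task": "sql-tuning", "status": "pending",
               "next_agent": "generalist", "findings": []}
    state_path.write_text(json.dumps(initial))
    final_state = run_until_done(state_path)
print(final_state["findings"])
```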

Pattern 3: Hierarchical

Hierarchical structures organize agents into layers of abstraction: strategic, planning, and execution layers. Strategy layer agents define goals and constraints; planning layer agents break goals into actionable plans; execution layer agents perform atomic tasks.

Strategy Layer (Goal Definition) -> Planning Layer (Task Decomposition) -> Execution Layer (Atomic Tasks)

When to use: Large-scale projects with clear hierarchical structure, enterprise workflows with management layers, tasks requiring both high-level planning and detailed execution.

Advantages: Mirrors organizational structures, clear separation of concerns, enables different context structures at different levels.

Disadvantages: Coordination overhead between layers, potential for misalignment between strategy and execution, complex error propagation.

Claude Code Implementation: Structure your plugin with commands at different abstraction levels. High-level commands focus on strategy and call mid-level planning commands, which in turn call atomic execution commands.

Context Isolation as Design Principle

The primary purpose of multi-agent architectures is context isolation. Each sub-agent operates in a clean context window focused on its subtask without carrying accumulated context from other subtasks.

Isolation Mechanisms

Instruction passing: For simple, well-defined subtasks, the coordinator creates focused instructions. The sub-agent receives only the instructions needed for its specific task. In Claude Code, this means passing minimal, targeted prompts to subagents via the Task tool.

File system memory: For complex tasks requiring shared state, agents read and write to persistent storage. The file system serves as the coordination mechanism, avoiding context bloat from shared state passing. This is the most natural pattern for Claude Code—agents communicate through markdown files, JSON state files, or structured documents.

Full context delegation: For complex tasks where the sub-agent needs complete understanding, the coordinator shares its entire context. The sub-agent has its own tools and instructions but receives full context for its decisions. Use sparingly as it defeats the purpose of context isolation.
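For the file system memory mechanism, a minimal sketch of a shared JSON state file might look like the following. The helper names `write_state` and `read_state` and the state fields are assumptions. The write goes through a temporary file and `os.replace` so that a reader never sees a half-written file, which addresses one of the consistency concerns of shared state.

```python
import json
import os
import tempfile
from pathlib import Path


def write_state(path: Path, state: dict) -> None:
    """Write JSON atomically: write a temp file, then rename it over the target."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(state, tmp, indent=2)
    os.replace(tmp_name, path)  # atomic rename on the same file system


def read_state(path: Path) -> dict:
    """Return the stored state, or an empty dict if nothing was written yet."""
    return json.loads(path.read_text()) if path.exists() else {}


with tempfile.TemporaryDirectory() as tmp_dir:
    state_file = Path(tmp_dir) / "review-state.json"  # hypothetical file name
    write_state(state_file, {"done": ["parse"], "remaining": ["lint"]})
    loaded = read_state(state_file)
    leftovers = list(Path(tmp_dir).glob("*.tmp"))
print(loaded)
```

Atomic replacement does not resolve concurrent updates from several agents. Giving each agent its own file is the simpler way to avoid that problem.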

Isolation Trade-offs

Full context delegation provides maximum capability but defeats the purpose of sub-agents. Instruction passing maintains isolation but limits sub-agent flexibility. File system memory enables shared state without context passing but introduces consistency challenges.

The right choice depends on task complexity, coordination needs, and the nature of the work.

Consensus and Coordination

The Voting Problem

Simple majority voting treats hallucinations from weak reasoning as equal to sound reasoning. Without intervention, multi-agent discussions can devolve into consensus on false premises due to inherent bias toward agreement.

Weighted Contributions

Weight agent contributions by confidence or expertise. Agents with higher confidence or domain expertise carry more weight in final decisions.
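A small sketch shows how weighting can change the outcome compared with a simple majority. The `weighted_vote` helper and the numeric weights are illustrative assumptions. In practice the weights would have to come from a calibrated confidence signal or a fixed expertise score, since self-reported confidence can be unreliable.

```python
from collections import defaultdict


def weighted_vote(votes: list[tuple[str, float]]) -> str:
    """Return the answer with the largest total weight."""
    totals: dict[str, float] = defaultdict(float)
    for answer, weight in votes:
        totals[answer] += weight
    return max(totals, key=totals.get)


# Two low-weight agents agree on "B"; one high-weight agent answers "A".
votes = [("A", 0.9), ("B", 0.4), ("B", 0.3)]
winner = weighted_vote(votes)
print(winner)
```

A simple majority would pick "B" here (two votes to one), while the weighted total favors "A" (0.9 against 0.7).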

Debate Protocols

Debate protocols require agents to critique each other's outputs over multiple rounds. Adversarial critique often yields higher accuracy on complex reasoning than collaborative consensus.

Claude Code Implementation: Create a review stage where one agent critiques another's output. Structure this as separate commands: one for initial work, one for critique, and optionally one for revision based on critique.

Trigger-Based Intervention

Monitor multi-agent interactions for specific behavioral markers:

- Stall triggers: Activate when discussions make no progress
- Sycophancy triggers: Detect when agents mimic each other's answers without unique reasoning
- Divergence triggers: Detect when agents are moving away from the original objective
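Two of these triggers can be approximated with plain string comparison, as in the sketch below. This is a rough proxy under stated assumptions: textual similarity stands in for "no unique reasoning", the 0.9 threshold and three-round window are arbitrary, and both function names are hypothetical. A divergence trigger would need some measure of distance from the objective and is not shown.

```python
from difflib import SequenceMatcher


def sycophancy_triggered(answers: list[str], threshold: float = 0.9) -> bool:
    """True when every pair of agent answers is near-identical text."""
    pairs = [(a, b) for i, a in enumerate(answers) for b in answers[i + 1:]]
    return bool(pairs) and all(
        SequenceMatcher(None, a, b).ratio() >= threshold for a, b in pairs
    )


def stall_triggered(round_outputs: list[str], window: int = 3) -> bool:
    """True when the last `window` rounds produced the same output."""
    recent = round_outputs[-window:]
    return len(recent) == window and len(set(recent)) == 1


print(sycophancy_triggered(["The answer is 42.", "The answer is 42."]))
print(stall_triggered(["draft", "revised", "revised", "revised"]))
```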

Failure Modes and Mitigations

Failure: Supervisor Bottleneck

The supervisor accumulates context from all workers, becoming susceptible to saturation and degradation.

Mitigation: Implement output constraints so workers return only distilled summaries. Use file-based checkpointing to persist state without carrying full history in context.

Failure: Coordination Overhead

Agent communication consumes tokens and introduces latency. Complex coordination can negate parallelization benefits.

Mitigation: Minimize communication through clear handoff protocols. Use structured file formats for inter-agent communication. Batch results where possible.

Failure: Divergence

Agents pursuing different goals without central coordination can drift from intended objectives.

Mitigation: Define clear objective boundaries for each agent. Implement convergence checks that verify progress toward shared goals. Use iteration limits on agent execution.
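Iteration limits and convergence checks can be combined in one loop, sketched below. The assumptions are that progress can be expressed as a numeric score and that `step` (a hypothetical callable) runs one agent iteration and returns that score.

```python
from typing import Callable


def run_with_limits(step: Callable[[int], float], target: float,
                    max_iterations: int = 10) -> dict:
    """Call `step` until its score reaches `target` or the budget runs out."""
    score = float("-inf")
    for iteration in range(1, max_iterations + 1):
        score = step(iteration)  # one agent iteration, returns a progress score
        if score >= target:
            return {"converged": True, "iterations": iteration, "score": score}
    return {"converged": False, "iterations": max_iterations, "score": score}


# A step that makes steady progress, and one that never improves.
steady = run_with_limits(lambda i: i * 0.25, target=1.0)
stuck = run_with_limits(lambda i: 0.1, target=1.0, max_iterations=3)
print(steady, stuck)
```

The caller can treat `converged: False` as a signal to escalate or stop rather than letting the agent continue.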

Failure: Error Propagation

Errors in one agent's output propagate to downstream agents that consume that output.

Mitigation: Validate agent outputs before passing to consumers. Implement retry logic. Design for graceful degradation when components fail.
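These three mitigations fit into one small wrapper, sketched here. It assumes agents return JSON text and that a valid output is a JSON object containing a known set of keys. `call_with_validation` is a hypothetical helper, and returning `None` is one possible way to degrade gracefully.

```python
import json
from typing import Callable, Optional


def call_with_validation(agent: Callable[[], str], required_keys: set[str],
                         max_attempts: int = 3) -> Optional[dict]:
    """Retry the agent until its output parses and has the required keys.

    Returns None after `max_attempts` failures so the caller can degrade
    gracefully instead of passing bad data downstream.
    """
    for _ in range(max_attempts):
        try:
            payload = json.loads(agent())
        except json.JSONDecodeError:
            continue  # malformed output: retry
        if isinstance(payload, dict) and required_keys.issubset(payload):
            return payload
    return None


# Simulated flaky agent: two bad outputs, then a valid one.
responses = iter([
    "not json",
    '{"summary": "ok"}',
    '{"summary": "ok", "severity": "low"}',
])
result = call_with_validation(lambda: next(responses), {"summary", "severity"})
degraded = call_with_validation(lambda: "still not json", {"summary"})
print(result, degraded)
```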

Applying Patterns in Claude Code

Command as Supervisor

Create a main command that:

- Analyzes the task and creates a plan
- Dispatches subagents via Task tool for specialized work
- Collects results (via return values or shared files)
- Synthesizes final output

Subagents as Specialists

Define subagents for specialized domains:

- Each subagent focuses on one area of expertise
- Subagents receive focused context relevant to their specialty
- Subagents return structured outputs that coordinators can aggregate

Files as Shared Memory

Use the file system for inter-agent coordination:

- State files track progress across agents
- Output files collect results from parallel work
- Task lists coordinate remaining work

Example: Code Review Multi-Agent

Supervisor Command: review-code
├── Subagent: security-review (security specialist)
├── Subagent: performance-review (performance specialist)
├── Subagent: style-review (style/conventions specialist)
└── Aggregation: combine findings, deduplicate, prioritize

Each subagent receives only the code to review and their specialty focus. The supervisor aggregates all findings into a unified review.
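The aggregation step (combine, deduplicate, prioritize) could be sketched as follows. The `Finding` record, its fields, and the severity ordering are assumptions about what structured subagent output might look like. They are not a format defined by Claude Code.

```python
from dataclasses import dataclass

SEVERITY_ORDER = {"high": 0, "medium": 1, "low": 2}


@dataclass(frozen=True)
class Finding:
    """Hypothetical structured output from one review subagent."""
    file: str
    line: int
    message: str
    severity: str
    source: str


def aggregate(findings: list[Finding]) -> list[Finding]:
    """Deduplicate by location and message, keep the most severe copy, sort."""
    unique: dict[tuple[str, int, str], Finding] = {}
    for finding in findings:
        key = (finding.file, finding.line, finding.message)
        kept = unique.get(key)
        if kept is None or SEVERITY_ORDER[finding.severity] < SEVERITY_ORDER[kept.severity]:
            unique[key] = finding
    return sorted(
        unique.values(),
        key=lambda f: (SEVERITY_ORDER[f.severity], f.file, f.line),
    )


findings = [
    Finding("auth.py", 42, "SQL injection risk", "high", "security-review"),
    Finding("auth.py", 42, "SQL injection risk", "medium", "style-review"),
    Finding("orders.py", 10, "N+1 query", "medium", "performance-review"),
    Finding("auth.py", 7, "Unused import", "low", "style-review"),
]
review = aggregate(findings)
for item in review:
    print(item.severity, item.file, item.line, item.message)
```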

Guidelines

- Design for context isolation as the primary benefit of multi-agent systems
- Choose architecture pattern based on coordination needs, not organizational metaphor
- Use file-based communication as the default for Claude Code multi-agent patterns
- Implement explicit handoff protocols with clear state passing
- Use critique/debate patterns for consensus rather than simple agreement
- Monitor for supervisor bottlenecks and implement checkpointing via files
- Validate outputs before passing between agents
- Set iteration limits to prevent infinite loops
- Test failure scenarios explicitly
- Start simple; add multi-agent complexity only when single-agent approaches fail

Memory and State Management

For tasks spanning multiple sessions or requiring persistent state, use file-based memory:

Working Memory

The context window itself. Provides immediate access but vanishes when sessions end. Keep only active information; summarize completed work.

Session Memory

Files created during a session that track progress:

- Task lists (what's done, what remains)
- Intermediate results
- Decision logs

Long-Term Memory

Persistent files that survive across sessions:

- CLAUDE.md for project-level context
- Memory files in designated directories
- Structured knowledge bases in markdown or JSON

Memory Patterns for Multi-Agent

- Handoff files: Agent A writes state, Agent B reads and continues
- Result aggregation: Multiple agents write to separate files, supervisor reads all
- Progress tracking: Shared task list updated by all agents
- Knowledge accumulation: Agents append findings to shared knowledge files

Choose the simplest memory mechanism that meets your needs. File-based memory is transparent, debuggable, and requires no infrastructure.

Memory System Design

Memory provides the persistence layer that allows agents to maintain continuity across sessions and reason over accumulated knowledge. Simple agents rely entirely on context for memory, losing all state when sessions end. Sophisticated agents implement layered memory architectures that balance immediate context needs with long-term knowledge retention. The evolution from vector stores to knowledge graphs to temporal knowledge graphs represents increasing investment in structured memory for improved retrieval and reasoning.

Core Concepts

Memory exists on a spectrum from immediate context to permanent storage. At one extreme, working memory in the context window provides zero-latency access but vanishes when sessions end. At the other extreme, permanent storage persists indefinitely but requires retrieval to enter context.

Simple vector stores lack relationship and temporal structure. Knowledge graphs preserve relationships for reasoning. Temporal knowledge graphs add validity periods for time-aware queries. Implementation choices depend on query complexity, infrastructure constraints, and accuracy requirements.

Detailed Topics

Memory Architecture Fundamentals

The Context-Memory Spectrum

Memory exists on a spectrum from immediate context to permanent storage. At one extreme, working memory in the context window provides zero-latency access but vanishes when sessions end. At the other extreme, permanent storage persists indefinitely but requires retrieval to enter context. Effective architectures use multiple layers along this spectrum.

The spectrum includes working memory (context window, zero latency, volatile), short-term memory (session-persistent, searchable, volatile), long-term memory (cross-session persistent, structured, semi-permanent), and permanent memory (archival, queryable, permanent). Each layer has different latency, capacity, and persistence characteristics.

Why Simple Vector Stores Fall Short

Vector RAG provides semantic retrieval by embedding queries and documents in a shared embedding space. Similarity search retrieves the most semantically similar documents. This works well for document retrieval but lacks structure for agent memory.

Vector stores lose relationship information. If an agent learns that "Customer X purchased Product Y on Date Z," a vector store can retrieve this fact if asked directly. But it cannot answer "What products did customers who purchased Product Y also buy?" because relationship structure is not preserved.

Vector stores also struggle with temporal validity. Facts change over time, but vector stores provide no mechanism to distinguish "current fact" from "outdated fact" except through explicit metadata and filtering.

The Move to Graph-Based Memory

Knowledge graphs preserve relationships between entities. Instead of isolated document chunks, graphs encode that Entity A has Relationship R to Entity B. This enables queries that traverse relationships rather than just similarity.
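A relationship-traversing query such as "what else did buyers of Product Y purchase?" can be sketched with plain Python sets standing in for a graph store. The customer and product names and the `also_bought` helper are hypothetical, and a real graph database would express this as a two-hop traversal.

```python
from collections import defaultdict

# (customer, product) edges for a single PURCHASED relationship.
purchases = [
    ("alice", "Y"), ("alice", "Z"),
    ("bob", "Y"), ("bob", "W"),
    ("carol", "Q"),
]


def also_bought(edges: list[tuple[str, str]], product: str) -> set[str]:
    """Two-hop traversal: product -> its buyers -> their other products."""
    by_customer: dict[str, set[str]] = defaultdict(set)
    for customer, item in edges:
        by_customer[customer].add(item)
    related: set[str] = set()
    for items in by_customer.values():
        if product in items:
            related |= items - {product}
    return related


print(sorted(also_bought(purchases, "Y")))
```

Similarity search over isolated text chunks has no direct equivalent of this join, because the customer-to-product links are not stored as structure.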

Temporal knowledge graphs add validity periods to facts. Each fact has a "valid from" and optionally "valid until" timestamp. This enables time-travel queries that reconstruct knowledge at specific points in time.
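A minimal sketch of validity periods and a point-in-time query is shown below. The `Fact` record and `facts_at` helper are illustrative assumptions and not the schema of any particular temporal graph product. An open-ended fact uses `None` for its end date, and the end date is treated as exclusive.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Fact:
    subject: str
    relation: str
    obj: str
    valid_from: date
    valid_until: Optional[date] = None  # None means still valid


def facts_at(facts: list[Fact], subject: str, relation: str, when: date) -> list[str]:
    """Return the objects of matching facts that were valid on `when`."""
    return [
        f.obj
        for f in facts
        if f.subject == subject
        and f.relation == relation
        and f.valid_from <= when
        and (f.valid_until is None or when < f.valid_until)
    ]


facts = [
    Fact("user-1", "address", "12 Oak St", date(2022, 1, 1), date(2024, 1, 1)),
    Fact("user-1", "address", "9 Elm Ave", date(2024, 1, 1)),
]
print(facts_at(facts, "user-1", "address", date(2023, 6, 1)))
print(facts_at(facts, "user-1", "address", date(2024, 6, 1)))
```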

Benchmark Performance Comparison

The Deep Memory Retrieval (DMR) benchmark provides concrete performance data across memory architectures:

| Memory System | DMR Accuracy | Retrieval Latency | Notes |
| --- | --- | --- | --- |
| Zep (Temporal KG) | 94.8% | 2.58s | Best accuracy, fast retrieval |
| MemGPT | 93.4% | Variable | Good general performance |
| GraphRAG | ~75-85% | Variable | 20-35% gains over baseline RAG |
| Vector RAG | ~60-70% | Fast | Loses relationship structure |
| Recursive Summarization | 35.3% | Low | Severe information loss |

Zep demonstrated a 90% reduction in retrieval latency compared to full-context baselines (2.58s vs 28.9s with GPT-4o). This efficiency comes from retrieving only relevant subgraphs rather than the entire context history.

GraphRAG achieves approximately 20-35% accuracy gains over baseline RAG in complex reasoning tasks and reduces hallucination by up to 30% through community-based summarization.

Memory Layer Architecture

Layer 1: Working Memory

Working memory is the context window itself. It provides immediate access to information currently being processed but has limited capacity and vanishes when sessions end.

Working memory usage patterns include scratchpad calculations where agents track intermediate results, conversation history that preserves dialogue for the current task, current task state that tracks progress on active objectives, and active retrieved documents that hold information currently being used.

Optimize working memory by keeping only active information, summarizing completed work before it falls out of attention, and using attention-favored positions for critical information.

Layer 2: Short-Term Memory

Short-term memory persists across the current session but not across sessions. It provides search and retrieval capabilities without the latency of permanent storage.

Common implementations include session-scoped databases that persist until session end, file-system storage in designated session directories, and in-memory caches keyed by session ID.

Short-term memory use cases include tracking conversation state across turns without stuffing context, storing intermediate results from tool calls that may be needed later, maintaining task checklists and progress tracking, and caching retrieved information within sessions.

Layer 3: Long-Term Memory

Long-term memory persists across sessions indefinitely. It enables agents to learn from past interactions and build knowledge over time.

Long-term memory implementations range from simple key-value stores to sophisticated graph databases. The choice depends on the complexity of the relationships to model, the query patterns required, and the acceptable infrastructure complexity.

Long-term memory use cases include learning user preferences across sessions, building domain knowledge bases that grow over time, maintaining entity registries with relationship history, and storing successful patterns that can be reused.

Layer 4: Entity Memory

Entity memory specifically tracks information about entities (people, places, concepts, objects) to maintain consistency. This creates a rudimentary knowledge graph where entities are recognized across multiple interactions.

Entity memory maintains entity identity by tracking that "John Doe" mentioned in one conversation is the same person in another. It maintains entity properties by storing facts discovered about entities over time. It maintains entity relationships by tracking relationships between entities as they are discovered.

Layer 5: Temporal Knowledge Graphs

Temporal knowledge graphs extend entity memory with explicit validity periods. Facts are not just true or false but true during specific time ranges.

This enables queries like "What was the user's address on Date X?" by retrieving facts valid during that date range. It prevents context clash when outdated information contradicts new data. It enables temporal reasoning about how entities changed over time.

Memory Implementation Patterns

Pattern 1: File-System-as-Memory

The file system itself can serve as a memory layer. This pattern is simple, requires no additional infrastructure, and enables the same just-in-time loading that makes file-system-based context effective.

Implementation uses the file system hierarchy for organization. Use naming conventions that convey meaning. Store facts in structured formats (JSON, YAML). Use timestamps in filenames or metadata for temporal tracking.

Advantages: Simplicity, transparency, portability. Disadvantages: No semantic search, no relationship tracking, manual organization required.

Pattern 2: Vector RAG with Metadata

Vector stores enhanced with rich metadata provide semantic search with filtering capabilities.

Implementation embeds facts or documents and stores them with metadata including entity tags, temporal validity, source attribution, and confidence scores. Queries combine metadata filters with semantic search.

Pattern 3: Knowledge Graph

Knowledge graphs explicitly model entities and relationships. Implementation defines entity types and relationship types, uses a graph database or property graph storage, and maintains indexes for common query patterns.

Pattern 4: Temporal Knowledge Graph

Temporal knowledge graphs add validity periods to facts, enabling time-travel queries and preventing context clash from outdated information.

Memory Retrieval Patterns

Semantic Retrieval

Retrieve memories semantically similar to the current query using embedding similarity search.

Entity-Based Retrieval

Retrieve all memories related to specific entities by traversing graph relationships.

Temporal Retrieval

Retrieve memories valid at a specific time or within a time range using validity period filters.

Memory Consolidation

Memories accumulate over time and require consolidation to prevent unbounded growth and remove outdated information.

Consolidation Triggers

Trigger consolidation after significant memory accumulation, when retrieval returns too many outdated results, periodically on a schedule, or when explicit consolidation is requested.

Consolidation Process

Identify outdated facts, merge related facts, update validity periods, archive or delete obsolete facts, and rebuild indexes.
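Part of this process can be sketched as a single pass over stored facts. The sketch rests on a strong simplifying assumption, namely that each (subject, relation) pair has at most one current value, so a newer fact supersedes an older one. The dictionary fields and the `consolidate` helper are hypothetical, and index rebuilding is left out.

```python
from datetime import date


def consolidate(facts: list[dict], today: date) -> tuple[list[dict], list[dict]]:
    """Split facts into active and archived sets.

    Expired facts are archived. When a newer fact shares a (subject, relation)
    key with an active one, the older fact's validity period is closed at the
    newer fact's start date and it is archived.
    """
    active: dict[tuple[str, str], dict] = {}
    archived: list[dict] = []
    for fact in sorted(facts, key=lambda f: f["valid_from"]):
        if fact["valid_until"] is not None and fact["valid_until"] <= today:
            archived.append(fact)
            continue
        key = (fact["subject"], fact["relation"])
        previous = active.get(key)
        if previous is not None:
            archived.append({**previous, "valid_until": fact["valid_from"]})
        active[key] = fact
    return list(active.values()), archived


facts = [
    {"subject": "user-1", "relation": "address", "value": "12 Oak St",
     "valid_from": date(2022, 1, 1), "valid_until": None},
    {"subject": "user-1", "relation": "address", "value": "9 Elm Ave",
     "valid_from": date(2024, 1, 1), "valid_until": None},
    {"subject": "user-1", "relation": "employer", "value": "Acme",
     "valid_from": date(2020, 1, 1), "valid_until": date(2021, 1, 1)},
]
current, archive = consolidate(facts, today=date(2025, 1, 1))
print([f["value"] for f in current], [f["value"] for f in archive])
```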
