memory-systems
Memory System Design
Memory provides the persistence layer that allows agents to maintain continuity across sessions and reason over accumulated knowledge. Simple agents rely entirely on context for memory, losing all state when sessions end. Sophisticated agents implement layered memory architectures that balance immediate context needs with long-term knowledge retention. The evolution from vector stores to knowledge graphs to temporal knowledge graphs represents increasing investment in structured memory for improved retrieval and reasoning.
When to Activate
Activate this skill when:
- Building agents that must persist across sessions
- Needing to maintain entity consistency across conversations
- Implementing reasoning over accumulated knowledge
- Designing systems that learn from past interactions
- Creating knowledge bases that grow over time
- Building temporal-aware systems that track state changes
Core Concepts
Memory exists on a spectrum from immediate context to permanent storage. At one extreme, working memory in the context window provides zero-latency access but vanishes when sessions end. At the other extreme, permanent storage persists indefinitely but requires retrieval to enter context.
Simple vector stores lack relationship and temporal structure. Knowledge graphs preserve relationships for reasoning. Temporal knowledge graphs add validity periods for time-aware queries. Implementation choices depend on query complexity, infrastructure constraints, and accuracy requirements.
Detailed Topics
Memory Architecture Fundamentals
The Context-Memory Spectrum
Memory exists on a spectrum from immediate context to permanent storage. At one extreme, working memory in the context window provides zero-latency access but vanishes when sessions end. At the other extreme, permanent storage persists indefinitely but requires retrieval to enter context. Effective architectures use multiple layers along this spectrum.
The spectrum includes working memory (context window, zero latency, volatile), short-term memory (session-persistent, searchable, volatile), long-term memory (cross-session persistent, structured, semi-permanent), and permanent memory (archival, queryable, permanent). Each layer has different latency, capacity, and persistence characteristics.
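The four layers can be sketched as a single lookup path: reads check the fastest layer first and fall back outward, and ending a session wipes only the volatile layers. This is a minimal illustration; the `LayeredMemory` class and its promotion policy are assumptions, not a standard API.

```python
# Illustrative four-layer memory: dict-backed stores stand in for the
# real context window, session store, database, and archive.
class LayeredMemory:
    def __init__(self):
        self.working = {}      # context window: zero latency, volatile
        self.short_term = {}   # session-persistent, cleared on session end
        self.long_term = {}    # cross-session, survives restarts
        self.permanent = {}    # archival, queried only on demand

    def read(self, key):
        # Check layers in order of increasing latency.
        for layer in (self.working, self.short_term,
                      self.long_term, self.permanent):
            if key in layer:
                return layer[key]
        return None

    def end_session(self):
        # Volatile layers vanish; durable layers persist.
        self.working.clear()
        self.short_term.clear()

mem = LayeredMemory()
mem.long_term["user.name"] = "Ada"
mem.working["task"] = "draft summary"
mem.end_session()
```

After `end_session`, `mem.read("task")` returns `None` while `mem.read("user.name")` still resolves from long-term memory, which is the spectrum's key trade-off in miniature.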
Why Simple Vector Stores Fall Short
Vector RAG provides semantic retrieval by embedding queries and documents in a shared embedding space. Similarity search retrieves the most semantically similar documents. This works well for document retrieval but lacks structure for agent memory.
Vector stores lose relationship information. If an agent learns that "Customer X purchased Product Y on Date Z," a vector store can retrieve this fact if asked directly. But it cannot answer "What products did customers who purchased Product Y also buy?" because relationship structure is not preserved.
Vector stores also struggle with temporal validity. Facts change over time, but vector stores provide no mechanism to distinguish "current fact" from "outdated fact" except through explicit metadata and filtering.
The Move to Graph-Based Memory
Knowledge graphs preserve relationships between entities. Instead of isolated document chunks, graphs encode that Entity A has Relationship R to Entity B. This enables queries that traverse relationships rather than just similarity.
Temporal knowledge graphs add validity periods to facts. Each fact has a "valid from" and optionally "valid until" timestamp. This enables time-travel queries that reconstruct knowledge at specific points in time.
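A temporally valid fact can be sketched as a triple plus a validity window; a fact is current when `valid_until` is `None`, and valid at time `t` when `valid_from <= t < valid_until`. The field names here are assumptions chosen to mirror common temporal-graph schemas, not a specific product's API.

```python
from datetime import datetime

# A fact with an explicit validity period.
def fact(subject, predicate, obj, valid_from, valid_until=None):
    return {"subject": subject, "predicate": predicate, "object": obj,
            "valid_from": valid_from, "valid_until": valid_until}

def valid_at(f, t):
    # Open-ended facts (valid_until is None) are still current.
    return f["valid_from"] <= t and (
        f["valid_until"] is None or t < f["valid_until"])

facts = [
    fact("user", "LIVES_AT", "12 Elm St",
         datetime(2022, 1, 1), datetime(2024, 3, 1)),
    fact("user", "LIVES_AT", "9 Oak Ave", datetime(2024, 3, 1)),
]

def address_at(t):
    # Time-travel query: reconstruct the address valid at time t.
    for f in facts:
        if f["predicate"] == "LIVES_AT" and valid_at(f, t):
            return f["object"]
```

`address_at(datetime(2023, 6, 1))` returns the old address and `address_at(datetime(2024, 6, 1))` the new one, without the old fact ever being deleted.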
Benchmark Performance Comparison
The Deep Memory Retrieval (DMR) benchmark provides concrete performance data across memory architectures:
| Memory System | DMR Accuracy | Retrieval Latency | Notes |
|---|---|---|---|
| Zep (Temporal KG) | 94.8% | 2.58s | Best accuracy, fast retrieval |
| MemGPT | 93.4% | Variable | Good general performance |
| GraphRAG | ~75-85% | Variable | 20-35% gains over baseline RAG |
| Vector RAG | ~60-70% | Fast | Loses relationship structure |
| Recursive Summarization | 35.3% | Low | Severe information loss |
Zep demonstrated a 90% reduction in retrieval latency compared to full-context baselines (2.58s vs 28.9s). This efficiency comes from retrieving only relevant subgraphs rather than the entire context history.
GraphRAG achieves approximately 20-35% accuracy gains over baseline RAG in complex reasoning tasks and reduces hallucination by up to 30% through community-based summarization.
Memory Layer Architecture
Layer 1: Working Memory
Working memory is the context window itself. It provides immediate access to information currently being processed but has limited capacity and vanishes when sessions end.
Working memory usage patterns include scratchpad calculations where agents track intermediate results, conversation history that preserves dialogue for current task, current task state that tracks progress on active objectives, and active retrieved documents that hold information currently being used.
Optimize working memory by keeping only active information, summarizing completed work before it falls out of attention, and using attention-favored positions for critical information.
Layer 2: Short-Term Memory
Short-term memory persists across the current session but not across sessions. It provides search and retrieval capabilities without the latency of permanent storage.
Common implementations include session-scoped databases that persist until session end, file-system storage in designated session directories, and in-memory caches keyed by session ID.
Short-term memory use cases include tracking conversation state across turns without stuffing context, storing intermediate results from tool calls that may be needed later, maintaining task checklists and progress tracking, and caching retrieved information within sessions.
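The in-memory-cache-keyed-by-session-ID variant can be sketched in a few lines; the `SessionMemory` class and its method names are illustrative, not a library API.

```python
# Session-scoped short-term memory: searchable within a session,
# discarded when the session ends.
class SessionMemory:
    def __init__(self):
        self._sessions = {}

    def put(self, session_id, key, value):
        self._sessions.setdefault(session_id, {})[key] = value

    def get(self, session_id, key, default=None):
        return self._sessions.get(session_id, {}).get(key, default)

    def end_session(self, session_id):
        # Short-term memory is volatile: drop everything for this session.
        self._sessions.pop(session_id, None)

stm = SessionMemory()
stm.put("s1", "checklist", ["fetch data", "summarize"])
stm.put("s1", "tool_result:fetch", {"rows": 42})
```

Intermediate tool results and checklists live here across turns without being stuffed back into the context window; `end_session("s1")` discards them all at once.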
Layer 3: Long-Term Memory
Long-term memory persists across sessions indefinitely. It enables agents to learn from past interactions and build knowledge over time.
Long-term memory implementations range from simple key-value stores to sophisticated graph databases. The choice depends on complexity of relationships to model, query patterns required, and acceptable infrastructure complexity.
Long-term memory use cases include learning user preferences across sessions, building domain knowledge bases that grow over time, maintaining entity registries with relationship history, and storing successful patterns that can be reused.
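At the simple key-value end of that range, a JSON file is enough to make preferences survive restarts. This is a sketch under assumed naming; real deployments would add locking, schema versioning, and error handling.

```python
import json
import os
import tempfile

# Minimal long-term store backed by a JSON file on disk.
class LongTermStore:
    def __init__(self, path):
        self.path = path

    def _load(self):
        if os.path.exists(self.path):
            with open(self.path) as f:
                return json.load(f)
        return {}

    def remember(self, key, value):
        data = self._load()
        data[key] = value
        with open(self.path, "w") as f:
            json.dump(data, f)

    def recall(self, key, default=None):
        return self._load().get(key, default)

# "Cross-session": a fresh instance pointed at the same file sees old data.
path = os.path.join(tempfile.mkdtemp(), "prefs.json")
LongTermStore(path).remember("user.tone", "concise")
```

A second `LongTermStore(path)` created later, in a different session, recalls `"user.tone"` without any shared process state, which is the defining property of this layer.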
Layer 4: Entity Memory
Entity memory specifically tracks information about entities (people, places, concepts, objects) to maintain consistency. This creates a rudimentary knowledge graph where entities are recognized across multiple interactions.
Entity memory maintains entity identity by tracking that "John Doe" mentioned in one conversation is the same person in another. It maintains entity properties by storing facts discovered about entities over time. It maintains entity relationships by tracking relationships between entities as they are discovered.
Layer 5: Temporal Knowledge Graphs
Temporal knowledge graphs extend entity memory with explicit validity periods. Facts are not just true or false but true during specific time ranges.
This enables queries like "What was the user's address on Date X?" by retrieving facts valid during that date range. It prevents context clash when outdated information contradicts new data. It enables temporal reasoning about how entities changed over time.
Memory Implementation Patterns
Pattern 1: File-System-as-Memory
The file system itself can serve as a memory layer. This pattern is simple, requires no additional infrastructure, and enables the same just-in-time loading that makes file-system-based context effective.
Implementation uses the file system hierarchy for organization. Use naming conventions that convey meaning. Store facts in structured formats (JSON, YAML). Use timestamps in filenames or metadata for temporal tracking.
Advantages: Simplicity, transparency, portability.
Disadvantages: No semantic search, no relationship tracking, manual organization required.
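The pattern can be sketched as a directory per category with timestamped JSON files per fact, where the newest timestamp wins on read. The layout and naming convention are assumptions for illustration.

```python
import json
import os
import tempfile
from datetime import datetime, timezone

# File-system-as-memory: hierarchy for organization, timestamps in
# filenames for temporal tracking.
ROOT = tempfile.mkdtemp()

def store_fact(category, name, fact):
    d = os.path.join(ROOT, category)            # e.g. <root>/entities/
    os.makedirs(d, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%f")
    path = os.path.join(d, f"{name}-{stamp}.json")
    with open(path, "w") as f:
        json.dump(fact, f)
    return path

def latest_fact(category, name):
    d = os.path.join(ROOT, category)
    matches = sorted(p for p in os.listdir(d)
                     if p.startswith(name + "-"))
    with open(os.path.join(d, matches[-1])) as f:  # newest timestamp wins
        return json.load(f)

store_fact("entities", "customer-x", {"tier": "gold"})
```

Everything is inspectable with ordinary shell tools, which is the transparency advantage; the lack of semantic search is equally visible, since `latest_fact` can only look things up by exact name.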
Pattern 2: Vector RAG with Metadata
Vector stores enhanced with rich metadata provide semantic search with filtering capabilities.
Implementation embeds facts or documents and stores with metadata including entity tags, temporal validity, source attribution, and confidence scores. Query includes metadata filters alongside semantic search.
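A toy sketch makes the combination concrete: the embeddings here are hand-made two-dimensional vectors, and the filter syntax is an assumption; a real system would use an embedding model and its vector store's native metadata filters.

```python
import math

def cosine(a, b):
    # Standard cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Each entry pairs an embedding with metadata (here, temporal validity).
index = [
    {"text": "old address", "vec": [1.0, 0.0], "meta": {"current": False}},
    {"text": "new address", "vec": [0.9, 0.1], "meta": {"current": True}},
]

def search(query_vec, meta_filter):
    # Metadata filtering first, then semantic ranking.
    candidates = [d for d in index
                  if all(d["meta"].get(k) == v
                         for k, v in meta_filter.items())]
    return max(candidates,
               key=lambda d: cosine(query_vec, d["vec"]))["text"]
```

With the query vector `[1.0, 0.0]`, an unfiltered search returns the semantically closer but outdated entry, while filtering on `{"current": True}` returns the valid one, which is exactly the temporal-validity gap metadata is meant to close.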
Pattern 3: Knowledge Graph
Knowledge graphs explicitly model entities and relationships. Implementation defines entity types and relationship types, uses graph database or property graph storage, and maintains indexes for common query patterns.
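A sketch with edges as `(source, relation, target)` triples shows the relationship traversal that vector stores cannot do, answering the earlier co-purchase question; the triple representation and helper names are illustrative assumptions, not a graph database API.

```python
# Minimal in-memory knowledge graph: relationships as triples.
edges = [
    ("customer:a", "PURCHASED", "product:y"),
    ("customer:a", "PURCHASED", "product:z"),
    ("customer:b", "PURCHASED", "product:y"),
    ("customer:b", "PURCHASED", "product:w"),
]

def who(relation, target):
    # Inbound traversal: all sources with this relation to target.
    return {s for s, r, t in edges if r == relation and t == target}

def what(source, relation):
    # Outbound traversal: all targets this source relates to.
    return {t for s, r, t in edges if s == source and r == relation}

def also_bought(product):
    # Two-hop query: buyers of this product, then their other purchases.
    other = set()
    for buyer in who("PURCHASED", product):
        other |= what(buyer, "PURCHASED")
    return other - {product}
```

`also_bought("product:y")` traverses buyer relationships to find the other purchased products, a query that requires preserved relationship structure rather than similarity search.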
Pattern 4: Temporal Knowledge Graph
Temporal knowledge graphs add validity periods to facts, enabling time-travel queries and preventing context clash from outdated information.
Memory Retrieval Patterns
Semantic Retrieval
Retrieve memories semantically similar to current query using embedding similarity search.
Entity-Based Retrieval
Retrieve all memories related to specific entities by traversing graph relationships.
Temporal Retrieval
Retrieve memories valid at specific time or within time range using validity period filters.
Memory Consolidation
Memories accumulate over time and require consolidation to prevent unbounded growth and remove outdated information.
Consolidation Triggers
Trigger consolidation after significant memory accumulation, when retrieval returns too many outdated results, periodically on a schedule, or when explicit consolidation is requested.
Consolidation Process
Identify outdated facts, merge related facts, update validity periods, archive or delete obsolete facts, and rebuild indexes.
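The "update validity periods" step can be sketched as closing the validity window of any fact superseded by a newer fact for the same subject and predicate, rather than deleting it; the field names are assumptions chosen for illustration.

```python
from datetime import datetime

def consolidate(facts):
    # For each (subject, predicate), close older open-ended facts at the
    # moment a newer fact takes effect; nothing is deleted.
    open_facts = {}
    for f in sorted(facts, key=lambda f: f["valid_from"]):
        key = (f["subject"], f["predicate"])
        prev = open_facts.get(key)
        if prev is not None and prev["valid_until"] is None:
            prev["valid_until"] = f["valid_from"]   # supersede, don't delete
        open_facts[key] = f
    return facts

facts = [
    {"subject": "user", "predicate": "LIVES_AT", "object": "12 Elm St",
     "valid_from": datetime(2022, 1, 1), "valid_until": None},
    {"subject": "user", "predicate": "LIVES_AT", "object": "9 Oak Ave",
     "valid_from": datetime(2024, 3, 1), "valid_until": None},
]
consolidate(facts)
```

After consolidation only the newest fact remains open-ended; the older address keeps its history but can no longer be mistaken for current, which removes the retrieval ambiguity that triggers consolidation in the first place.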
Practical Guidance
Integration with Context
Memories must integrate with context systems to be useful. Use just-in-time memory loading to retrieve relevant memories when needed. Use strategic injection to place memories in attention-favored positions.
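Both ideas can be sketched together: retrieve only memories relevant to the current query, then inject them near the end of the prompt, an attention-favored position. Retrieval here is naive keyword overlap purely for illustration; a real system would use the memory layer's own search, and the function names are assumptions.

```python
memories = [
    "User prefers concise answers.",
    "User's project is written in Rust.",
    "User mentioned a cat named Miso.",
]

def relevant(query, k=2):
    # Stand-in retrieval: rank memories by word overlap with the query.
    words = set(query.lower().replace(".", "").split())
    scored = sorted(
        memories,
        key=lambda m: -len(words & set(m.lower().replace(".", "").split())))
    return scored[:k]

def build_prompt(system, query):
    # Strategic injection: recalled memories go late in the prompt,
    # close to the user turn, rather than buried at the top.
    recalled = "\n".join("- " + m for m in relevant(query))
    return f"{system}\n\nRelevant memories:\n{recalled}\n\nUser: {query}"

prompt = build_prompt("You are a helpful assistant.",
                      "help with my rust project")
```

Only the memories that match the query are loaded, so the prompt stays small as the memory store grows, and the recalled facts sit next to the question they support.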
Memory System Selection
Choose memory architecture based on requirements:
- Simple persistence needs: File-system memory
- Semantic search needs: Vector RAG with metadata
- Relationship reasoning needs: Knowledge graph
- Temporal validity needs: Temporal knowledge graph
Examples
Example 1: Entity Tracking

```python
# Track entity across conversations
def remember_entity(entity_id, properties):
    memory.store({
        "type": "entity",
        "id": entity_id,
        "properties": properties,
        "last_updated": now()
    })

def get_entity(entity_id):
    return memory.retrieve_entity(entity_id)
```

Example 2: Temporal Query

```python
# What was the user's address on January 15, 2024?
def query_address_at_time(user_id, query_time):
    return temporal_graph.query("""
        MATCH (user)-[r:LIVES_AT]->(address)
        WHERE user.id = $user_id
          AND r.valid_from <= $query_time
          AND (r.valid_until IS NULL OR r.valid_until > $query_time)
        RETURN address
    """, {"user_id": user_id, "query_time": query_time})
```
Guidelines
- Match memory architecture to query requirements
- Implement progressive disclosure for memory access
- Use temporal validity to prevent outdated information conflicts
- Consolidate memories periodically to prevent unbounded growth
- Handle memory retrieval failures gracefully
- Consider privacy implications of persistent memory
- Implement backup and recovery for critical memories
- Monitor memory growth and performance over time
Integration
This skill builds on context-fundamentals. It connects to:
- multi-agent-patterns - Shared memory across agents
- context-optimization - Memory-based context loading
- evaluation - Evaluating memory quality
References
Internal reference:
- Implementation Reference - Detailed implementation patterns
Related skills in this collection:
- context-fundamentals - Context basics
- multi-agent-patterns - Cross-agent memory
External resources:
- Graph database documentation (Neo4j, etc.)
- Vector store documentation (Pinecone, Weaviate, etc.)
- Research on knowledge graphs and reasoning
Skill Metadata
Created: 2025-12-20
Last Updated: 2025-12-20
Author: Agent Skills for Context Engineering Contributors
Version: 1.0.0