memory-systems
When to Use This Skill
Design short-term, long-term, and graph-based memory architectures
Use this skill when designing short-term, long-term, and graph-based memory architectures.
Memory System Design
Memory provides the persistence layer that allows agents to maintain continuity across sessions and reason over accumulated knowledge. Simple agents rely entirely on context for memory, losing all state when sessions end. Sophisticated agents implement layered memory architectures that balance immediate context needs with long-term knowledge retention. The evolution from vector stores to knowledge graphs to temporal knowledge graphs represents increasing investment in structured memory for improved retrieval and reasoning.
When to Activate
Activate this skill when:
- Building agents that must persist across sessions
- Needing to maintain entity consistency across conversations
- Implementing reasoning over accumulated knowledge
- Designing systems that learn from past interactions
- Creating knowledge bases that grow over time
- Building temporal-aware systems that track state changes
Core Concepts
Memory exists on a spectrum from immediate context to permanent storage. At one extreme, working memory in the context window provides zero-latency access but vanishes when sessions end. At the other extreme, permanent storage persists indefinitely but requires retrieval to enter context.
Simple vector stores lack relationship and temporal structure. Knowledge graphs preserve relationships for reasoning. Temporal knowledge graphs add validity periods for time-aware queries. Implementation choices depend on query complexity, infrastructure constraints, and accuracy requirements.
Detailed Topics
Memory Architecture Fundamentals
The Context-Memory Spectrum
Memory exists on a spectrum from immediate context to permanent storage. At one extreme, working memory in the context window provides zero-latency access but vanishes when sessions end. At the other extreme, permanent storage persists indefinitely but requires retrieval to enter context. Effective architectures use multiple layers along this spectrum.
The spectrum includes working memory (context window, zero latency, volatile), short-term memory (session-persistent, searchable, volatile), long-term memory (cross-session persistent, structured, semi-permanent), and permanent memory (archival, queryable, permanent). Each layer has different latency, capacity, and persistence characteristics.
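The tiered lookup implied by these layers can be sketched as follows. This is a minimal illustration, not a standard API; the class and method names are hypothetical:

```python
# Illustrative tiered memory lookup: fastest, most volatile layer first.
# All class and attribute names here are hypothetical.
class TieredMemory:
    def __init__(self):
        self.working = {}      # context window scratchpad (volatile)
        self.short_term = {}   # session-scoped store (volatile)
        self.long_term = {}    # cross-session store (semi-permanent)

    def recall(self, key):
        # Check layers in order of latency; promote hits into working
        # memory so subsequent accesses are zero-cost.
        for layer in (self.working, self.short_term, self.long_term):
            if key in layer:
                self.working[key] = layer[key]
                return layer[key]
        return None

    def end_session(self):
        # Volatile layers vanish when the session ends; long-term survives.
        self.working.clear()
        self.short_term.clear()

mem = TieredMemory()
mem.long_term["user_name"] = "Ada"
assert mem.recall("user_name") == "Ada"  # retrieved from long-term...
assert "user_name" in mem.working        # ...and promoted into context
```

The promotion-on-hit step mirrors how retrieved memories enter the context window and then enjoy zero-latency access for the rest of the turn.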
Why Simple Vector Stores Fall Short
Vector RAG provides semantic retrieval by embedding queries and documents in a shared embedding space. Similarity search retrieves the most semantically similar documents. This works well for document retrieval but lacks structure for agent memory.
Vector stores lose relationship information. If an agent learns that "Customer X purchased Product Y on Date Z," a vector store can retrieve this fact if asked directly. But it cannot answer "What products did customers who purchased Product Y also buy?" because relationship structure is not preserved.
Vector stores also struggle with temporal validity. Facts change over time, but vector stores provide no mechanism to distinguish "current fact" from "outdated fact" except through explicit metadata and filtering.
The Move to Graph-Based Memory
Knowledge graphs preserve relationships between entities. Instead of isolated document chunks, graphs encode that Entity A has Relationship R to Entity B. This enables queries that traverse relationships rather than just similarity.
Temporal knowledge graphs add validity periods to facts. Each fact has a "valid from" and optionally "valid until" timestamp. This enables time-travel queries that reconstruct knowledge at specific points in time.
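A temporal fact with a validity window, and the point-in-time check behind a time-travel query, can be sketched like this (the `TemporalFact` shape is illustrative, not a standard schema):

```python
# Hypothetical temporal fact with an explicit validity period.
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class TemporalFact:
    subject: str
    predicate: str
    obj: str
    valid_from: datetime
    valid_until: Optional[datetime] = None  # None means "still valid"

    def valid_at(self, t: datetime) -> bool:
        # A fact holds at time t if t falls inside its validity window.
        return self.valid_from <= t and (
            self.valid_until is None or t < self.valid_until
        )

old = TemporalFact("user:1", "LIVES_AT", "12 Oak St",
                   datetime(2022, 1, 1), datetime(2024, 3, 1))
new = TemporalFact("user:1", "LIVES_AT", "99 Elm Ave",
                   datetime(2024, 3, 1))

t = datetime(2023, 6, 15)
current = [f for f in (old, new) if f.valid_at(t)]
assert [f.obj for f in current] == ["12 Oak St"]  # time-travel query
```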
Benchmark Performance Comparison
The Deep Memory Retrieval (DMR) benchmark provides concrete performance data across memory architectures:
| Memory System | DMR Accuracy | Retrieval Latency | Notes |
|---|---|---|---|
| Zep (Temporal KG) | 94.8% | 2.58s | Best accuracy, fast retrieval |
| MemGPT | 93.4% | Variable | Good general performance |
| GraphRAG | ~75-85% | Variable | 20-35% gains over baseline RAG |
| Vector RAG | ~60-70% | Fast | Loses relationship structure |
| Recursive Summarization | 35.3% | Low | Severe information loss |
Zep demonstrated 90% reduction in retrieval latency compared to full-context baselines (2.58s vs 28.9s for GPT-5.2). This efficiency comes from retrieving only relevant subgraphs rather than entire context history.
GraphRAG achieves approximately 20-35% accuracy gains over baseline RAG in complex reasoning tasks and reduces hallucination by up to 30% through community-based summarization.
Memory Layer Architecture
Layer 1: Working Memory
Working memory is the context window itself. It provides immediate access to information currently being processed but has limited capacity and vanishes when sessions end.
Working memory usage patterns include scratchpad calculations where agents track intermediate results, conversation history that preserves dialogue for current task, current task state that tracks progress on active objectives, and active retrieved documents that hold information currently being used.
Optimize working memory by keeping only active information, summarizing completed work before it falls out of attention, and using attention-favored positions for critical information.
Layer 2: Short-Term Memory
Short-term memory persists across the current session but not across sessions. It provides search and retrieval capabilities without the latency of permanent storage.
Common implementations include session-scoped databases that persist until session end, file-system storage in designated session directories, and in-memory caches keyed by session ID.
Short-term memory use cases include tracking conversation state across turns without stuffing context, storing intermediate results from tool calls that may be needed later, maintaining task checklists and progress tracking, and caching retrieved information within sessions.
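The in-memory-cache-keyed-by-session-ID implementation above can be sketched in a few lines. The `SessionMemory` interface is illustrative:

```python
# Minimal session-scoped short-term memory keyed by session ID.
from collections import defaultdict

class SessionMemory:
    def __init__(self):
        self._sessions = defaultdict(dict)  # session_id -> key/value store

    def store(self, session_id, key, value):
        self._sessions[session_id][key] = value

    def retrieve(self, session_id, key, default=None):
        return self._sessions[session_id].get(key, default)

    def end_session(self, session_id):
        # Short-term memory does not survive the session.
        self._sessions.pop(session_id, None)

mem = SessionMemory()
mem.store("s1", "task_checklist", ["fetch data", "summarize"])
assert mem.retrieve("s1", "task_checklist") == ["fetch data", "summarize"]
mem.end_session("s1")
assert mem.retrieve("s1", "task_checklist") is None
```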
Layer 3: Long-Term Memory
Long-term memory persists across sessions indefinitely. It enables agents to learn from past interactions and build knowledge over time.
Long-term memory implementations range from simple key-value stores to sophisticated graph databases. The choice depends on complexity of relationships to model, query patterns required, and acceptable infrastructure complexity.
Long-term memory use cases include learning user preferences across sessions, building domain knowledge bases that grow over time, maintaining entity registries with relationship history, and storing successful patterns that can be reused.
Layer 4: Entity Memory
Entity memory specifically tracks information about entities (people, places, concepts, objects) to maintain consistency. This creates a rudimentary knowledge graph where entities are recognized across multiple interactions.
Entity memory maintains entity identity by tracking that "John Doe" mentioned in one conversation is the same person in another. It maintains entity properties by storing facts discovered about entities over time. It maintains entity relationships by tracking relationships between entities as they are discovered.
Layer 5: Temporal Knowledge Graphs
Temporal knowledge graphs extend entity memory with explicit validity periods. Facts are not just true or false but true during specific time ranges.
This enables queries like "What was the user's address on Date X?" by retrieving facts valid during that date range. It prevents context clash when outdated information contradicts new data. It enables temporal reasoning about how entities changed over time.
Memory Implementation Patterns
Pattern 1: File-System-as-Memory
The file system itself can serve as a memory layer. This pattern is simple, requires no additional infrastructure, and enables the same just-in-time loading that makes file-system-based context effective.
Implementation uses the file system hierarchy for organization. Use naming conventions that convey meaning. Store facts in structured formats (JSON, YAML). Use timestamps in filenames or metadata for temporal tracking.
Advantages: Simplicity, transparency, portability.
Disadvantages: No semantic search, no relationship tracking, manual organization required.
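A minimal sketch of the pattern, assuming a `memory/facts/` directory layout and a `timestamp_topic.json` naming convention (both illustrative, not a standard):

```python
# File-system-as-memory: one JSON file per fact, with a timestamp in the
# filename for temporal tracking. Paths and naming are illustrative.
import json
import time
from pathlib import Path

MEMORY_DIR = Path("memory/facts")  # assumed layout

def remember(topic: str, fact: dict) -> Path:
    MEMORY_DIR.mkdir(parents=True, exist_ok=True)
    path = MEMORY_DIR / f"{time.time_ns()}_{topic}.json"
    path.write_text(json.dumps(fact))
    return path

def recall(topic: str) -> list[dict]:
    # The naming convention doubles as the index: glob by topic,
    # sorted oldest-to-newest by the timestamp prefix.
    return [json.loads(p.read_text())
            for p in sorted(MEMORY_DIR.glob(f"*_{topic}.json"))]

remember("user_prefs", {"theme": "dark"})
assert recall("user_prefs")[-1] == {"theme": "dark"}
```

Note how the disadvantages show up directly: `recall` can only match on the filename, so semantic search and relationship traversal are out of reach.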
Pattern 2: Vector RAG with Metadata
Vector stores enhanced with rich metadata provide semantic search with filtering capabilities.
Implementation embeds facts or documents and stores with metadata including entity tags, temporal validity, source attribution, and confidence scores. Query includes metadata filters alongside semantic search.
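The filter-then-rank flow can be sketched with toy embeddings; a real system would delegate both steps to a vector database, and the 3-dimensional vectors here are fake:

```python
# Sketch of vector retrieval with metadata filtering (toy embeddings).
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

store = [
    {"vec": [1.0, 0.0, 0.1], "text": "User prefers dark mode",
     "meta": {"entity": "user:1", "valid": True}},
    {"vec": [0.9, 0.1, 0.0], "text": "User prefers light mode",
     "meta": {"entity": "user:1", "valid": False}},  # superseded fact
]

def search(query_vec, meta_filter, k=1):
    # Apply the metadata filter first, then rank survivors by similarity.
    candidates = [e for e in store
                  if all(e["meta"].get(key) == v
                         for key, v in meta_filter.items())]
    return sorted(candidates,
                  key=lambda e: cosine(query_vec, e["vec"]),
                  reverse=True)[:k]

hit = search([1.0, 0.0, 0.0], {"entity": "user:1", "valid": True})
assert hit[0]["text"] == "User prefers dark mode"
```

The `valid` flag is the explicit-metadata workaround for temporal validity mentioned earlier; without the filter, the superseded fact would still rank highly on similarity alone.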
Pattern 3: Knowledge Graph
Knowledge graphs explicitly model entities and relationships. Implementation defines entity types and relationship types, uses graph database or property graph storage, and maintains indexes for common query patterns.
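A toy adjacency-list graph is enough to answer the relationship query a flat vector store could not ("what else did buyers of Product Y buy?"). The data model here is a deliberate simplification of a property graph:

```python
# Toy knowledge graph as adjacency lists with inverse edges for traversal.
from collections import defaultdict

edges = defaultdict(set)  # (subject, relation) -> set of objects

def add_fact(subj, rel, obj):
    edges[(subj, rel)].add(obj)
    edges[(obj, f"inv_{rel}")].add(subj)  # inverse edge enables traversal back

add_fact("customer:A", "PURCHASED", "product:Y")
add_fact("customer:A", "PURCHASED", "product:Z")
add_fact("customer:B", "PURCHASED", "product:Y")
add_fact("customer:B", "PURCHASED", "product:W")

def also_bought(product):
    # Two-hop traversal: product -> its buyers -> their other purchases.
    buyers = edges[(product, "inv_PURCHASED")]
    other = set()
    for b in buyers:
        other |= edges[(b, "PURCHASED")] - {product}
    return other

assert also_bought("product:Y") == {"product:Z", "product:W"}
```

Maintaining the inverse edge at write time is the adjacency-list analogue of the indexes a graph database keeps for common query patterns.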
Pattern 4: Temporal Knowledge Graph
Temporal knowledge graphs add validity periods to facts, enabling time-travel queries and preventing context clash from outdated information.
Memory Retrieval Patterns
Semantic Retrieval
Retrieve memories semantically similar to current query using embedding similarity search.
Entity-Based Retrieval
Retrieve all memories related to specific entities by traversing graph relationships.
Temporal Retrieval
Retrieve memories valid at specific time or within time range using validity period filters.
Memory Consolidation
Memories accumulate over time and require consolidation to prevent unbounded growth and remove outdated information.
Consolidation Triggers
Trigger consolidation after significant memory accumulation, when retrieval returns too many outdated results, periodically on a schedule, or when explicit consolidation is requested.
Consolidation Process
Identify outdated facts, merge related facts, update validity periods, archive or delete obsolete facts, and rebuild indexes.
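The first few steps of that process can be sketched over the temporal-fact shape used elsewhere in this skill. The record layout and the one-year archive threshold are illustrative assumptions:

```python
# Consolidation sketch: close validity periods of superseded facts and
# archive anything long expired. Data shapes are illustrative.
from datetime import datetime, timedelta

facts = [
    {"key": "user:1/LIVES_AT", "value": "12 Oak St",
     "valid_from": datetime(2022, 1, 1), "valid_until": None},
    {"key": "user:1/LIVES_AT", "value": "99 Elm Ave",
     "valid_from": datetime(2024, 3, 1), "valid_until": None},
]

def consolidate(facts, now, archive_after=timedelta(days=365)):
    active, archived = [], []
    # Sort by key, then time: the next fact with the same key supersedes
    # any still-open earlier fact.
    ordered = sorted(facts, key=lambda f: (f["key"], f["valid_from"]))
    for i, f in enumerate(ordered):
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        if nxt and nxt["key"] == f["key"] and f["valid_until"] is None:
            f["valid_until"] = nxt["valid_from"]  # close superseded fact
        expired = (f["valid_until"] is not None
                   and now - f["valid_until"] > archive_after)
        (archived if expired else active).append(f)
    return active, archived

active, archived = consolidate(facts, now=datetime(2026, 1, 1))
assert facts[0]["valid_until"] == datetime(2024, 3, 1)
assert len(archived) == 1 and archived[0]["value"] == "12 Oak St"
```

Merging related facts and rebuilding indexes would follow the same pass; they are omitted here to keep the sketch small.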
Practical Guidance
Integration with Context
Memories must integrate with context systems to be useful. Use just-in-time memory loading to retrieve relevant memories when needed. Use strategic injection to place memories in attention-favored positions.
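Both ideas can be sketched together: retrieve only relevant memories at prompt-build time, then place them at an attention-favored position. The keyword match stands in for any real retrieval layer, and the prompt layout is one illustrative choice:

```python
# Just-in-time memory injection into an attention-favored prompt position.
def retrieve_memories(query: str, store: dict) -> list[str]:
    # Trivial keyword match as a placeholder for semantic retrieval.
    return [fact for key, fact in store.items() if key in query.lower()]

def build_prompt(user_message: str, store: dict) -> str:
    memories = retrieve_memories(user_message, store)
    memory_block = "\n".join(f"- {m}" for m in memories)
    # Memories go first, where attention is strong; the task goes last.
    return f"Relevant memories:\n{memory_block}\n\nUser: {user_message}"

store = {"timezone": "User is in UTC+2", "diet": "User is vegetarian"}
prompt = build_prompt("What timezone am I in?", store)
assert "UTC+2" in prompt and "vegetarian" not in prompt
```

The point of the assertion is the just-in-time property: irrelevant memories never enter context at all, rather than being loaded and ignored.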
Memory System Selection
Choose memory architecture based on requirements:
- Simple persistence needs: File-system memory
- Semantic search needs: Vector RAG with metadata
- Relationship reasoning needs: Knowledge graph
- Temporal validity needs: Temporal knowledge graph
Examples
**Example 1: Entity Tracking**

```python
# Track an entity across conversations.
# `memory` and `now()` are assumed interfaces of the surrounding system.
def remember_entity(entity_id, properties):
    memory.store({
        "type": "entity",
        "id": entity_id,
        "properties": properties,
        "last_updated": now()
    })

def get_entity(entity_id):
    return memory.retrieve_entity(entity_id)
```

**Example 2: Temporal Query**

```python
# What was the user's address on January 15, 2024?
# `temporal_graph` is an assumed interface to a Cypher-capable graph store.
def query_address_at_time(user_id, query_time):
    return temporal_graph.query("""
        MATCH (user)-[r:LIVES_AT]->(address)
        WHERE user.id = $user_id
          AND r.valid_from <= $query_time
          AND (r.valid_until IS NULL OR r.valid_until > $query_time)
        RETURN address
    """, {"user_id": user_id, "query_time": query_time})
```

Guidelines
- Match memory architecture to query requirements
- Implement progressive disclosure for memory access
- Use temporal validity to prevent outdated information conflicts
- Consolidate memories periodically to prevent unbounded growth
- Handle memory retrieval failures gracefully
- Consider privacy implications of persistent memory
- Implement backup and recovery for critical memories
- Monitor memory growth and performance over time
Integration
This skill builds on context-fundamentals. It connects to:
- multi-agent-patterns - Shared memory across agents
- context-optimization - Memory-based context loading
- evaluation - Evaluating memory quality
References
Internal reference:
- Implementation Reference - Detailed implementation patterns
Related skills in this collection:
- context-fundamentals - Context basics
- multi-agent-patterns - Cross-agent memory
External resources:
- Graph database documentation (Neo4j, etc.)
- Vector store documentation (Pinecone, Weaviate, etc.)
- Research on knowledge graphs and reasoning
Skill Metadata
Created: 2025-12-20
Last Updated: 2025-12-20
Author: Agent Skills for Context Engineering Contributors
Version: 1.0.0