memory-systems

Memory System Design

Memory provides the persistence layer that allows agents to maintain continuity across sessions and reason over accumulated knowledge. Simple agents rely entirely on context for memory, losing all state when sessions end. Sophisticated agents implement layered memory architectures that balance immediate context needs with long-term knowledge retention. The evolution from vector stores to knowledge graphs to temporal knowledge graphs represents increasing investment in structured memory for improved retrieval and reasoning.

When to Activate

Activate this skill when:

Building agents that must persist knowledge across sessions
Choosing between memory frameworks (Mem0, Zep/Graphiti, Letta, LangMem)
Needing to maintain entity consistency across conversations
Implementing reasoning over accumulated knowledge
Designing memory architectures that scale in production
Evaluating memory systems against benchmarks (LoCoMo, LongMemEval, DMR)

Core Concepts

Memory spans a spectrum from the volatile context window to persistent storage. A key insight from benchmarks is that tool complexity matters less than reliable retrieval: Letta's filesystem agents scored 74% on LoCoMo using basic file operations, beating Mem0's specialized tools at 68.5%. Start simple, and add structure (graphs, temporal validity) only when retrieval quality demands it.

Detailed Topics

Production Framework Landscape

Framework	Architecture	Best For	Trade-off
Mem0	Vector store + graph memory, pluggable backends	Multi-tenant systems, broad integrations	Less specialized for multi-agent
Zep/Graphiti	Temporal knowledge graph, bi-temporal model	Enterprise requiring relationship modeling + temporal reasoning	Advanced features cloud-locked
Letta	Self-editing memory with tiered storage (in-context/core/archival)	Full agent introspection, stateful services	Complexity for simple use cases
LangMem	Memory tools for LangGraph workflows	Teams already on LangGraph	Tightly coupled to LangGraph
File-system	Plain files with naming conventions	Simple agents, prototyping	No semantic search, no relationships

Zep's Graphiti engine builds a three-tier knowledge graph (episode, semantic entity, community subgraphs) with a bi-temporal model tracking both when events occurred and when they were ingested. Mem0 offers the fastest path to production with managed infrastructure. Letta provides the deepest agent control through its Agent Development Environment.
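The bi-temporal model is easier to grasp with a concrete record. Below is a minimal sketch, assuming a fact carries two independent timelines. Event time records when the fact was true in the world. Ingestion time records when the system learned it or retracted it. The class, field, and function names are illustrative assumptions and are not Graphiti's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class BiTemporalFact:
    """A fact with two independent timelines (illustrative, not Graphiti's schema)."""
    subject: str
    predicate: str
    obj: str
    # Event time: when the fact was true in the world.
    valid_from: datetime
    valid_until: Optional[datetime] = None
    # Ingestion time: when the system recorded, and later retracted, the fact.
    recorded_at: datetime = datetime.min
    retracted_at: Optional[datetime] = None

    def true_at(self, t: datetime) -> bool:
        """Was the fact true in the world at event time t? (half-open interval)"""
        return self.valid_from <= t and (self.valid_until is None or t < self.valid_until)

    def known_at(self, t: datetime) -> bool:
        """Did the system hold this fact at ingestion time t?"""
        return self.recorded_at <= t and (self.retracted_at is None or t < self.retracted_at)


def as_of(facts: List[BiTemporalFact], event_time: datetime,
          knowledge_time: datetime) -> List[BiTemporalFact]:
    """Facts true at event_time, according to what was known at knowledge_time."""
    return [f for f in facts if f.true_at(event_time) and f.known_at(knowledge_time)]
```

Keeping both axes lets an agent answer two different questions: "what was true in mid-2023?" and "what did we believe in January 2024 about mid-2023?". A fact ingested late does not rewrite what the system previously knew.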

Benchmark Performance Comparison

System	DMR Accuracy	LoCoMo	Latency
Zep (Temporal KG)	94.8%	—	2.58s
Letta (filesystem)	—	74.0%	—
Mem0	—	68.5%	—
MemGPT	93.4%	—	Variable
GraphRAG	~75-85%	—	Variable
Vector RAG baseline	~60-70%	—	Fast

Zep achieves up to 18.5% higher accuracy on LongMemEval than baseline implementations while reducing response latency by 90%. The LoCoMo column tells a different story. Letta's filesystem-based agents reached 74% using basic file operations and outperformed Mem0's specialized memory tools. That result suggests reliable retrieval matters more than tool complexity.

Memory Layers (Decision Points)

Layer	Persistence	Implementation	When to Use
Working	Context window only	Scratchpad in system prompt	Always — optimize with attention-favored positions
Short-term	Session-scoped	File-system, in-memory cache	Intermediate tool results, conversation state
Long-term	Cross-session	Key-value store → graph DB	User preferences, domain knowledge, entity registries
Entity	Cross-session	Entity registry + properties	Maintaining identity ("John Doe" = same person across conversations)
Temporal KG	Cross-session + history	Graph with validity intervals	Facts that change over time, time-travel queries, preventing context clash
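The Entity layer exists so that "John Doe" resolves to the same identity in every conversation. Below is a minimal sketch of an entity registry, assuming exact matching on normalized names plus explicitly registered aliases. `EntityRegistry` and `normalize` are hypothetical helpers. Production systems typically add fuzzy or embedding-based matching on top of this.

```python
import re
import uuid
from typing import Dict, Optional


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so 'John  Doe' and 'john doe' match."""
    return re.sub(r"\s+", " ", name.strip().lower())


class EntityRegistry:
    """Maps surface names to one canonical entity ID (hypothetical helper)."""

    def __init__(self) -> None:
        self._alias_to_id: Dict[str, str] = {}
        self._properties: Dict[str, Dict[str, str]] = {}

    def resolve(self, name: str) -> Optional[str]:
        """Return the canonical ID for a known name or alias, else None."""
        return self._alias_to_id.get(normalize(name))

    def register(self, name: str, **props: str) -> str:
        """Return the existing ID for a known name (merging props), or mint a new one."""
        entity_id = self.resolve(name)
        if entity_id is None:
            entity_id = str(uuid.uuid4())
            self._alias_to_id[normalize(name)] = entity_id
            self._properties[entity_id] = {}
        self._properties[entity_id].update(props)
        return entity_id

    def add_alias(self, entity_id: str, alias: str) -> None:
        """Point another surface form at an existing entity."""
        self._alias_to_id[normalize(alias)] = entity_id

    def properties(self, entity_id: str) -> Dict[str, str]:
        """Return a copy of the entity's accumulated properties."""
        return dict(self._properties.get(entity_id, {}))
```

Registering "John Doe" in one session and "john  doe" in a later one yields the same ID, and the properties collected across sessions accumulate on that single entity.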

Retrieval Strategies

Strategy	Use When	Limitation
Semantic (embedding similarity)	Direct factual queries	Degrades on multi-hop reasoning
Entity-based (graph traversal)	"Tell me everything about X"	Requires graph structure
Temporal (validity filter)	Facts change over time	Requires validity metadata
Hybrid (semantic + keyword + graph)	Best overall accuracy	Most infrastructure

On LongMemEval, Zep's hybrid approach cuts response latency by about 90% relative to a full-context baseline (2.58s vs 28.9s) because it retrieves only the relevant subgraphs.
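A common way to merge the semantic, keyword, and graph result lists in hybrid retrieval is reciprocal rank fusion (RRF). This is a swapped-in, well-known technique and not necessarily what Zep uses. The sketch assumes each retriever already returns an ordered list of memory IDs. The retrievers themselves and the conventional constant `k=60` are assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: Dict[str, List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked ID lists; items ranked high by many retrievers win."""
    scores: Dict[str, float] = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, memory_id in enumerate(ranked_ids, start=1):
            # Each retriever contributes 1 / (k + rank); k damps the top ranks.
            scores[memory_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), breaking ties by ID for determinism.
    return sorted(scores, key=lambda mid: (-scores[mid], mid))


rankings = {
    "semantic": ["m3", "m1", "m7"],  # embedding similarity
    "keyword": ["m1", "m9"],         # e.g. BM25
    "graph": ["m1", "m3"],           # entity-neighbourhood traversal
}
fused = reciprocal_rank_fusion(rankings)  # -> ["m1", "m3", "m9", "m7"]
```

RRF needs only ranks, not comparable scores. This suits the hybrid setting because cosine similarities, BM25 scores, and graph distances live on different scales.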

Memory Consolidation

Consolidate periodically to prevent unbounded growth. Invalidate superseded memories rather than deleting them, because temporal queries depend on the preserved history. Trigger consolidation when the memory count crosses a threshold, when retrieval quality degrades, or on a schedule. See the Implementation Reference for working consolidation code.
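As a rough illustration of "invalidate but don't discard", here is a minimal sketch assuming each memory is keyed by subject and attribute (for example `alice.theme`). When several active records share a key, the older ones get a `valid_until` set to the newer record's `valid_from`, and nothing is deleted. The keying scheme and the threshold value are hypothetical. This is not the code from the Implementation Reference.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class MemoryRecord:
    key: str                                # hypothetical "subject.attribute" key
    value: str
    valid_from: datetime
    valid_until: Optional[datetime] = None  # None means still current


CONSOLIDATE_AT = 1000  # hypothetical active-memory count that triggers a pass


def should_consolidate(records: List[MemoryRecord]) -> bool:
    """Count-threshold trigger; schedules or quality checks could also call consolidate()."""
    return sum(1 for r in records if r.valid_until is None) >= CONSOLIDATE_AT


def consolidate(records: List[MemoryRecord]) -> int:
    """Close out superseded records per key without deleting any. Returns count invalidated."""
    newest: Dict[str, MemoryRecord] = {}
    for r in records:
        if r.valid_until is None:
            cur = newest.get(r.key)
            if cur is None or r.valid_from > cur.valid_from:
                newest[r.key] = r
    invalidated = 0
    for r in records:
        winner = newest.get(r.key)
        if r.valid_until is None and winner is not None and r is not winner:
            r.valid_until = winner.valid_from  # history kept; marked no longer valid
            invalidated += 1
    return invalidated
```

Retrieval can then filter on `valid_until is None` for current facts, while temporal queries still see the full history.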

Practical Guidance

Choosing a Memory Architecture

Start simple, add complexity only when retrieval fails. Most agents don't need a temporal knowledge graph on day one.

Prototype: File-system memory. Store facts as structured JSON with timestamps. Good enough to validate agent behavior.
Scale: Move to Mem0 or vector store with metadata when you need semantic search and multi-tenant isolation.
Complex reasoning: Add Zep/Graphiti when you need relationship traversal, temporal validity, or cross-session synthesis.
Full control: Use Letta when you need agent self-management of memory with deep introspection.
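For the prototype tier, here is a minimal sketch of file-system memory that stores facts as structured JSON with timestamps. It assumes one JSON Lines file per user. The directory layout and the `remember`/`recall` names are hypothetical. Retrieval is plain substring matching, which is the "no semantic search" trade-off listed in the framework table.

```python
import json
import time
from pathlib import Path
from typing import Dict, List


def remember(root: Path, user_id: str, fact: str, **meta: str) -> Dict:
    """Append one timestamped fact as a JSON line to <root>/<user_id>.jsonl."""
    root.mkdir(parents=True, exist_ok=True)
    record = {"fact": fact, "ts": time.time(), **meta}
    with (root / f"{user_id}.jsonl").open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record


def recall(root: Path, user_id: str, term: str = "") -> List[Dict]:
    """Return facts containing `term` (case-insensitive), newest first."""
    path = root / f"{user_id}.jsonl"
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as fh:
        records = [json.loads(line) for line in fh if line.strip()]
    hits = [r for r in records if term.lower() in r["fact"].lower()]
    return sorted(hits, key=lambda r: r["ts"], reverse=True)
```

Append-only files keep every write cheap and auditable. Once substring matching stops finding the right facts, that is the signal to move to the "Scale" step.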

Integration with Context

Memories must integrate with context systems to be useful. Use just-in-time memory loading to retrieve relevant memories when needed. Use strategic injection to place memories in attention-favored positions (beginning/end of context).
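Below is a minimal sketch of just-in-time loading combined with strategic injection. It assumes a `retrieve(query, top_k)` callable backed by whatever memory store is in use. Memories are fetched only when a request arrives and are placed near the start of the context, with the live query at the end. The function name and the prompt layout are hypothetical.

```python
from typing import Callable, List


def build_context(
    system_prompt: str,
    query: str,
    retrieve: Callable[[str, int], List[str]],
    history: List[str],
    top_k: int = 5,
) -> str:
    """Assemble a prompt with memories near the start and the live query at the end."""
    # Just-in-time: fetch only memories relevant to this query, at request time.
    memories = retrieve(query, top_k)
    memory_block = "\n".join(f"- {m}" for m in memories) or "- (no relevant memories)"
    parts = [
        system_prompt,
        "Relevant memories:\n" + memory_block,  # beginning: attention-favored
        *history,                               # middle: least attention-favored
        "User: " + query,                       # end: attention-favored
    ]
    return "\n\n".join(parts)
```

Passing `retrieve` in as a parameter keeps the context builder independent of the backend. A file-system store, Mem0, or a graph query can all sit behind the same signature.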

Error Recovery

Empty retrieval: Fall back to broader search (remove entity filter, widen time range). If still empty, prompt user for clarification.
Stale results: Check `valid_until` timestamps. If most results are expired, trigger consolidation before retrying.
Conflicting facts: Prefer the fact with the most recent `valid_from`. Surface the conflict to the user if confidence is low.
Storage failure: Queue writes for retry. Never block the agent's response on a memory write.
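The recovery rules above can be sketched as follows. This is a minimal illustration assuming a `search(query, entity, window_days)` callable and a `write(fact)` callable for the storage backend. Both are hypothetical, as is the `Fact` shape. Retries here are immediate; a real system would add backoff and persist the queue.

```python
import queue
import threading
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List, Optional


@dataclass
class Fact:
    text: str
    valid_from: datetime
    valid_until: Optional[datetime] = None


# Hypothetical search signature: (query, entity filter, time window in days).
Search = Callable[[str, Optional[str], Optional[int]], List[Fact]]


def retrieve_with_fallback(search: Search, query: str,
                           entity: Optional[str], days: int) -> Optional[List[Fact]]:
    """Empty retrieval: drop the entity filter, then the time window."""
    for ent, window in ((entity, days), (None, days), (None, None)):
        hits = search(query, ent, window)
        if hits:
            return hits
    return None  # still empty: the caller should ask the user to clarify


def resolve_conflict(facts: List[Fact], now: datetime) -> Optional[Fact]:
    """Conflicting facts: ignore expired ones, prefer the most recent valid_from."""
    live = [f for f in facts if f.valid_until is None or f.valid_until > now]
    return max(live, key=lambda f: f.valid_from, default=None)


class WriteQueue:
    """Storage failure: writes go through a background thread, so responses never block."""

    def __init__(self, write: Callable[[Fact], None], max_retries: int = 3) -> None:
        self._write, self._max_retries = write, max_retries
        self._q: queue.Queue = queue.Queue()
        self.dropped: List[Fact] = []
        threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, fact: Fact) -> None:
        self._q.put((fact, 0))  # returns immediately

    def _worker(self) -> None:
        while True:
            fact, attempts = self._q.get()
            try:
                self._write(fact)
            except Exception:
                if attempts + 1 < self._max_retries:
                    self._q.put((fact, attempts + 1))  # re-queue for another try
                else:
                    self.dropped.append(fact)  # give up; surface for inspection
            finally:
                self._q.task_done()

    def join(self) -> None:
        """Block until every queued write has succeeded or been dropped (for tests/shutdown)."""
        self._q.join()
```

The agent calls `submit` and carries on with its reply. Only shutdown code or tests need to wait on `join`.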

Anti-Patterns

Stuffing everything into context: Long inputs are expensive and degrade performance. Use just-in-time retrieval.
Ignoring temporal validity: Facts go stale. Without validity tracking, outdated information poisons context.
Over-engineering early: A filesystem agent can outperform complex memory tooling. Add sophistication when simple approaches fail.
No consolidation strategy: Unbounded memory growth degrades retrieval quality over time.

Examples

Example 1: Mem0 Integration

```python
from mem0 import Memory

m = Memory()
m.add("User prefers dark mode and Python 3.12", user_id="alice")
m.add("User switched to light mode", user_id="alice")

# Retrieves the current preference (light mode), not the outdated one
results = m.search("What theme does the user prefer?", user_id="alice")
```

Example 2: Temporal Query

```python
from datetime import datetime

# `graph`, `user_node`, and `address_node` are placeholders for objects
# created elsewhere; they are not defined in this snippet.

# Track the relationship with a validity period
graph.create_temporal_relationship(
    source_id=user_node,
    rel_type="LIVES_AT",
    target_id=address_node,
    valid_from=datetime(2024, 1, 15),
    valid_until=datetime(2024, 9, 1),  # moved out
)

# Query: where did the user live on March 1, 2024?
results = graph.query_at_time(
    {"type": "LIVES_AT", "source_label": "User"},
    query_time=datetime(2024, 3, 1),
)
```
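The `graph` object in Example 2 is not defined in this section. Below is a minimal in-memory stand-in that exposes the same two method names, so the example's semantics can be run locally. Those semantics are a half-open validity interval `[valid_from, valid_until)` and a point-in-time filter. `InMemoryTemporalGraph`, `Edge`, and `add_node` are hypothetical. This is not Graphiti's API.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Edge:
    source_id: str
    rel_type: str
    target_id: str
    valid_from: datetime
    valid_until: Optional[datetime] = None  # None means still valid


@dataclass
class InMemoryTemporalGraph:
    """Hypothetical stand-in for the `graph` object used in Example 2."""
    labels: Dict[str, str] = field(default_factory=dict)  # node_id -> label
    edges: List[Edge] = field(default_factory=list)

    def add_node(self, node_id: str, label: str) -> str:
        self.labels[node_id] = label
        return node_id

    def create_temporal_relationship(self, source_id: str, rel_type: str, target_id: str,
                                     valid_from: datetime,
                                     valid_until: Optional[datetime] = None) -> Edge:
        edge = Edge(source_id, rel_type, target_id, valid_from, valid_until)
        self.edges.append(edge)
        return edge

    def query_at_time(self, pattern: Dict[str, str], query_time: datetime) -> List[Edge]:
        """Edges matching `pattern` whose [valid_from, valid_until) contains query_time."""
        return [
            e for e in self.edges
            if e.rel_type == pattern.get("type", e.rel_type)
            and self.labels.get(e.source_id)
            == pattern.get("source_label", self.labels.get(e.source_id))
            and e.valid_from <= query_time
            and (e.valid_until is None or query_time < e.valid_until)
        ]


# Usage mirroring Example 2
graph = InMemoryTemporalGraph()
user_node = graph.add_node("u1", "User")
address_node = graph.add_node("a1", "Address")
graph.create_temporal_relationship(
    source_id=user_node, rel_type="LIVES_AT", target_id=address_node,
    valid_from=datetime(2024, 1, 15), valid_until=datetime(2024, 9, 1),
)
results = graph.query_at_time(
    {"type": "LIVES_AT", "source_label": "User"}, query_time=datetime(2024, 3, 1)
)
# results holds the single u1 -> a1 edge
```

Because the interval is half-open, a query on the move-out date itself (2024-09-01) no longer returns the old address. A new `LIVES_AT` edge starting that day would take over without any overlap.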

Guidelines

Start with file-system memory; add complexity only when retrieval quality demands it
Track temporal validity for any fact that can change over time
Use hybrid retrieval (semantic + keyword + graph) for best accuracy
Consolidate memories periodically — invalidate but don't discard
Design for retrieval failure: always have a fallback when memory lookup returns nothing
Consider privacy implications of persistent memory (retention policies, deletion rights)
Benchmark your memory system against LoCoMo or LongMemEval before and after changes
Monitor memory growth and retrieval latency in production

Integration

This skill builds on context-fundamentals. It connects to:

multi-agent-patterns - Shared memory across agents
context-optimization - Memory-based context loading
evaluation - Evaluating memory quality

References

Internal reference:

Implementation Reference - Detailed implementation patterns

Related skills in this collection:

context-fundamentals - Context basics
multi-agent-patterns - Cross-agent memory

External resources:

Zep temporal knowledge graph paper (arXiv:2501.13956)
Mem0 production architecture paper (arXiv:2504.19413)
LoCoMo benchmark (Snap Research)
MemBench evaluation framework (ACL 2025)
Graphiti open-source temporal KG engine (github.com/getzep/graphiti)

Skill Metadata

Created: 2025-12-20 Last Updated: 2026-02-12 Author: Agent Skills for Context Engineering Contributors Version: 2.0.0
