helix-memory-system

Helix Memory System

Build a durable, per-tenant agent memory platform on Helix that combines graph relationships, vector similarity, and BM25 full-text in one database. This skill covers the whole memory lifecycle: raw context ingestion, extraction, memory generation, deduplication, updating/versioning, deletion/forgetting, categorisation, profile maintenance, and hybrid retrieval.

Helix is the storage and retrieval engine. A complete memory product also needs application workers for extraction, chunking, embeddings, relationship classification, reranking, connector sync, and profile summarisation.

When To Use

Use this skill when the task is to:

design the data model for agent memory, long-term memory, user profiles, document/chunk RAG, or a "remember what the user told me" feature
write queries that create, deduplicate, reinforce, consolidate, version, correct, expire, forget, categorise, or retrieve memories
decide which Helix capability (property index, graph edge, vector index, BM25 text index) a given memory operation should use
build hybrid recall that fuses semantic + keyword + graph + profile context
implement advanced memory components such as source documents, chunks, connectors, extracted facts, evolving profiles, relationship-aware recall, and forgetting

Do not use this skill for generic query syntax questions. For builder/method details, defer to `helix-query-typescript` (the default DSL), `helix-query-rust`, or `helix-query-json-dynamic`. This skill assumes those and focuses on the memory architecture on top of Helix.

First Steps

Inspect the target repo for existing labels, edges, properties, indexes, and route style. Reuse exact casing if present.
Default to the TypeScript DSL (`@helix-db/helix-db`) so the app can keep query generation near service code. Use `EXAMPLES.rust.md` only if the runtime is Rust or the team explicitly ships Rust queries.
Decide the tenancy boundary before modeling anything. The canonical tenant property is `tenant_id` because tenant-partitioned Helix text indexes currently require that name. Attach `tenant_id` to every tenant-owned node and edge.
Decide the memory visibility boundary separately from tenancy. In most apps, `tenant_id` partitions indexes while `userId`, `containerId`, `projectId`, or an app ACL decides which memories can be recalled. Default examples use `userId` as the second-level scope.
Reuse the canonical model below before inventing labels. Adapt names, not the shape.
Confirm how embeddings are produced. Default to OpenAI `text-embedding-3-small` for production and benchmarkable memory systems: `1536` dimensions, stored as `F32` arrays. The application computes embeddings client-side and passes numeric arrays (`param.array(param.f32())`). Helix does not embed text on the dynamic-query path; there is no `Embed()` / `SearchV` in the current DSL. Keep embedding model and dimension fixed for each vector index. Deterministic hash embeddings are acceptable only for local UI demos or smoke tests, not for quality benchmarks.
Identify the application workers outside Helix: extractor, chunker, embedder, memory writer, relationship classifier, decay/expiry sweeper, profile summariser, optional query rewriter, optional reranker, and connector sync jobs.
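
The embedding rule above (fixed model, fixed dimension, `F32` values) can be enforced app-side before any write reaches Helix. The following is a minimal sketch, not part of the Helix SDK: `toF32Embedding` and `EXPECTED_DIM` are hypothetical names, and the 1536 default simply mirrors the `text-embedding-3-small` default described above.

```typescript
// Hypothetical app-side guard; not a Helix API.
// EXPECTED_DIM mirrors the text-embedding-3-small default (assumption: the app
// has not standardised on a different model or dimension).
const EXPECTED_DIM = 1536;

function toF32Embedding(
  values: readonly number[],
  expectedDim: number = EXPECTED_DIM,
): number[] {
  if (values.length !== expectedDim) {
    throw new Error(
      `embedding has ${values.length} dimensions, expected ${expectedDim}`,
    );
  }
  const out: number[] = [];
  for (const v of values) {
    if (!Number.isFinite(v)) {
      throw new Error("embedding contains a non-finite value");
    }
    // Math.fround rounds to the nearest 32-bit float, matching F32 storage.
    out.push(Math.fround(v));
  }
  return out;
}
```

Calling this in the embedding worker, rather than in each query route, keeps a single place where a model or dimension change would surface as an error.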

The Memory Model At A Glance

Core labels: `Tenant`, `User`, `UserProfile`, `SourceDocument`, `Chunk`, `Memory`, `Category`, `Entity`, `Session`, optional `Connector` and `IngestionJob`.

Core edges: `OWNS` (Tenant/User→Memory), `HAS_PROFILE` (User→UserProfile), `HAS_CHUNK` (SourceDocument→Chunk), `EXTRACTED_FROM` (Memory→Chunk or SourceDocument), `IN_CATEGORY` (Memory→Category), `MENTIONS` (Memory→Entity), `UPDATES` (new Memory→old Memory), `EXTENDS` (Memory→Memory enrichment), `DERIVES` (inferred Memory→supporting Memory), `RELATES_TO` (Memory→Memory association), `DERIVED_FROM` (Memory→Session), optional `PARENT_OF` (Category→Category).

Fast and safe fields:

`tenant_id` on every tenant-owned node and edge, with equality indexes where used as an anchor
`userId` or an equivalent scope key on user/container-specific memories, source documents, and chunks; only intentionally shared records should be tenant-wide
stable IDs such as `memoryId`, `documentId`, `chunkId`, `categoryKey`, `entityKey`, `sessionId`, and `profileId`
`Memory.isLatest`, `validFrom`, `validTo`, `expiresAt`, and `deletedAt` for record lifecycle filtering
optional real-world temporal fields such as `observedAt`, `eventStartAt`, `eventEndAt`, `temporalText`, and `timezone` when the memory is about a dated event or fact
tenant-partitioned vector/text indexes on `Memory.embedding` / `Memory.content` and optionally `Chunk.embedding` / `Chunk.content`, all partitioned by `tenant_id`

Full spec, types, and index bootstrap are in `REFERENCE.md`.
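
To make the field list concrete, here is one way the app-side shape of a `Memory` record might look in TypeScript. This is a sketch under assumptions: the property names come from the list above, but the TypeScript types, the `newMemoryRecord` helper, and the 0.5 default salience are illustrative choices, and `REFERENCE.md` remains the authoritative spec.

```typescript
// Sketch only: field types are assumptions, not the spec from REFERENCE.md.
type MemoryKind = "fact" | "preference" | "episode" | "procedure";

interface MemoryRecord {
  tenant_id: string;
  userId: string;
  memoryId: string;
  content: string;
  embedding: number[];
  kind: MemoryKind;
  salience: number;
  // Record lifecycle fields.
  isLatest: boolean;
  validFrom: Date;
  validTo: Date | null;
  expiresAt: Date | null;
  deletedAt: Date | null;
  // Real-world temporal fields, kept separate from lifecycle fields.
  eventStartAt: Date | null;
  eventEndAt: Date | null;
  temporalText: string | null;
  timezone: string | null;
}

interface NewMemoryInput {
  tenant_id: string;
  userId: string;
  memoryId: string;
  content: string;
  embedding: number[];
  kind: MemoryKind;
  salience?: number;
}

// Hypothetical factory: fills lifecycle defaults for a freshly written memory.
function newMemoryRecord(input: NewMemoryInput, now: Date): MemoryRecord {
  return {
    tenant_id: input.tenant_id,
    userId: input.userId,
    memoryId: input.memoryId,
    content: input.content,
    embedding: input.embedding,
    kind: input.kind,
    salience: input.salience ?? 0.5, // assumed default, tune per app
    isLatest: true,
    validFrom: now,
    validTo: null,
    expiresAt: null,
    deletedAt: null,
    eventStartAt: null,
    eventEndAt: null,
    temporalText: null,
    timezone: null,
  };
}
```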

Modality Decision Rules

Pick the mechanism by the question you are answering, and combine them deliberately:

| Need | Use | Why |
| --- | --- | --- |
| Tenant isolation (`tenant_id`), exact identity, lifecycle flags (`deletedAt`, `expiresAt`, `validTo`, `isLatest`), ordering/filtering (`createdAt`, `salience`) | Properties + equality/range index | Narrow anchors and safe filters. Tenant scope is non-negotiable. |
| Categorisation, entities, provenance, profile ownership, updates/extensions/derivations, association clusters, taxonomy | Graph edges | These are relationships; traverse and aggregate over them. |
| Deduplication, paraphrase recall, memories like this, chunks like this | Vector search | Semantic similarity; tolerant of rewording. |
| Exact names, ids, rare tokens, commands, file paths, product terms | BM25 text search | Embeddings blur exact tokens; BM25 preserves them. |
| Broad user context the model should always know | UserProfile node + summariser worker | Avoid multiple searches for stable identity/preferences/recent focus. |
| Raw documents and citations | SourceDocument + Chunk nodes | Memory facts are not a replacement for source-grounded RAG. |

Rule of thumb: never collapse a memory system to vector-only. Vectors miss exact names and have no notion of ownership, recency, contradiction, provenance, profile state, or category.

Always scope vector/BM25 searches with `tenantValue = tenant_id`. Tenant scope is necessary but not always sufficient: default user-memory recall must also filter by `userId` or the app's equivalent container/ACL unless the record is explicitly shared tenant-wide. Every recall path must filter out forgotten/stale records: `deletedAt IsNull`, `isLatest = true`, `validTo IsNull`, and `expiresAt` absent or in the future. If a route cannot express one of those filters inside Helix, over-fetch and apply the remaining policy in application code before returning context.
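
When a route over-fetches, the remaining policy can be applied with a small app-side predicate. The sketch below assumes hypothetical names (`RecallFields`, `isRecallable`) and one particular convention for sharing: a `null` `userId` marks a record that is intentionally tenant-wide. An app using a container, ACL, or explicit shared flag would substitute its own check.

```typescript
// Hypothetical app-side filter applied after an over-fetching Helix query.
interface RecallFields {
  tenant_id: string;
  // Assumption: null means "intentionally shared tenant-wide".
  userId: string | null;
  isLatest: boolean;
  validTo: Date | null;
  expiresAt: Date | null;
  deletedAt: Date | null;
}

function isRecallable(
  m: RecallFields,
  tenantId: string,
  userId: string,
  now: Date,
): boolean {
  if (m.tenant_id !== tenantId) return false; // tenant isolation
  if (m.userId !== null && m.userId !== userId) return false; // user scope
  if (m.deletedAt !== null) return false; // forgotten
  if (!m.isLatest) return false; // superseded
  if (m.validTo !== null) return false; // invalidated version
  if (m.expiresAt !== null && m.expiresAt.getTime() <= now.getTime()) {
    return false; // expired
  }
  return true;
}
```

Keeping every condition in one predicate makes it harder for a new recall route to check `deletedAt` and forget the rest.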

Product Layers

Helix gives you graph + search primitives. A full intelligent memory system also needs:

| Layer | Responsibility |
| --- | --- |
| Ingestion API | Accept text, chats, files, URLs, connector events, and direct memory writes. |
| Extractors | Convert PDFs, docs, HTML, images/OCR, audio/video transcripts, code, and structured data into text. |
| Chunkers | Split raw context by semantic sections, message turns, document headings, code AST boundaries, or transcript segments. |
| Embedding worker | Generate `text-embedding-3-small` 1536-dim `F32` embeddings for memories and chunks before writing to Helix, unless the app has explicitly standardised on another model. |
| Memory generator | Extract atomic, entity-centric candidate facts from conversations/documents using the current turn plus recent context, active entities, recalled memories, and current date. |
| Relationship classifier | Decide whether each candidate `UPDATES`, `EXTENDS`, `DERIVES`, duplicates, or stands alone. |
| Profile summariser | Maintain `UserProfile.staticSummary` and `dynamicSummary` from latest memories. |
| Forgetting jobs | Run expiry, decay, stale-profile, and connector deletion sweeps. |
| Retrieval service | Rewrite queries, run vector + BM25 over memories/chunks, fuse, rerank, graph-expand, and pack context with citations. |
| Evaluation | Measure recall quality, stale-memory suppression, tenant isolation, latency, and token efficiency. |

Do not imply Helix automatically does extraction, chunking, embedding, relationship classification, profile generation, connector sync, reranking, or TTL. Those are application responsibilities unless the user has a managed service that provides them.

The Memory Lifecycle

Each step links to complete examples in `EXAMPLES.md` (TypeScript) and `EXAMPLES.rust.md` (Rust).

1. Ingestion & Generation

Accept raw context as a `SourceDocument`, conversation/session, direct memory write, or connector update.
Extract and chunk app-side when the input is not already an atomic memory.
Embed each candidate memory/chunk app-side with OpenAI `text-embedding-3-small` by default. Store/pass a 1536-length `F32` vector.
Extract atomic, self-contained candidate memories. Prefer entity-centric facts: "Alex prefers morning meetings" rather than "prefers morning meetings".
Classify candidate kind: `fact`, `preference`, `episode`, `procedure`, or app-specific equivalents.
Deduplicate before writing. A similarity threshold cannot be a batch condition, so use read-then-write for semantic dedup and idempotent upsert for exact repeats.
Write `Memory` with `tenant_id`, `memoryId`, `content`, `embedding`, `kind`, `salience`, `isLatest: true`, and lifecycle timestamps; link ownership and provenance edges.
Categorise and entity-link immediately.
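
The read-then-write dedup step can be sketched as an app-side triage that runs between the candidate read and the write. Everything here is illustrative: `triage`, `normalise`, `cosine`, and the `0.85` review threshold are hypothetical, and the neighbours are assumed to come from earlier vector and BM25 reads. Note that the similarity check only flags a candidate for adjudication; it does not decide update vs extension on its own.

```typescript
// Hypothetical triage step run app-side between the read and the write.
type Triage = "exact-duplicate" | "needs-adjudication" | "new";

interface Candidate {
  content: string;
  embedding: number[];
}

function normalise(text: string): string {
  return text.trim().toLowerCase().replace(/\s+/g, " ");
}

function cosine(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function triage(
  candidate: Candidate,
  neighbours: readonly Candidate[],
  reviewAbove: number = 0.85, // assumed threshold, tune against real data
): Triage {
  const key = normalise(candidate.content);
  // Exact repeats can go straight to an idempotent upsert.
  if (neighbours.some((n) => normalise(n.content) === key)) {
    return "exact-duplicate";
  }
  // Near neighbours are handed to app/LLM adjudication, not auto-merged.
  const near = neighbours.some(
    (n) => cosine(candidate.embedding, n.embedding) >= reviewAbove,
  );
  return near ? "needs-adjudication" : "new";
}
```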

Contextual Extraction Rules

Do not extract from the latest user message in isolation. The extraction worker should receive:

the current user message
the previous assistant message, because it often defines what a short answer means
a bounded recent conversation window
recalled active memories and active entities
the current date/time for relative time phrases
the memory scope (`tenant_id` plus `userId`, `containerId`, `projectId`, or ACL context)

Resolve pronouns, ellipsis, and short follow-up answers before deciding whether to store a memory. If the assistant asks a memory-bearing follow-up question and the user answers briefly, convert the answer into a self-contained memory.

Extractor output should be structured enough for deterministic writes: `shouldStore`, self-contained `content`, `kind`, `confidence`, `salience`, `entities`, `source` pointers, `scope`, optional temporal fields, and a relationship decision (`new`, `duplicate`, `EXTENDS`, `UPDATES`, or `DERIVES`). Do not let a single vector-distance threshold decide updates; retrieve candidates with vector + BM25 and adjudicate exact duplicate vs update vs extension in application code.

Example:

Existing memory: User is planning a trip to Japan with Maya.
Assistant: When are you going?
User: next April
Extract: User is planning a trip to Japan with Maya next April.
Relationship: EXTENDS the existing Japan trip memory; MENTIONS Maya and Japan.

Assistant: What do you want to do there?
User: mostly food, temples, and trains
Extract: User wants their Japan trip with Maya to focus on food, temples, and trains.
Relationship: EXTENDS the existing Japan trip memory; categorise as travel/preferences.

User later: actually we're going in May instead
Extract: User is planning a trip to Japan with Maya in May.
Relationship: UPDATES the previous next-April timing memory and invalidates the older version.
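
A structured extractor output like the one described above can be typed and validated before any write is attempted. The sketch below is one possible shape, not a defined Helix or skill contract: the `ExtractorOutput` interface, the `targetMemoryId` field, the 0..1 ranges, and `validateExtraction` are all assumptions made for illustration.

```typescript
// Hypothetical shape for extractor output; field ranges are assumptions.
type Relationship = "new" | "duplicate" | "EXTENDS" | "UPDATES" | "DERIVES";

interface ExtractorOutput {
  shouldStore: boolean;
  content: string; // must be self-contained
  kind: string;
  confidence: number; // assumed range 0..1
  salience: number; // assumed range 0..1
  entities: string[];
  source: { sessionId: string | null; chunkId: string | null };
  scope: { tenant_id: string; userId: string };
  relationship: Relationship;
  targetMemoryId: string | null; // hypothetical pointer to the related memory
}

// Returns a list of problems; an empty list means the output is writable.
function validateExtraction(o: ExtractorOutput): string[] {
  const errors: string[] = [];
  if (!o.shouldStore) return errors; // nothing will be written
  if (o.content.trim().length === 0) errors.push("content is empty");
  if (o.confidence < 0 || o.confidence > 1) errors.push("confidence out of range");
  if (o.salience < 0 || o.salience > 1) errors.push("salience out of range");
  if (o.scope.tenant_id.length === 0) errors.push("scope.tenant_id is required");
  if (o.scope.userId.length === 0) errors.push("scope.userId is required");
  if (o.relationship !== "new" && o.targetMemoryId === null) {
    errors.push(`${o.relationship} requires targetMemoryId`);
  }
  return errors;
}

// The "going in May instead" correction, expressed in this shape.
const mayCorrection: ExtractorOutput = {
  shouldStore: true,
  content: "User is planning a trip to Japan with Maya in May.",
  kind: "fact",
  confidence: 0.9,
  salience: 0.7,
  entities: ["Maya", "Japan"],
  source: { sessionId: "session-1", chunkId: null },
  scope: { tenant_id: "acme", userId: "u1" },
  relationship: "UPDATES",
  targetMemoryId: "mem-japan-april", // hypothetical ID of the older memory
};
```

Because the validator is a pure function, extraction decisions can be unit-tested without a database or a model call.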

2. Updating & Versioning

Reinforce on access: bump `accessCount`, `lastAccessedAt`, and bounded `salience`.
Update/correct: create a new memory, link `new -UPDATES-> old`, set old `isLatest = false` and `validTo`, and optionally set `deletedAt` if it should disappear from normal recall.
Extend: link `new -EXTENDS-> existing` when the new fact enriches but does not replace the old fact.
Derive: link inferred facts with `DERIVES` edges to supporting memories and mark them as inferred with confidence metadata.
If `content` changes, re-embed and update `embedding` in the same write. Content and vector must never drift.
Keep lifecycle validity (`validFrom`, `validTo`, `deletedAt`) separate from real-world event time (`eventStartAt`, `eventEndAt`, `temporalText`). Updating a memory because a fact changed should invalidate the old record even if both facts refer to future or past dates.
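
The reinforce and update/correct rules can be planned app-side as pure functions whose results are then turned into Helix writes. This is a sketch under assumptions: `reinforce`, `supersede`, the `0.05` step, and the salience cap of `1` are hypothetical, and the caller is still responsible for re-embedding the new content and issuing the actual write.

```typescript
// Hypothetical planning helpers; they compute values, they do not call Helix.
interface VersionedMemory {
  memoryId: string;
  content: string;
  salience: number;
  accessCount: number;
  lastAccessedAt: Date | null;
  isLatest: boolean;
  validFrom: Date;
  validTo: Date | null;
}

interface SupersedePlan {
  retired: VersionedMemory; // old record, invalidated
  current: VersionedMemory; // new record, still needs a fresh embedding
  edge: { from: string; to: string; label: "UPDATES" };
}

// Reinforce on access with a bounded salience bump (step and cap are assumed).
function reinforce(
  m: VersionedMemory,
  now: Date,
  step: number = 0.05,
  cap: number = 1,
): VersionedMemory {
  return {
    ...m,
    accessCount: m.accessCount + 1,
    lastAccessedAt: now,
    salience: Math.min(cap, m.salience + step),
  };
}

// Update/correct: the old record is retired, the new one becomes latest.
function supersede(
  old: VersionedMemory,
  newMemoryId: string,
  newContent: string,
  now: Date,
): SupersedePlan {
  return {
    retired: { ...old, isLatest: false, validTo: now },
    current: {
      memoryId: newMemoryId,
      content: newContent,
      salience: old.salience,
      accessCount: 0,
      lastAccessedAt: null,
      isLatest: true,
      validFrom: now,
      validTo: null,
    },
    edge: { from: newMemoryId, to: old.memoryId, label: "UPDATES" },
  };
}
```

Note that `validTo` here records when the old version stopped being current, not when the trip happens; event dates would live in the separate temporal fields.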

3. Deletion / Forgetting

Helix has no native TTL or decay. Forgetting happens through explicit write queries that the app runs.

Soft-delete (preferred): set `deletedAt = Expr.datetime()` and filter it from reads. Reversible and audit-friendly.
Version invalidation: set `isLatest = false` and `validTo = Expr.datetime()` when a memory is superseded.
Expiry sweep: hide or hard-delete memories where `expiresAt < now`.
Decay sweep: hide weak, stale, rarely accessed episodic memories.
Hard delete: use `drop()` only when policy requires physical deletion. `drop()` removes the node and incident edges; use `dropEdgeById` for surgical edge cleanup on multigraph-sensitive paths.
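
A decay sweep needs a rule for what counts as "weak, stale, rarely accessed". The sketch below shows one possible selection policy run app-side over a page of fetched memories; the `DecayPolicy` fields, their meaning, and the `selectForDecay` name are assumptions, and the returned IDs would then be passed to whatever soft-delete write the app uses.

```typescript
// Hypothetical decay policy; thresholds are illustrative, not recommendations.
interface DecayFields {
  memoryId: string;
  kind: string;
  salience: number;
  accessCount: number;
  lastAccessedAt: Date | null;
  createdAt: Date;
  deletedAt: Date | null;
}

interface DecayPolicy {
  maxSalience: number; // only memories below this salience decay
  maxAccessCount: number; // only memories accessed at most this often decay
  idleDays: number; // minimum days since last access (or creation)
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns the IDs of episodic memories that the sweep should soft-delete.
function selectForDecay(
  memories: readonly DecayFields[],
  now: Date,
  policy: DecayPolicy,
): string[] {
  return memories
    .filter((m) => {
      if (m.deletedAt !== null) return false; // already forgotten
      if (m.kind !== "episode") return false; // facts/preferences are kept
      if (m.salience >= policy.maxSalience) return false;
      if (m.accessCount > policy.maxAccessCount) return false;
      const lastTouched = m.lastAccessedAt ?? m.createdAt;
      const idle = (now.getTime() - lastTouched.getTime()) / MS_PER_DAY;
      return idle >= policy.idleDays;
    })
    .map((m) => m.memoryId);
}
```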

4. Categorisation & Entity Linking

Store display categories as `Category` nodes scoped by `tenant_id` and a unique `categoryKey` such as `${tenant_id}:${normalisedName}`.
Store entities as `Entity` nodes scoped by `tenant_id` and a unique `entityKey` such as `${tenant_id}:${normalisedName}`.
Prefer edges over arrays when you will traverse, aggregate, or recall by the tag/entity.
Use nested object metadata for display/audit fields that do not need graph expansion. Keep frequently filtered fields top-level.
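
The `${tenant_id}:${normalisedName}` key pattern depends on a normalisation rule that the text leaves to the app. The following sketch picks one plausible rule (Unicode NFKC, trim, lowercase, whitespace collapsed to hyphens); `normaliseName` and `tenantKey` are hypothetical helpers, and an app with existing keys should match its current rule instead.

```typescript
// Hypothetical normalisation; the exact rule is an app-level decision.
function normaliseName(displayName: string): string {
  return displayName
    .normalize("NFKC")
    .trim()
    .toLowerCase()
    .replace(/\s+/g, "-");
}

// Builds a tenant-qualified key usable as categoryKey or entityKey.
function tenantKey(tenantId: string, displayName: string): string {
  const name = normaliseName(displayName);
  if (tenantId.length === 0 || name.length === 0) {
    throw new Error("tenantId and displayName must be non-empty");
  }
  return `${tenantId}:${name}`;
}
```

With this rule, `tenantKey("acme", "  Travel  Plans ")` and `tenantKey("acme", "travel plans")` resolve to the same key, so repeated upserts stay idempotent, while the same display name under another tenant yields a different key.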

5. Profile Maintenance

Maintain one `UserProfile` per user/container with `profileId`, `tenant_id`, `userId`, `staticSummary`, `dynamicSummary`, and `updatedAt`.

Static profile: identity, stable preferences, long-lived background.
Dynamic profile: current projects, recent context, temporary goals, unresolved tasks.
Update profiles asynchronously after memory writes and deletions; keep profile generation deterministic enough to test.

Retrieval

Run multiple recall paths and fuse app-side:

Fetch the `UserProfile` for always-on context.
Run vector and BM25 over current `Memory` nodes, tenant-scoped, user/container-scoped, and freshness-filtered.
Optionally run vector and BM25 over `Chunk` nodes for source-grounded RAG and citations, with the same owner/scope policy unless documents are intentionally shared.
Fuse app-side with RRF, then re-rank by salience, recency, relationship type, and optional cross-encoder score.
Expand top memories through `MENTIONS`, `IN_CATEGORY`, `EXTENDS`, `UPDATES`, and `RELATES_TO`, bounded by depth and tenant filters.
Pack context without embeddings and include source/citation metadata when available.
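
The app-side fusion step can be illustrated with a small reciprocal rank fusion (RRF) function. This is a generic sketch rather than the formula from `REFERENCE.md`: it assumes each recall path returns an ordered list of memory IDs, and it uses `k = 60`, a commonly used default that an app may want to tune.

```typescript
// Generic RRF sketch: score(id) = sum over lists of 1 / (k + rank), rank from 1.
function reciprocalRankFusion(
  rankings: readonly (readonly string[])[],
  k: number = 60,
): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Example: IDs as they might come back from a vector path and a BM25 path.
const vectorHits = ["m1", "m2", "m3"];
const bm25Hits = ["m3", "m1"];
const fused = reciprocalRankFusion([vectorHits, bm25Hits]);
// fused order: m1, m3, m2 (m1 and m3 appear in both lists; m1 ranks higher overall)
```

RRF only needs ranks, so vector distances and BM25 scores never have to be put on a common scale; salience, recency, and relationship-type adjustments can then be applied as a separate re-ranking pass.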

Anti-Patterns

Do not:

use the deprecated `.hx` dialect (`Embed()`, `SearchV`, `SearchBM25`, `AddV`) for new dynamic/TS/Rust DSL work
use `userId` as the text-index tenant property; use `tenant_id` for tenant-partitioned text/vector indexes
assume `tenant_id` alone is a safe recall boundary for org/team tenants; filter by `userId`, `containerId`, project ACLs, or an explicit shared-memory flag
attach `tenant_id` only to `Memory`; every tenant-owned node and edge needs it
mutate, delete, categorise, or reinforce by `memoryId` without also checking `tenant_id`
return superseded/forgotten/expired memories because recall only checked `deletedAt`
mix lifecycle timestamps (`validTo`, `deletedAt`) with real-world event dates; use separate temporal fields for memories about trips, deadlines, appointments, or historical facts
build a vector-only store and call it memory
use a toy hash embedding for production recall or benchmark claims; default to `text-embedding-3-small` unless the app has a better standard model
decide dedup/update/extension by vector threshold alone; use exact checks, BM25 candidates, vector candidates, and app/LLM adjudication
extract memories from only the latest user message and miss contextual follow-ups such as "next April" or "mostly food, temples, and trains"
drop short follow-up answers because they are not self-contained before context resolution
write user-specific chunks/documents without an owner or scope field, then recall them tenant-wide
expect Helix to extract files, chunk documents, generate embeddings, classify updates, build profiles, rerank, sync connectors, or run TTL jobs automatically
read `$distance` after an `out` / `in` / `both` step; project it immediately after search
try to express a similarity-threshold dedup as a `BatchCondition`; it can only test variable emptiness/size
update `content` without re-embedding
return `embedding` arrays in API responses unless explicitly required
make `Category` or `Entity` global by display name in a multi-tenant memory app
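
One of the easier anti-patterns to guard against in code is leaking `embedding` arrays into API responses. A minimal sketch, assuming a hypothetical `StoredMemory` shape and `toApiMemory` mapper at the service boundary:

```typescript
// Hypothetical response mapper; shapes are illustrative, not a Helix contract.
interface StoredMemory {
  memoryId: string;
  content: string;
  kind: string;
  embedding: number[];
  sourceDocumentId: string | null;
}

// The API type cannot carry an embedding, so the compiler enforces the rule.
type ApiMemory = Omit<StoredMemory, "embedding">;

function toApiMemory(m: StoredMemory): ApiMemory {
  return {
    memoryId: m.memoryId,
    content: m.content,
    kind: m.kind,
    sourceDocumentId: m.sourceDocumentId,
  };
}
```

Building the response field by field, rather than spreading the stored record, means a newly added vector or internal field stays private until someone deliberately exposes it.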

Validation Checklist

Before finishing:

`readBatch()` vs `writeBatch()` is correct
every tenant-owned node and edge has `tenant_id`
vector/text indexes use `tenant_property = "tenant_id"`, and searches pass `tenantValue = tenant_id`
every memory read filters `tenant_id`, user/container visibility, `deletedAt IsNull`, current/latest state, and expiry validity
every write route accepts and filters by `tenant_id`
IDs used for upsert are either globally unique or tenant-qualified (`categoryKey`, `entityKey`, etc.)
user/container-specific documents and chunks carry the same owner/scope fields used by recall, or are explicitly marked/shared through app policy
lifecycle validity fields are not overloaded as event-time fields; dated facts use `observedAt`, `eventStartAt`, `eventEndAt`, `temporalText`, or app equivalents
embedding model is `openai:text-embedding-3-small` and every vector is 1536-dim `F32`, unless the app explicitly standardises on another fixed model/dimension
content edits re-embed in the same write
generation deduplicates semantically and exact repeats are idempotent
extraction sees the previous assistant turn, recent conversation window, recalled active memories/entities, and current date before deciding what to store
extraction emits a structured relationship/scope/source/temporal decision that can be tested deterministically
source documents/chunks exist if the feature promises citations or RAG over raw context
user profile update jobs exist if the feature promises always-on personalization
evaluation covers tenant isolation, user/container isolation, stale-memory suppression, contextual follow-up extraction, exact-token recall, temporal corrections, deletion, profile rebuilds, latency, and token budget
timestamps use one consistent convention; this skill uses typed DateTime via `Expr.datetime()` and `param.dateTime()`
no projected output includes `embedding` unless explicitly required
labels/edges/properties match existing repo casing

Reference Files

`REFERENCE.md`: full data-model spec, tenant rules, indexes, modality cheat-sheet, embedding guidance, fusion/re-ranking formula, and TypeScript ↔ Rust API mapping.
`EXAMPLES.md`: lifecycle scenarios as `@helix-db/helix-db` TypeScript snippets. Default.
`EXAMPLES.rust.md`: the same scenarios in the Rust DSL.

Adjacent skills: `helix-query-typescript`, `helix-query-rust`, `helix-query-json-dynamic`, `helix-query-optimize`.
