Helix Memory System
Build a durable, per-tenant agent memory platform on Helix that combines graph relationships, vector similarity, and BM25 full-text in one database. This skill covers the whole memory lifecycle: raw context ingestion, extraction, memory generation, deduplication, updating/versioning, deletion/forgetting, categorisation, profile maintenance, and hybrid retrieval.
Helix is the storage and retrieval engine. A complete memory product also needs application workers for extraction, chunking, embeddings, relationship classification, reranking, connector sync, and profile summarisation.
When To Use
Use this skill when the task is to:
- design the data model for agent memory, long-term memory, user profiles, document/chunk RAG, or a "remember what the user told me" feature
- write queries that create, deduplicate, reinforce, consolidate, version, correct, expire, forget, categorise, or retrieve memories
- decide which Helix capability (property index, graph edge, vector index, BM25 text index) a given memory operation should use
- build hybrid recall that fuses semantic + keyword + graph + profile context
- implement advanced memory components such as source documents, chunks, connectors, extracted facts, evolving profiles, relationship-aware recall, and forgetting
Do not use this skill for generic query syntax questions. For builder and method details, defer to the dedicated query-syntax skills (the TypeScript DSL is the default). This skill assumes those and focuses on the memory architecture on top of Helix.
First Steps
- Inspect the target repo for existing labels, edges, properties, indexes, and route style. Reuse exact casing if present.
- Default to the TypeScript DSL so the app can keep query generation near service code. Use the Rust DSL only if the runtime is Rust or the team explicitly ships Rust queries.
- Decide the tenancy boundary before modeling anything. The canonical tenant property is `tenant_id` because tenant-partitioned Helix text indexes currently require that name. Attach `tenant_id` to every tenant-owned node and edge.
- Decide the memory visibility boundary separately from tenancy. In most apps, `tenant_id` partitions indexes while a user, container, or project scope key, or an app ACL, decides which memories can be recalled. Default examples use a per-user key as the second-level scope.
- Reuse the canonical model below before inventing labels. Adapt names, not the shape.
- Confirm how embeddings are produced. Default to OpenAI `text-embedding-3-small` for production and benchmarkable memory systems: 1536 dimensions, stored as numeric arrays. The application computes embeddings client-side and passes them to Helix as numeric arrays. Helix does not embed text on the dynamic-query path; the current DSL has no text-embedding helper. Keep the embedding model and dimension fixed for each vector index. Deterministic hash embeddings are acceptable only for local UI demos or smoke tests, not for quality benchmarks.
- Identify the application workers outside Helix: extractor, chunker, embedder, memory writer, relationship classifier, decay/expiry sweeper, profile summariser, optional query rewriter, optional reranker, and connector sync jobs.
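The embedding rule above can be enforced with a small guard in the embedding worker before any write. This is a minimal sketch, assuming the 1536-dimension default this skill describes; `assertEmbedding` and `EMBEDDING_DIM` are hypothetical application-side names, not part of Helix.

```typescript
// Assumption: one fixed model per vector index, 1536 dimensions by default.
const EMBEDDING_DIM = 1536;

/** Throws if the vector cannot safely be written to an index of `dim` dimensions. */
function assertEmbedding(vector: readonly number[], dim: number = EMBEDDING_DIM): void {
  if (vector.length !== dim) {
    throw new Error(`expected ${dim} dimensions, got ${vector.length}`);
  }
  // Reject NaN/Infinity so a bad upstream response never reaches the index.
  if (!vector.every((x) => Number.isFinite(x))) {
    throw new Error("embedding contains a non-finite value");
  }
}
```

Calling the guard in both the write path and the search path keeps a silently changed model or dimension from corrupting an index.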
The Memory Model At A Glance
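Core labels: `Tenant`, `User`, `Memory`, `UserProfile`, `SourceDocument`, `Chunk`, `Category`, `Entity`, and `Session`, plus two optional labels covered in the full spec.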
Core edges, by endpoints:
- Tenant/User→Memory (ownership)
- User→UserProfile
- SourceDocument→Chunk
- Memory→Chunk or SourceDocument (provenance)
- Memory→Category
- Memory→Entity (`MENTIONS`)
- new Memory→old Memory (`UPDATES`)
- Memory→Memory enrichment (`EXTENDS`)
- inferred Memory→supporting Memory (derivation)
- Memory→Memory association
- Memory→Session
- optional: Category→Category (taxonomy)
Fast and safe fields:
- `tenant_id` on every tenant-owned node and edge, with equality indexes where used as an anchor
- a user-level or equivalent scope key on user/container-specific memories, source documents, and chunks; only intentionally shared records should be tenant-wide
- stable IDs on every node type that is upserted or looked up by identity
- lifecycle fields such as `deletedAt` and `validTo`, plus latest-version and expiry markers, for record lifecycle filtering
- optional real-world temporal fields when the memory is about a dated event or fact
- tenant-partitioned vector/text indexes on memories and optionally on chunks, all partitioned by `tenant_id`
Full spec, types, and index bootstrap are in the data-model reference listed under Reference Files.
Modality Decision Rules
Pick the mechanism by the question you are answering, and combine them deliberately:
| Need | Use | Why |
|---|---|---|
| Tenant isolation (`tenant_id`), exact identity, lifecycle flags, ordering/filtering | Properties + equality/range index | Narrow anchors and safe filters. Tenant scope is non-negotiable. |
| Categorisation, entities, provenance, profile ownership, updates/extensions/derivations, association clusters, taxonomy | Graph edges | These are relationships; traverse and aggregate over them. |
| Deduplication, paraphrase recall, "memories like this", "chunks like this" | Vector search | Semantic similarity; tolerant of rewording. |
| Exact names, ids, rare tokens, commands, file paths, product terms | BM25 text search | Embeddings blur exact tokens; BM25 preserves them. |
| Broad user context the model should always know | UserProfile node + summariser worker | Avoid multiple searches for stable identity/preferences/recent focus. |
| Raw documents and citations | SourceDocument + Chunk nodes | Memory facts are not a replacement for source-grounded RAG. |
Rule of thumb: never collapse a memory system to vector-only. Vectors miss exact names and have no notion of ownership, recency, contradiction, provenance, profile state, or category.
Always scope vector/BM25 searches with the tenant value. Tenant scope is necessary but not always sufficient: default user-memory recall must also filter by the user scope or the app's equivalent container/ACL unless the record is explicitly shared tenant-wide. Every recall path must filter out forgotten/stale records: the memory must not be soft-deleted (`deletedAt` unset), must not be superseded, must still be valid, and must have an expiry that is absent or in the future. If a route cannot express one of those filters inside Helix, over-fetch and apply the remaining policy in application code before returning context.
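When a filter cannot be expressed inside Helix, the remaining policy can run app-side over the over-fetched rows. The following is a sketch under stated assumptions: `deletedAt` and `validTo` appear in this skill, while `scopeId`, `isLatest`, and `expiresAt` are hypothetical field names chosen for illustration.

```typescript
// Hypothetical record shape for the app-side freshness and scope check.
interface MemoryRecord {
  tenant_id: string;
  scopeId: string | null; // null = intentionally shared tenant-wide (assumption)
  isLatest: boolean;
  deletedAt: Date | null;
  validTo: Date | null;
  expiresAt: Date | null;
}

/** True only if the record is in scope, current, and not forgotten or expired. */
function isRecallable(m: MemoryRecord, tenantId: string, scopeId: string, now: Date): boolean {
  if (m.tenant_id !== tenantId) return false; // tenant isolation
  if (m.scopeId !== null && m.scopeId !== scopeId) return false; // user/container scope
  if (m.deletedAt !== null) return false; // soft-deleted
  if (!m.isLatest) return false; // superseded by a newer version
  if (m.validTo !== null && m.validTo.getTime() <= now.getTime()) return false;
  if (m.expiresAt !== null && m.expiresAt.getTime() <= now.getTime()) return false;
  return true;
}
```

Running the same predicate on every recall path (vector, BM25, graph expansion) keeps the policy in one testable place.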
Product Layers
Helix gives you graph + search primitives. A full intelligent memory system also needs:
| Layer | Responsibility |
|---|---|
| Ingestion API | Accept text, chats, files, URLs, connector events, and direct memory writes. |
| Extractors | Convert PDFs, docs, HTML, images/OCR, audio/video transcripts, code, and structured data into text. |
| Chunkers | Split raw context by semantic sections, message turns, document headings, code AST boundaries, or transcript segments. |
| Embedding worker | Generate 1536-dim embeddings for memories and chunks before writing to Helix, unless the app has explicitly standardised on another model. |
| Memory generator | Extract atomic, entity-centric candidate facts from conversations/documents using the current turn plus recent context, active entities, recalled memories, and current date. |
| Relationship classifier | Decide whether each candidate updates, extends, or is derived from an existing memory, duplicates one, or stands alone. |
| Profile summariser | Maintain `UserProfile.staticSummary` and its dynamic counterpart from the latest memories. |
| Forgetting jobs | Run expiry, decay, stale-profile, and connector deletion sweeps. |
| Retrieval service | Rewrite queries, run vector + BM25 over memories/chunks, fuse, rerank, graph-expand, and pack context with citations. |
| Evaluation | Measure recall quality, stale-memory suppression, tenant isolation, latency, and token efficiency. |
Do not imply Helix automatically does extraction, chunking, embedding, relationship classification, profile generation, connector sync, reranking, or TTL. Those are application responsibilities unless the user has a managed service that provides them.
The Memory Lifecycle
Each step has complete examples in the TypeScript and Rust reference files.
1. Ingestion & Generation
- Accept raw context as a `SourceDocument`, conversation/session, direct memory write, or connector update.
- Extract and chunk app-side when the input is not already an atomic memory.
- Embed each candidate memory/chunk app-side with OpenAI by default. Store/pass a 1536-length vector.
- Extract atomic, self-contained candidate memories. Prefer entity-centric facts: "Alex prefers morning meetings" rather than "prefers morning meetings".
- Classify each candidate into one of the model's memory kinds, or app-specific equivalents.
- Deduplicate before writing. A similarity threshold cannot be a batch condition, so use read-then-write for semantic dedup and idempotent upsert for exact repeats.
- Write each memory with its identity, scope, content, embedding, and classification fields plus lifecycle timestamps; link ownership and provenance edges.
- Categorise and entity-link immediately.
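The read-then-write dedup step can be sketched as a pure decision function that runs after candidates have been retrieved. This is an illustration only: the `Decision` values, `ExistingMemory` shape, and `dedupDecision` name are hypothetical, and the retrieval itself (vector + BM25) is assumed to have happened already.

```typescript
// Hypothetical shapes; none of these names come from Helix.
type Decision = "SKIP_EXACT_DUPLICATE" | "WRITE_NEW" | "ADJUDICATE";

interface ExistingMemory {
  id: string;
  content: string;
}

// Case- and whitespace-insensitive comparison key for exact repeats.
const normaliseContent = (s: string): string => s.trim().toLowerCase().replace(/\s+/g, " ");

function dedupDecision(
  candidate: string,
  retrieved: readonly ExistingMemory[],
): { decision: Decision; matchId?: string } {
  const key = normaliseContent(candidate);
  const exact = retrieved.find((m) => normaliseContent(m.content) === key);
  // Exact repeats are idempotent: nothing new is written.
  if (exact) return { decision: "SKIP_EXACT_DUPLICATE", matchId: exact.id };
  // No similar memories at all: safe to write as a new memory.
  if (retrieved.length === 0) return { decision: "WRITE_NEW" };
  // Similar but not identical: leave update/extension/duplicate to app or LLM adjudication.
  return { decision: "ADJUDICATE" };
}
```

Only the exact-match branch is decided mechanically; anything merely similar is handed to adjudication rather than resolved by a distance threshold.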
Contextual Extraction Rules
Do not extract from the latest user message in isolation. The extraction worker should receive:
- the current user message
- the previous assistant message, because it often defines what a short answer means
- a bounded recent conversation window
- recalled active memories and active entities
- the current date/time for relative time phrases
- the memory scope (`tenant_id` plus the user, container, or project scope, or ACL context)
Resolve pronouns, ellipsis, and short follow-up answers before deciding whether to store a memory. If the assistant asks a memory-bearing follow-up question and the user answers briefly, convert the answer into a self-contained memory.
Extractor output should be structured enough for deterministic writes: self-contained content, kind, scope, category and entity links, source pointers, confidence, optional temporal fields, and a relationship decision (updates, extends, derives from, duplicates, or stands alone). Do not let a single vector-distance threshold decide updates; retrieve candidates with vector + BM25 and adjudicate exact duplicate vs update vs extension in application code.
Example:
```text
Existing memory: User is planning a trip to Japan with Maya.

A: When are you going?
User: next April
Extract: User is planning a trip to Japan with Maya next April.
Relationship: EXTENDS the existing Japan trip memory; MENTIONS Maya and Japan.

A: What do you want to do there?
User: mostly food, temples, and trains
Extract: User wants their Japan trip with Maya to focus on food, temples, and trains.
Relationship: EXTENDS the existing Japan trip memory; categorise as travel/preferences.

User later: actually we're going in May instead
Extract: User is planning a trip to Japan with Maya in May.
Relationship: UPDATES the previous next-April timing memory and invalidates the older version.
```
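A structured extractor output like the one described above can be checked deterministically before any write. The sketch below is an assumption-heavy illustration: only `UPDATES` and `EXTENDS` come from the example, while the other relationship values, the field names, and the word-count and pronoun heuristics are hypothetical.

```typescript
// UPDATES and EXTENDS appear in the example; the remaining values are illustrative.
type Relationship = "UPDATES" | "EXTENDS" | "DERIVES" | "DUPLICATE" | "NEW";

interface ExtractedCandidate {
  content: string;
  kind: string;
  tenant_id: string;
  scopeId: string;
  sourceIds: string[];
  confidence: number; // expected in [0, 1]
  relationship: Relationship;
  targetMemoryId?: string; // required unless relationship is NEW
}

/** Returns a list of problems; an empty list means the candidate is writable. */
function validateCandidate(c: ExtractedCandidate): string[] {
  const problems: string[] = [];
  const text = c.content.trim();
  // Crude heuristics for "not yet self-contained", e.g. the raw answer "next April".
  if (text.split(/\s+/).length < 3) problems.push("content too short to be self-contained");
  if (/^(he|she|it|they|this|that)\b/i.test(text)) problems.push("content starts with an unresolved pronoun");
  if (c.tenant_id === "" || c.scopeId === "") problems.push("missing tenant or scope");
  if (!(c.confidence >= 0 && c.confidence <= 1)) problems.push("confidence out of range");
  if (c.relationship !== "NEW" && !c.targetMemoryId) problems.push("relationship needs a target memory");
  return problems;
}
```

A candidate that fails validation goes back through context resolution rather than being written as-is.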
2. Updating & Versioning
- Reinforce on access: bump the memory's access-tracking fields and its bounded strength score.
- Update/correct: create a new memory, link it to the old one with an `UPDATES` edge, mark the old memory as no longer latest and set its `validTo`, and optionally hide it if it should disappear from normal recall.
- Extend: link an `EXTENDS` edge when the new fact enriches but does not replace the old fact.
- Derive: link inferred facts with derivation edges to supporting memories and mark them as inferred with confidence metadata.
- If content changes, re-embed and update the embedding in the same write. Content and vector must never drift.
- Keep lifecycle validity fields separate from real-world event-time fields. Updating a memory because a fact changed should invalidate the old record even if both facts refer to future or past dates.
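The update/correct rule can be expressed as a plan that the memory writer turns into one write. This is a sketch, not Helix API: `isLatest` is a hypothetical flag name, `planSupersede` is an illustrative helper, and the plan would still need to be translated into actual queries.

```typescript
interface MemoryVersion {
  id: string;
  tenant_id: string;
  content: string;
  embedding: number[];
  isLatest: boolean; // hypothetical flag name
  validTo: Date | null;
}

interface SupersedePlan {
  create: MemoryVersion;
  patchOld: { id: string; tenant_id: string; isLatest: false; validTo: Date };
  edge: { type: "UPDATES"; from: string; to: string; tenant_id: string };
}

/** Builds the new version, the invalidation patch, and the new→old edge together. */
function planSupersede(
  old: MemoryVersion,
  next: { id: string; content: string; embedding: number[] },
  now: Date,
): SupersedePlan {
  if (!old.isLatest) throw new Error(`memory ${old.id} is already superseded`);
  // New content must arrive with its own embedding of the same dimension.
  if (next.embedding.length !== old.embedding.length) throw new Error("embedding dimension changed");
  return {
    create: { ...next, tenant_id: old.tenant_id, isLatest: true, validTo: null },
    patchOld: { id: old.id, tenant_id: old.tenant_id, isLatest: false, validTo: now },
    edge: { type: "UPDATES", from: next.id, to: old.id, tenant_id: old.tenant_id },
  };
}
```

Requiring the embedding alongside the content makes it hard to change one without the other.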
3. Deletion / Forgetting
Helix has no native TTL or decay. Forgetting is done through explicit write queries that the app runs.
- Soft-delete (preferred): set `deletedAt = Expr.datetime()` and filter it from reads. Reversible and audit-friendly.
- Version invalidation: mark the memory as no longer latest and set `validTo = Expr.datetime()` when a memory is superseded.
- Expiry sweep: hide or hard-delete memories whose expiry time has passed.
- Decay sweep: hide weak, stale, rarely accessed episodic memories.
- Hard delete: use only when policy requires physical deletion. Deleting a node removes the node and its incident edges; use edge-level deletes for surgical edge cleanup on multigraph-sensitive paths.
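Because decay is an application job, the selection logic for a decay sweep can live in plain code. The sketch below assumes hypothetical fields (`kind`, `salience`, `accessCount`, `lastAccessedAt`) and a hypothetical policy shape; the thresholds are placeholders, not recommendations.

```typescript
interface DecayCandidate {
  id: string;
  kind: string;
  salience: number;
  accessCount: number;
  lastAccessedAt: Date;
  deletedAt: Date | null;
}

interface DecayPolicy {
  maxIdleDays: number;
  minSalience: number;
  minAccessCount: number;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

/** Returns ids of weak, stale, rarely accessed episodic memories to hide. */
function selectForDecay(memories: readonly DecayCandidate[], policy: DecayPolicy, now: Date): string[] {
  const maxIdleMs = policy.maxIdleDays * MS_PER_DAY;
  return memories
    .filter((m) => m.deletedAt === null) // already forgotten: nothing to do
    .filter((m) => m.kind === "episodic") // only episodic memories decay in this sketch
    .filter((m) => m.salience < policy.minSalience)
    .filter((m) => m.accessCount < policy.minAccessCount)
    .filter((m) => now.getTime() - m.lastAccessedAt.getTime() > maxIdleMs)
    .map((m) => m.id);
}
```

The sweeper would then soft-delete the returned ids in a tenant-filtered write, keeping the operation reversible.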
4. Categorisation & Entity Linking
- Store display categories as `Category` nodes scoped by `tenant_id` and a unique key such as `${tenant_id}:${normalisedName}`.
- Store entities as `Entity` nodes scoped by `tenant_id` and a unique key such as `${tenant_id}:${normalisedName}`.
- Prefer edges over arrays when you will traverse, aggregate, or recall by the tag/entity.
- Use nested object metadata for display/audit fields that do not need graph expansion. Keep frequently filtered fields top-level.
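The tenant-qualified key pattern above can be produced by one shared helper so categories and entities never collide across tenants. This is a minimal sketch; the normalisation rules (NFKC, lowercase, hyphenated whitespace) and the helper names are assumptions, not a convention defined by Helix.

```typescript
/** Normalises a display name into a stable key fragment (assumed rules). */
function normaliseName(name: string): string {
  return name.normalize("NFKC").trim().toLowerCase().replace(/\s+/g, "-");
}

/** Builds a key of the form `${tenant_id}:${normalisedName}`. */
function tenantKey(tenantId: string, name: string): string {
  // A ':' inside the tenant id would make the key ambiguous.
  if (tenantId.includes(":")) throw new Error("tenant id must not contain ':'");
  const normalised = normaliseName(name);
  if (normalised === "") throw new Error("name is empty after normalisation");
  return `${tenantId}:${normalised}`;
}

// Same display name, different tenants, different keys:
// tenantKey("acme", "Travel Plans")   -> "acme:travel-plans"
// tenantKey("globex", "Travel Plans") -> "globex:travel-plans"
```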
5. Profile Maintenance
- Maintain one `UserProfile` per user/container; it carries the static and dynamic summaries described below plus its identifying and bookkeeping fields.
- Static profile: identity, stable preferences, long-lived background.
- Dynamic profile: current projects, recent context, temporary goals, unresolved tasks.
- Update profiles asynchronously after memory writes and deletions; keep profile generation deterministic enough to test.
Retrieval
Run multiple recall paths and fuse app-side:
- Fetch the `UserProfile` for always-on context.
- Run vector and BM25 over current `Memory` nodes, tenant-scoped, user/container-scoped, and freshness-filtered.
- Optionally run vector and BM25 over `Chunk` nodes for source-grounded RAG and citations, with the same owner/scope policy unless documents are intentionally shared.
- Fuse app-side with RRF, then re-rank by salience, recency, relationship type, and optional cross-encoder score.
- Expand top memories through their update, extension, derivation, association, and entity edges, bounded by depth and tenant filters.
- Pack context without embeddings and include source/citation metadata when available.
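The app-side fusion step can be sketched with standard Reciprocal Rank Fusion, where each id scores the sum of 1 / (k + rank) over the lists it appears in. The function name and the default `k = 60` are assumptions for illustration; the salience, recency, and cross-encoder re-ranking would run on the fused output.

```typescript
interface FusedHit {
  id: string;
  score: number;
}

/**
 * Reciprocal Rank Fusion over ranked id lists (best first).
 * Ranks are 1-based, so the top hit of a list contributes 1 / (k + 1).
 */
function rrfFuse(rankedLists: readonly (readonly string[])[], k: number = 60): FusedHit[] {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score })).sort((a, b) => b.score - a.score);
}

// Example: ids returned by a vector search and a BM25 search for the same query.
const vectorHits = ["m1", "m2", "m3"];
const bm25Hits = ["m3", "m1", "m4"];
const fusedHits = rrfFuse([vectorHits, bm25Hits]);
// m1 and m3 appear in both lists, so they rank above m2 and m4.
```

RRF needs only ranks, which avoids having to calibrate vector distances against BM25 scores.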
Anti-Patterns
Do not:
- use the deprecated query dialect for new dynamic/TS/Rust DSL work
- use any other property name as the text-index tenant property; use `tenant_id` for tenant-partitioned text/vector indexes
- assume `tenant_id` alone is a safe recall boundary for org/team tenants; filter by user, container, project ACLs, or an explicit shared-memory flag
- attach `tenant_id` only to some labels; every tenant-owned node and edge needs it
- mutate, delete, categorise, or reinforce by ID without also checking `tenant_id`
- return superseded/forgotten/expired memories because recall checked only one lifecycle field
- mix lifecycle timestamps with real-world event dates; use separate temporal fields for memories about trips, deadlines, appointments, or historical facts
- build a vector-only store and call it memory
- use a toy hash embedding for production recall or benchmark claims; default to `openai:text-embedding-3-small` unless the app has a better standard model
- decide dedup/update/extension by vector threshold alone; use exact checks, BM25 candidates, vector candidates, and app/LLM adjudication
- extract memories from only the latest user message and miss contextual follow-ups such as "next April" or "mostly food, temples, and trains"
- drop short follow-up answers because they are not self-contained before context resolution
- write user-specific chunks/documents without an owner or scope field, then recall them tenant-wide
- expect Helix to extract files, chunk documents, generate embeddings, classify updates, build profiles, rerank, sync connectors, or run TTL jobs automatically
- read after an // step; project it immediately after search
- try to express a similarity-threshold dedup as a batch condition; it can only test variable emptiness/size
- update content without re-embedding
- return embedding arrays in API responses unless explicitly required
- make categories or entities global by display name in a multi-tenant memory app
Validation Checklist
Before finishing:
- vs is correct
- every tenant-owned node and edge has `tenant_id`
- vector/text indexes use `tenant_property = "tenant_id"`, and searches pass the tenant value
- every memory read filters `tenant_id`, user/container visibility, `deletedAt`, current/latest state, and expiry validity
- every write route accepts `tenant_id` and filters by it
- IDs used for upsert are either globally unique or tenant-qualified
- user/container-specific documents and chunks carry the same owner/scope fields used by recall, or are explicitly marked/shared through app policy
- lifecycle validity fields are not overloaded as event-time fields; dated facts use dedicated event-time fields or app equivalents
- embedding model is `openai:text-embedding-3-small` and every vector is 1536-dim, unless the app explicitly standardises on another fixed model/dimension
- content edits re-embed in the same write
- generation deduplicates semantically and exact repeats are idempotent
- extraction sees the previous assistant turn, recent conversation window, recalled active memories/entities, and current date before deciding what to store
- extraction emits a structured relationship/scope/source/temporal decision that can be tested deterministically
- source documents/chunks exist if the feature promises citations or RAG over raw context
- user profile update jobs exist if the feature promises always-on personalization
- evaluation covers tenant isolation, user/container isolation, stale-memory suppression, contextual follow-up extraction, exact-token recall, temporal corrections, deletion, profile rebuilds, latency, and token budget
- timestamps use one consistent convention; this skill uses typed DateTime values such as `Expr.datetime()`
- no projected output includes embedding arrays unless explicitly required
- labels/edges/properties match existing repo casing
Reference Files
- Data-model reference: full data-model spec, tenant rules, indexes, modality cheat-sheet, embedding guidance, fusion/re-ranking formula, and TypeScript ↔ Rust API mapping.
- TypeScript examples: lifecycle scenarios as TypeScript snippets. Default.
- Rust examples: the same scenarios in the Rust DSL.
- Adjacent skills: the Helix query-syntax skills for the TypeScript and Rust DSLs.