Memory collector
Your richest memory source is sitting untouched on disk: every coding-agent
session ever run on this machine. Transcripts full of decisions, gotchas, open
questions, people, and the occasional quotable outburst — none of it queryable,
all of it rotting in JSONL.
This skill is the collector: a budgeted, cursor-tracked harvest that mines those
transcripts and plants durable knowledge into whatever memory stores the current
environment exposes. It is storage-agnostic (capabilities discovered at run
time, like memory-gardener) and source-aware
(it knows where coding tools keep their sessions). The extraction prompts ship in
`prompts/`, carried over from EI's extraction pipeline by Jeremy
Scherer (MIT).

The composition: the collector plants; the gardener prunes. Collection
deliberately tolerates near-duplicates and overgrowth — the gardener's validate
gate, dedup curator, and bloat-split exist precisely to tend what collection
produces. Don't make the collector perfect; make the pair converge.
Ground rules — the safety contract
- Sources are read-only. Never modify, move, or delete a transcript file.
- Stores are additive. The collector creates and updates memory items; it never deletes. Anything that looks delete-worthy is the gardener's job.
- Budget every run. Default: 3 sessions (or ~150 messages) per run, oldest unprocessed first. Stop at the budget; the cursor makes the next run continue cleanly.
- Skip live sessions. A transcript modified in the last ~30 minutes (or whose tool is plainly mid-session) gets skipped — half-written sessions extract badly. It will be there next run.
- Never store secrets. Coding transcripts contain tokens, connection strings, ARNs, and keys. If an extracted value is shaped like a credential, drop it. The shipped prompts already exclude these from quotes; apply the same bar to every field you store.
- Conservative is the law. The shipped prompts are tuned so that empty results are the most common response. Honor that — noise is worse than gaps.
- Provenance is mandatory. Every stored item carries its source id (see Phase 4). An item you can't trace back to a session is a rumor.
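
The secrets rule is easiest to honor with a mechanical check that runs on every field before anything is written. A minimal sketch, assuming a short list of illustrative credential regexes; `SECRET_PATTERNS`, `looksLikeSecret`, and `scrubFields` are hypothetical names, and the pattern list is a starting point rather than a complete detector:

```javascript
// Illustrative credential shapes only; extend for the providers you actually use.
const SECRET_PATTERNS = [
  /\bAKIA[0-9A-Z]{16}\b/,                           // AWS access key id
  /\barn:aws[a-z-]*:[a-z0-9-]+:/,                   // any AWS ARN
  /\bgh[pousr]_[A-Za-z0-9]{36,}\b/,                 // GitHub token
  /\bsk-[A-Za-z0-9_-]{20,}/,                        // "sk-" style API key
  /-----BEGIN [A-Z ]*PRIVATE KEY-----/,             // PEM private key
  /\b[a-z][a-z0-9+.-]*:\/\/[^\s:@\/]+:[^\s@\/]+@/i, // URL carrying user:password@
  /\bBearer\s+[A-Za-z0-9._~+\/-]{20,}/,             // bearer token
];

// True if a string contains anything shaped like a credential.
function looksLikeSecret(value) {
  return typeof value === "string" && SECRET_PATTERNS.some((re) => re.test(value));
}

// Drop (do not redact) every string, or string array entry, that looks like a
// credential. Returns the cleaned item plus a count for the report.
function scrubFields(item) {
  const clean = {};
  let dropped = 0;
  for (const [key, value] of Object.entries(item)) {
    if (Array.isArray(value)) {
      const kept = value.filter((v) => !looksLikeSecret(v));
      dropped += value.length - kept.length;
      clean[key] = kept;
    } else if (looksLikeSecret(value)) {
      dropped += 1;
    } else {
      clean[key] = value;
    }
  }
  return { clean, dropped };
}
```

Dropping the whole value, rather than masking the matched part, follows the rule's wording ("drop it"); the `dropped` count can feed the report's secrets-shaped tally.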
Phase 0 — survey
Transcript sources. The skill bundles dependency-free Node readers
(`readers/`); run each with `node readers/<tool>.mjs --list --since <cursor high-water mark>`
to discover what exists. Do not parse session stores by hand: the readers already
encode the format traps (sidechain files, tool-result records masquerading as
user messages, lossy cwd encodings).

| Tool | Reader | Where sessions live |
|---|---|---|
| Claude Code | | |
| Pi / OMP | | |
| Codex | | |
| OpenCode, Cursor | none yet — see | local app data |

Memory stores. Discover capabilities from the tool surface exactly as the
gardener's Phase 0 does (memory search/mutation, knowledge graph, diary, stats).
Don't assume tool names.
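
A minimal sketch of name-agnostic discovery, assuming the environment exposes its tools as `{ name, description }` records. The keyword hints and `discoverCapabilities` are hypothetical; the idea is to classify each tool by what it says it does, not by a hard-coded name:

```javascript
// Keyword hints per capability; tune these to the environment at hand.
const CAPABILITY_HINTS = {
  search: [/\bsearch\b/i, /\bquery\b/i, /\brecall\b/i],
  mutate: [/\b(add|create|store|save|update|upsert|write)\b/i],
  graph: [/\b(graph|entit(y|ies)|relation|triple)\b/i],
  diary: [/\b(diary|journal)\b/i],
  stats: [/\b(stats|statistics|count)\b/i],
};

// Returns capability -> names of tools that appear to provide it.
function discoverCapabilities(tools) {
  const found = {};
  for (const capability of Object.keys(CAPABILITY_HINTS)) found[capability] = [];
  for (const tool of tools) {
    // Split snake/kebab names so "memory_search" can match \bsearch\b.
    const text = `${tool.name.replace(/[_-]/g, " ")} ${tool.description ?? ""}`;
    for (const [capability, hints] of Object.entries(CAPABILITY_HINTS)) {
      if (hints.some((re) => re.test(text))) found[capability].push(tool.name);
    }
  }
  return found;
}
```

An empty list for a capability simply means that phase of the collector has nowhere to write and should skip it.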
The cursor. Find the previous collection state: a memory item or artifact
tagged `collector-cursor` holding, per source, a `highWater` timestamp (pass it
as `--since` when listing) plus maps of processed and skipped-trivial session
ids → timestamps. The maps are the sole source of truth (EI's
`processed_sessions` pattern); `highWater` is the cheap pre-filter that keeps a
noisy source from re-listing hundreds of already-judged sessions every run.
No cursor → first run: start with the most recent few sessions, not all of
history; backfill over subsequent runs.
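
A sketch of the per-source cursor and its two-layer check, assuming listings come back as `{ id, lastMessageAt }` records. The names `skippedTrivial`, `listArgs`, `isResolved`, and `pendingSessions` are hypothetical; only `highWater`, `--list`, and `--since` come from the text:

```javascript
// Per-source cursor, kept as plain JSON so any memory store can hold it:
// { highWater: ISO string | null, processed: { id: ts }, skippedTrivial: { id: ts } }
function emptySourceCursor() {
  return { highWater: null, processed: {}, skippedTrivial: {} };
}

// Arguments for the reader's listing call. highWater only narrows the listing;
// it never decides on its own whether a session is done.
function listArgs(sourceCursor) {
  return sourceCursor.highWater ? ["--list", "--since", sourceCursor.highWater] : ["--list"];
}

// The maps are the source of truth: a session is resolved only if one records it.
function isResolved(sourceCursor, sessionId) {
  return (
    Object.hasOwn(sourceCursor.processed, sessionId) ||
    Object.hasOwn(sourceCursor.skippedTrivial, sessionId)
  );
}

// Sessions from a listing that still need a decision this run.
function pendingSessions(sourceCursor, listed) {
  return listed.filter((session) => !isResolved(sourceCursor, session.id));
}
```

Plain objects are used instead of `Map` because they survive a round trip through `JSON.stringify`, which is how the cursor would most likely be stored.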
Phase 1 — select
From each available source, list sessions not in the cursor, oldest first, and
take sessions up to the budget. Apply the live-session guard (rule 4). The
readers supply `title` (cwd-derived) and `messageCount` per session.

Prefer real conversations. Agent automation produces sessions too: a Pi
store can hold a thousand mechanical runner-job sessions for every human one.
Skip sessions that are tiny (fewer than ~4 messages) or whose opening message
is plainly a machine-generated job prompt, and record them in the cursor as
`skipped-trivial` so they are never re-listed. Spending the budget on noise is
how a collector starves.
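
The Phase 1 rules compose into one pass over the pending sessions. A minimal sketch, assuming each session carries `lastMessageAt` (ISO timestamp) and `messageCount`, plus a hypothetical `firstMessage` field holding the opening prompt; the machine-prompt regex is a guess to adapt per source, not the skill's rule:

```javascript
const LIVE_WINDOW_MS = 30 * 60 * 1000; // rule 4: modified within the last ~30 minutes
const MIN_MESSAGES = 4;                // below this a session counts as "tiny"
// Hypothetical markers of machine-generated job prompts; tune per source.
const MACHINE_PROMPT = /^\s*(\[(job|task|cron)\]|you are an automated)/i;

// Walk pending sessions oldest first, sorting each into take / skippedLive /
// skippedTrivial until the session or message budget is spent. The last session
// taken may overshoot the message budget; processing truncates it, and it stays
// uncursored so it resumes next run.
function selectSessions(pending, now = Date.now(), maxSessions = 3, maxMessages = 150) {
  const take = [];
  const skippedLive = [];
  const skippedTrivial = [];
  let messages = 0;
  const oldestFirst = [...pending].sort(
    (a, b) => Date.parse(a.lastMessageAt) - Date.parse(b.lastMessageAt),
  );
  for (const session of oldestFirst) {
    if (take.length >= maxSessions || messages >= maxMessages) break; // budget spent
    if (now - Date.parse(session.lastMessageAt) < LIVE_WINDOW_MS) {
      skippedLive.push(session); // not cursored: it re-lists once it settles
    } else if (
      session.messageCount < MIN_MESSAGES ||
      MACHINE_PROMPT.test(session.firstMessage ?? "")
    ) {
      skippedTrivial.push(session); // recorded in the cursor, never re-listed
    } else {
      take.push(session);
      messages += session.messageCount;
    }
  }
  return { take, skippedLive, skippedTrivial };
}
```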
Phase 2 — convert
- Read the session with `node readers/<tool>.mjs --session <id>`.
- Build fully qualified message ids: `<tool>:<machine>:<session>:<reader msg id>`
  (e.g., `claudecode:mbp:0a1f…:42`). Quotes and provenance point at these.
- Process the session in windows (~20–40 messages). For each window, the window
  itself is the "Most Recent Messages" and a compact tail of what came before is
  the "Earlier Conversation"; the shipped prompts are built around exactly this
  split and only ever analyze the recent window.
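
The id scheme and the window split, sketched under the assumption that a reader yields messages as `{ id, role, text }`. The window size of 30 sits inside the ~20–40 range given above; the tail length and per-message truncation are hypothetical knobs:

```javascript
// Fully qualified id: <tool>:<machine>:<session>:<reader msg id>
function qualifiedId(tool, machine, session, msgId) {
  return `${tool}:${machine}:${session}:${msgId}`;
}

// Shorten one message for the "Earlier Conversation" tail.
function compact(message, maxChars) {
  const text =
    message.text.length > maxChars ? message.text.slice(0, maxChars) + "…" : message.text;
  return { ...message, text };
}

// Yield { earlier, recent } pairs: `recent` is the window the prompts analyze,
// `earlier` is a short, truncated tail of the messages just before it.
function* windows(messages, size = 30, tailLength = 6, tailChars = 200) {
  for (let start = 0; start < messages.length; start += size) {
    const recent = messages.slice(start, start + size);
    const earlier = messages
      .slice(Math.max(0, start - tailLength), start)
      .map((m) => compact(m, tailChars));
    yield { earlier, recent };
  }
}
```

Windows do not overlap, so each message is analyzed once as "recent"; the tail only supplies context.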
Phase 3 — extract
Run the shipped pipelines over each window, with `technical_context: true` for
coding-tool sessions (it makes Technical a priority category):

- Topics (`prompts/topics.md`): scan flags candidate topics → match checks each against existing memory (conservative: unsure ⇒ "new") → update writes the record under the right discipline (Event: narrative; Technical: accumulate, don't synthesize; everything else: synthesize, don't accumulate). Quotes ride along.
- People (`prompts/people.md`): scan flags people (confidence 1–5, identifier capture, self/hypothetical guards) → match by identifiers first, then name → update under the person disciplines. For coding sessions most windows yield nobody; that's correct.
- Events (`prompts/events.md`): once per session, the campaign-recap test ("The Night We Debugged the CPU"). Empty is the norm.
- Facts (`prompts/facts.md`): only if you maintain a missing-facts list (kept beside the cursor). No list, no run.
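
The conservative match and the update disciplines are easy to get backwards, so here is a minimal sketch of the control flow around the prompts. `judge` and `synthesize` stand in for model calls driven by the shipped prompts; the 0.8 threshold, running events on the last window, and every name here are assumptions:

```javascript
// Which pipelines run for a window. Running events on the last window is an
// assumption; the text only says "once per session".
function pipelinesFor({ isLastWindow, missingFacts }) {
  const run = ["topics", "people"];
  if (isLastWindow) run.push("events");
  if (Array.isArray(missingFacts) && missingFacts.length > 0) run.push("facts");
  return run;
}

// Conservative match: anything short of a confident hit is "new".
// `judge` returns { id, confidence } (confidence in [0, 1]) or null.
function matchTopic(candidate, judge, threshold = 0.8) {
  const verdict = judge(candidate);
  return verdict && verdict.id && verdict.confidence >= threshold ? verdict.id : "new";
}

// Technical accumulates; every other category is rewritten by `synthesize`,
// in a narrative style for events and a summary style otherwise.
function updateDescription(existing, incoming, category, synthesize) {
  if (category === "Technical") {
    return existing ? `${existing}\n${incoming}` : incoming; // keep every detail
  }
  if (!existing) return incoming;
  const style = category === "Event" ? "narrative" : "summary";
  return synthesize(existing, incoming, { style });
}
```

Defaulting to "new" on doubt produces the near-duplicates the collector is allowed to leave behind; merging them is the gardener's dedup step, not this one.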
Phase 4 — store
Write extractions into the discovered stores, mapping fields onto the store's
schema (confidence/exposure-impact → importance-like fields; categories →
tags/containers; drop fields the store can't hold rather than inventing
homes). Tag everything `source:<tool>:<machine>:<session>` plus
`collected:<ISO date>`.

Where the store distinguishes recent/unreviewed items, leave new items visibly
new — the gardener's validate gate (its Phase 1)
is the door these newcomers are supposed to walk through. If both a fast store
and a structured knowledge store exist, put summaries where retrieval happens
and structure (entities, links) where the graph lives.
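
One way the field mapping could look, assuming a hypothetical store that accepts `content`, `importance` (0–1), `tags`, and `status`, and an extraction whose `confidence` or `exposureImpact` is on a 1–5 scale. Only the two tag formats come from the text; the rest is illustrative:

```javascript
// Fields this hypothetical store accepts; anything else is dropped, not re-homed.
const STORE_FIELDS = new Set(["content", "importance", "tags", "status"]);

function toStoreItem(extraction, { tool, machine, session }, collectedAt = new Date()) {
  // Map a 1–5 confidence (or exposure impact) onto a 0–1 importance.
  const score = extraction.confidence ?? extraction.exposureImpact;
  const candidate = {
    content: extraction.summary,
    importance: typeof score === "number" ? (score - 1) / 4 : undefined,
    tags: [
      ...(extraction.category ? [extraction.category.toLowerCase()] : []),
      `source:${tool}:${machine}:${session}`,
      `collected:${collectedAt.toISOString().slice(0, 10)}`,
    ],
    status: "new", // left visibly new for the gardener's validate gate
    quotes: extraction.quotes, // this store has no home for quotes
  };
  // Keep only supported, defined fields.
  return Object.fromEntries(
    Object.entries(candidate).filter(([key, value]) => STORE_FIELDS.has(key) && value !== undefined),
  );
}
```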
Phase 5 — advance the cursor & report
Update the cursor only for sessions fully processed; a budget-truncated
session stays uncursored and resumes next run. Also advance each source's
`highWater` to the newest `lastMessageAt` among sessions you resolved
(processed or skipped-trivial), but never to or past any skipped-live session's
`lastMessageAt`: live sessions must re-list once they settle. Then report:

```
Collection report — <ISO timestamp>
Sources: <tool: sessions found / processed / skipped-live>
Windows analyzed: N · budget used: <sessions>/<max>
Planted: topics N (new X, updated Y) · people N · events N · facts N · quotes N
Dropped: secrets-shaped values N · low-confidence extractions N
Cursor: advanced to <session id / timestamp> per source
Handoff: <n> new items awaiting the gardener's validate gate
```
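
The cursor rules fit in one small update function. A minimal sketch, assuming a per-source cursor of the form `{ highWater, processed, skippedTrivial }` and sessions given as `{ id, lastMessageAt }`; the names are hypothetical. It also keeps `highWater` below budget-truncated sessions: the text requires this only for skipped-live ones, but a truncated session past `highWater` would otherwise drop out of the next `--since` listing:

```javascript
// Returns a new cursor; the input is left untouched.
function advanceCursor(
  sourceCursor,
  { processed = [], skippedTrivial = [], skippedLive = [], truncated = [] },
) {
  const next = {
    highWater: sourceCursor.highWater,
    processed: { ...sourceCursor.processed },
    skippedTrivial: { ...sourceCursor.skippedTrivial },
  };
  // Only fully processed sessions enter the processed map.
  for (const s of processed) next.processed[s.id] = s.lastMessageAt;
  for (const s of skippedTrivial) next.skippedTrivial[s.id] = s.lastMessageAt;

  // Ceiling: highWater stays strictly below every session that must re-list.
  // Math.min() of nothing is Infinity, i.e. no ceiling.
  const ceiling = Math.min(
    ...[...skippedLive, ...truncated].map((s) => Date.parse(s.lastMessageAt)),
  );
  for (const s of [...processed, ...skippedTrivial]) {
    const t = Date.parse(s.lastMessageAt);
    const current = next.highWater === null ? -Infinity : Date.parse(next.highWater);
    if (t < ceiling && t > current) next.highWater = s.lastMessageAt; // never moves backward
  }
  return next;
}
```

Sessions newer than the ceiling are still recorded in the maps, so when they re-list they are filtered out again rather than re-processed.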
Running periodically
Same hosting story as the gardener: any scheduler that can invoke an agent with
this skill. A good rhythm is the collector daily and the gardener nightly after it,
so each harvest is tended within a day. Both are budget-capped; the worst case is a
report that says "nothing new."
Provenance & credit
The extraction pipeline (scan → match → update), its conservative defaults, the
three description disciplines, the quote bar-test, and the prompts in
`prompts/` come from Flare576/ei
by Jeremy Scherer (MIT, © 2026 Jeremy Scherer). EI runs this pipeline
against five coding tools as its importer layer. This skill generalizes the
storage side and pairs it with memory-gardener.