openclaw-history-ingest

OpenClaw History Ingest — Session & Memory Mining

You are extracting knowledge from the user's OpenClaw agent history and distilling it into the Obsidian wiki. OpenClaw stores both a structured long-term MEMORY.md and per-session JSONL transcripts. Focus on durable knowledge, not operational telemetry.
This skill can be invoked directly or via the wiki-history-ingest router (/wiki-history-ingest openclaw).

Before You Start

  1. Read .env to get OBSIDIAN_VAULT_PATH and OPENCLAW_HISTORY_PATH (default to ~/.openclaw if unset)
  2. Read .manifest.json at the vault root to check what has already been ingested
  3. Read index.md at the vault root to understand what the wiki already contains
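As a rough illustration of step 1, the sketch below parses simple KEY=VALUE lines from .env text and applies the ~/.openclaw default. The helper names parse_env and resolve_paths are hypothetical, and the parser assumes a plain .env with no multi-line values or variable interpolation.

```python
from pathlib import Path


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_paths(env: dict) -> tuple:
    """Return (vault_path, history_path); the history path falls back to ~/.openclaw."""
    vault = env.get("OBSIDIAN_VAULT_PATH")
    if not vault:
        raise ValueError("OBSIDIAN_VAULT_PATH is not set in .env")
    history = env.get("OPENCLAW_HISTORY_PATH") or "~/.openclaw"
    return Path(vault).expanduser(), Path(history).expanduser()
```

An empty OPENCLAW_HISTORY_PATH= line is treated the same as an unset variable here, which is a design choice rather than something the skill specifies.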

Ingest Modes

Append Mode (default)

Check .manifest.json for each source file. Only process:
  • Files not in the manifest (new session logs, updated MEMORY.md or daily notes)
  • Files whose modification time is newer than ingested_at in the manifest
Use this mode for regular syncs.

Full Mode

Process everything regardless of the manifest. Use after wiki-rebuild or if the user explicitly asks for a full re-ingest.

OpenClaw Data Layout

OpenClaw stores all local artifacts under ~/.openclaw/:
~/.openclaw/
├── openclaw.json                          # Global config
├── credentials/                           # Auth tokens (skip entirely)
├── workspace/                             # Agent workspace
│   ├── MEMORY.md                          # Long-term memory (loaded every session)
│   ├── DREAMS.md                          # Optional dream diary / summaries
│   └── memory/
│       ├── YYYY-MM-DD.md                  # Daily notes (today + yesterday auto-loaded)
│       └── ...
└── agents/
    └── <agentId>/
        ├── agent/
        │   └── models.json                # Agent config (skip)
        └── sessions/
            ├── sessions.json              # Session index
            └── <sessionId>.jsonl          # Session transcript (JSONL, append-only)

Key data sources ranked by value

  1. workspace/MEMORY.md: highest signal; long-term durable facts the agent accumulated
  2. workspace/memory/YYYY-MM-DD.md: daily notes; recent entries often contain active project context
  3. agents/*/sessions/<id>.jsonl: session transcripts; rich but noisy
  4. agents/*/sessions/sessions.json: session index for inventory and timestamps
  5. workspace/DREAMS.md: optional summaries; ingest if present
Skip credentials/ entirely. Skip agents/*/agent/models.json (runtime config, not user knowledge).
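One way to apply this ranking is to map each path (relative to the OpenClaw root) to its position in the list above and drop everything else. This is a minimal sketch: the function names priority and rank_sources are hypothetical, and treating every unlisted file (including openclaw.json) as skipped is an assumption.

```python
import re
from pathlib import PurePosixPath
from typing import List, Optional

DAILY_NOTE = re.compile(r"\d{4}-\d{2}-\d{2}\.md")


def priority(rel_path: str) -> Optional[int]:
    """Return the rank (1 = highest value) of a source file, or None to skip it."""
    parts = PurePosixPath(rel_path).parts
    if parts == ("workspace", "MEMORY.md"):
        return 1
    if len(parts) == 3 and parts[:2] == ("workspace", "memory") and DAILY_NOTE.fullmatch(parts[2]):
        return 2
    if len(parts) == 4 and parts[0] == "agents" and parts[2] == "sessions":
        if parts[3] == "sessions.json":
            return 4
        if parts[3].endswith(".jsonl"):
            return 3
    if parts == ("workspace", "DREAMS.md"):
        return 5
    # credentials/, agents/*/agent/models.json, and anything unlisted are skipped.
    return None


def rank_sources(rel_paths: List[str]) -> List[str]:
    """Filter out skipped files and order the rest by rank, then by path."""
    ranked = [(priority(p), p) for p in rel_paths]
    return [p for _, p in sorted((r, p) for r, p in ranked if r is not None)]
```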

Step 1: Survey and Compute Delta

Scan OPENCLAW_HISTORY_PATH and compare against .manifest.json (paths shown for the default ~/.openclaw):
  • ~/.openclaw/workspace/MEMORY.md
  • ~/.openclaw/workspace/DREAMS.md (if present)
  • ~/.openclaw/workspace/memory/*.md
  • ~/.openclaw/agents/*/sessions/sessions.json
  • ~/.openclaw/agents/*/sessions/*.jsonl
Classify each file:
  • New: not in manifest
  • Modified: in manifest but file is newer than ingested_at
  • Unchanged: already ingested and unchanged
Report a concise delta summary before deep parsing.
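The classification above can be sketched as a comparison between each file's modification time and its manifest entry. The manifest shape used here (a mapping from relative path to an entry with an ISO 8601 ingested_at string that includes a UTC offset) is an assumption for illustration, as are the function names.

```python
from collections import Counter
from datetime import datetime, timezone


def classify(rel_path: str, mtime: datetime, manifest: dict) -> str:
    """Return 'new', 'modified', or 'unchanged' for one source file."""
    entry = manifest.get(rel_path)
    if entry is None:
        return "new"
    # Assumes ingested_at carries an explicit offset, e.g. "+00:00".
    ingested_at = datetime.fromisoformat(entry["ingested_at"])
    return "modified" if mtime > ingested_at else "unchanged"


def delta_summary(files: dict, manifest: dict) -> Counter:
    """files maps relative path -> timezone-aware modification time."""
    return Counter(classify(path, mtime, manifest) for path, mtime in files.items())


# Example with made-up data.
manifest = {"workspace/MEMORY.md": {"ingested_at": "2025-01-10T00:00:00+00:00"}}
files = {
    "workspace/MEMORY.md": datetime(2025, 1, 11, tzinfo=timezone.utc),
    "workspace/DREAMS.md": datetime(2025, 1, 9, tzinfo=timezone.utc),
}
print(dict(delta_summary(files, manifest)))
```

In full mode the manifest comparison would simply be bypassed and every discovered file processed.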

Step 2: Parse MEMORY.md First

MEMORY.md is the highest-value source. It is plain markdown, human-readable and human-editable. It typically contains:
  • Durable facts about the user's preferences, environment, and recurring patterns
  • Decisions and context the agent was told to remember
  • Project-specific notes the agent accumulated over many sessions
Read it in full and extract concept-level knowledge. Do not create one wiki page per MEMORY.md entry — cluster by topic.

Step 3: Parse Daily Notes

workspace/memory/YYYY-MM-DD.md files contain time-stamped notes from that day's sessions. Prioritize recent files (last 30–90 days). Extract:
  • Active project context and decisions made
  • Patterns or techniques discovered
  • Recurring blockers or solved problems
Older daily notes have diminishing signal — summarize in bulk rather than extracting line-by-line.
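A small sketch of the recency split, assuming the date is taken from the YYYY-MM-DD.md filename and the window is the upper bound of 90 days. The function name split_daily_notes and the window_days parameter are hypothetical.

```python
from datetime import date, timedelta
from typing import List, Tuple


def split_daily_notes(filenames: List[str], today: date, window_days: int = 90) -> Tuple[List[str], List[str]]:
    """Return (recent, older): recent notes newest first, older notes oldest first."""
    cutoff = today - timedelta(days=window_days)
    recent, older = [], []
    for name in filenames:
        try:
            day = date.fromisoformat(name.removesuffix(".md"))
        except ValueError:
            continue  # not a YYYY-MM-DD.md daily note
        (recent if day >= cutoff else older).append(name)
    return sorted(recent, reverse=True), sorted(older)
```

The recent list would get detailed extraction, while the older list is a candidate for bulk summarization.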

Step 4: Parse Session JSONL Safely

Each session file is JSONL (append-only, one JSON object per line):
json
{"role": "user",      "content": "...", "timestamp": "..."}
{"role": "assistant", "content": "...", "timestamp": "..."}
{"role": "tool",      "name": "...",   "content": "...", "timestamp": "..."}
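Because the file is append-only, a reader may encounter a partially written final line. The sketch below is one tolerant way to read it: it skips blank or malformed lines and yields only objects that have a role field. The field names come from the sample above, and the function name iter_turns is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_turns(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per well-formed transcript line, skipping anything else."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a truncated line still being written
        if isinstance(record, dict) and "role" in record:
            yield record
```

Typical usage would pass an open file handle, for example `with open(path, encoding="utf-8") as fh: turns = list(iter_turns(fh))`.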

Extraction rules

  • Prioritize assistant turns that state conclusions, decisions, or patterns
  • Extract user intent from high-signal turns; skip low-information follow-ups
  • Tool calls are context, not primary knowledge — only extract if the result contains a reusable insight
  • Cross-reference the sessions.json index to get session names/labels before opening individual transcripts

Critical privacy filter

Session transcripts can include injected instructions, tool payloads, and sensitive text. Do not ingest verbatim.
  • Remove API keys, tokens, passwords, credentials
  • Redact private identifiers unless relevant and user-approved
  • Summarize; do not quote raw transcripts verbatim
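As a first-pass safety net before summarizing, obvious secrets can be masked with pattern matching. The two patterns below are illustrative heuristics only, not a complete secret detector, and the name redact is hypothetical. Anything that still looks sensitive after this pass should be dropped or confirmed with the user.

```python
import re

# Heuristic patterns (assumptions): bearer tokens and key=value style secrets.
BEARER = re.compile(r"\bBearer\s+[A-Za-z0-9._~+/-]+=*", re.IGNORECASE)
KEY_VALUE = re.compile(
    r"\b(api[_-]?key|token|password|passwd|secret)\b(\s*[:=]\s*)\S+",
    re.IGNORECASE,
)


def redact(text: str) -> str:
    """Mask bearer tokens first, then values that follow secret-like key names."""
    text = BEARER.sub("Bearer [REDACTED]", text)
    return KEY_VALUE.sub(r"\1\2[REDACTED]", text)
```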

Step 5: Cluster by Topic

Do not create one wiki page per session or per MEMORY.md entry.
  • Group by stable topic (concept, tool, project, technique)
  • Split mixed sessions into separate themes
  • Merge recurring patterns across dates and agents
  • Use the session cwd or workspace path to infer project scope when available

Step 6: Distill into Wiki Pages

Route extracted knowledge using existing wiki conventions:
  • Project-specific architecture/process → projects/<name>/...
  • General concepts → concepts/
  • Recurring techniques/debug playbooks → skills/
  • Tools/services/frameworks → entities/
  • Cross-session patterns → synthesis/
For each impacted project, create/update projects/<name>/<name>.md.
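The routing table above can be expressed as a small lookup. In this sketch the kind labels ("concept", "technique", "entity", "pattern", "project"), the slug-based filenames, and the function name target_path are all assumptions for illustration; only the directory names and the projects/<name>/<name>.md overview path come from the conventions listed.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical kind labels mapped to the wiki directories named above.
ROUTES = {
    "concept": "concepts",
    "technique": "skills",
    "entity": "entities",
    "pattern": "synthesis",
}


def target_path(kind: str, slug: str, project: Optional[str] = None) -> PurePosixPath:
    """Return the vault-relative page path for a piece of extracted knowledge."""
    if kind == "project":
        if project is None:
            raise ValueError("project pages need a project name")
        # When slug equals the project name this is the project overview page.
        return PurePosixPath("projects") / project / f"{slug}.md"
    return PurePosixPath(ROUTES[kind]) / f"{slug}.md"
```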

Writing rules

  • Distill knowledge, not chronology
  • Avoid "on date X we discussed..." unless date context is essential
  • Add summary: frontmatter on each new/updated page (1–2 sentences, ≤ 200 chars)
  • Add provenance markers:
    • ^[extracted] when directly grounded in explicit session/memory content
    • ^[inferred] when synthesizing patterns across multiple sessions
    • ^[ambiguous] when sessions conflict
  • Add/update provenance: frontmatter mix for each changed page
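The provenance mix can be derived by counting the inline markers in a page body. The sketch below assumes the mix is stored as rounded fractions per marker type, which is a guess at the frontmatter shape rather than something the rules above define. The names provenance_mix and summary_ok are hypothetical.

```python
import re
from collections import Counter

MARKER = re.compile(r"\^\[(extracted|inferred|ambiguous)\]")
KINDS = ("extracted", "inferred", "ambiguous")


def provenance_mix(body: str) -> dict:
    """Return the share of each provenance marker in a page body, or {} if none."""
    counts = Counter(MARKER.findall(body))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kind: round(counts[kind] / total, 2) for kind in KINDS}


def summary_ok(summary: str) -> bool:
    """Check the length limit on the summary: frontmatter value (at most 200 chars)."""
    return 0 < len(summary) <= 200
```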

Step 7: Update Manifest, Log, and Index

Update .manifest.json

For each processed source file:
  • ingested_at, size_bytes, modified_at
  • source_type: openclaw_memory | openclaw_daily_note | openclaw_session | openclaw_dreams
  • agent_id: agent directory name (when applicable)
  • pages_created, pages_updated
Add/update a top-level summary block:
json
{
  "openclaw": {
    "source_path": "~/.openclaw/",
    "last_ingested": "TIMESTAMP",
    "memory_updated_at": "TIMESTAMP",
    "daily_notes_ingested": 14,
    "sessions_ingested": 23,
    "pages_created": 6,
    "pages_updated": 18
  }
}
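A per-file manifest entry with the fields listed above might be assembled as follows. This is a sketch under assumptions: timestamps are written as ISO 8601 strings, pages_created and pages_updated are stored as lists of page paths, and the function name manifest_entry is hypothetical.

```python
from datetime import datetime, timezone
from typing import List, Optional

SOURCE_TYPES = {
    "openclaw_memory",
    "openclaw_daily_note",
    "openclaw_session",
    "openclaw_dreams",
}


def manifest_entry(
    source_type: str,
    size_bytes: int,
    modified_at: datetime,
    pages_created: List[str],
    pages_updated: List[str],
    agent_id: Optional[str] = None,
    now: Optional[datetime] = None,
) -> dict:
    """Build the manifest record for one processed source file."""
    if source_type not in SOURCE_TYPES:
        raise ValueError(f"unknown source_type: {source_type}")
    now = now or datetime.now(timezone.utc)
    entry = {
        "ingested_at": now.isoformat(),
        "size_bytes": size_bytes,
        "modified_at": modified_at.isoformat(),
        "source_type": source_type,
        "pages_created": list(pages_created),
        "pages_updated": list(pages_updated),
    }
    if agent_id is not None:
        entry["agent_id"] = agent_id  # only for files under agents/<agentId>/
    return entry
```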

Update special files

Update index.md and log.md. The log.md entry takes this form:
- [TIMESTAMP] OPENCLAW_HISTORY_INGEST memory=updated daily_notes=N sessions=M pages_updated=X pages_created=Y mode=append|full
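The log entry can be produced with a simple formatter. In this sketch the function name log_line is hypothetical, and the memory parameter defaults to "updated" because the template shows only that value; what to write when MEMORY.md was not touched is left open here.

```python
def log_line(
    timestamp: str,
    daily_notes: int,
    sessions: int,
    pages_updated: int,
    pages_created: int,
    mode: str,
    memory: str = "updated",
) -> str:
    """Format one log.md entry following the template above."""
    if mode not in ("append", "full"):
        raise ValueError("mode must be 'append' or 'full'")
    return (
        f"- [{timestamp}] OPENCLAW_HISTORY_INGEST memory={memory} "
        f"daily_notes={daily_notes} sessions={sessions} "
        f"pages_updated={pages_updated} pages_created={pages_created} mode={mode}"
    )
```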

Privacy and Compliance

  • Distill and synthesize; avoid raw memory or transcript dumps
  • Default to redaction for anything that looks sensitive
  • Ask the user before storing personal or sensitive details
  • Keep references to other people minimal and purpose-bound

Reference

See references/openclaw-data-format.md for field-level notes and parsing guidance.