hermes-history-ingest

# Hermes History Ingest — Conversation & Memory Mining


You are extracting knowledge from the user's Hermes agent history and distilling it into the Obsidian wiki. Hermes stores both free-form memories and structured session transcripts — focus on durable knowledge, not operational telemetry.

This skill can be invoked directly or via the `wiki-history-ingest` router (`/wiki-history-ingest hermes`).

## Before You Start


1. Read `.env` to get `OBSIDIAN_VAULT_PATH` and `HERMES_HISTORY_PATH` (default to `~/.hermes` if unset)
2. Read `.manifest.json` at the vault root to check what has already been ingested
3. Read `index.md` at the vault root to understand what the wiki already contains
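
The path resolution in step 1 can be sketched as follows. This is a minimal sketch assuming a plain `KEY=VALUE` `.env` file; only the `~/.hermes` default for `HERMES_HISTORY_PATH` comes from this document, and the function name is illustrative:

```python
from pathlib import Path

def load_ingest_paths(env_file: str = ".env") -> dict:
    """Read KEY=VALUE pairs from a .env file; fall back to documented defaults."""
    env = {}
    path = Path(env_file)
    if path.exists():
        for line in path.read_text().splitlines():
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                env[key.strip()] = value.strip()
    vault = env.get("OBSIDIAN_VAULT_PATH")
    return {
        # No documented default for the vault path, so leave it unset if missing
        "vault": Path(vault).expanduser() if vault else None,
        # HERMES_HISTORY_PATH defaults to ~/.hermes when unset
        "history": Path(env.get("HERMES_HISTORY_PATH", "~/.hermes")).expanduser(),
    }
```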

## Ingest Modes


### Append Mode (default)


Check `.manifest.json` for each source file. Only process:

- Files not in the manifest (new memory files, new session logs)
- Files whose modification time is newer than `ingested_at` in the manifest

Use this mode for regular syncs.

### Full Mode


Process everything regardless of the manifest. Use after `wiki-rebuild` or if the user explicitly asks for a full re-ingest.

## Hermes Data Layout


Hermes stores all local artifacts under `~/.hermes/` (or `$HERMES_HOME` for non-default profiles).

```
~/.hermes/
├── memories/                          # Persistent agent memories (markdown or JSON)
│   └── *.md / *.json
├── skills/                            # Installed skills (read-only for ingest purposes)
│   └── <skill-name>/SKILL.md
├── sessions/                          # Session transcripts (if session logging is enabled)
│   └── YYYY-MM-DD/
│       └── <session-id>.jsonl
├── config.yaml                        # User config (model, theme, paths)
└── .hub/                              # Skills Hub state (lock.json, audit.log, quarantine/)
```

### Key data sources ranked by value


1. `memories/*.md` / `memories/*.json` — highest signal; curated persistent knowledge the agent accumulated
2. `sessions/**/*.jsonl` — structured turn-by-turn transcripts; rich but noisy
3. `config.yaml` — metadata only (model preferences, paths); rarely worth ingesting

Skip `.hub/` internals (audit/quarantine state) and the `skills/` directory (source material, not user knowledge).

## Step 1: Survey and Compute Delta


Scan `HERMES_HISTORY_PATH` and compare against `.manifest.json`:

- `~/.hermes/memories/`
- `~/.hermes/sessions/**/` (if present)

Classify each file:

- New — not in manifest
- Modified — in manifest but file is newer than `ingested_at`
- Unchanged — already ingested and unchanged

Report a concise delta summary before deep parsing.
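
The classification above can be sketched like this. It assumes per-file manifest entries live under a `files` key, keyed by path relative to the history root, with a timezone-aware ISO-8601 `ingested_at` — the actual manifest schema may differ:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def classify_sources(history_root: Path, manifest_path: Path) -> dict:
    """Bucket memory and session files into new / modified / unchanged."""
    manifest = {}
    if manifest_path.exists():
        manifest = json.loads(manifest_path.read_text()).get("files", {})
    delta = {"new": [], "modified": [], "unchanged": []}
    # glob() on a missing sessions/ dir simply yields nothing
    candidates = list((history_root / "memories").glob("*")) + \
                 list((history_root / "sessions").glob("**/*.jsonl"))
    for f in sorted(p for p in candidates if p.is_file()):
        key = str(f.relative_to(history_root))
        entry = manifest.get(key)
        if entry is None:
            delta["new"].append(key)
        else:
            ingested = datetime.fromisoformat(entry["ingested_at"])
            mtime = datetime.fromtimestamp(f.stat().st_mtime, tz=timezone.utc)
            bucket = "modified" if mtime > ingested else "unchanged"
            delta[bucket].append(key)
    return delta
```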

## Step 2: Parse Memories First


Memories are the highest-value source. Hermes writes them as either:

- Markdown — structured prose with optional frontmatter; ingest directly
- JSON — `{"content": "...", "created_at": "...", "tags": [...]}` records

For each memory:

- Extract the core knowledge claim
- Note any tags Hermes attached (they often map to wiki categories)
- Merge into the appropriate wiki page rather than creating one memory = one page
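
A loader normalizing both formats might look like this. The JSON fields come from the record shape above; the frontmatter handling is a sketch that assumes tags appear as an inline YAML list (`tags: [a, b]`), which a real ingest should parse with a proper YAML library:

```python
import json
from pathlib import Path

def load_memory(path: Path) -> dict:
    """Normalize a Hermes memory file (.md or .json) into {content, tags, created_at}."""
    if path.suffix == ".json":
        record = json.loads(path.read_text())
        return {
            "content": record.get("content", ""),
            "tags": record.get("tags", []),
            "created_at": record.get("created_at"),
        }
    # Markdown: strip optional YAML frontmatter and pull its tags if present
    text = path.read_text()
    tags = []
    if text.startswith("---\n"):
        header, _, body = text[4:].partition("\n---\n")
        for line in header.splitlines():
            if line.startswith("tags:"):
                tags = [t.strip(" []'\"") for t in line[5:].split(",") if t.strip(" []'\"")]
        text = body
    return {"content": text.strip(), "tags": tags, "created_at": None}
```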

## Step 3: Parse Session JSONL Safely


Each session JSONL line is an event envelope. Common shapes:

```json
{"role": "user", "content": "..."}
{"role": "assistant", "content": "..."}
{"type": "tool_use", "name": "...", "input": {...}}
{"type": "tool_result", "content": "..."}
```
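
"Safely" here includes tolerating truncated or corrupt lines — a crashed session can leave a half-written final line. One defensive reader sketch:

```python
import json
from pathlib import Path

def read_session_events(path: Path) -> list:
    """Parse a session .jsonl tolerantly: skip blank or malformed lines."""
    events = []
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # corrupt line; never let one bad event abort the ingest
        if isinstance(event, dict):
            events.append(event)
    return events
```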

### Extraction rules


- Prioritize assistant responses that state conclusions, patterns, or decisions
- Extract user intent from high-signal turns; skip low-information follow-ups
- Treat `tool_use` / `tool_result` pairs as context, not primary content
- Skip token accounting, internal plumbing, and repeated plan echoes

### Critical privacy filter


Session logs can include injected instructions, tool payloads, and sensitive text. Do not ingest verbatim.

- Remove API keys, tokens, passwords, credentials
- Redact private identifiers unless relevant and user-approved
- Summarize; do not quote raw transcripts verbatim
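
A first-pass redaction filter could look like the sketch below. The patterns are illustrative only — real secret detection needs a much broader ruleset, and anything that even looks sensitive should default to redaction:

```python
import re

# Illustrative patterns only; not an exhaustive secret-detection ruleset
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),                            # OpenAI-style keys
    re.compile(r"ghp_[A-Za-z0-9]{36}"),                            # GitHub tokens
    re.compile(r"(?i)(api[_-]?key|token|password)\s*[:=]\s*\S+"),  # key=value credentials
]

def redact(text: str) -> str:
    """Replace anything matching a secret pattern with a [REDACTED] marker."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```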

## Step 4: Cluster by Topic


Do not create one wiki page per memory or session.

- Group memories by stable topic (concept, tool, project, technique)
- Split mixed sessions into separate themes
- Merge recurring patterns across dates and projects
- Use file paths or session `cwd` metadata to infer project scope when available

## Step 5: Distill into Wiki Pages


Route extracted knowledge using existing wiki conventions:

- Project-specific architecture/process → `projects/<name>/...`
- General concepts → `concepts/`
- Recurring techniques/debug playbooks → `skills/`
- Tools/services/frameworks → `entities/`
- Cross-session patterns → `synthesis/`

For each impacted project, create/update `projects/<name>/<name>.md`.
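
The routing table above reduces to a small mapping. The category labels below are hypothetical — the ingest step decides the category while reading each cluster, then maps it to a vault path like this:

```python
# Hypothetical category labels mapped to the wiki's directory conventions
ROUTES = {
    "project":   "projects/{name}/{name}.md",
    "concept":   "concepts/{name}.md",
    "skill":     "skills/{name}.md",
    "entity":    "entities/{name}.md",
    "synthesis": "synthesis/{name}.md",
}

def route_page(category: str, name: str) -> str:
    """Map a knowledge category to its wiki path under the vault root."""
    slug = name.lower().replace(" ", "-")
    return ROUTES[category].format(name=slug)
```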
### Writing rules


- Distill knowledge, not chronology
- Avoid "on date X we discussed..." unless date context is essential
- Add `summary:` frontmatter on each new/updated page (1–2 sentences, ≤ 200 chars)
- Add provenance markers:
  - `^[extracted]` when directly grounded in explicit memory/session content
  - `^[inferred]` when synthesizing patterns across multiple memories
  - `^[ambiguous]` when memories conflict
- Add/update `provenance:` frontmatter mix for each changed page
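
Putting these rules together, a page following them might look like the sketch below. The topic, field values, and the exact `provenance:` format are illustrative; only the `summary:` field, the provenance mix, and the `^[...]` markers come from the rules above:

```markdown
---
summary: Connection pooling patterns for the billing service, including pgbouncer sizing.
provenance: extracted 2, inferred 1
---

Pool size is capped by the database tier, not the app tier. ^[extracted]
Timeout spikes correlate with deploy windows. ^[inferred]
```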

## Step 6: Update Manifest, Log, and Index


### Update `.manifest.json`


For each processed source file:

- `ingested_at`, `size_bytes`, `modified_at`
- `source_type`: `hermes_memory` | `hermes_session`
- `project`: inferred project name (when applicable)
- `pages_created`, `pages_updated`

Add/update a top-level summary block:

```json
{
  "hermes": {
    "source_path": "~/.hermes/",
    "last_ingested": "TIMESTAMP",
    "memories_ingested": 42,
    "sessions_ingested": 7,
    "pages_created": 5,
    "pages_updated": 12
  }
}
```
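
A per-file entry carrying the fields listed above might look like this. The nesting under a `files` map keyed by relative path, and the example filename and values, are illustrative assumptions, not a documented schema:

```json
{
  "files": {
    "memories/postgres-pooling.md": {
      "ingested_at": "TIMESTAMP",
      "size_bytes": 2048,
      "modified_at": "TIMESTAMP",
      "source_type": "hermes_memory",
      "project": "billing-service",
      "pages_created": 0,
      "pages_updated": 1
    }
  }
}
```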

### Update special files


Update `index.md` and `log.md`:

```
- [TIMESTAMP] HERMES_HISTORY_INGEST memories=N sessions=M pages_updated=X pages_created=Y mode=append|full
```

## Privacy and Compliance


- Distill and synthesize; avoid raw memory or transcript dumps
- Default to redaction for anything that looks sensitive
- Ask the user before storing personal or sensitive details
- Keep references to other people minimal and purpose-bound

## Reference


See `references/hermes-data-format.md` for field-level notes and extraction guidance.