pi-history-ingest

Pi History Ingest — Session Mining

You are extracting knowledge from the user's Pi coding agent sessions and distilling it into the Obsidian wiki. Pi sessions are stored as structured JSONL with a tree layout — your job is to follow the active branch, extract durable knowledge, and compile it.
This skill can be invoked directly or via the `wiki-history-ingest` router (`/wiki-history-ingest pi`).

Before You Start

  1. Resolve config: follow the Config Resolution Protocol in `llm-wiki/SKILL.md` (walk up from the CWD for `.env` → `~/.obsidian-wiki/config` → prompt setup). This gives `OBSIDIAN_VAULT_PATH` and `PI_HISTORY_PATH` (defaults to `~/.pi/agent/sessions`).
  2. Read `.manifest.json` at the vault root to check what has already been ingested.
  3. Read `index.md` at the vault root to understand what the wiki already contains.

Ingest Modes

Append Mode (default)

Check `.manifest.json` for each source file. Only process:
  • Files not in the manifest (new sessions)
  • Files whose modification time is newer than `ingested_at` in the manifest
Use this mode for regular syncs.

Full Mode

Process everything regardless of the manifest. Use after `wiki-rebuild` or if the user explicitly asks for a full re-ingest.

Pi Data Layout

Pi stores sessions under `~/.pi/agent/sessions/` (or the path set by `PI_CODING_AGENT_SESSION_DIR`).
~/.pi/agent/sessions/
├── --<cwd-path>--/                    # Working directory with / replaced by -
│   └── <timestamp>_<uuid>.jsonl       # Session JSONL file
└── ...
The session filename contains an ISO timestamp and UUID. The parent directory encodes the working directory where the session was created.
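As a rough illustration of the layout above, the following Python sketch decodes a session path. It assumes the directory name is the working directory with every `/` replaced by `-` and wrapped in `--`, which makes the decoding lossy for directories whose own names contain hyphens. The helper names `decode_cwd` and `split_session_filename` are hypothetical and not part of Pi.

```python
from pathlib import Path


def decode_cwd(dir_name: str) -> str:
    """Best-effort decode of a `--<cwd-path>--` directory name.

    Assumption: `/` was replaced by `-`, so a hyphen that belonged to the
    original directory name cannot be told apart from a separator.
    """
    inner = dir_name
    if inner.startswith("--"):
        inner = inner[2:]
    if inner.endswith("--"):
        inner = inner[:-2]
    return "/" + inner.replace("-", "/")


def split_session_filename(path: str) -> tuple[str, str]:
    """Split `<timestamp>_<uuid>.jsonl` into (timestamp, uuid)."""
    stem = Path(path).stem
    # Split on the last underscore; the UUID itself contains none.
    timestamp, _, uuid = stem.rpartition("_")
    return timestamp, uuid
```

Because of the hyphen ambiguity, the `cwd` field in the `session` header is probably the safer source when both are available.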

Session JSONL Format

Each `.jsonl` file is a sequence of JSON objects. The first line is always a `session` header; subsequent lines are tree entries with `id` and `parentId`.
Key entry types:

| `type` | Purpose | Ingest? |
|---|---|---|
| `session` | Header with `cwd`, `version`, `id`, `timestamp` | Metadata only |
| `message` | Conversation turn (`user`, `assistant`, `toolResult`, `bashExecution`, etc.) | Primary source |
| `session_info` | Display name set via `/name` | For session title |
| `compaction` | Context compaction summary | High signal |
| `branch_summary` | Summary when switching branches via `/tree` | High signal |
| `model_change` | Model switch event | Skip |
| `thinking_level_change` | Thinking level change | Skip |
| `custom` | Extension state (not in LLM context) | Skip |
| `custom_message` | Extension-injected message | Context only |
| `label` | User bookmark/label | Skip |

Message roles inside `message` entries

  • `user`: user input; `content` is a string or `(TextContent | ImageContent)[]`
  • `assistant`: assistant response; `content` is `(TextContent | ThinkingContent | ToolCall)[]`
  • `toolResult`: tool execution result; `content` is `(TextContent | ImageContent)[]`
  • `bashExecution`: bash command + output; `command`, `output`, `exitCode`
  • `branchSummary`: branch switch summary; `summary` string
  • `compactionSummary`: compaction summary; `summary` string

Key data sources ranked by value

  1. `message` entries (`user` + `assistant`): full conversation transcripts; rich but noisy
  2. `compaction` entries: pre-synthesized summaries of older context; gold
  3. `branch_summary` entries: summaries of abandoned branches; good signal
  4. `bashExecution` messages: concrete commands run; useful for workflow patterns
  5. `session_info` entries: session name for topic inference
Skip `model_change`, `thinking_level_change`, `custom` (extension state), and `label` entries.

Step 1: Survey and Compute Delta

Scan `PI_HISTORY_PATH` and compare against `.manifest.json`:
```bash
# List all session files
find ~/.pi/agent/sessions -name "*.jsonl" -type f

# Or with custom path
find "$PI_HISTORY_PATH" -name "*.jsonl" -type f
```

Build an inventory. For each session file, record:
- `path` — absolute path
- `cwd` — decoded from parent directory name (`--<path>--` → `/path`)
- `session_name` — from the latest `session_info` entry (if any)
- `modified_at` — file mtime
- `already_ingested` — presence in `.manifest.json`

Classify each file:
- **New** — not in manifest
- **Modified** — in manifest but file is newer than `ingested_at`
- **Unchanged** — already ingested and unchanged

Report a concise delta summary before deep parsing:
> "Found N Pi sessions across K projects. Delta: X new, Y modified."
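
The classification above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `manifest_files` is a hypothetical mapping from absolute path to a record holding an ISO-8601 `ingested_at` string, and the real manifest layout defined by `llm-wiki` may differ.

```python
from datetime import datetime, timezone


def classify(path: str, mtime: float, manifest_files: dict) -> str:
    """Return "new", "modified", or "unchanged" for one session file.

    Assumptions: `manifest_files` maps absolute paths to records with an
    ISO-8601 `ingested_at`; `mtime` is a POSIX timestamp (e.g. from os.stat).
    """
    record = manifest_files.get(path)
    if record is None:
        return "new"
    ingested_at = datetime.fromisoformat(record["ingested_at"])
    if ingested_at.tzinfo is None:
        # Treat naive timestamps as UTC (an assumption of this sketch).
        ingested_at = ingested_at.replace(tzinfo=timezone.utc)
    modified_at = datetime.fromtimestamp(mtime, tz=timezone.utc)
    return "modified" if modified_at > ingested_at else "unchanged"


def delta_summary(labels: list[str], project_count: int) -> str:
    """Format the one-line delta report quoted above."""
    return (
        f"Found {len(labels)} Pi sessions across {project_count} projects. "
        f"Delta: {labels.count('new')} new, {labels.count('modified')} modified."
    )
```

Note that `datetime.fromisoformat` only accepts a trailing `Z` from Python 3.11 onward; on older versions an explicit `+00:00` offset is needed.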

Step 2: Parse Session JSONL

For each selected session file, read it line by line. Because sessions use a tree structure, build the active branch first:
  1. Parse all entries into a map keyed by `id`
  2. Find the current leaf (the entry with no children, or the last `message` entry)
  3. Walk the `parentId` chain from leaf to root to get the active path
  4. Reverse the path so it's chronological
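
The four steps above might look like this in Python. It is a sketch, not Pi's own logic: when several entries have no children, it assumes the one appended last in the file is the current leaf, and the function names `load_entries` and `active_branch` are hypothetical.

```python
import json


def load_entries(lines):
    """Parse JSONL lines into (header, entries), skipping blank lines."""
    records = [json.loads(line) for line in lines if line.strip()]
    has_header = bool(records) and records[0].get("type") == "session"
    header = records[0] if has_header else None
    entries = records[1:] if has_header else records
    return header, entries


def active_branch(entries):
    """Return the root-to-leaf path that ends at the current leaf.

    Assumption: among entries with no children, the one that appears last
    in the file is the current leaf.
    """
    by_id = {e["id"]: e for e in entries}
    parent_ids = {e.get("parentId") for e in entries}
    leaves = [e for e in entries if e["id"] not in parent_ids]
    if not leaves:
        return []
    path, node, seen = [], leaves[-1], set()
    while node is not None and node["id"] not in seen:
        seen.add(node["id"])  # guard against malformed parent cycles
        path.append(node)
        node = by_id.get(node.get("parentId"))
    path.reverse()  # leaf-to-root becomes chronological order
    return path
```

Entries on other branches are simply never visited, which is what keeps abandoned turns out of the transcript while their `branch_summary` entries on the active path remain available.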

Extraction rules

From the active path, extract:
  • `session` header: `cwd`, `timestamp`, `parentSession` (if forked)
  • `session_info`: `name` field for session title/topic inference
  • `message` entries with `role: "user"`: extract `content` text (skip images)
  • `message` entries with `role: "assistant"`: extract `text` content blocks; skip `thinking` blocks (noise); note `toolCall` blocks (they reveal what the agent actually did)
  • `message` entries with `role: "toolResult"`: summarize outcomes, not full output
  • `message` entries with `role: "bashExecution"`: extract command + exit code; recurring commands reveal build/test/deploy workflows
  • `compaction` entries: read `summary` verbatim; it's already distilled
  • `branch_summary` entries: read `summary` verbatim; captures abandoned approaches
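
A possible shape for these rules in Python is shown below. The field layout is an assumption made for illustration: the sketch supposes that a `message` entry nests its payload under a `message` key, that content blocks carry a `type` of `text`, `thinking`, `toolCall`, or `image`, and that tool calls expose a `name`. Check `references/pi-data-format.md` for the actual field names before relying on any of this.

```python
def extract_text(content) -> str:
    """Join the text of a message's content, skipping non-text blocks."""
    if isinstance(content, str):
        return content
    return "\n".join(
        block.get("text", "")
        for block in (content or [])
        if block.get("type") == "text"  # drops thinking, image, toolCall
    )


def extract_entry(entry: dict):
    """Map one active-path entry to a (kind, payload) pair, or None to skip."""
    kind = entry.get("type")
    if kind in ("compaction", "branch_summary"):
        return kind, entry.get("summary", "")  # already distilled
    if kind == "session_info":
        return "title", entry.get("name", "")
    if kind != "message":
        return None  # model_change, label, custom, ...
    msg = entry.get("message", {})  # assumed nesting; verify on real data
    role = msg.get("role")
    content = msg.get("content")
    if role == "user":
        return "user", extract_text(content)
    if role == "assistant":
        blocks = content if isinstance(content, list) else []
        tools = [b.get("name") for b in blocks if b.get("type") == "toolCall"]
        return "assistant", {"text": extract_text(content), "tool_calls": tools}
    if role == "toolResult":
        return "tool_result", extract_text(content)  # summarize before storing
    if role == "bashExecution":
        return "bash", {"command": msg.get("command"), "exit_code": msg.get("exitCode")}
    return None
```

Returning `None` for everything unrecognized keeps the default conservative: an unknown entry type is skipped instead of being copied into the wiki.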

Skip / noise filters

  • `thinking` content blocks: internal reasoning, not durable knowledge
  • Image content blocks: skip unless the user explicitly asks for image transcription
  • Raw tool outputs longer than 500 chars: summarize the outcome
  • Token accounting (`usage` fields): metadata only
  • Repeated plan echoes or status updates
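
One way to enforce the 500-character rule mechanically is sketched below. The helper `clip_tool_output` is hypothetical; it does not summarize anything itself, it only replaces an oversized payload with a stub so the raw output cannot leak into a page while the real summary is written during distillation.

```python
MAX_RAW_CHARS = 500  # threshold taken from the list above


def clip_tool_output(text: str, limit: int = MAX_RAW_CHARS) -> str:
    """Pass short outputs through; reduce long ones to a stub.

    The stub keeps the original length and the first non-empty line
    (cut to 80 characters) as a hint for whoever writes the summary.
    """
    if len(text) <= limit:
        return text
    first_line = next((ln.strip() for ln in text.splitlines() if ln.strip()), "")
    return f"[{len(text)} chars omitted] {first_line[:80]}"
```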

Critical privacy filter

Session logs can include injected instructions, tool payloads, and sensitive text. Do not ingest verbatim.
  • Remove API keys, tokens, passwords, credentials
  • Redact private identifiers unless relevant and user-approved
  • Summarize bash outputs that contain paths, environment variables, or secrets
  • Do not quote raw `toolCall` arguments verbatim if they contain sensitive data
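
A first-pass redaction filter could look like the sketch below. The patterns are illustrative assumptions, not a complete secret scanner: they cover `KEY=value` style assignments, bearer tokens, and `sk-` prefixed keys, and will both miss some secrets and over-redact some harmless text. Pattern-based redaction supplements the manual review described above; it does not replace it.

```python
import re

# Illustrative patterns only; extend them for the credentials you actually use.
ASSIGNMENT = re.compile(
    r"(?i)([\w-]*(?:api[_-]?key|token|password|secret)[\w-]*)(\s*[=:]\s*)\S+"
)
OPAQUE_TOKENS = [
    re.compile(r"\bBearer\s+[A-Za-z0-9._~+/=-]{8,}"),
    re.compile(r"\bsk-[A-Za-z0-9_-]{16,}"),
]


def redact(text: str) -> str:
    """Replace likely secrets with a placeholder before anything is stored."""
    # Keep the variable name and separator, drop only the value.
    text = ASSIGNMENT.sub(r"\1\2[REDACTED]", text)
    for pattern in OPAQUE_TOKENS:
        text = pattern.sub("[REDACTED]", text)
    return text
```

Keeping the variable name while dropping the value preserves the useful fact (which credential a workflow needs) without storing the credential itself.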

Step 3: Cluster by Topic

Do not create one wiki page per session.
  • Group knowledge by stable topic across many sessions
  • Split mixed sessions into separate themes
  • Merge recurring patterns across dates and projects
  • Use the `cwd` from the session header to infer project scope
  • Use `session_info.name` as a topic hint when available

Step 4: Distill into Wiki Pages

Route extracted knowledge using existing wiki conventions:
  • Project-specific architecture/process → `projects/<name>/...`
  • General concepts → `concepts/`
  • Recurring techniques/debug playbooks → `skills/`
  • Tools/services/frameworks → `entities/`
  • Cross-session patterns → `synthesis/`
For each impacted project, create/update `projects/<name>/<name>.md`.
Writing rules

  • Distill knowledge, not chronology
  • Avoid "on date X we discussed..." unless date context is essential
  • Add `summary:` frontmatter on each new/updated page (1–2 sentences, ≤ 200 chars)
  • Add confidence and lifecycle fields to every new page:
    ```yaml
    base_confidence: 0.42
    lifecycle: draft
    lifecycle_changed: <ISO date today>
    ```
    Leave `lifecycle` unchanged on update.
  • Add provenance markers:
    • `^[extracted]` when directly grounded in explicit session content (compaction/branch summaries, explicit assistant statements)
    • `^[inferred]` when synthesizing patterns across multiple sessions or inferring from tool calls
    • `^[ambiguous]` when sessions conflict or a compaction summary contradicts later turns
  • Add/update the `provenance:` frontmatter mix for each changed page
Mark provenance per the convention in `llm-wiki`:
  • `compaction` and `branch_summary` entries are pre-distilled; treat them as mostly `^[extracted]`
  • Conversation distillation is mostly `^[inferred]`, since you're synthesizing from dialogue
  • Use `^[ambiguous]` when the user changed their mind across sessions or when compaction summaries disagree with later conversation turns

Step 5: Update Manifest, Log, and Index

Update `.manifest.json`

For each processed source file:
  • `ingested_at`, `size_bytes`, `modified_at`
  • `source_type`: `pi_session`
  • `project`: inferred project name from decoded `cwd`
  • `pages_created`, `pages_updated`
Add/update a top-level summary block:
```json
{
  "pi": {
    "source_path": "~/.pi/agent/sessions/",
    "last_ingested": "TIMESTAMP",
    "sessions_ingested": 12,
    "sessions_total": 40,
    "pages_created": 5,
    "pages_updated": 12
  }
}
```
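
The manifest update might be implemented along these lines. Several details are assumptions made for this sketch only: per-file records live under a hypothetical `files` key, `pages_created` and `pages_updated` are stored per file as lists of page paths, and the summary counts are the number of distinct pages across all recorded Pi sessions. The actual manifest schema is whatever `llm-wiki` defines.

```python
from datetime import datetime, timezone


def record_ingest(manifest, path, size_bytes, modified_at, project,
                  pages_created, pages_updated, sessions_total,
                  source_path="~/.pi/agent/sessions/"):
    """Record one processed session file and refresh the `pi` summary block."""
    now = datetime.now(timezone.utc).isoformat()
    files = manifest.setdefault("files", {})  # hypothetical container key
    files[path] = {
        "ingested_at": now,
        "size_bytes": size_bytes,
        "modified_at": modified_at,
        "source_type": "pi_session",
        "project": project,
        "pages_created": sorted(pages_created),
        "pages_updated": sorted(pages_updated),
    }
    pi_files = [f for f in files.values() if f.get("source_type") == "pi_session"]
    manifest["pi"] = {
        "source_path": source_path,
        "last_ingested": now,
        "sessions_ingested": len(pi_files),
        "sessions_total": sessions_total,
        # Distinct pages touched by the Pi sessions recorded so far.
        "pages_created": len({p for f in pi_files for p in f["pages_created"]}),
        "pages_updated": len({p for f in pi_files for p in f["pages_updated"]}),
    }
    return manifest
```

Keying records by path means re-ingesting a modified session overwrites its earlier record instead of adding a second one, so `sessions_ingested` does not drift upward on repeated syncs.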

Update special files

Update `index.md` and `log.md`:
- [TIMESTAMP] PI_HISTORY_INGEST sessions=N pages_updated=X pages_created=Y mode=append|full
`hot.md`: read `$OBSIDIAN_VAULT_PATH/hot.md` (create it from the template in `wiki-ingest` if missing). Update Recent Activity with a one-line summary, e.g. "Ingested 12 Pi sessions across 3 projects; surfaced patterns in CLI tooling and API design." Keep the last 3 operations. Update the `updated` timestamp.

Privacy and Compliance

  • Distill and synthesize; avoid raw transcript dumps
  • Default to redaction for anything that looks sensitive
  • Ask the user before storing personal or sensitive details
  • Keep references to other people minimal and purpose-bound

Reference

See `references/pi-data-format.md` for field-level parsing notes and extraction guidance.

QMD Refresh After Vault Writes

QMD is a search index, not the source of truth. If `$QMD_WIKI_COLLECTION` is empty or unset, skip this step. Run it only after this skill has written or rewritten vault markdown. If QMD refresh fails, do not roll back the vault changes; report the QMD status separately.
Use `$QMD_CLI` if set; otherwise use `qmd`.
```bash
${QMD_CLI:-qmd} update
```
If the output says vectors are needed or embeddings may be stale, run:
```bash
${QMD_CLI:-qmd} embed
```
Verify the collection with either:
```bash
${QMD_CLI:-qmd} ls "$QMD_WIKI_COLLECTION"
```
or, when a specific page path is known:
```bash
${QMD_CLI:-qmd} get "qmd://$QMD_WIKI_COLLECTION/<page>.md" -l 5
```
Record one of:
  • QMD refreshed: update + embed + verified
  • QMD refreshed: update only + verified
  • QMD skipped: QMD_WIKI_COLLECTION unset
  • QMD skipped: qmd CLI unavailable
  • QMD failed: <short error summary>
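
If the refresh is driven from a script instead of by hand, the skip and failure handling could be wrapped as in the Python sketch below. It only uses the `update` and `ls` subcommands shown above, deliberately leaves out the `embed` branch (deciding when embeddings are stale depends on output wording this sketch does not try to parse), and assumes `QMD_CLI` names a single executable with no extra arguments. The function name `qmd_refresh` is hypothetical.

```python
import os
import shutil
import subprocess


def qmd_refresh(env=os.environ) -> str:
    """Run the update and verify steps; return one of the status lines above."""
    collection = env.get("QMD_WIKI_COLLECTION")
    if not collection:
        return "QMD skipped: QMD_WIKI_COLLECTION unset"
    cli = env.get("QMD_CLI") or "qmd"  # mirrors ${QMD_CLI:-qmd}
    if shutil.which(cli) is None:
        return "QMD skipped: qmd CLI unavailable"
    try:
        subprocess.run([cli, "update"], check=True, capture_output=True, text=True)
        subprocess.run([cli, "ls", collection], check=True, capture_output=True, text=True)
    except (subprocess.CalledProcessError, OSError) as exc:
        # Vault changes stay in place; only the index status is reported.
        return f"QMD failed: {str(exc)[:80]}"
    return "QMD refreshed: update only + verified"
```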