pi-history-ingest

Pi History Ingest — Session Mining

You are extracting knowledge from the user's Pi coding agent sessions and distilling it into the Obsidian wiki. Pi sessions are stored as structured JSONL with a tree layout — your job is to follow the active branch, extract durable knowledge, and compile it.

This skill can be invoked directly or via the `wiki-history-ingest` router (`/wiki-history-ingest pi`).

Before You Start

1. Resolve config: follow the Config Resolution Protocol in `llm-wiki/SKILL.md` (walk up CWD for `.env` → `~/.obsidian-wiki/config` → prompt setup). This gives `OBSIDIAN_VAULT_PATH` and `PI_HISTORY_PATH` (defaults to `~/.pi/agent/sessions`).
2. Read `.manifest.json` at the vault root to check what has already been ingested.
3. Read `index.md` at the vault root to understand what the wiki already contains.

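A minimal sketch of that lookup order, assuming both `.env` and `~/.obsidian-wiki/config` use simple `KEY=VALUE` lines and that a file only counts if it defines `OBSIDIAN_VAULT_PATH`. Both are assumptions; `llm-wiki/SKILL.md` remains the authority on the protocol, and `parse_env_file`/`resolve_config` are hypothetical names. A `None` result corresponds to the "prompt setup" branch.

```python
from pathlib import Path
from typing import Optional


def parse_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; skip blanks and comments, allow an `export ` prefix."""
    values: dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_config(start: Path, home: Path) -> Optional[dict[str, str]]:
    """Return the first config that defines OBSIDIAN_VAULT_PATH, or None to prompt setup.

    Order (assumed): .env in `start` and each parent directory, then ~/.obsidian-wiki/config.
    """
    candidates = [directory / ".env" for directory in (start, *start.parents)]
    candidates.append(home / ".obsidian-wiki" / "config")
    for candidate in candidates:
        if candidate.is_file():
            config = parse_env_file(candidate)
            if "OBSIDIAN_VAULT_PATH" in config:
                # PI_HISTORY_PATH falls back to Pi's default session directory.
                config.setdefault("PI_HISTORY_PATH", str(home / ".pi" / "agent" / "sessions"))
                return config
    return None
```

Typical call: `resolve_config(Path.cwd(), Path.home())`.
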
Ingest Modes

Append Mode (default)

Check `.manifest.json` for each source file. Only process:

- Files not in the manifest (new sessions)
- Files whose modification time is newer than their `ingested_at` in the manifest

Use this mode for regular syncs.

Full Mode

Process everything regardless of the manifest. Use after `wiki-rebuild` or if the user explicitly asks for a full re-ingest.

Pi Data Layout

Pi stores sessions under `~/.pi/agent/sessions/` (or the path set by `PI_CODING_AGENT_SESSION_DIR`):

```
~/.pi/agent/sessions/
├── --<cwd-path>--/                    # Working directory with / replaced by -
│   └── <timestamp>_<uuid>.jsonl       # Session JSONL file
└── ...
```

The session filename contains an ISO timestamp and UUID. The parent directory encodes the working directory where the session was created.
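
A minimal sketch of recovering that metadata from a session path, assuming the encoding shown above (leading `/` dropped, remaining separators replaced by `-`, wrapped in `--`) and a `<timestamp>_<uuid>.jsonl` filename. The function names are hypothetical. The decoding is lossy: a directory whose own name contains `-` cannot be told apart from a separator, so treat the result as a hint and prefer the `cwd` field of the session header.

```python
from pathlib import Path


def decode_cwd(dir_name: str) -> str:
    """Turn an encoded session directory name back into an absolute path (lossy)."""
    inner = dir_name
    if inner.startswith("--"):
        inner = inner[2:]
    if inner.endswith("--"):
        inner = inner[:-2]
    # Every '-' becomes '/', including hyphens that belonged to a real directory name.
    return "/" + inner.replace("-", "/")


def split_session_filename(name: str) -> tuple[str, str]:
    """Split `<timestamp>_<uuid>.jsonl` into (timestamp, uuid) at the first underscore."""
    timestamp, sep, uuid = Path(name).stem.partition("_")
    if not sep:
        raise ValueError(f"unexpected session filename: {name}")
    return timestamp, uuid


def describe(session_path: str) -> dict:
    """Collect the path-derived metadata for one session file."""
    path = Path(session_path)
    timestamp, uuid = split_session_filename(path.name)
    return {"cwd_hint": decode_cwd(path.parent.name), "timestamp": timestamp, "uuid": uuid}
```

For example, `decode_cwd("--Users-alice-code-app--")` yields `/Users/alice/code/app`.
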

Session JSONL Format

Each `.jsonl` file is a sequence of JSON objects, one per line. The first line is always a `session` header; subsequent lines are tree entries with `id` and `parentId` fields.

Key entry types:

| `type` | Purpose | Ingest? |
|---|---|---|
| `session` | Header with `cwd`, `version`, `id`, `timestamp` | Metadata only |
| `message` | Conversation turn (`user`, `assistant`, `toolResult`, `bashExecution`, etc.) | Primary source |
| `session_info` | Display name set via `/name` | For session title |
| `compaction` | Context compaction summary | High signal |
| `branch_summary` | Summary when switching branches via `/tree` | High signal |
| `model_change` | Model switch event | Skip |
| `thinking_level_change` | Thinking level change | Skip |
| `custom` | Extension state (not in LLM context) | Skip |
| `custom_message` | Extension-injected message | Context only |
| `label` | User bookmark/label | Skip |

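The table can be applied mechanically once a file is loaded. This sketch reads a session file, checks that the first record is the `session` header, and tags every other entry with the action from the table. `load_session`, `INGEST_POLICY`, and the action labels are hypothetical names; unknown entry types default to `skip` as a conservative assumption.

```python
import json
from collections import Counter

# Action per entry `type`, mirroring the table above (labels are illustrative).
INGEST_POLICY = {
    "session": "metadata",
    "message": "primary",
    "session_info": "title",
    "compaction": "high_signal",
    "branch_summary": "high_signal",
    "custom_message": "context",
    "model_change": "skip",
    "thinking_level_change": "skip",
    "custom": "skip",
    "label": "skip",
}


def load_session(path: str) -> tuple[dict, list[dict]]:
    """Return (header, tree_entries) for one JSONL session file, ignoring blank lines."""
    with open(path, encoding="utf-8") as fh:
        records = [json.loads(line) for line in fh if line.strip()]
    if not records or records[0].get("type") != "session":
        raise ValueError(f"{path}: first line is not a session header")
    return records[0], records[1:]


def policy_counts(entries: list[dict]) -> Counter:
    """Count tree entries by the action the table assigns to their type."""
    return Counter(INGEST_POLICY.get(e.get("type"), "skip") for e in entries)
```
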
Message roles inside `message` entries

| Role | Meaning | Payload |
|---|---|---|
| `user` | User input | `content`: string or `(TextContent \| ImageContent)[]` |
| `assistant` | Assistant response | `content`: `(TextContent \| ThinkingContent \| ToolCall)[]` |
| `toolResult` | Tool execution result | `content`: `(TextContent \| ImageContent)[]` |
| `bashExecution` | Bash command + output | `command`, `output`, `exitCode` |
| `branchSummary` | Branch switch summary | `summary` (string) |
| `compactionSummary` | Compaction summary | `summary` (string) |

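Because `content` can be either a plain string or an array of typed blocks, a small normalizer keeps later extraction simple. This sketch assumes each block carries a `type` discriminator (`text`, `thinking`, `image`, `toolCall`) with text under `text` and the tool name under `name`; verify those field names against `references/pi-data-format.md`. It keeps `text` blocks, reduces `toolCall` blocks to the tool name (so raw arguments are never copied), and drops `thinking` and image blocks.

```python
def content_text(content) -> list[str]:
    """Flatten a message `content` value into ingestible text snippets."""
    if isinstance(content, str):  # user messages may use a bare string
        return [content]
    snippets = []
    for block in content or []:
        kind = block.get("type")
        if kind == "text":
            snippets.append(block.get("text", ""))
        elif kind == "toolCall":
            # Record that a tool ran, not its raw arguments (see the privacy filter).
            snippets.append(f"[tool call: {block.get('name', 'unknown')}]")
        # "thinking" and "image" blocks are intentionally dropped as noise.
    return snippets
```
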
Key data sources ranked by value

1. `message` entries (`user` + `assistant`): full conversation transcripts; rich but noisy
2. `compaction` entries: pre-synthesized summaries of older context; gold
3. `branch_summary` entries: summaries of abandoned branches; good signal
4. `bashExecution` messages: concrete commands run; useful for workflow patterns
5. `session_info` entries: session name for topic inference

Skip `model_change`, `thinking_level_change`, `custom` (extension state), and `label` entries.

Step 1: Survey and Compute Delta

Scan `PI_HISTORY_PATH` and compare against `.manifest.json`:

```bash
# List all session files
find ~/.pi/agent/sessions -name "*.jsonl" -type f

# Or with custom path
find "$PI_HISTORY_PATH" -name "*.jsonl" -type f
```

Build an inventory. For each session file, record:
- `path` — absolute path
- `cwd` — decoded from parent directory name (`--<path>--` → `/path`)
- `session_name` — from the latest `session_info` entry (if any)
- `modified_at` — file mtime
- `already_ingested` — presence in `.manifest.json`

Classify each file:
- **New** — not in manifest
- **Modified** — in manifest but file is newer than `ingested_at`
- **Unchanged** — already ingested and unchanged

Report a concise delta summary before deep parsing:
> "Found N Pi sessions across K projects. Delta: X new, Y modified."
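
A sketch of the classification and the delta report. It assumes the manifest keeps per-file records in a dict keyed by absolute path (a hypothetical layout; adapt the lookup to your manifest) and that `ingested_at` is an ISO-8601 timestamp. Each session directory is counted as one project.

```python
from datetime import datetime, timezone
from pathlib import Path


def parse_iso(ts: str) -> datetime:
    """Parse an ISO-8601 timestamp, treating 'Z' and naive values as UTC."""
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def classify(path: Path, sources: dict) -> str:
    """Return 'new', 'modified', or 'unchanged' for a single session file."""
    record = sources.get(str(path.resolve()))
    if record is None:
        return "new"
    mtime = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
    return "modified" if mtime > parse_iso(record["ingested_at"]) else "unchanged"


def survey(history_root: Path, sources: dict) -> dict[str, list[Path]]:
    """Classify every *.jsonl file under history_root."""
    groups: dict[str, list[Path]] = {"new": [], "modified": [], "unchanged": []}
    for path in sorted(history_root.rglob("*.jsonl")):
        if path.is_file():
            groups[classify(path, sources)].append(path)
    return groups


def delta_summary(groups: dict[str, list[Path]]) -> str:
    """Render the one-line report before deep parsing."""
    paths = [p for group in groups.values() for p in group]
    projects = {p.parent for p in paths}
    return (
        f"Found {len(paths)} Pi sessions across {len(projects)} projects. "
        f"Delta: {len(groups['new'])} new, {len(groups['modified'])} modified."
    )
```

In append mode only the `new` and `modified` groups go on to Step 2; full mode processes all three.
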

Step 2: Parse Session JSONL

For each selected session file, read it line by line. Because sessions use a tree structure, build the active branch first:

1. Parse all entries into a map keyed by `id`
2. Find the current leaf (the entry with no children, or the last `message` entry)
3. Walk the `parentId` chain from the leaf to the root to get the active path
4. Reverse the path so it is chronological

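A sketch of those four steps over the tree entries (every line after the header, each parsed with `json.loads`). It assumes `parentId` is null at the root and that the file is append-only, so the latest childless entry in file order is the active leaf; abandoned branches also end in childless entries, which is why file order breaks the tie. The walk stops on a missing parent or a cycle instead of looping.

```python
def active_branch(entries: list[dict]) -> list[dict]:
    """Return the entries on the active branch, oldest first."""
    by_id = {e["id"]: e for e in entries if "id" in e}  # step 1: map by id
    parents = {e.get("parentId") for e in entries if e.get("parentId") is not None}
    leaves = [e for e in entries if "id" in e and e["id"] not in parents]
    if not leaves:
        return []
    node, path, seen = leaves[-1], [], set()  # step 2: latest childless entry
    while node is not None and node["id"] not in seen:  # step 3: walk parentId
        seen.add(node["id"])
        path.append(node)
        node = by_id.get(node.get("parentId"))
    path.reverse()  # step 4: leaf-to-root walk becomes chronological
    return path
```
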
Extraction rules

From the active path, extract:

- `session` header: `cwd`, `timestamp`, `parentSession` (if forked)
- `session_info`: `name` field for session title/topic inference
- `message` entries with `role: "user"`: extract `content` text (skip images)
- `message` entries with `role: "assistant"`: extract `text` content blocks; skip `thinking` blocks (noise); note `toolCall` blocks (they reveal what the agent actually did)
- `message` entries with `role: "toolResult"`: summarize outcomes, not full output
- `message` entries with `role: "bashExecution"`: extract command + exit code; recurring commands reveal build/test/deploy workflows
- `compaction` entries: read `summary` verbatim; it is already distilled
- `branch_summary` entries: read `summary` verbatim; it captures abandoned approaches

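To surface the recurring build/test/deploy commands mentioned above, count `bashExecution` commands across the active paths of many sessions. This sketch assumes a `message` entry nests its payload under a `message` key carrying `role`, `command`, and `exitCode`; the helper name and the `min_count` threshold are hypothetical.

```python
from collections import Counter
from typing import Iterable


def recurring_commands(paths: Iterable[list[dict]], min_count: int = 2) -> list[tuple[str, int, int]]:
    """Return (command, runs, failures) for commands seen at least `min_count` times.

    `paths` holds one active path (list of entries) per session.
    """
    runs, failures = Counter(), Counter()
    for path in paths:
        for entry in path:
            msg = entry.get("message") or {}
            if entry.get("type") != "message" or msg.get("role") != "bashExecution":
                continue
            command = " ".join(str(msg.get("command", "")).split())  # normalize whitespace
            if not command:
                continue
            runs[command] += 1
            if msg.get("exitCode") not in (0, None):  # a missing exit code is not counted as a failure
                failures[command] += 1
    return [(cmd, n, failures[cmd]) for cmd, n in runs.most_common() if n >= min_count]
```

A high failure ratio on a recurring command is often worth a debug playbook under `skills/`.
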
Skip / noise filters

- `thinking` content blocks: internal reasoning, not durable knowledge
- Image content blocks: skip unless the user explicitly asks for image transcription
- Raw tool outputs longer than 500 chars: summarize the outcome
- Token accounting (`usage` fields): metadata only
- Repeated plan echoes or status updates

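A rough pre-filter for these rules, applied before the distillation pass. The 500-character limit comes from the rule above; replacing long output with its first line plus a length note is just one hypothetical way to flag it for summarization, `usage` is assumed to be a top-level key of the message payload, and only consecutive repeats are caught.

```python
def clip_tool_output(text: str, limit: int = 500) -> str:
    """Keep short outputs; reduce long ones to a first-line stub to be summarized."""
    if len(text) <= limit:
        return text
    first_line = text.strip().splitlines()[0] if text.strip() else ""
    return f"{first_line[:120]} [... {len(text)} chars; summarize outcome]"


def strip_usage(payload: dict) -> dict:
    """Drop token-accounting fields, which are metadata only."""
    return {k: v for k, v in payload.items() if k != "usage"}


def drop_repeats(snippets: list[str]) -> list[str]:
    """Remove consecutive duplicate snippets such as repeated status updates."""
    out: list[str] = []
    for s in snippets:
        if not out or s.strip() != out[-1].strip():
            out.append(s)
    return out
```
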
Critical privacy filter

Session logs can include injected instructions, tool payloads, and sensitive text. Do not ingest verbatim.

- Remove API keys, tokens, passwords, and credentials
- Redact private identifiers unless relevant and user-approved
- Summarize bash outputs that contain paths, environment variables, or secrets
- Do not quote raw `toolCall` arguments verbatim if they contain sensitive data

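A first-pass redaction helper, offered only as a sketch. The patterns are illustrative assumptions covering a few common shapes (secret-looking `KEY=value` pairs, `Bearer` tokens, AWS access key IDs, PEM private-key blocks); they will miss other formats, so a clean result is necessary but not sufficient, and output still needs review.

```python
import re

# Illustrative patterns only; extend them for the credential formats you actually see.
SECRET_PATTERNS = [
    # key=value or key: value pairs whose key looks secret-bearing
    (re.compile(r"(?i)\b([A-Z0-9_]*(?:API_KEY|TOKEN|SECRET|PASSWORD|PASSWD)[A-Z0-9_]*)(\s*[=:]\s*)(\"[^\"]*\"|'[^']*'|\S+)"),
     r"\1\2[REDACTED]"),
    # HTTP bearer tokens
    (re.compile(r"(?i)\bBearer\s+[A-Za-z0-9._~+/=-]+"), "Bearer [REDACTED]"),
    # AWS access key IDs
    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "[REDACTED_AWS_KEY]"),
    # PEM private key blocks (DOTALL so the body can span lines)
    (re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----", re.S),
     "[REDACTED_PRIVATE_KEY]"),
]


def redact(text: str) -> str:
    """Apply each pattern in turn and return the redacted text."""
    for pattern, replacement in SECRET_PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```
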
Step 3: Cluster by Topic

Do not create one wiki page per session.

- Group knowledge by stable topic across many sessions
- Split mixed sessions into separate themes
- Merge recurring patterns across dates and projects
- Use the `cwd` from the session header to infer project scope
- Use `session_info.name` as a topic hint when available

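As a starting point, sessions can be bucketed by project before any topic analysis. This sketch assumes each session has already been reduced to a dict with `cwd` (from the header) and an optional `name` (from the latest `session_info`). Using the last path component of `cwd` as the project name is a hypothetical convention, and the collected names only seed topic clustering; the actual grouping still comes from reading the distilled content.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def project_name(cwd: str) -> str:
    """Use the last path component of cwd as a project slug (hypothetical convention)."""
    return PurePosixPath(cwd).name or "root"


def bucket_sessions(sessions: list[dict]) -> dict[str, dict]:
    """Group sessions by project and collect normalized session names as topic hints."""
    buckets = defaultdict(lambda: {"sessions": 0, "topic_hints": set()})
    for s in sessions:
        bucket = buckets[project_name(s["cwd"])]
        bucket["sessions"] += 1
        if s.get("name"):
            bucket["topic_hints"].add(s["name"].strip().lower())
    return dict(buckets)
```
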
Step 4: Distill into Wiki Pages

Route extracted knowledge using existing wiki conventions:

- Project-specific architecture/process → `projects/<name>/...`
- General concepts → `concepts/`
- Recurring techniques/debug playbooks → `skills/`
- Tools/services/frameworks → `entities/`
- Cross-session patterns → `synthesis/`

For each impacted project, create/update `projects/<name>/<name>.md`.

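The routing table can be encoded as a lookup so every extracted note lands in a predictable folder. The category labels (`project`, `concept`, `skill`, `entity`, `synthesis`) and the slug rule are hypothetical; deciding which category a note belongs to remains a judgment call made while distilling.

```python
import re
from typing import Optional

ROUTES = {
    "concept": "concepts/",
    "skill": "skills/",
    "entity": "entities/",
    "synthesis": "synthesis/",
}


def slugify(title: str) -> str:
    """Lowercase, hyphen-separated file stem (hypothetical slug rule)."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-") or "untitled"


def route(kind: str, title: str, project: Optional[str] = None) -> str:
    """Return the vault-relative page path for an extracted note."""
    if kind == "project":
        if not project:
            raise ValueError("project notes need a project name")
        return f"projects/{project}/{slugify(title)}.md"
    if kind not in ROUTES:
        raise ValueError(f"unknown kind: {kind}")
    return f"{ROUTES[kind]}{slugify(title)}.md"


def project_index(project: str) -> str:
    """Path of the per-project overview page to create or update."""
    return f"projects/{project}/{project}.md"
```
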
Writing rules

- Distill knowledge, not chronology
- Avoid "on date X we discussed..." unless date context is essential
- Add `summary:` frontmatter on each new/updated page (1–2 sentences, ≤ 200 chars)

Add confidence and lifecycle fields to every new page:

```yaml
base_confidence: 0.42
lifecycle: draft
lifecycle_changed: <ISO date today>
```

Leave `lifecycle` unchanged on update.

Add provenance markers:

- `^[extracted]` when directly grounded in explicit session content (compaction/branch summaries, explicit assistant statements)
- `^[inferred]` when synthesizing patterns across multiple sessions or inferring from tool calls
- `^[ambiguous]` when sessions conflict or a compaction summary contradicts later turns

Add/update the `provenance:` frontmatter mix for each changed page.

Mark provenance per the convention in `llm-wiki`:

- `compaction` and `branch_summary` entries are pre-distilled; treat them as mostly `^[extracted]`
- Conversation distillation is mostly `^[inferred]`, since you are synthesizing from dialogue
- Use `^[ambiguous]` when the user changed their mind across sessions or when compaction summaries disagree with later conversation turns

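The frontmatter rules can be checked mechanically. This sketch builds the header for a new page and derives a `provenance:` mix by counting the three markers in the body. Expressing the mix as rounded fractions is an assumption made for illustration; follow whatever shape the `llm-wiki` convention actually specifies. For an updated page, `lifecycle` would be carried over rather than regenerated.

```python
import json
import re
from datetime import date

MARKER = re.compile(r"\^\[(extracted|inferred|ambiguous)\]")


def provenance_mix(body: str) -> dict[str, float]:
    """Fraction of each provenance marker in a page body (empty dict if none)."""
    counts = {"extracted": 0, "inferred": 0, "ambiguous": 0}
    for kind in MARKER.findall(body):
        counts[kind] += 1
    total = sum(counts.values())
    return {k: round(v / total, 2) for k, v in counts.items()} if total else {}


def new_page_frontmatter(summary: str, body: str, today: date) -> str:
    """Render YAML frontmatter for a newly created page."""
    if len(summary) > 200:
        raise ValueError("summary must be 200 characters or fewer")
    lines = [
        "---",
        f"summary: {json.dumps(summary)}",  # a JSON string is a valid YAML double-quoted scalar
        "base_confidence: 0.42",
        "lifecycle: draft",
        f"lifecycle_changed: {today.isoformat()}",
    ]
    mix = provenance_mix(body)
    if mix:
        lines.append("provenance:")
        lines += [f"  {k}: {v}" for k, v in mix.items()]
    lines.append("---")
    return "\n".join(lines) + "\n"
```
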
Step 5: Update Manifest, Log, and Index

Update `.manifest.json`

For each processed source file:

- `ingested_at`, `size_bytes`, `modified_at`
- `source_type`: `pi_session`
- `project`: inferred project name from decoded `cwd`
- `pages_created`, `pages_updated`

Add/update a top-level summary block:

```json
{
  "pi": {
    "source_path": "~/.pi/agent/sessions/",
    "last_ingested": "TIMESTAMP",
    "sessions_ingested": 12,
    "sessions_total": 40,
    "pages_created": 5,
    "pages_updated": 12
  }
}
```

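A sketch of the manifest update under stated assumptions: per-file records live under a hypothetical `sources` object keyed by absolute path, `pages_created`/`pages_updated` are stored per file as lists of page paths, and timestamps are UTC ISO-8601 strings. The `pi` block is recomputed from all recorded sessions (distinct pages across them); `sessions_total` comes from the Step 1 survey. Writing through a temporary file and `os.replace` avoids leaving a truncated manifest if the process dies mid-write.

```python
import json
import os
from datetime import datetime, timezone
from pathlib import Path


def utc_iso(ts: float) -> str:
    return datetime.fromtimestamp(ts, timezone.utc).isoformat(timespec="seconds")


def record_session(manifest: dict, path: Path, project: str,
                   created: list[str], updated: list[str]) -> None:
    """Store the per-file fields for one processed session."""
    stat = path.stat()
    manifest.setdefault("sources", {})[str(path.resolve())] = {
        "ingested_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "size_bytes": stat.st_size,
        "modified_at": utc_iso(stat.st_mtime),
        "source_type": "pi_session",
        "project": project,
        "pages_created": created,
        "pages_updated": updated,
    }


def refresh_pi_summary(manifest: dict, source_path: str, sessions_total: int) -> None:
    """Recompute the top-level `pi` block from the per-file records."""
    records = [r for r in manifest.get("sources", {}).values() if r.get("source_type") == "pi_session"]
    manifest["pi"] = {
        "source_path": source_path,
        "last_ingested": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "sessions_ingested": len(records),
        "sessions_total": sessions_total,
        "pages_created": len({p for r in records for p in r["pages_created"]}),
        "pages_updated": len({p for r in records for p in r["pages_updated"]}),
    }


def save_manifest(manifest: dict, manifest_path: Path) -> None:
    """Write atomically: dump to a sibling temp file, then rename over the original."""
    tmp = manifest_path.with_name(manifest_path.name + ".tmp")
    tmp.write_text(json.dumps(manifest, indent=2) + "\n", encoding="utf-8")
    os.replace(tmp, manifest_path)
```
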
Update special files

Update `index.md` and `log.md`:

- [TIMESTAMP] PI_HISTORY_INGEST sessions=N pages_updated=X pages_created=Y mode=append|full

`hot.md`: read `$OBSIDIAN_VAULT_PATH/hot.md` (create it from the template in `wiki-ingest` if missing). Update Recent Activity with a one-line summary, e.g. "Ingested 12 Pi sessions across 3 projects; surfaced patterns in CLI tooling and API design." Keep the last 3 operations. Update the `updated` timestamp.

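Both special-file updates are small text edits. This sketch formats the `log.md` line shown above and trims Recent Activity to the three most recent operations. It assumes a UTC ISO-8601 timestamp and a newest-first list of one-line summaries; the actual `hot.md` template from `wiki-ingest` may order entries differently, and the function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def log_line(sessions: int, created: int, updated: int, mode: str,
             when: Optional[datetime] = None) -> str:
    """Format one PI_HISTORY_INGEST entry for log.md."""
    if mode not in ("append", "full"):
        raise ValueError("mode must be 'append' or 'full'")
    ts = (when or datetime.now(timezone.utc)).isoformat(timespec="seconds")
    return (f"- [{ts}] PI_HISTORY_INGEST sessions={sessions} "
            f"pages_updated={updated} pages_created={created} mode={mode}")


def push_recent(activity: list[str], summary: str, keep: int = 3) -> list[str]:
    """Put the newest one-line summary first and keep only the last `keep` operations."""
    return [summary, *activity][:keep]
```
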
Privacy and Compliance

- Distill and synthesize; avoid raw transcript dumps
- Default to redaction for anything that looks sensitive
- Ask the user before storing personal or sensitive details
- Keep references to other people minimal and purpose-bound

Reference

See `references/pi-data-format.md` for field-level parsing notes and extraction guidance.

QMD Refresh After Vault Writes

QMD is a search index, not the source of truth. If `$QMD_WIKI_COLLECTION` is empty or unset, skip this step. Run it only after this skill has written or rewritten vault markdown. If the QMD refresh fails, do not roll back the vault changes; report the QMD status separately.

Use `$QMD_CLI` if set; otherwise use `qmd`:

```bash
${QMD_CLI:-qmd} update
```

If the output says vectors are needed or embeddings may be stale, run:

```bash
${QMD_CLI:-qmd} embed
```

Verify the collection with either:

```bash
${QMD_CLI:-qmd} ls "$QMD_WIKI_COLLECTION"
```

or, when a specific page path is known:

```bash
${QMD_CLI:-qmd} get "qmd://$QMD_WIKI_COLLECTION/<page>.md" -l 5
```

Record one of:

- `QMD refreshed: update + embed + verified`
- `QMD refreshed: update only + verified`
- `QMD skipped: QMD_WIKI_COLLECTION unset`
- `QMD skipped: qmd CLI unavailable`
- `QMD failed: <short error summary>`

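For a scripted run, the same decision flow can be wrapped so it always ends in one of the status lines above. This Python sketch only invokes the `update`, `embed`, and `ls` subcommands already shown. Deciding whether `embed` is needed by scanning the `update` output for "embed" or "vector" is a rough heuristic, not documented qmd behavior; adjust it to what your qmd version actually prints.

```python
import os
import shlex
import shutil
import subprocess
from typing import Mapping, Optional


def qmd_refresh(env: Optional[Mapping[str, str]] = None) -> str:
    """Run update (+ embed if hinted) + ls and return one status line.

    Only the search index is touched; vault files are never rolled back here.
    """
    env = os.environ if env is None else env
    collection = env.get("QMD_WIKI_COLLECTION", "").strip()
    if not collection:
        return "QMD skipped: QMD_WIKI_COLLECTION unset"
    cli = shlex.split(env.get("QMD_CLI") or "qmd")  # mirrors ${QMD_CLI:-qmd}
    if not cli or shutil.which(cli[0]) is None:
        return "QMD skipped: qmd CLI unavailable"

    def run(*args: str) -> str:
        proc = subprocess.run([*cli, *args], capture_output=True, text=True, check=True)
        return proc.stdout + proc.stderr

    step = "update"
    try:
        output = run("update").lower()
        # Heuristic only: any mention of embeddings/vectors triggers an embed pass.
        needs_embed = "embed" in output or "vector" in output
        if needs_embed:
            step = "embed"
            run("embed")
        step = "ls"
        run("ls", collection)
    except subprocess.CalledProcessError as exc:
        return f"QMD failed: {step} exited with status {exc.returncode}"
    except OSError as exc:
        return f"QMD failed: {step}: {exc}"
    if needs_embed:
        return "QMD refreshed: update + embed + verified"
    return "QMD refreshed: update only + verified"
```
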