pi-history-ingest
# Pi History Ingest — Session Mining
You are extracting knowledge from the user's Pi coding agent sessions and distilling it into the Obsidian wiki. Pi sessions are stored as structured JSONL with a tree layout — your job is to follow the active branch, extract durable knowledge, and compile it.
This skill can be invoked directly or via the `wiki-history-ingest` router (`/wiki-history-ingest pi`).

## Before You Start
- Resolve config — follow the Config Resolution Protocol in `llm-wiki/SKILL.md` (walk up CWD for `.env` → `~/.obsidian-wiki/config` → prompt setup). This gives `OBSIDIAN_VAULT_PATH` and `PI_HISTORY_PATH` (defaults to `~/.pi/agent/sessions`)
- Read `.manifest.json` at the vault root to check what has already been ingested
- Read `index.md` at the vault root to understand what the wiki already contains
## Ingest Modes
### Append Mode (default)
Check `.manifest.json` for each source file. Only process:

- Files not in the manifest (new sessions)
- Files whose modification time is newer than their `ingested_at` in the manifest

Use this mode for regular syncs.
### Full Mode
Process everything regardless of the manifest. Use after `wiki-rebuild` or if the user explicitly asks for a full re-ingest.

## Pi Data Layout
Pi stores sessions under `~/.pi/agent/sessions/` (or the path set by `PI_CODING_AGENT_SESSION_DIR`).

```
~/.pi/agent/sessions/
├── --<cwd-path>--/               # Working directory with / replaced by -
│   └── <timestamp>_<uuid>.jsonl  # Session JSONL file
└── ...
```

The session filename contains an ISO timestamp and UUID. The parent directory encodes the working directory where the session was created.
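The directory and file naming above can be decoded with two small helpers. This is a minimal Python sketch, assuming the `--<path>--` wrapping and `/` → `-` substitution described here; `decode_cwd` and `parse_session_filename` are hypothetical names, not part of Pi.

```python
from pathlib import Path


def decode_cwd(dir_name: str) -> str:
    """Turn a session directory name like '--home-alice-app--' into '/home/alice/app'.

    Assumes the leading '/' is implied, every other '/' became '-', and the
    result is wrapped in '--'. The encoding is lossy: a literal '-' in a
    directory name also decodes to '/', so treat the result as a hint.
    """
    inner = dir_name
    if len(inner) >= 4 and inner.startswith("--") and inner.endswith("--"):
        inner = inner[2:-2]
    return "/" + inner.replace("-", "/")


def parse_session_filename(path: Path) -> tuple[str, str]:
    """Split '<timestamp>_<uuid>.jsonl' into (timestamp, uuid).

    The timestamp is returned as raw filename text; its exact encoding
    is not specified here, so it is not parsed further.
    """
    timestamp, _, uuid = path.stem.rpartition("_")
    return timestamp, uuid
```

Because of the lossy encoding, a project directory named `my-app` decodes as `my/app`; the `cwd` field in the session header is the more reliable source when it is present.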
## Session JSONL Format
Each `.jsonl` file is a sequence of JSON objects. The first line is always a `session` header; subsequent lines are tree entries with `id` and `parentId`.

Key entry types:

| Type | Purpose | Ingest? |
|---|---|---|
| `session` | Header with … | Metadata only |
| `message` | Conversation turn (…) | Primary source |
| `session_info` | Display name set via … | For session title |
| `compaction` | Context compaction summary | High signal |
| `branch_summary` | Summary when switching branches via … | High signal |
| `model_change` | Model switch event | Skip |
| `thinking_level_change` | Thinking level change | Skip |
| `custom` | Extension state (not in LLM context) | Skip |
| … | Extension-injected message | Context only |
| `label` | User bookmark/label | Skip |
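The table can be applied mechanically as a first pass over a file. The sketch below buckets one session file's lines by ingest policy; it assumes each JSONL object carries a `type` field holding the entry type names used in this document. The extension-injected message type is deliberately left out because its name is not given, so such entries land in `unknown`.

```python
import json

# Ingest policy per entry type, following the "Key entry types" table.
# Any type not listed here maps to "unknown" instead of being guessed.
INGEST_POLICY = {
    "session": "metadata",            # header line
    "message": "primary",
    "session_info": "title",
    "compaction": "high_signal",
    "branch_summary": "high_signal",
    "model_change": "skip",
    "thinking_level_change": "skip",
    "custom": "skip",                 # extension state, not in LLM context
    "label": "skip",
}


def triage(lines: list[str]) -> dict[str, list[dict]]:
    """Group the JSON objects of one session file by ingest policy."""
    buckets: dict[str, list[dict]] = {}
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines
        entry = json.loads(raw)
        policy = INGEST_POLICY.get(entry.get("type"), "unknown")
        buckets.setdefault(policy, []).append(entry)
    return buckets
```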
### Message roles inside `message` entries
- `user` — user input; `content` is string or `(TextContent | ImageContent)[]`
- `assistant` — assistant response; `content` is `(TextContent | ThinkingContent | ToolCall)[]`
- `toolResult` — tool execution result; `content` is `(TextContent | ImageContent)[]`
- `bashExecution` — bash command + output; `command`, `output`, `exitCode`
- `branchSummary` — branch switch summary; `summary` string
- `compactionSummary` — compaction summary; `summary` string
### Key data sources ranked by value
- `message` entries (`user` + `assistant`) — full conversation transcripts; rich but noisy
- `compaction` entries — pre-synthesized summaries of older context; gold
- `branch_summary` entries — summaries of abandoned branches; good signal
- `bashExecution` entries — concrete commands run; useful for workflow patterns
- `session_info` entries — session name for topic inference

Skip `model_change`, `thinking_level_change`, `custom` (extension state), and `label` entries.
## Step 1: Survey and Compute Delta
Scan `PI_HISTORY_PATH` and compare against `.manifest.json`:

```bash
# List all session files
find ~/.pi/agent/sessions -name "*.jsonl" -type f

# Or with custom path
find "$PI_HISTORY_PATH" -name "*.jsonl" -type f
```
Build an inventory. For each session file, record:
- `path` — absolute path
- `cwd` — decoded from parent directory name (`--<path>--` → `/path`)
- `session_name` — from the latest `session_info` entry (if any)
- `modified_at` — file mtime
- `already_ingested` — presence in `.manifest.json`
Classify each file:
- **New** — not in manifest
- **Modified** — in manifest but file is newer than `ingested_at`
- **Unchanged** — already ingested and unchanged
Report a concise delta summary before deep parsing:
> "Found N Pi sessions across K projects. Delta: X new, Y modified."
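The New / Modified / Unchanged split can be computed as below. This is a sketch under two assumptions: the per-file `ingested_at` values have already been read out of `.manifest.json` into a plain `path → ISO timestamp` dict (the manifest's exact layout is not specified here), and ISO timestamps without an offset are UTC. `classify` is a hypothetical helper name.

```python
from datetime import datetime, timezone
from pathlib import Path


def _parse_iso(ts: str) -> datetime:
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11.
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    # Assumption: a timestamp without an offset is UTC.
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def classify(root: Path, ingested_at: dict[str, str]) -> dict[str, list[str]]:
    """Classify every *.jsonl under root as new, modified, or unchanged.

    ingested_at maps absolute session path -> ISO ingest timestamp.
    """
    delta: dict[str, list[str]] = {"new": [], "modified": [], "unchanged": []}
    for path in sorted(root.rglob("*.jsonl")):
        if not path.is_file():
            continue
        key = str(path.resolve())
        mtime = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
        if key not in ingested_at:
            delta["new"].append(key)
        elif mtime > _parse_iso(ingested_at[key]):
            delta["modified"].append(key)
        else:
            delta["unchanged"].append(key)
    return delta
```

In append mode only the `new` and `modified` lists go on to Step 2; their lengths are the X and Y of the delta report.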
## Step 2: Parse Session JSONL
For each selected session file, read it line by line. Because sessions use a tree structure, build the active branch first:
- Parse all entries into a map by `id`
- Find the current leaf (the entry with no children, or the last `message` entry)
- Walk the `parentId` chain from leaf to root to get the active path
- Reverse the path so it's chronological
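A minimal Python sketch of the four steps above, assuming the header line has already been set aside and every remaining entry has an `id` and a `parentId` (`null` for the root). It takes the most recently appended entry with no children as the leaf, which presumes the file is written in append order; `active_path` is a hypothetical name.

```python
def active_path(entries: list[dict]) -> list[dict]:
    """Return the active branch of a session tree, root first."""
    by_id = {e["id"]: e for e in entries}
    referenced = {e.get("parentId") for e in entries}
    # Leaf: the most recently appended entry that no other entry points to.
    # With append-only files this is normally the final line.
    leaf = next((e for e in reversed(entries) if e["id"] not in referenced), None)
    path: list[dict] = []
    seen: set = set()
    node = leaf
    while node is not None and node["id"] not in seen:
        seen.add(node["id"])  # guards against a malformed parentId cycle
        path.append(node)
        node = by_id.get(node.get("parentId"))
    path.reverse()  # leaf-to-root becomes chronological order
    return path
```

Entries on abandoned branches are dropped here; if a `branch_summary` entry sits on the active path, the abandoned work still contributes through that summary.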
### Extraction rules
From the active path, extract:
- `session` header — `cwd`, `timestamp`, `parentSession` (if forked)
- `session_info` — `name` field for session title/topic inference
- `message` entries with `role: "user"` — extract `content` text (skip images)
- `message` entries with `role: "assistant"` — extract `text` content blocks; skip `thinking` blocks (noise); note `toolCall` blocks (they reveal what the agent actually did)
- `message` entries with `role: "toolResult"` — summarize outcomes, not full output
- `message` entries with `role: "bashExecution"` — extract command + exit code; recurring commands reveal build/test/deploy workflows
- `compaction` entries — read `summary` verbatim; it's already distilled
- `branch_summary` entries — read `summary` verbatim; captures abandoned approaches
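The rules above amount to a per-entry dispatch over the active path. The sketch below is one way to write it; the nesting of a message under a `message` key, the `text` and `name` fields on content blocks, and the `extract` helper itself are assumptions for illustration, not Pi's documented schema. Its output should still pass through the privacy filter before anything reaches the wiki.

```python
from typing import Optional

MAX_TOOL_OUTPUT = 500  # longer raw outputs are summarized, not copied


def _blocks(content) -> list[dict]:
    # User content may be a plain string; normalize it to one text block.
    if isinstance(content, str):
        return [{"type": "text", "text": content}]
    return list(content or [])


def _text(content) -> str:
    return " ".join(b.get("text", "") for b in _blocks(content) if b.get("type") == "text")


def extract(entry: dict) -> Optional[str]:
    """Reduce one active-path entry to a short note, or None to skip it."""
    kind = entry.get("type")
    if kind in ("compaction", "branch_summary"):
        return entry.get("summary")  # already distilled: keep verbatim
    if kind == "session_info":
        return f"title: {entry.get('name')}"
    if kind != "message":
        return None
    msg = entry.get("message", {})
    role = msg.get("role")
    if role == "bashExecution":
        return f"$ {msg.get('command')} (exit {msg.get('exitCode')})"
    if role == "toolResult":
        out = _text(msg.get("content"))
        if len(out) > MAX_TOOL_OUTPUT:
            return f"tool result: [{len(out)} chars, summarize the outcome]"
        return f"tool result: {out}"
    if role == "user":
        return f"user: {_text(msg.get('content'))}"  # image blocks are dropped
    if role == "assistant":
        parts = []
        for block in _blocks(msg.get("content")):
            if block.get("type") == "text":
                parts.append(block.get("text", ""))
            elif block.get("type") == "toolCall":
                parts.append(f"[tool: {block.get('name')}]")
            # thinking blocks are skipped as noise
        return "assistant: " + " ".join(p for p in parts if p)
    return None
```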
### Skip / noise filters
- `thinking` content blocks — internal reasoning, not durable knowledge
- Image content blocks — skip unless the user explicitly asks for image transcription
- Raw tool outputs longer than 500 chars — summarize the outcome
- Token accounting (`usage` fields) — metadata only
- Repeated plan echoes or status updates
### Critical privacy filter
Session logs can include injected instructions, tool payloads, and sensitive text. Do not ingest verbatim.
- Remove API keys, tokens, passwords, credentials
- Redact private identifiers unless relevant and user-approved
- Summarize bash outputs that contain paths, environment variables, or secrets
- Do not quote raw `toolCall` arguments verbatim if they contain sensitive data
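As a first, mechanical pass, obvious secrets can be masked before any text is distilled. The patterns below are illustrative assumptions (credential-like key/value names, bearer tokens, a few well-known token prefixes); regexes miss secrets with other shapes, so this supplements the rules above rather than replacing a careful read. `redact` is a hypothetical helper.

```python
import re

# Illustrative patterns only; extend them for the secrets your logs contain.
_PATTERNS = [
    # NAME=value or name: value, where NAME looks like a credential
    (re.compile(r"(?i)\b([a-z0-9_]*(?:api[_-]?key|token|secret|passw(?:or)?d)[a-z0-9_]*)(\s*[:=]\s*)\S+"),
     r"\1\2[REDACTED]"),
    # HTTP bearer tokens
    (re.compile(r"(?i)\bbearer\s+[a-z0-9._~+/=-]+"), "Bearer [REDACTED]"),
    # Prefixed tokens such as sk-... or ghp_... (example prefixes)
    (re.compile(r"\b(?:sk|ghp|gho|xox[abp])[-_][A-Za-z0-9_-]{16,}"), "[REDACTED]"),
]


def redact(text: str) -> str:
    """Mask credential-looking substrings; review the result by hand anyway."""
    for pattern, replacement in _PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```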
## Step 3: Cluster by Topic
Do not create one wiki page per session.
- Group knowledge by stable topic across many sessions
- Split mixed sessions into separate themes
- Merge recurring patterns across dates and projects
- Use the `cwd` from the session header to infer project scope
- Use `session_info.name` as a topic hint when available
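A first mechanical pass can bucket extracted notes by project and session-name hint before the real semantic merging is done. In this sketch the project is just the last component of the `cwd` and the topic key is the lowercased `session_info.name`; both are simplifying assumptions, and the input shape and `cluster` name are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def cluster(sessions: list[dict]) -> dict[str, dict[str, list[str]]]:
    """Bucket notes by project, then by a coarse topic hint.

    Each session dict is assumed to have "cwd", an optional "name"
    (from session_info) and "notes" (strings produced by extraction).
    """
    groups = defaultdict(lambda: defaultdict(list))
    for session in sessions:
        project = PurePosixPath(session["cwd"]).name or "unknown"
        topic = (session.get("name") or "untitled").strip().lower()
        groups[project][topic].extend(session.get("notes", []))
    # Plain dicts are easier to inspect and serialize than nested defaultdicts.
    return {project: dict(topics) for project, topics in groups.items()}
```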
## Step 4: Distill into Wiki Pages
Route extracted knowledge using existing wiki conventions:

- Project-specific architecture/process → `projects/<name>/...`
- General concepts → `concepts/`
- Recurring techniques/debug playbooks → `skills/`
- Tools/services/frameworks → `entities/`
- Cross-session patterns → `synthesis/`

For each impacted project, create/update `projects/<name>/<name>.md`.
- 项目特定的架构/流程 →
projects/<name>/... - 通用概念 →
concepts/ - 重复使用的技巧/调试手册 →
skills/ - 工具/服务/框架 →
entities/ - 跨会话模式 →
synthesis/
对于每个受影响的项目,创建/更新。
projects/<name>/<name>.mdWriting rules
写作规则
- Distill knowledge, not chronology
- Avoid "on date X we discussed..." unless date context is essential
- Add `summary:` frontmatter on each new/updated page (1–2 sentences, ≤ 200 chars)
- Add confidence and lifecycle fields to every new page:

  ```yaml
  base_confidence: 0.42
  lifecycle: draft
  lifecycle_changed: <ISO date today>
  ```

  Leave `lifecycle` unchanged on update.
- Add provenance markers:
  - `^[extracted]` when directly grounded in explicit session content (compaction/branch summaries, explicit assistant statements)
  - `^[inferred]` when synthesizing patterns across multiple sessions or inferring from tool calls
  - `^[ambiguous]` when sessions conflict or a compaction summary contradicts later turns
- Add/update `provenance:` frontmatter mix for each changed page

Mark provenance per the convention in `llm-wiki`:

- `compaction` and `branch_summary` entries are pre-distilled — treat as mostly `^[extracted]`
- Conversation distillation is mostly `^[inferred]` — you're synthesizing from dialogue
- Use `^[ambiguous]` when the user changed their mind across sessions or when compaction summaries disagree with later conversation turns
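The new-page fields can be generated rather than typed by hand. This sketch builds the frontmatter for a new page and derives the `provenance:` mix from the markers in the body; expressing the mix as per-marker fractions is an assumption, since the exact format lives in the `llm-wiki` convention and is not spelled out here. `new_page_frontmatter` is a hypothetical name.

```python
import json
import re
from datetime import date
from typing import Optional

_MARKER = re.compile(r"\^\[(extracted|inferred|ambiguous)\]")


def new_page_frontmatter(summary: str, body: str, today: Optional[date] = None) -> str:
    """Frontmatter for a NEW page; on updates, lifecycle fields stay untouched."""
    if len(summary) > 200:
        raise ValueError("summary must be at most 200 characters")
    counts = {"extracted": 0, "inferred": 0, "ambiguous": 0}
    for kind in _MARKER.findall(body):
        counts[kind] += 1
    total = sum(counts.values()) or 1  # a page with no markers gets all zeros
    lines = [
        "---",
        f"summary: {json.dumps(summary)}",  # a JSON string is a valid YAML scalar
        "base_confidence: 0.42",
        "lifecycle: draft",
        f"lifecycle_changed: {(today or date.today()).isoformat()}",
        "provenance:",
        *(f"  {kind}: {round(n / total, 2)}" for kind, n in counts.items()),
        "---",
    ]
    return "\n".join(lines)
```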
## Step 5: Update Manifest, Log, and Index

### Update `.manifest.json`

For each processed source file:
- `ingested_at`, `size_bytes`, `modified_at`
- `source_type`: `pi_session`
- `project`: inferred project name from decoded `cwd`
- `pages_created`, `pages_updated`
Add/update a top-level summary block:

```json
{
  "pi": {
    "source_path": "~/.pi/agent/sessions/",
    "last_ingested": "TIMESTAMP",
    "sessions_ingested": 12,
    "sessions_total": 40,
    "pages_created": 5,
    "pages_updated": 12
  }
}
```
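A sketch of the manifest bookkeeping in Python. Where per-file records live inside `.manifest.json` is not specified here, so the `sources` key below is hypothetical, as is storing `pages_created` / `pages_updated` as lists of page paths per file; the top-level `pi` block follows the JSON shape above. Setting `ingested_at` to the time of ingest keeps the append-mode check from flagging the same file again, and writing through a temporary file means an interrupted run cannot leave a half-written manifest.

```python
import json
import os
from datetime import datetime, timezone
from pathlib import Path


def record_ingest(manifest_path: Path, session: Path, project: str,
                  created: list[str], updated: list[str], sessions_total: int,
                  source_path: str = "~/.pi/agent/sessions/") -> dict:
    """Record one processed session and refresh the top-level "pi" block."""
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    now = datetime.now(timezone.utc).isoformat()
    stat = session.stat()
    sources = manifest.setdefault("sources", {})  # hypothetical key
    sources[str(session.resolve())] = {
        "ingested_at": now,
        "size_bytes": stat.st_size,
        "modified_at": datetime.fromtimestamp(stat.st_mtime, timezone.utc).isoformat(),
        "source_type": "pi_session",
        "project": project,
        "pages_created": created,
        "pages_updated": updated,
    }
    pi = [v for v in sources.values() if v.get("source_type") == "pi_session"]
    manifest["pi"] = {
        "source_path": source_path,
        "last_ingested": now,
        "sessions_ingested": len(pi),
        "sessions_total": sessions_total,
        # Distinct pages, so a page touched by several sessions counts once.
        "pages_created": len({p for v in pi for p in v.get("pages_created", [])}),
        "pages_updated": len({p for v in pi for p in v.get("pages_updated", [])}),
    }
    tmp = manifest_path.with_name(manifest_path.name + ".tmp")
    tmp.write_text(json.dumps(manifest, indent=2))
    os.replace(tmp, manifest_path)  # atomic rename on the same filesystem
    return manifest
```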
### Update special files
Update `index.md` and `log.md`:

`- [TIMESTAMP] PI_HISTORY_INGEST sessions=N pages_updated=X pages_created=Y mode=append|full`

Also update `hot.md` at `$OBSIDIAN_VAULT_PATH/hot.md` following `wiki-ingest`, including its `updated` field.
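Formatting the `log.md` entry in code keeps the fixed field order consistent. A small sketch; rendering `TIMESTAMP` as a UTC ISO 8601 string is an assumption, as is the hypothetical `log_line` name.

```python
from datetime import datetime, timezone
from typing import Optional


def log_line(sessions: int, pages_updated: int, pages_created: int,
             mode: str, when: Optional[datetime] = None) -> str:
    """Build the PI_HISTORY_INGEST line for log.md."""
    if mode not in ("append", "full"):
        raise ValueError(f"mode must be 'append' or 'full', got {mode!r}")
    stamp = (when or datetime.now(timezone.utc)).isoformat(timespec="seconds")
    return (f"- [{stamp}] PI_HISTORY_INGEST sessions={sessions} "
            f"pages_updated={pages_updated} pages_created={pages_created} mode={mode}")
```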
## Privacy and Compliance
- Distill and synthesize; avoid raw transcript dumps
- Default to redaction for anything that looks sensitive
- Ask the user before storing personal or sensitive details
- Keep references to other people minimal and purpose-bound
## Reference
See `references/pi-data-format.md` for field-level parsing notes and extraction guidance.

## QMD Refresh After Vault Writes
QMD is a search index, not the source of truth. If `$QMD_WIKI_COLLECTION` is empty or unset, skip this step. Run it only after this skill has written or rewritten vault markdown. If QMD refresh fails, do not roll back the vault changes; report the QMD status separately.

Use `$QMD_CLI` if set; otherwise use `qmd`.

```bash
${QMD_CLI:-qmd} update
```

If the output says vectors are needed or embeddings may be stale, run:

```bash
${QMD_CLI:-qmd} embed
```

Verify the collection with either:

```bash
${QMD_CLI:-qmd} ls "$QMD_WIKI_COLLECTION"
```

or, when a specific page path is known:

```bash
${QMD_CLI:-qmd} get "qmd://$QMD_WIKI_COLLECTION/<page>.md" -l 5
```

Record one of:

- `QMD refreshed: update + embed + verified`
- `QMD refreshed: update only + verified`
- `QMD skipped: QMD_WIKI_COLLECTION unset`
- `QMD skipped: qmd CLI unavailable`
- `QMD failed: <short error summary>`
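The decision flow above, ending in one of the recorded status lines, can be sketched in Python. Two parts are assumptions: `$QMD_CLI` is split like a shell word list (so a value such as `npx qmd` would work), and the embed step is triggered by a crude keyword check on the `update` output, since the exact wording qmd prints is not given here. Only the `update`, `embed`, and `ls` invocations shown above are used; `refresh_qmd` is a hypothetical name.

```python
import os
import shlex
import shutil
import subprocess
from typing import Optional


def refresh_qmd(collection: Optional[str] = None, cli: Optional[str] = None) -> str:
    """Run the QMD refresh and return one of the status lines to record."""
    if collection is None:
        collection = os.environ.get("QMD_WIKI_COLLECTION", "")
    if not collection:
        return "QMD skipped: QMD_WIKI_COLLECTION unset"
    # Same fallback as ${QMD_CLI:-qmd}: unset or empty means plain "qmd".
    argv = shlex.split(cli or os.environ.get("QMD_CLI") or "qmd")
    if not argv or shutil.which(argv[0]) is None:
        return "QMD skipped: qmd CLI unavailable"

    def run(*args: str) -> str:
        done = subprocess.run([*argv, *args], capture_output=True, text=True, check=True)
        return done.stdout + done.stderr

    try:
        output = run("update").lower()
        # Heuristic (assumption): look for hints that vectors are missing or stale.
        needs_embed = "embed" in output or "vector" in output
        if needs_embed:
            run("embed")
        run("ls", collection)  # verification step
    except (OSError, subprocess.CalledProcessError) as exc:
        message = str(exc).splitlines()[0] if str(exc) else type(exc).__name__
        return f"QMD failed: {message}"
    return ("QMD refreshed: update + embed + verified" if needs_embed
            else "QMD refreshed: update only + verified")
```

A failure here only changes the reported status; the vault changes written earlier stay in place.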