# LLM Wiki — Knowledge Distillation Pattern
You are maintaining a persistent, compounding knowledge base. The wiki is not a chatbot — it is a compiled artifact where knowledge is distilled once and kept current, not re-derived on every query.
## Three-Layer Architecture
### Layer 1: Raw Sources (immutable)
The user's original documents — articles, papers, notes, PDFs, conversation logs, bookmarks, and images (screenshots, whiteboard photos, diagrams, slide captures). These are never modified by the system. They live wherever the user keeps them (configured via `OBSIDIAN_SOURCES_DIR` in `.env`). Images are first-class sources: the ingest skills read them via the Read tool's vision support and treat their interpreted content as inferred unless it is verbatim transcribed text. Image ingestion requires a vision-capable model — models without vision support should skip image sources and report which files were skipped.

Think of raw sources as the "source code" — authoritative but hard to query directly.
### Layer 2: The Wiki (LLM-maintained)
A collection of interconnected Obsidian-compatible markdown files organized by category. This is the compiled knowledge — synthesized, cross-referenced, and navigable. Each page has:

- YAML frontmatter (title, category, tags, sources, timestamps)
- Obsidian `[[wikilinks]]` connecting related concepts
- Clear provenance — every claim traces back to a source

The wiki lives at the path configured via `OBSIDIAN_VAULT_PATH` in `.env`.

### Layer 3: The Schema (this skill + config)
The rules governing how the wiki is structured — categories, conventions, page templates, and operational workflows. The schema tells the LLM *how* to maintain the wiki.
## Wiki Organization
The vault has two levels of structure: categories (what kind of knowledge) and projects (where the knowledge came from).
### Categories
Organize pages into these default categories (customizable in `.env`):

| Category | Purpose | Example |
|---|---|---|
| `concepts` | Ideas, theories, mental models | |
| `entities` | People, orgs, tools, projects | |
| `skills` | How-to knowledge, procedures | |
| `references` | Summaries of specific sources | |
| `insights` | Cross-cutting analysis across sources | |
| `journal` | Timestamped observations, session logs | |
### Projects
Knowledge often belongs to a specific project. The `projects/` directory mirrors this:

```
$OBSIDIAN_VAULT_PATH/
├── projects/
│   ├── my-project/
│   │   ├── my-project.md   ← project overview (named after project)
│   │   ├── concepts/       ← project-scoped category pages
│   │   ├── skills/
│   │   └── ...
│   ├── another-project/
│   │   └── ...
│   └── side-project/
│       └── ...
├── concepts/               ← global (cross-project) knowledge
├── entities/
├── skills/
└── ...
```

When knowledge is project-specific (a debugging technique that only applies to one codebase, a project-specific architecture decision), put it under `projects/<project-name>/<category>/`. When knowledge is general (a concept like "React Server Components", a person like "Andrej Karpathy", a widely applicable skill), put it in the global category directory.

**Cross-referencing:** Project pages should `[[wikilink]]` to global pages and vice versa. A project's overview page should link to the key concept, skill, and entity pages relevant to that project — whether they live under the project or globally.

**Naming rule:** The project overview file must be named `<project-name>.md`, not `_project.md`. Obsidian's graph view uses the filename as the node label — `_project.md` makes every project appear as `_project` in the graph, making it unreadable. So `projects/my-project/my-project.md`, `projects/another-project/another-project.md`, etc.

Each project directory has an overview page structured like this:
```markdown
---
title: My Project
category: project
tags: [ai, web, backend]
source_path: ~/.claude/projects/-Users-name-Documents-projects-my-project
created: 2026-03-01T00:00:00Z
updated: 2026-04-06T00:00:00Z
---

# My Project

One-paragraph summary of what this project is.

## Key Concepts

- [[concepts/some-api]] — used for core functionality
- [[projects/my-project/concepts/main-architecture]] — project-specific architecture

## Related

- [[entities/some-service]] — deployment platform
```

## Special Files
Every wiki has these files at its root:
### index.md

A content-oriented catalog organized by category. Each entry has a one-line summary and tags. Rebuild this after every ingest operation. Format:

```markdown
# Wiki Index

## Concepts

- [[transformer-architecture]] — The dominant architecture for sequence modeling ( #ml #architecture)
- [[attention-mechanism]] — Core building block of transformers ( #ml #fundamentals)

## Entities

- [[andrej-karpathy]] — AI researcher, educator, former Tesla AI director ( #person #ml)
```

**Format rule**: Add a space between the opening `(` and the tags.

❌ Don't: `description (#tag)` — breaks tag parsing
✅ Do: `description ( #tag)` — proper spacing and tag parsing
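The rebuild itself is mechanical once frontmatter is parsed. A minimal sketch, assuming the frontmatter has already been read into dicts (the function name and input shape are illustrative, not this skill's actual interface):

```python
from collections import defaultdict

def build_index(pages):
    """Render index.md content from already-parsed page frontmatter.

    `pages`: list of dicts with keys name (file stem), category,
    summary, and tags. Reading/parsing the vault is out of scope here.
    """
    by_category = defaultdict(list)
    for page in pages:
        by_category[page["category"]].append(page)

    lines = ["# Wiki Index"]
    for category in sorted(by_category):
        lines.append(f"\n## {category.title()}\n")
        for page in sorted(by_category[category], key=lambda p: p["name"]):
            tags = " ".join(f"#{tag}" for tag in page["tags"])
            # Space after the opening '(' keeps Obsidian tag parsing intact.
            lines.append(f"- [[{page['name']}]] — {page['summary']} ( {tags})")
    return "\n".join(lines) + "\n"
```

Running this after each ingest and writing the result over `index.md` satisfies the rebuild rule above.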
### log.md

A chronological, append-only record tracking every operation. Each entry is parseable:

```markdown
# Log

- [2024-03-15T10:30:00Z] INGEST source="papers/attention.pdf" pages_updated=12 pages_created=3
- [2024-03-15T11:00:00Z] QUERY query="How do transformers handle long sequences?" result_pages=4
- [2024-03-16T09:00:00Z] LINT issues_found=2 orphans=1 contradictions=1
- [2024-03-17T10:00:00Z] ARCHIVE reason="rebuild" pages=87 destination="_archives/..."
- [2024-03-17T10:05:00Z] REBUILD archived_to="_archives/..." previous_pages=87
```
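Because every entry shares the `[timestamp] OPERATION key=value` shape, downstream tooling can recover it with a single regex pass. A rough sketch (not part of any skill's contract):

```python
import re

# One log entry: "- [timestamp] OPERATION key=value key="quoted value" ..."
ENTRY = re.compile(r'^- \[(?P<ts>[^\]]+)\] (?P<op>[A-Z]+) (?P<rest>.*)$')
FIELD = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')

def parse_log_line(line):
    """Return (timestamp, operation, fields) for a log entry, else None."""
    match = ENTRY.match(line.strip())
    if not match:
        return None  # heading or blank line, not an entry
    fields = {}
    for field in FIELD.finditer(match.group("rest")):
        # Group 2 is a quoted value, group 3 a bare one.
        fields[field.group(1)] = field.group(2) if field.group(2) is not None else field.group(3)
    return match.group("ts"), match.group("op"), fields
```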
### .manifest.json

Tracks every source file that has been ingested — path, timestamps, and which wiki pages it produced. This is the backbone of the delta system. See the `wiki-status` skill for the full schema.

The manifest enables:

- Delta computation — what's new or modified since the last ingest
- Append mode — only process the delta, not everything
- Audit — which source produced which wiki page
- Staleness detection — the source changed but the wiki page hasn't been updated
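Delta computation reduces to comparing filesystem state against the manifest. The sketch below uses a deliberately simplified manifest (relative path mapped to last-ingested mtime); the real `.manifest.json` schema is defined by the `wiki-status` skill and tracks more than this:

```python
import os

def compute_delta(manifest, sources_dir):
    """Split source files into new / modified / unchanged.

    `manifest` is a simplified stand-in: {relative path: mtime at last
    ingest}. The real .manifest.json also records produced wiki pages.
    """
    new, modified, unchanged = [], [], []
    for root, _dirs, files in os.walk(sources_dir):
        for name in files:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, sources_dir)
            if rel not in manifest:
                new.append(rel)
            elif os.path.getmtime(full) > manifest[rel]:
                modified.append(rel)  # stale: source changed since ingest
            else:
                unchanged.append(rel)
    return new, modified, unchanged
```

Append mode then processes only `new + modified`, which is what keeps ingest cost proportional to the delta rather than the vault.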
## Page Template
When creating a new wiki page, use this structure:

```markdown
---
title: Page Title
category: concepts
tags: [ml, architecture]
aliases: [alternate name]
sources: [papers/attention.pdf]
summary: One or two sentences, ≤200 chars, so a reader (or another skill) can preview this page without opening it.
provenance:
  extracted: 0.72
  inferred: 0.25
  ambiguous: 0.03
created: 2024-03-15T10:30:00Z
updated: 2024-03-15T10:30:00Z
---

# Page Title

One-paragraph summary of what this page covers.

## Key Ideas

- The source's central claim, paraphrased directly.
- A generalization the source implies but doesn't state outright. ^[inferred]
- A figure two sources disagree on. ^[ambiguous]

Use [[wikilinks]] to connect to related pages.

## Open Questions

Things that are unresolved or need more sources.

## Sources

- [[references/attention-is-all-you-need]] — Original paper
```

## Provenance Markers
Every claim on a wiki page has one of three provenance states. Mark them inline so the reader (and future ingest passes) can tell signal from synthesis.

| State | Marker | Meaning |
|---|---|---|
| Extracted | (no marker — default) | A paraphrase of something a source actually says. |
| Inferred | `^[inferred]` | An LLM-synthesized claim — a connection, generalization, or implication the source doesn't state directly. |
| Ambiguous | `^[ambiguous]` | Sources disagree, or the source is unclear. |

Example:

```markdown
- Transformers parallelize across positions, unlike RNNs.
- This is why they scale better on modern hardware. ^[inferred]
- GPT-4 was trained on roughly 13T tokens. ^[ambiguous]
```

Why this syntax:

- `^[...]` is footnote-adjacent in Obsidian — it renders cleanly and never collides with `[[wikilinks]]`.
- It is inline (a suffix), so a single bullet stays a single bullet.
- Default = extracted means existing pages without markers stay valid.

**Frontmatter summary:** Optionally surface the rough mix at the page level so the user can scan for speculation-heavy pages without reading them:

```yaml
provenance:
  extracted: 0.72  # rough fraction of sentences/bullets with no marker
  inferred: 0.25
  ambiguous: 0.03
```

These are best-effort numbers written by the ingest skill at create/update time. `wiki-lint` recomputes them and flags drift. The block is optional — pages without it are treated as fully extracted by convention.
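The per-page mix can be estimated by counting trailing markers over the body. A best-effort sketch in the same spirit (this is an approximation for illustration, not the actual `wiki-lint` implementation):

```python
def provenance_mix(body):
    """Estimate the extracted/inferred/ambiguous mix of a page body.

    Counts one claim per non-blank, non-heading line; a trailing
    ^[...] marker decides the bucket, no marker means extracted.
    """
    counts = {"extracted": 0, "inferred": 0, "ambiguous": 0}
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and headings
        if line.endswith("^[inferred]"):
            counts["inferred"] += 1
        elif line.endswith("^[ambiguous]"):
            counts["ambiguous"] += 1
        else:
            counts["extracted"] += 1
    total = sum(counts.values()) or 1
    return {k: round(v / total, 2) for k, v in counts.items()}
```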
## Retrieval Primitives
Reading the vault is the dominant cost of every read-side skill. Use the cheapest primitive that can answer the question, and escalate only when the cheaper one is insufficient. Any skill that needs content from the vault should follow this table rather than jumping straight to full-page reads.

| Need | Primitive | Relative cost |
|---|---|---|
| Does a page exist? What's its title/category/tags? | Read the frontmatter | Cheapest |
| 1–2 sentence preview of a page | Read the `summary:` field | Cheap |
| A specific claim or section inside a page | Grep with context | Medium |
| Whole-page content | Read the full page | Expensive — last resort |
| Relationships across pages | Full-vault grep | Case-by-case |

The rule: escalate only when the cheaper primitive can't answer the question. If you can answer from `summary:` fields alone, don't read page bodies. If a grepped section with `-A 10 -B 2` gives you the claim, don't read the whole page. A 500-line page opened to read 15 lines is 485 lines of wasted tokens.
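As a concrete illustration of the cheaper primitives (the file path and section name here are hypothetical):

```shell
# Cheap: preview a page from its summary field alone
grep -m 1 '^summary:' concepts/transformer-architecture.md

# Medium: pull one section with context instead of the whole page
grep -A 10 -B 2 '## Key Ideas' concepts/transformer-architecture.md

# Case-by-case: which pages link to this one?
grep -rl '\[\[transformer-architecture\]\]' .
```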
Why this matters: a 20-page vault lets you get away with full-vault scans. A 200-page vault does not. The primitives above are how the skills framework scales to large vaults without a database.

Skills that consume this table: `wiki-query`, `cross-linker`, `wiki-lint`, `wiki-status` (insights mode). Any new skill that reads the vault should cite this section rather than reinvent the pattern.
## Core Principles
- **Compile, don't retrieve.** The wiki is pre-compiled knowledge. When you ingest a source, update every relevant page — don't just create a summary of the source.
- **Compound over time.** Each ingest should make the wiki smarter, not just bigger. Merge new information into existing pages, resolve contradictions, strengthen cross-references.
- **Provenance matters.** Every claim should trace to a source. When updating a page, note which source prompted the update.
- **Mark inferences.** Default sentences are extracted. Mark synthesized claims with `^[inferred]` and contested claims with `^[ambiguous]`. A wiki that hides its guessing rots silently; one that marks it stays trustworthy.
- **Human curates, LLM maintains.** The human decides what sources to add and what questions to ask. The LLM handles the bookkeeping — updating cross-references, maintaining consistency, noting contradictions.
- **Obsidian is the IDE.** The user browses and explores the wiki in Obsidian. Everything must be valid Obsidian markdown with working wikilinks.
## Environment Variables
The wiki is configured through environment variables (see `.env.example`). The only required variable is the vault path — everything else has sensible defaults.

- `OBSIDIAN_VAULT_PATH` — Where the wiki lives (required)
- `OBSIDIAN_SOURCES_DIR` — Where raw source documents are
- `OBSIDIAN_CATEGORIES` — Comma-separated list of categories
- `CLAUDE_HISTORY_PATH` — Where to find Claude conversation data

No API keys are needed — the agent running these skills already has LLM access built in.
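A minimal `.env` might look like this (all values are illustrative, including the category list; only the vault path is required):

```shell
OBSIDIAN_VAULT_PATH=~/vaults/wiki
OBSIDIAN_SOURCES_DIR=~/Documents/sources
OBSIDIAN_CATEGORIES=concepts,entities,skills,references,insights,journal
CLAUDE_HISTORY_PATH=~/.claude/projects
```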
## Modes of Operation

The wiki supports three ingest modes:

| Mode | When to use | What happens |
|---|---|---|
| Append | Small delta, incremental updates | Compute delta via manifest, ingest only new/modified sources |
| Rebuild | Major drift, fresh start needed | Archive current wiki to `_archives/`, rebuild from scratch |
| Restore | Need to go back | Bring back a previous archive |

Use `wiki-status` to see the delta and get a recommendation. Use `wiki-rebuild` for archive/rebuild/restore operations.
## Reference
For details on specific operations, see the companion skills:

- `wiki-status` — Audit what's ingested, compute the delta, recommend append vs rebuild
- `wiki-rebuild` — Archive the current wiki, rebuild from scratch, or restore from an archive
- `wiki-ingest` — Distill source documents into wiki pages
- `claude-history-ingest` — Ingest Claude conversation history
- `codex-history-ingest` — Ingest Codex CLI session history
- `data-ingest` — Ingest any raw text data
- `wiki-query` — Answer questions against the wiki
- `wiki-lint` — Audit and maintain wiki health
- `wiki-setup` — Initialize a new vault