# LLM Wiki
A skill for building and maintaining an LLM-curated knowledge base inside a project, following the pattern Andrej Karpathy described in his April 2026 gist. The wiki is a directory of markdown files that the LLM owns and maintains; the user curates sources and asks questions, and the LLM does the bookkeeping.
## The pattern in one paragraph
Conventional RAG re-derives knowledge from raw chunks on every query; nothing accumulates. The LLM Wiki pattern flips this: when a new source arrives, the LLM compiles it once into a persistent, structured wiki — extracting concepts, writing entity pages, updating cross-references, flagging contradictions. Subsequent queries read the pre-synthesized wiki rather than the raw sources. Knowledge compounds. The user is in charge of sourcing and asking good questions; the LLM handles the summarizing, linking, and consistency work that humans abandon wikis over.
## When to use this skill
The trigger surface is broad. Any time the user is accumulating textual material over time — research papers, articles, transcripts, meeting notes, book chapters, customer calls, code repos, journal entries — and would benefit from having that material organized rather than dumped into a chat each session, this skill applies. It is equally useful for one source ("ingest this paper") and for the steady-state operations against an existing wiki ("what does my wiki say about diffusion models", "lint the wiki", "what's missing").
If the project does not yet have a wiki, run the bootstrap step first (see "Initializing a new wiki" below). Otherwise, locate the existing wiki and read its `SCHEMA.md` before doing anything else — the schema encodes the conventions for that specific wiki and may override the defaults documented here.

## Architecture: three layers, three operations
The wiki has three layers and three operations. Internalize this vocabulary because the rest of the skill assumes it.
The three layers are raw sources (the user's curated source material — articles, papers, PDFs, transcripts; immutable, the LLM reads but never modifies them), the wiki (a directory of LLM-generated markdown pages — entity pages, concept pages, comparisons, summaries; the LLM owns this layer entirely), and the schema (a `SCHEMA.md` file at the wiki root that documents the conventions for this particular wiki — page types, naming rules, tag taxonomy, ingest workflow customizations; co-evolved with the user).

The three operations are ingest (a new source arrives; the LLM reads it, writes a summary page, updates relevant entity and concept pages, appends to the log), query (the user asks a question; the LLM navigates the wiki via the index, reads the relevant pages, and synthesizes an answer — often filing the answer back as a new page so the exploration compounds), and lint (a periodic health check; the LLM scans for contradictions, stale claims, orphan pages, missing concepts, broken links).

For the canonical write-up of these operations, read `references/architecture.md`. For the step-by-step procedures, read `references/ingest-workflow.md`, `references/query-workflow.md`, and `references/lint-workflow.md` as needed.

## Default project layout
Unless the user's `SCHEMA.md` says otherwise, the wiki lives in the project at this layout:

```
<project-root>/
├── wiki/
│   ├── SCHEMA.md    ← conventions, the "config file" — read this FIRST
│   ├── index.md     ← entry point: catalog of all pages with one-line summaries
│   ├── log.md       ← append-only chronological log of ingests/queries/lints
│   ├── indexes/     ← (appears once index.md shards) per-category indexes
│   ├── entities/    ← pages about specific things (people, products, papers, places)
│   ├── concepts/    ← pages about ideas, methods, frameworks
│   ├── sources/     ← per-source summary pages (one per ingested source)
│   └── synthesis/   ← cross-cutting analyses, comparisons, query results filed back
├── raw/             ← the user's source material (PDFs, .md clippings, images)
│   └── assets/      ← downloaded images referenced by raw clippings
└── ...
```

This layout is a default, not a requirement. If the project already has a wiki under a different name (e.g. `kb/`, `notes/`, `vault/`), use that. If the user has placed sources outside `raw/`, follow their convention.

## The scalability discipline
The single biggest failure mode of the LLM Wiki pattern is the wiki itself becoming a context bottleneck. Naive implementations break around a few hundred pages: the LLM either reads too many pages per query or starts hallucinating because it skipped the relevant ones. This skill's design is shaped almost entirely by avoiding that failure. The principles below are non-negotiable; ignoring them is what makes the pattern collapse at scale.
Atomic pages. Every wiki page is about one concept and stays small — soft cap 400 lines or roughly 2,000 words, hard cap 800 lines. When a page outgrows this, split it: extract sub-concepts into their own pages and have the parent link to them. A page that takes up 30% of the context window on its own is a design smell.
Index-first navigation. Never grep or glob the wiki blindly when answering a query. Always read `index.md` (or the relevant sharded index under `indexes/`) first to identify candidate pages, then drill into only those. The index is engineered to be cheap to read — one line per page, no bodies — and it is the cache that makes the whole pattern scalable.

Sharded indexes. When `index.md` itself exceeds ~300 lines or the wiki passes ~150 pages, shard it: move category-specific entries into `indexes/<category>.md` files (e.g. `indexes/entities.md`, `indexes/concepts.md`, `indexes/sources.md`, or finer domain shards), and have the top-level `index.md` become a directory of those shards. Reading the index is now a two-step lookup, but each step is bounded.

YAML frontmatter on every page. Every wiki page begins with frontmatter that includes at minimum `type`, `tags`, `sources`, and `updated`. The bundled `wiki_search.py` script can filter on these without reading page bodies. See `references/page-conventions.md`.

Surgical edits, not rewrites. When updating a page (e.g. adding a new cross-reference because a freshly ingested source mentions an existing entity), use `str_replace` to touch only the relevant section. Rewriting whole pages is slow, expensive in tokens, and risks losing prior nuance.

Backlink discovery via grep. To find every page that references a given entity, run `grep -rl "\[\[entity-name\]\]" wiki/` rather than reading pages to look for mentions. The bundled scripts make this easy.

Chunked source ingestion. Large raw sources (long PDFs, book chapters, lengthy transcripts) should be read in chunks during ingest, not loaded whole. The ingest workflow handles this — see `references/ingest-workflow.md`.

Search script for large wikis. Once the wiki passes ~300 pages, plain index lookup may not surface the right pages for fuzzy queries. Use `scripts/wiki_search.py` for BM25-ranked retrieval with optional frontmatter filters. It's a fallback, not the default — index-first is still cheaper when it works.

For the full scaling playbook, including thresholds and migration steps, read `references/scaling-playbook.md`.
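Concretely, a page's frontmatter might look like this (a minimal sketch assuming the default field set; the actual fields and tag taxonomy are whatever the wiki's `SCHEMA.md` defines, and the file names below are hypothetical):

```yaml
---
type: concept                  # entity | concept | source | synthesis
tags: [retrieval, scaling]
sources:
  - raw/example-article.md     # hypothetical raw file this page cites
updated: 2026-04-15
---
```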
## Initializing a new wiki
If the project does not contain a `wiki/` directory (or whatever the user calls theirs), run the bootstrap script:

```bash
python scripts/init_wiki.py <project-root> [--wiki-dir wiki] [--raw-dir raw]
```

This creates the directory structure, drops in templates for `SCHEMA.md`, `index.md`, and `log.md`, and seeds a starter page-convention document. After bootstrapping, briefly walk the user through the schema and ask whether they want to customize anything (e.g. domain-specific page types, custom tags) before the first ingest. The schema is meant to evolve — encourage editing it.

Then propose wiring the wiki into the project's agent-memory file so the running agent remembers the wiki in future sessions without being told. The target file depends on the agent: `CLAUDE.md` for Claude Code, `AGENTS.md` for Codex / Cursor / OpenCode / Pi / OpenClaw, `GEMINI.md` for Gemini CLI, with `AGENTS.md` as the safe default if the user runs multiple agents or is unsure. The full workflow, canonical stanza, and a three-line short variant are in `references/agent-memory-integration.md`. Never write to the memory file without the user's approval — show them the proposed stanza, ask whether to append to an existing file or create a new one, and honour a "skip" answer without pushing.
## The ingest workflow (summary)
The full workflow is in `references/ingest-workflow.md`; what follows is the shape of it. When a new source arrives, first write the source itself into `raw/` (verbatim; if it's a web article, use the markdown form). Then read the source — chunked if large — and write a single source-summary page in `wiki/sources/`, named after the source slug, with full frontmatter and citations back to the raw file. Then identify which existing entity and concept pages this source touches; for each, surgically update the relevant section using `str_replace` rather than rewriting. Identify any new entities or concepts the source introduces and create new pages for them, linking from related existing pages so they don't become orphans. Update `index.md` (or the relevant shard) with the new pages. Append a single line to `log.md` with the date, operation type, and source title. As a final step, discuss the takeaways with the user — what surprised them, what's worth following up on — and offer to file that discussion back as a synthesis page.
## The query workflow (summary)
Full version in `references/query-workflow.md`. To answer a query against the wiki: read `index.md` (or the relevant shard) first; identify candidate pages from their one-line summaries; read those pages (and any `[[wikilink]]` backlinks they list that look relevant); synthesize the answer with citations to the pages you used; offer to file the synthesized answer back into `wiki/synthesis/` so future queries benefit. If the index doesn't surface good candidates, fall back to `python scripts/wiki_search.py "query terms"` for ranked retrieval. If the wiki appears to lack coverage of the topic, say so plainly rather than confabulating — flag it as a candidate ingest target.
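Index-first candidate selection is just a cheap scan of the one-line index. A toy sketch, assuming the one-line-per-page index format and a made-up `candidate_pages` name:

```python
def candidate_pages(index_text: str, query: str) -> list[str]:
    """Return index lines whose one-line summary shares a term with the query."""
    terms = {t.lower() for t in query.split()}
    hits = []
    for line in index_text.splitlines():
        words = {w.lower().strip(".,;") for w in line.split()}
        if line.startswith("- ") and terms & words:
            hits.append(line)
    return hits
```

In practice the LLM does this scan by reading the index directly; the point is that the cost is one line per page, never page bodies.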
## The lint workflow (summary)
Full version in `references/lint-workflow.md`. Lint is best run on a cadence (after every N ingests, or weekly), not on every operation. The bundled `python scripts/wiki_lint.py` finds the structural issues automatically — orphan pages, broken `[[wikilinks]]`, oversized pages, missing or malformed frontmatter, stale `updated` dates. Then come the semantic checks that need an LLM: read recently updated pages and look for contradictions with older claims, identify concepts that are mentioned but lack their own page, and suggest gaps the user might want to fill via web search or new sources. Always present lint findings as proposed edits for the user to approve, not as faits accomplis — the wiki is the user's, and silent rewrites erode trust.
## Failure modes to guard against
The community discussion around the gist surfaced several failure modes worth internalizing because they are how this pattern goes wrong. Read them so you know what to actively avoid.
The first is silent corruption — a misreading of one source becomes an authoritative-looking wiki page, which then influences how subsequent sources are interpreted, and the error compounds invisibly. Mitigation: every wiki claim must carry a `sources:` frontmatter entry pointing back to the raw file, and the lint pass should surface any claim whose source can't be located. When in doubt during ingest, hedge in the wiki text ("the source claims X, though this is not yet corroborated by other sources") rather than asserting.

The second is wiki-reads-its-own-output drift — the LLM begins treating prior wiki pages as ground truth and stops checking them against raw sources. Mitigation: during ingest, when updating an existing page, re-read the relevant raw source for the existing claim before merging the new one. Don't take the wiki's word for what the source said.
The third is the maintenance ratchet — a critic in the gist comments noted that LLM wikis can require more human work over time, not less, because the LLM's output needs increasing supervision as the wiki grows. The mitigation is the scalability discipline above (sharded indexes, atomic pages, frontmatter, lint scripts) plus a strong cadence of lint passes. If lint reports start exceeding what the user can review, the wiki has outgrown its conventions and the schema needs revision.
The fourth is scope creep into bad fits — the pattern shines for accumulating textual research and degrades for highly relational data (org charts, financial ledgers, structured datasets) where a real database would serve better. If the user's domain is fundamentally relational, say so and suggest a better tool rather than forcing markdown to do the wrong job.
## Reference files
The reference files are the source of truth for the detailed procedures. Read them when the relevant operation is happening, not preemptively.
- `references/architecture.md` — the three layers and three operations explained in depth, with examples of page formats and the rationale behind each design choice
- `references/ingest-workflow.md` — the step-by-step ingest procedure, including chunked reading for large sources and the per-page-type templates
- `references/query-workflow.md` — navigation patterns from index → page → backlinks, when to fall back to the search script, and how to file answers back as synthesis pages
- `references/lint-workflow.md` — what to check, how to present findings, and the cadence
- `references/page-conventions.md` — frontmatter schema, page naming, link syntax, page-type definitions, sizing rules
- `references/scaling-playbook.md` — thresholds at which to shard the index, when to introduce the search script, and signals that the wiki has outgrown its current conventions
- `references/agent-memory-integration.md` — how to wire the wiki into the project's agent-memory file (`CLAUDE.md` / `AGENTS.md` / `GEMINI.md`), the canonical stanza and short variant, and the bootstrap conversation script
## Bundled scripts
The scripts are intentionally minimal — they exist so the LLM doesn't reinvent the same helpers on every invocation. Each is documented in its own `--help` output and at the top of the file.

- `scripts/init_wiki.py` — bootstrap a new wiki structure in a project, seeding the templates
- `scripts/wiki_search.py` — BM25 search over wiki pages with optional frontmatter filters; a fallback for when index navigation doesn't surface the right pages
- `scripts/wiki_lint.py` — structural health check (orphans, broken links, oversized pages, frontmatter validation, stale dates)
- `scripts/wiki_stats.py` — quick summary of wiki size, page count by type, and link density; useful for deciding when to shard the index
## Templates
The templates in `assets/` are starting points — they get copied into the user's wiki on bootstrap and then evolve under the user's editing.

- `assets/SCHEMA.md.template` — the canonical schema document for a new wiki
- `assets/index.md.template` — the empty index file
- `assets/log.md.template` — the empty log file
- `assets/page.md.template` — a generic wiki page with the frontmatter scaffold