llm-wiki-skill — Multi-Platform Knowledge Base Builder
Skill by ara.so, from the Daily 2026 Skills collection.
Build a persistent, interlinked personal knowledge base from URLs, PDFs, markdown files, and raw text. Based on Karpathy's llm-wiki methodology: knowledge is compiled once and maintained, not re-derived from raw docs on every query.
- Ingests articles, tweets, PDFs, YouTube transcripts, WeChat posts, and plain text
- Routes each source type to the best extraction tool automatically
- Generates structured wiki pages with bidirectional [[wiki links]]
- Produces entity pages, topic pages, source summaries, and comparisons
- Outputs Obsidian-compatible local markdown files
- Detects orphaned pages, broken links, and contradictions via health checks
Recommended: Let Your Agent Install It
Give your agent the repo URL and ask it to install for your platform:
https://github.com/sdyckjq-lab/llm-wiki-skill
Clone the repo anywhere, then run the installer for your platform:
bash install.sh --platform claude
bash install.sh --platform codex
bash install.sh --platform openclaw
Auto-detect (only if one platform directory exists)
bash install.sh --platform auto
Custom target directory (OpenClaw non-standard path)
bash install.sh --platform openclaw --target-dir /path/to/your/skills
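The auto-detect rule above ("only if one platform directory exists") can be sketched roughly as follows. This is an illustration, not the installer's actual code: the function name `detect_platform` is hypothetical, and it assumes the platform directories are `~/.claude`, `~/.codex`, and `~/.openclaw`, as implied by the default install locations.

```shell
# Hypothetical sketch of "--platform auto": succeed only when exactly one
# known platform directory exists under the given home directory.
detect_platform() {
  home_dir="${1:-$HOME}"
  found=""
  count=0
  for name in claude codex openclaw; do
    if [ -d "$home_dir/.$name" ]; then
      found="$name"
      count=$((count + 1))
    fi
  done
  if [ "$count" -eq 1 ]; then
    printf '%s\n' "$found"
  else
    echo "auto-detect needs exactly one platform directory, found $count" >&2
    return 1
  fi
}
```

When zero or several directories are found, the sketch fails instead of guessing, which is when you would pass `--platform` explicitly.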
Default Install Locations
| Platform | Path |
|---|---|
| Claude Code | ~/.claude/skills/llm-wiki |
| Codex | ~/.codex/skills/llm-wiki |
| OpenClaw | ~/.openclaw/skills/llm-wiki |
Legacy Claude Setup (existing users)
The legacy setup script is now a compatibility shim for the unified installer.
Check Chrome debug mode is running (needed for web extraction)
google-chrome --remote-debugging-port=9222 &
Check uv is installed (needed for WeChat + YouTube extraction)
Install bun OR npm (one is enough for web extraction deps)
OR: npm is already available in most environments
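The prerequisite checks above could be scripted along these lines. This is a minimal sketch: the helper names `missing_tools` and `check_prereqs` are hypothetical and are not part of the installer.

```shell
# Print the name of every listed tool that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Report missing prerequisites: uv is needed for WeChat and YouTube
# extraction, and either bun or npm is enough for the web extraction deps.
check_prereqs() {
  status=0
  if [ -n "$(missing_tools uv)" ]; then
    echo "missing: uv (needed for WeChat and YouTube extraction)" >&2
    status=1
  fi
  if ! command -v bun >/dev/null 2>&1 && ! command -v npm >/dev/null 2>&1; then
    echo "missing: bun or npm (needed for web extraction deps)" >&2
    status=1
  fi
  return "$status"
}
```

Running `check_prereqs` before the installer prints one line per missing tool and returns a non-zero status if anything is absent.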
Platform Entry Points
After installation, read the platform-specific instructions:
- Claude Code: platforms/claude/CLAUDE.md
- Codex: platforms/codex/AGENTS.md
- OpenClaw: platforms/openclaw/README.md
Knowledge Base Structure
your-wiki/
├── raw/ # Immutable source material
│ ├── articles/ # Web articles
│ ├── tweets/ # X/Twitter
│ ├── wechat/ # WeChat posts
│ ├── xiaohongshu/ # Xiaohongshu (manual paste only)
│ ├── zhihu/ # Zhihu
│ ├── pdfs/ # PDFs
│ ├── notes/ # Notes
│ └── assets/ # Images, attachments
├── wiki/ # AI-generated knowledge base
│ ├── entities/ # People, concepts, tools
│ ├── topics/ # Topic pages
│ ├── sources/ # Source summaries
│ ├── comparisons/ # Side-by-side analysis
│ └── synthesis/ # Cross-source synthesis
├── index.md # Master index
├── log.md # Operation log
└── .wiki-schema.md # Config
Initialize a New Knowledge Base
Ask your agent:
"Create a new knowledge base at ~/my-wiki"
The agent will scaffold the directory structure and generate index.md, log.md, and .wiki-schema.md.
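For reference, the scaffolding step amounts to something like the sketch below, which follows the directory tree shown under Knowledge Base Structure. The function name `scaffold_wiki` is hypothetical, and the three top-level files are created empty here, whereas the agent generates real content for them.

```shell
# Hypothetical sketch: create the raw/ and wiki/ trees plus empty
# placeholder files for index.md, log.md, and .wiki-schema.md.
scaffold_wiki() {
  root="$1"
  [ -n "$root" ] || return 1
  for dir in raw/articles raw/tweets raw/wechat raw/xiaohongshu raw/zhihu \
             raw/pdfs raw/notes raw/assets \
             wiki/entities wiki/topics wiki/sources wiki/comparisons wiki/synthesis; do
    mkdir -p "$root/$dir" || return 1
  done
  # Create the top-level files only if they do not exist yet.
  for file in index.md log.md .wiki-schema.md; do
    [ -e "$root/$file" ] || : > "$root/$file"
  done
}
```

Calling `scaffold_wiki ~/my-wiki` twice is harmless: existing directories and files are left untouched.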
Ingest a Web Article
Agent command pattern:
"Add <url> to my wiki"
Under the hood, the agent routes to baoyu-url-to-markdown.
Ingest a YouTube Video
Agent command pattern:
"Add <url> to my wiki"
Uses youtube-transcript via uv.
Ingest a WeChat Article
Agent command pattern:
"Add <url> to my wiki"
Uses wechat-article-to-markdown via uv.
Ingest a PDF
Agent command pattern:
"Process this PDF into my wiki: /path/to/paper.pdf"
OR drag a file into the chat
No external tool needed; the file goes directly into the main pipeline:
cp /path/to/paper.pdf raw/pdfs/paper.pdf
Ingest Raw Text or Notes
Just paste text to your agent:
"Add these notes to my wiki: [paste content]"
The agent writes the text directly to a file under raw/notes/:
echo "your content" > raw/notes/note-slug.md
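The `note-slug` part of the file name has to come from somewhere. One plausible approach, shown here only as a sketch, is to derive it from a title. Both `slugify` and `save_note` are hypothetical helpers, and refusing to overwrite an existing file is an assumption made to match the "immutable raw" principle.

```shell
# Turn a title into a lowercase, hyphen-separated slug.
slugify() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//'
}

# Write a note to raw/notes/<slug>.md under the given wiki root.
save_note() {
  wiki_root="$1"
  title="$2"
  content="$3"
  target="$wiki_root/raw/notes/$(slugify "$title").md"
  mkdir -p "$wiki_root/raw/notes" || return 1
  # raw/ is treated as immutable, so refuse to overwrite an existing note.
  if [ -e "$target" ]; then
    echo "refusing to overwrite $target" >&2
    return 1
  fi
  printf '%s\n' "$content" > "$target"
}
```

For example, `save_note ~/my-wiki "My First Note!" "your content"` would write to `~/my-wiki/raw/notes/my-first-note.md`.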
Batch Process a Folder
Agent command pattern:
"Process all files in ~/Downloads/research into my wiki"
Agent iterates over files and routes each by type
for f in ~/Downloads/research/*; do
  : # placeholder: the agent determines the type of "$f" and processes it accordingly
done
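To make "routes each by type" more concrete, the sketch below maps file extensions to raw/ subdirectories and prints a plan without copying anything. The mapping is an assumption for illustration (in particular, sending markdown and text files to raw/notes), and the names `raw_dir_for_file` and `batch_plan` are hypothetical.

```shell
# Map a file name to the raw/ subdirectory it would land in (assumed mapping).
raw_dir_for_file() {
  case "$1" in
    *.pdf|*.PDF)                   echo "raw/pdfs" ;;
    *.md|*.markdown|*.txt)         echo "raw/notes" ;;
    *.png|*.jpg|*.jpeg|*.gif)      echo "raw/assets" ;;
    *)                             echo "unknown" ;;
  esac
}

# Print "<file> -> <destination>" for every regular file in a folder.
batch_plan() {
  for f in "$1"/*; do
    [ -f "$f" ] || continue
    printf '%s -> %s\n' "$f" "$(raw_dir_for_file "$f")"
  done
}
```

Printing a plan first makes it easy to spot files that would be classified as `unknown` before anything is written into the knowledge base.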
Run a Health Check
Agent command pattern:
"Run a health check on my knowledge base"
Agent checks for:
- Orphaned pages (no incoming links)
- Broken [[wiki links]]
- Contradictory information across pages
- Missing source summaries
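The first two checks in the list above are mechanical and can be approximated with standard tools. The sketch below is not the skill's own health check. It assumes a link resolves if any `.md` file under the knowledge base has the same base name as the link target (similar to how Obsidian resolves short links), and all function names are hypothetical. Contradictions and missing summaries need the agent's judgement and are not covered.

```shell
# Emit "file<TAB>target" for every [[wiki link]] in every .md file.
# Aliases ([[page|label]]) and anchors ([[page#section]]) are stripped.
list_links() {
  find "$1" -type f -name '*.md' | while IFS= read -r file; do
    grep -o '\[\[[^]]*]]' "$file" 2>/dev/null \
      | sed -e 's/^\[\[//' -e 's/]]$//' -e 's/[|#].*$//' \
      | while IFS= read -r target; do
          printf '%s\t%s\n' "$file" "$target"
        done
  done
}

# Base names (without .md) of all pages under a directory.
page_names() {
  find "$1" -type f -name '*.md' | sed -e 's|.*/||' -e 's|\.md$||' | sort -u
}

# Links whose target has no matching page anywhere under the root.
broken_links() {
  root="$1"
  names="$(page_names "$root")"
  list_links "$root" | while IFS="$(printf '\t')" read -r file target; do
    name="${target##*/}"
    printf '%s\n' "$names" | grep -qxF -- "$name" \
      || printf '%s: [[%s]]\n' "$file" "$target"
  done
}

# Pages under wiki/ that no link points to.
orphan_pages() {
  root="$1"
  linked="$(list_links "$root" | cut -f2 | sed 's|.*/||' | sort -u)"
  page_names "$root/wiki" | while IFS= read -r name; do
    printf '%s\n' "$linked" | grep -qxF -- "$name" || printf '%s\n' "$name"
  done
}
```

Run `broken_links ~/my-wiki` and `orphan_pages ~/my-wiki` from anywhere. Note that a page linking only to itself is not reported as an orphan in this simplified version.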
Source Routing Reference
| Source Type | Tool Used | Requires |
|---|---|---|
| Web articles | baoyu-url-to-markdown | Chrome debug mode |
| X/Twitter | | Chrome debug mode + X login |
| Zhihu | | Chrome debug mode |
| WeChat | wechat-article-to-markdown | uv |
| YouTube | youtube-transcript | uv |
| Xiaohongshu | Manual paste | Nothing |
| PDF / Markdown / Text | Direct pipeline | Nothing |
Source registry lives at: scripts/source-registry.tsv
Routing logic lives at: scripts/source-registry.sh
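The actual routing is defined by the two files above. As a rough illustration of the idea only, a URL can be classified by its host name as in the sketch below. The function `route_url` and the host patterns are assumptions and do not reflect the contents of scripts/source-registry.tsv.

```shell
# Hypothetical sketch: classify a URL into a source type by its host name.
route_url() {
  case "$1" in
    http://*|https://*) ;;
    *) echo "unknown"; return 0 ;;
  esac
  host="${1#*://}"     # drop the scheme
  host="${host%%/*}"   # drop the path
  host="${host#www.}"  # drop a leading "www."
  case "$host" in
    youtube.com|m.youtube.com|youtu.be) echo "youtube" ;;
    mp.weixin.qq.com)                   echo "wechat" ;;
    x.com|twitter.com)                  echo "tweets" ;;
    zhihu.com|*.zhihu.com)              echo "zhihu" ;;
    xiaohongshu.com|*.xiaohongshu.com)  echo "xiaohongshu" ;;
    *)                                  echo "articles" ;;
  esac
}
```

Matching on the extracted host rather than on the whole URL avoids false positives such as a path or an unrelated domain that merely contains `x.com`.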
Wiki Page Conventions
Entity Page (wiki/entities/andrej-karpathy.md)
Andrej Karpathy
Former OpenAI/Tesla researcher, creator of llm-wiki methodology.
- [[llm-wiki]]: compile knowledge once, maintain over time
- [[nanoGPT]]: minimal GPT implementation for education
- [[sources/llm-wiki-gist-2024]]
- [[sources/karpathy-interview-2023]]
- [[topics/language-models]]
- [[entities/openai]]
Topic Page (wiki/topics/retrieval-augmented-generation.md)
Retrieval-Augmented Generation
- [[entities/langchain]]
- [[entities/llamaindex]]
- [[comparisons/rag-vs-finetuning]]
- [[sources/rag-paper-2020]]
Source Summary (wiki/sources/article-slug.md)
Source: Article Title
Raw: [[raw/articles/article-slug]]
Chrome / Web Extraction Fails
Start Chrome with remote debugging enabled
google-chrome --remote-debugging-port=9222 --no-first-run &
Verify it's running
For X/Twitter: make sure you're logged in on that Chrome session
Then retry the extraction
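One way to verify that Chrome is listening is to query the DevTools HTTP endpoint `/json/version` on the debugging port. The sketch below assumes `curl` is installed, and the helper name `chrome_debug_ready` is hypothetical.

```shell
# Return 0 if a Chrome DevTools endpoint answers on the given port.
chrome_debug_ready() {
  port="${1:-9222}"
  curl --silent --fail --max-time 2 \
    "http://127.0.0.1:$port/json/version" >/dev/null 2>&1
}
```

For example, `chrome_debug_ready 9222 && echo "Chrome debug mode is up"` prints the message only when the port responds.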
WeChat or YouTube Extraction Fails
Install uv if missing
Re-run installer to pick up uv
bash install.sh --platform claude # or your platform
Verify uv tools work
uvx youtube-transcript --help
uvx wechat-article-to-markdown --help
bun/npm Dependency Install Fails
The installer auto-selects bun or npm; check which is available
which bun && echo "bun found"
which npm && echo "npm found"
Manually install web extraction deps with npm
npm install -g baoyu-url-to-markdown
Codex Legacy Path Compatibility
Old path still supported automatically:
~/.Codex/skills # capital C; installer handles both
~/.codex/skills # lowercase; new default
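Handling both spellings could look roughly like the sketch below. The lookup order (lowercase first, then the legacy capitalised path, then the lowercase default) is an assumption, and `resolve_codex_skills_dir` is a hypothetical name rather than a function in install.sh.

```shell
# Hypothetical sketch: pick the Codex skills directory, preferring the new
# lowercase path and falling back to the legacy capital-C path if only that exists.
resolve_codex_skills_dir() {
  home_dir="${1:-$HOME}"
  if [ -d "$home_dir/.codex/skills" ]; then
    printf '%s\n' "$home_dir/.codex/skills"
  elif [ -d "$home_dir/.Codex/skills" ]; then
    printf '%s\n' "$home_dir/.Codex/skills"
  else
    # Neither exists yet: use the new default.
    printf '%s\n' "$home_dir/.codex/skills"
  fi
}
```

On case-insensitive filesystems the two spellings refer to the same directory, so the first branch already covers the legacy path there.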
Agent Can't Find Installed Skill
Verify install location for your platform
ls ~/.claude/skills/llm-wiki/ # Claude Code
ls ~/.codex/skills/llm-wiki/ # Codex
ls ~/.openclaw/skills/llm-wiki/ # OpenClaw
Re-run installer if directory is missing
bash install.sh --platform <your-platform>
Source Registry Lookup
Check registered sources and routing
cat scripts/source-registry.tsv
Test routing for a URL
Key Design Principles
- Compile once, maintain: wiki pages are living documents, not ephemeral answers
- Bidirectional links: every entity and topic links to related nodes with [[wiki links]]
- Immutable raw: source files in raw/ are never modified after ingestion
- Graceful degradation: if a tool fails, the agent prompts for manual paste instead of crashing
- Platform-agnostic: the same knowledge base works across all supported agents
- Obsidian-compatible: open directly in Obsidian at any time
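The "immutable raw" principle is a convention, but it can also be enforced at the filesystem level if you want a safety net. The sketch below is an optional extra, not something the skill is documented to do, and `lock_raw` is a hypothetical helper.

```shell
# Hypothetical sketch: remove write permission from every file under raw/
# so ingested sources cannot be edited by accident.
lock_raw() {
  wiki_root="$1"
  [ -d "$wiki_root/raw" ] || return 1
  find "$wiki_root/raw" -type f -exec chmod a-w {} +
}
```

Directories stay writable, so new sources can still be added. Run `lock_raw ~/my-wiki` again after each ingestion to cover the new files.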
Quick Reference for Agents
Initialize wiki: "Create a new wiki at <path>"
Add URL: "Add <url> to my wiki"
Add file: "Process <file path> into my wiki"
Add text: "Add these notes to my wiki: <text>"
Batch process: "Process all files in <folder> into my wiki"
Health check: "Check my wiki for broken links and orphans"
Find information: "What does my wiki say about <topic>"
Update a page: "Update the [[entity]] page with new info from <source>"