understand-knowledge


/understand-knowledge

Analyzes a Karpathy-pattern LLM wiki — a three-layer knowledge base with raw sources, wiki markdown, and a schema file — and produces an interactive knowledge graph dashboard.

What It Detects

  • Raw sources — immutable source documents (articles, papers, data files)
  • Wiki — LLM-generated markdown files with wikilinks ([[target]] syntax)
  • Schema — CLAUDE.md, AGENTS.md, or similar configuration file
  • index.md — content catalog organized by categories
  • log.md — chronological operation log

Detection signals: an index.md plus multiple .md files containing wikilinks; a raw/ directory and a schema file may also be present.
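
The detection signals can be sketched as a small heuristic. This is an illustration, not the actual logic of parse-knowledge-base.py; the wikilink regex and the threshold of two linked files are assumptions.

```python
import re
from pathlib import Path

WIKILINK = re.compile(r"\[\[([^\]]+)\]\]")  # matches [[target]]

def looks_like_karpathy_wiki(root: str) -> bool:
    """Heuristic sketch of the detection signals: an index.md plus
    multiple .md files containing wikilinks. The threshold of two
    linked files is an assumption; raw/ and the schema file are
    optional, so they are not checked here."""
    base = Path(root)
    if not (base / "index.md").is_file():
        return False
    linked = [
        p for p in base.glob("*.md")
        if p.name != "index.md" and WIKILINK.search(p.read_text(encoding="utf-8"))
    ]
    return len(linked) >= 2
```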

Instructions

Phase 1: DETECT

  1. Determine the target directory:
    • If the user provided a path argument, use that
    • Otherwise, use the current working directory
  2. Run the format detection script bundled with this skill:
    python3 <SKILL_DIR>/parse-knowledge-base.py <TARGET_DIR>
    • If the script exits with an error, tell the user this doesn't appear to be a Karpathy-pattern wiki and explain what was expected
    • If successful, proceed. The script writes scan-manifest.json to <TARGET_DIR>/.understand-anything/intermediate/
  3. Read the scan-manifest.json and announce the results:
    • "Detected Karpathy wiki: N articles, N sources, N topics, N wikilinks (N unresolved)"
    • List the categories found from index.md
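
Steps 2 and 3 amount to running the script and reading back its manifest. A minimal sketch, assuming python3 is on PATH; the manifest's JSON schema is whatever parse-knowledge-base.py emits.

```python
import json
import subprocess
from pathlib import Path

def detect(skill_dir: str, target_dir: str) -> dict:
    """Run the bundled parse script, then load the manifest it wrote.
    Raises SystemExit with an explanation if detection fails."""
    result = subprocess.run(
        ["python3", f"{skill_dir}/parse-knowledge-base.py", target_dir],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        raise SystemExit(
            "This doesn't appear to be a Karpathy-pattern wiki: expected an "
            "index.md plus wikilinked .md files. " + result.stderr.strip()
        )
    manifest = Path(target_dir) / ".understand-anything" / "intermediate" / "scan-manifest.json"
    return json.loads(manifest.read_text(encoding="utf-8"))
```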

Phase 2: SCAN (already done)

The parse script in Phase 1 already performed the deterministic scan. The scan-manifest.json contains:
  • Article nodes (one per wiki .md file) with extracted wikilinks, headings, frontmatter
  • Source nodes (one per raw/ file)
  • Topic nodes (from index.md section headings)
  • related edges (from wikilinks)
  • categorized_under edges (from index.md sections)
No additional scanning is needed. Proceed to Phase 3.
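
A shape consistent with the node and edge kinds above might look like the following. All field names here are illustrative assumptions; the actual schema is defined by parse-knowledge-base.py.

```python
# Hypothetical scan-manifest.json contents; real field names may differ.
scan_manifest = {
    "nodes": [
        {"id": "article:attention", "type": "article",
         "wikilinks": ["transformers"], "headings": ["Overview"],
         "frontmatter": {"tags": ["ml"]}},
        {"id": "source:paper.pdf", "type": "source", "size": 123456},
        {"id": "topic:Deep Learning", "type": "topic"},
    ],
    "edges": [
        # from a [[transformers]] wikilink in attention.md
        {"source": "article:attention", "target": "article:transformers",
         "type": "related"},
        # from the index.md section that lists attention.md
        {"source": "article:attention", "target": "topic:Deep Learning",
         "type": "categorized_under"},
    ],
}
```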

Phase 3: ANALYZE

Dispatch article-analyzer subagents to extract implicit knowledge:
  1. Read the scan-manifest.json to get the article list
  2. Prepare batches of 10-15 articles each, grouped by category when possible (articles in the same category are more likely to have implicit cross-references)
  3. For each batch, dispatch an article-analyzer subagent with:
    • The batch of articles (id, name, summary, wikilinks, category, content from knowledgeMeta)
    • The full list of existing node IDs (so the agent can reference them)
    • The batch number for output file naming
    • The intermediate directory path:
      $INTERMEDIATE_DIR = <TARGET_DIR>/.understand-anything/intermediate
    The agent will write analysis-batch-{N}.json to the intermediate directory.
  4. Run up to 3 batches concurrently. Wait for all batches to complete.
  5. If any batch fails, log a warning but continue — the scan-manifest provides a solid base graph even without LLM analysis.
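
The batching in step 2 can be sketched as below (concurrency, step 4, is handled by the dispatcher, not shown). The article dict keys are assumptions taken from the manifest description.

```python
from itertools import islice

def make_batches(articles: list[dict], size: int = 12) -> list[list[dict]]:
    """Group articles by category, then split each group into batches.
    Articles sharing a category tend to carry implicit cross-references,
    so keeping them together helps the analyzer spot those."""
    by_category: dict[str, list[dict]] = {}
    for art in articles:
        by_category.setdefault(art.get("category", ""), []).append(art)
    batches = []
    for group in by_category.values():
        it = iter(group)
        while chunk := list(islice(it, size)):
            batches.append(chunk)
    return batches
```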

Phase 4: MERGE

  1. Run the merge script bundled with this skill:
    python3 <SKILL_DIR>/merge-knowledge-graph.py <TARGET_DIR>
  2. The script:
    • Combines scan-manifest.json + all analysis-batch-*.json files
    • Deduplicates entities (case-insensitive name matching)
    • Normalizes node/edge types via alias maps
    • Builds layers from index.md categories
    • Builds a tour from index.md section ordering
    • Writes assembled-graph.json to the intermediate directory
  3. Read the merge report from stderr and announce:
    • Total nodes, edges, layers, tour steps
    • How many entities/claims the LLM analysis added
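
Two of the merge operations can be sketched simply. The alias map entries are hypothetical; merge-knowledge-graph.py defines the real ones, and it may merge duplicate nodes' fields rather than drop them.

```python
def dedupe_entities(nodes: list[dict]) -> list[dict]:
    """Case-insensitive name dedup: the first node seen for a name wins,
    later duplicates are dropped."""
    seen: dict[str, dict] = {}
    for node in nodes:
        key = node["name"].strip().lower()
        if key not in seen:
            seen[key] = node
    return list(seen.values())

# Hypothetical alias map for normalizing edge types to canonical names.
EDGE_ALIASES = {"relates_to": "related", "see_also": "related",
                "in_category": "categorized_under"}

def normalize_edge_type(edge_type: str) -> str:
    return EDGE_ALIASES.get(edge_type, edge_type)
```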

Phase 5: SAVE

  1. Read the assembled-graph.json
  2. Run basic validation:
    • Every edge source/target must reference an existing node
    • Every node must have: id, type, name, summary, tags, complexity
    • Remove any edges with dangling references
  3. Copy the validated graph to <TARGET_DIR>/.understand-anything/knowledge-graph.json
  4. Write metadata to <TARGET_DIR>/.understand-anything/meta.json:
    {
      "lastAnalyzedAt": "<ISO timestamp>",
      "gitCommitHash": "<from git rev-parse HEAD or empty>",
      "version": "1.0.0",
      "analyzedFiles": <number of wiki articles>
    }
  5. Clean up intermediate files:
    rm -rf <TARGET_DIR>/.understand-anything/intermediate
  6. Report summary to the user:
    • "Knowledge graph saved: N articles, N entities, N topics, N claims, N sources"
    • "N edges (N wikilink, N categorized, N implicit)"
    • "N layers, N tour steps"
  7. Auto-trigger the dashboard:
    /understand-dashboard <TARGET_DIR>
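
The validation in step 2 can be sketched as follows; the required field names come from the rule above, and the flat nodes/edges graph shape is an assumption.

```python
def validate_graph(graph: dict) -> dict:
    """Check required node fields, then drop edges whose source or
    target does not reference an existing node."""
    required = ("id", "type", "name", "summary", "tags", "complexity")
    for node in graph["nodes"]:
        missing = [f for f in required if f not in node]
        if missing:
            raise ValueError(f"node {node.get('id')!r} missing {missing}")
    ids = {n["id"] for n in graph["nodes"]}
    graph["edges"] = [e for e in graph["edges"]
                      if e["source"] in ids and e["target"] in ids]
    return graph
```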

Notes

  • The parse script handles ALL deterministic extraction (wikilinks, headings, frontmatter, categories from index.md). The LLM agents only add implicit knowledge that requires inference.
  • Categories and taxonomy come from index.md section headings, NOT from filename prefixes. The Karpathy spec is intentionally abstract about naming conventions.
  • The graph uses kind: "knowledge" to signal the dashboard to use a force-directed layout instead of hierarchical dagre.
  • Source nodes from raw/ are lightweight (filename + size only) — we don't parse PDFs or binary files.