Ebook Analysis: Non-Fiction Knowledge Extraction

You analyze ebooks to extract knowledge with full citation traceability. This skill supports two complementary extraction modes:
  1. Concept Extraction - Extract ideas classified by abstraction (principle → tactic)
  2. Entity Extraction - Extract named things (studies, researchers, frameworks, anecdotes) that persist across books

Core Principle

Every extraction must be traceable to its exact source. Citation traceability is non-negotiable. Extract less with full provenance rather than more without it.

Two Extraction Modes

Mode 1: Concept Extraction

For extracting IDEAS organized by abstraction level.
Use when: Analyzing a book for transferable ideas, building a concept taxonomy, understanding how abstract principles relate to concrete tactics.
Output: JSON files (analysis.json, concepts.json)
Example: "Spaced repetition improves retention" is a MECHANISM at Layer 2.

Mode 2: Entity Extraction

For extracting NAMED THINGS that can be cross-referenced across books.
Use when: Building a knowledge base where the same study, researcher, or framework appears in multiple books. The goal is entity resolution—recognizing that "Hogarth's framework" in Range is the same as "kind/wicked environments" mentioned elsewhere.
Output: Markdown files in knowledge base structure
Example: "Kind vs Wicked Environments" is a FRAMEWORK by Robin Hogarth.

Choosing a Mode

| If you want to... | Use mode |
|---|---|
| Understand a book's argument structure | Concept Extraction |
| Build a reference library across books | Entity Extraction |
| Create actionable takeaways | Concept Extraction |
| Track what researchers say across sources | Entity Extraction |
| Both | Run both modes sequentially |

Entity Extraction Mode (Detailed)

Entity Types

| Type | What it captures | Example |
|---|---|---|
| study | Research findings, experiments, data | Flynn Effect, Marshmallow Test |
| researcher | People and their contributions | Anders Ericsson, Robin Hogarth |
| framework | Mental models, taxonomies, systems | Kind vs Wicked, Desirable Difficulties |
| anecdote | Stories used to illustrate points | Tiger vs Roger, Challenger Disaster |
| concept | Ideas that aren't frameworks | Cognitive entrenchment, Match quality |

Extended Entity Type Guidance

Some entities don't fit cleanly into the five types. Guidelines:

| Entity kind | Use type | Rationale |
|---|---|---|
| Simulations/games (Superstruct, EVOKE) | anecdote | Illustrative events, even if hypothetical |
| Institutions (IFTF, WEF) | researcher | Organizations contribute ideas like individuals |
| Historical events (Challenger disaster) | anecdote | Stories that illustrate principles |
| Hypothetical scenarios | anecdote | Future scenarios from books like Imaginable |
| Thought experiments | framework | If systematic; otherwise concept |

When uncertain, default to anecdote for narratives/events, concept for ideas, and framework for systematic methods.

Author-as-Subject Pattern

When the book's author is also a significant entity (e.g., Jane McGonigal in Imaginable):
Create a researcher entity if:
  • Author has notable prior work or institutional affiliation
  • Author appears in Wikipedia or other reference sources
  • Author's background/credentials are relevant to understanding the book
  • Other books in your collection might reference them
Skip if:
  • Author is primarily known only for this book
  • No external sources to verify/enrich the entity
Template addition for author-subjects:

```markdown
> **Note**: This researcher is the author of [Book] in our collection. Their frameworks and concepts are documented separately.
```

Entity File Template

```markdown
# [Entity Name]

**Type**: study | researcher | framework | anecdote | concept
**Status**: stub | partial | solid | authoritative
**Last Updated**: YYYY-MM-DD
**Aliases**: alias1, alias2, alias3

## Summary

[2-3 sentence synthesized understanding]

## Key Findings / What It Illustrates

1. [Claim or finding with source] — Source: [Book], Ch.[X]
2. [Another claim] — Source: [Book], Ch.[X]

## Key Quotes

> "Quotable text here."
> "Another memorable quote."

## Sources in Collection

| Book | Author | How It's Used | Citation |
|---|---|---|---|
| Range | Epstein | [Role in book] | Ch.X |

## Sources NOT in Collection

- [Book that would enrich this entity]

## Related Entities

- [Other Entity] - Relationship description

## Open Questions

- [What we don't yet know]
```

Knowledge Base Structure

/knowledge/
├── _index.md                    # Master registry
├── _entities.json               # Searchable index (generated)
├── nonfiction/
│   ├── _index.md                # Domain index
│   ├── _[book]-quotes.md        # Book-specific quotes file
│   ├── studies/
│   │   ├── flynn-effect.md
│   │   └── chase-simon-chunking.md
│   ├── researchers/
│   │   ├── hogarth-robin.md
│   │   └── tetlock-philip.md
│   ├── frameworks/
│   │   ├── kind-vs-wicked-environments.md
│   │   └── desirable-difficulties.md
│   ├── anecdotes/
│   │   ├── tiger-vs-roger.md
│   │   └── challenger-disaster.md
│   └── concepts/
│       ├── cognitive-entrenchment.md
│       └── match-quality.md
├── cooking/                     # Domain-specific structure
│   ├── techniques/
│   ├── ingredients/
│   └── equipment/
└── technical/
    ├── patterns/
    └── technologies/
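Entity filenames in the tree above follow a lowercase hyphenated slug convention (flynn-effect.md, kind-vs-wicked-environments.md). A minimal sketch of that convention in TypeScript; `slugify` is an illustrative helper, not a function from the actual scripts:

```typescript
// Sketch of the slug convention used for entity filenames.
// Not taken from the real scripts; shown only to make the naming rule explicit.
function slugify(name: string): string {
  return name
    .toLowerCase()
    .replace(/['’]/g, "")        // drop apostrophes so "Hogarth's" -> "hogarths"
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of non-alphanumerics to hyphens
    .replace(/^-+|-+$/g, "");    // trim leading/trailing hyphens
}
```

Applying it, "Kind vs Wicked Environments" becomes kind-vs-wicked-environments, matching the file shown in frameworks/ above.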

Quotes Extraction

Quotable quotes are a distinct extraction type. For each book, create a quotes file:
File: `_[book-slug]-quotes.md`

Structure:

```markdown
# Quotable Quotes from [Book Title]

**Author**: [Author]
**Last Updated**: YYYY-MM-DD

## On [Theme 1]

> "Quote text here."
> "Another quote on same theme."

## On [Theme 2]

> "Quote on different theme."
```

**What makes a good quote:**
- Memorable phrasing that captures a key insight
- Self-contained (understandable without context)
- Surprising or counterintuitive formulation
- Useful for presentations, writing, or reference

Entity Extraction Workflow

  1. Scan book - Read through, identifying named studies, researchers, frameworks, and illustrative stories
  2. Check existing entities - Use kb-resolve-entity.ts to see if the entity already exists
  3. Create or update - New entity → create file; existing → add as source
  4. Add quotes - Extract memorable quotes to the quotes file
  5. Cross-link - Add Related Entities sections
  6. Regenerate index - Run kb-generate-index.ts

Entity Extraction States (KB0-KB5)

| State | Symptoms | Intervention |
|---|---|---|
| KB0 | No knowledge base | Create directory structure |
| KB1 | Structure exists, no entities | Begin extraction |
| KB2 | Extracting from book | Create entity files |
| KB3 | Entities created, not linked | Add Related Entities |
| KB4 | Linked, no index | Run kb-generate-index.ts |
| KB5 | Complete for this book | Proceed to next book |

Cross-Book Synthesis Workflow

Triggered when: 2+ books have been extracted to the knowledge base.
Goals:
  1. Find entities that appear in multiple books
  2. Identify conceptual connections between books
  3. Surface contradictions or complementary perspectives
  4. Update entity files with multi-source synthesis
Process:

1. Entity overlap detection

   ```bash
   # Find entities with 2+ sources
   grep -l "Sources in Collection" knowledge/nonfiction/**/*.md | \
     xargs grep -l "| .* | .* |" | head -20
   ```

   Or manually review entities updated with a new source.

2. Conceptual connection mapping
   - Compare frameworks across books (e.g., Range's "wicked environments" ↔ Imaginable's "futures thinking")
   - Identify shared researchers (e.g., Tetlock appears in both Range and Imaginable)
   - Look for complementary themes (prediction failure → preparation despite uncertainty)

3. Synthesis documentation - For entities appearing in 2+ books, update the Summary section:

   ```markdown
   ## Summary
   [Synthesized understanding from BOTH sources, noting agreements and differences]
   ```

4. Cross-book insights - Document thematic connections in context/insights/cross-book-{theme}.md:

   ```markdown
   # Cross-Book Insight: [Theme]

   ## Books Contributing
   - Range (Epstein) - [perspective]
   - Imaginable (McGonigal) - [perspective]

   ## Synthesis
   [How the books complement or contradict each other]
   ```
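The grep pipeline in step 1 is an approximation. In TypeScript, the same overlap check could be sketched as counting data rows in an entity file's "Sources in Collection" table; the helper names and parsing rules here are illustrative assumptions, not code from the skill's scripts:

```typescript
// Sketch of entity-overlap detection: an entity is "multi-source" when its
// "Sources in Collection" table has 2+ data rows. Helper names are assumed.
function countSources(entityMarkdown: string): number {
  const lines = entityMarkdown.split("\n");
  const start = lines.findIndex((l) => l.trim() === "## Sources in Collection");
  if (start === -1) return 0;
  let count = 0;
  for (const line of lines.slice(start + 1)) {
    if (line.startsWith("## ")) break; // next section ends the table
    const row = line.trim();
    // Count pipe-delimited rows, skipping the header and separator rows.
    if (row.startsWith("|") &&
        !/^\|[\s|:-]+\|$/.test(row) &&
        !/^\|\s*Book\s*\|/i.test(row)) {
      count++;
    }
  }
  return count;
}

function isMultiSource(entityMarkdown: string): boolean {
  return countSources(entityMarkdown) >= 2;
}
```

Entities flagged by `isMultiSource` are the candidates for the synthesis documentation in step 3.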


Concept Extraction Mode (Detailed)

Concept Types (Abstract → Concrete)

| Type | Definition | Example |
|---|---|---|
| Principle | Foundational truth or axiom | "Communities form around shared identity" |
| Mechanism | How something works | "Reciprocity creates social bonds" |
| Pattern | Recurring structure or approach | "The community lifecycle pattern" |
| Strategy | High-level approach to achieve goals | "Build trust before asking for contribution" |
| Tactic | Specific actionable technique | "Send welcome emails within 24 hours" |

Abstraction Layers

| Layer | Name | Abstraction | Example |
|---|---|---|---|
| 0 | Foundational | Universal principles | "Humans seek belonging" |
| 1 | Theoretical | Domain-specific theory | "Community requires shared purpose" |
| 2 | Strategic | Approaches and frameworks | "The funnel model of engagement" |
| 3 | Tactical | Specific methods | "Onboarding sequences" |
| 4 | Specific | Concrete implementations | "Use Discourse for forums" |
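One way a classified concept could be represented, combining the type and layer taxonomies with the citation fields the Core Principle requires. The field names below are illustrative assumptions, not the actual concepts.json schema:

```typescript
// Illustrative record shape for a classified concept.
// Field names are assumptions; the real concepts.json schema may differ.
type ConceptType = "principle" | "mechanism" | "pattern" | "strategy" | "tactic";

interface Concept {
  id: string;
  statement: string;        // the idea in one sentence
  type: ConceptType;        // from the concept-type table
  layer: 0 | 1 | 2 | 3 | 4; // abstraction layer from the table above
  quote: string;            // exact supporting quote (traceability)
  citation: { chapter: string; position: number };
}

// The Mode 1 example from earlier, expressed in this shape:
const example: Concept = {
  id: "spaced-repetition",
  statement: "Spaced repetition improves retention",
  type: "mechanism",
  layer: 2,
  quote: "[exact quote from the book]",
  citation: { chapter: "Ch.4", position: 18342 }, // placeholder values
};
```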

Relationship Types

| Relationship | Meaning | When to use |
|---|---|---|
| INFLUENCES | A affects B | Causal or correlational connection |
| SUPPORTS | A provides evidence for B | Citation, example, validation |
| CONTRADICTS | A conflicts with B | Opposing claims |
| COMPOSED_OF | A contains B | Part-whole relationships |
| DERIVES_FROM | A is derived from B | Logical conclusions |

Concept Extraction States (EA0-EA7)

| State | Symptoms | Intervention |
|---|---|---|
| EA0 | No input file | Guide file preparation |
| EA1 | Raw file, not parsed | Run ea-parse.ts |
| EA2 | Parsed, not extracted | LLM extracts concepts |
| EA3 | Extracted, not classified | Assign types and layers |
| EA4 | Classified, not annotated | Add themes, relationships |
| EA5 | Single book complete | Export or proceed to synthesis |
| EA6 | Multi-book ready | Cross-book synthesis |
| EA7 | Analysis complete | Generate reports |

Concept Extraction Workflow

  1. Parse - Run ea-parse.ts to chunk the book with position tracking
  2. Extract - Present chunks to the LLM for concept identification with exact quotes
  3. Classify - Assign type (principle→tactic) and layer (0-4)
  4. Annotate - Add themes and functional analysis
  5. Link - Connect related concepts
  6. Export - Generate analysis.json, concepts.json, report.md

Available Tools

Parsing Tools

ea-parse.ts

Parse ebook files into chunks with metadata and position tracking.

```bash
deno run --allow-read scripts/ea-parse.ts path/to/book.txt
deno run --allow-read scripts/ea-parse.ts path/to/book.epub --format epub
deno run --allow-read scripts/ea-parse.ts book.txt --chunk-size 1500 --overlap 150
```

Output: JSON with metadata, chapters (if detected), and chunks with positions.
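The chunking that ea-parse.ts performs might look like the following sketch: fixed-size windows that overlap by a configurable amount, each carrying its character offsets. This is not the actual implementation, only an illustration of why position tracking matters:

```typescript
// Sketch of overlap chunking with position tracking, in the spirit of
// ea-parse.ts (not the actual implementation).
interface Chunk {
  index: number;
  start: number; // character offset into the source text
  end: number;
  text: string;
}

function chunkText(text: string, chunkSize = 1500, overlap = 150): Chunk[] {
  const step = chunkSize - overlap;
  if (step <= 0) throw new RangeError("chunkSize must exceed overlap");
  const chunks: Chunk[] = [];
  for (let start = 0, i = 0; start < text.length; start += step, i++) {
    const end = Math.min(start + chunkSize, text.length);
    chunks.push({ index: i, start, end, text: text.slice(start, end) });
    if (end === text.length) break;
  }
  return chunks;
}
```

Keeping start/end offsets on every chunk is what makes later citation validation possible: a claimed quote can be re-checked against text.slice(start, end).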

Knowledge Base Tools

kb-generate-index.ts

Scan the knowledge base and generate a searchable entity index.

```bash
deno run --allow-read --allow-write scripts/kb-generate-index.ts /path/to/knowledge
```

Output: Creates _entities.json with all entities, aliases, and metadata.
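Building one index record could be sketched as parsing an entity file's header fields into a structure suitable for _entities.json. The parsing rules below are inferred from the entity file template, not taken from the real script:

```typescript
// Sketch: turn one entity file's header into an index record.
// Format assumptions come from the entity template, not the actual script.
interface EntityRecord {
  name: string;
  type: string;
  status: string;
  aliases: string[];
  path: string;
}

function parseEntity(markdown: string, path: string): EntityRecord {
  const lines = markdown.split("\n");
  const title = lines.find((l) => l.startsWith("# ")) ?? "# Unknown";
  const field = (label: string): string => {
    const re = new RegExp(`\\*\\*${label}\\*\\*:\\s*(.+)`);
    for (const l of lines) {
      const m = l.match(re);
      if (m) return m[1].trim();
    }
    return "";
  };
  return {
    name: title.slice(2).trim(),
    type: field("Type"),
    status: field("Status"),
    aliases: field("Aliases").split(",").map((a) => a.trim()).filter(Boolean),
    path,
  };
}
```

An index would then be the array of records for every *.md file under the domain directories, written out as _entities.json.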

kb-resolve-entity.ts

Search for existing entities before creating duplicates.

```bash
deno run --allow-read scripts/kb-resolve-entity.ts "Flynn Effect"
deno run --allow-read scripts/kb-resolve-entity.ts "Hogarth" --threshold 0.5
deno run --allow-read scripts/kb-resolve-entity.ts "kind learning" --json
```

Options:
  • --threshold <0-1> - Minimum match score (default: 0.3)
  • --limit <n> - Maximum results (default: 5)
  • --json - Output as JSON
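The threshold option implies a normalized 0-1 match score over names and aliases. A naive scorer is sketched below; the real kb-resolve-entity.ts scoring formula is unknown, so treat this as an assumption for illustration only:

```typescript
// Naive match scoring over name + aliases. Exact match scores 1;
// substring matches score by length ratio. The real formula may differ.
interface IndexedEntity { name: string; aliases: string[] }

function matchScore(query: string, entity: IndexedEntity): number {
  const q = query.toLowerCase();
  const names = [entity.name, ...entity.aliases].map((n) => n.toLowerCase());
  let best = 0;
  for (const n of names) {
    if (n === q) best = Math.max(best, 1);
    else if (n.includes(q) || q.includes(n)) {
      best = Math.max(
        best,
        Math.min(q.length, n.length) / Math.max(q.length, n.length),
      );
    }
  }
  return best;
}

function resolve(query: string, index: IndexedEntity[], threshold = 0.3): IndexedEntity[] {
  return index
    .map((e) => ({ e, score: matchScore(query, e) }))
    .filter((r) => r.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .map((r) => r.e);
}
```

Alias matching is what lets "Hogarth's framework" resolve to the same entity as "kind/wicked environments", which is the entity-resolution goal described under Mode 2.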

Validation Tools

ea-validate.ts

Validate analysis output for citation accuracy and schema completeness.

```bash
deno run --allow-read scripts/ea-validate.ts analysis.json --report
```
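Conceptually, the citation-accuracy half of validation confirms that a quoted passage actually appears at (or near) its claimed offset in the source text. A minimal sketch, assuming a tolerance window to survive minor whitespace differences; function and field names are illustrative, not from the real script:

```typescript
// Sketch of the citation check: does the quote appear near its claimed
// position? Names and the tolerance heuristic are assumptions.
interface Citation { quote: string; position: number }

function validateCitation(bookText: string, c: Citation, tolerance = 50): boolean {
  const lo = Math.max(0, c.position - tolerance);
  const hi = Math.min(bookText.length, c.position + c.quote.length + tolerance);
  return bookText.slice(lo, hi).includes(c.quote);
}
```

A citation that fails this check is exactly the "Citation Black Hole" failure described in the anti-patterns below: an extraction whose provenance can no longer be verified.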

Anti-Patterns

The Extraction Flood

Pattern: Extracting every potentially interesting phrase.
Fix: Ask "Would I cite this?" before extracting. Quality over quantity.

The Citation Black Hole

Pattern: Extracting without preserving exact quotes or positions.
Fix: Always capture the exact quote, chapter reference, and context.

The Duplicate Entity

Pattern: Creating a new entity without checking whether it already exists.
Fix: Always run kb-resolve-entity.ts first.

The Orphan Entity

Pattern: Entities without Related Entities links.
Fix: Every entity should connect to at least 2 others.

The Quote-Free Entity

Pattern: Entity captures ideas but no memorable phrasing.
Fix: Include a Key Quotes section with the author's exact words.

The Single-Book Silo

Pattern: Analyzing books without cross-referencing.
Fix: After 2+ books, run synthesis to find connections.

Example Workflows

Full Entity Extraction (Range Example)

1. Scan book chapter by chapter
2. Identify all named studies, researchers, frameworks, anecdotes
3. Create an inventory document listing all potential entities
4. For each entity:
   a. Run kb-resolve-entity.ts "[entity name]" to check existence
   b. Create a markdown file in the appropriate type directory
   c. Fill in the template with findings and citations
   d. Add a Key Quotes section
5. Create _range-quotes.md with all memorable quotes
6. Update _index.md with new entities
7. Run kb-generate-index.ts to rebuild _entities.json

Quick Concept Scan

1. Run ea-parse.ts book.txt --chunk-size 2000
2. For each chunk, extract the top 3-5 concepts
3. Classify by type and layer
4. Generate concepts.json and report.md

Output Persistence

Entity Extraction Output

| File | Location |
|---|---|
| Entity files | knowledge/{domain}/{type}/{entity-slug}.md |
| Quotes file | knowledge/{domain}/_[book]-quotes.md |
| Entity index | knowledge/_entities.json |
| Domain index | knowledge/{domain}/_index.md |

Concept Extraction Output

| File | Location |
|---|---|
| Full analysis | ebook-analysis/{author}-{title}/analysis.json |
| Concepts only | ebook-analysis/{author}-{title}/concepts.json |
| Citations | ebook-analysis/{author}-{title}/citations.json |
| Report | ebook-analysis/{author}-{title}/report.md |

Verification (Oracle)

What This Skill Can Verify

  • Citation positions exist - Validate quoted text appears at claimed position
  • Schema completeness - Required fields present
  • Cross-reference integrity - Referenced entities exist
  • Duplicate detection - Entity doesn't already exist (via kb-resolve-entity.ts)

What Requires Human Judgment

  • Significance - Is this worth extracting?
  • Classification - Is this really a "framework" vs "concept"?
  • Relationship validity - Does A really influence B?
  • Quote quality - Is this actually memorable?

Integration Graph

Inbound (From Other Skills)

| Source | Leads to |
|---|---|
| research | Multi-book synthesis ready |
| reverse-outliner | Structural data for concept extraction |

Outbound (To Other Skills)

| From state | Leads to |
|---|---|
| Entity extraction complete | dna-extraction (deep functional analysis) |
| Concept extraction complete | media-meta-analysis (cross-source synthesis) |

Complementary Skills

| Skill | Relationship |
|---|---|
| dna-extraction | 6-axis functional analysis for annotation |
| reverse-outliner | Structural approach for fiction |
| voice-analysis | Author style fingerprinting |
| context-network | Knowledge base maintenance |

Calibration Data (from Range + Imaginable extractions)

By Book Density

| Book type | Expected entities | Estimated effort |
|---|---|---|
| Dense non-fiction (Range, Thinking Fast & Slow) | 60-100 | 4-6 hours |
| Moderate non-fiction (most business books) | 30-50 | 2-3 hours |
| Light non-fiction (popular science) | 15-30 | 1-2 hours |
| Technical books | 20-40 | 2-3 hours |

By Book Subtype

Different non-fiction subtypes yield different entity profiles:
| Subtype | Example | Entity profile | Expected count |
|---|---|---|---|
| Research synthesis | Range | Many studies, researchers, frameworks | 60-100 |
| Methodological/How-to | Imaginable | Many frameworks, few studies | 30-50 |
| Memoir/Narrative | Educated | Few frameworks, many anecdotes | 20-40 |
| Reference | Technical manuals | Many concepts, few anecdotes | Variable |
Research synthesis books cite many studies and researchers, connecting ideas across domains. Methodological books teach techniques and frameworks but cite fewer external sources. Memoir/narrative books use personal stories to illustrate points rather than research.

Metadata Reliability Warning

Book classification metadata (Calibre tags, library categories) is often:
  • Wrong - Fiction/non-fiction misclassified
  • Generic - "General Fiction" or "Self-Help" applied broadly
  • Inconsistent - Same book categorized differently across sources
Always verify classification makes sense before extraction. A "fiction" tag on a methodology book like Imaginable is a metadata error.

Reasoning Requirements

Standard Reasoning

  • Single chunk concept extraction
  • Type/layer classification
  • Simple relationship identification
  • Individual entity creation

Extended Reasoning (ultrathink)

Use extended thinking for:
  • Multi-book synthesis - requires holding multiple networks simultaneously
  • Contradiction detection - semantic comparison across sources
  • Theme emergence - identifying patterns across large sets
  • Knowledge gap identification - reasoning about what's missing
Trigger phrases: "synthesize across books", "find contradictions", "identify gaps", "comprehensive analysis"

What You Do NOT Do

  • Extract without citation traceability
  • Create entities without checking for duplicates
  • Skip the linking phase (orphan entities are not useful)
  • Leave entities without quotes
  • Treat fiction as non-fiction
  • Use regex for semantic analysis (LLM judgment only)