ebook-analysis
Ebook Analysis: Non-Fiction Knowledge Extraction
You analyze ebooks to extract knowledge with full citation traceability. This skill supports two complementary extraction modes:
- Concept Extraction - Extract ideas classified by abstraction (principle → tactic)
- Entity Extraction - Extract named things (studies, researchers, frameworks, anecdotes) that persist across books
Core Principle
Every extraction must be traceable to its exact source. Citation traceability is non-negotiable. Extract less with full provenance rather than more without it.
Two Extraction Modes
Mode 1: Concept Extraction
For extracting IDEAS organized by abstraction level.
Use when: Analyzing a book for transferable ideas, building a concept taxonomy, understanding how abstract principles relate to concrete tactics.
Output: JSON files (analysis.json, concepts.json)
Example: "Spaced repetition improves retention" is a MECHANISM at Layer 2.
Mode 2: Entity Extraction
For extracting NAMED THINGS that can be cross-referenced across books.
Use when: Building a knowledge base where the same study, researcher, or framework appears in multiple books. The goal is entity resolution—recognizing that "Hogarth's framework" in Range is the same as "kind/wicked environments" mentioned elsewhere.
Output: Markdown files in knowledge base structure
Example: "Kind vs Wicked Environments" is a FRAMEWORK by Robin Hogarth.
Choosing a Mode
| If you want to... | Use Mode |
|---|---|
| Understand a book's argument structure | Concept Extraction |
| Build a reference library across books | Entity Extraction |
| Create actionable takeaways | Concept Extraction |
| Track what researchers say across sources | Entity Extraction |
| Both | Run both modes sequentially |
Entity Extraction Mode (Detailed)
Entity Types
| Type | What It Captures | Example |
|---|---|---|
| study | Research findings, experiments, data | Flynn Effect, Marshmallow Test |
| researcher | People and their contributions | Anders Ericsson, Robin Hogarth |
| framework | Mental models, taxonomies, systems | Kind vs Wicked, Desirable Difficulties |
| anecdote | Stories used to illustrate points | Tiger vs Roger, Challenger Disaster |
| concept | Ideas that aren't frameworks | Cognitive entrenchment, Match quality |
Extended Entity Type Guidance
Some entities don't fit cleanly into the five types. Guidelines:
| Entity Kind | Use Type | Rationale |
|---|---|---|
| Simulations/Games (Superstruct, EVOKE) | anecdote | Illustrative events, even if hypothetical |
| Institutions (IFTF, WEF) | researcher | Organizations contribute ideas like individuals |
| Historical events (Challenger disaster) | anecdote | Stories that illustrate principles |
| Hypothetical scenarios | anecdote | Future scenarios from books like Imaginable |
| Thought experiments | framework | If systematic; otherwise concept |
When uncertain: Default to `anecdote` for narratives/events, `concept` for ideas, `framework` for systematic methods.
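The guidance table above can be read as a small lookup. The following is a minimal sketch of that mapping, not part of the skill's tooling: the `EntityKind` labels and the `resolveEntityType` helper are hypothetical names introduced here for illustration.

```typescript
// The five entity types defined by the skill.
type EntityType = "study" | "researcher" | "framework" | "anecdote" | "concept";

// Hypothetical labels for the edge cases listed in the guidance table.
type EntityKind =
  | "simulation"
  | "institution"
  | "historical-event"
  | "hypothetical-scenario"
  | "thought-experiment";

// Map an edge-case kind to the type the table recommends.
// `systematic` only matters for thought experiments.
function resolveEntityType(kind: EntityKind, systematic = false): EntityType {
  switch (kind) {
    case "simulation":
    case "historical-event":
    case "hypothetical-scenario":
      return "anecdote"; // narratives and events
    case "institution":
      return "researcher"; // organizations contribute ideas like individuals
    case "thought-experiment":
      return systematic ? "framework" : "concept";
  }
}
```

Whether a thought experiment counts as systematic is still a human judgment call; the sketch only records the decision once it has been made.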
Author-as-Subject Pattern
When the book's author is also a significant entity (e.g., Jane McGonigal in Imaginable):
Create a researcher entity if:
- Author has notable prior work or institutional affiliation
- Author appears in Wikipedia or other reference sources
- Author's background/credentials are relevant to understanding the book
- Other books in your collection might reference them
Skip if:
- Author is primarily known only for this book
- No external sources to verify/enrich the entity
Template addition for author-subjects:
```markdown
## Note
This researcher is the author of [Book] in our collection. Their frameworks and concepts are documented separately.
```
Entity File Template
```markdown
# [Entity Name]

Type: study | researcher | framework | anecdote | concept
Status: stub | partial | solid | authoritative
Last Updated: YYYY-MM-DD
Aliases: alias1, alias2, alias3

## Summary
[2-3 sentence synthesized understanding]

## Key Findings / What It Illustrates
- [Claim or finding with source] - Source: [Book], Ch.[X]
- [Another claim] - Source: [Book], Ch.[X]

## Key Quotes
"Quotable text here."
"Another memorable quote."

## Sources in Collection
| Book | Author | How It's Used | Citation |
|---|---|---|---|
| Range | Epstein | [Role in book] | Ch.X |

## Sources NOT in Collection
- [Book that would enrich this entity]

## Related Entities
- Other Entity - Relationship description

## Open Questions
- [What we don't yet know]
```
Knowledge Base Structure
```
/knowledge/
├── _index.md                  # Master registry
├── _entities.json             # Searchable index (generated)
│
├── nonfiction/
│   ├── _index.md              # Domain index
│   ├── _[book]-quotes.md      # Book-specific quotes file
│   ├── studies/
│   │   ├── flynn-effect.md
│   │   └── chase-simon-chunking.md
│   ├── researchers/
│   │   ├── hogarth-robin.md
│   │   └── tetlock-philip.md
│   ├── frameworks/
│   │   ├── kind-vs-wicked-environments.md
│   │   └── desirable-difficulties.md
│   ├── anecdotes/
│   │   ├── tiger-vs-roger.md
│   │   └── challenger-disaster.md
│   └── concepts/
│       ├── cognitive-entrenchment.md
│       └── match-quality.md
│
├── cooking/                   # Domain-specific structure
│   ├── techniques/
│   ├── ingredients/
│   └── equipment/
│
└── technical/
    ├── patterns/
    └── technologies/
```
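The file names in the tree suggest a naming convention: lowercase hyphenated slugs, with researchers filed surname-first (`hogarth-robin.md`). The sketch below derives a path from an entity name under that reading. `slugify` and `entityPath` are hypothetical helpers, and treating the last word of a name as the surname is an assumption that will not hold for every name.

```typescript
type EntityType = "study" | "researcher" | "framework" | "anecdote" | "concept";

// Directory names as they appear in the tree above.
const TYPE_DIRS: Record<EntityType, string> = {
  study: "studies",
  researcher: "researchers",
  framework: "frameworks",
  anecdote: "anecdotes",
  concept: "concepts",
};

// Lowercase, collapse runs of non-alphanumerics into "-", trim stray hyphens.
function slugify(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Build the expected file path for an entity.
// Assumption: for researchers, the last word of the name is the surname.
function entityPath(type: EntityType, name: string, domain = "nonfiction"): string {
  let slug = slugify(name);
  if (type === "researcher") {
    const parts = name.trim().split(/\s+/);
    const surname = parts[parts.length - 1];
    const rest = parts.slice(0, -1).join(" ");
    slug = slugify(`${surname} ${rest}`);
  }
  return `knowledge/${domain}/${TYPE_DIRS[type]}/${slug}.md`;
}

console.log(entityPath("researcher", "Robin Hogarth"));
// prints "knowledge/nonfiction/researchers/hogarth-robin.md"
```

Deriving paths from names keeps file locations predictable, which makes duplicate checks cheaper before a new file is created.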
Quotes Extraction
Quotable quotes are a distinct extraction type. For each book, create a quotes file:
File: `_[book-slug]-quotes.md`
Structure:
```markdown
# Quotable Quotes from [Book Title]
Author: [Author]
Last Updated: YYYY-MM-DD

## On [Theme 1]
"Quote text here."
"Another quote on same theme."

## On [Theme 2]
"Quote on different theme."
```
What makes a good quote:
- Memorable phrasing that captures a key insight
- Self-contained (understandable without context)
- Surprising or counterintuitive formulation
- Useful for presentations, writing, or reference
Entity Extraction Workflow
- Scan book - Read through identifying named studies, researchers, frameworks, illustrative stories
- Check existing entities - Use `kb-resolve-entity.ts` to see if the entity already exists
- Create or update - New entity → create file; existing → add as source
- Add quotes - Extract memorable quotes to quotes file
- Cross-link - Add Related Entities sections
- Regenerate index - Run `kb-generate-index.ts`
Entity Extraction States (KB0-KB5)
| State | Symptoms | Intervention |
|---|---|---|
| KB0 | No knowledge base | Create directory structure |
| KB1 | Structure exists, no entities | Begin extraction |
| KB2 | Extracting from book | Create entity files |
| KB3 | Entities created, not linked | Add Related Entities |
| KB4 | Linked, no index | Run kb-generate-index.ts |
| KB5 | Complete for this book | Proceed to next book |
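The state table can be read as an ordered series of checks, each one gating the next. The sketch below encodes that reading. The `KbSnapshot` fields are hypothetical; the skill does not define how these facts are measured, so treat this as one possible interpretation.

```typescript
type KbState = "KB0" | "KB1" | "KB2" | "KB3" | "KB4" | "KB5";

// Hypothetical snapshot of the knowledge base for the book being processed.
interface KbSnapshot {
  structureExists: boolean;    // directory tree has been created
  entityCount: number;         // entity files written so far
  extractionFinished: boolean; // every identified entity has a file
  unlinkedEntities: number;    // entities with no Related Entities links
  indexUpToDate: boolean;      // _entities.json regenerated after last change
}

// Interventions copied from the state table.
const INTERVENTION: Record<KbState, string> = {
  KB0: "Create directory structure",
  KB1: "Begin extraction",
  KB2: "Create entity files",
  KB3: "Add Related Entities",
  KB4: "Run kb-generate-index.ts",
  KB5: "Proceed to next book",
};

// Walk the checks in table order; the first unmet condition names the state.
function diagnoseKb(s: KbSnapshot): KbState {
  if (!s.structureExists) return "KB0";
  if (s.entityCount === 0) return "KB1";
  if (!s.extractionFinished) return "KB2";
  if (s.unlinkedEntities > 0) return "KB3";
  if (!s.indexUpToDate) return "KB4";
  return "KB5";
}
```

Checking in table order means an earlier gap always wins: a stale index is not reported while entities are still unlinked.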
Cross-Book Synthesis Workflow
Triggered when: 2+ books have been extracted to the knowledge base.
Goals:
- Find entities that appear in multiple books
- Identify conceptual connections between books
- Surface contradictions or complementary perspectives
- Update entity files with multi-source synthesis
Process:
- Entity overlap detection
```bash
# Candidate entities with 2+ sources. This only lists files that contain a
# Sources in Collection table, so confirm the row count by hand.
shopt -s globstar   # lets ** match nested directories in bash 4+
grep -l "Sources in Collection" knowledge/nonfiction/**/*.md | \
  xargs grep -l "| .* | .* |" | head -20
```
  Or manually review entities updated with a new source.
- Conceptual connection mapping
  - Compare frameworks across books (e.g., Range's "wicked environments" ↔ Imaginable's "futures thinking")
  - Identify shared researchers (e.g., Tetlock appears in both Range and Imaginable)
  - Look for complementary themes (prediction failure → preparation despite uncertainty)
- Synthesis documentation: for entities appearing in 2+ books, update the Summary section:
```markdown
## Summary
[Synthesized understanding from BOTH sources, noting agreements and differences]
```
- Cross-book insights: document thematic connections in `context/insights/cross-book-{theme}.md`:
```markdown
# Cross-Book Insight: [Theme]
## Books Contributing
- Range (Epstein) - [perspective]
- Imaginable (McGonigal) - [perspective]
## Synthesis
[How the books complement or contradict each other]
```
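Overlap detection comes down to counting the data rows in each entity's Sources in Collection table. The following is a minimal sketch of that count, assuming entity files follow the template (a heading line, then a header row, a separator row, and one row per book). `countSources` is a hypothetical helper, not one of the skill's scripts.

```typescript
// Count book rows in the "Sources in Collection" table of an entity file.
// Assumes the section title sits on its own line (with or without leading #).
function countSources(markdown: string): number {
  const lines = markdown.split(/\r?\n/);
  const start = lines.findIndex((l) => /^#*\s*Sources in Collection\s*$/.test(l));
  if (start === -1) return 0;

  let tableLines = 0; // table lines seen so far, header included
  let rows = 0;
  for (const line of lines.slice(start + 1)) {
    const t = line.trim();
    if (!t.startsWith("|")) {
      if (tableLines > 0 || t !== "") break; // table ended, or section has no table
      continue; // blank line before the table starts
    }
    tableLines++;
    const isSeparator = /^\|[\s:|-]+$/.test(t);
    if (tableLines > 1 && !isSeparator) rows++; // skip header and separator
  }
  return rows;
}

// Placeholder entity file; the bracketed cells are template text, not real data.
const sample = [
  "# [Entity Name]",
  "## Sources in Collection",
  "| Book | Author | How It's Used | Citation |",
  "|---|---|---|---|",
  "| Range | Epstein | [Role in book] | Ch.X |",
  "| Imaginable | McGonigal | [Role in book] | Ch.X |",
  "## Related Entities",
].join("\n");

console.log(countSources(sample) >= 2); // prints true: a synthesis candidate
```

Entities for which the count is 2 or more are the ones whose Summary section needs a multi-source rewrite.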
Concept Extraction Mode (Detailed)
Concept Types (Abstract → Concrete)
| Type | Definition | Example |
|---|---|---|
| Principle | Foundational truth or axiom | "Communities form around shared identity" |
| Mechanism | How something works | "Reciprocity creates social bonds" |
| Pattern | Recurring structure or approach | "The community lifecycle pattern" |
| Strategy | High-level approach to achieve goals | "Build trust before asking for contribution" |
| Tactic | Specific actionable technique | "Send welcome emails within 24 hours" |
Abstraction Layers
| Layer | Name | Abstraction | Example |
|---|---|---|---|
| 0 | Foundational | Universal principles | "Humans seek belonging" |
| 1 | Theoretical | Domain-specific theory | "Community requires shared purpose" |
| 2 | Strategic | Approaches and frameworks | "The funnel model of engagement" |
| 3 | Tactical | Specific methods | "Onboarding sequences" |
| 4 | Specific | Concrete implementations | "Use Discourse for forums" |
Relationship Types
| Relationship | Meaning | When to Use |
|---|---|---|
| INFLUENCES | A affects B | Causal or correlational connection |
| SUPPORTS | A provides evidence for B | Citation, example, validation |
| CONTRADICTS | A conflicts with B | Opposing claims |
| COMPOSED_OF | A contains B | Part-whole relationships |
| DERIVES_FROM | A is derived from B | Logical conclusions |
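The three tables above (types, layers, relationships) can be expressed as a small data model. The sketch below is one possible shape, not the actual schema of `concepts.json`, which the skill does not spell out. The `Citation` fields and the `danglingRelationships` helper are hypothetical.

```typescript
type ConceptType = "principle" | "mechanism" | "pattern" | "strategy" | "tactic";
type Layer = 0 | 1 | 2 | 3 | 4;
type RelationType =
  | "INFLUENCES"
  | "SUPPORTS"
  | "CONTRADICTS"
  | "COMPOSED_OF"
  | "DERIVES_FROM";

// Hypothetical citation shape: exact quote plus where it was found.
interface Citation {
  quote: string;
  chapter: string;
  position: number; // assumed to be a character offset into the parsed text
}

interface Concept {
  id: string;
  statement: string;
  type: ConceptType;
  layer: Layer;
  citation: Citation;
}

interface Relationship {
  from: string; // concept id
  to: string;   // concept id
  type: RelationType;
}

// Return relationships that point at a concept id that does not exist.
function danglingRelationships(concepts: Concept[], rels: Relationship[]): Relationship[] {
  const ids = new Set(concepts.map((c) => c.id));
  return rels.filter((r) => !ids.has(r.from) || !ids.has(r.to));
}

// The document's own example; citation values are placeholders.
const spaced: Concept = {
  id: "spaced-repetition",
  statement: "Spaced repetition improves retention",
  type: "mechanism",
  layer: 2,
  citation: { quote: "[exact quote]", chapter: "Ch.X", position: 0 },
};
```

Typing `layer` as a literal union means an out-of-range layer is rejected at compile time, while reference integrity between concepts still needs a runtime check such as `danglingRelationships`.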
Concept Extraction States (EA0-EA7)
| State | Symptoms | Intervention |
|---|---|---|
| EA0 | No input file | Guide file preparation |
| EA1 | Raw file, not parsed | Run ea-parse.ts |
| EA2 | Parsed, not extracted | LLM extracts concepts |
| EA3 | Extracted, not classified | Assign types and layers |
| EA4 | Classified, not annotated | Add themes, relationships |
| EA5 | Single book complete | Export or proceed to synthesis |
| EA6 | Multi-book ready | Cross-book synthesis |
| EA7 | Analysis complete | Generate reports |
Concept Extraction Workflow
- Parse - Run `ea-parse.ts` to chunk the book with position tracking
- Extract - Present chunks to LLM for concept identification with exact quotes
- Classify - Assign type (principle→tactic) and layer (0-4)
- Annotate - Add themes and functional analysis
- Link - Connect related concepts
- Export - Generate analysis.json, concepts.json, report.md
Available Tools
Parsing Tools
ea-parse.ts
Parse ebook files into chunks with metadata and position tracking.
```bash
deno run --allow-read scripts/ea-parse.ts path/to/book.txt
deno run --allow-read scripts/ea-parse.ts path/to/book.epub --format epub
deno run --allow-read scripts/ea-parse.ts book.txt --chunk-size 1500 --overlap 150
```
Output: JSON with metadata, chapters (if detected), and chunks with positions.
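To make chunking with overlap and position tracking concrete, here is a minimal sketch of the idea. It is not the implementation of `ea-parse.ts`: the `Chunk` shape and `chunkText` function are hypothetical, sizes are assumed to be measured in characters, and the default values simply mirror the flags in the example command.

```typescript
// Hypothetical chunk record: the text plus its offsets in the source.
interface Chunk {
  index: number;
  start: number; // inclusive character offset
  end: number;   // exclusive character offset
  text: string;
}

// Split text into fixed-size windows that share `overlap` characters
// with the previous window, recording where each window came from.
function chunkText(text: string, chunkSize = 1500, overlap = 150): Chunk[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("need chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const chunks: Chunk[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    const end = Math.min(start + chunkSize, text.length);
    chunks.push({ index: chunks.length, start, end, text: text.slice(start, end) });
    if (end === text.length) break; // last window reached the end of the book
  }
  return chunks;
}
```

Because every chunk carries its own offsets, a quote found inside a chunk can be mapped back to an absolute position as `chunk.start` plus the quote's offset within `chunk.text`. The overlap reduces the chance that a quotable sentence is split across a chunk boundary.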
Knowledge Base Tools
kb-generate-index.ts
Scan knowledge base and generate searchable entity index.
```bash
deno run --allow-read --allow-write scripts/kb-generate-index.ts /path/to/knowledge
```
Output: Creates `_entities.json` with all entities, aliases, and metadata.
kb-resolve-entity.ts
Search for existing entities before creating duplicates.
```bash
deno run --allow-read scripts/kb-resolve-entity.ts "Flynn Effect"
deno run --allow-read scripts/kb-resolve-entity.ts "Hogarth" --threshold 0.5
deno run --allow-read scripts/kb-resolve-entity.ts "kind learning" --json
```
Options:
- `--threshold <0-1>` - Minimum match score (default: 0.3)
- `--limit <n>` - Maximum results (default: 5)
- `--json` - Output as JSON
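The threshold and limit options imply a scored match against entity names and aliases. How `kb-resolve-entity.ts` actually scores is not documented here, so the sketch below swaps in a plain token-overlap (Jaccard) score purely for illustration. `IndexedEntity`, `resolveEntity`, and the sample aliases are hypothetical.

```typescript
interface IndexedEntity {
  name: string;
  aliases: string[];
}

interface Match {
  name: string;
  score: number; // 0 to 1
}

// Lowercased word set, split on anything that is not a letter or digit.
function tokens(s: string): Set<string> {
  return new Set(s.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));
}

// Jaccard similarity: shared tokens divided by total distinct tokens.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 || b.size === 0) return 0;
  let shared = 0;
  a.forEach((t) => {
    if (b.has(t)) shared++;
  });
  return shared / (a.size + b.size - shared);
}

// Score each entity by its best-matching label (name or alias),
// keep those at or above the threshold, best first, up to `limit`.
function resolveEntity(
  query: string,
  entities: IndexedEntity[],
  threshold = 0.3,
  limit = 5,
): Match[] {
  const q = tokens(query);
  return entities
    .map((e) => ({
      name: e.name,
      score: Math.max(...[e.name, ...e.aliases].map((label) => jaccard(q, tokens(label)))),
    }))
    .filter((m) => m.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}

// Illustrative index; the aliases are made up for this example.
const index: IndexedEntity[] = [
  { name: "Kind vs Wicked Environments", aliases: ["kind learning environments", "Hogarth's framework"] },
  { name: "Flynn Effect", aliases: [] },
];
console.log(resolveEntity("kind learning", index).map((m) => m.name));
```

Matching against aliases as well as the canonical name is what lets "Hogarth's framework" in one book resolve to the same entity as "kind/wicked environments" in another, provided the alias was recorded when the entity was created.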
Validation Tools
ea-validate.ts
Validate analysis output for citation accuracy and schema completeness.
```bash
deno run --allow-read scripts/ea-validate.ts analysis.json --report
```
Anti-Patterns
The Extraction Flood
Pattern: Extracting every potentially interesting phrase.
Fix: Ask "Would I cite this?" before extracting. Quality over quantity.
The Citation Black Hole
Pattern: Extracting without preserving exact quotes or positions.
Fix: Always capture: exact quote, chapter reference, context.
The Duplicate Entity
Pattern: Creating a new entity without checking if it exists.
Fix: Always run `kb-resolve-entity.ts` first.
The Orphan Entity
Pattern: Entities without Related Entities links.
Fix: Every entity should connect to at least 2 others.
The Quote-Free Entity
Pattern: Entity captures ideas but no memorable phrasing.
Fix: Include Key Quotes section with author's exact words.
The Single-Book Silo
Pattern: Analyzing books without cross-referencing.
Fix: After 2+ books, run synthesis to find connections.
Example Workflows
Full Entity Extraction (Range Example)
1. Scan book chapter by chapter
2. Identify all named studies, researchers, frameworks, anecdotes
3. Create inventory document listing all potential entities
4. For each entity:
   a. kb-resolve-entity.ts "[entity name]" to check existence
   b. Create markdown file in appropriate type directory
   c. Fill in template with findings and citations
   d. Add Key Quotes section
5. Create _range-quotes.md with all memorable quotes
6. Update _index.md with new entities
7. kb-generate-index.ts to rebuild _entities.json
Quick Concept Scan
1. ea-parse.ts book.txt --chunk-size 2000
2. For each chunk, extract top 3-5 concepts
3. Classify by type and layer
4. Generate concepts.json and report.md
Output Persistence
Entity Extraction Output
| File | Location |
|---|---|
| Entity files | |
| Quotes file | |
| Entity index | |
| Domain index | |
Concept Extraction Output
| File | Location |
|---|---|
| Full analysis | |
| Concepts only | |
| Citations | |
| Report | |
Verification (Oracle)
What This Skill Can Verify
- Citation positions exist - Validate quoted text appears at claimed position
- Schema completeness - Required fields present
- Cross-reference integrity - Referenced entities exist
- Duplicate detection - Entity doesn't already exist (via kb-resolve-entity.ts)
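The first check in the list, that quoted text appears at its claimed position, is mechanical enough to sketch. The code below is an illustration under stated assumptions, not the logic of `ea-validate.ts`. It assumes positions are character offsets into the parsed text, and the `Citation` shape and `checkCitation` function are hypothetical.

```typescript
// Hypothetical citation record: the exact quote and its claimed offset.
interface Citation {
  quote: string;
  position: number; // assumed character offset into the source text
}

type CitationCheck = "exact" | "moved" | "missing";

// "exact": the quote starts at the claimed position.
// "moved": the quote exists in the source, but somewhere else.
// "missing": the quote does not appear verbatim at all.
function checkCitation(source: string, c: Citation): CitationCheck {
  if (c.quote === "") return "missing"; // an empty quote proves nothing
  if (source.startsWith(c.quote, c.position)) return "exact";
  return source.includes(c.quote) ? "moved" : "missing";
}
```

Separating "moved" from "missing" is a design choice: a moved quote usually means the text was re-parsed with different chunking and only the offset needs repair, while a missing quote suggests a paraphrase was recorded as if it were the author's exact words.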
What Requires Human Judgment
- Significance - Is this worth extracting?
- Classification - Is this really a "framework" vs "concept"?
- Relationship validity - Does A really influence B?
- Quote quality - Is this actually memorable?
Integration Graph
Inbound (From Other Skills)
| Source | Leads to |
|---|---|
| research | Multi-book synthesis ready |
| reverse-outliner | Structural data for concept extraction |
Outbound (To Other Skills)
| From State | Leads to |
|---|---|
| Entity extraction complete | dna-extraction (deep functional analysis) |
| Concept extraction complete | media-meta-analysis (cross-source synthesis) |
Complementary Skills
| Skill | Relationship |
|---|---|
| dna-extraction | 6-axis functional analysis for annotation |
| reverse-outliner | Structural approach for fiction |
| voice-analysis | Author style fingerprinting |
| context-network | Knowledge base maintenance |
Calibration Data (from Range + Imaginable extractions)
By Book Density
| Book Type | Expected Entities | Estimated Effort |
|---|---|---|
| Dense non-fiction (Range; Thinking, Fast and Slow) | 60-100 | 4-6 hours |
| Moderate non-fiction (most business books) | 30-50 | 2-3 hours |
| Light non-fiction (popular science) | 15-30 | 1-2 hours |
| Technical books | 20-40 | 2-3 hours |
By Book Subtype
Different non-fiction subtypes yield different entity profiles:
| Subtype | Example | Entity Profile | Expected Count |
|---|---|---|---|
| Research synthesis | Range | Many studies, researchers, frameworks | 60-100 |
| Methodological/How-to | Imaginable | Many frameworks, few studies | 30-50 |
| Memoir/Narrative | Educated | Few frameworks, many anecdotes | 20-40 |
| Reference | Technical manuals | Many concepts, few anecdotes | Variable |
Research synthesis books cite many studies and researchers, connecting ideas across domains.
Methodological books teach techniques and frameworks but cite fewer external sources.
Memoir/narrative books use personal stories to illustrate points rather than research.
Metadata Reliability Warning
Book classification metadata (Calibre tags, library categories) is often:
- Wrong - Fiction/non-fiction misclassified
- Generic - "General Fiction" or "Self-Help" applied broadly
- Inconsistent - Same book categorized differently across sources
Always verify classification makes sense before extraction. A "fiction" tag on a methodology book like Imaginable is a metadata error.
Reasoning Requirements
Standard Reasoning
- Single chunk concept extraction
- Type/layer classification
- Simple relationship identification
- Individual entity creation
Extended Reasoning (ultrathink)
Use extended thinking for:
- Multi-book synthesis - requires holding multiple networks simultaneously
- Contradiction detection - semantic comparison across sources
- Theme emergence - identifying patterns across large sets
- Knowledge gap identification - reasoning about what's missing
Trigger phrases: "synthesize across books", "find contradictions", "identify gaps", "comprehensive analysis"
What You Do NOT Do
- Extract without citation traceability
- Create entities without checking for duplicates
- Skip the linking phase (orphan entities are not useful)
- Leave entities without quotes
- Treat fiction as non-fiction
- Use regex for semantic analysis (LLM judgment only)