concept-synthesis

concept-synthesis — From Raw Stubs to Intellectual Map

Convention: see conventions/quality.md for back-link enforcement and quote-fidelity requirements.
Convention: see _brain-filing-rules.md — output files under concepts/ per the primary-subject rule.

What this solves

Many ingestion pipelines (signal-detector, idea-ingest, voice-note-ingest) create a concept page for every idea mentioned. Over months this produces:
  • Thousands of stub pages, many duplicates or near-duplicates
  • Timeline entries that repeat the same source across multiple concept pages
  • No synthesis — just "the user mentioned X on this date"
  • No tier assignments — everything flat
  • No clustering — related ideas aren't linked
This skill transforms that raw material into a curated intellectual map.

Architecture

Phase 1: Dedup + merge (deterministic)
  N stubs → ~N/4 canonical concepts
    ├── Jaccard dedup (word-overlap on titles + first-paragraph)
    ├── Substring dedup ("founder mode" vs "founder mode vs manager mode")
    ├── Semantic dedup (LLM: "are these the same idea?")
    └── Merge timelines + aliases from duplicates into the canonical page

Phase 2: Score + tier (deterministic + heuristic)
  Each canonical concept → scored and tiered
    ├── Frequency: distinct sources referencing this concept
    ├── Timespan: first mention → last mention in days
    ├── Breadth: distinct months it appears in
    ├── Engagement: avg engagement on concept-bearing sources (if available)
    └── Tier: T1 Canon | T2 Developing | T3 Speculative | T4 Riff

Phase 3: Synthesize (LLM, T1+T2 only)
  T1 + T2 concepts → rich synthesis
    ├── Evolution narrative: how the idea sharpened over time
    ├── Best articulation: highest-engagement or most precise quote
    ├── Related concepts: cross-links to other concepts
    ├── Context: what was happening when this idea emerged / evolved
    └── Counter-positions: what this idea argues against

Phase 4: Cluster + map (LLM)
  All tiered concepts → intellectual clusters
    ├── Group related concepts into domains (auto-named via LLM)
    ├── Generate cluster summary pages
    ├── Build a master concepts/README.md with the full map
    └── Identify idea genealogies (concept A → evolved into concept B)
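
The deterministic part of Phase 1 can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the stub shape (`slug`, `title`, `first_paragraph`), the function names, and the 0.6 threshold are all assumptions chosen for the example. Pairs it returns are only candidates; the semantic LLM pass and the merge step still follow.

```python
import re
from itertools import combinations


def tokens(text: str) -> set[str]:
    """Lowercase word set used for the overlap comparison."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity: size of intersection over size of union of word sets."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def normalize(title: str) -> str:
    """Lowercase, strip punctuation, and single-space a title."""
    return " ".join(re.findall(r"[a-z0-9]+", title.lower()))


def is_substring_dup(title_a: str, title_b: str) -> bool:
    """True when one normalized title appears inside the other on word boundaries."""
    na, nb = normalize(title_a), normalize(title_b)
    if not na or not nb:
        return False
    return f" {na} " in f" {nb} " or f" {nb} " in f" {na} "


def candidate_duplicates(stubs: list[dict], threshold: float = 0.6) -> list[tuple]:
    """Return (slug_a, slug_b, reason) for every pair flagged by either check.

    The threshold is an assumed starting point, not a value from the skill.
    """
    pairs = []
    for a, b in combinations(stubs, 2):
        if is_substring_dup(a["title"], b["title"]):
            pairs.append((a["slug"], b["slug"], "substring"))
            continue
        text_a = f"{a['title']} {a['first_paragraph']}"
        text_b = f"{b['title']} {b['first_paragraph']}"
        if jaccard(text_a, text_b) >= threshold:
            pairs.append((a["slug"], b["slug"], "jaccard"))
    return pairs
```

The pairwise loop is quadratic in the number of stubs, which is usually tolerable for a few thousand pages run on a batch cadence; a larger corpus would likely need blocking or an inverted index first.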

Invocation

The skill is markdown agent instructions. The agent uses gbrain's existing operations + LLM passes:
bash
# 1. List all concept pages
gbrain query "type:concept" --limit 10000 --json

# 2. Phase 1 dedup — agent applies Jaccard + substring locally,
#    then LLM passes to identify semantic duplicates.

# 3. Phase 2 tier — agent scores each canonical concept based on
#    frequency / timespan / breadth and writes tier into frontmatter.

# 4. Phase 3 synthesis — for each T1/T2, agent reads the timeline
#    + associated source pages and writes a synthesis section
#    onto the concept page via put_page.

# 5. Phase 4 clustering — agent reads the tiered concept list
#    and writes concepts/README.md with the full intellectual map.

Output: concept page format (post-synthesis)

T1 Canon — full synthesis

markdown
---
title: "concept name"
type: concept
tier: 1
tier_label: "Canon"
mention_count: 18
distinct_months: 8
first_mention: "YYYY-MM-DD"
last_mention: "YYYY-MM-DD"
composite_score: 78.4
aliases: ["alternate phrasing 1", "alternate phrasing 2"]
related: ["sibling-concept-1", "sibling-concept-2"]
---

concept name

Tier 1 — Canon | 18 mentions across 8 months

Synthesis

[2-4 paragraph narrative tracing how the idea evolved, what it means in the user's worldview, why it matters. Third-person analytical voice.]

Best Articulation

"Verbatim quote from a source — the most precise or highest-engagement expression of this idea." — Date

Evolution

| Period | Expression | Signal |
|---|---|---|
| YYYY-MM | "First articulation" | First use — aspiration frame |
| YYYY-MM | "Sharpening" | Anti-pattern emerges |
| YYYY-MM | "Peak form" | Cleanest expression |

Related Concepts

  • sibling concept — relationship description
  • sibling concept — relationship description

Timeline

[Full timeline with deduped entries, quotes, source links]

T3 / T4 — stub only (no LLM synthesis)

markdown
---
title: "concept name"
type: concept
tier: 4
tier_label: "Riff"
mention_count: 1
---

concept name

Tier 4 — Riff | 1 mention
"Quote from the source" — Date

Output: cluster map at concepts/README.md

markdown

Intellectual Universe

Canon (T1) — N concepts

The permanent intellectual fingerprint. Ideas that recur across years.

[Cluster Name]

  • concept-slug — one-line characterization
  • ...

[Other Cluster]

  • ...

Developing (T2) — N concepts

Sharpening. Might become canon.

Speculative (T3) — N concepts

Testing in public.

Stats

  • Total concepts: N
  • T1 Canon: N
  • T2 Developing: N
  • T3 Speculative: N
  • T4 Riff: N
  • Earliest source: YYYY-MM-DD
  • Latest source: YYYY-MM-DD

Quality gates

Dedup quality

  • No two concept pages should be "the same idea in different words."
  • Aliases preserved in frontmatter for search.
  • Run gbrain query "type:concept" and spot-check the count reduction.
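
One way to keep aliases searchable while folding duplicates into the canonical page is sketched below. The page shape (a dict with `title`, `aliases`, and a `timeline` of `{"date", "source"}` entries) and the function name are hypothetical, introduced only for illustration; the real pages are written through gbrain.

```python
def merge_into_canonical(canonical: dict, duplicates: list[dict]) -> dict:
    """Return a new page dict with duplicate titles kept as aliases
    and timeline entries deduplicated on (date, source)."""
    aliases = list(canonical.get("aliases", []))
    timeline = list(canonical.get("timeline", []))

    # Track what is already present so nothing is added twice.
    seen_aliases = {a.lower() for a in aliases} | {canonical["title"].lower()}
    seen_entries = {(e["date"], e["source"]) for e in timeline}

    for dup in duplicates:
        # The duplicate's own title becomes an alias of the canonical page.
        for name in [dup["title"], *dup.get("aliases", [])]:
            if name.lower() not in seen_aliases:
                seen_aliases.add(name.lower())
                aliases.append(name)
        for entry in dup.get("timeline", []):
            key = (entry["date"], entry["source"])
            if key not in seen_entries:
                seen_entries.add(key)
                timeline.append(entry)

    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    timeline.sort(key=lambda e: e["date"])
    return {**canonical, "aliases": aliases, "timeline": timeline}
```

Keying timeline entries on date plus source is an assumption about what counts as "the same entry"; a stricter key (for example, including the quote text) may be appropriate if one source yields several distinct quotes on the same day.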

Tier quality

  • T1 should feel like "yes, that IS one of my recurring frameworks" — recognizable, recurring, sharp.
  • T2 should feel like "I'm working on this; it's getting clearer."
  • No concept should be T1 with < 4 months span or < 6 mentions.
  • No concept should be T4 with > 3 months span.
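
The two hard limits above lend themselves to an automated check after Phase 2. The sketch below derives the scoring inputs from a list of mentions and reports any gate violations. Everything here is illustrative: the `(source_id, date)` input shape, the function names, and the 30-day month approximation are assumptions, and the composite score itself is not modeled because the skill does not specify its formula.

```python
from datetime import date


def mention_metrics(mentions: list[tuple[str, date]]) -> dict:
    """Compute Phase 2 inputs from (source_id, date) pairs.

    Assumes at least one mention; an empty list raises ValueError.
    """
    sources = {src for src, _ in mentions}
    days = [d for _, d in mentions]
    first, last = min(days), max(days)
    return {
        "mention_count": len(sources),  # frequency = distinct sources
        "timespan_days": (last - first).days,
        "distinct_months": len({(d.year, d.month) for d in days}),
        "first_mention": first.isoformat(),
        "last_mention": last.isoformat(),
    }


def tier_gate_violations(tier: int, metrics: dict) -> list[str]:
    """Return the tier-quality rules this concept breaks (empty list if none)."""
    span_months = metrics["timespan_days"] / 30  # assumption: 30-day months
    problems = []
    if tier == 1 and span_months < 4:
        problems.append("T1 with < 4 months span")
    if tier == 1 and metrics["mention_count"] < 6:
        problems.append("T1 with < 6 mentions")
    if tier == 4 and span_months > 3:
        problems.append("T4 with > 3 months span")
    return problems
```

Running this over every tiered concept before Phase 3 would surface mis-tiered pages early, before any synthesis budget is spent on them.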

Synthesis quality

  • Captures evolution, not just repetition.
  • Uses verbatim quotes, not paraphrase.
  • Links to related concepts (markdown links, not wiki-links).
  • Does NOT hallucinate sources or dates.
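
The verbatim-quote and no-hallucination rules can be partly enforced mechanically. The following sketch flags any quote that does not appear word for word in the source it cites. The data shapes (`quotes` as dicts with `text` and `source`, `sources` as a mapping from source id to page body) and the function names are assumptions made for this example.

```python
import re


def _squash(text: str) -> str:
    """Collapse whitespace so line wrapping does not cause false misses."""
    return re.sub(r"\s+", " ", text).strip()


def unverified_quotes(quotes: list[dict], sources: dict[str, str]) -> list[dict]:
    """Return quotes that cite an unknown source or are not found verbatim in it."""
    missing = []
    for q in quotes:
        body = sources.get(q["source"])
        if body is None or _squash(q["text"]) not in _squash(body):
            missing.append(q)
    return missing
```

The comparison is deliberately strict apart from whitespace: a paraphrase or a silently corrected typo is reported as unverified, which matches the intent of the rule. Date checks would need a similar pass against each source page's own date metadata.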

Cron integration

This is heavy work. Run on a cadence, not on every signal:
  • After a major ingestion batch completes (signal-detector burst, archive crawler run, etc.).
  • Weekly cron for incremental synthesis of newly-promoted T1/T2 concepts.
  • Manual trigger for a full re-synthesis when the corpus shifts significantly.

Anti-Patterns

  • ❌ Running synthesis on T3/T4 — wastes API budget on ideas that may never sharpen.
  • ❌ Hallucinating quotes or dates. The timeline must be verifiable against existing brain pages.
  • ❌ Generic cluster names ("Various Topics"). If you can't name the cluster, the cluster isn't real.
  • ❌ Re-synthesizing already-synthesized T1s without new source material. Respect idempotency.
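
One possible guard against needless re-synthesis is to fingerprint the set of sources behind a concept and skip the LLM pass when the fingerprint has not changed. This is a sketch under stated assumptions: `synthesis_fingerprint` is a hypothetical frontmatter field that does not appear in the page format above, and the function names are illustrative.

```python
import hashlib
from collections.abc import Iterable


def source_fingerprint(source_ids: Iterable[str]) -> str:
    """Order-independent SHA-256 digest of the distinct source ids."""
    joined = "\n".join(sorted(set(source_ids)))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def needs_resynthesis(frontmatter: dict, source_ids: Iterable[str]) -> bool:
    """True when the stored fingerprint is absent or differs from the current one.

    `synthesis_fingerprint` is a hypothetical field the agent would write
    alongside the synthesis section.
    """
    return frontmatter.get("synthesis_fingerprint") != source_fingerprint(source_ids)
```

Because the digest covers only source ids, an edit to an existing source page would not trigger re-synthesis; hashing source content instead would catch that case at the cost of reading every page.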

Related skills

  • skills/signal-detector/SKILL.md — creates raw concept stubs from text channels
  • skills/voice-note-ingest/SKILL.md — same for audio channels
  • skills/idea-ingest/SKILL.md — same for links / articles

Contract

This skill guarantees:
  • Routing matches the canonical triggers in the frontmatter.
  • Output written under the directories listed in writes_to: (when applicable).
  • Conventions referenced (quality.md, brain-first.md, _brain-filing-rules.md) are followed.
  • Privacy contract preserved: no real names, no fork-specific filesystem path literals, no upstream-fork references.
The full behavior contract is documented in the body sections above; this section exists for the conformance test.

Output Format

The skill's output shape is documented inline in the body sections above (see "Output", "Brain page format", or equivalent). The literal section header here exists for the conformance test (test/skills-conformance.test.ts).