schema-author

schema-author — evolve your schema pack

Non-goals (use these other skills instead)

This skill AUTHORS the schema pack (adds page types, link verbs, prefixes, flags). For these adjacent jobs, route elsewhere:
  • Filing one specific page → skills/brain-taxonomist/SKILL.md. brain-taxonomist routes at WRITE TIME ("where does this note go?"). schema-author changes the rules at AUTHORING TIME ("what types and prefixes exist?").
  • Schema-check as part of EIIRP iteration → skills/eiirp/SKILL.md already has a schema-check phase. Don't duplicate.
  • Just looking up a type's settings → run gbrain schema explain <type> directly. This skill is for CHANGING the pack, not READING from it.
  • Querying who knows about X → skills/expert-routing/SKILL.md (or gbrain whoknows directly). schema-author makes a type expert-routable; it does not run the query.

Convention

Convention: see conventions/brain-first.md for the lookup chain (search → query → get_page → external).
Convention: see conventions/schema-evolution.md for "when to add a type vs alias vs prefix" — the heuristic.

When to invoke

Invoke when the user (or a sibling skill) says any of:
  • "Add a researcher type to my schema"
  • "I have 4000 untyped pages under meetings/"
  • "My brain doesn't know that journal-article is a type"
  • "Set paper to be extractable"
  • "Propose types from what I've ingested"
  • "Sync the new types to backfill existing pages"
DON'T invoke for "where does THIS note go" (use brain-taxonomist) or "who knows about X" (use expert-routing / gbrain whoknows).

Tutorial + vision

  • Why this matters: docs/what-schemas-unlock.md covers 7 killer use cases (among them 4000 invisible meetings made queryable, founder ops brain, research brain, legal brain, team brain, agent-as-co-curator) plus the structural argument for why types matter at query time. Read it before pitching schema authoring to a user: it's the doc that explains the difference between a pile of notes and a brain with structure.
  • 5-minute walkthrough: docs/schema-author-tutorial.md walks through forking the bundled pack, adding a researcher type, syncing, and proving the T1.5 wiring via gbrain whoknows. Use placeholder pages so it runs against any brain without affecting real content.

Workflow

Phase 1 — Brain (know which pack is active)

gbrain schema active --json
Output gives you pack_name, version, sha8, page_types_count, and source_tier. If source_tier === "default", the user is on the bundled gbrain-base pack and any mutation will need a fork first (Phase 4).
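As a rough illustration of the fork check, the Python sketch below parses a sample of that JSON and decides whether a fork is required. The field names are the ones listed above; the sample values and the needs_fork helper are hypothetical.

```python
import json

# Hypothetical sample of `gbrain schema active --json` output. Field names are
# the ones listed above; every value here is invented for illustration.
SAMPLE_ACTIVE = """
{"pack_name": "gbrain-base", "version": "1.0.0", "sha8": "abc12345",
 "page_types_count": 12, "source_tier": "default"}
"""


def needs_fork(active: dict) -> bool:
    """Return True when the active pack is the bundled default (fork before mutating)."""
    return active.get("source_tier") == "default"


active = json.loads(SAMPLE_ACTIVE)
if needs_fork(active):
    print(f"{active['pack_name']} is bundled: fork it before any mutation")
```

An agent could run a check like this once at the start of a session and skip straight to Phase 4's fork step when it returns True.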

Phase 2 — Assess (what does the current pack cover?)

gbrain schema stats --json
Returns per-type page counts, the untyped count, and dead_prefixes (pack-declared prefixes with zero matching pages, which are probable mis-declarations). If coverage < 90%, there's untyped content worth typing.
gbrain schema review-orphans --limit 50 --json
Drills down into untyped pages. Look for shared path prefixes (e.g. "12 of these are under research/papers/"): those are candidates for a new type.

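The Phase 2 assessment can be sketched in Python as follows: one helper computes coverage from the aggregate counts, another groups untyped slugs by their leading path segments. The aggregate field names follow the stats example in the Output Format section; the list-of-slugs input and both helper names are assumptions, since the review-orphans payload shape is not shown in this document.

```python
from collections import Counter


def coverage(aggregate: dict) -> float:
    """Fraction of pages that already have a type (1.0 for an empty brain)."""
    total = aggregate["total_pages"]
    return aggregate["typed_pages"] / total if total else 1.0


def shared_prefixes(slugs, depth=2, min_pages=2):
    """Count orphan slugs per leading path prefix of `depth` segments."""
    counts = Counter()
    for slug in slugs:
        parts = slug.split("/")
        if len(parts) > depth:  # skip slugs too shallow to have this prefix
            counts["/".join(parts[:depth]) + "/"] += 1
    return [(prefix, n) for prefix, n in counts.most_common() if n >= min_pages]


# Aggregate numbers match the stats example in the Output Format section.
aggregate = {"total_pages": 4823, "typed_pages": 4710, "untyped_pages": 113}
print(f"coverage: {coverage(aggregate):.4f}")

# Hypothetical orphan slugs, invented for illustration.
orphans = ["research/papers/a", "research/papers/b", "inbox/x"]
print(shared_prefixes(orphans))
```

Prefixes that show up with a meaningful page count are the ones worth carrying into Phase 3.
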
Phase 3 — Propose (what types should the pack add?)

gbrain schema detect --json
Clusters pages by source_path and proposes candidate types. Heuristic only (no LLM call).
gbrain schema suggest --json
Returns LLM-refined candidates with confidence scores. Use the top-3 hit rate as the signal for which to promote.

Phase 4 — Apply (mutate the pack)

If the active pack is bundled (gbrain-base or gbrain-recommended), fork it first:
gbrain schema fork gbrain-base mine
gbrain schema use mine
Then add the types one at a time:
gbrain schema add-type researcher \
  --primitive entity \
  --prefix people/researchers/ \
  --extractable \
  --expert
For complex multi-mutation refactors (e.g. add a type AND the link verb that points to it), agents reaching this surface over MCP can use the batched schema_apply_mutations op (shown here as JSONL, one mutation per line):
{"op": "add_type", "name": "researcher", "primitive": "entity", "prefix": "people/researchers/", "extractable": true, "expert_routing": true}
{"op": "add_type", "name": "paper", "primitive": "annotation", "prefix": "research/papers/", "extractable": true}
{"op": "add_link_type", "name": "authored", "inference": {"page_type": "researcher", "target_type": "paper"}}
Validate before sync:
gbrain schema lint --with-db
The --with-db flag opts into the 2 DB-aware rules (extractable_empty_corpus, mutation_count_anomaly) that detect mis-declared types you'd otherwise discover only at runtime.
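Before sending a batch, an agent might run a cheap client-side sanity check of its own. The Python sketch below serializes the three mutations from the example above to JSONL and flags link types whose inference points at a type that is neither declared in the batch nor already known. This is only an illustration: to_jsonl and dangling_refs are hypothetical helpers, not part of gbrain, and they do not replace gbrain schema lint.

```python
import json

# The three mutations from the batch example above.
MUTATIONS = [
    {"op": "add_type", "name": "researcher", "primitive": "entity",
     "prefix": "people/researchers/", "extractable": True, "expert_routing": True},
    {"op": "add_type", "name": "paper", "primitive": "annotation",
     "prefix": "research/papers/", "extractable": True},
    {"op": "add_link_type", "name": "authored",
     "inference": {"page_type": "researcher", "target_type": "paper"}},
]


def to_jsonl(mutations) -> str:
    """Serialize one mutation per line."""
    return "\n".join(json.dumps(m) for m in mutations)


def dangling_refs(mutations, existing_types=()):
    """List (link_name, field, type) for inference targets that no one declares."""
    known = set(existing_types)
    known.update(m["name"] for m in mutations if m["op"] == "add_type")
    missing = []
    for m in mutations:
        if m["op"] != "add_link_type":
            continue
        for key in ("page_type", "target_type"):
            ref = m["inference"][key]
            if ref not in known:
                missing.append((m["name"], key, ref))
    return missing


print(to_jsonl(MUTATIONS))
print(dangling_refs(MUTATIONS))
```

Sending only the link mutation without its two types would be reported as two dangling references, which is the kind of mistake the batch form is meant to avoid.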

Phase 5 — Sync (backfill existing pages with the new types)

Dry-run first:
gbrain schema sync --json
Returns per-prefix would_apply counts plus sample slugs. If the numbers look right:
gbrain schema sync --apply
Runs a chunked UPDATE in 1000-row batches, so it never wedges concurrent writers. Idempotent on re-run (a second --apply finds nothing to backfill).
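One way to make "if the numbers look right" concrete is a small gate over the dry-run report. The Python sketch below uses the field names from the dry-run example in the Output Format section; the review_dry_run and batches_needed helpers and the 10,000-page threshold are assumptions chosen for illustration.

```python
import math

BATCH_SIZE = 1000  # batch size stated above for sync --apply

# Trimmed copy of the dry-run example from the Output Format section.
DRY_RUN = {
    "apply": False,
    "per_prefix": [
        {"type": "meeting", "prefix": "meetings/", "would_apply": 4000,
         "sample_slugs": ["meetings/2026-01-01-foo"], "dead_prefix": False},
    ],
    "total_would_apply": 4000,
}


def review_dry_run(report, max_per_prefix=10_000):
    """Return human-readable warnings; an empty list means the run looks safe."""
    warnings = []
    for row in report["per_prefix"]:
        if row["dead_prefix"]:
            warnings.append(f"{row['prefix']}: declared but matches no pages")
        elif row["would_apply"] > max_per_prefix:
            warnings.append(
                f"{row['prefix']}: {row['would_apply']} pages exceeds {max_per_prefix}"
            )
    return warnings


def batches_needed(rows: int, batch_size: int = BATCH_SIZE) -> int:
    """Number of chunks a backfill of `rows` pages would take at this batch size."""
    return math.ceil(rows / batch_size)


print(review_dry_run(DRY_RUN))
print(batches_needed(DRY_RUN["total_would_apply"]))
```

Under the stated 1000-row batch size, the 4000-page example would take four chunks.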

Phase 6 — Verify

gbrain schema stats --json
Coverage should be ≥95% now. Spot-check the new type:
gbrain whoknows "machine learning"
If researcher was declared --expert, results should include researcher-typed pages. (The pack-aware wiring at the query path was added in v0.40.6.0; brains older than v0.40.6.0 silently ignored custom expert-routed types.)
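If whoknows returns nothing for a freshly added expert type, the version note above is the first thing to rule out. The Python sketch below compares a version string against v0.40.6.0. It assumes purely numeric dotted versions, and how the version string is obtained is left to the caller; both helper names are hypothetical.

```python
MIN_PACK_AWARE = (0, 40, 6, 0)  # v0.40.6.0, per the note above


def parse_version(text: str) -> tuple:
    """Turn 'v0.40.6.0' or '0.40.6' into a 4-part integer tuple, padding with zeros."""
    parts = [int(p) for p in text.lstrip("v").split(".")]
    return tuple(parts + [0] * (4 - len(parts)))


def supports_pack_aware_routing(version: str) -> bool:
    """True when custom expert-routed types should be honored at query time."""
    return parse_version(version) >= MIN_PACK_AWARE


print(supports_pack_aware_routing("0.40.5.9"))
print(supports_pack_aware_routing("v0.40.6.0"))
```

Padding short versions with zeros keeps "0.40.6" and "0.40.6.0" equivalent, which a plain tuple comparison would not.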

Phase 7 — Commit (preserve the change)

If the pack is in source control, commit:
cd ~/.gbrain/schema-packs/mine
git add pack.json
git commit -m "schema: add researcher + paper types + authored link"
git push
If the brain daemon is running (gbrain serve --http), other processes pick up the change within 1 second (stat-mtime TTL gate in loadActivePack; v0.40.6.0 closed the cross-process invalidation gap).
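The general idea behind a stat-mtime TTL gate can be sketched in Python: re-stat the file at most once per TTL window and reload only when the modification time has changed. This is an illustration of the technique under that reading, not the actual loadActivePack implementation; the MtimeGatedCache class and its parameters are hypothetical.

```python
import json
import os
import time


class MtimeGatedCache:
    """Cache a JSON file; re-stat at most once per `ttl` seconds, reload on mtime change."""

    def __init__(self, path, ttl=1.0, clock=time.monotonic):
        self.path = path
        self.ttl = ttl
        self.clock = clock          # injectable so the gate can be tested without sleeping
        self._checked_at = None     # when the file was last stat'ed
        self._mtime = None          # mtime seen at the last load
        self._value = None

    def get(self):
        now = self.clock()
        if self._checked_at is None or now - self._checked_at >= self.ttl:
            mtime = os.stat(self.path).st_mtime_ns
            if mtime != self._mtime:
                with open(self.path, encoding="utf-8") as f:
                    self._value = json.load(f)
                self._mtime = mtime
            self._checked_at = now
        return self._value
```

With a 1-second TTL, a reader sees another process's write no later than about one second after it lands, and pays for at most one stat call per second in the steady state.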

Outputs

  • Mutated pack file at ~/.gbrain/schema-packs/<name>/pack.{json,yaml}.
  • Audit row in ~/.gbrain/audit/schema-mutations-YYYY-Www.jsonl per mutation.
  • pages.type backfilled on matching rows after sync --apply.
  • Query paths (whoknows, find_experts) now route through the new expert types.
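To find the audit file for a given day, the YYYY-Www pattern can be computed from the date. The Python sketch below assumes the pattern means ISO 8601 year and week number, which this document does not confirm; audit_log_name is a hypothetical helper.

```python
from datetime import date


def audit_log_name(day: date) -> str:
    """Build the weekly audit file name, assuming ISO 8601 week numbering."""
    iso_year, iso_week, _ = day.isocalendar()
    return f"schema-mutations-{iso_year}-W{iso_week:02d}.jsonl"


print(audit_log_name(date(2026, 1, 1)))  # schema-mutations-2026-W01.jsonl
print(audit_log_name(date(2027, 1, 1)))  # schema-mutations-2026-W53.jsonl
```

Under ISO numbering, the first days of January can belong to the last week of the previous year, so the year in the file name is not always the calendar year.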

Contract

  • Inputs: a natural-language request that names a type / prefix / link verb / flag change, OR the result of gbrain schema review-orphans showing untyped pages that need a new type.
  • Outputs: mutated pack file at ~/.gbrain/schema-packs/<name>/pack.{json,yaml} + an audit row in ~/.gbrain/audit/schema-mutations-YYYY-Www.jsonl + (if sync --apply ran) backfilled pages.type on matching rows.
  • Side effects: invalidates the in-process pack cache + the query cache for the source. Other processes pick up the change within 1 second (stat-mtime TTL).
  • Idempotency: every primitive is idempotent. add-alias / add-prefix no-op on duplicate; sync --apply finds nothing to update on second run.
  • Trust: CLI = local trust (no scope check). MCP = OAuth admin scope (write ops). Audit log captures actor: mcp:<clientId8> per mutation.
  • Atomicity: every mutation is wrapped in withMutation's atomic write (.tmp + fsync + rename) + per-pack O_CREAT|O_EXCL lock. Crash mid-write leaves the original file untouched.
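The atomicity bullet names two standard techniques: write to a temporary file, fsync it, and rename it over the target, all while holding a lock file created with O_CREAT|O_EXCL. A minimal Python sketch of that pattern follows. It is not withMutation itself, and the helper names and the lock file name used in the test are hypothetical.

```python
import json
import os
from contextlib import contextmanager


@contextmanager
def pack_lock(lock_path):
    """Hold an exclusive lock file; raises FileExistsError if someone else holds it."""
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        yield
    finally:
        os.close(fd)
        os.unlink(lock_path)


def atomic_write_json(path, data):
    """Write to path + '.tmp', flush to disk, then rename over the target."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)
        f.flush()
        os.fsync(f.fileno())   # make sure the bytes are on disk before the rename
    os.replace(tmp, path)      # atomic on POSIX when both paths share a filesystem
```

Because readers only ever see the old file or the fully written new one, a crash before the rename leaves the original untouched, which matches the guarantee described above.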

Anti-Patterns

  • Don't mutate gbrain-base or gbrain-recommended. Fork first (gbrain schema fork gbrain-base mine). These are bundled packs; edits would be lost on upgrade. The mutation primitives refuse with PACK_READONLY.
  • Don't add a type for a directory you imported once for triage. Pack types are permanent decisions; one-time imports are not. See skills/conventions/schema-evolution.md for the heuristic: under 20 pages, don't codify it in the pack.
  • Don't add --expert to a type with no path_prefixes. The expert_routing_without_prefix lint warns about this: expert-routed types with no prefix never match a put_page inference, so whoknows silently never surfaces them.
  • Don't promote a schema suggest candidate without verifying the prefix matches real content. Run lint --with-db before add-type to catch prefix collisions pre-write.
  • Don't conflate "filing one page" with "evolving the schema." Filing routes via brain-taxonomist; schema-author is for authoring the type taxonomy itself. The Non-goals section above names the boundary.
  • Don't skip the dry-run before sync --apply. Always run sync first to see would_apply counts + sample slugs. A pack prefix that matches 50,000 pages is recoverable but slow; verifying first is cheap.
  • Don't remove a type without checking references. remove-type refuses with STILL_REFERENCED if another type's aliases / enrichable_types / link_types / frontmatter_links references it. Break the references first; don't add --force.

Output Format

When invoked, this skill produces structured output suitable for both human + JSON consumption:
Per-mutation result (JSON):
{"schema_version": 1, "pack": "mine", "path": "/Users/.../pack.json", "format": "json", "prev_sha8": "a1b2c3d4", "new_sha8": "e5f6g7h8"}
Per-batch result (from the schema_apply_mutations MCP op):
{"schema_version": 1, "pack": "mine", "batch_id": "batch-1716491400-abc123", "mutations_applied": 3, "results": [{...}, {...}, {...}]}
Stats JSON (per-source + aggregate + dead-prefix hints):
{"schema_version": 1, "pack_identity": "mine@1.0.0+abc12345", "aggregate": {"total_pages": 4823, "typed_pages": 4710, "untyped_pages": 113, "coverage": 0.9766, "by_type": [{"type": "person", "count": 2104}, ...]}, "per_source": [...], "dead_prefixes": [{"type": "researcher", "prefix": "people/researchers/"}]}
Sync dry-run JSON:
{"schema_version": 1, "apply": false, "pack_identity": "mine@1.0.0+abc12345", "per_prefix": [{"type": "meeting", "prefix": "meetings/", "would_apply": 4000, "sample_slugs": ["meetings/2026-01-01-foo", ...], "dead_prefix": false, "applied": 0}], "total_would_apply": 4000, "total_applied": 0}
Human output (the agent's final summary):
  • One line per mutation: Pack: <name> (<format>) and Sha8: <prev> → <new>
  • Stats: total pages, typed %, untyped count, per-type breakdown, dead-prefix list
  • Sync: per-prefix would_apply / applied count + sample slugs in dry-run mode
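The human summary can be derived directly from the JSON results. The Python sketch below formats the per-mutation line and a one-line stats summary from the example payloads above; the exact spacing and wording of the lines are assumptions, and both helper names are hypothetical.

```python
# Per-mutation result copied from the example above.
RESULT = {"schema_version": 1, "pack": "mine", "path": "/Users/.../pack.json",
          "format": "json", "prev_sha8": "a1b2c3d4", "new_sha8": "e5f6g7h8"}

# Aggregate counts copied from the stats example above.
AGGREGATE = {"total_pages": 4823, "typed_pages": 4710,
             "untyped_pages": 113, "coverage": 0.9766}


def summarize_mutation(result: dict) -> str:
    """One line per mutation: pack name, format, and the sha8 transition."""
    return (f"Pack: {result['pack']} ({result['format']})  "
            f"Sha8: {result['prev_sha8']} → {result['new_sha8']}")


def summarize_stats(aggregate: dict) -> str:
    """Total pages, typed percentage, and untyped count on one line."""
    return (f"{aggregate['total_pages']} pages, "
            f"{aggregate['coverage']:.1%} typed, "
            f"{aggregate['untyped_pages']} untyped")


print(summarize_mutation(RESULT))
print(summarize_stats(AGGREGATE))
```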
On failure, the error envelope follows the standard StructuredAgentError shape from src/core/errors.ts: {error, code, message, details?}. Codes from the mutation primitives: PACK_NOT_FOUND, PACK_READONLY, PACK_CORRUPT, TYPE_EXISTS, TYPE_NOT_FOUND, INVALID_PRIMITIVE, INVALID_RESULT, IO_ERROR, STILL_REFERENCED, LOCK_BUSY.

Failure modes

  • PACK_READONLY → you tried to mutate gbrain-base or gbrain-recommended. Fork first.
  • INVALID_RESULT → the mutation would create a dangling reference or prefix collision. The pre-write lint gate caught it. Read the error message; the lint rule name identifies the problem.
  • STILL_REFERENCED → you tried to remove a type that another type's aliases / enrichable_types / link_types / frontmatter_links references. The error names every reference. Remove those first.
  • LOCK_BUSY → another process is mid-mutation. Wait 30s and retry, or pass --force if you know the holder is wedged.
  • permission_denied (MCP only) → your OAuth client doesn't have admin scope. Re-register with gbrain auth register-client --scopes admin.
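An agent wrapping these calls might map each code to its documented remedy. The Python sketch below does that with a lookup table built from the list above. The envelope field names follow the {error, code, message, details?} shape described in Output Format; the sample envelope values and the next_step helper are hypothetical.

```python
# Remedies paraphrased from the failure modes listed above.
REMEDIATION = {
    "PACK_READONLY": "Fork the bundled pack first, then retry against the fork.",
    "INVALID_RESULT": "Read the message; the lint rule name identifies the problem.",
    "STILL_REFERENCED": "Remove every reference named in the error, then retry.",
    "LOCK_BUSY": "Wait 30s and retry.",
    "permission_denied": "Re-register the OAuth client with admin scope.",
}


def next_step(envelope: dict) -> str:
    """Turn an error envelope into a one-line hint for the agent's summary."""
    code = envelope.get("code", "")
    hint = REMEDIATION.get(code, "No documented remedy; surface the message to the user.")
    return f"{code}: {hint}"


# Hypothetical envelope, invented for illustration.
sample = {"error": True, "code": "LOCK_BUSY", "message": "pack is locked"}
print(next_step(sample))
```

Codes without a documented remedy here (for example IO_ERROR) fall through to the default, so the raw message still reaches the user.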