schema-author — evolve your schema pack
Non-goals (use these other skills instead)
This skill AUTHORS the schema pack (adds page types, link verbs, prefixes,
flags). For these adjacent jobs, route elsewhere:
- Filing one specific page → skills/brain-taxonomist/SKILL.md. Brain-taxonomist
routes at WRITE TIME ("where does this note go?"). schema-author changes the
rules at AUTHORING TIME ("what types and prefixes exist?").
- Schema-check as part of EIIRP iteration → the EIIRP workflow already has a
schema-check phase. Don't duplicate it.
- Just looking up a type's settings → run gbrain schema explain <type> directly.
This skill is for CHANGING the pack, not READING from it.
- Querying who knows about X → skills/expert-routing/SKILL.md (or gbrain whoknows
directly). schema-author makes a type expert-routable; it does not run the query.
Convention
Convention: see conventions/brain-first.md for the lookup chain (search → query → get_page → external).
Convention: see conventions/schema-evolution.md for "when to add a type vs alias vs prefix" — the heuristic.
When to invoke
Invoke when the user (or a sibling skill) says any of:
- "Add a type to my schema"
- "I have 4000 untyped pages under <prefix>/"
- "My brain doesn't know that <name> is a type"
- "Set <type> to be extractable"
- "Propose types from what I've ingested"
- "Sync the new types to backfill existing pages"
DON'T invoke for "where does THIS note go" (use brain-taxonomist) or
"who knows about X" (use expert-routing or gbrain whoknows).
Tutorial + vision
- Why this matters: docs/what-schemas-unlock.md covers 7 killer use cases (4000 invisible meetings made queryable, founder ops brain, research brain, legal brain, team brain, agent-as-co-curator) plus the structural argument for why types matter at query time. Read it before pitching schema authoring to a user; it explains the difference between a pile of notes and a brain with structure.
- 5-minute walkthrough: docs/schema-author-tutorial.md shows how to fork the bundled pack, add a researcher type, sync, and prove the T1.5 wiring. It uses placeholder pages so it runs against any brain without affecting real content.
Workflow
Phase 1 — Brain (know which pack is active)
gbrain schema active --json
The output describes the active pack, including its source_tier.
If source_tier === "default", the user is on the bundled pack and any
mutation will need a fork first (Phase 4).
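As a minimal sketch of that check, assume the JSON printed by gbrain schema active --json has already been parsed into a dictionary. Only the source_tier field and its "default" value come from the text above; the function name and the "custom" tier value are hypothetical.

```python
def needs_fork(active: dict) -> bool:
    """Return True when the active pack is the bundled default.

    Assumption: `active` is the parsed output of `gbrain schema active --json`
    and carries a `source_tier` key, as described above.
    """
    return active.get("source_tier") == "default"


# Hypothetical payloads: only `source_tier` is taken from the document.
print(needs_fork({"source_tier": "default"}))
print(needs_fork({"source_tier": "custom"}))
```

A caller would run the fork step in Phase 4 only when this returns True.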
Phase 2 — Assess (what does the current pack cover?)
gbrain schema stats --json
Returns per-type page counts, untyped count, and dead_prefixes (pack-declared
prefixes with zero matching pages, which are probable mis-declarations).
If coverage < 90%, there's untyped content worth typing.
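The coverage check can be sketched as follows, using the field names from the Stats JSON example in the Output Format section (aggregate.total_pages, aggregate.typed_pages, dead_prefixes). The function name, the return shape, and treating an empty brain as fully covered are assumptions made for illustration.

```python
def assess(stats: dict, threshold: float = 0.90) -> dict:
    """Summarise a parsed `gbrain schema stats --json` payload."""
    agg = stats["aggregate"]
    total = agg["total_pages"]
    # Assumption: an empty brain counts as fully covered.
    coverage = agg["typed_pages"] / total if total else 1.0
    return {
        "coverage": round(coverage, 4),
        "worth_typing": coverage < threshold,
        "dead_prefixes": [d["prefix"] for d in stats.get("dead_prefixes", [])],
    }


# Numbers copied from the Stats JSON example in this document.
sample = {
    "aggregate": {"total_pages": 4823, "typed_pages": 4710, "untyped_pages": 113},
    "dead_prefixes": [{"type": "researcher", "prefix": "people/researchers/"}],
}
print(assess(sample))
```

Passing threshold=0.95 reuses the same helper for the Phase 6 verification target.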
gbrain schema review-orphans --limit 50 --json
Untyped pages drilldown. Look for shared path prefixes (e.g. "12 of these
are under <prefix>/"); those are candidates for a new type.
Phase 3 — Propose (what types should the pack add?)
gbrain schema detect --json
Clusters pages and proposes candidate types. Heuristic
only (no LLM call).
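To illustrate what a prefix-based heuristic of this kind might look like, the sketch below groups slugs by their first path segment and keeps groups above a minimum size. This is not the implementation behind gbrain schema detect. The function name and input shape are hypothetical, and the default of 20 only echoes the "<20 pages" heuristic mentioned under Anti-Patterns.

```python
from collections import Counter


def prefix_candidates(slugs, min_pages=20):
    """Group slugs by first path segment and keep the well-populated groups.

    Assumption: `slugs` is a plain list of page slugs such as "meetings/2026-01-01-foo".
    Slugs without a "/" have no prefix and are ignored.
    """
    counts = Counter(slug.split("/", 1)[0] + "/" for slug in slugs if "/" in slug)
    return [(prefix, n) for prefix, n in counts.most_common() if n >= min_pages]


# Hypothetical slugs for illustration only.
slugs = [f"meetings/note-{i}" for i in range(25)]
slugs += ["scratch/a", "scratch/b", "scratch/c", "loose-page"]
print(prefix_candidates(slugs))
```

Small groups such as the three scratch/ pages fall below the cutoff, which matches the advice not to codify one-off imports.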
gbrain schema suggest --json
LLM-refined candidates with confidence scores. Use the top-3 hit rate as
the signal for which to promote.
Phase 4 — Apply (mutate the pack)
If the active pack is bundled (for example gbrain-base),
fork it first:
gbrain schema fork gbrain-base mine
gbrain schema use mine
Then add the types one at a time:
gbrain schema add-type researcher \
--primitive entity \
--prefix people/researchers/ \
--extractable \
--expert
For complex multi-mutation refactors (e.g. add a type AND the link verb
that points to it), agents reaching this surface over MCP can use the
batched op:
jsonl
{"op": "add_type", "name": "researcher", "primitive": "entity", "prefix": "people/researchers/", "extractable": true, "expert_routing": true}
{"op": "add_type", "name": "paper", "primitive": "annotation", "prefix": "research/papers/", "extractable": true}
{"op": "add_link_type", "name": "authored", "inference": {"page_type": "researcher", "target_type": "paper"}}
Validate before sync:
gbrain schema lint --with-db
The --with-db flag opts into the 2 DB-aware rules, which detect
mis-declared types you'd otherwise discover only at runtime.
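Before submitting a batch like the JSONL example earlier in this phase, an agent could run a quick local check that every link type refers to a type that exists by the time the op is reached. The sketch below is an illustration only: it assumes ops apply in file order, uses the op and field names from that example, and the function name is hypothetical. It does not replace gbrain schema lint.

```python
import json

# The three ops from the batched example above, one JSON object per line.
BATCH = "\n".join([
    '{"op": "add_type", "name": "researcher", "primitive": "entity", "prefix": "people/researchers/", "extractable": true, "expert_routing": true}',
    '{"op": "add_type", "name": "paper", "primitive": "annotation", "prefix": "research/papers/", "extractable": true}',
    '{"op": "add_link_type", "name": "authored", "inference": {"page_type": "researcher", "target_type": "paper"}}',
])


def dangling_link_refs(jsonl_text, existing_types=()):
    """Return (link_name, field, missing_type) for each unresolved reference."""
    known = set(existing_types)
    problems = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        op = json.loads(line)
        if op["op"] == "add_type":
            known.add(op["name"])
        elif op["op"] == "add_link_type":
            inference = op.get("inference", {})
            for field in ("page_type", "target_type"):
                ref = inference.get(field)
                if ref is not None and ref not in known:
                    problems.append((op["name"], field, ref))
    return problems


print(dangling_link_refs(BATCH))
```

An empty list means every reference resolves; placing the add_link_type line before its add_type lines would report both references as missing.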
Phase 5 — Sync (backfill existing pages with the new types)
Dry-run first:
gbrain schema sync --json
Returns per-prefix would_apply counts + sample slugs. If the numbers
look right:
gbrain schema sync --apply
Chunked UPDATE in 1000-row batches; never wedges concurrent writers.
Idempotent on re-run (a second --apply finds nothing to backfill).
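The "do the numbers look right" step can be partly automated. The sketch below reads a parsed dry-run report using the field names from the Sync dry-run JSON example in the Output Format section (apply, per_prefix, would_apply, dead_prefix). The function name and the max_pages threshold are hypothetical.

```python
def review_dry_run(report: dict, max_pages: int = 10_000) -> dict:
    """Summarise a parsed `gbrain schema sync --json` dry-run report."""
    if report.get("apply"):
        raise ValueError("expected a dry-run report (apply == false)")
    rows = report["per_prefix"]
    return {
        "total": sum(row["would_apply"] for row in rows),
        "dead": [row["prefix"] for row in rows if row["dead_prefix"]],
        # Prefixes that would touch an unusually large number of pages.
        "large": [row["prefix"] for row in rows if row["would_apply"] > max_pages],
    }


# Values copied from the dry-run example in this document.
report = {
    "apply": False,
    "per_prefix": [
        {"type": "meeting", "prefix": "meetings/", "would_apply": 4000,
         "sample_slugs": ["meetings/2026-01-01-foo"], "dead_prefix": False, "applied": 0},
    ],
    "total_would_apply": 4000,
    "total_applied": 0,
}
print(review_dry_run(report))
```

A non-empty "dead" or "large" list is a reason to inspect the sample slugs by hand before running the apply step.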
Phase 6 — Verify
gbrain schema stats --json
Coverage should be ≥95% now. Spot-check the new type:
gbrain whoknows "machine learning"
If researcher was declared expert-routable, results should include
researcher-typed pages. (The pack-aware wiring at the query path was
added in v0.40.6.0; pre-v0.40.6 brains silently ignored custom
expert-routed types.)
Phase 7 — Commit (preserve the change)
If the pack is in source control, commit:
cd ~/.gbrain/schema-packs/mine
git add pack.json
git commit -m "schema: add researcher + paper types + authored link"
git push
If the brain daemon is running, other processes
pick up the change within 1 second (stat-mtime TTL gate in
loadActivePack; v0.40.6.0 closed the cross-process invalidation gap).
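The general idea of a stat-mtime TTL gate can be sketched as below. This is not the actual loadActivePack code. The class name, the injectable clock, and the JSON-only loading are assumptions made to keep the example small.

```python
import json
import os
import time


class PackCache:
    """Reload a pack file only when its mtime changes, checking at most once per TTL."""

    def __init__(self, path, ttl_seconds=1.0, clock=time.monotonic):
        self.path = path
        self.ttl = ttl_seconds
        self.clock = clock
        self._pack = None
        self._mtime = None
        self._checked_at = None

    def get(self):
        now = self.clock()
        if self._pack is not None and now - self._checked_at < self.ttl:
            return self._pack  # inside the TTL window: skip even the stat call
        mtime = os.stat(self.path).st_mtime_ns
        if self._pack is None or mtime != self._mtime:
            with open(self.path, encoding="utf-8") as handle:
                self._pack = json.load(handle)
            self._mtime = mtime
        self._checked_at = now
        return self._pack
```

With a one-second TTL, a reader sees another process's write no later than one second after it lands, and the common case costs no filesystem call at all.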
Outputs
- Mutated pack file at ~/.gbrain/schema-packs/<name>/pack.{json,yaml}.
- Audit row in ~/.gbrain/audit/schema-mutations-YYYY-Www.jsonl per mutation.
- Page types backfilled on matching rows after gbrain schema sync --apply.
- Query paths now route through the new expert types.
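The YYYY-Www pattern in the audit path looks like an ISO 8601 week label. Assuming that is the convention (the document does not say so explicitly), the file name for a given day could be derived as in this sketch. The function name is hypothetical.

```python
from datetime import date


def audit_file_name(day: date) -> str:
    """Build the weekly audit file name, assuming ISO 8601 week numbering."""
    iso_year, iso_week, _weekday = day.isocalendar()
    return f"schema-mutations-{iso_year}-W{iso_week:02d}.jsonl"


print(audit_file_name(date(2026, 1, 1)))
print(audit_file_name(date(2027, 1, 1)))
```

Under ISO numbering the year in the label can differ from the calendar year near New Year, which is why the helper uses the ISO year rather than day.year.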
Contract
- Inputs: a natural-language request that names a type / prefix / link verb / flag change, OR the result of gbrain schema review-orphans showing untyped pages that need a new type.
- Outputs: mutated pack file at ~/.gbrain/schema-packs/<name>/pack.{json,yaml} + an audit row in ~/.gbrain/audit/schema-mutations-YYYY-Www.jsonl + (if gbrain schema sync --apply ran) page types backfilled on matching rows.
- Side effects: invalidates the in-process pack cache + the query cache for the source. Other processes pick up the change within 1 second (stat-mtime TTL).
- Idempotency: every primitive is idempotent. Adding a type or link type that already exists is a no-op; sync finds nothing to update on a second run.
- Trust: CLI = local trust (no scope check). MCP = OAuth admin scope (write ops). The audit log captures one row per mutation.
- Atomicity: every mutation is wrapped in an atomic write + per-pack lock. A crash mid-write leaves the original file untouched.
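The atomicity guarantee is commonly achieved by writing to a temporary file in the same directory and renaming it over the target. The sketch below shows that general pattern only. It is not gbrain's implementation, the function name is hypothetical, and the per-pack lock is left out.

```python
import json
import os
import tempfile


def atomic_write_json(path, data):
    """Write JSON so readers see either the old file or the new one, never a partial."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".pack-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)  # atomic swap on POSIX and Windows
    except BaseException:
        # Clean up the temp file; the original at `path` was never touched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If serialisation or the write fails partway, only the temporary file is affected, which is the behaviour the Atomicity bullet describes.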
Anti-Patterns
- Don't mutate bundled packs such as gbrain-base. Fork first (gbrain schema fork gbrain-base mine). Edits to bundled packs would be lost on upgrade, and the mutation primitives refuse them.
- Don't add a type for a directory you imported once for triage. Pack types are permanent decisions; one-time imports are not. See skills/conventions/schema-evolution.md for the <20-pages-don't-pack-codify heuristic.
- Don't add --expert to a type with no --prefix. The expert_routing_without_prefix lint warns about this: expert-routed types with no prefix never match a put_page inference, so expert queries silently never surface them.
- Don't promote a candidate without verifying the prefix matches real content. Run gbrain schema lint --with-db before applying the change to catch prefix collisions pre-write.
- Don't conflate "filing one page" with "evolving the schema." Filing routes via brain-taxonomist; schema-author is for authoring the type taxonomy itself. The Non-goals section above names the boundary.
- Don't skip the dry-run before gbrain schema sync --apply. Always run gbrain schema sync --json first to see counts + sample slugs. A pack prefix that matches 50,000 pages is recoverable but slow; verifying first is cheap.
- Don't remove a type without checking references. Removal is refused if another type references it. Break the references first rather than overriding the check.
Output Format
When invoked, this skill produces structured output suitable for both human + JSON consumption:
Per-mutation result (JSON):
json
{"schema_version": 1, "pack": "mine", "path": "/Users/.../pack.json", "format": "json", "prev_sha8": "a1b2c3d4", "new_sha8": "e5f6g7h8"}
Per-batch result (from the batched MCP op):
json
{"schema_version": 1, "pack": "mine", "batch_id": "batch-1716491400-abc123", "mutations_applied": 3, "results": [{...}, {...}, {...}]}
Stats JSON (per-source + aggregate + dead-prefix hints):
json
{"schema_version": 1, "pack_identity": "mine@1.0.0+abc12345", "aggregate": {"total_pages": 4823, "typed_pages": 4710, "untyped_pages": 113, "coverage": 0.9766, "by_type": [{"type": "person", "count": 2104}, ...]}, "per_source": [...], "dead_prefixes": [{"type": "researcher", "prefix": "people/researchers/"}]}
Sync dry-run JSON:
json
{"schema_version": 1, "apply": false, "pack_identity": "mine@1.0.0+abc12345", "per_prefix": [{"type": "meeting", "prefix": "meetings/", "would_apply": 4000, "sample_slugs": ["meetings/2026-01-01-foo", ...], "dead_prefix": false, "applied": 0}], "total_would_apply": 4000, "total_applied": 0}
Human output (the agent's final summary):
- One line per mutation
- Stats: total pages, typed %, untyped count, per-type breakdown, dead-prefix list
- Sync: per-prefix would_apply / applied count + sample slugs in dry-run mode
On failure, the error envelope follows the standard shape
{error, code, message, details?}. The common failures from the mutation
primitives are described under Failure modes below.
Failure modes
- Bundled pack refused → you tried to mutate a bundled pack. Fork first.
- Pre-write lint failure → the mutation would create a dangling reference or
prefix collision. The pre-write lint gate caught it. Read the error
message; the lint rule name identifies the problem.
- Type still referenced → you tried to remove a type that another type
references. The error names every reference. Remove those first.
- Pack lock held → another process is mid-mutation. Wait 30s and retry, or
override the lock if you know the holder is wedged.
- Missing scope (MCP only) → your OAuth client doesn't have the admin
scope. Re-register with gbrain auth register-client --scopes admin.