Functional-Area Resolver — Pattern for Compressing Routing Tables

Problem

Routing files (RESOLVER.md, AGENTS.md) grow as skills are added. Each skill gets its own row (trigger -> skill path). At ~200+ skills this hits 25-30KB, eating context budget that should go to actual work.

Solution: Functional-Area Dispatchers

Replace N rows per area with one entry per functional area. Each entry lists all sub-skills it can dispatch to in a `(dispatcher for: ...)` clause.

Before (270 rows, 25KB)

- Creating/enriching a person or company page -> `enrich`
- Fix broken citations in brain pages -> `citation-fixer`
- Publish/share a brain page as link -> `brain-publish`
- Generate PDF from brain page -> `brain-pdf`
- Read a book through lens of a problem -> `strategic-reading`
- Personalized book analysis -> `book-mirror`
- Brain integrity -> `brain-librarian`
...

After (13 rows, 13KB)

- **Brain & knowledge**: create/enrich/search/export brain pages, filing,
  citations, publishing, book analysis, strategic reading, concept synthesis,
  archive mining -> `brain-ops` (dispatcher for: enrich, query, brain-pdf,
  brain-publish, brain-export, brain-librarian, citation-fixer, book-mirror,
  strategic-reading, concept-synthesis, archive-crawler, ...)

Why It Works

The LLM doesn't need one row per sub-skill. It needs:

- Area recognition: "this is about brain pages" -> Brain & Knowledge
- Sub-skill visibility: the `(dispatcher for: ...)` list shows what's available
- The skill file itself: once the LLM reads `brain-ops/SKILL.md`, it has full routing detail

This is a two-layer dispatch: routing file routes to the area, the area skill routes to the specific sub-skill. Each layer does one job well.
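The two layers can be sketched as two small lookups. This is a minimal illustration, not the gbrain implementation: keyword matching stands in for the LLM's judgement, and the trigger keywords and the contents attributed to `brain-ops/SKILL.md` are hypothetical.

```javascript
// Layer 1: the compressed routing file, one entry per functional area.
// Skill names mirror this document; the trigger keywords are illustrative.
const routingFile = [
  {
    area: "Brain & knowledge",
    triggers: ["brain page", "citation"],
    dispatcher: "brain-ops",
    dispatcherFor: ["enrich", "citation-fixer", "brain-publish", "brain-pdf"],
  },
];

// Layer 2: routing detail inside brain-ops/SKILL.md (hypothetical contents).
const skillFiles = {
  "brain-ops": [
    { trigger: "fix broken citations", skill: "citation-fixer" },
    { trigger: "generate pdf", skill: "brain-pdf" },
  ],
};

// Keyword matching stands in for the LLM at both layers.
function route(intent) {
  const text = intent.toLowerCase();
  const entry = routingFile.find((e) => e.triggers.some((t) => text.includes(t)));
  if (!entry) return null;
  const rows = skillFiles[entry.dispatcher] || [];
  const row = rows.find((r) => text.includes(r.trigger));
  // Fall back to the dispatcher itself when no sub-skill row matches.
  return { area: entry.area, skill: row ? row.skill : entry.dispatcher };
}
```

Under these assumptions, `route("Fix broken citations in brain pages")` resolves the area first and the sub-skill second, which is the division of labour described above.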

A/B Eval Results

Three resolver architectures were tested across three Anthropic frontier models (Opus 4.7, Sonnet 4.6, Haiku 4.5) on real production AGENTS.md content: 20 hand-authored training fixtures plus 5 held-out blind fixtures, with n=3 seeded repeats per (fixture, variant). Two scoring rules were used: STRICT (predicted slug exactly equals expected) and LENIENT (predicted is in the same dispatcher area as expected). Both matter:

- STRICT measures: "does the LLM return the exact slug?"
- LENIENT measures: "does the LLM land in the right area, even if it picks a more-specific sub-skill from `(dispatcher for: ...)`?" This is closer to production behavior: an agent that lands in `gmail` for an email intent succeeds even if the resolver entry said `executive-assistant`.
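The two scoring rules can be sketched as follows. This is an illustrative sketch, not the harness's scoring code: the `areas` map is a hypothetical stand-in for whatever the harness derives from the `(dispatcher for: ...)` clauses, and placing `gmail` under `executive-assistant` is an assumption based on the example above.

```javascript
// Hypothetical area map: dispatcher slug -> sub-skills from its dispatcher clause.
const areas = {
  "executive-assistant": ["gmail"],
  "brain-ops": ["enrich", "query", "citation-fixer"],
};

// Return the dispatcher slug that owns a skill, or null if the slug is unknown.
function areaOf(slug) {
  for (const [dispatcher, subSkills] of Object.entries(areas)) {
    if (slug === dispatcher || subSkills.includes(slug)) return dispatcher;
  }
  return null;
}

// STRICT: the predicted slug must equal the expected slug exactly.
const strictMatch = (predicted, expected) => predicted === expected;

// LENIENT: predicted and expected must belong to the same dispatcher area.
const lenientMatch = (predicted, expected) => {
  const area = areaOf(expected);
  return area !== null && area === areaOf(predicted);
};
```

With this map, a prediction of `gmail` against an expected `executive-assistant` fails STRICT but passes LENIENT, while `enrich` against `executive-assistant` fails both.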

Training corpus (n=20, 3 seeds × 3 variants × 3 models, LENIENT)

| Variant | Opus 4.7 | Sonnet 4.6 | Haiku 4.5 | Size |
| --- | --- | --- | --- | --- |
| baseline (270 bullet rows) | 81.7% ± 7.2% | 86.7% ± 7.2% | 73.3% ± 7.2% | 25KB |
| functional-areas (this pattern) | 98.3% ± 7.2% | 100% ± 0% | 88.3% ± 7.2% | 13KB |
| resolver-of-resolvers (no dispatcher clause) | 63.3% ± 14.3% | 41.7% ± 7.2% | 65.0% ± 12.4% | 10KB |

Held-out blind corpus (n=5, 3 seeds, LENIENT)

| Variant | Opus 4.7 | Sonnet 4.6 | Haiku 4.5 |
| --- | --- | --- | --- |
| baseline | 100% ± 0% | 100% ± 0% | 100% ± 0% |
| functional-areas | 100% ± 0% | 100% ± 0% | 100% ± 0% |
| resolver-of-resolvers | 100% ± 0% | 73.3% ± 28.7% | 100% ± 0% |

What the data shows

- Functional-areas beats baseline on training across all three models (+13 to +17pp) while being 48% smaller (13KB vs 25KB). Held-out is saturated at 100% for both, so any difference there is within the margin of error.
- The `(dispatcher for: ...)` clause is the load-bearing signal. resolver-of-resolvers strips that clause and collapses to 41.7% on Sonnet: the catastrophic failure case the original PR predicted, now observed.
- The pattern works because the LLM can drill into the dispatcher list. Most "STRICT failures" are the LLM picking a more-specific sub-skill (`gmail` instead of `executive-assistant`). That's the pattern working as designed. STRICT scoring under-counts; LENIENT scoring reflects production agent behavior.
- The pattern's value varies by model rather than rising cleanly with tier. Compression gain (functional-areas vs baseline, training, LENIENT) is +17pp on Opus, +13pp on Sonnet, +15pp on Haiku. Sonnet shows the cleanest separation between functional-areas and resolver-of-resolvers (100% vs 41.7%), which suggests model capacity affects how much the dispatcher signal matters.
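The headline numbers can be rechecked from the training table with a few lines of arithmetic. The accuracies and file sizes below are copied from this document; nothing else is assumed.

```javascript
// LENIENT training accuracy (%) copied from the training-corpus table.
const baseline = { opus: 81.7, sonnet: 86.7, haiku: 73.3 };
const functionalAreas = { opus: 98.3, sonnet: 100, haiku: 88.3 };

// Percentage-point gain per model, rounded to the nearest whole point.
const gainPp = Object.fromEntries(
  Object.keys(baseline).map((m) => [m, Math.round(functionalAreas[m] - baseline[m])])
);

// 13KB compressed vs 25KB baseline.
const sizeRatio = 13 / 25; // 0.52 of the original size
const reductionPct = Math.round((1 - sizeRatio) * 100); // 48, i.e. 48% smaller

console.log(gainPp, `${reductionPct}% smaller`);
```

The gains come out at 17, 13, and 15 points for Opus, Sonnet, and Haiku, and the compressed file is 52% of the original size, which is the same thing as 48% smaller.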

Reproduce

```bash
cd evals/functional-area-resolver
node harness.mjs --model opus    # ~225 LLM calls, ~$1.70 at Opus pricing
node harness.mjs --model sonnet  # ~$1.00
node harness.mjs --model haiku   # ~$0.30
node rescore.mjs baseline-runs/2026-05-11-opus-4-7.jsonl  # zero-cost re-score
```

Receipts (model, prompt_template_hash, fixtures_hash, harness_sha, ts): `evals/functional-area-resolver/baseline-runs/2026-05-11-{opus-4-7,sonnet-4-6,haiku-4-5}.jsonl`
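The call count and Opus cost quoted above follow from the corpus shape. The sketch below assumes one LLM call per (fixture, seed, variant) and reuses the per-call figure of $0.0076 that Step 6 quotes for Opus; both are inferences from this document rather than values read from the harness.

```javascript
// Corpus shape from the eval description.
const fixtures = 20 + 5; // training + held-out
const seeds = 3;
const variants = 3;

// Assumed: one LLM call per (fixture, seed, variant).
const callsPerModel = fixtures * seeds * variants; // 225

// Assumed per-call cost on Opus, taken from the Step 6 comment.
const opusCostPerCall = 0.0076;
const opusCost = callsPerModel * opusCostPerCall; // about 1.71 USD

// A single-variant run, as in Step 6 Gate 2.
const singleVariantCalls = fixtures * seeds; // 75

console.log(callsPerModel, opusCost.toFixed(2), singleVariantCalls);
```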

Methodology caveats

- Production prompt matters. With a naive "return the skill slug" prompt (no instruction about `(dispatcher for: ...)`), every compression variant collapses to ~30-60% on Opus. The dispatcher-aware prompt is in `evals/functional-area-resolver/harness-runner.ts:PROMPT_TEMPLATE`. Use it as the template for your agent's harness; without it, compression breaks.
- Training corpus and variants were authored in the same release. The held-out corpus was written before the variants and never adjusted; this mitigates but does not eliminate overfitting.
- Confidence intervals are computed via the t-distribution across n=3 seeded repeats. Treat n=3 as a lower bound: wide CIs mean the underlying sample is noisy.
- Single-vendor result. All three models are Anthropic. Cross-vendor verification (Gemini, GPT) is a v0.33.x follow-up.
- Held-out blind set is small (n=5). It is saturated at 100% across most cells, so the harness can't distinguish between "100%" and "95% with one nondeterministic miss." Expanding to ≥20 is a v0.33.x follow-up.
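To see why n=3 produces wide intervals, here is a minimal sketch of a t-distribution interval over three seeded repeats. It assumes a two-sided 95% interval, which this document does not state, and the per-seed accuracies are hypothetical values chosen for illustration, not the recorded runs.

```javascript
// Two-sided 95% critical value of Student's t with n - 1 = 2 degrees of freedom.
const T_CRIT_DF2 = 4.303;

// Mean and CI half-width for exactly three seeded repeats.
function meanWithCi(samples) {
  const n = samples.length;
  if (n !== 3) throw new Error("T_CRIT_DF2 only applies to n = 3");
  const mean = samples.reduce((a, b) => a + b, 0) / n;
  // Sample variance with Bessel's correction (divide by n - 1).
  const variance = samples.reduce((a, x) => a + (x - mean) ** 2, 0) / (n - 1);
  const halfWidth = T_CRIT_DF2 * Math.sqrt(variance / n);
  return { mean, halfWidth };
}

// Hypothetical per-seed accuracies in percent (16/20, 17/20, 16/20).
const { mean, halfWidth } = meanWithCi([80, 85, 80]);
console.log(`${mean.toFixed(1)}% ± ${halfWidth.toFixed(1)}%`); // prints "81.7% ± 7.2%"
```

A single fixture flipping in one of three seeds is enough to move the half-width from 0 to about 7 points, because the critical value at two degrees of freedom is so large.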

Prior work and citations

The pattern is a static-prompt analog of hierarchical agent routing, a 2024-2025 research direction:

- AnyTool (arXiv:2402.04253) showed meta-agent → category-agent → tool-agent hierarchy on 16K APIs beats flat retrieval by +35.4pp. The `(dispatcher for: ...)` clause is the meta-agent's view collapsed into a single LLM pass.
- RAG-MCP (arXiv:2505.03275) reports 49.2% prompt-token reduction at 3.2× accuracy gain via embedding-based pre-retrieval. The token-reduction story matches ours (48% smaller), via a different mechanism (RAG vs static dispatcher).
- Anthropic Agent Skills (engineering blog) promotes progressive disclosure: frontmatter (~80 tokens) always loaded, SKILL.md body loaded on match. This skill applies the same principle at the routing-table level, not the per-skill body level.

The 2025-2026 literature has no published benchmark for static-prompt hierarchical routing (every published hierarchical scheme resolves the hierarchy at runtime via a second LLM call). Our finding, that the hierarchy can be inlined into a single-LLM-pass dispatcher list and retain routing accuracy, is the open contribution. See `evals/functional-area-resolver/README.md` for methodology details.

How To Compress

Step 1: Preconditions

Refuse to compress if either of these conditions holds:

- Source routing file is under 12KB (compression overhead exceeds benefit).
- `git status` shows uncommitted changes to the routing file (the compressor's edit would entangle with whatever the user was doing).

If a user wants to override either gate, they must ask explicitly with `--force`.
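The two gates can be expressed as a small pure function. This is a sketch under stated assumptions: the function name and input shape are hypothetical, 12KB is taken to mean 12 × 1024 bytes, and the caller is expected to supply the dirty flag from `git status` itself.

```javascript
const MIN_BYTES = 12 * 1024; // assumed meaning of the 12KB threshold

// sizeBytes: size of the routing file.
// isDirty: whether `git status` reports uncommitted changes to that file.
// force: whether the user explicitly passed --force.
function checkPreconditions({ sizeBytes, isDirty, force = false }) {
  const failures = [];
  if (sizeBytes < MIN_BYTES) failures.push("routing file is under 12KB");
  if (isDirty) failures.push("routing file has uncommitted changes");
  // --force overrides the gates but the failures are still reported.
  return { ok: force || failures.length === 0, failures };
}
```

Returning the failure list even under `--force` keeps the override visible to whoever reviews the run.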

Step 2: When to compress which file

GBrain workspaces often have TWO routing files merged at runtime (per `src/core/check-resolvable.ts`, v0.31.7): `skills/RESOLVER.md` and a sibling `../AGENTS.md`. Choose which to compress:

- Only one is fat (≥12KB): compress that one; leave the small one alone.
- Both are fat: compress them separately, in order: AGENTS.md first (usually the larger one in OpenClaw-style deployments), then RESOLVER.md.
- Only the usually-smaller file is fat (rare): same rule; compress it.

If the deployment uses only one routing file, this section is a no-op — compress that one.
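The selection rule reduces to filtering a fixed order by size. The sketch below is illustrative only: the function name and the `sizes` input are hypothetical, and 12KB is again taken as 12 × 1024 bytes.

```javascript
const FAT_BYTES = 12 * 1024; // assumed meaning of the 12KB threshold

// sizes: byte sizes keyed by routing file name; a missing file is simply absent.
function compressionOrder(sizes) {
  // AGENTS.md first, then RESOLVER.md, matching the order given above.
  return ["AGENTS.md", "skills/RESOLVER.md"].filter(
    (name) => (sizes[name] ?? 0) >= FAT_BYTES
  );
}
```

An empty result means neither file is large enough to be worth compressing, which agrees with the size gate in Step 1.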

Step 3: Identify functional areas

Group skills by domain. Typical areas (adjust per deployment):

- Brain & Knowledge: brain-ops as dispatcher
- Content Ingestion: ingest as dispatcher
- Calendar & Scheduling: google-calendar as dispatcher
- Email & Comms: executive-assistant as dispatcher
- Research & Investigation: perplexity-research as dispatcher
- X/Twitter & Social: x-ingest as dispatcher
- Places & Travel: checkin as dispatcher
- Product & Building: acp-coding as dispatcher
- Infrastructure: healthcheck as dispatcher
- Tasks & Logistics: daily-task-manager as dispatcher
- People & Contacts: google-contacts as dispatcher

Step 4: Build the area entry format

Each area entry follows this template:

- **{Area Name}**: {comma-separated trigger phrases} -> `{dispatcher-skill}`
  (dispatcher for: {comma-separated sub-skill names})

Rules:

- Trigger phrases should be broad enough to catch intent ("brain pages, enrich, search, filing, citations, book analysis")
- Sub-skill list should be comprehensive: this is how the LLM knows what's available
- The dispatcher skill file should have its own internal routing table
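A small renderer makes the template concrete. This is a sketch, not part of gbrain: the function name and input shape are hypothetical, and it emits the ASCII arrow used in the template above.

```javascript
// Render one area entry in the template's two-line layout.
function renderAreaEntry({ area, triggers, dispatcher, subSkills }) {
  // An empty dispatcher list would remove the routing signal, so refuse it.
  if (subSkills.length === 0) {
    throw new Error(`area "${area}" has an empty dispatcher list`);
  }
  return (
    `- **${area}**: ${triggers.join(", ")} -> \`${dispatcher}\`\n` +
    `  (dispatcher for: ${subSkills.join(", ")})`
  );
}

console.log(
  renderAreaEntry({
    area: "Brain & knowledge",
    triggers: ["brain pages", "citations"],
    dispatcher: "brain-ops",
    subSkills: ["enrich", "citation-fixer"],
  })
);
```

Rejecting an empty sub-skill list at render time enforces the rule that the list must be comprehensive enough to route on.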

Step 5: Keep always-on entries separate

Gates and always-on entries (acknowledge, multi-user, entity-detector, etc.) stay as individual rows — they're checked on every message, not dispatched.

Step 6 (MANDATORY): Verify routing accuracy

Run two gates before committing the compressed file. Do NOT commit if either fails.

Gate 1: Structural verification. Confirms your `routing-eval.jsonl` fixtures still resolve to the right skills under the compressed routing file. Run from the workspace whose routing file you just edited:

```bash
gbrain routing-eval --json
```

If accuracy on your fixtures drops below 95%, revert and tune the area entries before re-running.

Gate 2: LLM A/B verification on YOUR edited file. Confirms a frontier LLM can still drill into the dispatcher list and reach sub-skills under your specific compression. Requires a gbrain repo checkout because the harness lives there. Copy your edited routing file into a temporary variants directory, then invoke the harness with `--variants-dir` pointing at that directory and `--variants` naming the file:

```bash
# In your agent workspace, identify the routing file you just compressed.
EDITED=/path/to/your/AGENTS.md       # or skills/RESOLVER.md, whichever you edited

# In your gbrain repo checkout:
cd /path/to/gbrain/evals/functional-area-resolver
TMP=$(mktemp -d)/variants && mkdir -p "$TMP"
cp "$EDITED" "$TMP/my-edit.md"

# Run the harness against your file (~75 calls × $0.0076 ≈ $0.57 on Opus).
ANTHROPIC_API_KEY=... node harness.mjs --variants-dir "$TMP" --variants my-edit \
                                       --model opus --parallel 3 --yes
```

The harness uses gbrain's bundled fixture set, so this verifies "did the LLM land in the right sub-skill for routing intents the gbrain-bundled fixtures cover": a regression check on shared skills, not a full re-eval of YOUR fixture set. For full eval coverage, mirror this skill's `fixtures.jsonl` and `fixtures-held-out.jsonl` setup with intents specific to your skills.

If the lenient (same-area) score on your variant drops below 95%, revert the compression and tune. Common causes:

- A sub-skill was omitted from the `(dispatcher for: ...)` list.
- Trigger phrases for an area are too narrow (LLM can't recognize intent).
- Areas were collapsed too aggressively (too few areas; see Anti-Patterns).
- ASCII `->` vs Unicode `→` mismatch: the harness now accepts both, but earlier versions only matched Unicode. Pin gbrain to v0.32.3.0+.
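The first cause, an omitted sub-skill, can be checked mechanically before spending any LLM calls. The sketch below is a hypothetical helper, not a gbrain command: it assumes you can supply the routing file's text and an inventory of sub-skill slugs that should be reachable.

```javascript
// Collect every name that appears in a "(dispatcher for: ...)" clause.
function listedSubSkills(routingText) {
  const listed = new Set();
  // [^)]* also spans line breaks, so wrapped clauses are handled.
  for (const match of routingText.matchAll(/\(dispatcher for:([^)]*)\)/g)) {
    for (const name of match[1].split(",")) {
      const slug = name.trim();
      if (slug && slug !== "...") listed.add(slug);
    }
  }
  return listed;
}

// Sub-skills in the inventory that no dispatcher clause mentions.
function findOmitted(routingText, inventory) {
  const listed = listedSubSkills(routingText);
  return inventory.filter((slug) => !listed.has(slug));
}
```

Any slug this returns is invisible to the LLM at the routing layer, so it is worth fixing before rerunning either gate.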

Common false negatives on the harness eval (NOT bugs in your compression):

- The gbrain-bundled fixtures target skill names like `enrich`, `query`, `gmail`, `executive-assistant`. If your routing file doesn't expose those skills at all, expect strict-scoring failures on those fixtures. Lenient scoring stays accurate for any sub-skill present in your `(dispatcher for: ...)` lists.

Step 7: Review the diff before committing

Show the user the proposed edit (or the actual git diff) and wait for explicit approval before staging. Same convention as `skills/book-mirror/SKILL.md`.

Contract

This skill guarantees:

- Routing matches the canonical triggers in the frontmatter.
- Compression is only performed when the preconditions in Step 1 pass (file ≥12KB AND clean working tree, or `--force`).
- The mandatory verification gate in Step 6 fires on the user's edited file, not on sample variants. The user runs `gbrain routing-eval --json` AND the gbrain-repo harness (`node harness.mjs --variants-dir <tmp> --variants my-edit`) before committing the compressed file.
- Privacy contract preserved: no fork-specific filesystem path literals (server-side brain home, OpenClaw fork home) leak into the compressed output.

The full behavior contract is documented in the body sections above; this section exists for the conformance test.

Output Format

The compressed routing file follows the area-entry template documented in Step 4 ("Build the area entry format"). Each entry: `` - **{Area Name}**: {trigger phrases} -> `{dispatcher-skill}` (dispatcher for: {sub-skill list}) ``. The dispatcher arrow may be either ASCII `->` (default in this template) or Unicode `→` (used in some production deployments); the gbrain harness accepts both.
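For tooling that reads the compressed file, a parser that tolerates both arrows might look like the sketch below. The regular expression and function name are assumptions made for illustration; this is not the pattern the gbrain harness uses.

```javascript
// One area entry: bold area name, triggers, either arrow, backticked dispatcher,
// then the dispatcher clause. The "s" flag lets wrapped lines match.
const ENTRY =
  /^- \*\*(.+?)\*\*: (.+?) (?:->|→) `([^`]+)`\s+\(dispatcher for: ([^)]*)\)$/s;

function parseAreaEntry(entry) {
  const m = ENTRY.exec(entry.trim());
  if (!m) return null;
  const [, area, triggers, dispatcher, subSkills] = m;
  const split = (s) => s.split(",").map((x) => x.trim()).filter(Boolean);
  return { area, triggers: split(triggers), dispatcher, subSkills: split(subSkills) };
}
```

Accepting both arrows in one alternation avoids the silent mismatch described in Step 6, where a file written with one arrow fails against tooling that expects the other.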

Anti-Patterns

- Resolver-of-resolvers with pipe tables. Tested and failed (see eval table). The LLM picks area names from the table instead of drilling into sub-skills.
- Removing sub-skill names. Without the `(dispatcher for: ...)` list, the LLM can't route to specific sub-skills. The list is the routing signal.
- Too few areas. Collapsing to <5 areas makes each area too broad. 12-15 areas is the sweet spot.
- Too many areas. Defeats the purpose. If you have 50 areas, just keep individual rows.

Maintenance

When adding a new skill:

- Identify its functional area.
- Add the skill name to that area's `(dispatcher for: ...)` list.
- Update the area's skill file with routing detail.
- Run the routing eval (Step 6) to verify.

When adding a new functional area:

- Create the dispatcher skill with internal routing.
- Add the area entry to the routing file.
- Run the routing eval (Step 6) to verify.
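Adding a skill to an existing area's list is a small text edit that can be scripted. The helper below is a hypothetical sketch: it rewrites the clause on a single line, so any line wrapping inside the original clause is not preserved, and the slug `slack-digest` in the usage line is invented for the example.

```javascript
// Append a sub-skill slug to an entry's dispatcher clause, if not already present.
function addSubSkill(entry, slug) {
  const clause = /\(dispatcher for: ([^)]*)\)/;
  const m = clause.exec(entry);
  if (!m) throw new Error("entry has no (dispatcher for: ...) clause");
  const names = m[1].split(",").map((s) => s.trim()).filter(Boolean);
  if (names.includes(slug)) return entry; // already listed; nothing to do
  // A function replacement avoids special "$" handling in the replacement text.
  return entry.replace(clause, () => `(dispatcher for: ${[...names, slug].join(", ")})`);
}

const before = "- **Email & Comms**: email -> `executive-assistant` (dispatcher for: gmail)";
console.log(addSubSkill(before, "slack-digest"));
```

The edit is idempotent, so rerunning it after a partial maintenance pass does not duplicate entries. The routing eval in Step 6 is still required afterwards.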

Changelog

v1.0.0 — 2026-05-11

- Initial version. Pattern shipped in gbrain v0.32.3.0 with a held-out A/B eval (see `evals/functional-area-resolver/`).
- Skill renamed from `compress-agents-md` to `functional-area-resolver` pre-release; the contribution is the pattern, not the filename.
