wren-enrich-context

Wren Enrich Context — Fill the Business-Context Gap

This skill exists because most business context never lives in a DB schema — it lives in handbooks, glossaries, finance reports, support playbooks, code comments, Slack rules-of-thumb. The agent reads those raw artifacts, finds what's missing from the Wren project, and either grills the user one question at a time (grill mode) or applies its best inferences directly and hands over an audit (auto-pilot mode) before writing back. The output lands in three (or four) sinks each project already has — no new artifact, no new tooling.

Version check

Silently fetch `https://raw.githubusercontent.com/Canner/WrenAI/main/skills/versions.json`. Compare the `wren-enrich-context` key with this skill's version (from the frontmatter above). If the remote version is newer, notify the user before proceeding:

> A newer version of the wren-enrich-context skill is available. Update with:
>
> ```bash
> npx skills add Canner/WrenAI --skill wren-enrich-context
> ```
>
> The CLI auto-detects your installed agent. To target a specific one, add `--agent <name>` (e.g., `claude-code`, `cursor`, `windsurf`, `cline`).

Continue regardless of update status.
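
The comparison itself can be sketched in plain shell. This is only an illustration: it assumes both versions are dotted numeric strings such as `1.4.0` (the actual format in `versions.json` is not specified here), and `compare_versions` is a hypothetical helper name, not part of any Wren tooling.

```bash
# Prints "newer", "same", or "older" describing how $2 (remote) relates to $1 (local).
# Assumption: versions are plain dotted numbers, e.g. 1.4.0.
compare_versions() {
  local_v=$1
  remote_v=$2
  while [ -n "$local_v" ] || [ -n "$remote_v" ]; do
    # Take the leading component of each version; a missing component counts as 0.
    l=${local_v%%.*}
    r=${remote_v%%.*}
    l=${l:-0}
    r=${r:-0}
    if [ "$r" -gt "$l" ]; then echo newer; return; fi
    if [ "$r" -lt "$l" ]; then echo older; return; fi
    # Drop the component just compared; clear the string when no dot remains.
    case $local_v in *.*) local_v=${local_v#*.} ;; *) local_v= ;; esac
    case $remote_v in *.*) remote_v=${remote_v#*.} ;; *) remote_v= ;; esac
  done
  echo same
}

# Usage: notify only when the remote side is strictly newer.
# [ "$(compare_versions "$LOCAL_VERSION" "$REMOTE_VERSION")" = "newer" ] && echo "update available"
```

Only the `newer` result should trigger the notice; in every case the skill continues.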

Hard rules — READ FIRST

Universal (apply to both modes)

  1. Only add, never modify existing. If you find an existing MDL description / relationship / rule that looks wrong, do not edit it. Surface it on the "please fix manually" list shown in Step 9.
  2. Every MDL edit must validate. Right after any MDL YAML change, run `wren context validate`. If it fails, revert that single change and feed the error back. Never leave a project in an invalid state.
  3. Pre-draft every proposal. Whether you're showing the draft to the user (grill) or applying it directly (auto-pilot), generate the concrete content — never lazy-ask "what should the description say?".
  4. Be explicit about confidence. In grill mode, open Lane 3 inference questions with "I'm guessing — ". In auto-pilot, tag every Lane 3 inference and partial Lane 2 match in the Step 9 audit with confidence (high / med / low) and source.

Grill mode only

  1. One question at a time. Grill relentlessly. Walk every gap top-down, resolve one decision before moving to the next. Provide a recommended answer for every question.
  2. Skip is final for this session. No pending queue, no nagging next round. If the user wants to revisit, they re-run the skill.

Auto-pilot mode only

  1. Drop into grill for three cases. Always interrupt auto-pilot and ask the user when:
    • (a) Lane 2 conflict — raw and current MDL disagree.
    • (b) High-blast-radius proposal (any lane) — new cube, new view, new relationship, or new MDL metric/calculated column. These become public artifacts visible to every future agent session, so blast radius doesn't depend on whether the trigger was raw evidence (Lane 2) or inference (Lane 3).
    • (c) Lane 2 routing ambiguity — you can't confidently pick a sink (MDL / `instructions.md` / `queries.yml` / `cubes/`).
    Everything else: apply directly and log to the audit list.
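
The three escalation cases amount to a small decision function. The sketch below is illustrative only: the finding-kind labels (`lane2_conflict`, `new_view`, and so on) and the name `autopilot_action` are hypothetical, chosen here to mirror cases (a) to (c).

```bash
# Prints "grill" when auto-pilot must stop and ask the user, "apply" otherwise.
# $1 = finding kind (hypothetical labels for this sketch).
autopilot_action() {
  case $1 in
    lane2_conflict)
      echo grill ;;   # (a) raw and current MDL disagree
    new_cube|new_view|new_relationship|new_metric|new_calculated_column)
      echo grill ;;   # (b) high blast radius, regardless of lane
    lane2_routing_ambiguous)
      echo grill ;;   # (c) sink cannot be chosen confidently
    *)
      echo apply ;;   # everything else: apply directly and log to the audit list
  esac
}
```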

Step 0 — Mode selection (before anything else)

Before touching the project or reading any file, ask the user which mode to run in. Lock the choice for the whole session — no mid-session switching; the user re-runs to change.
Two modes for this session:
a) Grill mode — I walk every gap with you, one question at a time, proposing a draft and waiting for your accept / edit / skip. You stay in the driver's seat. Best when the raw material is sensitive, when you want to learn what I don't know about your project, or when you'd rather review than re-do.
b) Auto-pilot mode — I read raw + current context, make my best inferences, and apply them. I'll only stop to grill you on (1) conflicts between raw and existing MDL, (2) high-blast-radius additions like new cubes, metrics, views, or relationships, and (3) findings where I can't confidently pick which file they belong in. The session ends with a full diff + confidence-tagged inference list for you to audit.
Which? (a / b)
Remember the choice as `MODE = grill | autopilot` and use it to branch Steps 6 and 9.

Preflight

Step 1 — Choose the Wren project

Always ask the user which project to enrich before doing anything else — never assume cwd. A user can have several Wren projects and an ambient `~/.wren` profile that doesn't match the one they want to augment.
Offer concrete hints in the question so the user can answer in one round-trip:

```bash
# Hint 1 — does cwd look like a project?
test -f wren_project.yml && pwd

# Hint 2 — does ~/.wren/config.yml point at a default project?
grep -E '^project_path:' ~/.wren/config.yml 2>/dev/null
```

Then ask:

> Which Wren project do you want me to augment?
> a) `$PWD` (current directory)         ← if Hint 1 matched
> b) `<path from ~/.wren/config.yml>`   ← if Hint 2 matched
> c) something else — paste the absolute path

After the user answers, lock the path in for the whole session:

```bash
cd <chosen-path>
test -f wren_project.yml || {
  echo "Error: <chosen-path> is not a Wren project (no wren_project.yml)."
  exit 1
}
wren context show >/dev/null 2>&1 || {
  echo "Error: wren context show failed — manifest may be invalid."
  exit 1
}
```

If either check fails, stop and tell the user — suggest `wren-onboarding` if it's not a project, or `wren context validate` if the manifest is broken.
From this point on, every command and file path in this skill is relative to the chosen project root. Do not switch projects mid-session — if the user wants to work a different project, end this session and re-run.

Step 2 — Detect memory availability

```bash
wren memory --help >/dev/null 2>&1
```

  • Exit 0 → set `MEMORY_AVAILABLE = true`. The fourth sink (direct `wren memory store`) is open.
  • Exit non-zero → set `MEMORY_AVAILABLE = false`. Skip the memory-only paths below.
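
One way to capture that exit code is shown below. It is a sketch only: it stores the flag in a shell variable of the same name and also treats a missing `wren` binary as "memory unavailable", which is an assumption about the desired fallback.

```bash
# Record whether the optional memory subcommand is usable in this environment.
# A non-zero exit (including "command not found") leaves the flag false.
if wren memory --help >/dev/null 2>&1; then
  MEMORY_AVAILABLE=true
else
  MEMORY_AVAILABLE=false
fi
echo "MEMORY_AVAILABLE=$MEMORY_AVAILABLE"
```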

Step 3 — Ensure raw/ folder exists

From the project root (cwd is already there from Step 1):

```bash
mkdir -p raw
```

If you just created it (the directory was empty or new):

> I've created `raw/` at the project root. Drop anything you think helps explain this project's business context — PDFs, glossaries, handbooks, financial reports, data dictionaries, sample queries, code with comments, screenshots of dashboards, anything.
>
> Heads-up: the contents may be sensitive. Decide for yourself whether to commit `raw/` to git — I won't touch `.gitignore`.
>
> Tell me when you've added the files and I'll start reading.

Wait for the user to confirm before continuing.
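
Because `mkdir -p` succeeds silently whether or not the folder existed, telling "new or empty" apart from "already populated" needs a separate check. A minimal sketch, with `ensure_raw_dir` as a hypothetical helper name:

```bash
# Creates raw/ if needed, then prints "new" when it is empty (just created or
# never filled) and "populated" when it already holds files.
# $1 = project root (defaults to the current directory).
ensure_raw_dir() {
  root=${1:-.}
  mkdir -p "$root/raw"
  if [ -z "$(ls -A "$root/raw")" ]; then
    echo new
  else
    echo populated
  fi
}
```

A result of `new` is the cue to show the message above and wait; `populated` means reading can start right away.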

Step 4 — Read everything

Read both sides — the raw material and the current Wren context — before forming any opinion.

Raw

Read every file under `raw/`. Use whatever capability your agent has natively (text, markdown, code, PDF). If you genuinely can't read a particular file, tell the user once which file and suggest converting it to text or pasting the relevant excerpt — then move on to the rest. Do not install extra Python packages, do not reach for new CLI subcommands.

Current Wren context

| Source | Command |
| --- | --- |
| MDL (full) | `wren context show --output json` |
| Project instructions | `wren context instructions` |
| Existing cubes (names) | `wren cube list` |
| Existing cubes (measures + dimensions) | `wren cube describe <cube>` for each name above |
| Curated NL-SQL pairs | read `queries.yml` directly |
| (Memory) stored pairs | `wren memory list -n 200 --output json` |
| (Memory) schema as text | `wren memory describe` |

The memory rows only matter when `MEMORY_AVAILABLE = true`. Reading cubes is essential before any Lane 3 metric proposal — see `references/cube_proposals.md` for the duplication guard.

Step 4.5 — Ground-truth probe (grill mode: on by default; auto-pilot: always off)

When raw is silent on a column's enum / null / magic / time semantics, the probeable column-local categories (#1, #3, #5, #7 in `references/gap_catalog.md`) can often be settled directly by sampling distinct values from the live DB. Read `references/gap_catalog.md` before this step — its Trigger column tells you which columns are probe candidates.
Default policy by mode:

| Mode | Default | How to override |
| --- | --- | --- |
| Grill | Probe on. Before the first query, ask the user once: "I want to sample N columns with `LIMIT 30` each to find enum / sentinel / time-grain values — OK?" Lock the answer for the session. | User says no → skip Step 4.5 entirely; rely on Lane 2 + Lane 3 instead. |
| Auto-pilot | Probe off. The skill never queries the live DB in auto-pilot mode. | None — user must re-run in grill mode if probe would unblock high-confidence inferences. |

Candidate selection (no DB call yet):
A column is a probe candidate when all hold:
  • Description is empty OR does not yet contain the relevant `[tag]` (catalog write format).
  • Column type / name pattern matches catalog #1 (enum), #3 (NULL), #5 (magic), or #7 (time grain).
  • For #3 and #7, the description also lacks event-vs-record or TZ wording.
Categories #2 (unit), #4, #6, #8, #9, #10 are not probeable — `SELECT DISTINCT` doesn't reveal units, default filters, synonyms, external mappings, currency conventions, or canonical-table preferences. Those need raw evidence or human judgment.
Probe query:

```bash
wren --sql "SELECT DISTINCT <col> FROM <model> LIMIT 30" --output json
```

For magic sentinels (catalog #5), also fetch min/max:

```bash
wren --sql "SELECT MIN(<col>) AS lo, MAX(<col>) AS hi FROM <model>" --output json
```

- **Fewer than 30 distinct values returned** → the full value set was seen; enum / sentinel / grain candidate. Draft the `[tag]` line and surface to user (grill) with confidence "med — probed values, semantics still inferred".
- **Exactly 30 returned (LIMIT hit)** → the set was truncated, so cardinality is too high; not an enum / sentinel candidate. Skip.
- **Query fails** (permissions, connection, large-table timeout) → do not retry. Log the failure to the audit list, surface in Step 9, continue with Lane 2 + Lane 3 only.
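
The row-count rule above can be sketched as two small helpers. This is an illustration under stated assumptions: `probe_sql` and `classify_probe` are hypothetical names, and a result that fills the `LIMIT` is treated as too high-cardinality because the value set may be truncated.

```bash
PROBE_LIMIT=30

# Build the sampling query for one (model, column) pair.
# $1 = model, $2 = column.
probe_sql() {
  printf 'SELECT DISTINCT %s FROM %s LIMIT %s\n' "$2" "$1" "$PROBE_LIMIT"
}

# Interpret how many rows a probe returned.
# $1 = number of distinct values in the result.
classify_probe() {
  if [ "$1" -lt "$PROBE_LIMIT" ]; then
    echo candidate   # whole value set seen: enum / sentinel / grain candidate
  else
    echo skip        # LIMIT hit: cardinality too high
  fi
}

# Usage: wren --sql "$(probe_sql orders status)" --output json
```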

**Safety:**

- Probe each (model, column) at most once per session.
- Never probe a column that already has a matching `[tag]` line — Universal Rule 1.
- Probe results stay in working memory; do not write them to disk.

---

Step 5 — Three gap-detection lanes (in your head, no artifact)

Hold all three lanes in working memory. Do not write a `gaps.yml`.
Before sweeping, load `references/gap_catalog.md` — the ten business-semantic categories the schema cannot carry. Each lane consumes the catalog differently: Lane 1 walks it as type-aware mechanical triggers, Lane 2 classifies each atomic raw claim into one of the 10 categories before routing, Lane 3 seeds inference prompts when a trigger fires but raw is silent.

Lane 1 — Structural coverage (mechanical)

Scan the current MDL and check:
  • Every model has a non-empty `properties.description`?
  • Every column has a description (at least for non-PK, non-FK ones)?
  • Every model has a `primary_key`?
  • Every model has at least one relationship (orphan models are suspicious)?
  • `instructions.md` is more than the scaffold default?
  • `queries.yml` has at least a few canonical pairs?
Plus, walk every column / model against `references/gap_catalog.md` triggers:
  • For each column matching catalog #1 / #2 / #3 / #5 / #7 triggers → is the corresponding `[tag]` line present in `properties.description`?
  • For each model with a soft-delete column (`deleted_at`, `is_active`, `archived_at`, etc.) → is there a `## Default filters` rule in `instructions.md` covering it (catalog #4)?
  • For each lookalike table pair (e.g. `users` / `users_v3`) → is there a `## Canonical tables` rule (catalog #10)?
  • For each `*_currency` / `fx_rate` / external-system ID column → is the matching `## Currency` (#9) or `## External identifiers` (#8) section present?
  • Business terms in `instructions.md` or raw that don't map verbatim to model / column names → catalog #6 `## Naming conventions` rule missing.
Each unsatisfied check is a candidate. Combine with Step 4.5 probe results (if available) before moving to Lane 2.
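
The section-presence checks in this lane reduce to looking for an exact heading line. A minimal sketch, where `has_section` is a hypothetical helper and the match is assumed to be on the whole line:

```bash
# Prints "present" when the exact heading line exists in the file, "missing" otherwise.
# $1 = file (for example instructions.md), $2 = heading (for example "## Default filters").
has_section() {
  if grep -qxF -- "$2" "$1" 2>/dev/null; then
    echo present
  else
    echo missing
  fi
}

# Usage: has_section instructions.md "## Canonical tables"
```

A `missing` result only marks a candidate gap; whether the section actually covers the specific column or table pair still needs a read of its contents.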

Lane 2 — Claim-diff (raw vs current context)

For each raw file, internally extract 5–15 atomic claims — single statements that could be true or false, e.g. "an order has exactly one customer", "user means type=default by default", "ARR equals MRR × 12 minus refunds". Then for each claim, classify against the current Wren context:
| Class | Meaning | Resolution outcome |
| --- | --- | --- |
| covered | already reflected in MDL / instructions / pairs | skip |
| partial | the topic exists but the wording / scope differs | propose tightening |
| new | nothing in current context matches | route to a sink |
| conflict | raw says A, current context says B | grill the user (both modes), but do not edit existing — surface for manual fix |
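
Once a claim has a class, the next action follows mechanically from the table. A sketch of that lookup; `claim_action` and the action labels it prints are hypothetical shorthand, not an existing command:

```bash
# Map a Lane 2 claim class to its resolution outcome.
# $1 = covered | partial | new | conflict
claim_action() {
  case $1 in
    covered)  echo skip ;;
    partial)  echo propose-tightening ;;
    new)      echo route-to-sink ;;
    conflict) echo grill-and-list-for-manual-fix ;;   # never edit the existing entry
    *)        echo unknown-class; return 1 ;;
  esac
}
```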

Lane 3 — Inference (your own guesses)

After reading raw and the current MDL, propose additions the user did not literally state in raw but that would clearly help the agent later. Examples:
  • "I see `quarterly_churn` referenced five times in `finance.pdf`. No existing cube covers it. Want me to add `cubes/quarterly_churn/metadata.yml` with measure = `COUNT(*) FILTER (WHERE churned_at IS NOT NULL) / NULLIF(COUNT(*), 0)`?" — see `references/cube_proposals.md` for the YAML template and duplication guard.
  • "Your support handbook keeps mentioning `core users` without defining it. Is this `users WHERE tier = 'premium'`? Want me to make a view?"
  • "The data dictionary says `events.payload` is JSON but the column has no description — let me draft one."
For any aggregation-shaped proposal (`SUM`, `COUNT`, `AVG`, "by month / by status / per customer" patterns), default to a cube. Run `wren cube list` + `wren cube describe` first to confirm no existing cube already covers the measure expression; if one does, skip the proposal and add a `queries.yml` example pointing at the existing cube instead. The full decision tree, naming rules, and validation flow live in `references/cube_proposals.md`.
In grill mode, open every Lane 3 question with "I'm guessing — ". In auto-pilot, tag the audit entry with `agent inference` so the user sees you extrapolated.

Step 6 — Resolve gaps

Branch on the `MODE` locked in Step 0.

Grill mode

Use this conversational pattern for every gap surfaced in Lanes 1–3:
Interview the user relentlessly about every gap until we reach a shared understanding. Walk down each branch of the decision tree, resolving dependencies between decisions one-by-one. For each question, provide your recommended answer.
Ask the questions one at a time.
If a question can be answered by exploring the codebase or the raw files, do that instead of asking.
For every grill turn:
  1. State the gap and where it came from (Lane 1 / 2 / 3, and for Lane 2 quote the raw file + a short excerpt).
  2. Propose the concrete answer — draft the description, the rule, the SQL pair, the relationship.
  3. Propose the sink ("I'll add this to `instructions.md` as a rule" / "I'll add this to the `users` model description in MDL").
  4. Let the user accept / edit / skip.
  5. On accept: write back (Step 7).
  6. On edit: apply their wording, then write back.
  7. On skip: drop it, move to the next gap. Do not requeue.
When the user gives a curve-ball answer ("actually we don't track that") — pivot. The goal is shared understanding, not pushing a pre-built list.

Auto-pilot mode

Process every finding from Lanes 1–3 directly, except for the three escalation cases in the auto-pilot hard rule (Hard rules, "Auto-pilot mode only"):
  • Lane 2 conflict — drop into grill for this single question, never auto-resolve.
  • High-blast-radius proposal from any lane (new cube / view / relationship / MDL metric or calculated column) — drop into grill before applying.
  • Lane 2 routing ambiguous — drop into grill for the sink choice only, then auto-apply.
For everything else (Lane 1 mechanical fixes, Lane 2 unambiguous new claims, Lane 3 low-impact description tweaks):
  1. Synthesize the concrete proposal (description / rule / SQL pair) using your best inference.
  2. Decide the sink using the routing table (Step 7).
  3. Write back.
  4. Run `wren context validate` immediately after any MDL edit. On failure: revert the single change and log the revert with the error.
  5. Append to the audit list: the finding, the sink, confidence tag (high / med / low), and source (raw file + excerpt for Lane 2, `agent inference` for Lane 3, `structural` for Lane 1).
Auto-pilot does not pause for confirmation on each item — the user reviews the full diff + audit list in Step 9. They are the reviewer, not the gatekeeper.
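
Since the audit list is only shown at the end, it helps to keep each entry in one fixed shape as it is appended. A small sketch: `audit_row` is a hypothetical helper, and the column widths are an arbitrary choice made to resemble the Step 9 report.

```bash
# Format one audit entry as "confidence | sink | source".
# $1 = high | med | low, $2 = sink, $3 = source (raw file + excerpt, "agent inference", or "structural").
audit_row() {
  printf '%-6s | %-40s | %s\n' "$1" "$2" "$3"
}

# Usage: audit_row med "MDL column:users.signup_source.desc" "agent inference from raw/onboarding.md"
```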

Step 7 — Routing & writeback

Decide the sink as part of the proposal (Step 6.3 in grill mode; Step 6.2 in auto-pilot), so the user can correct routing in grill mode and audit it in auto-pilot.

| Finding type | Sink | How to write |
| --- | --- | --- |
| Schema structure / relationship / view / model or column description | MDL YAML under `models/`, `views/`, `relationships.yml` | Edit the YAML file directly. For catalog #1 / #2 / #3 / #5 / #7 / PII, append a `[tag]` line to `properties.description` (prose first, then one tag per category). See `references/gap_catalog.md` for the exact tag format and triggers. |
| Aggregation metric / named measure (with measures + dimensions) | `cubes/<name>/metadata.yml` | New file per cube. Default sink for any `SUM` / `COUNT` / `AVG` / ratio metric raw defines or Lane 3 infers. See `references/cube_proposals.md` for the YAML template, naming policy, duplication guard, and validation flow. Run `wren context validate` + `wren cube query --cube <name> --sql-only` after writing; revert on either failure. Always escalates to grill in auto-pilot (escalation case (b) of the auto-pilot hard rule). |
| Default filter / implicit rule / business convention / naming convention / external mapping / currency / canonical table | `instructions.md` | Append under the catalog-specified `##` section heading (#4 → `## Default filters`, #6 → `## Naming conventions`, #8 → `## External identifiers`, #9 → `## Currency`, #10 → `## Canonical tables`). Create the heading if absent; never modify existing text. |
| Canonical NL→SQL example the team should share | `queries.yml` | Append a new entry under `pairs:` |
| Ad-hoc NL→SQL pair (user-local, not for the repo) — only if `MEMORY_AVAILABLE = true` | `wren memory store` | `wren memory store --nl "..." --sql "..." --tags "source:enrich"` |

Catalog-driven routing means every column-local proposal goes to the column's `properties.description` with a `[tag]` line; every cross-model rule goes to `instructions.md` under a fixed heading. This keeps re-enrichment deterministic (greppable) and avoids inventing new sink locations.
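
The category-to-sink mapping described in this step is fixed, so it can be written down as a lookup. The sketch below only restates that mapping; `sink_for_category` is a hypothetical helper name, and the PII tag is left out because it has no catalog number here.

```bash
# Map a gap-catalog category number to its sink.
# $1 = category number (1-10).
sink_for_category() {
  case $1 in
    1|2|3|5|7) echo "MDL properties.description [tag]" ;;
    4)  echo "instructions.md ## Default filters" ;;
    6)  echo "instructions.md ## Naming conventions" ;;
    8)  echo "instructions.md ## External identifiers" ;;
    9)  echo "instructions.md ## Currency" ;;
    10) echo "instructions.md ## Canonical tables" ;;
    *)  echo "unknown category"; return 1 ;;
  esac
}
```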

After every MDL edit

```bash
wren context validate
```
If it fails:
  1. Revert the single change you just made.
  2. Show the user the validation error (grill mode) or log it in the audit (auto-pilot).
  3. In grill mode, re-grill on that specific gap with the error as new context.
  4. In auto-pilot, mark the finding as "revert: validation failed" and move on.
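
The edit, validate, revert cycle can be sketched as one function that never leaves a failed change in place. This is an illustration, not the skill's actual mechanism: `apply_with_validation` and `VALIDATE_CMD` are hypothetical names, the target file is assumed to exist already, and the proposed contents are assumed to be staged in a second file.

```bash
# Validator command; the default is the command named in the text.
# Override it (for example VALIDATE_CMD=true) for a dry run without the CLI.
VALIDATE_CMD=${VALIDATE_CMD:-"wren context validate"}

# $1 = existing MDL file to change, $2 = file holding the proposed new contents.
# Prints "applied" or "reverted"; on failure the original file is restored.
apply_with_validation() {
  target=$1
  proposed=$2
  backup=$(mktemp)
  cp "$target" "$backup"
  cp "$proposed" "$target"
  if $VALIDATE_CMD >/dev/null 2>&1; then
    rm -f "$backup"
    echo applied
  else
    cp "$backup" "$target"   # revert only this single change
    rm -f "$backup"
    echo reverted
  fi
}
```

Keeping one backup per change is what makes "revert the single change" possible without touching earlier, already validated edits.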

Format reminders

  • `queries.yml` schema follows `wren memory dump` output: `version: 1`, `pairs:` list of `{nl, sql, source}` (use `source: enrich`).
  • MDL YAML uses snake_case keys (e.g. `primary_key`, `is_calculated`, `not_null`). `wren context build` converts to camelCase for `target/mdl.json`.
  • `instructions.md` is free-form markdown. Group rules by topic with headings.

Step 8 — Session finalize

After Step 6 ends (user says stop in grill mode, or every finding is processed in auto-pilot):

```bash
wren context build
```

This recompiles `target/mdl.json` from the YAML edits.
If `MEMORY_AVAILABLE = true`:

```bash
wren memory index
```

This re-embeds the new schema items, the updated `instructions.md`, and the new `queries.yml` entries.
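
Put together, the finalize step is a build followed by a conditional re-index. A sketch, assuming the flag from Step 2 is held in a shell variable named `MEMORY_AVAILABLE`; `finalize_session` is a hypothetical wrapper name.

```bash
# Rebuild the compiled manifest, then refresh the memory index when memory is available.
finalize_session() {
  wren context build || return 1   # stop if the build itself fails
  if [ "$MEMORY_AVAILABLE" = "true" ]; then
    wren memory index
  fi
}
```

Stopping on a failed build is a choice made for this sketch: it avoids re-indexing against a manifest that did not compile.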

Step 9 — Summary

Both modes — common section

Print a tight session report:
```text
Wren Enrich Context — session summary  (mode: <grill|autopilot>)

Added:
  MDL              : N model descriptions, N column descriptions, N relationships, N views
                     by tag: [enum]=N [unit]=N [null]=N [magic]=N [time]=N [pii]=N
  cubes            : N new (names: <list>)                                    via cubes/<name>/metadata.yml
  instructions.md  : N new rules across sections
                     by section: Default filters=N | Naming conventions=N | External identifiers=N | Currency=N | Canonical tables=N
  queries.yml      : N new NL→SQL pairs
  memory store     : N ad-hoc pairs                                          (only if MEMORY_AVAILABLE)
  Probe            : N columns sampled, M failed                             (grill mode only)

Please fix manually (we don't edit existing fields):
  - models/orders/metadata.yml: existing description seems to contradict raw/glossary.pdf p.3
  - relationships.yml: existing orders↔customers is MANY_TO_ONE but raw/data_dict.md p.7 says MANY_TO_MANY
  - …
```

Grill mode extras

Append:
```text
Skipped this session: N gaps (re-run /wren-enrich-context to revisit)
```

Auto-pilot mode extras

Append a detailed audit so the user can sanity-check inferences:
text
Inferred items (please review):
  high   | MDL model:orders.description           | from raw/glossary.pdf p.2 — "Order = ..."
  high   | instructions.md rule                    | from raw/handbook.md §4 — "default tier ..."
  med    | MDL column:users.signup_source.desc    | agent inference from raw/onboarding.md
  low    | queries.yml: "weekly active customers"  | agent inference, no direct raw evidence

Validation:
  K successful applies, M reverted after wren context validate failed:
    - relationships.yml: <error> → reverted

Escalated to grill (raw vs MDL conflicts / high-impact additions):
  - <count> items — see grill transcript above

Encourage the user to skim the audit and then do one of three things: accept it as-is, manually tweak the low-confidence rows, or re-run in grill mode to revisit the items interactively.
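
The Validation block of the audit implies an apply, validate, revert loop. Below is a minimal sketch of that loop, assuming each addition can be applied and undone independently. The `Edit` class, the callback signatures, and the choice to build only when at least one edit survived are all hypothetical. In a real run, `validate` and `build` could wrap `wren context validate` and `wren context build`.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Edit:
    """One append-only addition (hypothetical structure)."""
    target: str                  # file the addition goes into
    apply: Callable[[], None]    # writes the addition
    revert: Callable[[], None]   # removes exactly that addition


def apply_with_validation(
    edits: List[Edit],
    validate: Callable[[], Optional[str]],  # None on success, else an error message
    build: Callable[[], None],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Apply edits one by one, validating after each and building once at the end."""
    applied: List[str] = []
    reverted: List[Tuple[str, str]] = []
    for edit in edits:
        edit.apply()
        error = validate()           # validate after every edit
        if error is None:
            applied.append(edit.target)
        else:
            edit.revert()            # undo only the edit that failed
            reverted.append((edit.target, error))
    if applied:
        build()                      # build once, after all edits
    return applied, reverted
```

The two returned lists map directly onto the audit: `len(applied)` is K, `len(reverted)` is M, and each reverted pair supplies the "<error> → reverted" line.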

Things to avoid

  • Do not write a gaps.yml, state.yml, or any other tracking artifact. The session lives entirely in conversation.
  • Do not modify any existing MDL field, instructions rule, or queries.yml entry; only append or add. Surface mismatches on the manual-fix list.
  • Do not install new Python packages (pypdf, docling, …) to read raw. Use what your agent already has; ask the user to convert files you can't open.
  • Do not auto-resolve a conflict between raw and current MDL. Always grill the user, in both modes.
  • Do not present Lane 3 inferences as if they were quoted from raw. Open with "I'm guessing — " (grill) or tag the item agent inference (auto-pilot).
  • Do not call wren memory store when MEMORY_AVAILABLE = false; write to queries.yml instead so the pair survives a future wren memory index.
  • Do not commit anything to git. The user owns the commit decision.
  • Do not nag about skipped questions. Skip is skip for this session (grill mode only; auto-pilot has no skip concept).
  • Do not run wren context build after every single MDL edit; once at the end is enough. Do run wren context validate after every edit.
  • Do not assume raw/ was created by wren context init; it isn't. This skill creates it.
  • Do not switch modes mid-session. The user re-runs to change mode.
  • Do not append a [tag] line if the same category tag already exists for that column (Universal Rule 1). Surface contradictions on the manual-fix list instead.
  • Do not invent new instructions.md section headings. Stick to the five catalog-defined headings (## Default filters, ## Naming conventions, ## External identifiers, ## Currency, ## Canonical tables). Anything that doesn't fit goes on the manual-fix list.
  • Do not probe the live DB in auto-pilot mode. Step 4.5 is grill-only by default.
  • Do not propose a cube whose measure expression already exists in another cube on the same base_object; write a queries.yml example pointing at the existing cube instead. See the duplication guard in references/cube_proposals.md.
  • Do not modify an existing cube YAML even when raw contradicts it (Universal Rule 1). Surface it on the manual-fix list.
  • Do not write a new cube alongside an old MDL metrics: entry that already covers the same logic. Surface it as "consider migrating to cube" on the manual-fix list.
  • Do not skip wren cube query --cube <name> --sql-only after creating a cube. The structural wren context validate check doesn't catch unresolvable measure / dimension expressions.
  • In auto-pilot, do not auto-apply Lane 2 conflicts or new metric / view / relationship inferences; always drop into grill for those.
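
Two of the rules above (no duplicate category tag on a column, no invented instructions.md headings) are easy to express as pre-write guards. The sketch below is one possible shape for them, not the skill's implementation. It assumes a tag line is a description line that starts with a bracketed tag such as [enum]; the actual write format is defined in references/gap_catalog.md. The function names and return values are hypothetical.

```python
import re

# The five catalog-defined instructions.md headings listed above.
ALLOWED_HEADINGS = frozenset({
    "## Default filters",
    "## Naming conventions",
    "## External identifiers",
    "## Currency",
    "## Canonical tables",
})

# Assumed tag-line shape: optional indent, then "[tag]" at the start of a line.
TAG_LINE = re.compile(r"^\s*\[(\w+)\]")


def existing_tags(description):
    """Collect the category tags already present in a column description."""
    tags = set()
    for line in description.splitlines():
        match = TAG_LINE.match(line)
        if match:
            tags.add(match.group(1))
    return tags


def route_tag_line(description, tag, text):
    """Append a new tag line, or skip when that category is already tagged.

    On "skip" the description is returned unchanged; if the new text
    contradicts the existing line, the caller adds it to the manual-fix list.
    """
    if tag in existing_tags(description):
        return "skip", description
    sep = "\n" if description and not description.endswith("\n") else ""
    return "append", f"{description}{sep}[{tag}] {text}"


def route_rule(heading):
    """Rules under an unknown heading go to the manual-fix list."""
    return "append" if heading in ALLOWED_HEADINGS else "manual-fix"
```

Both guards only ever add text or decline to write, which keeps them consistent with the append-only rule: nothing that already exists is rewritten.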

See also

  • references/gap_catalog.md: the ten business-semantic gap categories, with triggers, default sinks, and write formats. Read this before Step 4.5 and Step 5.
  • references/cube_proposals.md: the decision tree for when to propose a cube vs view vs calculated column, plus the cube YAML template, naming policy, duplication guard, and validation flow. Read this before any Lane 3 aggregation-shaped proposal.