evanflow-coder-overseer

EvanFlow: Coder-Overseer Orchestration

Vocabulary

See the `evanflow` meta-skill for shared terms. New roles introduced here:

- Orchestrator: the main Claude session running this skill. Authors the contract, decomposes work, spawns subagents, reconciles findings, reports to the user.
- Coder: subagent dispatched to implement one decomposed unit. Uses `evanflow-tdd`. Writes tests first. Outputs code + tests + brief summary.
- Overseer: subagent dispatched to review ONE coder's output. Looks for bugs, gaps, errors, and contract violations. Reports findings; does NOT fix them.
- Cohesion contract: the shared interfaces, types, invariants, and naming that must hold across ALL coder outputs. A short doc, authored by the orchestrator before any agent spawns.
- Integration overseer: a final subagent that reviews the combined output across all coders, catching inter-task cohesion drift that single-coder overseers can't see (boundary type mismatches, naming inconsistency, missed invariants).

When to Use

- Plan has 3+ truly independent tasks that can run in parallel
- Tasks share a contract (interfaces, types, naming) where divergence is a bug
- Work benefits from independent review (complex routers, multi-file refactors, new modules with cross-cutting concerns)

SKIP when:

- Plan has tightly sequential dependencies → use `evanflow-executing-plans` instead
- Tasks are trivial (orchestration overhead > benefit)
- A single agent can hold the whole thing in context comfortably

The Flow

1. Author the Cohesion Contract (with Test Specifications)

Before spawning anyone, the orchestrator writes a contract at `.claude/orchestration/<topic>-contract.md` (or any path the user prefers). Contents:

- Shared types and interfaces, with full file paths and signatures
- Naming conventions for new symbols (e.g., "router files are `<resource>.ts`, services are `<resource>-service.ts`")
- Invariants that must hold (e.g., "all routes use the authenticated middleware", "all services return `Result<T, Error>`", "all DB writes go through the canonical write helper documented in CLAUDE.md")
- Cross-references to `CONTEXT.md` and relevant ADRs
- Integration touchpoints: where coder outputs must connect (e.g., "router A imports type X from package B; service C calls function D from service E")

Behavior specifications per coder — for EACH coder, list 3–7 testable behaviors with: a test name, a one-line description of the assertion, and the public interface used to verify it. Example:

### Coder 2: rate-limit service (example)

- test: returns full cap when no usage recorded
  assert: `getRemainingThisWeek(userId)` returns `{ remaining: 25, resetsAt: null }` for a fresh user
  surface: services/rate-limit.ts (public)

- test: counts ACTIVE and PENDING rows but excludes CANCELLED/FAILED
  assert: after seeding rows of varied statuses, `getRemainingThisWeek` returns the correct count
  surface: services/rate-limit.ts (public)

Integration tests at touchpoints — for every place where one coder's output is consumed by another, name an integration test that proves the connection works end-to-end. Both coders must satisfy it. Integration tests become the executable contract — they prevent interface drift the way prose specifications can't.

The contract is the single source of truth for everyone downstream. If it's wrong or ambiguous, fix it BEFORE spawning agents — patching the contract mid-orchestration causes drift.
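Before spawning, the contract's structural requirements can be checked mechanically. The following is a minimal sketch, assuming the contract has been parsed into a plain object; the `BehaviorSpec`, `Touchpoint`, and `Contract` shapes and the `validateContract` function are hypothetical names used for illustration and are not part of EvanFlow.

```typescript
// Hypothetical shapes for illustration; none of these names come from EvanFlow itself.
interface BehaviorSpec {
  test: string;    // test name
  assert: string;  // one-line description of the assertion
  surface: string; // public interface used to verify it
}

interface Touchpoint {
  description: string;      // e.g. "router A imports type X from package B"
  integrationTest?: string; // name of the test that proves the connection
}

interface Contract {
  behaviorsByCoder: Record<string, BehaviorSpec[]>;
  touchpoints: Touchpoint[];
}

// Returns human-readable problems; an empty array means the structure is complete.
function validateContract(contract: Contract): string[] {
  const problems: string[] = [];
  for (const [coder, specs] of Object.entries(contract.behaviorsByCoder)) {
    // The contract asks for 3-7 testable behaviors per coder.
    if (specs.length < 3 || specs.length > 7) {
      problems.push(`${coder}: expected 3-7 behaviors, found ${specs.length}`);
    }
    specs.forEach((spec, i) => {
      if (!spec.test.trim() || !spec.assert.trim() || !spec.surface.trim()) {
        problems.push(`${coder}: behavior ${i + 1} is missing test, assert, or surface`);
      }
    });
  }
  // Every touchpoint needs a named integration test.
  for (const tp of contract.touchpoints) {
    if (!tp.integrationTest || !tp.integrationTest.trim()) {
      problems.push(`touchpoint "${tp.description}" has no named integration test`);
    }
  }
  return problems;
}
```

A check like this only catches missing pieces. Whether the behaviors are the right ones, or the wording is ambiguous, still needs the orchestrator's judgment.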

2. Decompose into Coder Tasks

Each coder task is a self-contained brief. It includes:

- One unit of work: one file, or one logical module
- Files to create/modify, with exact paths
- Required behaviors to test (behavior, not mechanics; see `evanflow-tdd`)
- Reference to the contract with explicit "must conform to" pointers
- Explicit out-of-scope list so the coder doesn't expand

Max 5 coders in parallel. More than that is unmanageable to review.
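The brief's fields and the parallelism cap can be sketched as a small data shape plus a pre-dispatch check. This is an illustration under assumptions: `CoderBrief` and `checkBriefs` are hypothetical names, and the overlapping-file check is an added assumption (the text asks for independent tasks but does not state this rule explicitly).

```typescript
// Hypothetical brief shape; field names are illustrative, not an EvanFlow schema.
interface CoderBrief {
  unit: string;         // one file, or one logical module
  files: string[];      // exact paths to create or modify
  behaviors: string[];  // required behaviors to test
  contractPath: string; // the "must conform to" reference
  outOfScope: string[]; // explicit exclusions so the coder doesn't expand
}

const MAX_PARALLEL_CODERS = 5;

// Returns problems that should block dispatch; empty means the batch looks dispatchable.
function checkBriefs(briefs: CoderBrief[]): string[] {
  const problems: string[] = [];
  if (briefs.length > MAX_PARALLEL_CODERS) {
    problems.push(`${briefs.length} coders exceeds the cap of ${MAX_PARALLEL_CODERS}`);
  }
  const owner = new Map<string, string>();
  for (const brief of briefs) {
    if (brief.files.length === 0) problems.push(`${brief.unit}: no files listed`);
    if (brief.outOfScope.length === 0) problems.push(`${brief.unit}: no out-of-scope list`);
    for (const file of brief.files) {
      const previous = owner.get(file);
      // Assumption: two coders editing the same file means the tasks are not independent.
      if (previous !== undefined && previous !== brief.unit) {
        problems.push(`${file} is claimed by both ${previous} and ${brief.unit}`);
      } else {
        owner.set(file, brief.unit);
      }
    }
  }
  return problems;
}
```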

3. Spawn Coders — RED Checkpoint First

Coder dispatch happens in two phases to enforce TDD at the orchestration level.

Phase A: RED checkpoint. Single message, multiple `Agent` calls. Prefer `subagent_type: evanflow-coder` if available (tool-restricted to prevent git ops and other dangerous actions); else `general-purpose`. Each coder gets:

- The self-contained brief from step 2
- Path to the contract
- Path to the plan
- Instructions:

"Phase A: write ONLY the first failing test for your first behavior (per the contract's test list for your section). Run it. Confirm it fails for the right reason — not a setup error, not an import error, not a missing dependency. Report back: test file path, test name, the exact failure message, and confirmation that the failure matches expected behavior. Do NOT write any implementation yet. Do NOT touch any production source file other than minimal scaffolding (e.g., empty function stubs that exist only so the import resolves)."

After all coders return Phase A reports, the orchestrator verifies every test is RED:

- Run the project's test command for affected workspaces
- Confirm each coder's named test appears in the failure list
- Confirm the failure reason matches what the coder reported
- Catch: tests passing accidentally (assertion is too weak), tests failing for the wrong reason (setup bug), tests not actually running (wrong filename pattern)

If any test isn't cleanly RED, send that coder back with the specific issue. Do NOT proceed to Phase B until all RED reports check out.
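The RED verification amounts to classifying each coder's reported test against the real test run. Below is a minimal sketch, assuming the test runner's output has already been parsed into a list of results; `PhaseAReport`, `TestResult`, `verifyRed`, and `readyForPhaseB` are hypothetical names, and matching the failure reason by substring is a simplification of the orchestrator's judgment.

```typescript
// Hypothetical shapes: what a coder reported, and what the test run actually showed.
interface PhaseAReport {
  coder: string;
  testName: string;
  failureMessage: string; // the exact failure message the coder reported
}

interface TestResult {
  name: string;
  status: "passed" | "failed";
  message?: string;
}

type RedVerdict = "red" | "passed-accidentally" | "not-run" | "wrong-reason";

function verifyRed(report: PhaseAReport, results: TestResult[]): RedVerdict {
  const result = results.find((r) => r.name === report.testName);
  if (!result) return "not-run";                                 // e.g. wrong filename pattern
  if (result.status === "passed") return "passed-accidentally";  // assertion too weak
  const actual = result.message ?? "";
  // Simplification: treat the failure as matching if the reported message appears in the real one.
  const matches = report.failureMessage !== "" && actual.includes(report.failureMessage);
  return matches ? "red" : "wrong-reason";
}

// Phase B is authorized only when every report is cleanly RED.
function readyForPhaseB(reports: PhaseAReport[], results: TestResult[]): boolean {
  return reports.every((report) => verifyRed(report, results) === "red");
}
```

Any verdict other than `"red"` maps to sending that coder back with the specific issue.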

Phase B: vertical-slice GREEN. Re-message each coder with these instructions:

"RED checkpoint confirmed. Phase B: vertical-slice TDD per `.claude/skills/evanflow-tdd/SKILL.md`. One test → minimal impl → confirm GREEN → next test → repeat. Cover ALL behaviors named in the contract for your section. Watch each test fail before writing the implementation that makes it pass. Conform to the contract exactly: if a test name in the contract is unclear or wrong, stop and report back instead of guessing. Do NOT modify files outside your task scope. Do NOT commit, stage, or run any git op. When done, report: files changed, every test name + status, integration tests touched (if any), anything deferred."

Coders run Phase B in parallel.

4. Spawn Overseers — One Per Coder

After each coder reports done, spawn its overseer. Prefer `subagent_type: evanflow-overseer` (the bundled read-only subagent; its tool restrictions enforce "report findings, never fix"). If a specialized code-reviewer subagent is available in your environment, that also works. Otherwise, use `general-purpose`.

Each overseer gets:

- The coder's diff (orchestrator runs `git diff <files>` and passes it inline)
- The coder's original brief
- The contract
- The coder's Phase A and Phase B reports
- Instructions:

"Review the diff for:

(a) bugs: wrong logic, off-by-ones, race conditions, missing error handling
(b) gaps: behaviors in the contract that aren't tested or aren't implemented
(c) errors: type unsafety, missing validation at boundaries, wrong domain language
(d) cohesion violations: anything that diverges from the contract
(e) TDD compliance: was each test written before the code that makes it pass? (Check Phase A report for RED, then Phase B order.) Are tests behavior-through-public-interface, or do they reach into internals? Would the tests survive a refactor that doesn't change behavior?
(f) ASSERTION CORRECTNESS: research shows 62% of LLM-generated test assertions are wrong. For each assertion: would a one-character bug in the implementation still let it pass? If yes, the assertion is too weak. Is the assertion on the right field? Is the expected value computed correctly?
(g) Five Failure Modes, with an explicit pass against each:
  - Hallucinated actions: invented paths, env vars, IDs, function names, library APIs not in the contract or codebase?
  - Scope creep: files or behaviors touched outside the brief?
  - Cascading errors: silent fallbacks, swallowed exceptions, suppressed failures that hide root cause?
  - Context loss: contradicts the contract, CONTEXT.md, ADRs, or established conventions?
  - Tool misuse: wrong tool for the job, or right tool with wrong params?

Report findings as a numbered list, each tagged severity (blocker / important / nit) and location (file:line). Do NOT propose fixes. Do NOT modify files. Do NOT commit, stage, or run git ops. If using the `evanflow-overseer` subagent type, your tool restrictions (read-only) enforce this: you literally cannot fix, only report."

Overseers run in parallel: single message, multiple `Agent` calls.

5. Spawn the Integration Overseer

After all coder/overseer pairs return, spawn ONE final overseer (use `evanflow-overseer` again, or any specialized code-reviewer subagent your environment provides). Inputs:

- The combined diff across all coders (orchestrator runs `git diff` against the working tree)
- The contract
- All individual overseer reports, inline
- Instructions:

"You're checking cohesion across multiple coders' outputs. Look for:

(a) type mismatches at boundaries: one coder produces type X, another expects type X'
(b) naming drift: resource called `Foo` in one file, `Foos` in another; `foo_id` vs `fooId` inconsistencies
(c) invariants applied inconsistently: e.g., one router uses `authenticatedProcedure`, another forgot
(d) integration points that don't connect: coder A exports something coder B doesn't import, or shapes don't match
(e) integration tests at touchpoints: for every touchpoint named in the contract, verify a passing integration test exists. Run it. Confirm it actually exercises the connection (not a stub or a mock). The integration test IS the executable contract; if it doesn't exist or doesn't verify, the cohesion guarantee is unproven.

Report findings tagged by severity and affected files. Do NOT fix."
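One narrow slice of the naming-drift check, `foo_id` vs `fooId`, can be approximated mechanically. The sketch below is illustrative only: `normalize` and `findNamingDrift` are hypothetical helpers, the identifier list is assumed to have been extracted from the combined diff by some other means, and the approach does not catch singular/plural drift such as `Foo` vs `Foos`.

```typescript
// Collapse an identifier to a case- and separator-insensitive key,
// so "foo_id" and "fooId" both become "fooid".
function normalize(identifier: string): string {
  return identifier.replace(/[^A-Za-z0-9]/g, "").toLowerCase();
}

// Returns groups of identifiers that differ only in casing or separators.
function findNamingDrift(identifiers: string[]): string[][] {
  const groups = new Map<string, Set<string>>();
  for (const id of identifiers) {
    const key = normalize(id);
    const group = groups.get(key) ?? new Set<string>();
    group.add(id);
    groups.set(key, group);
  }
  return Array.from(groups.values())
    .filter((group) => group.size > 1)          // only keys with more than one spelling
    .map((group) => Array.from(group).sort());  // stable, readable output
}
```

For example, `findNamingDrift(["foo_id", "fooId", "userName"])` returns one group containing `fooId` and `foo_id`. Hits are candidates for the overseer to judge, not findings by themselves, since some codebases intentionally mix styles (for instance, database columns vs. object fields).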

6. Reconcile

Orchestrator collects every overseer finding:

- Group by severity: blocker → important → nit
- Decide per finding:
  - Send back to specific coder for revision (preserves coder ownership; recommended for important/blocker that touch one coder's work)
  - Fix in main session (when the issue spans multiple coders or is a contract update)
  - Drop / accept (for nits the user wouldn't care about)
  - Escalate to user (when the right answer isn't clear)

For revisions: spawn that coder again with: original brief + the finding + their existing diff + "fix only this finding; don't expand scope." Re-run that coder's overseer afterward.

Hard cap: 3 reconciliation rounds. If issues remain after round 3, the original decomposition or the contract was wrong: stop, report state, and ask the user.
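The severity ordering and the round cap can be sketched as a small loop. This is a simplified, synchronous illustration: `Finding`, `bySeverity`, and `reconcile` are hypothetical names, and the `runRound` callback stands in for the real work of a round (revising, re-running overseers) and returns whatever findings are still open.

```typescript
type Severity = "blocker" | "important" | "nit";

interface Finding {
  severity: Severity;
  location: string; // file:line
  summary: string;
}

const SEVERITY_ORDER: Record<Severity, number> = { blocker: 0, important: 1, nit: 2 };
const MAX_ROUNDS = 3;

// Returns a new array ordered blocker → important → nit.
function bySeverity(findings: Finding[]): Finding[] {
  return findings.slice().sort((a, b) => SEVERITY_ORDER[a.severity] - SEVERITY_ORDER[b.severity]);
}

// runRound is a hypothetical callback: it performs one reconciliation round
// and returns the findings that remain open afterward.
function reconcile(
  initial: Finding[],
  runRound: (open: Finding[], round: number) => Finding[],
): { status: "clean" | "escalate"; rounds: number; open: Finding[] } {
  let open = bySeverity(initial);
  let rounds = 0;
  while (open.length > 0 && rounds < MAX_ROUNDS) {
    rounds += 1;
    open = bySeverity(runRound(open, rounds));
  }
  // Anything still open after the cap goes to the user instead of a fourth round.
  return { status: open.length === 0 ? "clean" : "escalate", rounds, open };
}
```

Returning `"escalate"` rather than looping again mirrors the rule above: failing to converge in three rounds is treated as a signal about the decomposition or the contract, not as a reason to keep iterating.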

7. Stop and Report

When all overseers report clean (or remaining findings explicitly accepted):

- Run project-wide quality checks (`tsc`, `lint`, `test:run` for affected workspaces)
- Report what was done across all tasks. STOP. No staging, no commit, no integration step. The user decides every step from here.

A good final report:

- One line per coder: "Coder N — `<files>` — <one-line summary>"
- Test count delta
- Quality check results
- Any deferred findings (with reason for accepting)

Hard Rules

- Contract first, with test specifications. No coder spawned before the contract is written, including per-coder behavior lists with test names AND named integration tests at every touchpoint.
- RED checkpoint before any implementation. Phase A (failing test) precedes Phase B (impl). Orchestrator verifies all tests are cleanly RED before authorizing GREEN. Catches setup bugs and accidentally-passing tests before they cost a full coder cycle.
- Coders use `evanflow-tdd` in Phase B. Vertical slices: one test → impl → next. Watch each test fail before writing the impl that passes it.
- Integration tests at touchpoints are mandatory. Where coder A's output is consumed by coder B, a passing integration test must exist that exercises the connection. The test is the executable contract.
- Overseers report, never fix. Separation of roles is the point. If overseers fix, you lose the QA signal.
- Max 5 coders in parallel. Beyond that, the integration overseer can't hold the whole picture.
- Max 3 reconciliation rounds. If you can't converge in 3, the decomposition was wrong.
- No coder talks to another coder. All coordination flows through the contract + orchestrator. This prevents emergent miscommunication.
- Never auto-commit, never auto-stage. Same hard rule as everywhere else in EvanFlow.

Hand-offs

- Plan has 3+ parallelizable tasks → THIS skill (replaces `evanflow-executing-plans` for that subset)
- Tasks turn out to have hidden dependencies mid-execution → abort, switch to `evanflow-executing-plans` (sequential)
- Findings reveal an architectural issue → `evanflow-improve-architecture`
- All clean → STOP. Report. Await user direction.

Why "Coders" and "Overseers" (vs. one combined role)

The split exists because you can't trust a coder to be its own reviewer. A coder optimizes for "make my task pass." An overseer optimizes for "find what's wrong." Different incentives produce different attention. Combining them means review-during-implementation, which catches less.

The integration overseer exists because per-task overseers can't see the whole. Each one has a narrow window — the contract + one diff. Boundary mismatches between two diffs are invisible from inside either. The integration overseer's job is the cross-section view.

Why TDD-on-Orchestration (RED Checkpoint + Integration Tests)

The cohesion contract is prose. Integration tests are executable. Prose contracts drift the moment two people read them differently. A failing integration test cannot drift — either it passes, or someone's wrong. Forcing integration tests at every touchpoint converts cohesion from a hope into a guarantee.

The RED checkpoint catches the cheapest class of failures cheaply. A test that imports the wrong file, a test that asserts on the wrong field, a test that doesn't actually run — all of these are invisible while you're writing implementation. Catching them before any coder writes real code saves the entire coder + overseer cycle.

Vertical slices per coder prevent imagined-behavior tests. If a coder writes 7 tests up front and then 7 implementations, the tests describe what the coder thought the system would do, not what it does. One test → one impl → next test forces the tests to track what the code actually does.
