# Test-Driven Development — Multi-Agent Orchestration

Enforce disciplined RED-GREEN-REFACTOR cycles using separate subagents for test writing and implementation. The core innovation: the Test Writer never sees implementation code, and the Implementer never sees the specification. This prevents the LLM from leaking implementation intent into test design.

## When to Use

- User requests TDD, test-first, or red-green-refactor workflow
- User says `/tdd` with a feature description or bug report
- User wants to add a feature with test coverage enforced from the start
- User wants to fix a bug by first writing a reproducing test

## Invocation Modes

| Invocation | Behavior |
| --- | --- |
| `/tdd <feature>` | Interactive mode — pause for approval at slices and each RED checkpoint |
| `/tdd --auto <feature>` | Autonomous mode — run all slices without pausing; stop ONLY on unrecoverable errors |
| `/tdd --resume` | Resume from `.tdd-state.json` in project root |
| `/tdd --dry-run <feature>` | Validation mode — runs Phase 0 + Phase 1 fully, renders all prompts, but skips `Task()` calls. No code is written. |
In `--auto` mode, skip all `[HUMAN CHECKPOINT]` steps. Print status lines instead:

```
[auto] RED  slice 1/4: "validates email format" — test failing as expected
[auto] GREEN slice 1/4: passing (attempt 1)
[auto] REFACTOR slice 1/4: 1 suggestion applied, 0 skipped
```
Stop and ask the user ONLY when:

- Implementation fails after 5 attempts
- Regressions cannot be auto-fixed after 3 attempts
- A script error makes it impossible to continue (missing binary, permission denied, etc.)
In `--dry-run` mode, validate the entire orchestration pipeline without executing any subagents or writing any code:

1. Phase 0 runs fully: detect framework, verify baseline, extract API, discover docs, create state file
2. Phase 1 runs fully: decompose into slices (still requires user approval)
3. For each slice: render all three agent prompts (Test Writer, Implementer, Refactorer) with actual variables. Print rendered prompts to the user with character counts.
4. No `Task()` calls are made. No test files are written. No implementation code is generated.
5. Validate: check that all template variables resolve (no `{UNRESOLVED}` placeholders), all scripts execute without error, and the state file is well-formed.
6. Report summary:

```
DRY RUN COMPLETE: {feature name}

Phase 0:
  Framework: {framework}
  Language: {language}
  Baseline: {pass|greenfield}
  API surface: {line count} lines
  Doc context: {line count} lines (or "none")

Phase 1:
  Slices: {N} ({layer breakdown})

Prompts rendered: {N * 3} (all variables resolved)
  Test Writer:   {char count} chars
  Implementer:   {char count} chars
  Refactorer:    {char count} chars

State file: .tdd-state.json written
No code was modified.
```
This mode is useful for:

- Validating that scripts work in the project's environment
- Reviewing prompt content before committing to a full TDD run
- Testing skill changes without side effects

## Architecture Overview

```
ORCHESTRATOR (you, reading this file)
├─ Phase 0: Setup — detect framework, extract API, create state file
├─ Phase 1: Decompose into vertical slices → user approves
├─ FOR EACH SLICE:
│   ├─ Phase 2 (RED):    Task(Test Writer)  ← spec + API only
│   ├─ Phase 3 (GREEN):  Task(Implementer)  ← failing test + error only
│   └─ Phase 4 (REFACTOR): Task(Refactorer) ← all code + green results
└─ Summary
```

## Context Boundaries (the key constraint)

| Agent | Sees | Does NOT See |
| --- | --- | --- |
| Test Writer | Slice spec, public API signatures, framework conventions, layer constraints | Implementation code, other slices, implementation plans |
| Implementer | Failing test code, test failure output, file tree, existing source, layer constraints | Original spec, slice descriptions, future plans |
| Refactorer | All implementation + all tests + green results, layers touched | Original spec, decomposition rationale |
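
These boundaries can be enforced mechanically rather than by convention. A minimal sketch, assuming hypothetical context-key names (the real prompt variables live in references/agent_prompts.md): each agent's prompt is assembled only from an allow-listed set of keys, so forbidden context cannot leak in even by accident.

```python
# Hypothetical context keys for illustration only; the canonical prompt
# variables are defined in references/agent_prompts.md.
ALLOWED_CONTEXT = {
    "test_writer": {"slice_spec", "api_surface", "framework", "layer_constraints"},
    "implementer": {"failing_test", "failure_output", "file_tree",
                    "existing_source", "layer_constraints"},
    "refactorer": {"implementation", "tests", "green_results", "layers_touched"},
}

def build_context(agent: str, full_context: dict) -> dict:
    """Keep only the keys this agent is allowed to see (allow-list filter)."""
    allowed = ALLOWED_CONTEXT[agent]
    return {k: v for k, v in full_context.items() if k in allowed}
```

An allow-list (rather than a deny-list) fails safe: a newly added context key is hidden from every agent until it is explicitly granted.
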

## Workflow

### Phase 0: Setup (once per session)

Step 1: Detect framework and test runner.

```
Check for: package.json (jest/vitest), pyproject.toml/pytest.ini (pytest),
go.mod (go test), Cargo.toml (cargo test), Gemfile (rspec), composer.json (phpunit)
```

If ambiguous, ask: "What command runs your tests? (e.g., `npm test`, `pytest`)"
Step 2: Detect language from source files (for agent prompts):
TypeScript (.ts/.tsx), JavaScript (.js/.jsx), Python (.py), Go (.go), Rust (.rs), Ruby (.rb), PHP (.php)
Step 3: Verify green baseline.

```bash
bash ~/.claude/skills/tdd/scripts/run_tests.sh {FRAMEWORK} "{TEST_COMMAND}"
```

Parse the JSON output.

- If `status` is `"pass"`: proceed.
- If `status` is `"fail"`: stop — "Existing tests are failing. TDD starts from a green baseline."
- If `status` is `"error"` AND `total` is 0: greenfield project — no tests exist yet. This is fine. Proceed.
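
The three-way decision above can be sketched as a small helper. It assumes the `status` and `total` fields shown in Step 3; `check_baseline` itself is an illustrative name, not part of the skill's scripts:

```python
import json

def check_baseline(result_json: str) -> str:
    """Map run_tests.sh JSON output to a Phase 0 baseline decision."""
    result = json.loads(result_json)
    if result["status"] == "pass":
        return "proceed"      # green baseline: start slicing
    if result["status"] == "error" and result.get("total", 0) == 0:
        return "greenfield"   # no tests exist yet; acceptable starting point
    return "stop"             # failing or broken suite: fix before starting TDD
```
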
Step 4: Extract the public API surface.

```bash
bash ~/.claude/skills/tdd/scripts/extract_api.sh {SOURCE_DIR}
```

Save the output — this is what the Test Writer will see. If empty (greenfield), that's expected.
Step 5: Discover project documentation.

```bash
bash ~/.claude/skills/tdd/scripts/discover_docs.sh {PROJECT_ROOT} --lang {LANGUAGE}
```

This searches for:

- Documentation files: README, ARCHITECTURE.md, docs/ folder, DESIGN.md, SPEC files, ADRs
- API specifications: OpenAPI/Swagger, GraphQL schemas, .proto files
- Source docstrings: JSDoc, Python docstrings, Go doc comments, Rust `///` comments

Save the output as `{DOC_CONTEXT}`. This feeds into:

- Phase 1 — so slice decomposition is informed by documented behavior and API contracts
- Phase 2 — so the Test Writer writes tests aligned with documented intent, not just code signatures

If empty (no docs found), that's fine — proceed without doc context.
Step 6: Create the state file `.tdd-state.json` in the project root:

```json
{
  "feature": "user's feature description",
  "framework": "jest|vitest|pytest|go|cargo|rspec|phpunit",
  "language": "typescript|javascript|python|go|rust|ruby|php",
  "test_command": "the full test command",
  "source_dir": "src/",
  "doc_context": "output from discover_docs.sh (or empty string)",
  "auto_mode": false,
  "dry_run": false,
  "slices": [],
  "current_slice": 0,
  "phase": "setup",
  "layer_map": {},
  "files_modified": [],
  "test_files_created": []
}
```

Each slice in the `slices` array includes a `layer` field: `"domain"`, `"domain-service"`, `"application"`, or `"infrastructure"`. See Phase 1 for how layers are assigned.
The `layer_map` maps directory prefixes to layers. It is built during Phase 1 from the project structure:

```json
{
  "layer_map": {
    "src/domain/": "domain",
    "src/services/": "domain-service",
    "src/application/": "application",
    "src/infrastructure/": "infrastructure",
    "src/adapters/": "infrastructure",
    "src/controllers/": "infrastructure"
  }
}
```

If the project has no clear directory-layer mapping (flat structure), set `layer_map` to `{}` and skip path-based validation.
Step 6a (auto-detect layer_map): If `layer_map` is empty, scan the source directory for common DDD/layered-architecture directory names and auto-populate:

```
Common directory → layer mappings (check if directories exist):
  */domain/       → "domain"
  */models/       → "domain"          (ORM models often serve as domain entities)
  */entities/     → "domain"
  */value_objects/ → "domain"
  */services/     → "application"     (unless clearly infrastructure)
  */application/  → "application"
  */use_cases/    → "application"
  */core/         → "application"
  */infrastructure/ → "infrastructure"
  */adapters/     → "infrastructure"
  */controllers/  → "infrastructure"
  */api/          → "infrastructure"
  */bot/          → "infrastructure"  (Telegram/Discord bot handlers)
  */handlers/     → "infrastructure"
  */repositories/ → "infrastructure"  (concrete repo implementations)
```

Only add entries for directories that actually exist in the source tree. If fewer than 2 directories match, leave `layer_map` empty (flat project). Present the auto-detected map to the user for confirmation:

```
Auto-detected layer map from directory structure:
  src/models/     → domain
  src/services/   → application
  src/core/       → application
  src/bot/        → infrastructure
  src/api/        → infrastructure

Does this mapping look correct? (adjust if needed)
```

Update state: `"phase": "setup"`. Write the state file immediately.
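
Because the orchestrator rewrites `.tdd-state.json` after every phase transition, a crash mid-write could corrupt the file that `--resume` depends on. A minimal sketch of a safe write, assuming a filesystem with atomic rename (`write_state` is an illustrative helper, not one of the skill's scripts):

```python
import json
import os

def write_state(state: dict, path: str = ".tdd-state.json") -> None:
    """Write the state file atomically: temp file first, then rename."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f, indent=2)
    os.replace(tmp, path)  # atomic on POSIX: readers never see a partial file
```
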

### Phase 1: Specification Decomposition

Take the user's feature request and decompose into ordered vertical slices. Each slice is one testable behavior.
Use doc context: When decomposing, cross-reference `{DOC_CONTEXT}` from Phase 0 Step 5. Documentation often describes intended behaviors, edge cases, and API contracts that should inform slice boundaries. If docs mention specific error cases, validation rules, or behavioral requirements, consider them as slice candidates.

#### Inside-Out Slice Ordering

After identifying all slices, sort them inside-out by architectural layer. This ensures each slice can build on real (not mocked) implementations from previous slices:

1. Domain model slices first — pure logic, no dependencies, no mocks needed
2. Domain service slices — cross-aggregate operations using real domain objects
3. Application service / use case slices — orchestration using in-memory fakes for ports
4. Infrastructure adapter slices last — repos, external APIs, framework adapters

Assign each slice a `layer` tag: `domain`, `domain-service`, `application`, or `infrastructure`. Use the heuristics from `references/layer_guide.md` to classify.

Why inside-out? Domain slices produce real objects that later slices use directly. This minimizes mocking and catches integration issues early. It also ensures business rules are implemented and tested before any infrastructure decisions are made.

For simple projects where all code lives in one layer, all slices get `layer: "application"` and the ordering doesn't change — the guidance degrades gracefully.

#### Edge Cases in Slice Ordering

Infrastructure-only features (e.g., "add email provider retry logic", "switch from Postgres to MySQL"):

- If a feature has NO domain or application behavior changes, all slices may be `infrastructure`. This is valid — skip the inner layers entirely.
- Present as: "This is a pure infrastructure change. All slices are infrastructure-layer."

Missing port interface (domain-service needs a port that doesn't exist yet):

- The first slice that needs the port should create the interface as part of its implementation. The Implementer is allowed to create files in inner layers (domain/domain-service can define their own ports).
- Example: a `domain-service` slice for `RegistrationService` creates a `domain/ports/UserRepository` interface as part of GREEN.

Cross-cutting slices (a slice touches multiple layers):

- Tag with the INNERMOST layer it touches. The Implementer may create files in that layer and any inner layers.
- Example: a use case that also introduces a new domain event is tagged `application` but creates a file in `domain/events/`.
Present to the user:

```
I've broken this into N vertical slices (ordered inside-out):

Domain:
1. [behavior] — [what the test verifies]

Domain Services:
2. [behavior] — [what the test verifies]

Application:
3. [behavior] — [what the test verifies]

Infrastructure:
4. [behavior] — [what the test verifies]

Each slice follows RED -> GREEN -> REFACTOR before moving to the next.
Does this decomposition look right?
```

If all slices fall in one layer, skip the layer headings and present as a flat list.
Wait for user approval (even in `--auto` mode — slice decomposition always needs sign-off).
Update state: Write the slices array (each with a `layer` field), set `"phase": "decomposed"`.

### Dry-Run Phase Override (Phase 2–4)

In `--dry-run` mode, replace Phases 2–4 entirely with the following for each slice:

1. Refresh the API surface (`extract_api.sh`)
2. Render the Test Writer prompt with all variables filled in. Print it under a `### Test Writer Prompt (slice N)` heading.
3. Render the Implementer prompt using placeholder test code: `"(dry-run: test code would be generated by Test Writer)"` for `{FAILING_TEST_CODE}` and `"(dry-run: no test output)"` for `{TEST_FAILURE_OUTPUT}`.
4. Render the Refactorer prompt using placeholder values: `"(dry-run: no green output)"` for `{GREEN_TEST_OUTPUT}`, `"(dry-run: code from Test Writer)"` for `{ALL_TEST_CODE}`, `"(dry-run: code from Implementer)"` for `{ALL_IMPLEMENTATION_CODE}`.
5. For each rendered prompt, verify no `{UNRESOLVED_VARIABLE}` patterns remain (regex: `\{[A-Z][A-Z_]+\}`). Report any unresolved variables as errors.
6. Print character counts for each prompt.
7. Move to the next slice (no `Task()` calls, no file writes, no test runs).

After all slices are processed, print the dry-run summary and exit. Do NOT clean up the state file — it's useful for a subsequent `--resume`.
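
The unresolved-variable check in step 5 is a single regex pass over each rendered prompt. A sketch using the exact pattern quoted above (`find_unresolved` is an illustrative helper name):

```python
import re

# Pattern from step 5: {UPPER_CASE} template placeholders, two or more chars.
UNRESOLVED = re.compile(r"\{[A-Z][A-Z_]+\}")

def find_unresolved(prompt: str) -> list:
    """Return any template variables left unfilled in a rendered prompt."""
    return UNRESOLVED.findall(prompt)
```

Note the pattern requires an uppercase start and at least two characters, so literal braces in code samples (e.g., `{x}` or `{}`) do not false-positive.
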

### Phase 2: RED — Write One Failing Test

Step 1: Refresh the API surface (it changes as slices are implemented):

```bash
bash ~/.claude/skills/tdd/scripts/extract_api.sh {SOURCE_DIR}
```
Step 2: Read the prompt template from `references/agent_prompts.md` -> "Test Writer Agent" section. Construct the prompt by filling in:

- `{SLICE_SPEC}`: The current slice's behavior description
- `{LANGUAGE}`: Detected language from Phase 0
- `{FRAMEWORK}`: Detected framework name
- `{API_SURFACE}`: Output from extract_api.sh
- `{DOC_CONTEXT}`: Output from discover_docs.sh (Phase 0 Step 5). Include only sections relevant to the current slice — filter by keyword match if the full output is large.
- `{TEST_FILE_PATH}`: Where the test should go (follow project conventions)
- `{EXISTING_TEST_CONTENT}`: Current content of the test file (if it exists), or "No test file exists yet."
- `{FRAMEWORK_SKELETON}`: The relevant skeleton from `references/framework_configs.md`
- `{LAYER}`: The slice's layer tag from Phase 1
- `{LAYER_TEST_CONSTRAINTS}`: Layer-specific test constraints (see agent_prompts.md -> Layer-Specific Constraint Lookup)
Step 3: Launch the Test Writer agent:

```
Task(subagent_type="general-purpose", prompt=<constructed prompt>)
```
Step 4: Parse the JSON response using the `parse_agent_json` logic from `agent_prompts.md`:

1. Strip markdown fences if present
2. Try direct JSON parse
3. If that fails, find the first `{` and last `}`, try that substring
4. If still invalid: retry the Task call once with appended "Return ONLY a JSON object."
5. If still failing: extract test code manually from the raw response
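
Steps 1–3 of that fallback chain can be sketched as follows. The function name mirrors the `parse_agent_json` logic referenced in agent_prompts.md, but this body is an illustrative reconstruction, not the canonical one; `None` signals the caller to retry the Task call (step 4) and then extract manually (step 5):

```python
import json
import re

def parse_agent_json(raw: str):
    """Lenient parse of a subagent reply: fence strip, direct parse, brace slice."""
    text = raw.strip()
    # 1. Strip a markdown fence if the whole reply is fenced
    fenced = re.match(r"```(?:json)?\s*(.*?)\s*```", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    # 2. Try a direct parse
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # 3. Fall back to the outermost brace pair
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        try:
            return json.loads(text[start:end + 1])
        except json.JSONDecodeError:
            pass
    return None  # caller retries the Task call, then extracts code manually
```
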
Step 5: Write the test code to the test file. If the file exists, append the test function (and merge imports). If new, create it with the agent's `imports_needed` + `test_code`.
Step 5a (post-write test smell scan): Scan the test code for common smells before running:

| Smell | Detection | Action |
| --- | --- | --- |
| Assertion Roulette | Multiple bare `assert` statements without messages in the same test function (3+) | Warn the user (don't block): "Test has N bare assertions — consider adding failure messages for easier debugging." |
| Unknown Test | Test name is generic: matches `test_1`, `test_it`, `test_works`, `test_example`, `test_thing` | Re-launch Test Writer with appended: "Use a descriptive test name that reads as a behavior spec (e.g., test_rejects_empty_email)." |
| Tautological assertion | `assert True`; `assert result is not None` when the function has no None return path; `assert isinstance(result, X)` as the sole assertion | Re-launch Test Writer with appended: "The assertion is tautological — test the actual behavior/value, not just that the function returns something." |
Step 5b (post-write layer lint): Scan the test code for layer-violating patterns:

| Layer | Forbidden patterns in test code |
| --- | --- |
| domain | `jest.mock(`, `vi.mock(`, `Mock(`, `mock.patch`, `unittest.mock`, `gomock`, `mockery` — domain tests must not use mocking libraries |
| domain-service | Same mocking patterns for domain objects (mocking ports/repos is OK) |
| application | No forbidden patterns (mocking ports is expected) |
| infrastructure | No forbidden patterns |

If forbidden patterns are found:

1. Remove the offending mock/pattern from the test
2. Re-launch Test Writer with appended: "Do NOT use mocking libraries. This is a {LAYER} layer test. Use real domain objects."
3. If the second attempt still uses forbidden patterns, ask the user
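
For the domain layer this lint reduces to a substring scan against the forbidden-pattern list. A sketch (`lint_layer` is an illustrative helper; it only hard-blocks the domain layer, since the domain-service rule needs semantic judgment about *what* is mocked, which a substring scan cannot provide):

```python
# Forbidden-pattern list from the table above; substring match is sufficient
# because these are library call sites, not bare identifiers.
FORBIDDEN_MOCKS = ("jest.mock(", "vi.mock(", "Mock(", "mock.patch",
                   "unittest.mock", "gomock", "mockery")

def lint_layer(test_code: str, layer: str) -> list:
    """Return forbidden mock patterns found in a test for the given layer."""
    if layer != "domain":
        # domain-service requires a semantic check (mocking ports is allowed);
        # application and infrastructure have no forbidden patterns
        return []
    return [p for p in FORBIDDEN_MOCKS if p in test_code]
```
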
Step 6: Run the test to confirm it FAILS (expect an assertion failure, not a setup error):

```bash
bash ~/.claude/skills/tdd/scripts/run_tests.sh {FRAMEWORK} "{TEST_COMMAND_FOR_SPECIFIC_TEST}"
```
Step 7: Evaluate the result with semantic validation:

| Result | Action |
| --- | --- |
| `status: "fail"`, assertion error | Proper RED — test fails because the expected behavior doesn't exist yet. Proceed. |
| `status: "fail"`, `ImportError`/`ModuleNotFoundError` | Setup problem, not a proper RED. The test can't even import the module under test. Fix: create a minimal stub (empty class/function) so the import resolves, then re-run. The test should now fail on the assertion instead. |
| `status: "fail"`, `AttributeError` on missing method | Similar to import error — the class exists but the method doesn't. This is an acceptable RED if the assertion would also fail. Proceed. |
| `status: "pass"` | Behavior already exists. Log: "Test passes — skipping slice (already implemented)." Increment `current_slice`, move to next slice. |
| `status: "error"`, `SyntaxError` | Fix: the test has a typo. Read the `raw_tail`, fix the test file directly. Re-run. If still erroring after 2 fix attempts, ask the user. |
| `status: "error"`, compile/framework error | Fix: bad import, missing fixture, or framework misconfiguration. Read the `raw_tail`, fix the test file directly. Re-run. If still erroring after 2 fix attempts, ask the user. |
Step 8 (interactive mode only — skip in `--auto`): Present to the user:

```
RED: Test written and failing as expected.

Test: {test_name}
File: {test_file_path}
Failure: {failure message from JSON}

This test verifies: {test_description from agent response}

Proceed to GREEN phase? (or adjust the test?)
```

Wait for user approval before proceeding to GREEN.
Update state: `"phase": "red"`, add the test file to `test_files_created`. Write state immediately.

### Phase 3: GREEN — Minimal Implementation

Step 1: Read the failing test file and the test failure output (the full `raw_tail` from the RED phase run_tests.sh result).
Step 2: Build the file tree of source files (not test files, not node_modules, etc.):

```bash
find {SOURCE_DIR} -type f \( -name '*.ts' -o -name '*.js' -o -name '*.py' -o -name '*.go' -o -name '*.rs' -o -name '*.rb' -o -name '*.php' \) | grep -v test | grep -v spec | grep -v node_modules | grep -v __pycache__ | grep -v vendor | grep -v target | grep -v dist | grep -v build | head -50
```
Step 3: Read existing source files that the test imports or references.
Step 4: Read the prompt template from `references/agent_prompts.md` -> "Implementer Agent" section. Fill in:

- `{LANGUAGE}`: Detected language
- `{FAILING_TEST_CODE}`: The complete test file content
- `{TEST_FAILURE_OUTPUT}`: The `raw_tail` from run_tests.sh JSON output
- `{FILE_TREE}`: Source file listing from Step 2
- `{EXISTING_SOURCE}`: Content of relevant source files (if any — may be empty for greenfield)
- `{LAYER}`: The slice's layer tag from Phase 1
- `{LAYER_DEPENDENCY_CONSTRAINT}`: Layer-specific dependency constraint (see agent_prompts.md -> Layer-Specific Constraint Lookup)
On retries (attempt > 1), also fill in the `{?PREVIOUS_ATTEMPT}` section:

- `{PREVIOUS_ATTEMPT_DESCRIPTION}`: the `explanation` field from the failed attempt
- `{PREVIOUS_ATTEMPT_ERROR}`: the `raw_tail` from the test run after the failed attempt
CRITICAL: Do NOT include the slice specification, feature description, or any future plans. The Implementer works from the test alone.
Step 5: Launch the Implementer agent:

```
Task(subagent_type="general-purpose", prompt=<constructed prompt>)
```
Step 6: Parse the JSON response. Validate layer boundaries, then apply file changes.
Step 6a (Layer path validation): If `layer_map` is not empty, check each file path in the response against the current slice's layer:

```
For each file in response.files:
  inferred_layer = lookup file.path against layer_map (longest prefix match)
  if inferred_layer exists AND inferred_layer != current_slice.layer:
    if inferred_layer is OUTER relative to current_slice.layer:
      REJECT: "Implementer created/modified {file.path} which belongs to
      the {inferred_layer} layer, but this is a {current_slice.layer} slice.
      Inner layers must not depend on outer layers."
      → Re-launch Implementer with appended constraint:
        "Do NOT create or modify files in {inferred_layer} directories.
        This slice is {current_slice.layer} only."
    if inferred_layer is INNER relative to current_slice.layer:
      ALLOW: outer layers may touch inner-layer files (e.g., adding a port interface)
```

Layer ordering for the "outer" check: domain < domain-service < application < infrastructure.
If `layer_map` is empty (flat project), skip this validation.
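
The longest-prefix lookup and the OUTER check above can be sketched concretely (hypothetical helper names; layer ordering as given in Step 6a):

```python
LAYER_ORDER = ["domain", "domain-service", "application", "infrastructure"]

def infer_layer(path: str, layer_map: dict):
    """Longest-prefix match of a file path against the layer_map."""
    hits = [prefix for prefix in layer_map if path.startswith(prefix)]
    return layer_map[max(hits, key=len)] if hits else None

def violates_boundary(path: str, slice_layer: str, layer_map: dict) -> bool:
    """True when the file belongs to a layer OUTER than the slice's layer."""
    inferred = infer_layer(path, layer_map)
    if inferred is None:
        return False  # unmapped path: skip path-based validation
    return LAYER_ORDER.index(inferred) > LAYER_ORDER.index(slice_layer)
```

The asymmetry matches the rule above: an outer-layer slice touching an inner-layer file is allowed (index is lower, so `violates_boundary` is False), while the reverse is rejected.
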
Step 6b: Apply validated file changes. For each file in the response `files` array:

- If `action` is `"create"` or `"overwrite"`: use the Write tool to create or overwrite the file with the complete content
- If `action` is `"edit"` (used for existing files over 200 lines): use the Edit tool with `old_string`/`new_string` to apply the changes. The Implementer returns only the changed functions with surrounding context — identify the insertion point or the function being replaced, and use the Edit tool accordingly. If the edit target is ambiguous, fall back to reading the full file and using Write.
- For existing files over 200 lines where the Implementer returned full content anyway (action = "overwrite"), prefer the Edit tool to apply only the diff — this prevents accidental reformatting of untouched code
Step 7: Run the specific test:
bash
bash ~/.claude/skills/tdd/scripts/run_tests.sh {FRAMEWORK} "{TEST_COMMAND_FOR_SPECIFIC_TEST}"
Step 8: RETRY LOOP (if test still fails):
attempt = 1
max_attempts = 5
previous_explanation = null
previous_error = null

while status != "pass" AND attempt <= max_attempts:
    previous_explanation = explanation from last Implementer response
    previous_error = raw_tail from last test run

    Launch FRESH Task(Implementer) with:
      - same test code + file tree + existing source (re-read!)
      - NEW failure output
      - PREVIOUS_ATTEMPT section filled in

    Apply changes (Write tool for each file)
    Re-run test
    attempt += 1

if still failing after max_attempts:
    STOP. Present to user:
    "Implementation failed after 5 attempts. Last error: {raw_tail}"
    Ask: "Adjust the test, try a different approach, or debug manually?"
Each retry is a fresh Task call with only the previous attempt's explanation and error. This prevents the Implementer from going down rabbit holes while giving it enough context to try a different strategy.
Step 9: Once the specific test passes, run the FULL test suite:
bash
bash ~/.claude/skills/tdd/scripts/run_tests.sh {FRAMEWORK} "{FULL_TEST_COMMAND}" --all
Step 10: Handle regressions:
| Result | Action |
| --- | --- |
| All pass | Proceed to REFACTOR |
| Regressions found | Auto-fix: launch a fresh Implementer with the regression test failures. Apply. Re-run full suite. Repeat up to 3 times. If still failing after 3 regression-fix attempts, STOP and present to user. |
Step 11 (interactive mode only — skip in `--auto`): Present to the user:
GREEN: Test passing with minimal implementation.

Implementation: {explanation from agent response}
Files changed: {list}
All tests: {passed} passing, {failed} failing

Proceed to REFACTOR phase? (or adjust?)
Update state: `"phase": "green"`, update `files_modified`. Write state immediately.
Step 12 (domain/domain-service slices only): Layer purity check before REFACTOR:
For each new/modified file in a `domain` or `domain-service` layer slice:
  • Import scan: Read all import/require statements. Check each imported module against `layer_map`. Flag any import from an outer layer as a violation.
  • Constructor check: Verify the constructor takes NO parameters typed from outer layers (no ORM sessions, HTTP clients, framework configs)
  • Static call check: No static method calls to outer-layer code
  • If violations found, fix them now (move the dependency to a port interface) before entering REFACTOR
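For Python projects, the constructor check can be approximated with a small AST pass. This is an illustrative sketch only: the `OUTER_TYPES` set is an assumed heuristic for outer-layer type names, and real projects may need import-aware type resolution rather than bare annotation strings.

```python
# Illustrative sketch of the constructor check for Python sources: flag
# __init__ parameters whose annotations name outer-layer types.
# OUTER_TYPES is an assumed heuristic, not an exhaustive catalog.
import ast

OUTER_TYPES = {"Session", "HttpClient", "Engine", "Request"}  # assumption

def constructor_violations(source):
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and node.name == "__init__":
            for arg in node.args.args:
                ann = getattr(arg.annotation, "id", None)  # ast.Name only
                if ann in OUTER_TYPES:
                    found.append(f"__init__ takes {arg.arg}: {ann}")
    return found
```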
Step 13: Full-repo import scan (all layers, runs once per slice):
Scan ALL source files (not just session-modified) for dependency direction violations:
步骤1:读取失败的测试文件和测试失败输出(RED阶段run_tests.sh结果中的完整 `raw_tail`)。
步骤2:构建源码文件的文件树(不包含测试文件、node_modules等):
bash
find {SOURCE_DIR} -type f \( -name '*.ts' -o -name '*.js' -o -name '*.py' -o -name '*.go' -o -name '*.rs' -o -name '*.rb' -o -name '*.php' \) | grep -v test | grep -v spec | grep -v node_modules | grep -v __pycache__ | grep -v vendor | grep -v target | grep -v dist | grep -v build | head -50
步骤3:读取测试导入或引用的现有源码文件。
步骤4:从 `references/agent_prompts.md` 的 "Implementer Agent" 部分读取提示模板。填充以下内容:
  • {LANGUAGE}:检测到的语言
  • {FAILING_TEST_CODE}:完整的测试文件内容
  • {TEST_FAILURE_OUTPUT}:run_tests.sh JSON输出中的 `raw_tail`
  • {FILE_TREE}:步骤2中的源码文件列表
  • {EXISTING_SOURCE}:相关源码文件的内容(如有 — 全新项目可能为空)
  • {LAYER}:Phase 1中切片的层级标签
  • {LAYER_DEPENDENCY_CONSTRAINT}:层级特定的依赖约束(见agent_prompts.md -> Layer-Specific Constraint Lookup)
在重试(attempt > 1)时,还需填充 {?PREVIOUS_ATTEMPT} 部分:
  • {PREVIOUS_ATTEMPT_DESCRIPTION}:失败尝试的 `explanation` 字段
  • {PREVIOUS_ATTEMPT_ERROR}:失败尝试后测试运行的 `raw_tail`
关键注意事项:请勿包含切片规格、功能描述或任何未来计划。Implementer仅基于测试进行工作。
步骤5:启动Implementer agent:
Task(subagent_type="general-purpose", prompt=<constructed prompt>)
步骤6:解析JSON响应。验证层级边界,然后应用文件变更。
步骤6a(层级路径验证):如果 `layer_map` 不为空,检查响应中的每个文件路径是否符合当前切片的层级:
对于响应.files中的每个文件:
  inferred_layer = 根据layer_map查找file.path(最长前缀匹配)
  如果inferred_layer存在且inferred_layer != current_slice.layer:
    如果inferred_layer相对于current_slice.layer是外层:
      拒绝: "Implementer创建/修改了{file.path},该文件属于{inferred_layer}层级,但当前是{current_slice.layer}切片。内部层级不得依赖外部层级。"
      → 重新启动Implementer,追加约束:
        "请勿创建或修改{inferred_layer}目录中的文件。当前切片仅属于{current_slice.layer}层级。"
    如果inferred_layer相对于current_slice.layer是内层:
      允许: 外层可以修改内层文件(例如:添加端口接口)
用于判断“外层”的层级顺序:domain < domain-service < application < infrastructure。
如果 `layer_map` 为空(扁平项目),跳过此验证。
步骤6b:应用已验证的文件变更:
对于响应 `files` 数组中的每个文件:
  • 如果 `action` 是 `"create"` 或 `"overwrite"`:使用Write工具创建或覆盖文件,写入完整内容
  • 如果 `action` 是 `"edit"`(用于超过200行的现有文件):使用Edit工具,将 `old_string` 替换为 `new_string` 以应用变更。Implementer仅返回包含上下文的已更改函数 — 确定插入点或要替换的函数,然后使用Edit工具。如果编辑目标不明确,回退为读取完整文件并使用Write工具。
  • 对于超过200行且Implementer仍返回完整内容的现有文件(action = "overwrite"),优先使用Edit工具仅应用差异 — 这可防止意外重新格式化未修改的代码
步骤7:运行特定测试:
bash
bash ~/.claude/skills/tdd/scripts/run_tests.sh {FRAMEWORK} "{TEST_COMMAND_FOR_SPECIFIC_TEST}"
步骤8:重试循环(如果测试仍失败):
attempt = 1
max_attempts = 5
previous_explanation = null
previous_error = null

while status != "pass" AND attempt <= max_attempts:
    previous_explanation = 上次Implementer响应中的explanation
    previous_error = 上次测试运行的raw_tail

    启动新的Task(Implementer),传入:
      - 相同的测试代码 + 文件树 + 现有源码(重新读取!)
      - 新的失败输出
      - 已填充的PREVIOUS_ATTEMPT部分

    应用变更(对每个文件使用Write工具)
    重新运行测试
    attempt += 1

如果max_attempts次尝试后仍失败:
    停止执行。呈现给用户:
    "经过5次尝试后实现仍失败。最后错误: {raw_tail}"
    询问: "调整测试、尝试其他方法,还是手动调试?"
每次重试都是全新的Task调用,仅传入上次尝试的解释和错误。这可防止Implementer陷入死胡同,同时为其提供足够上下文以尝试不同策略。
步骤9:特定测试通过后,运行完整测试套件:
bash
bash ~/.claude/skills/tdd/scripts/run_tests.sh {FRAMEWORK} "{FULL_TEST_COMMAND}" --all
步骤10:处理回归问题:
| 结果 | 操作 |
| --- | --- |
| 全部通过 | 进入REFACTOR阶段 |
| 发现回归问题 | 自动修复:启动新的Implementer,传入回归测试失败信息。应用变更。重新运行完整套件。重复最多3次。如果3次修复尝试后仍失败,停止执行并呈现给用户。 |
步骤11(仅交互模式 — `--auto` 模式下跳过):呈现给用户:
GREEN: 测试通过,实现代码已最小化。

实现说明: {explanation from agent response}
已更改文件: {list}
所有测试: {passed} 通过, {failed} 失败

是否进入REFACTOR阶段?(或调整?)
更新状态 `"phase": "green"`,更新 `files_modified`。立即写入状态文件。
步骤12(仅domain/domain-service切片):进入REFACTOR前的层级纯度检查:
对于 `domain` 或 `domain-service` 层级切片中的每个新建/修改文件:
  • 导入扫描:读取所有import/require语句。根据 `layer_map` 检查每个导入的模块,标记任何来自外层的导入为违规。
  • 构造函数检查:验证构造函数不接受任何外层类型的参数(无ORM会话、HTTP客户端、框架配置)
  • 静态调用检查:无对外层代码的静态方法调用
  • 如果发现违规,在进入REFACTOR前立即修复(将依赖项移至端口接口)
步骤13:全仓库导入扫描(所有层级,每个切片执行一次):
扫描所有源码文件(不仅是会话中修改的文件),检查依赖方向违规:

For each source file, extract imports and check against layer_map

针对每个源码文件,提取导入并对照layer_map检查

Language-specific patterns:

语言特定模式:

Python: from X import Y, import X

TypeScript/JS: import ... from 'X', require('X')

Go: import "X"


For each file:
1. Determine its layer from `layer_map` (skip if no match)
2. For each import, determine the imported module's layer from `layer_map`
3. If imported layer is OUTER relative to file's layer → violation

Report violations to the user before REFACTOR:
Layer scan found N dependency direction violation(s):
  • domain/user.py imports infrastructure/db.py (domain → infrastructure)
  • domain/services/registration.py imports adapters/email.py (domain-service → infrastructure)

In `--auto` mode: attempt auto-fix (replace concrete import with port interface). In interactive mode: present violations and ask user how to proceed.

This supplements the Refactorer's import checking (which only sees session files) with a repo-wide scan. Static tools miss ~23% of violations (Pruijt et al., 2017) — combining textual + structural checks improves coverage.
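The repo-wide scan can be approximated with a short script like the following. This is an illustrative sketch, not the skill's bundled script (whose content is not reproduced here); it handles only the Python import patterns listed above, and `layer_map` is an assumed input.

```python
# Illustrative sketch of the repo-wide dependency-direction scan.
# Only Python's "from X import Y" / "import X" patterns are handled.
import re
from pathlib import Path

LAYER_ORDER = ["domain", "domain-service", "application", "infrastructure"]
IMPORT_RE = re.compile(r"^\s*(?:from\s+(\S+)\s+import|import\s+(\S+))", re.M)

def layer_of(name, layer_map):
    """Map a relative path or dotted-module-as-path onto a layer, if any."""
    return next((l for p, l in layer_map.items()
                 if name.startswith(p.rstrip("/"))), None)

def scan(root, layer_map):
    violations = []
    for path in Path(root).rglob("*.py"):
        rel = path.relative_to(root).as_posix()
        file_layer = layer_of(rel, layer_map)
        if file_layer is None:
            continue  # step 1: skip files with no layer_map match
        for m in IMPORT_RE.finditer(path.read_text()):
            module = (m.group(1) or m.group(2)).replace(".", "/")
            imp_layer = layer_of(module, layer_map)
            # step 3: importing from an OUTER layer is a violation
            if imp_layer and LAYER_ORDER.index(imp_layer) > LAYER_ORDER.index(file_layer):
                violations.append(f"{rel} imports {module} ({file_layer} -> {imp_layer})")
    return violations
```

The returned strings match the report format shown above and can be presented directly before REFACTOR.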

---

对于每个文件:
1. 根据 `layer_map` 确定其层级(无匹配则跳过)
2. 对于每个导入,根据 `layer_map` 确定导入模块的层级
3. 如果导入层级相对于文件层级是外层 → 违规

在进入REFACTOR前向用户报告违规:
层级扫描发现N个依赖方向违规:
  • domain/user.py imports infrastructure/db.py (domain → infrastructure)
  • domain/services/registration.py imports adapters/email.py (domain-service → infrastructure)

在 `--auto` 模式下:尝试自动修复(将具体导入替换为端口接口)。在交互模式下:呈现违规情况并询问用户如何处理。

这补充了Refactorer的导入检查(仅能看到会话文件),实现了全仓库扫描。静态工具会遗漏约23%的违规(Pruijt et al., 2017)— 结合文本和结构检查可提高覆盖率。

---

Phase 4: REFACTOR

Phase 4: REFACTOR

Step 1: Gather all context:
  • All test files created/modified during this session
  • All source files modified during this session
  • The green test output
Step 2: Read the prompt template from `references/agent_prompts.md` -> "Refactorer Agent" section. Fill in:
  • {LANGUAGE}: Detected language
  • {GREEN_TEST_OUTPUT}: Full test output showing all green
  • {ALL_TEST_CODE}: Content of all test files
  • {ALL_IMPLEMENTATION_CODE}: Content of all modified source files
  • {SLICE_LAYERS}: Comma-separated list of unique layers from all slices completed so far
Step 3: Launch the Refactorer agent:
Task(subagent_type="general-purpose", prompt=<constructed prompt>)
Step 4: Parse the JSON response. If `suggestions` is empty, skip to Step 6.
Apply suggestions one at a time, in priority order (high first):
For each suggestion:
  1. Apply the code change (Edit tool, using `old_code` -> `new_code` for each file)
  2. Run the project linter/formatter check (detect from project config):
    • Python: python -m black --check {files} && python -m flake8 {files} && python -m mypy {files}
    • TypeScript/JS: npx eslint {files} or npx tsc --noEmit
    • Go: go vet ./...
    • Rust: cargo clippy
    • If lint fails -> revert immediately and skip this suggestion (same as test failure)
  3. Run the full test suite
  4. If any test fails -> revert immediately (re-read the file from before the edit and Write it back) and skip this suggestion
  5. If all tests pass and lint passes -> keep the change
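The apply-then-verify discipline in steps 1-5 amounts to a small transactional loop. The sketch below is a minimal illustration under stated assumptions: `apply_edit`, `run_lint`, and `run_tests` are hypothetical stand-ins for the Edit tool, the project linter, and run_tests.sh, and the `PRIORITY` labels are assumed.

```python
# Minimal sketch of the REFACTOR apply/verify/revert loop.
PRIORITY = {"high": 0, "medium": 1, "low": 2}  # assumed priority labels

def apply_suggestions(suggestions, apply_edit, run_lint, run_tests):
    applied, skipped = [], []
    for s in sorted(suggestions, key=lambda s: PRIORITY[s["priority"]]):
        # Snapshot touched files so a failed suggestion can be reverted.
        snapshot = {f: open(f).read() for f in s["files"]}
        apply_edit(s)
        if run_lint(s["files"]) and run_tests():
            applied.append(s)  # keep: lint and full suite are green
        else:
            for f, content in snapshot.items():  # revert immediately
                with open(f, "w") as fh:
                    fh.write(content)
            skipped.append(s)
    return applied, skipped
```

The snapshot-before-apply step is what makes "revert immediately" cheap: no suggestion can leave the tree in a worse state than it found it.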
Step 5 (interactive mode only — skip in `--auto`): Present:
REFACTOR: Code improved, all tests still passing.

Applied: {list of accepted suggestions}
Skipped: {list of reverted suggestions, if any}
All tests: {count} passing

[Moving to slice N of M] or [All slices complete]
In `--auto` mode, print one-liner:
[auto] REFACTOR slice N/M: {applied_count} applied, {skipped_count} skipped
Step 6: Update state: `"phase": "refactor"`. Write state immediately.

步骤1:收集所有上下文:
  • 本次会话中创建/修改的所有测试文件
  • 本次会话中修改的所有源码文件
  • 测试通过的输出
步骤2:从 `references/agent_prompts.md` 的 "Refactorer Agent" 部分读取提示模板。填充以下内容:
  • {LANGUAGE}:检测到的语言
  • {GREEN_TEST_OUTPUT}:显示所有测试通过的完整输出
  • {ALL_TEST_CODE}:所有测试文件的内容
  • {ALL_IMPLEMENTATION_CODE}:所有修改后的源码文件内容
  • {SLICE_LAYERS}:到目前为止完成的所有切片的唯一层级列表(逗号分隔)
步骤3:启动Refactorer agent:
Task(subagent_type="general-purpose", prompt=<constructed prompt>)
步骤4:解析JSON响应。如果 `suggestions` 为空,跳至步骤6。
按优先级顺序(高优先级优先)逐个应用建议:
对于每个建议:
  1. 应用代码变更(使用Edit工具,将每个文件的 `old_code` 替换为 `new_code`)
  2. 运行项目的检查器/格式化器检查(从项目配置中检测):
    • Python: python -m black --check {files} && python -m flake8 {files} && python -m mypy {files}
    • TypeScript/JS: npx eslint {files} 或 npx tsc --noEmit
    • Go: go vet ./...
    • Rust: cargo clippy
    • 如果检查失败 → 立即回退并跳过该建议(与测试失败处理方式相同)
  3. 运行完整测试套件
  4. 如果任何测试失败 → 立即回退(重新读取编辑前的文件并写入)并跳过该建议
  5. 如果所有测试通过且检查通过 → 保留变更
步骤5(仅交互模式 — `--auto` 模式下跳过):呈现:
REFACTOR: 代码已优化,所有测试仍通过。

已应用: {已接受建议列表}
已跳过: {已回退建议列表(如有)}
所有测试: {count} 通过

[进入第N/M个切片] 或 [所有切片已完成]
在 `--auto` 模式下,打印单行信息:
[auto] REFACTOR slice N/M: {applied_count} applied, {skipped_count} skipped
步骤6:更新状态 `"phase": "refactor"`。立即写入状态文件。

Phase 5: Next Slice or Complete

Phase 5: 进入下一切片或完成

If more slices remain -> increment `current_slice` in state, return to Phase 2.
If all slices complete -> present summary:
TDD Complete: {feature name}

Slices implemented: N
Tests written: N
Files created/modified: {list}
All tests passing: yes
Clean up: remove `.tdd-state.json` (in `--auto` mode, remove silently; in interactive, ask user).

如果还有剩余切片 → 在状态中递增 `current_slice`,返回Phase 2。
如果所有切片完成 → 呈现总结:
TDD已完成: {feature name}

已实现切片: N
已编写测试: N
已创建/修改文件: {list}
所有测试是否通过: 是
清理工作:删除 `.tdd-state.json`(`--auto` 模式下静默删除;交互模式下询问用户)。

Resume Support

恢复会话支持

When user invokes `/tdd --resume`:
  1. Read `.tdd-state.json` from project root
  2. Report current state: "Found TDD session for '{feature}'. Currently at slice {N}/{total}, phase: {phase}."
  3. Resume from the current phase of the current slice
  4. If `auto_mode` is true in state, continue in auto mode
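The resume entry point can be sketched as below. The state-file field names (`feature`, `current_slice`, `total_slices`, `phase`, `auto_mode`) are assumed from the orchestration steps above; this section does not publish a formal schema, so treat them as illustrative.

```python
# Hypothetical sketch of /tdd --resume: read .tdd-state.json and report.
import json
from pathlib import Path

def resume(project_root):
    state_path = Path(project_root) / ".tdd-state.json"
    if not state_path.exists():
        return None  # nothing to resume
    state = json.loads(state_path.read_text())
    print(f"Found TDD session for '{state['feature']}'. "
          f"Currently at slice {state['current_slice']}/{state['total_slices']}, "
          f"phase: {state['phase']}.")
    return state  # caller re-enters the current phase, honoring auto_mode
```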

当用户调用
/tdd --resume
时:
  1. 从项目根目录读取
    .tdd-state.json
  2. 报告当前状态:"找到针对'{feature}'的TDD会话。当前处于第N/{total}个切片,阶段: {phase}。"
  3. 从当前切片的当前阶段恢复执行
  4. 如果状态中的
    auto_mode
    为true,继续使用自动模式

Edge Cases

边缘情况

Greenfield Projects

全新项目

No source files, no tests, no test configuration. Handle gracefully:
  1. Phase 0 Step 3: If run_tests.sh returns `status: "error"` with `total: 0`, check if any test files exist. If none, this is greenfield — proceed.
  2. Phase 0 Step 4: extract_api.sh will return empty output. Pass "(No existing API — this is a new project)" to the Test Writer.
  3. Phase 2: The Test Writer will create test files from scratch. May need to set up the test framework config (e.g., `jest.config.js`, `pytest.ini`). If the first test run fails with a framework error (not a test failure), create minimal framework config and retry.
无源码文件、无测试、无测试配置。需优雅处理:
  1. Phase 0步骤3:如果run_tests.sh返回 `status: "error"` 且 `total: 0`,检查是否存在测试文件。如果不存在,说明是全新项目 — 继续执行。
  2. Phase 0步骤4:extract_api.sh将返回空输出。向Test Writer传递 "(No existing API — this is a new project)"。
  3. Phase 2:Test Writer将从头创建测试文件。可能需要设置测试框架配置(例如:`jest.config.js`、`pytest.ini`)。如果首次测试运行因框架错误(而非测试失败)而失败,创建最小框架配置并重试。

Bug Fix TDD

Bug修复TDD

  1. Write a test demonstrating the bug (should FAIL showing the bug exists)
  2. Confirm failure matches the reported bug — human checkpoint
  3. Fix: minimal code to make test pass (GREEN phase as normal)
  4. Verify: no regressions
  1. 编写一个演示Bug的测试(应失败,显示Bug存在)
  2. 确认失败情况与报告的Bug匹配 — 人工检查点
  3. 修复:编写最小化代码使测试通过(正常执行GREEN阶段)
  4. 验证:无回归问题

Existing Code (Characterization Tests)

现有代码(特征测试)

  1. Write a test for CURRENT behavior (should PASS — this is a characterization test)
  2. Modify the test for DESIRED behavior (should FAIL)
  3. Proceed with GREEN -> REFACTOR
  1. 针对当前行为编写测试(应通过 — 这是特征测试)
  2. 修改测试以匹配期望行为(应失败)
  3. 继续执行GREEN -> REFACTOR

User-Provided Tests

用户提供的测试

If user provides test code:
  1. Run to confirm it fails (RED confirmed)
  2. Skip to Phase 3 (GREEN) — user-provided tests are authoritative
  3. Do not modify without asking
如果用户提供测试代码:
  1. 运行测试以确认其失败(确认RED状态)
  2. 跳至Phase 3(GREEN)— 用户提供的测试具有权威性
  3. 未经询问不得修改

Flaky Tests

不稳定测试

If a test sometimes passes/fails: stop, report, fix the flaky test before continuing.

如果测试有时通过有时失败:停止执行,报告问题,在继续前修复不稳定测试。

Failure Recovery Reference

故障恢复参考

| Failure | Phase | Recovery |
| --- | --- | --- |
| Test Writer returns invalid JSON | RED | Parse with fence-stripping + substring extraction. Retry once with "Return ONLY JSON." Fall back to manual extraction. |
| Test passes when it should fail | RED | Log "already implemented", skip slice, move to next. |
| Test has syntax/compile error | RED | Read raw_tail, fix test file directly. Retry up to 2 times. Then ask user. |
| Implementer returns invalid JSON | GREEN | Same JSON recovery as Test Writer. |
| Test still fails after implementation | GREEN | Retry loop: up to 5 fresh Implementer calls with previous-attempt context. Then ask user. |
| Full suite has regressions | GREEN | Auto-fix: fresh Implementer with regression failures. Up to 3 attempts. Then ask user. |
| Refactorer suggestion breaks tests | REFACTOR | Revert immediately, skip suggestion, continue with next. |
| run_tests.sh timeout | Any | Increase timeout. If persistent, ask user about test performance. |
| run_tests.sh returns `"error"` | Any | Read raw_tail for cause. Script error (missing binary, bad path) -> fix and retry. Compilation error -> treat as implementation error. |
| extract_api.sh returns empty | RED | Normal for greenfield. Pass "(No existing API)" message. |
| Agent response is completely empty | Any | Retry the Task call once. If still empty, ask user. |
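The "fence-stripping + substring extraction" recovery in the first row can be sketched as a tolerant parser. This is an illustration of the technique, not the skill's actual parser:

```python
# Sketch of tolerant JSON recovery: try as-is, then strip markdown
# fences, then fall back to the outermost {...} substring.
import json, re

def parse_agent_json(text):
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Strip ```json ... ``` fences around the payload.
    stripped = re.sub(r"^```(?:json)?\s*|\s*```$", "", text.strip(), flags=re.M)
    try:
        return json.loads(stripped)
    except json.JSONDecodeError:
        pass
    # Outermost-braces substring extraction.
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        try:
            return json.loads(text[start:end + 1])
        except json.JSONDecodeError:
            pass
    return None  # caller retries once with "Return ONLY JSON."
```

A `None` result maps to the table's next step: retry the agent once with a stricter instruction, then fall back to manual extraction.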

| 故障 | 阶段 | 恢复方式 |
| --- | --- | --- |
| Test Writer返回无效JSON | RED | 使用围栏剥离+子串提取解析。追加"Return ONLY JSON."重试一次。回退为手动提取。 |
| 测试本应失败却通过 | RED | 记录"已实现",跳过切片,进入下一切片。 |
| 测试存在语法/编译错误 | RED | 读取raw_tail,直接修复测试文件。最多重试2次。然后询问用户。 |
| Implementer返回无效JSON | GREEN | 与Test Writer相同的JSON恢复方式。 |
| 实现后测试仍失败 | GREEN | 重试循环:最多5次全新的Implementer调用,传入上次尝试的上下文。然后询问用户。 |
| 完整套件存在回归问题 | GREEN | 自动修复:全新的Implementer调用,传入回归失败信息。最多3次尝试。然后询问用户。 |
| Refactorer建议导致测试失败 | REFACTOR | 立即回退,跳过建议,继续处理下一个。 |
| run_tests.sh超时 | 任何阶段 | 增加超时时间。如果持续超时,询问用户测试性能问题。 |
| run_tests.sh返回 `"error"` | 任何阶段 | 读取raw_tail查找原因。脚本错误(缺失二进制文件、路径错误)→ 修复并重试。编译错误 → 视为实现错误。 |
| extract_api.sh返回空 | RED | 全新项目的正常情况。传递"(No existing API)"消息。 |
| Agent响应完全为空 | 任何阶段 | 重试一次Task调用。如果仍然为空,询问用户。 |

Layer Reference

层级参考

See `references/layer_guide.md` for layer definitions, dependency rules, test strategies by layer, and detection heuristics.
层级定义、依赖规则、各层级测试策略以及检测启发式规则,请参阅 `references/layer_guide.md`。

Anti-Patterns to Avoid

需避免的反模式

See `references/anti_patterns.md`. Critical ones:
  • Never modify a test to make it pass (change implementation, not tests)
  • Never write implementation before tests
  • Never write all tests at once (vertical slicing)
  • Never test implementation details
  • Never skip the RED phase
  • Never let domain code import infrastructure (dependency direction violation)
  • Never mock domain objects — construct real instances instead

请参阅 `references/anti_patterns.md`。关键反模式:
  • 永远不要修改测试使其通过(应修改实现,而非测试)
  • 永远不要在编写测试前编写实现
  • 永远不要一次性编写所有测试(应使用垂直切片)
  • 永远不要测试实现细节
  • 永远不要跳过RED阶段
  • 永远不要让领域代码导入基础设施(依赖方向违规)
  • 永远不要模拟领域对象 — 应构造真实实例

Framework Quick Reference

框架快速参考

See `references/framework_configs.md` for setup details.
| Framework | Run single test | Run all | Watch mode |
| --- | --- | --- | --- |
| Jest | `npx jest --testPathPattern=<file> -t "<name>"` | `npx jest` | `npx jest --watch` |
| Vitest | `npx vitest run <file> -t "<name>"` | `npx vitest run` | `npx vitest` |
| pytest | `pytest <file>::<test_name> -v` | `pytest -v` | `pytest-watch` |
| Go | `go test -run <TestName> ./...` | `go test ./...` | (none) |
| Cargo | `cargo test <test_name>` | `cargo test` | `cargo watch -x test` |
| RSpec | `rspec <file>:<line>` | `rspec` | `guard` |
| PHPUnit | `phpunit --filter <test_name>` | `phpunit` | (none) |
设置细节请参阅 `references/framework_configs.md`。
| 框架 | 运行单个测试 | 运行所有测试 | 监听模式 |
| --- | --- | --- | --- |
| Jest | `npx jest --testPathPattern=<file> -t "<name>"` | `npx jest` | `npx jest --watch` |
| Vitest | `npx vitest run <file> -t "<name>"` | `npx vitest run` | `npx vitest` |
| pytest | `pytest <file>::<test_name> -v` | `pytest -v` | `pytest-watch` |
| Go | `go test -run <TestName> ./...` | `go test ./...` | (无) |
| Cargo | `cargo test <test_name>` | `cargo test` | `cargo watch -x test` |
| RSpec | `rspec <file>:<line>` | `rspec` | `guard` |
| PHPUnit | `phpunit --filter <test_name>` | `phpunit` | (无) |