test-driven-development


Test-Driven Development (TDD)


Overview


Write the test first. Watch it fail. Write minimal code to pass.

Core principle: If you didn't watch the test fail, you don't know if it tests the right thing.

Violating the letter of the rules is violating the spirit of the rules.


When to Use


Always:

New features
Bug fixes
Refactoring
Behavior changes

Exceptions (ask the user first):

Throwaway prototypes
Generated code
Configuration files

Thinking "skip TDD just this once"? Stop. That's rationalization.


The Iron Law


NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST

Write code before the test? Delete it. Start over.

No exceptions:

Don't keep it as "reference"
Don't "adapt" it while writing tests
Don't look at it
Delete means delete

Implement fresh from tests. Period.


Red-Green-Refactor Cycle


RED — Write Failing Test


Write one minimal test showing what should happen.

Good test:

```python
def test_retries_failed_operations_3_times():
    attempts = 0
    def operation():
        nonlocal attempts
        attempts += 1
        if attempts < 3:
            raise Exception('fail')
        return 'success'

    result = retry_operation(operation)

    assert result == 'success'
    assert attempts == 3
```

Clear name, tests real behavior, one thing.

Bad test:

```python
from unittest.mock import MagicMock

def test_retry_works():
    mock = MagicMock()
    mock.side_effect = [Exception(), Exception(), 'success']
    result = retry_operation(mock)
    assert result == 'success'  # What about retry count? Timing?
```

Vague name, tests mock not real code.

Requirements:

One behavior per test
Clear descriptive name ("and" in name? Split it)
Real code, not mocks (unless truly unavoidable)
Name describes behavior, not implementation

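The sketch below illustrates "one behavior per test" using a hypothetical `parse_port` function, which is not part of the original skill. A single test named `test_parses_valid_port_and_rejects_out_of_range` would check two behaviors, so it is split into two tests. Each name states one expected behavior. With pytest, the rejection test would more idiomatically use `pytest.raises(ValueError)`. Plain `try`/`except` keeps the sketch free of dependencies.

```python
def parse_port(text):
    # Hypothetical function under test: converts text to a TCP port number.
    port = int(text)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port


# One behavior per test; each name says what should happen.
def test_parses_numeric_port():
    assert parse_port("8080") == 8080


def test_rejects_port_above_65535():
    try:
        parse_port("70000")
    except ValueError:
        return  # expected behavior
    raise AssertionError("expected ValueError for port 70000")
```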

Verify RED — Watch It Fail


MANDATORY. Never skip.

```bash
# Use the terminal tool to run the specific test
pytest tests/test_feature.py::test_specific_behavior -v
```

Confirm:
- The test fails (rather than erroring out because of a typo)
- The failure message is the one you expected
- It fails because the feature is missing

**Test passes immediately?** You're testing existing behavior. Fix the test.

**Test errors?** Fix the error and re-run until it fails for the right reason.

GREEN — Minimal Code


Write the simplest code to pass the test. Nothing more.

Good:

```python
def add(a, b):
    return a + b  # Nothing extra
```

Bad:

```python
import logging

def add(a, b):
    result = a + b
    logging.info(f"Adding {a} + {b} = {result}")  # Extra!
    return result
```

Don't add features, refactor other code, or "improve" beyond the test.

Cheating is OK in GREEN:

Hardcode return values
Copy-paste
Duplicate code
Skip edge cases

We'll fix it in REFACTOR.

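The sketch below shows how little GREEN needs: one possible minimal `retry_operation` that makes the good RED test pass. The signature is an assumption, since the document only shows the test. The attempt count is hardcoded, and running out of attempts silently returns `None`. Both are shortcuts GREEN allows. They get cleaned up in REFACTOR or pinned down by the next failing test.

```python
def retry_operation(operation):
    # Minimal GREEN code: just enough to satisfy the test.
    # Deliberate shortcuts: the 3 is hardcoded, and if every attempt fails
    # the loop ends and the function returns None (edge case skipped).
    for _ in range(3):
        try:
            return operation()
        except Exception:
            pass


def test_retries_failed_operations_3_times():
    attempts = 0

    def operation():
        nonlocal attempts
        attempts += 1
        if attempts < 3:
            raise Exception('fail')
        return 'success'

    result = retry_operation(operation)

    assert result == 'success'
    assert attempts == 3
```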

Verify GREEN — Watch It Pass


MANDATORY.

```bash
# Run the specific test
pytest tests/test_feature.py::test_specific_behavior -v

# Then run ALL tests to check for regressions
pytest tests/ -q
```

Confirm:
- The test passes
- Other tests still pass
- Output is pristine (no errors or warnings)

**Test fails?** Fix the code, not the test.

**Other tests fail?** Fix the regressions now.

REFACTOR — Clean Up


After green only:

Remove duplication
Improve names
Extract helpers
Simplify expressions

Keep tests green throughout. Don't add behavior.

If tests fail during refactor: Undo immediately. Take smaller steps.

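A REFACTOR step on a minimal retry loop might only rename and extract, as in the hypothetical sketch below. It uses the same assumed `retry_operation` signature as the RED test. The bare `3` becomes a named constant, the loop variable gets a meaningful name, and a repeated test setup becomes a helper (`make_flaky`, also hypothetical). Behavior is deliberately unchanged, including returning `None` when every attempt fails. The loop at the end checks that the old and new versions agree.

```python
MAX_ATTEMPTS = 3  # extracted from the bare literal 3


def retry_operation_before(operation):
    # Minimal GREEN version, kept only to compare behavior.
    for _ in range(3):
        try:
            return operation()
        except Exception:
            pass


def retry_operation(operation):
    """Call `operation` until it succeeds, at most MAX_ATTEMPTS times.

    Returns None if every attempt fails, exactly as before; changing that
    is new behavior and needs its own failing test first.
    """
    for _attempt in range(MAX_ATTEMPTS):
        try:
            return operation()
        except Exception:
            continue
    return None


def make_flaky(failures):
    """Extracted test helper: an operation that fails `failures` times, then succeeds."""
    calls = {"count": 0}

    def operation():
        calls["count"] += 1
        if calls["count"] <= failures:
            raise Exception('fail')
        return 'success'

    return operation, calls


# Observable behavior is identical before and after the refactor.
for failures in range(5):
    op_before, calls_before = make_flaky(failures)
    op_after, calls_after = make_flaky(failures)
    assert retry_operation_before(op_before) == retry_operation(op_after)
    assert calls_before == calls_after
```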

Repeat


Next failing test for next behavior. One cycle at a time.

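Take the hypothetical retry helper as an example. One possible next behavior is what happens when every attempt fails. The sketch below assumes the desired behavior is to re-raise the last error. That is an assumption for illustration, since the document does not specify it. Against a loop that returns `None` on exhaustion, the new test fails with its own `AssertionError`, which is a genuine failure rather than a typo. The small change shown then makes both tests pass. Under pytest you would normally write the check with `pytest.raises`.

```python
def retry_operation(operation):
    # Updated GREEN code: remember the last error and re-raise it once
    # all 3 attempts have failed, instead of falling through to None.
    last_error = None
    for _ in range(3):
        try:
            return operation()
        except Exception as exc:
            last_error = exc
    raise last_error


# Existing test from the RED step: must stay green.
def test_retries_failed_operations_3_times():
    attempts = 0

    def operation():
        nonlocal attempts
        attempts += 1
        if attempts < 3:
            raise Exception('fail')
        return 'success'

    assert retry_operation(operation) == 'success'
    assert attempts == 3


# Next behavior, next failing test.
def test_reraises_last_error_after_3_failed_attempts():
    attempts = 0

    def operation():
        nonlocal attempts
        attempts += 1
        raise ValueError(f'fail {attempts}')

    try:
        retry_operation(operation)
    except ValueError as exc:
        assert str(exc) == 'fail 3'
    else:
        raise AssertionError('expected the last ValueError to be re-raised')
    assert attempts == 3
```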

Why Order Matters


"I'll write tests after to verify it works"

Tests written after code pass immediately. Passing immediately proves nothing:

Might test the wrong thing
Might test implementation, not behavior
Might miss edge cases you forgot
You never saw it catch the bug

Test-first forces you to see the test fail, proving it actually tests something.

"I already manually tested all the edge cases"

Manual testing is ad-hoc. You think you tested everything but:

No record of what you tested
Can't re-run when code changes
Easy to forget cases under pressure
"It worked when I tried it" ≠ comprehensive

Automated tests are systematic. They run the same way every time.

"Deleting X hours of work is wasteful"

Sunk cost fallacy. The time is already gone. Your choice now:

Delete and rewrite with TDD (high confidence)
Keep it and add tests after (low confidence, likely bugs)

The "waste" is keeping code you can't trust.

"TDD is dogmatic, being pragmatic means adapting"

TDD IS pragmatic:

Finds bugs before commit (faster than debugging after)
Prevents regressions (tests catch breaks immediately)
Documents behavior (tests show how to use code)
Enables refactoring (change freely, tests catch breaks)

"Pragmatic" shortcuts = debugging in production = slower.

"Tests after achieve the same goals — it's spirit not ritual"

No. Tests-after answer "What does this do?" Tests-first answer "What should this do?"

Tests-after are biased by your implementation. You test what you built, not what's required. Tests-first force edge case discovery before implementing.


Common Rationalizations


| Excuse | Reality |
| --- | --- |
| "Too simple to test" | Simple code breaks. A test takes 30 seconds. |
| "I'll test after" | Tests that pass immediately prove nothing. |
| "Tests after achieve same goals" | Tests-after answer "what does this do?" Tests-first answer "what should this do?" |
| "Already manually tested" | Ad hoc ≠ systematic. No record, can't re-run. |
| "Deleting X hours is wasteful" | Sunk cost fallacy. Keeping unverified code is technical debt. |
| "Keep as reference, write tests first" | You'll adapt it. That's testing after. Delete means delete. |
| "Need to explore first" | Fine. Throw away the exploration, then start with TDD. |
| "Test hard = design unclear" | Listen to the test. Hard to test = hard to use. |
| "TDD will slow me down" | TDD is faster than debugging. Pragmatic = test-first. |
| "Manual test faster" | Manual testing doesn't prove edge cases. You'll re-test every change. |
| "Existing code has no tests" | You're improving it. Add tests for the code you touch. |


Red Flags — STOP and Start Over


If you catch yourself doing any of these, delete the code and restart with TDD:

Code before test
Test after implementation
Test passes immediately on first run
Can't explain why test failed
Tests added "later"
Rationalizing "just this once"
"I already manually tested it"
"Tests after achieve the same purpose"
"Keep as reference" or "adapt existing code"
"Already spent X hours, deleting is wasteful"
"TDD is dogmatic, I'm being pragmatic"
"This is different because..."

All of these mean: Delete code. Start over with TDD.


Verification Checklist


When Stuck


| Problem | Solution |
| --- | --- |
| Don't know how to test | Write the wished-for API. Write the assertion first. Ask the user. |
| Test too complicated | Design too complicated. Simplify the interface. |
| Must mock everything | Code too coupled. Use dependency injection. |
| Test setup huge | Extract helpers. Still complex? Simplify the design. |

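The sketch below illustrates dependency injection for the "Must mock everything" row. It assumes a variant of the document's `retry_operation` that waits between attempts. The `attempts`, `delay`, and `sleep` parameters are hypothetical. Because the sleep function is passed in, the test can hand it a plain list's `append` and check the requested delays directly. No mocking library and no real waiting is needed. This also answers the bad test's open "Timing?" question.

```python
import time


def retry_operation(operation, attempts=3, delay=0.5, sleep=time.sleep):
    # `sleep` is injected: production callers get time.sleep by default,
    # tests pass a fake that records the requested delays instead of waiting.
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)


def test_waits_between_attempts_but_not_after_success():
    delays = []
    calls = 0

    def operation():
        nonlocal calls
        calls += 1
        if calls < 3:
            raise Exception('fail')
        return 'success'

    result = retry_operation(operation, delay=0.5, sleep=delays.append)

    assert result == 'success'
    assert delays == [0.5, 0.5]  # one wait after each of the two failures
```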

Hermes Agent Integration


Running Tests


Use the `terminal` tool to run tests at each step:

```python
# RED: verify failure
terminal("pytest tests/test_feature.py::test_name -v")

# GREEN: verify pass
terminal("pytest tests/test_feature.py::test_name -v")

# Full suite: verify no regressions
terminal("pytest tests/ -q")
```
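The same checks can be run from a plain Python script instead of the agent's `terminal` tool. In that case, pytest's documented exit codes are a quick way to tell the phases apart. Exit code 0 means all collected tests passed, and 1 means tests ran and some failed. Other codes (interrupted, internal error, usage error, no tests collected) mean the test was never properly evaluated, so that is not a valid RED. The sketch assumes pytest is installed for the current interpreter. `run_pytest` and `check_phase` are hypothetical helper names.

```python
import subprocess
import sys

# pytest exit codes used here (documented as pytest.ExitCode.OK / TESTS_FAILED).
OK = 0
TESTS_FAILED = 1


def run_pytest(*args):
    """Run pytest in a subprocess and return (exit_code, combined output)."""
    proc = subprocess.run(
        [sys.executable, "-m", "pytest", *args],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout + proc.stderr


def check_phase(phase, exit_code):
    """Return None if the exit code fits the TDD phase, else a message explaining why not."""
    if phase == "red":
        if exit_code == TESTS_FAILED:
            return None
        if exit_code == OK:
            return "test passed immediately: it is testing existing behavior"
        return f"pytest exited with {exit_code}: fix the test or command before going on"
    if phase == "green":
        if exit_code == OK:
            return None
        return f"pytest exited with {exit_code}: fix the code, not the test"
    raise ValueError(f"unknown phase: {phase!r}")


# Usage (commented out so the sketch has no side effects):
# code, output = run_pytest("tests/test_feature.py::test_name", "-v")
# problem = check_phase("red", code)
```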

With delegate_task


When dispatching subagents for implementation, enforce TDD in the goal:

```python
delegate_task(
    goal="Implement [feature] using strict TDD",
    context="""
    Follow test-driven-development skill:
    1. Write failing test FIRST
    2. Run test to verify it fails
    3. Write minimal code to pass
    4. Run test to verify it passes
    5. Refactor if needed
    6. Commit

    Project test command: pytest tests/ -q
    Project structure: [describe relevant files]
    """,
    toolsets=['terminal', 'file']
)
```


With systematic-debugging


Bug found? Write failing test reproducing it. Follow TDD cycle. The test proves the fix and prevents regression.

Never fix bugs without a test.

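The sketch below walks through that flow for a hypothetical bug; the function and the bug are invented for illustration. Suppose `slugify("Hello  World")` returned `"hello--world"` because each space became its own hyphen. The regression test reproduces the report. It fails against the old per-space replacement and passes after the fix, which collapses runs of whitespace.

```python
import re


def slugify(title):
    # Fixed version: trim, lowercase, and collapse every run of whitespace
    # into a single hyphen. The buggy version replaced each space on its own
    # (title.strip().lower().replace(" ", "-")), so two spaces became "--".
    return re.sub(r"\s+", "-", title.strip().lower())


def test_slugify_collapses_repeated_spaces():
    # Reproduces the reported bug; written, and seen failing, before the fix.
    assert slugify("Hello  World") == "hello-world"
```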

Testing Anti-Patterns


Testing mock behavior instead of real behavior — mocks should verify interactions, not replace the system under test
Testing implementation details — test behavior/results, not internal method calls
Happy path only — always test edge cases, errors, and boundaries
Brittle tests — tests should verify behavior, not structure; refactoring shouldn't break them

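The sketch below illustrates "mocks should verify interactions, not replace the system under test" using a hypothetical `process_order` function. The order-total logic runs for real. Only the outbound notifier, a collaborator at the edge of the system, is a `unittest.mock.Mock`. The test checks both the returned result and the single interaction with that collaborator.

```python
from unittest.mock import Mock


def process_order(order, notifier):
    # Hypothetical system under test: real logic, collaborator injected.
    total = sum(item["price"] * item["qty"] for item in order["items"])
    notifier.send(order["email"], f"Order total: {total}")
    return total


def test_process_order_computes_total_and_notifies_customer():
    notifier = Mock()  # mock only the boundary, never process_order itself
    order = {
        "email": "a@example.com",
        "items": [{"price": 5, "qty": 2}, {"price": 1, "qty": 3}],
    }

    total = process_order(order, notifier)

    assert total == 13  # real behavior: 5*2 + 1*3
    notifier.send.assert_called_once_with("a@example.com", "Order total: 13")
```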

Final Rule


Production code → test exists and failed first
Otherwise → not TDD

No exceptions without the user's explicit permission.
