Test-Driven Development (TDD)
Strict Red-Green-Refactor workflow for robust, self-documenting, production-ready code.
Quick Navigation
| Situation | Go To |
|---|---|
| New to this codebase | Step 1: Explore Environment |
| Know the framework, starting work | Step 2: Select Mode |
| Need the core loop reference | Step 3: Core TDD Loop |
| Complex edge cases to cover | Property-Based Testing |
| Tests are flaky/unreliable | Flaky Test Management |
| Need isolated test environment | Hermetic Testing |
| Measuring test quality | Mutation Testing |
The Three Rules (Robert C. Martin)
- No Production Code without a failing test
- Write Only Enough Test to Fail (compilation errors count)
- Write Only Enough Code to Pass (no optimizations yet)
The Loop: 🔴 RED (write failing test) → 🟢 GREEN (minimal code to pass) → 🔵 REFACTOR (clean up) → Repeat
Step 1: Explore Test Environment
Do NOT assume anything. Explore the codebase first.
Checklist:
- Search for test files: `glob("**/*.test.*")`, `glob("**/*.spec.*")`, `glob("**/test_*.py")`
- Check `package.json` scripts, `Makefile`, or CI workflows
- Look for config: `vitest.config.*`, `jest.config.*`, `pytest.ini`, `Cargo.toml`
Framework Detection:
| Language | Config Files | Test Command |
|---|---|---|
| Node.js | | |
| Python | | |
| Go | | |
| Rust | | |
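The detection step can be sketched as a lookup over marker files. This is a minimal sketch, not the skill's own implementation: `detect_frameworks` is a hypothetical helper, and the marker files and commands in `FRAMEWORK_HINTS` (for example `go.mod` and `go test ./...`) are common defaults assumed for illustration. Always confirm the real command from the project's scripts or CI configuration.

```python
from pathlib import Path

# Assumed marker files and default commands; confirm against the real project.
FRAMEWORK_HINTS = {
    "package.json": ("Node.js", "npm test"),
    "pytest.ini": ("Python", "pytest"),
    "go.mod": ("Go", "go test ./..."),
    "Cargo.toml": ("Rust", "cargo test"),
}


def detect_frameworks(root: str) -> list[tuple[str, str]]:
    """Return (language, test command) pairs for marker files found in root."""
    root_path = Path(root)
    return [
        hint
        for marker, hint in FRAMEWORK_HINTS.items()
        if (root_path / marker).is_file()
    ]
```

An empty result means nothing was detected, which is a cue to keep exploring rather than to guess a command.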
Step 2: Select Mode
| Mode | When | First Action |
|---|---|---|
| New Feature | Adding functionality | Read existing module tests, confirm green baseline |
| Bug Fix | Reproducing issue | Write failing reproduction test FIRST |
| Refactor | Cleaning code | Ensure ≥80% coverage on target code |
| Legacy | No tests exist | Add characterization tests before changing |
Tie-breaker: If coverage <20% or tests absent → use Legacy Mode first.
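The table and tie-breaker can be read as a small decision function. The sketch below is illustrative only: `select_mode`, the task labels, and the use of a 0 to 1 coverage fraction are assumptions, not part of the skill.

```python
# Hypothetical task labels mapped to the modes in the table above.
MODES = {"feature": "New Feature", "bugfix": "Bug Fix", "refactor": "Refactor"}


def select_mode(task: str, coverage: float, has_tests: bool) -> str:
    """Pick a mode; coverage is a fraction in [0, 1] for the target code."""
    # Tie-breaker: absent tests or coverage under 20% means Legacy Mode first.
    if not has_tests or coverage < 0.20:
        return "Legacy"
    return MODES[task]
```

The tie-breaker is checked first so that a bug fix in an untested module still starts with characterization tests.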
Mode: New Feature
- Read existing tests for the module
- Run tests to confirm green baseline
- Enter Core Loop for new behavior
- Commits: `test(module): add test for X` → `feat(module): implement X`
Mode: Bug Fix
- Write failing reproduction test (MUST fail before fix)
- Confirm failure is assertion error, not syntax error
- Write minimal fix
- Run full test suite
- Commits: `test: add failing test for bug #123` → `fix: description (#123)`
Mode: Refactor
- Run coverage on the specific function you'll refactor
- If coverage <80% → add characterization tests first
- Refactor in small steps (ONE change → run tests → repeat)
- Never change behavior during refactor
Mode: Legacy Code
- Find Seams - insertion points for tests (Sensing Seams, Separation Seams)
- Break Dependencies - use Sprout Method or Wrap Method
- Add characterization tests (capture current behavior)
- Build safety net: happy path + error cases + boundaries
- Then apply TDD for your changes
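A characterization test pins down what legacy code does today, whether or not that behavior is correct. The sketch below uses a made-up `legacy_price` function as a stand-in for untested code; the expected values are assumed to have been recorded by running it, not taken from a spec.

```python
def legacy_price(quantity: int, unit_price: float) -> float:
    """Stand-in for untested legacy code whose behavior must not change."""
    total = quantity * unit_price
    if quantity > 10:
        total = total * 0.9  # undocumented bulk discount
    return round(total, 2)


def test_characterize_legacy_price():
    # Expected values were recorded from the current behavior.
    assert legacy_price(1, 5.0) == 5.0    # happy path
    assert legacy_price(10, 5.0) == 50.0  # boundary: no discount at exactly 10
    assert legacy_price(11, 5.0) == 49.5  # boundary: discount starts at 11
    assert legacy_price(0, 5.0) == 0.0    # degenerate input
```

Once these pass, they form the safety net: any later change that alters one of these values is a behavior change and should be deliberate.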
→ See `references/examples.md` for full code examples of each mode.
Step 3: The Core TDD Loop
Before Starting: Scenario List
List all behaviors to cover:
- Happy path cases
- Edge cases and boundaries
- Error/failure cases
- Pessimism: 3 ways this could fail (network, null, invalid state)
🔴 RED Phase
- Write ONE test (single behavior or edge case)
- Use AAA: Arrange → Act → Assert
- Run test, verify it FAILS for expected reason
Checks:
- Is failure an assertion error? (Not `SyntaxError`/`ModuleNotFoundError`)
- Can I explain why this should fail?
- If test passes immediately → STOP. Test is broken or feature exists.
🟢 GREEN Phase
- Write minimal code to pass
- Do NOT implement "perfect" solution
- Verify test passes
Checks:
- Is this the simplest solution?
- Can I delete any of this code and still pass?
🔵 REFACTOR Phase
- Look for duplication, unclear names, magic values
- Clean up without changing behavior
- Verify tests still pass
Repeat
Select next scenario, return to RED.
Triangulation: If implementation is too specific (hardcoded), write another test with different inputs to force generalization.
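Triangulation can be illustrated with two versions of a toy function. The names `add_v1` and `add_v2` are made up for this sketch. The first is the minimal GREEN answer to a single test, and the second is what a differently parameterized test forces.

```python
def add_v1(a: int, b: int) -> int:
    """GREEN after the first test: a hardcoded answer is enough to pass."""
    return 5


def add_v2(a: int, b: int) -> int:
    """GREEN after the second test: the hardcoded value no longer survives."""
    return a + b


def first_test(add) -> bool:
    return add(2, 3) == 5


def triangulating_test(add) -> bool:
    # Different inputs expose the hardcoded implementation.
    return add(10, 4) == 14
```

`add_v1` passes `first_test` but fails `triangulating_test`, which is the signal to generalize.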
Stop Conditions
| Signal | Response |
|---|---|
| Test passes immediately | Check assertions, verify feature isn't already built |
| Test fails for wrong reason | Fix setup/imports first |
| Flaky test | STOP. Fix non-determinism immediately |
| Slow feedback (>5s) | Optimize or mock external calls |
| Coverage decreased | Add tests for uncovered paths |
Test Distribution: The Testing Trophy
The Testing Trophy (Kent C. Dodds) reflects modern testing reality: integration tests give the best confidence-to-effort ratio.
        _____________
       /   System    \       ← Few, slow, high confidence; brittle (E2E)
      /_______________\
     /                 \
    /    Integration    \    ← Real interactions between units; **BEST ROI** (Integration)
    \                   /
     \_________________/
       \    Unit     /       ← Fast & cheap but test in isolation (Unit)
        \___________/
        /   Static  \        ← Typecheck, linting: typos/types (Static)
       /_____________\
Layer Breakdown
| Layer | What | Tools | When |
|---|---|---|---|
| Static | Type errors, syntax, linting | TypeScript, ESLint | Always on, catches 50%+ of bugs for free |
| Unit | Pure functions, algorithms, utilities | vitest, jest, pytest | Isolated logic with no dependencies |
| Integration | Components + hooks + services together | Testing Library, MSW, Testcontainers | Real user flows, real(ish) data |
| E2E | Full app in browser | Playwright, Cypress | Critical paths only (login, checkout) |
Why Integration Tests Win
Unit tests prove code works in isolation. Integration tests prove code works together.
| Concern | Unit Test | Integration Test |
|---|---|---|
| Component renders | ✅ | ✅ |
| Component + hook works | ❌ | ✅ |
| Component + API works | ❌ | ✅ |
| User flow works | ❌ | ✅ |
| Catches real bugs | Sometimes | Usually |
The insight: Most bugs live in the seams between modules, not inside pure functions. Integration tests catch seam bugs; unit tests don't.
Practical Guidance
- Start with integration tests - Test the way users use your code
- Drop to unit tests for complex algorithms or edge cases
- Use E2E sparingly - Slow, flaky, expensive to maintain
- Let static analysis do the heavy lifting - TypeScript catches more bugs than most unit tests
- Prefer fakes over mocks - Fakes have real behavior; mocks just return canned data
- SMURF quality: Sustainable, Maintainable, Useful, Resilient, Fast
Anti-Patterns
| Pattern | Problem | Fix |
|---|---|---|
| Mirror Blindness | Same agent writes test AND code | State test intent before GREEN |
| Happy Path Bias | Only success scenarios | Include errors in Scenario List |
| Refactoring While Red | Changing structure with failing tests | Get to GREEN first |
| The Mockery | Over-mocking hides bugs | Prefer fakes or real implementations |
| Coverage Theater | Tests without meaningful assertions | Assert behavior, not lines |
| Multi-Test Step | Multiple tests before implementing | One test at a time |
| Verification Trap 🤖 | AI tests what the code does, not what it should do | State intent in plain language; separate agent review |
| Test Exploitation 🤖 | LLMs exploit weak assertions or overload operators | Use PBT alongside examples; strict equality |
| Assertion Omission 🤖 | Missing edge cases (null, undefined, boundaries) | Scenario list with errors; |
| Hallucinated Mock 🤖 | AI generates fake mocks without proper setup | Testcontainers for integration; real Fakes for unit |
Critical: Verify tests by (1) running them, (2) having separate agent review, (3) never trusting generated tests blindly.
Advanced Techniques
Use these techniques at specific points in your workflow:
| Technique | Use During | Purpose |
|---|---|---|
| Test Doubles | 🔴 RED phase | Isolate dependencies when writing tests |
| Property-Based Testing | 🔴 RED phase | Cover edge cases for complex logic |
| Contract Testing | 🔴 RED phase | Define API expectations between services |
| Snapshot Testing | 🔴 RED phase | Capture UI/response structure |
| Hermetic Testing | 🔵 Setup | Ensure test isolation and determinism |
| Mutation Testing | ✅ After GREEN | Validate test suite effectiveness |
| Coverage Analysis | ✅ After GREEN | Find untested code paths |
| Flaky Test Management | 🔧 Maintenance | Fix unreliable tests blocking CI |
Test Doubles (Use: Writing Tests with Dependencies)
When: Your code depends on something slow, unreliable, or complex (DB, API, filesystem).
| Type | Purpose | When |
|---|---|---|
| Stub | Returns canned answers | Need specific return values |
| Mock | Verifies interactions | Need to verify calls made |
| Fake | Simplified implementation | Need real behavior without cost |
| Spy | Records calls | Need to observe without changing |
Decision: Dependency slow/unreliable? → Fake (complex) or Stub (simple). Need to verify calls? → Mock/Spy. Otherwise → real implementation.
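The difference between a fake, a mock, and a stub can be sketched with the standard library's `unittest.mock`. The `register` function, `FakeUserRepo`, and the mailer interface are hypothetical and exist only to show each double in use.

```python
from unittest.mock import Mock


class FakeUserRepo:
    """Fake: a working in-memory implementation, no database needed."""

    def __init__(self):
        self._users = {}

    def save(self, user_id: str, email: str) -> None:
        self._users[user_id] = email

    def get_email(self, user_id: str):
        return self._users.get(user_id)


def register(repo, mailer, user_id: str, email: str) -> None:
    """Hypothetical code under test: store the user, then send a welcome mail."""
    repo.save(user_id, email)
    mailer.send(email, "Welcome!")


def test_register_stores_user_and_sends_mail():
    repo = FakeUserRepo()  # fake: real behavior, cheap
    mailer = Mock()        # mock: lets us verify the interaction
    register(repo, mailer, "u1", "a@example.com")
    assert repo.get_email("u1") == "a@example.com"
    mailer.send.assert_called_once_with("a@example.com", "Welcome!")


def test_stub_returns_canned_answer():
    clock = Mock()
    clock.now.return_value = 1_700_000_000  # stub: canned return value
    assert clock.now() == 1_700_000_000
```

The fake is asserted through its state, while the mock is asserted through the calls it received. That is the practical distinction behind the decision rule above.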
→ See `references/examples.md` → Test Double Examples
Hermetic Testing (Use: Test Environment Setup)
When: Setting up test infrastructure. Tests must be isolated and deterministic.
Principles:
- Isolation: Unique temp directories/state per test
- Reset: Clean up in setUp/tearDown
- Determinism: No time-based logic or shared mutable state
Database Strategies:
| Strategy | Speed | Fidelity | Use When |
|---|---|---|---|
| In-memory (SQLite) | Fast | Low | Unit tests, simple queries |
| Testcontainers | Medium | High | Integration tests |
| Transactional Rollback | Fast | High | Tests sharing schema (80x faster than TRUNCATE) |
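Two of these ideas, per-test temp directories and transactional rollback, can be sketched with the standard library alone. `write_report` is a hypothetical function under test, and in-memory SQLite stands in for a real database here, so the fidelity caveat from the table still applies.

```python
import sqlite3
import tempfile
from pathlib import Path


def write_report(directory: Path, text: str) -> Path:
    """Hypothetical code under test that touches the filesystem."""
    path = directory / "report.txt"
    path.write_text(text)
    return path


def test_write_report_is_isolated():
    # Isolation + reset: a unique directory per test, removed on exit.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_report(Path(tmp), "ok")
        assert path.read_text() == "ok"
    assert not path.exists()


def test_rollback_leaves_schema_clean():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.commit()
    # Transactional rollback: the test's writes never become permanent.
    conn.execute("INSERT INTO users VALUES ('alice')")
    assert conn.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 1
    conn.rollback()
    assert conn.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 0
    conn.close()
```

In a real suite the rollback would normally live in a fixture or `tearDown`, so every test gets it without repeating the code.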
→ See `references/examples.md` → Hermetic Testing Examples
Property-Based Testing (Use: Writing Tests for Complex Logic)
When: Writing tests for algorithms, state machines, serialization, or code with many edge cases.
Tools: fast-check (JS/TS), Hypothesis (Python), proptest (Rust)
Properties to Test:
- Commutativity: `f(a, b) == f(b, a)`
- Associativity: `f(f(a, b), c) == f(a, f(b, c))`
- Identity: `f(a, identity) == a`
- Round-trip: `decode(encode(x)) == x`
- Metamorphic: If input changes by X, output changes by Y (useful when you don't know expected output)
How: Replace multiple example-based tests with one property test that generates random inputs.
Critical: Always log the seed on failure. Without it, you cannot reproduce the failing case.
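The round-trip property and the seed rule can be shown without any library. This is a hand-rolled sketch using only `random`. A real project would more likely use one of the tools listed above, which also handle input generation and shrinking. The `encode`/`decode` pair and `check_round_trip` are illustrative names.

```python
import random


def encode(text: str) -> bytes:
    return text.encode("utf-8")


def decode(data: bytes) -> str:
    return data.decode("utf-8")


def check_round_trip(seed: int, runs: int = 200) -> None:
    """Property: decode(encode(x)) == x for randomly generated strings."""
    rng = random.Random(seed)  # a seeded generator makes every run reproducible
    for _ in range(runs):
        length = rng.randint(0, 20)
        text = "".join(chr(rng.randint(32, 0x2FFF)) for _ in range(length))
        # The seed is part of the failure message so the case can be replayed.
        assert decode(encode(text)) == text, f"seed={seed} input={text!r}"
```

Calling `check_round_trip(seed)` again with the seed from a failure message regenerates exactly the same inputs.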
→ See `references/examples.md` → Property-Based Testing Examples
Mutation Testing (Use: Validating Test Quality)
When: After tests pass, to verify they actually catch bugs. Use for critical code (auth, payments) or before major refactors.
Tools: Stryker (JS/TS), PIT (Java), mutmut (Python)
How: Tool mutates your code (e.g., changes `>` to `>=`). If tests still pass → your tests are weak.
Interpretation:
- >80% mutation score = good test suite
- Survived mutants = tests don't catch those changes → add tests for these
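What a surviving mutant means can be shown by writing one by hand. In this sketch `is_over_limit` and its mutant are made-up functions. A mutation tool would generate the mutant automatically, and the two "suites" are reduced to boolean helpers for brevity.

```python
def is_over_limit(n: int) -> bool:
    return n > 100


def is_over_limit_mutant(n: int) -> bool:
    # Mutant: the comparison operator is swapped, as a mutation tool might do.
    return n >= 100


def weak_suite(fn) -> bool:
    """Passes for both versions, so the mutant survives."""
    return fn(150) is True and fn(50) is False


def strong_suite(fn) -> bool:
    """Adds the boundary case, which kills the mutant."""
    return weak_suite(fn) and fn(100) is False
```

The weak suite cannot tell the two functions apart. Adding the boundary input `100` is the test that a survived-mutant report asks for.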
Equivalent Mutant Problem: Some mutants change syntax but not behavior (e.g., `i < 10` → `i != 10` in a loop where `i` only increments). These can't be killed, so a 100% score is often impossible. Focus on surviving mutants in critical paths, not on chasing perfect scores.
When NOT to use: Tool-generated code (OpenAPI clients, Protobuf stubs, ORM models), simple DTOs/getters, legacy code with slow tests, or CI pipelines that must finish in <5 minutes. Use `--incremental --since main` for PR-focused runs. Note: This does NOT mean skipping mutation testing on code you (the agent) wrote; always validate your own work.
→ See `references/examples.md` → Mutation Testing Examples
Flaky Test Management (Use: CI/CD Maintenance)
When: Tests fail intermittently, blocking CI or eroding trust in the test suite.
Root Causes:
| Cause | Fix |
|---|---|
| Timing | Fake timers, await properly |
| Shared state | Isolate per test |
| Randomness | Seed or mock |
| Network | Use MSW or fakes |
| Order dependency | Make tests independent |
| Parallel transaction conflicts | Isolate DB connections per worker |
How: Detect (`--repeat 10`) → Quarantine (separate suite) → Fix root cause → Restore
Quarantine Rules:
- Issue-linked: Every quarantined test MUST link to a tracking issue. Prevents "quarantine-and-forget."
- Mute, don't skip: Prefer muting (runs but doesn't fail build) over skipping. You still collect failure data.
- Reintroduction criteria: Test must pass N consecutive runs (e.g., 100) on main before leaving quarantine.
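Detection by repetition and the "seed or mock" fix for randomness can be sketched as follows. `detect_flaky` and `pick_winner` are hypothetical helpers. Real test runners provide their own repeat options, so this only illustrates the idea.

```python
import random


def pick_winner(entries, rng=random):
    """Hypothetical code under test; rng is injectable so tests can seed it."""
    return rng.choice(entries)


def detect_flaky(test, repeats: int = 10) -> bool:
    """Return True if the test both passes and fails across repeated runs."""
    outcomes = set()
    for _ in range(repeats):
        try:
            test()
            outcomes.add("pass")
        except AssertionError:
            outcomes.add("fail")
    return len(outcomes) > 1


def test_seeded_winner_is_stable():
    first = pick_winner(["a", "b", "c"], random.Random(42))
    second = pick_winner(["a", "b", "c"], random.Random(42))
    assert first == second
```

Injecting the random source means production code keeps its real randomness while tests stay deterministic.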
→ See `references/examples.md` → Flaky Test Examples
Contract Testing (Use: Writing Tests for Service Boundaries)
When: Writing tests for code that calls or exposes APIs. Prevents integration breakage.
How (Pact): Consumer defines expected interactions → Contract published → Provider verifies → CI fails if contract broken.
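The consumer-driven flow can be illustrated without Pact itself. The sketch below is not Pact's API. `CONTRACT`, `provider_handle`, and `verify_contract` are made-up names that mimic the idea in plain Python: the consumer records the request it sends and the response shape it relies on, and the provider replays that request in its own test run.

```python
# Contract the consumer publishes: its request and the response shape it needs.
CONTRACT = {
    "request": {"method": "GET", "path": "/users/1"},
    "response": {"status": 200, "body_fields": {"id": int, "email": str}},
}


def provider_handle(method: str, path: str):
    """Hypothetical provider endpoint returning (status, body)."""
    if method == "GET" and path == "/users/1":
        return 200, {"id": 1, "email": "a@example.com", "name": "Ada"}
    return 404, {}


def verify_contract(contract, handler) -> list[str]:
    """Replay the consumer's request against the provider; return violations."""
    status, body = handler(**contract["request"])
    expected = contract["response"]
    errors = []
    if status != expected["status"]:
        errors.append(f"status {status} != {expected['status']}")
    for field, field_type in expected["body_fields"].items():
        if not isinstance(body.get(field), field_type):
            errors.append(f"field {field!r} missing or not {field_type.__name__}")
    return errors
```

Extra provider fields such as `name` are allowed. Only the fields the consumer depends on are checked, which lets the provider evolve without breaking the contract.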
→ See `references/examples.md` → Contract Testing Examples
Coverage Analysis (Use: Finding Gaps After Tests Pass)
When: After writing tests, to find untested code paths. NOT a goal in itself.
| Metric | Measures | Threshold |
|---|---|---|
| Line | Lines executed | 70-80% |
| Branch | Decision paths | 60-70% |
| Mutation | Test effectiveness | >80% |
Risk-Based Prioritization: P0 (auth, payments) → P1 (core logic) → P2 (helpers) → P3 (config)
Warning: High coverage ≠ good tests. Tests must assert meaningful behavior.
Snapshot Testing (Use: Writing Tests for UI/Output Structure)
When: Writing tests for UI components, API responses, or error message formats.
Appropriate: UI structure, API response shapes, error formats.
Avoid: Behavior testing, dynamic content, entire pages.
How: Capture output once, verify it doesn't change unexpectedly. Always review diffs carefully.
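The capture-then-compare cycle can be sketched in a few lines. `assert_matches_snapshot` and `error_response` are hypothetical helpers. Test frameworks with snapshot support provide their own equivalents, along with update and review tooling.

```python
import json
from pathlib import Path


def assert_matches_snapshot(value, snapshot_path: Path) -> None:
    """Write the snapshot on first run; compare against it afterwards."""
    rendered = json.dumps(value, indent=2, sort_keys=True)
    if not snapshot_path.exists():
        snapshot_path.write_text(rendered)  # first run: capture, then review it
        return
    expected = snapshot_path.read_text()
    assert rendered == expected, f"snapshot mismatch:\n{expected}\n!=\n{rendered}"


def error_response(code: int) -> dict:
    """Hypothetical code under test: a stable error format."""
    return {"error": {"code": code, "message": "Not found"}}
```

Sorting keys and fixing the indentation keep the stored file stable, so a diff shows only real changes in the output.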
→ See `references/examples.md` → Snapshot Testing Examples
Integration with Other Skills
| Task | Skill | Usage |
|---|---|---|
| Committing | | |
| Code Quality | | Run during REFACTOR phase |
| Documentation | | Check if behavior changes need docs |
References
Foundational:
- Three Rules of TDD - Robert C. Martin
- Test Pyramid - Martin Fowler
- Testing Trophy - Kent C. Dodds
- Working Effectively with Legacy Code - Michael Feathers