test-designer

Test Designer

Independent test-design orchestrator. Encodes Independent Evaluation: the agent writing the tests must not be the agent implementing the feature, and must not inherit the implementation's assumptions.

When to Use

  • TDD red phase for a complex / non-trivial feature (multi-file, multi-branch logic, new subsystem)
  • Requirement is ambiguous enough that the implementer's tests would likely rationalize the implementation instead of catching bugs
  • User explicitly asks for "independent test design" or "fresh-eyes tests", or runs `/test-designer`
Don't use for:
  • Trivial changes (one-line fix, rename) — just write the test inline
  • Bug reproduction tests — write directly from the bug report
  • Non-code changes (pure docs, pure config, pure prompt)

The Iron Law

The agent designing the tests must not carry the implementation's context. If you (the main Agent) are about to implement the feature, you are disqualified from designing its tests: dispatch the work to another agent.
Violating this produces tests that pass only because they mirror the buggy implementation.

Steps

Step 1: Assemble the dispatch package

Collect only these inputs — nothing else:
  1. Requirement description — "what to do" and acceptance criteria (not "how to do")
  2. Relevant code file paths — read-only access to the code the feature will touch or integrate with
  3. Edge case prompts — categories the dispatched agent should enumerate:
    • Boundary inputs (empty, max, min, off-by-one)
    • Concurrency / ordering (if applicable)
    • Resource lifecycle (cleanup on error, partial failure)
    • Invariants (data consistency, idempotency)
    • Adversarial inputs (malformed, oversized, mis-encoded)
Explicitly exclude:
  • The implementation plan or design you've been developing
  • Hints about which approach you've chosen
  • Code excerpts from a work-in-progress branch
  • Your own guesses about "the right way to test this"
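One way to enforce the "only these inputs" rule mechanically is to whitelist fields when building the package, so an implementation plan cannot ride along by accident. This is a hypothetical sketch: `DispatchPackage`, `buildDispatchPackage`, and the field names are illustrative, not part of any runtime's API.

```typescript
// Hypothetical Step 1 package; field names are illustrative.
const ALL_CATEGORIES = [
  "boundary",
  "concurrency",
  "lifecycle",
  "invariants",
  "adversarial",
] as const;
type EdgeCaseCategory = (typeof ALL_CATEGORIES)[number];

interface DispatchPackage {
  requirement: string; // "what to do", never "how to do"
  acceptanceCriteria: string[];
  codePaths: string[]; // read-only context for the dispatched agent
  edgeCaseCategories: EdgeCaseCategory[];
}

const ALLOWED_KEYS: readonly string[] = [
  "requirement",
  "acceptanceCriteria",
  "codePaths",
  "edgeCaseCategories",
];

// Copy only whitelisted fields; report everything else so a leaked
// implementation plan or WIP diff is visible instead of silently forwarded.
function buildDispatchPackage(raw: Record<string, unknown>): {
  pkg: DispatchPackage;
  dropped: string[];
} {
  const dropped = Object.keys(raw).filter((k) => !ALLOWED_KEYS.includes(k));
  const pkg: DispatchPackage = {
    requirement: String(raw["requirement"] ?? ""),
    acceptanceCriteria: (raw["acceptanceCriteria"] as string[] | undefined) ?? [],
    codePaths: (raw["codePaths"] as string[] | undefined) ?? [],
    // Default to asking for every edge case category.
    edgeCaseCategories:
      (raw["edgeCaseCategories"] as EdgeCaseCategory[] | undefined) ?? [...ALL_CATEGORIES],
  };
  return { pkg, dropped };
}

// Usage: the implementation plan is reported and never reaches the package.
const { pkg, dropped } = buildDispatchPackage({
  requirement: "Return a plugin install order that respects deps and detects cycles",
  acceptanceCriteria: ["cycles are reported", "missing deps are reported"],
  codePaths: ["src/plugins.ts"],
  implementationPlan: "DFS with a visited set", // leaks the chosen approach
});
// dropped is ["implementationPlan"]
```

Whitelisting is preferred over a blocklist here because the list of things that can leak implementation context is open-ended, while the list of allowed inputs is fixed.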

Step 2: Choose the executor

| Task shape | Executor | Reason |
|---|---|---|
| Complex, architectural implications | Independent Agent (e.g., `codex-agent` or `claude-code-agent` with a fresh session) | True zero-context isolation; can use the strongest model at the highest effort |
| Medium complexity, current conversation clean | In-conversation subagent | Cheaper; still acceptable if the main Agent hasn't yet proposed an implementation |
| Trivial | None: write tests inline | Not worth a dispatch |

Default to an Independent Agent when the main Agent has already discussed or sketched an implementation: subagent isolation within the same conversation doesn't undo prior context pollution.
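The table and the default rule reduce to a small decision function. This sketch assumes the task shape has already been judged; `chooseExecutor` and the string labels are hypothetical.

```typescript
// Hypothetical encoding of the Step 2 table plus the default rule.
type TaskShape = "complex" | "medium" | "trivial";
type Executor = "independent-agent" | "in-conversation-subagent" | "inline";

function chooseExecutor(shape: TaskShape, implementationAlreadyDiscussed: boolean): Executor {
  // Trivial work is not worth a dispatch: write the tests inline.
  if (shape === "trivial") return "inline";
  // Complex work always gets a fresh session. So does medium work once an
  // implementation has been discussed, because a same-conversation subagent
  // cannot undo that context pollution.
  if (shape === "complex" || implementationAlreadyDiscussed) return "independent-agent";
  // Medium complexity in a clean conversation: the cheaper subagent is acceptable.
  return "in-conversation-subagent";
}

const cleanMedium = chooseExecutor("medium", false); // "in-conversation-subagent"
const pollutedMedium = chooseExecutor("medium", true); // "independent-agent"
```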

Step 3: Dispatch with the strongest model and highest effort

Test design is a correctness-critical reasoning task, not a rote mechanical one. Use:
  • Model: the strongest reasoning model the runtime offers. Inherit it if the main Agent is already on that tier; otherwise override. Don't hardcode a specific brand name.
  • Effort: `xhigh` (the maximum level the runtime supports). Escalation ladder: `low` → `medium` → `high` → `xhigh`
  • Tools: Read / Grep / Glob on code paths; Write on test files only
  • Permission: read-only on non-test files; writable on test files
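Assuming the runtime exposes its supported effort levels as a list of strings (an assumption; runtimes differ), picking the top rung of the ladder can look like this. `maxSupportedEffort` and `DispatchSettings` are hypothetical names, and the globs simply restate the Tools and Permission bullets.

```typescript
// The escalation ladder from Step 3, lowest to highest.
const EFFORT_LADDER = ["low", "medium", "high", "xhigh"] as const;
type Effort = (typeof EFFORT_LADDER)[number];

// Walk the ladder from the top and return the highest rung the runtime
// supports. `supported` is assumed to be whatever list your runtime exposes.
function maxSupportedEffort(supported: readonly string[]): Effort {
  for (let i = EFFORT_LADDER.length - 1; i >= 0; i--) {
    if (supported.includes(EFFORT_LADDER[i])) return EFFORT_LADDER[i];
  }
  throw new Error("runtime exposes no known effort level");
}

// Hypothetical dispatch settings mirroring the Tools / Permission bullets.
interface DispatchSettings {
  effort: Effort;
  readOnlyGlobs: string[];
  writableGlobs: string[];
}

const settings: DispatchSettings = {
  effort: maxSupportedEffort(["low", "medium", "high"]), // "high" on a runtime without xhigh
  readOnlyGlobs: ["src/**"],
  writableGlobs: ["tests/**"],
};
```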
Example dispatch prompt skeleton:
You are designing failing tests for a feature. You will NOT see or write the
implementation. Your job is to produce executable tests that fail today and
pass only when the feature is correctly implemented.

Requirement:
<paste requirement description + acceptance criteria>

Code paths (read-only, for understanding context):
<list of file paths>

Existing test framework and conventions:
<infer from repo or specify>

Produce:
1. A test plan — enumerate the behaviors being tested (happy path + edge
   cases), grouped by category (boundary / concurrency / lifecycle /
   invariants / adversarial).
2. Executable test files that fail against the current code (or against
   an empty implementation).
3. For each test, one-line rationale explaining the bug it would catch.

Constraints:
- Do NOT propose an implementation.
- Do NOT edit files outside the test directory.
- Cover edge cases explicitly; don't only test the happy path.
- Use the project's existing test framework and style.
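If dispatches are assembled programmatically, a renderer that accepts only the Step 1 inputs keeps implementation details out by construction: it has no parameter for them. `renderDispatchPrompt` and `PromptInputs` are hypothetical; the text reproduces the skeleton above.

```typescript
// Hypothetical inputs: the Step 1 package plus the repo's test conventions.
interface PromptInputs {
  requirement: string;
  acceptanceCriteria: string[];
  codePaths: string[];
  testConventions: string; // inferred from the repo, or stated explicitly
}

function renderDispatchPrompt(p: PromptInputs): string {
  return [
    "You are designing failing tests for a feature. You will NOT see or write the",
    "implementation. Your job is to produce executable tests that fail today and",
    "pass only when the feature is correctly implemented.",
    "",
    "Requirement:",
    p.requirement,
    ...p.acceptanceCriteria.map((c) => `- Acceptance: ${c}`),
    "",
    "Code paths (read-only, for understanding context):",
    ...p.codePaths.map((path) => `- ${path}`),
    "",
    "Existing test framework and conventions:",
    p.testConventions,
    "",
    "Produce:",
    "1. A test plan enumerating the behaviors being tested (happy path + edge",
    "   cases), grouped by category (boundary / concurrency / lifecycle /",
    "   invariants / adversarial).",
    "2. Executable test files that fail against the current code (or against",
    "   an empty implementation).",
    "3. For each test, a one-line rationale explaining the bug it would catch.",
    "",
    "Constraints:",
    "- Do NOT propose an implementation.",
    "- Do NOT edit files outside the test directory.",
    "- Cover edge cases explicitly; don't only test the happy path.",
    "- Use the project's existing test framework and style.",
  ].join("\n");
}

const dispatchPrompt = renderDispatchPrompt({
  requirement: "Resolver takes a plugin manifest and returns install order respecting deps and detecting cycles.",
  acceptanceCriteria: ["handles transitive, diamond, and missing deps", "reports self-references and cycles"],
  codePaths: ["src/plugins.ts", "tests/"],
  testConventions: "Infer from the existing tests/ directory",
});
```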

Step 4: Validate the returned tests

Before handing the tests to the implementation phase:
  1. Run the tests: they should FAIL (red), and fail for the reason the rationale predicts. A test that fails on `ImportError`, a missing fixture, a syntax error, or "module not found" is fake red: it isn't actually exercising the behavior it claims to. Fix the test or drop it.
  2. Scan the rationale — does each test catch a distinct failure mode? Drop duplicates.
  3. Check coverage — are all edge case categories represented? Request additions if not.
  4. Confirm the test framework matches — ensure the dispatched agent used the right runner / assertion lib / fixtures.
  5. Check for shape-to-example tests: a test that asserts on specific happy-path values (e.g., "output equals exactly `[1, 2, 3]` for this fixture") is shaped to the example, not to the requirement. Any implementation that reproduces the fixture's output passes it, even one that hardcodes that output and is wrong for every other valid input. Replace it with property-style assertions ("output is sorted and contains all input elements"), or add a second test with a different input that exercises the same property.
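Item 1 can be partly automated by triaging failure messages before reading them. The patterns below are a heuristic assumption (common Node.js and pytest loader messages), not an exhaustive list; `classifyFailure` and `expectedReason` (a substring the rationale predicts will appear in the failure) are hypothetical.

```typescript
// Heuristic patterns for failures that happen before the behavior is exercised.
// A test that deliberately expects a SyntaxError (e.g. a parser test) would
// need special handling; this sketch does not cover that case.
const FAKE_RED_PATTERNS: RegExp[] = [
  /\bImportError\b/,
  /\bModuleNotFoundError\b/,
  /No module named/, // Python import failure
  /Cannot find module/, // Node.js module resolution failure
  /\bSyntaxError\b/,
  /fixture '.+' not found/, // pytest missing-fixture message
];

type RedVerdict = "fake-red" | "real-red" | "unexpected-failure";

function classifyFailure(message: string, expectedReason: string): RedVerdict {
  if (FAKE_RED_PATTERNS.some((re) => re.test(message))) return "fake-red";
  // Red for the predicted reason, or red for some other reason worth investigating.
  return message.includes(expectedReason) ? "real-red" : "unexpected-failure";
}

const verdicts = [
  classifyFailure("Error: Cannot find module '../src/resolver'", "CycleError"), // "fake-red"
  classifyFailure("AssertionError: expected CycleError for a -> b -> a", "CycleError"), // "real-red"
  classifyFailure("TypeError: Cannot read properties of undefined", "CycleError"), // "unexpected-failure"
];
```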

Step 5: Hand off to implementation

With the validated failing tests in place, implementation proceeds per the `test-driven-development` skill: write the minimal code that makes them pass (green), then run the regression tests.

Output Format (from the dispatched agent)

Require the agent to return:
A test plan (bullet list, grouped by category) followed by the test files. Each test must include a one-line rationale comment. No implementation code. No commentary on how to implement. If assumptions about the code are needed, list them explicitly at the top of the test file.
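If the dispatched agent writes Jest/Vitest-style tests, the rationale requirement can be checked mechanically. This assumes a hypothetical convention, a `// Rationale:` line directly above each `it(...)` or `test(...)` call; adapt the pattern to whatever convention you actually specify in the dispatch prompt.

```typescript
// Returns the names of tests that lack a `// Rationale:` comment on the line
// directly above them. Hypothetical convention; the regex is a rough sketch
// and does not handle quotes inside test names.
function testsMissingRationale(source: string): string[] {
  const lines = source.split("\n");
  const missing: string[] = [];
  lines.forEach((line, i) => {
    const m = line.match(/^\s*(?:it|test)\(\s*["'`](.+?)["'`]/);
    if (!m) return;
    const prev = i > 0 ? lines[i - 1].trim() : "";
    if (!prev.startsWith("// Rationale:")) missing.push(m[1]);
  });
  return missing;
}

const sample = [
  "// Rationale: catches resolvers that loop forever on a -> b -> a",
  'it("reports a two-node cycle", () => {});',
  'it("orders a diamond", () => {});',
].join("\n");

const missingRationale = testsMissingRationale(sample); // ["orders a diamond"]
```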

Anti-patterns

  • ❌ Main Agent writes the tests after sketching the implementation — tests will mirror the implementation's assumptions
  • ❌ Dispatching with medium effort / weaker model to save cost — test design quality compounds across the whole feature's lifetime
  • ❌ Passing the work-in-progress branch contents to the dispatched agent — defeats Independent Evaluation
  • ❌ Accepting tests that pass against an empty implementation — those tests don't constrain anything
  • ❌ Skipping Step 4 validation — unvalidated tests get merged as fake green
  • ❌ Accepting "shape-to-example" tests: a test that asserts on specific happy-path values from the requirement's example data is satisfied by any implementation that reproduces the fixture, including one that is wrong for every other input. Use property assertions (sorted, idempotent, contains-all-inputs) or pair the example test with a variant-input test that exercises the same invariant
  • ❌ Accepting fake red: a test that fails on `ImportError`, a missing fixture, or "module not found" looks red but isn't testing anything. Step 4 must verify the test fails for the reason the rationale predicts
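To make the shape-to-example anti-pattern concrete: a hardcoded function satisfies the example-value assertion, while a property check on a variant input exposes it. A minimal sketch using the sorting property from Step 4; `isSortedPermutationOf` and `hardcodedSort` are illustrative names.

```typescript
// Property check: output is sorted ascending and is a permutation of the input
// (same elements, same multiplicities).
function isSortedPermutationOf(output: number[], input: number[]): boolean {
  if (output.length !== input.length) return false;
  for (let i = 1; i < output.length; i++) {
    if (output[i - 1] > output[i]) return false; // not ascending
  }
  const counts = new Map<number, number>();
  for (const x of input) counts.set(x, (counts.get(x) ?? 0) + 1);
  for (const x of output) {
    const c = counts.get(x) ?? 0;
    if (c === 0) return false; // element missing from input or over-represented
    counts.set(x, c - 1);
  }
  return true;
}

// A "shape-to-example" implementation that only handles the fixture.
const hardcodedSort = (_xs: number[]): number[] => [1, 2, 3];
const realSort = (xs: number[]): number[] => [...xs].sort((a, b) => a - b);

const fixture = [3, 1, 2];
const variant = [5, 5, -1, 0];

// The example-value assertion is satisfied by the hardcoded version...
const exampleOk = JSON.stringify(hardcodedSort(fixture)) === JSON.stringify([1, 2, 3]); // true
// ...but the property assertion on a variant input exposes it.
const variantOk = isSortedPermutationOf(hardcodedSort(variant), variant); // false
const realOk = isSortedPermutationOf(realSort(variant), variant); // true
```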

Relationship to other skills

  • `brainstorming` → clarifies the requirement (upstream of `test-designer`)
  • `test-driven-development` → governs the red-green-refactor loop (downstream; consumes the failing tests)
  • `systematic-debugging` → kicks in if tests unexpectedly fail after implementation (downstream)
  • `verification-before-completion` → runs the tests at the "done" gate (downstream)

Example invocation

User: Starting work on the new plugin dependency resolver. Design tests first.
Assistant:
  1. Requirement: "Resolver takes a plugin manifest and returns install order
     respecting deps and detecting cycles. Must handle: transitive deps,
     diamond deps, self-references, missing deps, cycles."
  2. Code paths: src/plugins.ts, .claude/plugins.json schema, tests/ dir
  3. Dispatch to an independent-agent skill (fresh session) at `xhigh` effort,
     read-only on src/, writable on tests/
  4. Agent returns: test plan (5 categories, 18 tests), tests/resolver.test.ts
     with failing assertions + per-test rationale comments
  5. Main Agent runs tests → all red → validates rationale → hands off
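In the resolver example, diamond deps admit more than one valid install order, so a test that pins one exact sequence would be a shape-to-example test. A property check holds up better. This is a sketch: the `Manifest` shape (plugin name mapped to its dependency names) is an assumption, not the actual `.claude/plugins.json` schema, and `isValidInstallOrder` is a hypothetical helper that the generated tests might use.

```typescript
// Assumed manifest shape: plugin name -> names of the plugins it depends on.
type Manifest = Record<string, string[]>;

// Property assertion: every plugin is installed exactly once, and each
// dependency is installed strictly before the plugin that needs it.
function isValidInstallOrder(order: string[], manifest: Manifest): boolean {
  const position = new Map<string, number>();
  order.forEach((name, i) => position.set(name, i));
  const plugins = Object.keys(manifest);
  // Same length and no duplicates, so `order` must be a permutation of `plugins`.
  if (order.length !== plugins.length || position.size !== plugins.length) return false;
  for (const plugin of plugins) {
    const p = position.get(plugin);
    if (p === undefined) return false;
    for (const dep of manifest[plugin]) {
      const d = position.get(dep);
      // A missing dep or a self-reference (d === p) also fails here.
      if (d === undefined || d >= p) return false;
    }
  }
  return true;
}

const diamond: Manifest = { app: ["left", "right"], left: ["base"], right: ["base"], base: [] };

// Both diamond orderings are valid; an exact-order assertion would reject one of them.
const orderA = isValidInstallOrder(["base", "left", "right", "app"], diamond); // true
const orderB = isValidInstallOrder(["base", "right", "left", "app"], diamond); // true
const depAfterUser = isValidInstallOrder(["left", "base", "right", "app"], diamond); // false
```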