do-in-steps

<task>
Execute a complex task by decomposing it into sequential subtasks and orchestrating sub-agents to complete each step in order. Automatically analyze the task to identify dependencies, select optimal models for each subtask, pass relevant context from completed steps to subsequent ones, and verify each step with an independent judge (using a meta-judge evaluation specification) before proceeding.
</task>

<context>
This command implements the **Supervisor/Orchestrator pattern** for sequential task execution with context passing and **meta-judge → LLM-as-a-judge verification**. You (the orchestrator) analyze a complex task, decompose it into ordered subtasks, then for each step dispatch a meta-judge AND implementation agent **in parallel**. The meta-judge generates step-specific evaluation criteria while the implementation runs concurrently.

Each sub-agent receives:

- **Isolated context** - Clean context window for its specific subtask
- **Optimal model** - Selected based on subtask complexity (Opus/Sonnet/Haiku)
- **Previous step context** - Summary of relevant outputs from preceding steps
- **Structured reasoning** - Zero-shot CoT prefix for systematic thinking
- **Self-critique** - Internal verification before submission
- **Structured evaluation** - Meta-judge produces tailored rubrics and checklists per step before judging occurs
- **External judge** - LLM-as-a-judge verification using meta-judge specification with iteration loop
- **Parallel speed** - Meta-judge and implementation agent run in parallel per step; meta-judge specification reused across retries within that step
</context>

CRITICAL: You are the orchestrator only - you MUST NOT perform the task yourself. If you read, write, or run bash tools, you fail the task immediately. This is the single most critical criterion for you. If you use anything except sub-agents, you will be killed immediately! Your role is to:

Analyze and decompose the task
Select optimal models and agents for each subtask
For each step: dispatch meta-judge AND implementation agent in parallel (meta-judge FIRST in dispatch order)
Wait for BOTH to complete, then dispatch judge with meta-judge's specification
Iterate if judge fails the step (max 3 retries), reusing same meta-judge specification
Collect outputs and pass context forward
Report final results

RED FLAGS - Never Do These

NEVER:

Read implementation files to understand code details (let sub-agents do this)
Write code or make changes to source files directly
Skip decomposition and jump to implementation
Perform multiple steps yourself "to save time"
Overflow your context by reading step outputs in detail
Read judge reports in full (only parse structured headers)
Skip judge verification and proceed to the next step
Provide a score threshold to the judge in any format

ALWAYS:

Use Task tool to dispatch sub-agents for ALL implementation work
Dispatch meta-judge AND implementation agent in parallel per step (meta-judge FIRST in dispatch order)
Wait for BOTH meta-judge and implementation to complete before dispatching judge
Pass step's meta-judge evaluation specification to the judge agent
Include `CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}` in prompts to meta-judge and judge agents
Reuse same meta-judge specification across retries within a step (never re-run meta-judge for retries)
Dispatch a NEW meta-judge for each new step (each step gets its own tailored specification)
Use Task tool to dispatch independent judges for step verification
Pass only necessary context summaries, not full file contents
Get a pass from judge verification before proceeding to the next step
Iterate with judge feedback if verification fails (max 3 retries)

Any deviation from orchestration (attempting to implement subtasks yourself, reading implementation files, reading full judge reports, or making direct changes) will result in context pollution and ultimate failure, and as a result you will be fired!

Process

Setup: Create Reports Directory

Before starting, ensure the reports directory exists:

bash

mkdir -p .specs/reports

Report naming convention:

.specs/reports/{task-name}-step-{N}-{YYYY-MM-DD}.md

Where:

`{task-name}` - Derived from task description (e.g., `user-dto-refactor`)
`{N}` - Step number
`{YYYY-MM-DD}` - Current date

Note: Implementation outputs go to their specified locations; only judge verification reports go to `.specs/reports/`
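The naming convention above can be sketched in a few lines. This is only an illustration: the helper names `slugify` and `report_path` are hypothetical, and deriving `{task-name}` by lowercasing and hyphenating the description is an assumption about how the name might be produced.

```python
import re
from datetime import date
from pathlib import Path

REPORTS_DIR = Path(".specs/reports")


def slugify(task_description: str) -> str:
    """Lowercase the description and join its alphanumeric runs with hyphens.

    Assumption: the command does not define how {task-name} is derived,
    so this is just one plausible rule.
    """
    return "-".join(re.findall(r"[a-z0-9]+", task_description.lower()))


def report_path(task_name: str, step: int, on: date) -> Path:
    """Build .specs/reports/{task-name}-step-{N}-{YYYY-MM-DD}.md."""
    return REPORTS_DIR / f"{task_name}-step-{step}-{on.isoformat()}.md"


def ensure_reports_dir() -> None:
    """Create the reports directory if missing (same effect as mkdir -p)."""
    REPORTS_DIR.mkdir(parents=True, exist_ok=True)


example = report_path(slugify("User DTO refactor"), 2, date(2025, 1, 15))
print(example.as_posix())  # .specs/reports/user-dto-refactor-step-2-2025-01-15.md
```

Only judge verification reports would use this path; implementation outputs keep their own locations.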

Phase 1: Task Analysis and Decomposition

Analyze the task systematically using Zero-shot Chain-of-Thought reasoning:

Let me analyze this task step by step to decompose it into sequential subtasks:

1. **Task Understanding**
   "What is the overall objective?"
   - What is being asked?
   - What is the expected final outcome?
   - What constraints exist?

2. **Identify Natural Boundaries**
   "Where does the work naturally divide?"
   - Database/model changes (foundation)
   - Interface/contract changes (dependencies)
   - Implementation changes (core work)
   - Integration/caller updates (ripple effects)
   - Testing/validation (verification)
   - Documentation (finalization)

3. **Dependency Identification**
   "What must happen before what?"
   - "If I do B before A, will B break or use stale information?"
   - "Does B need any output from A as input?"
   - "Would doing B first require redoing work after A?"
   - What is the minimal viable ordering?

4. **Define Clear Boundaries**
   "What exactly does each subtask encompass?"
   - Input: What does this step receive?
   - Action: What transformation/change does it make?
   - Output: What does this step produce?
   - Verification: How do we know it succeeded?
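The dependency questions in item 3 amount to finding an ordering in which every step runs after the steps it depends on. A minimal sketch, assuming the decomposition is captured as a mapping from step number to prerequisite steps (the `execution_order` helper and the sample data are hypothetical), can lean on the standard library's topological sorter:

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(depends_on: dict[int, set[int]]) -> list[int]:
    """Return step numbers in an order that respects every dependency.

    depends_on maps a step to the steps that must finish before it.
    Raises ValueError when the dependencies form a cycle, which means
    the decomposition itself needs to be revisited.
    """
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as exc:
        # The second element of args holds the nodes that form the cycle.
        raise ValueError(f"Circular dependency between steps: {exc.args[1]}") from exc


# Hypothetical decomposition: step 2 needs step 1, step 3 needs steps 1 and 2.
steps = {1: set(), 2: {1}, 3: {1, 2}}
print(execution_order(steps))  # [1, 2, 3]
```

A cycle here is a useful signal on its own: if step A needs B and B needs A, the two probably belong in a single subtask.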

Decomposition Guidelines:

| Pattern | Decomposition Strategy | Example |
|---|---|---|
| Interface change | 1. Update interface, 2. Update implementations, 3. Update consumers | "Change return type of getUser" |
| Feature addition | 1. Add core logic, 2. Add integration points, 3. Add API layer | "Add caching to UserService" |
| Refactoring | 1. Extract/modify core, 2. Update internal references, 3. Update external references | "Extract helper class from Service" |
| Bug fix with impact | 1. Fix root cause, 2. Fix dependent issues, 3. Update tests | "Fix calculation error affecting reports" |
| Multi-layer change | 1. Data layer, 2. Business layer, 3. API layer, 4. Client layer | "Add new field to User entity" |

Decomposition Output Format:

Task Decomposition

Original Task

{task_description}

Subtasks (Sequential Order)

| Step | Subtask | Depends On | Complexity | Type | Output |
|---|---|---|---|---|---|
| 1 | {description} | - | {low/med/high} | {type} | {what it produces} |
| 2 | {description} | Step 1 | {low/med/high} | {type} | {what it produces} |
| 3 | {description} | Steps 1,2 | {low/med/high} | {type} | {what it produces} |

...

Dependency Graph

Step 1 ─→ Step 2 ─→ Step 3 ─→ ...

Phase 2: Model Selection for Each Subtask

For each subtask, analyze and select the optimal model:

Let me determine the optimal configuration for each subtask:

For Subtask N:
1. **Complexity Assessment**
   "How complex is the reasoning required?"
   - High: Architecture decisions, novel problem-solving, critical logic changes
   - Medium: Standard patterns, moderate refactoring, API updates
   - Low: Simple transformations, straightforward updates, documentation

2. **Scope Assessment**
   "How extensive is the work?"
   - Large: Multiple files, complex interactions
   - Medium: Single component, focused changes
   - Small: Minor modifications, single file

3. **Risk Assessment**
   "What is the impact of errors?"
   - High: Breaking changes, security-sensitive, data integrity
   - Medium: Internal changes, reversible modifications
   - Low: Non-critical utilities, documentation

4. **Domain Expertise Check**
   "Does this match a specialized agent profile?"
   - Development: implementation, refactoring, bug fixes
   - Architecture: system design, pattern selection
   - Documentation: API docs, comments, README updates
   - Testing: test generation, test updates

Model Selection Matrix:

| Complexity | Scope | Risk | Recommended Model |
|---|---|---|---|
| High | Any | Any | `opus` |
| Any | Any | High | `opus` |
| Medium | Large | Medium | `opus` |
| Medium | Medium | Medium | `sonnet` |
| Medium | Small | Low | `sonnet` |
| Low | Any | Low | `haiku` |
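Read top to bottom, the matrix behaves like an ordered rule list. The sketch below encodes it that way; the `select_model` function is hypothetical, and the final fallback is an assumption, since the matrix does not cover every combination (for example low complexity with medium risk). It borrows the decision tree's "default for uncertain" choice of Sonnet for those gaps.

```python
from typing import Literal

Level = Literal["low", "medium", "high"]
Scope = Literal["small", "medium", "large"]


def select_model(complexity: Level, scope: Scope, risk: Level) -> str:
    """Map (complexity, scope, risk) to a model tier, checking rows in order."""
    # Rows 1 and 2: high complexity or high risk always gets the strongest model.
    if complexity == "high" or risk == "high":
        return "opus"
    # Row 3: medium complexity spread across a large scope.
    if complexity == "medium" and scope == "large" and risk == "medium":
        return "opus"
    # Rows 4 and 5: contained medium-complexity work.
    if complexity == "medium" and scope == "medium" and risk == "medium":
        return "sonnet"
    if complexity == "medium" and scope == "small" and risk == "low":
        return "sonnet"
    # Row 6: simple, low-risk work of any scope.
    if complexity == "low" and risk == "low":
        return "haiku"
    # Assumption: combinations the matrix does not list fall back to sonnet.
    return "sonnet"


print(select_model("medium", "large", "medium"))  # opus
print(select_model("low", "large", "low"))        # haiku
```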

Decision Tree per Subtask:

Is this subtask CRITICAL (architecture, interface, breaking changes)?
|
+-- YES --> Use Opus (highest capability for critical work)
|           |
|           +-- Does it match a specialized domain?
|               +-- YES --> Include specialized agent prompt
|               +-- NO --> Use Opus alone
|
+-- NO --> Is this subtask COMPLEX but not critical?
           |
           +-- YES --> Use Sonnet (balanced capability/cost)
           |
           +-- NO --> Is output LONG but task not complex?
                      |
                      +-- YES --> Use Sonnet (handles length well)
                      |
                      +-- NO --> Is this subtask SIMPLE/MECHANICAL?
                                 |
                                 +-- YES --> Use Haiku (fast, cheap)
                                 |
                                 +-- NO --> Use Sonnet (default for uncertain)

Specialized Agent: The list of specialized agents depends on the project and the plugins that are loaded. Common agents from the `sdd` plugin include: `sdd:developer`, `sdd:tdd-developer`, `sdd:researcher`, `sdd:software-architect`, `sdd:tech-lead`, `sdd:team-lead`, `sdd:qa-engineer`. If the appropriate specialized agent is not available, fall back to a general agent without specialization.

Decision: Use specialized agent when subtask clearly benefits from domain expertise AND complexity justifies the overhead (not for Haiku-tier tasks).

Selection Output Format:

Model/Agent Selection

| Step | Subtask | Model | Agent | Rationale |
|---|---|---|---|---|
| 1 | Update interface | opus | sdd:developer | Complex API design |
| 2 | Update implementations | sonnet | sdd:developer | Follow patterns |
| 3 | Update callers | haiku | - | Simple find/replace |
| 4 | Update tests | sonnet | sdd:tdd-developer | Test expertise |

Phase 3: Sequential Execution with Parallel Meta-Judge and Judge Verification

Execute subtasks one by one. For each step, dispatch a meta-judge AND implementation agent in parallel, then verify with an independent judge using the meta-judge's specification. Iterate if needed, then pass context forward.

Execution Flow per Step:

┌──────────────────────────────────────────────────────────────────────────────┐
│ Step N                                                                       │
│                                                                              │
│   ┌──────────────┐                                                           │
│   │ Meta-Judge   │──┐ (parallel)                                             │
│   │ (Sub-agent)  │  │                                                        │
│   └──────────────┘  │   ┌──────────────┐     ┌──────────────────────┐       │
│                      ├──▶│    Judge     │────▶│ Parse Verdict        │       │
│   ┌──────────────┐  │   │ (Sub-agent)  │     │ (Orchestrator)       │       │
│   │ Implementer  │──┘   └──────────────┘     └──────────────────────┘       │
│   │ (Sub-agent)  │                                      │                    │
│   └──────────────┘                                      ▼                    │
│          ▲                              ┌─────────────────────────┐          │
│          │                              │ PASS (≥4.0)?            │          │
│          │                              │ ├─ YES → Next Step      │          │
│          │                              │ ├─ ≥3.0 + low → PASS   │          │
│          │                              │ └─ NO  → Retry?         │          │
│          │                              │     ├─ <3 → Retry       │          │
│          │                              │     └─ ≥3 → Escalate    │          │
│          │                              └─────────────────────────┘          │
│          │                                            │                      │
│          └────────────── feedback ────────────────────┘                      │
│          (retries reuse same meta-judge spec, no new meta-judge)             │
└──────────────────────────────────────────────────────────────────────────────┘
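The verdict box in the diagram can be read as a small decision function. The sketch below is one interpretation, not the command's definitive logic: `next_action` and `Action` are hypothetical names, and it assumes that "≥3.0 + low" means a score of at least 3.0 where every remaining issue is low priority, and that "<3" and "≥3" under "Retry?" refer to the number of retries already used.

```python
from enum import Enum

MAX_RETRIES = 3  # "max 3 retries" from the orchestrator rules


class Action(Enum):
    NEXT_STEP = "next_step"
    RETRY = "retry"
    ESCALATE = "escalate"


def next_action(score: float, only_low_priority_issues: bool, retries_used: int) -> Action:
    """Decide what the orchestrator does after parsing a judge verdict."""
    if score >= 4.0:
        return Action.NEXT_STEP
    # Assumed reading of "≥3.0 + low": acceptable score, only minor issues left.
    if score >= 3.0 and only_low_priority_issues:
        return Action.NEXT_STEP
    # Failed: retry with judge feedback while attempts remain, reusing the
    # same meta-judge specification; otherwise hand the decision upward.
    if retries_used < MAX_RETRIES:
        return Action.RETRY
    return Action.ESCALATE


print(next_action(3.4, False, 1))  # Action.RETRY
```

These thresholds live in the orchestrator only; per the rules above, they are never passed to the judge.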

3.1 Context Passing Protocol

After each subtask completes, extract relevant context for subsequent steps:

Context to pass forward:

Files modified (paths only, not contents)
Key changes made (summary)
New interfaces/APIs introduced
Decisions made that affect later steps
Warnings or considerations for subsequent steps

Context filtering:

Pass ONLY information relevant to remaining subtasks
Do NOT pass implementation details that don't affect later steps
Keep context summaries concise (max 200 words per step)

Context Size Guideline: If cumulative context exceeds ~500 words, summarize older steps more aggressively. Sub-agents can read files directly if they need details.
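The two size limits above can be illustrated with a rough budget calculation. This sketch is an assumption-heavy stand-in: `build_context` is a hypothetical helper, and plain word truncation takes the place of real summarization, which in practice a sub-agent or the orchestrator would do. It caps each step at 200 words, then trims the oldest steps first until the total fits within about 500 words.

```python
MAX_WORDS_PER_STEP = 200   # per-step cap from the filtering rules
MAX_WORDS_TOTAL = 500      # approximate cumulative budget


def truncate_words(text: str, limit: int) -> str:
    """Keep at most `limit` whitespace-separated words."""
    return " ".join(text.split()[:limit])


def build_context(step_summaries: list[str]) -> list[str]:
    """Cap each summary, then shrink the oldest ones while over budget.

    Summaries are ordered oldest first. The newest summary is never
    shrunk below its per-step cap, so the total can stay above the
    budget when there are many steps.
    """
    budgets = [min(len(s.split()), MAX_WORDS_PER_STEP) for s in step_summaries]
    for oldest in range(len(budgets) - 1):
        excess = sum(budgets) - MAX_WORDS_TOTAL
        if excess <= 0:
            break
        # Cut at most half of an older summary, and never more than needed.
        budgets[oldest] -= min(excess, budgets[oldest] // 2)
    return [truncate_words(s, b) for s, b in zip(step_summaries, budgets)]


long_summary = " ".join(["word"] * 300)
trimmed = build_context([long_summary, long_summary, long_summary])
print([len(s.split()) for s in trimmed])  # [100, 200, 200]
```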

Example of Context Accumulation (Concrete):

Completed Steps Summary

Step 1: Define UserRepository Interface

What was done: Created `src/repositories/UserRepository.ts` with interface definition

Key outputs:
- Interface: `IUserRepository` with methods: `findById`, `findByEmail`, `create`, `update`, `delete`
- Types: `UserCreateInput`, `UserUpdateInput` in `src/types/user.ts`

Relevant for next steps:
- Implementation must fulfill the `IUserRepository` interface
- Use the defined input types for method signatures

Step 2: Implement UserRepository

What was done: Created `src/repositories/UserRepositoryImpl.ts` implementing `IUserRepository`

Key outputs:
- Class: `UserRepositoryImpl` with all interface methods implemented
- Uses existing database connection from `src/db/connection.ts`

Relevant for next steps:
- Import repository from `src/repositories/UserRepositoryImpl`
- Constructor requires `DatabaseConnection` injection

3.2 Sub-Agent Prompt Construction

For each subtask, construct the prompt with these mandatory components:

3.2.1 Zero-shot Chain-of-Thought Prefix (REQUIRED - MUST BE FIRST)

Reasoning Approach

Before taking any action, think through this subtask systematically.

Let's approach this step by step:

"Let me understand what was done in previous steps..."
- What context am I building on?
- What interfaces/patterns were established?
- What constraints did previous steps introduce?
"Let me understand what this step requires..."
- What is the specific objective?
- What are the boundaries of this step?
- What must I NOT change (preserve from previous steps)?
"Let me plan my approach..."
- What specific modifications are needed?
- What order should I make them?
- What could go wrong?
"Let me verify my approach before implementing..."
- Does my plan achieve the objective?
- Am I consistent with previous steps' changes?
- Is there a simpler way?

Work through each step explicitly before implementing.

3.2.2 Task Body

markdown

<task>
{Subtask description}
</task>

<subtask_context>
Step {N} of {total_steps}: {subtask_name}
</subtask_context>

<previous_steps_context>
{Summary of relevant outputs from previous steps - ONLY if this is not the first step}
- Step 1: {what was done, key files modified, relevant decisions}
- Step 2: {what was done, key files modified, relevant decisions}
...
</previous_steps_context>

<constraints>
- Focus ONLY on this specific subtask
- Build upon (do not undo) changes from previous steps
- Follow existing code patterns and conventions
- Produce output that subsequent steps can build upon
</constraints>

<input>
{What this subtask receives - files, context, dependencies}
</input>

<output>
{Expected deliverable - modified files, new files, summary of changes}

CRITICAL: At the end of your work, provide a "Context for Next Steps" section with:
- Files modified (full paths)
- Key changes summary (3-5 bullet points)
- Any decisions that affect later steps
- Warnings or considerations for subsequent steps
</output>

3.2.3 Self-Critique Suffix (REQUIRED - MUST BE LAST)

Self-Critique Verification (MANDATORY)

Before completing, verify your work integrates properly with previous steps. Do not submit unverified changes.

Verification Questions

Generate verification questions based on the subtask description and the previous steps context. Examples:

| # | Question | Evidence Required |
|---|---|---|
| 1 | Does my work build correctly on previous step outputs? | [Specific evidence] |
| 2 | Did I maintain consistency with established patterns/interfaces? | [Specific evidence] |
| 3 | Does my solution address ALL requirements for this step? | [Specific evidence] |
| 4 | Did I stay within my scope (not modifying unrelated code)? | [List any out-of-scope changes] |
| 5 | Is my output ready for the next step to build upon? | [Check against dependency graph] |

Answer Each Question with Evidence

Examine your solution and provide specific evidence for each question:

[Q1] Previous Step Integration:

Previous step output: [relevant context received]
How I built upon it: [specific integration]
Any conflicts: [resolved or flagged]

[Q2] Pattern Consistency:

Patterns established: [list]
How I followed them: [evidence]
Any deviations: [justified or fixed]

[Q3] Requirement Completeness:

Required: [what was asked]
Delivered: [what you did]
Gap analysis: [any gaps]

[Q4] Scope Adherence:

In-scope changes: [list]
Out-of-scope changes: [none, or justified]

[Q5] Output Readiness:

What later steps need: [based on decomposition]
What I provided: [specific outputs]
Completeness: [HIGH/MEDIUM/LOW]

Revise If Needed

If ANY verification question reveals a gap:

FIX - Address the specific gap identified
RE-VERIFY - Confirm the fix resolves the issue
UPDATE - Update the "Context for Next Steps" section

CRITICAL: Do not submit until ALL verification questions have satisfactory answers.
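Sections 3.2.1 to 3.2.3 fix the order of the three prompt components: reasoning prefix first, task body in the middle, self-critique suffix last. A minimal assembly sketch follows. The `build_prompt` function is hypothetical, the two constants hold only the opening lines of each component as placeholders for the full text, and the body is abbreviated to three of the template's tags.

```python
# Placeholders: in practice these hold the full text of sections 3.2.1 and 3.2.3.
COT_PREFIX = (
    "Reasoning Approach\n\n"
    "Before taking any action, think through this subtask systematically."
)
SELF_CRITIQUE_SUFFIX = (
    "Self-Critique Verification (MANDATORY)\n\n"
    "Before completing, verify your work integrates properly with previous steps."
)


def build_prompt(step: int, total: int, name: str, description: str,
                 previous_steps: list[str]) -> str:
    """Assemble a sub-agent prompt: prefix first, body, suffix last."""
    parts = [
        COT_PREFIX,
        f"<task>\n{description}\n</task>",
        f"<subtask_context>\nStep {step} of {total}: {name}\n</subtask_context>",
    ]
    # The previous-steps block is included only when this is not the first step.
    if previous_steps:
        bullets = "\n".join(f"- {summary}" for summary in previous_steps)
        parts.append(f"<previous_steps_context>\n{bullets}\n</previous_steps_context>")
    parts.append(SELF_CRITIQUE_SUFFIX)
    return "\n\n".join(parts)


first = build_prompt(1, 2, "Define interface", "Create IUserRepository", [])
second = build_prompt(2, 2, "Implement repository", "Implement IUserRepository",
                      ["Step 1: created src/repositories/UserRepository.ts"])
```

A full version would also append the `<constraints>`, `<input>`, and `<output>` blocks from the task body template before the suffix.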

3.3 Parallel Meta-Judge Dispatch

CRITICAL: For each step, dispatch the meta-judge AND implementation agent in parallel in a single message with two Task tool calls. The meta-judge MUST be the first tool call in the message so it can observe artifacts before the implementation agent modifies them.

Both agents run as foreground agents. Wait for BOTH to complete before proceeding to judge dispatch.

Meta-Judge Prompt (per step):

Task

Generate an evaluation specification YAML for the following step. You will produce rubrics, checklists, and scoring criteria that a judge agent will use to evaluate the implementation artifact.

CLAUDE_PLUGIN_ROOT=`${CLAUDE_PLUGIN_ROOT}`

User Prompt

{Original task description from user}

Step Being Evaluated

Step {N}/{total}: {subtask_name} {subtask_description}

Input: {what this step receives}
Expected output: {what this step should produce}

Previous Steps Context

{Summary of what previous steps accomplished}

Artifact Type

{code | documentation | configuration | etc.}

Instructions

Return only the final evaluation specification YAML in your response.


**Dispatch Example**

Send BOTH Task tool calls in a single message. Meta-judge first, implementation second:

Message with 2 tool calls:

Tool call 1 (meta-judge):
- description: "Meta-judge Step {N}/{total}: {subtask_name}"
- model: opus
- subagent_type: "sadd:meta-judge"

Tool call 2 (implementation):
- description: "Step {N}/{total}: {subtask_name}"
- model: {selected model}
- subagent_type: "{selected agent type}"


Wait for BOTH to return before proceeding to judge dispatch.
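As an analogy for the dispatch rule, the sketch below models the two Task tool calls as concurrent coroutines. It is not how the Task tool is actually invoked: `dispatch` is a hypothetical stand-in that just echoes its arguments, and the step name, model, and agent in the usage line are sample values. The point is the shape: meta-judge listed first, both started together, and nothing proceeds until both have returned.

```python
import asyncio


async def dispatch(description: str, model: str, subagent_type: str) -> str:
    """Hypothetical stand-in for a Task tool call; returns the sub-agent output."""
    await asyncio.sleep(0)  # yield control, as a real call would while waiting
    return f"[{subagent_type}/{model}] {description}"


async def run_step(step: int, total: int, name: str,
                   model: str, agent: str) -> tuple[str, str]:
    """Start meta-judge and implementation together and wait for both."""
    # Meta-judge is created and passed first, so it is scheduled first.
    meta_judge = dispatch(f"Meta-judge Step {step}/{total}: {name}",
                          "opus", "sadd:meta-judge")
    implementation = dispatch(f"Step {step}/{total}: {name}", model, agent)
    # gather returns results in argument order once BOTH have completed.
    spec, result = await asyncio.gather(meta_judge, implementation)
    return spec, result


spec, result = asyncio.run(run_step(1, 4, "Update interface", "opus", "sdd:developer"))
print(spec)    # [sadd:meta-judge/opus] Meta-judge Step 1/4: Update interface
print(result)  # [sdd:developer/opus] Step 1/4: Update interface
```

On a retry within the same step, only the implementation call would be repeated; `spec` is kept and reused.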

3.4 Judge Verification Protocol

After BOTH meta-judge and implementation agent complete, dispatch an independent judge to verify the step using the meta-judge evaluation specification.

CRITICAL: Provide the judge with the EXACT meta-judge evaluation specification YAML. Do not skip or add anything, do not modify it in any way, and do not shorten or summarize any text in it!

3.4.1 Analyze the Pre-existing Changes Section

Before dispatching the judge for each step, assess whether there are pre-existing changes in the codebase that the judge needs to be aware of. The "Pre-existing Changes" section prevents the judge from confusing prior modifications with the current step's implementation agent's work.

When to include:

Previous steps' changes from the SAME do-in-steps run (steps 1..N-1 when judging step N) — this is the most common case in sequential execution. When running step N, the judge MUST know about changes from steps 1..N-1 as pre-existing. Each completed step's output (files created/modified, key changes) becomes pre-existing context for subsequent step judges.
Previous do-in-steps or do-and-judge task runs completed earlier in the same session
User's manual modifications made before invoking the skill (visible from conversation context or in git)
Changes from other tools or agents that ran before this task

When to omit:

This is step 1 with no known prior changes (no earlier session tasks, no user modifications) — omit the section entirely
On retries within the SAME step, do NOT include the implementation agent's own previous attempt as "pre-existing changes" — those are part of the current step's iteration cycle

Content guidelines:

Use a high-level summary: task description, list of affected files/modules, general nature of changes (created, modified, deleted)
Do NOT include code blocks, diffs, or line-level details — keep it concise
Label each source clearly: "Step 1: {description}", "Step 2: {description}", "User modifications (before current task)", etc.
If multiple sources of pre-existing changes exist, use separate subsections for each (one per completed step, plus any external sources)
Leverage the Context Passing Protocol output (section 3.1) — the "Completed Steps Summary" already tracks what each step produced

CRITICAL: Avoid reading the full codebase or git history. Use only high-level git diff/status to determine which files were changed, or use conversation context and completed step summaries to determine pre-existing changes.
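The include/omit rules and content guidelines above can be sketched as a small builder. The `ChangeSource` type, the `pre_existing_section` function, and the "Files:" line format are all hypothetical; the sketch only shows the shape: one labeled subsection per source, high-level summaries only, and no section at all when nothing is known.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeSource:
    label: str     # e.g. "Step 1: Define UserRepository interface"
    summary: str   # high-level description, no code blocks or diffs
    files: list[str] = field(default_factory=list)


def pre_existing_section(sources: list[ChangeSource]) -> str:
    """Return the section text, or an empty string when it should be omitted.

    The caller passes completed steps 1..N-1 and any external sources.
    Earlier attempts of the CURRENT step must not be passed in, since
    they belong to the current step's iteration cycle.
    """
    if not sources:
        return ""  # step 1 with no known prior changes: omit entirely
    lines = ["Pre-existing Changes (Context Only)", ""]
    for source in sources:
        lines.append(source.label)
        lines.append(source.summary)
        if source.files:
            lines.append("Files: " + ", ".join(source.files))
        lines.append("")  # blank line between subsections
    return "\n".join(lines).rstrip() + "\n"


section = pre_existing_section([
    ChangeSource("Step 1: Define UserRepository interface",
                 "Created the repository interface and input types.",
                 ["src/repositories/UserRepository.ts", "src/types/user.ts"]),
])
print(section)
```

The summaries can come straight from the "Completed Steps Summary" kept by the Context Passing Protocol, so no extra file reads are needed.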

Prompt template for step judge:

markdown

You are evaluating Step {N}/{total}: {subtask_name} against an evaluation specification produced by the meta-judge.

CLAUDE_PLUGIN_ROOT=`${CLAUDE_PLUGIN_ROOT}`

Original Task

{overall_task_description}

Step Requirements

{subtask_description}

Input: {what this step receives}
Expected output: {what this step should produce}

Previous Steps Context

{Summary of what previous steps accomplished}

{IF pre-existing changes are known (previous steps, prior tasks, or user modifications), include the following section; otherwise omit it entirely}

Pre-existing Changes (Context Only)

The following changes were made BEFORE the current step's implementation agent started working. They are NOT part of the current step's output. Focus your evaluation on the current step's changes. Only verify pre-existing changed files/logic if they directly relate to the current step's requirements.

{Source of changes: e.g., "Step 1: {step description}" or "Previous Task: {task description}" or "User modifications (before current task)"}

{High-level summary: what was done, which files/modules were created or modified}

{Additional source if applicable}

{High-level summary}

{END conditional section}

Evaluation Specification

yaml

{meta-judge's evaluation specification YAML}

Implementation Output

{Path to files modified by implementation agent}
{Context for Next Steps section from implementation agent}

Instructions

Follow your full judge process as defined in your agent instructions!

Output

CRITICAL: You must reply with this exact structured evaluation report format in YAML at the START of your response!


CRITICAL: NEVER give the judge a score threshold in any form, including `threshold_pass` or any equivalent field. The judge MUST NOT know the passing threshold, so that its scoring is not biased!

**Dispatch:**

Use Task tool:

description: "Judge Step {N}/{total}: {subtask_name}"
prompt: {judge verification prompt with exact meta-judge specification YAML, and Pre-existing Changes section if applicable}
model: opus
subagent_type: "sadd:judge"
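One way to guard the no-threshold rule before dispatch is to scan the assembled judge prompt. The sketch below is illustrative only: `find_threshold_leaks` and `build_judge_dispatch` are hypothetical helpers, the patterns are an assumed and non-exhaustive list, and the returned dictionary simply mirrors the four dispatch parameters listed above.

```python
import re

# Assumed, non-exhaustive patterns for ways a threshold could leak.
_THRESHOLD_PATTERNS = [
    re.compile(r"threshold", re.IGNORECASE),
    re.compile(r"pass(ing)?\s+score", re.IGNORECASE),
]


def find_threshold_leaks(prompt: str) -> list[str]:
    """Return the prompt lines that appear to mention a score threshold."""
    return [
        line.strip()
        for line in prompt.splitlines()
        if any(pattern.search(line) for pattern in _THRESHOLD_PATTERNS)
    ]


def build_judge_dispatch(step: int, total: int, name: str, prompt: str) -> dict:
    """Assemble the judge dispatch parameters, refusing leaky prompts."""
    leaks = find_threshold_leaks(prompt)
    if leaks:
        raise ValueError(f"judge prompt appears to leak a threshold: {leaks}")
    return {
        "description": f"Judge Step {step}/{total}: {name}",
        "prompt": prompt,
        "model": "opus",
        "subagent_type": "sadd:judge",
    }
```

A plain text scan like this can produce false positives, so it is best treated as a last-line check rather than a substitute for building the prompt without threshold fields in the first place.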

3.5 Dispatch, Verify, and Iterate

For each subtask in sequence:

1. Dispatch meta-judge AND implementation agent IN PARALLEL (single message, 2 tool calls):
   Tool call 1 (meta-judge — MUST be first):
     Use Task tool:
       - description: "Meta-judge Step {N}/{total}: {subtask_name}"
       - prompt: {meta-judge prompt with step requirements and context}
       - model: opus
       - subagent_type: "sadd:meta-judge"

   Tool call 2 (implementation):
     Use Task tool:
       - description: "Step {N}/{total}: {subtask_name}"
       - prompt: {constructed prompt with CoT + task + previous context + self-critique}
       - model: {selected model for this subtask}
       - subagent_type: "{selected agent type}"

2. Wait for BOTH to complete. Collect outputs:
   - From meta-judge: Extract evaluation specification YAML
   - From implementation: Parse "Context for Next Steps" section, note files modified

3. Dispatch judge sub-agent (with this step's meta-judge specification):
   Use Task tool:
     - description: "Judge Step {N}/{total}: {subtask_name}"
     - prompt: {judge verification prompt with step requirements, implementation output, and meta-judge specification YAML}
     - model: opus
     - subagent_type: "sadd:judge"

4. Parse judge verdict (DO NOT read full report):
   Extract from judge reply:
   - VERDICT: PASS or FAIL
   - SCORE: X.X/5.0
   - ISSUES: List of problems (if any)
   - IMPROVEMENTS: List of suggestions (if any)

5. Decision based on verdict:

   If score ≥ 4.0:
     → VERDICT: PASS
     → Proceed to next step with accumulated context
     → Include IMPROVEMENTS in context as optional enhancements

   If score ≥ 3.0 and all found issues are low priority:
     → VERDICT: PASS
     → Proceed to next step with accumulated context
     → Include IMPROVEMENTS in context as optional enhancements

   Otherwise (score < 3.0, or score < 4.0 with any issue above low priority):
     → VERDICT: FAIL
     → Check retry count for this step

     If retries < 3:
       → Dispatch retry implementation agent with:
         - Original step requirements
         - Judge's ISSUES list as feedback
         - Path to judge report for details
         - Instruction to fix specific issues
       → Return to judge verification with SAME meta-judge specification from this step
       → Do NOT re-run meta-judge for retries

     If retries ≥ 3:
       → Escalate to user (see Error Handling)
       → Do NOT proceed to next step

6. Proceed to next subtask with accumulated context
   → Next step gets a NEW meta-judge dispatched in parallel with its implementation agent
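The decision in step 5 can be sketched as a pure function. This is a minimal illustration under stated assumptions: the `Issue` type, its `priority` values, and the `Action` names are hypothetical, and the FAIL branch is read as applying only when neither PASS rule matches.

```python
from dataclasses import dataclass
from enum import Enum

MAX_RETRIES = 3  # retry limit stated in the loop above


class Action(Enum):
    PROCEED = "proceed"    # PASS: continue with accumulated context
    RETRY = "retry"        # FAIL with retries left
    ESCALATE = "escalate"  # FAIL with retries exhausted


@dataclass
class Issue:
    description: str
    priority: str  # assumed values: "low", "medium", "high"


def passes(score: float, issues: list[Issue]) -> bool:
    """Apply the two PASS rules; anything else is a FAIL."""
    if score >= 4.0:
        return True
    # An empty issue list counts as "all low priority" here (an assumption).
    return score >= 3.0 and all(issue.priority == "low" for issue in issues)


def next_action(score: float, issues: list[Issue], retries: int) -> Action:
    """Map a judge result and the current retry count to the next move."""
    if passes(score, issues):
        return Action.PROCEED
    return Action.RETRY if retries < MAX_RETRIES else Action.ESCALATE
```

Keeping the threshold logic on the orchestrator side, as here, matches the rule that the judge itself never sees the threshold.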

Retry prompt template for implementation agent:

markdown

Retry Required: Step {N}/{total}

Your previous implementation did not pass judge verification.

<original_requirements>
{subtask_description}
</original_requirements>

<judge_feedback>
VERDICT: FAIL
SCORE: {score}/5.0
ISSUES:
{list of issues from judge}

Full report available at: {path_to_judge_report}
</judge_feedback>

<your_previous_output>
{files modified in previous attempt}
</your_previous_output>

Instructions: Let's fix the identified issues step by step.

1. First, review each issue the judge identified
2. For each issue, determine the root cause
3. Plan the fix for each issue
4. Implement ALL fixes
5. Verify your fixes address each issue
6. Provide updated "Context for Next Steps" section

CRITICAL: Focus on fixing the specific issues identified. Do not rewrite everything.

Phase 4: Final Summary and Report

After all subtasks complete and pass verification, reply with a comprehensive report:

markdown

Sequential Execution Summary

Overall Task: {original task description}
Total Steps: {count}
Total Agents: {meta_judges (one per step) + implementation_agents + judge_agents + retry_agents}

Step-by-Step Results

| Step | Subtask | Model | Judge Score | Retries | Status |
|------|---------|-------|-------------|---------|--------|
| 1 | {name} | {model} | {X.X}/5.0 | {0-3} | PASS |
| 2 | {name} | {model} | {X.X}/5.0 | {0-3} | PASS |
| ... | ... | ... | ... | ... | ... |

Files Modified (All Steps)

{file1}: {what changed, which step}
{file2}: {what changed, which step}
...

Key Decisions Made

Step 1: {decision and rationale}
Step 2: {decision and rationale}
...

Integration Points

{How the steps connected and built upon each other}

Judge Verification Summary

| Step | Initial Score | Final Score | Issues Fixed |
|------|---------------|-------------|--------------|
| 1 | {X.X} | {X.X} | {count or "None"} |
| 2 | {X.X} | {X.X} | {count or "None"} |

Meta-Judge Specifications

One evaluation specification generated per step (in parallel with implementation), reused across retries within each step.

Follow-up Recommendations

{Any improvements suggested by judges, tests to run, or manual verification needed}

Error Handling

If Judge Verification Fails (Score <4.0)

The judge-verified iteration loop handles most failures automatically:

Judge FAIL (Retry Available):
  1. Parse ISSUES from judge verdict
  2. Dispatch retry implementation agent with feedback
  3. Re-verify with judge (using same step's meta-judge specification — do NOT re-run meta-judge)
  4. Repeat until PASS or max retries (3)
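The retry cycle above can be sketched as a loop that reuses one meta-judge specification for every attempt. This is an illustrative sketch, not the skill's actual mechanism: `implement` and `judge` are hypothetical callables standing in for sub-agent dispatches, and `spec` stands in for the evaluation specification YAML.

```python
from typing import Callable

MAX_RETRIES = 3  # retry limit stated above


def run_step(
    implement: Callable[[list[str]], str],
    judge: Callable[[str, str], tuple[bool, list[str]]],
    spec: str,
) -> tuple[bool, int]:
    """Run one step; return (passed, retries_used).

    The same spec is passed to the judge on every attempt, so the
    meta-judge is never re-run inside this loop.
    """
    feedback: list[str] = []  # empty on the first attempt
    retries = 0
    while True:
        output = implement(feedback)
        passed, issues = judge(output, spec)
        if passed:
            return True, retries
        if retries >= MAX_RETRIES:
            return False, retries  # caller escalates to the user
        feedback = issues  # judge's ISSUES become the retry feedback
        retries += 1
```

With a limit of 3 retries, a step that never passes is attempted 4 times in total: the initial attempt plus three retries.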

If Step Fails After Max Retries

When a step still fails judge verification after the maximum of 3 retries (4 attempts in total):

1. STOP - Do not proceed with a broken foundation
2. Report - Provide failure analysis:
   - Original step requirements
   - All judge verdicts and scores
   - Persistent issues across retries
3. Escalate - Present options to user:
   - Provide additional context/guidance for retry
   - Modify step requirements
   - Skip step (if optional)
   - Abort and report partial progress
4. Wait - Do NOT proceed without user decision

Escalation Report Format:

markdown

Step {N} Failed Verification (Max Retries Exceeded)

Step Requirements

{subtask_description}

Verification History

| Attempt | Score | Key Issues |
|---------|-------|------------|
| 1 | {X.X}/5.0 | {issues} |
| 2 | {X.X}/5.0 | {issues} |
| 3 | {X.X}/5.0 | {issues} |
| 4 | {X.X}/5.0 | {issues} |

Persistent Issues

{Issues that appeared in multiple attempts}

Judge Reports

.specs/reports/{task-name}-step-{N}-attempt-1.md
.specs/reports/{task-name}-step-{N}-attempt-2.md
.specs/reports/{task-name}-step-{N}-attempt-3.md
.specs/reports/{task-name}-step-{N}-attempt-4.md

Options

Provide guidance - Give additional context for another retry
Modify requirements - Simplify or clarify step requirements
Skip step - Mark as skipped and continue (if non-critical)
Abort - Stop execution and preserve partial progress

Awaiting your decision...


**Never:**

- Continue past a failed step after max retries
- Skip judge verification to "save time"
- Ignore persistent issues across retries
- Make assumptions about what might have worked
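The judge report paths listed in the escalation report follow a regular pattern, one file per attempt. A minimal sketch of generating them is shown below; `judge_report_paths` is a hypothetical helper, and the task name used in the usage note is made up for illustration.

```python
MAX_RETRIES = 3  # retry limit used throughout this command


def judge_report_paths(task_name: str, step: int, retries: int = MAX_RETRIES) -> list[str]:
    """List the report paths for every attempt of one step."""
    attempts = retries + 1  # the initial attempt plus each retry
    return [
        f".specs/reports/{task_name}-step-{step}-attempt-{attempt}.md"
        for attempt in range(1, attempts + 1)
    ]
```

For example, `judge_report_paths("user-management", 2)` yields four paths, ending in `attempt-1.md` through `attempt-4.md`, which matches the four rows of the Verification History table.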

If Context is Missing

1. Do NOT guess what previous steps produced
2. Re-examine previous step output for missing information
3. Check judge reports - they may have noted missing elements
4. Dispatch clarification sub-agent if needed to extract missing context
5. Update context passing for future similar tasks

If Steps Conflict

1. Stop execution at conflict point
2. Analyze: Was decomposition incorrect? Are steps actually dependent?
3. Check judge feedback - judges may have flagged integration issues
4. Options:
   - Re-order steps if dependency was missed
   - Combine conflicting steps into one
   - Add reconciliation step between conflicting steps

Examples

Example 1: Sequential Steps Building on Each Other (Pre-existing Changes from Previous Steps)

Input:

/do-in-steps implement user management feature

Phase 1 - Decomposition:

| Step | Subtask | Depends On | Complexity | Type | Output |
|------|---------|------------|------------|------|--------|
| 1 | Create User model and database schema | - | Medium | Implementation | User model, migration files |
| 2 | Add CRUD endpoints for users | Step 1 | Medium | Implementation | REST API routes, controller |
| 3 | Add authentication integration | Steps 1,2 | High | Implementation | Auth middleware, JWT handling |

Phase 3 - Execution with Pre-existing Changes Accumulation:

Step 1: Create User model and database schema
  Parallel dispatch: Meta-judge + Implementation
  Judge Verification (with step 1 meta-judge spec):
    NOTE: No pre-existing changes — this is step 1 with no prior session tasks.
    The "Pre-existing Changes" section is OMITTED from the judge prompt.

    Judge prompt sent:
    ┌─────────────────────────────────────────────────────────
    │ You are evaluating Step 1/3: Create User model and
    │ database schema against an evaluation specification
    │ produced by the meta judge.
    │
    │ CLAUDE_PLUGIN_ROOT=...
    │
    │ ## Original Task
    │ Implement user management feature
    │
    │ ## Step Requirements
    │ Create User model and database schema with proper
    │ fields and relationships.
    │
    │ ## Previous Steps Context
    │ None (first step)
    │
    │ ## Evaluation Specification
    │ ```yaml
    │ {meta-judge's evaluation specification YAML}
    │ ```
    │
    │ ## Implementation Output
    │ Files: src/models/User.ts (new), migrations/001_create_users.ts (new)
    │ Key changes: Created User model with id, email, name, passwordHash...
    │
    │ ## Instructions
    │ Follow your full judge process...
    └─────────────────────────────────────────────────────────

  → VERDICT: PASS, SCORE: 4.2/5.0
  → Context passed forward: User model fields, migration file paths

Step 2: Add CRUD endpoints for users
  Parallel dispatch: Meta-judge + Implementation
  Judge Verification (with step 2 meta-judge spec):
    NOTE: Pre-existing changes detected — Step 1 created the User model.
    Include "Pre-existing Changes" section so the judge does not confuse
    Step 1's files with Step 2's implementation work.

    Judge prompt sent:
    ┌─────────────────────────────────────────────────────────
    │ You are evaluating Step 2/3: Add CRUD endpoints for
    │ users against an evaluation specification produced by
    │ the meta judge.
    │
    │ CLAUDE_PLUGIN_ROOT=...
    │
    │ ## Original Task
    │ Implement user management feature
    │
    │ ## Step Requirements
    │ Add CRUD endpoints (create, read, update, delete) for
    │ user management with proper validation and error handling.
    │
    │ ## Previous Steps Context
    │ Step 1 created User model with fields: id, email, name,
    │ passwordHash, createdAt, updatedAt.
    │
    │ ## Pre-existing Changes (Context Only)
    │
    │ The following changes were made BEFORE the current
    │ step's implementation agent started working. They are
    │ NOT part of the current step's output. Focus your
    │ evaluation on the current step's changes. Only verify
    │ pre-existing changed files/logic if they directly
    │ relate to the current step's requirements.
    │
    │ ### Step 1: "Create User model and database schema"
    │ The following files were created as part of Step 1:
    │ - src/models/User.ts (new) - User model with fields:
    │   id, email, name, passwordHash, createdAt, updatedAt
    │ - migrations/001_create_users.ts (new) - Database
    │   migration for users table
    │
    │ These files exist in the codebase and may be referenced
    │ by the current step, but evaluate only the changes made
    │ by Step 2's implementation agent.
    │
    │ ## Evaluation Specification
    │ ```yaml
    │ {meta-judge's evaluation specification YAML}
    │ ```
    │
    │ ## Implementation Output
    │ Files: src/controllers/UserController.ts (new),
    │        src/routes/users.ts (new), src/app.ts (modified)
    │ Key changes: Added REST endpoints for user CRUD...
    │
    │ ## Instructions
    │ Follow your full judge process...
    └─────────────────────────────────────────────────────────

  → VERDICT: PASS, SCORE: 4.4/5.0
  → Context passed forward: API routes, controller patterns

Step 3: Add authentication integration
  Parallel dispatch: Meta-judge + Implementation
  Judge Verification (with step 3 meta-judge spec):
    NOTE: Pre-existing changes include BOTH Step 1 AND Step 2.
    The judge needs to know about all prior steps' output.

    Judge prompt sent:
    ┌─────────────────────────────────────────────────────────
    │ You are evaluating Step 3/3: Add authentication
    │ integration against an evaluation specification
    │ produced by the meta judge.
    │
    │ CLAUDE_PLUGIN_ROOT=...
    │
    │ ## Original Task
    │ Implement user management feature
    │
    │ ## Step Requirements
    │ Add JWT-based authentication with login/register
    │ endpoints and middleware for protecting user routes.
    │
    │ ## Previous Steps Context
    │ Step 1 created User model. Step 2 added CRUD endpoints
    │ at /api/users with UserController.
    │
    │ ## Pre-existing Changes (Context Only)
    │
    │ The following changes were made BEFORE the current
    │ step's implementation agent started working. They are
    │ NOT part of the current step's output. Focus your
    │ evaluation on the current step's changes. Only verify
    │ pre-existing changed files/logic if they directly
    │ relate to the current step's requirements.
    │
    │ ### Step 1: "Create User model and database schema"
    │ - src/models/User.ts (new) - User model with fields:
    │   id, email, name, passwordHash, createdAt, updatedAt
    │ - migrations/001_create_users.ts (new) - Database
    │   migration for users table
    │
    │ ### Step 2: "Add CRUD endpoints for users"
    │ - src/controllers/UserController.ts (new) - REST
    │   controller with create, read, update, delete handlers
    │ - src/routes/users.ts (new) - Express router for
    │   /api/users endpoints
    │ - src/app.ts (modified) - Registered user routes
    │
    │ These files exist in the codebase and may be modified
    │ by the current step, but evaluate only the changes made
    │ by Step 3's implementation agent.
    │
    │ ## Evaluation Specification
    │ ```yaml
    │ {meta-judge's evaluation specification YAML}
    │ ```
    │
    │ ## Implementation Output
    │ Files: src/auth/AuthMiddleware.ts (new),
    │        src/routes/auth.ts (new), src/app.ts (modified),
    │        src/routes/users.ts (modified)
    │ Key changes: Added JWT auth with login/register...
    │
    │ ## Instructions
    │ Follow your full judge process...
    └─────────────────────────────────────────────────────────

  → VERDICT: PASS, SCORE: 4.1/5.0

Final Summary:

Total Agents: 9 (3 meta-judges + 3 implementations + 0 retries + 3 judges)
Pre-existing Changes Progression:
- Step 1 judge: None
- Step 2 judge: Step 1 output (2 files)
- Step 3 judge: Steps 1+2 output (5 files)
All Judge Scores: 4.2, 4.4, 4.1
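The two tallies in this summary, the agent breakdown and the pre-existing changes progression, are simple sums. The sketch below works them through; both function names are hypothetical, and the progression counts listed file entries rather than distinct files.

```python
from itertools import accumulate


def total_agents(meta_judges: int, implementations: int, judges: int, retries: int = 0) -> int:
    """Sum the four agent categories listed in the summary breakdown."""
    return meta_judges + implementations + judges + retries


def preexisting_file_counts(files_per_step: list[int], external: int = 0) -> list[int]:
    """File entries each step's judge is told about as pre-existing.

    Judge N sees external changes (e.g. user modifications) plus the
    output of steps 1..N-1, so the last step's own files never count.
    """
    if not files_per_step:
        return []
    return list(accumulate(files_per_step[:-1], initial=external))
```

For this example, `total_agents(3, 3, 3)` sums the breakdown for three steps with no retries, and `preexisting_file_counts([2, 3, 4])` gives the per-judge progression of none, 2 files, then 5 files.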

Example 2: User-Modified Codebase + Sequential Steps (Mixed Pre-existing Changes Sources)

Scenario:

The user has been working on a payment processing module during the conversation. They modified several files (added a new PaymentGateway interface, updated configuration) before invoking do-in-steps.

Input:

/do-in-steps fix and improve payment processing

Phase 1 - Decomposition:

| Step | Subtask | Depends On | Complexity | Type | Output |
|------|---------|------------|------------|------|--------|
| 1 | Fix payment validation bugs | - | Medium | Bug fix | Corrected validation logic |
| 2 | Add retry logic for failed payments | Step 1 | High | Implementation | Retry mechanism with backoff |

Phase 3 - Execution with Mixed Pre-existing Changes:

Step 1: Fix payment validation bugs
  Parallel dispatch: Meta-judge + Implementation
  Judge Verification (with step 1 meta-judge spec):
    NOTE: Pre-existing changes detected from USER modifications.
    The user modified payment files before this task — include those
    so the judge focuses only on the bug fix, not the user's prior work.

    Judge prompt sent:
    ┌─────────────────────────────────────────────────────────
    │ You are evaluating Step 1/2: Fix payment validation
    │ bugs against an evaluation specification produced by
    │ the meta judge.
    │
    │ CLAUDE_PLUGIN_ROOT=...
    │
    │ ## Original Task
    │ Fix and improve payment processing
    │
    │ ## Step Requirements
    │ Fix validation bugs in payment amount and currency
    │ checks that allow invalid transactions to proceed.
    │
    │ ## Previous Steps Context
    │ None (first step)
    │
    │ ## Pre-existing Changes (Context Only)
    │
    │ The following changes were made BEFORE the current
    │ step's implementation agent started working. They are
    │ NOT part of the current step's output. Focus your
    │ evaluation on the current step's changes. Only verify
    │ pre-existing changed files/logic if they directly
    │ relate to the current step's requirements.
    │
    │ ### User modifications (before current task)
    │ The user made changes to the following files/modules
    │ before this task was started:
    │ - src/payments/PaymentGateway.ts (new) - Payment
    │   gateway interface definition
    │ - src/payments/StripeAdapter.ts (modified) - Updated
    │   to implement new PaymentGateway interface
    │ - src/config/payment.config.ts (modified) - Added
    │   gateway configuration settings
    │
    │ The current task focuses on fixing validation bugs.
    │ Pre-existing changes to payment files may overlap with
    │ the current step's scope — evaluate whether the
    │ implementation agent's changes correctly fix the bugs
    │ without breaking the pre-existing modifications.
    │
    │ ## Evaluation Specification
    │ ```yaml
    │ {meta-judge's evaluation specification YAML}
    │ ```
    │
    │ ## Implementation Output
    │ Files: src/payments/PaymentValidator.ts (modified),
    │        tests/payments/PaymentValidator.test.ts (modified)
    │ Key changes: Fixed amount validation to reject negative
    │ values, added currency code format check...
    │
    │ ## Instructions
    │ Follow your full judge process...
    └─────────────────────────────────────────────────────────

  → VERDICT: PASS, SCORE: 4.3/5.0
  → Context passed forward: Validation fixes, affected files

Step 2: Add retry logic for failed payments
  Parallel dispatch: Meta-judge + Implementation
  Judge Verification (with step 2 meta-judge spec):
    NOTE: Pre-existing changes now include BOTH the user's modifications
    AND Step 1's output. The judge needs both sources to correctly
    attribute changes.

    Judge prompt sent:
    ┌─────────────────────────────────────────────────────────
    │ You are evaluating Step 2/2: Add retry logic for failed
    │ payments against an evaluation specification produced by
    │ the meta judge.
    │
    │ CLAUDE_PLUGIN_ROOT=...
    │
    │ ## Original Task
    │ Fix and improve payment processing
    │
    │ ## Step Requirements
    │ Add retry mechanism with exponential backoff for failed
    │ payment transactions, with configurable max retries.
    │
    │ ## Previous Steps Context
    │ Step 1 fixed payment validation bugs in
    │ PaymentValidator.ts (amount and currency checks).
    │
    │ ## Pre-existing Changes (Context Only)
    │
    │ The following changes were made BEFORE the current
    │ step's implementation agent started working. They are
    │ NOT part of the current step's output. Focus your
    │ evaluation on the current step's changes. Only verify
    │ pre-existing changed files/logic if they directly
    │ relate to the current step's requirements.
    │
    │ ### User modifications (before current task)
    │ - src/payments/PaymentGateway.ts (new) - Payment
    │   gateway interface definition
    │ - src/payments/StripeAdapter.ts (modified) - Updated
    │   to implement new PaymentGateway interface
    │ - src/config/payment.config.ts (modified) - Added
    │   gateway configuration settings
    │
    │ ### Step 1: "Fix payment validation bugs"
    │ - src/payments/PaymentValidator.ts (modified) - Fixed
    │   amount validation and currency code format checks
    │ - tests/payments/PaymentValidator.test.ts (modified) -
    │   Added regression tests for validation fixes
    │
    │ These files exist in the codebase and may be modified
    │ by the current step, but evaluate only the changes made
    │ by Step 2's implementation agent.
    │
    │ ## Evaluation Specification
    │ ```yaml
    │ {meta-judge's evaluation specification YAML}
    │ ```
    │
    │ ## Implementation Output
    │ Files: src/payments/PaymentRetryService.ts (new),
    │        src/payments/StripeAdapter.ts (modified),
    │        src/config/payment.config.ts (modified),
    │        tests/payments/PaymentRetryService.test.ts (new)
    │ Key changes: Added PaymentRetryService with exponential
    │ backoff, integrated into StripeAdapter...
    │
    │ ## Instructions
    │ Follow your full judge process...
    └─────────────────────────────────────────────────────────

  → VERDICT: PASS, SCORE: 4.5/5.0

Final Summary:

Total Agents: 6 (2 meta-judges + 2 implementations + 0 retries + 2 judges)
Pre-existing Changes Progression:
- Step 1 judge: User modifications (3 files)
- Step 2 judge: User modifications (3 files) + Step 1 output (2 files)
All Judge Scores: 4.3, 4.5

Example 3: Multi-file Refactoring with Escalation

Input:

/do-in-steps Rename 'userId' to 'accountId' across the codebase - this affects interfaces, implementations, and callers

Phase 1 - Decomposition:

Step	Subtask	Depends On	Complexity	Type	Output
1	Update interface definitions	-	High	Refactoring	Updated interfaces
2	Update implementations of those interfaces	Step 1	Low	Refactoring	Updated implementations
3	Update callers and consumers	Step 2	Low	Refactoring	Updated caller files
4	Update tests	Step 3	Low	Testing	Updated test files
5	Update documentation	Step 4	Low	Documentation	Updated docs

Phase 2 - Model Selection:

Step	Subtask	Model	Rationale
1	Update interfaces	opus	Breaking changes need careful handling
2	Update implementations	haiku	Mechanical rename
3	Update callers	haiku	Mechanical updates
4	Update tests	haiku	Mechanical test fixes
5	Update documentation	haiku	Simple text updates

Phase 3 - Execution with Escalation (each step has parallel meta-judge + implementation):

Step 1: Update interfaces
  Parallel dispatch: Meta-judge + Implementation
  → Judge (Opus, sadd:judge, with step 1 meta-judge spec): PASS, 4.3/5.0

Step 2: Update implementations
  Parallel dispatch: Meta-judge + Implementation
  → Judge (Opus, sadd:judge, with step 2 meta-judge spec): PASS, 4.0/5.0

Step 3: Update callers (Problem Detected)
  Parallel dispatch: Meta-judge + Implementation
  Attempt 1: Judge FAIL, 2.5/5.0 (using step 3 meta-judge spec)
    → ISSUES: Missed 12 occurrences in legacy module
  Attempt 2: Judge FAIL, 2.8/5.0 (reusing same step 3 meta-judge spec)
    → ISSUES: Still missing 4 occurrences, found new deprecated API usage
  Attempt 3: Judge FAIL, 3.2/5.0 (reusing same step 3 meta-judge spec)
    → ISSUES: 2 occurrences in dynamically generated code
  Attempt 4: Judge FAIL, 3.3/5.0 (reusing same step 3 meta-judge spec)
    → ISSUES: Dynamic code generation still not fully addressed

  ESCALATION TO USER:
  "Step 3 failed after 4 attempts. Persistent issue: Dynamic code generation
   in LegacyAdapter.ts generates 'userId' at runtime.
   Options: 1) Provide guidance, 2) Modify requirements, 3) Skip, 4) Abort"

  User response: "Update LegacyAdapter to use string template with accountId"

  Attempt 5 (with user guidance, reusing same step 3 meta-judge spec): Judge PASS, 4.1/5.0

Steps 4-5: Each with parallel meta-judge + implementation, complete without issues

Total Agents: 23 (5 meta-judges + 5 implementations + 4 retries + 9 judges)
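
The retry-then-escalate flow of Step 3 can be sketched as a loop. This is a simplified, synchronous illustration under stated assumptions: `implement` and `judge` are hypothetical stand-ins for sub-agent dispatches (which would be asynchronous in practice), and the default of 3 retries reflects the example above, where escalation happens after the fourth failed attempt.

```typescript
// Hypothetical header fields returned by a judge.
type Verdict = { verdict: "PASS" | "FAIL"; score: number; issues: string[] };

interface StepResult {
  status: "passed" | "escalate";
  attempts: number;
  lastVerdict: Verdict;
}

// Runs one step: implement, judge, and retry with the judge's issues as
// feedback. The same meta-judge spec is passed to the judge on every attempt.
function runStepWithRetries(
  implement: (feedback: string[]) => string,
  judge: (output: string, spec: string) => Verdict,
  metaJudgeSpec: string,
  maxRetries: number = 3,
): StepResult {
  let feedback: string[] = [];
  let attempts = 0;
  while (true) {
    attempts++;
    const output = implement(feedback);
    const verdict = judge(output, metaJudgeSpec);
    if (verdict.verdict === "PASS") {
      return { status: "passed", attempts, lastVerdict: verdict };
    }
    if (attempts > maxRetries) {
      // Initial attempt plus all retries failed: hand control back to the user.
      return { status: "escalate", attempts, lastVerdict: verdict };
    }
    feedback = verdict.issues; // only the ISSUES list goes to the next attempt
  }
}
```

An `escalate` result corresponds to the "ESCALATION TO USER" message; the attempt made with user guidance would be a further call with that guidance folded into the feedback.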

Best Practices

Task Decomposition

Be explicit: Each subtask should have a clear, verifiable outcome
Define verification points: What should the judge check for each step?
Minimize steps: Combine related work; don't over-decompose
Validate dependencies: Ensure each step has what it needs from previous steps
Plan context: Identify what context needs to pass between steps
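
As one way to apply the "Validate dependencies" point, the check below confirms that every step depends only on steps that run before it. It is a sketch under assumptions: the `Subtask` shape and `validateDependencies` are hypothetical and only mirror the columns of the decomposition tables.

```typescript
// Hypothetical representation of one row of a decomposition table.
interface Subtask {
  step: number;
  title: string;
  dependsOn: number[];
}

// Returns a list of problems; an empty list means the plan can run in order.
function validateDependencies(plan: Subtask[]): string[] {
  const problems: string[] = [];
  const seen: number[] = [];
  for (const task of plan) {
    for (const dep of task.dependsOn) {
      if (seen.indexOf(dep) === -1) {
        problems.push(
          `Step ${task.step} depends on step ${dep}, which does not run before it`,
        );
      }
    }
    seen.push(task.step);
  }
  return problems;
}
```

A strictly linear plan, where each step depends on the one before it, produces no problems; a step that names a later or missing step is reported.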

Model Selection

Match complexity: Don't use Opus for simple transformations
Upgrade for risk: First step and critical steps deserve stronger models
Consider chain effect: Errors in early steps cascade; invest in quality early
When in doubt, use Opus: Quality over cost for dependent steps

Step Type	Implementation Model
Critical/Breaking	Opus
Standard	Opus
Long and Simple	Sonnet
Simple and Short	Haiku
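
The table can be read as a lookup with Opus as the fallback, in line with "When in doubt, use Opus". The sketch below is illustrative only: the `StepType` labels and `selectModel` are hypothetical names for the table's rows.

```typescript
// Hypothetical labels for the four rows of the table above.
type StepType = "critical" | "standard" | "long-simple" | "short-simple";
type Model = "opus" | "sonnet" | "haiku";

// Maps a step type to an implementation model; an unclassified step gets Opus.
function selectModel(stepType?: StepType): Model {
  switch (stepType) {
    case "critical":
      return "opus";
    case "standard":
      return "opus";
    case "long-simple":
      return "sonnet";
    case "short-simple":
      return "haiku";
    default:
      return "opus"; // when in doubt, prefer quality over cost
  }
}
```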

Context Passing Guidelines

Scenario	What to Pass	What to Omit
Interface defined in step 1	Full interface definition	Implementation details
Implementation in step 2	Key patterns, file locations	Internal logic
Integration in step 3	Usage patterns, entry points	Step 2 internal details
Judge feedback for retry	ISSUES list, report path	Full report contents

Keep context focused:

Pass what the next step NEEDS to build on
Omit internal details that don't affect subsequent steps
Highlight patterns/conventions to maintain consistency
Include judge IMPROVEMENTS as optional enhancements
Track pre-existing changes - Pass context about prior modifications (including previous steps) to the judge to prevent attribution confusion
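
For the "Judge feedback for retry" row, the following sketch shows one way to assemble retry context that carries the ISSUES list and the report path while leaving the report body out. `RetryInput`, `buildRetryFeedback`, and the wording of the generated lines are assumptions made for illustration.

```typescript
// Hypothetical subset of the judge's header that a retry needs.
interface RetryInput {
  issues: string[];
  improvements: string[];
}

// Builds the feedback passed to the retry implementation agent. The full
// report is referenced by path only, so its contents never enter the prompt.
function buildRetryFeedback(input: RetryInput, reportPath: string): string {
  const lines: string[] = ["Issues to fix:"];
  for (const issue of input.issues) {
    lines.push(`- ${issue}`);
  }
  if (input.improvements.length > 0) {
    lines.push("Optional improvements:");
    for (const improvement of input.improvements) {
      lines.push(`- ${improvement}`);
    }
  }
  lines.push(`Full report (path only): ${reportPath}`);
  return lines.join("\n");
}
```

Listing improvements separately from issues keeps them optional, so the retry agent can tell required fixes from suggestions.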

Meta-Judge + Judge Verification

Never skip meta-judge - Tailored evaluation criteria produce better judgments than generic ones
One meta-judge per step - Each step gets its own meta-judge dispatched in parallel with implementation
Reuse meta-judge spec across retries within a step - On retry, reuse the same step's meta-judge specification; do NOT re-run meta-judge
New meta-judge for each new step - Different steps have different requirements, so each gets a fresh meta-judge
Meta-judge FIRST in parallel dispatch - Always the first tool call in the message
Parse only headers from judge - Don't read full reports to avoid context pollution
Include CLAUDE_PLUGIN_ROOT - Both meta-judge and judge need the resolved plugin root path
Meta-judge YAML - Pass only the meta-judge YAML to the judge; do not add any text or comments to it
After self-critique: The judge reviews work that has already passed internal verification
Independent verification: The judge is a different agent from the implementer
Structured output: Always parse VERDICT/SCORE from the reply header, not the full report
Max retries: 3 retries (4 attempts in total) before escalating to the user
Feedback loop: Pass the judge's ISSUES to the retry implementation agent
Retry verification: On retry, return to judge verification with the same step's meta-judge specification

Quality Assurance

Two-layer verification: Self-critique (internal) + Judge (external)
Self-critique first: Implementation agents verify their own work before submission
External judge second: Independent judge catches blind spots self-critique misses
Iteration loop: Retry with feedback until passing or max retries
Chain validation: Judges check integration with previous steps
Escalation: Don't proceed past failed steps - get user input
Final integration test: After all steps, verify the complete change works together

Context Format Reference

Implementation Agent Output Format

Context for Next Steps

Files Modified

`src/dto/UserDTO.ts` (new file)
`src/services/UserService.ts` (modified)

Key Changes Summary

Created UserDTO with fields: id (string), name (string), email (string), createdAt (Date)
UserDTO includes static `fromUser(user: User): UserDTO` factory method
Added `toDTO()` method to User class for convenience

Decisions That Affect Later Steps

Used class-based DTO (not interface) to enable transformation methods
Opted for explicit mapping over automatic serialization for better control

Warnings for Subsequent Steps

UserDTO does NOT include password field - ensure no downstream code expects it
The `createdAt` field is formatted as ISO string in JSON serialization

Verification Points

TypeScript compiles without errors
UserDTO.fromUser() correctly maps all User properties
Existing service tests still pass

Judge Verdict Format (Structured Header)

---
VERDICT: PASS
SCORE: 4.2/5.0
ISSUES:
  - None
IMPROVEMENTS:
  - Consider adding input validation to fromUser() method
  - Add JSDoc comments for better IDE support
---

Detailed Evaluation

[Evidence and analysis following meta-judge specification rubrics...]

Judge Verdict Format (FAIL Example)

---
VERDICT: FAIL
SCORE: 2.8/5.0
ISSUES:
  - Missing User->UserDTO mapping logic in getUser() method
  - Return type annotation changed but actual return value still returns User object
  - No null handling for optional User fields
IMPROVEMENTS:
  - Add static fromUser() factory method to UserDTO
  - Implement toDTO() as instance method on User class
---
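
Because the orchestrator should read only the structured header, a parser for the format above might look like the following sketch. It assumes the header is the first span delimited by two `---` lines and that list items start with `- `; `JudgeHeader` and `parseJudgeHeader` are hypothetical names.

```typescript
interface JudgeHeader {
  verdict: "PASS" | "FAIL";
  score: number;
  issues: string[];
  improvements: string[];
}

// Extracts VERDICT, SCORE, ISSUES and IMPROVEMENTS from the first
// `---` delimited block and ignores everything after it.
function parseJudgeHeader(reply: string): JudgeHeader {
  const lines = reply.split("\n").map((l) => l.trim());
  const start = lines.indexOf("---");
  const end = start === -1 ? -1 : lines.indexOf("---", start + 1);
  if (start === -1 || end === -1) {
    throw new Error("No --- delimited header found");
  }
  let verdict = "";
  let score = NaN;
  const issues: string[] = [];
  const improvements: string[] = [];
  let current: string[] | null = null;
  for (const line of lines.slice(start + 1, end)) {
    if (line.indexOf("VERDICT:") === 0) {
      verdict = line.slice("VERDICT:".length).trim();
      current = null;
    } else if (line.indexOf("SCORE:") === 0) {
      // parseFloat stops at the "/" in "2.8/5.0", leaving the numerator.
      score = parseFloat(line.slice("SCORE:".length));
      current = null;
    } else if (line === "ISSUES:") {
      current = issues;
    } else if (line === "IMPROVEMENTS:") {
      current = improvements;
    } else if (line.indexOf("- ") === 0 && current !== null) {
      const item = line.slice(2).trim();
      if (item !== "None") current.push(item); // "- None" means an empty list
    }
  }
  if (verdict !== "PASS" && verdict !== "FAIL") {
    throw new Error("Missing or invalid VERDICT");
  }
  return { verdict, score, issues, improvements };
}
```

Stopping at the closing `---` means the detailed evaluation that follows the header never reaches the orchestrator's context.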

Key Insight: Complex tasks with dependencies benefit from sequential execution where each step operates in a fresh context while receiving only the relevant outputs from previous steps. Per-step meta-judge evaluation specifications ensure tailored evaluation criteria specific to each step's requirements, while running in parallel with implementation for speed. External judge verification catches blind spots that self-critique misses, while the iteration loop (reusing the same step's meta-judge spec) ensures quality before proceeding. This prevents both context pollution and error propagation.
