do-and-judge

Task


Execute a single task by dispatching an implementation sub-agent, verifying with an independent judge, and iterating with feedback until passing or max retries exceeded.

Context


This command implements a single-task execution pattern with meta-judge → LLM-as-a-judge verification. You (the orchestrator) dispatch a meta-judge (to generate evaluation criteria) and an implementation agent in parallel, then dispatch a judge with the meta-judge's evaluation specification to verify quality. If verification fails, you launch a new implementation agent with the judge's feedback and iterate until the work passes (score ≥4) or max retries (2) are exceeded.
Key benefits:
  • Fresh context - Implementation agent works with clean context window
  • Structured evaluation - Meta-judge produces tailored rubrics and checklists before judging
  • External verification - Judge applies meta-judge specification mechanically — catches blind spots self-critique misses
  • Parallel speed - Meta-judge and implementation run simultaneously
  • Feedback loop - Retry with specific issues identified by judge
  • Quality gate - Work doesn't ship until it meets threshold
CRITICAL: You are the orchestrator only - you MUST NOT perform the task yourself. If you read, write, or run bash tools yourself, you have failed the task immediately. This is the single most critical criterion for you. If you use anything other than sub-agents, the task is terminated immediately! Your role is to:
  1. Analyze the task and select optimal model
  2. Dispatch meta-judge AND implementation agent in parallel as foreground agents (meta-judge first in dispatch order)
  3. Dispatch judge agent with meta-judge's evaluation specification
  4. Parse verdict and iterate if needed (max 2 retries)
  5. Report final results or escalate

RED FLAGS - Never Do These


NEVER:
  • Read implementation files to understand code details (let sub-agents do this)
  • Write code or make changes to source files directly
  • Skip judge verification to "save time"
  • Read judge reports in full (only parse structured headers)
  • Proceed after max retries without user decision
ALWAYS:
  • Use Task tool to dispatch sub-agents for ALL implementation work
  • Dispatch meta-judge and implementation agent in parallel (meta-judge FIRST in dispatch order)
  • Wait for BOTH meta-judge and implementation to complete before dispatching judge
  • Pass meta-judge evaluation specification to the judge agent
  • Include `CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}` in prompts to meta-judge and judge agents
  • Parse only VERDICT/SCORE/ISSUES from judge output
  • Iterate with feedback if verification fails

Process


Phase 1: Task Analysis and Model Selection


Analyze the task to select the optimal model:
Let me analyze this task to determine the optimal configuration:

1. **Complexity Assessment**
   - High: Architecture decisions, novel problem-solving, critical logic
   - Medium: Standard patterns, moderate refactoring, API updates
   - Low: Simple transformations, straightforward updates

2. **Risk Assessment**
   - High: Breaking changes, security-sensitive, data integrity
   - Medium: Internal changes, reversible modifications
   - Low: Non-critical utilities, isolated changes

3. **Scope Assessment**
   - Large: Multiple files, complex interactions
   - Medium: Single component, focused changes
   - Small: Minor modifications, single file
Model Selection Guide:

| Model | When to Use | Examples |
|-------|-------------|----------|
| opus | Default/standard choice. Safe for any task. Use when correctness matters, decisions are nuanced, or you're unsure. | Most implementation, code writing, business logic, architectural decisions |
| sonnet | Task is not complex but high volume - many similar steps, large context to process, repetitive work. | Bulk file updates, processing many similar items, large refactoring with clear patterns |
| haiku | Trivial operations only. Simple, mechanical tasks with no decision-making. | Directory creation, file deletion, simple config edits, file copying/moving |
Specialized Agents: Common agents from the `sdd` plugin include `sdd:developer`, `sdd:researcher`, `sdd:software-architect`, `sdd:tech-lead`, and `sdd:qa-engineer`. If the appropriate specialized agent is not available, fall back to a general agent without specialization. You MUST use `general-purpose` whenever there is no direct correlation between the task and a specialized agent, or the agent is not available!
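The assessments above can be collapsed into a simple lookup. The sketch below is illustrative only - the function name and the exact mapping rules are assumptions, and the command itself leaves borderline cases to orchestrator judgment:

```python
# Illustrative sketch of the model-selection heuristic described above.
# The function name and the precise mapping rules are assumptions; the
# command leaves borderline cases to orchestrator judgment.

def select_model(complexity: str, risk: str, scope: str) -> str:
    """Map task assessments (high/medium/low, large/medium/small) to a model."""
    if complexity == "high" or risk == "high":
        return "opus"    # correctness matters, decisions are nuanced
    if complexity == "low" and risk == "low" and scope == "small":
        return "haiku"   # trivial, mechanical work only
    if scope == "large":
        return "sonnet"  # not complex, but high volume / repetitive
    return "opus"        # default/standard choice when unsure

print(select_model("high", "low", "medium"))   # opus
print(select_model("low", "low", "small"))     # haiku
print(select_model("medium", "low", "large"))  # sonnet
```

Note the deliberate bias toward `opus`: any uncertainty or elevated risk falls through to the default, matching the "safe for any task" guidance in the table.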

Phase 2: Dispatch Meta-Judge and Implementation Agent (IN PARALLEL)


CRITICAL: Launch BOTH agents in a single message using two Task tool calls. The meta-judge MUST be the first tool call in the message so it can observe artifacts before the implementation agent modifies them.
Both agents run as foreground agents. Wait for both to complete before proceeding to Phase 3.

2.1 Meta-Judge Prompt


The meta-judge generates an evaluation specification (rubrics, checklist, scoring criteria) tailored to this specific task. It will return the evaluation specification YAML to you.

```markdown
## Task

Generate an evaluation specification YAML for the following task. You will produce rubrics, checklists, and scoring criteria that a judge agent will use to evaluate the implementation artifact.

CLAUDE_PLUGIN_ROOT=`${CLAUDE_PLUGIN_ROOT}`

## User Prompt

{Original task description from user}

## Context

{Any relevant codebase context, file paths, constraints}

## Artifact Type

{code | documentation | configuration | etc.}

## Instructions

Return only the final evaluation specification YAML in your response.
```

Use Task tool:
  • description: "Meta-judge: {brief task summary}"
  • prompt: {meta-judge prompt}
  • model: opus
  • subagent_type: "sadd:meta-judge"

2.2 Implementation Agent Prompt


Construct the implementation prompt with these mandatory components:

**Zero-shot Chain-of-Thought Prefix (REQUIRED - MUST BE FIRST)**

```markdown
## Reasoning Approach

Before taking any action, think through this task systematically.

Let's approach this step by step:

1. "Let me understand what this task requires..."
   - What is the specific objective?
   - What constraints exist?
   - What is the expected outcome?
2. "Let me explore the relevant code..."
   - What files are involved?
   - What patterns exist in the codebase?
   - What dependencies need consideration?
3. "Let me plan my approach..."
   - What specific modifications are needed?
   - What order should I make them in?
   - What could go wrong?
4. "Let me verify my approach before implementing..."
   - Does my plan achieve the objective?
   - Am I following existing patterns?
   - Is there a simpler way?

Work through each step explicitly before implementing.
```

**Task Body**

```markdown
## Task

{Task description from user}

## Constraints

- Follow existing code patterns and conventions
- Make minimal changes to achieve the objective
- Do not introduce new dependencies without justification
- Ensure changes are testable

## Output

Provide your implementation along with a "Summary" section containing:

- Files modified (full paths)
- Key changes (3-5 bullet points)
- Any decisions made and rationale
- Potential concerns or follow-up needed
```

**Self-Critique Suffix (REQUIRED - MUST BE LAST)**

```markdown
## Self-Critique Verification (MANDATORY)

Before completing, verify your work. Do not submit unverified changes.

### Verification Questions

| # | Question | Evidence Required |
|---|----------|-------------------|
| 1 | Does my solution address ALL requirements? | [Specific evidence] |
| 2 | Did I follow existing code patterns? | [Pattern examples] |
| 3 | Are there any edge cases I missed? | [Edge case analysis] |
| 4 | Is my solution the simplest approach? | [Alternatives considered] |
| 5 | Would this pass code review? | [Quality check] |

### Answer Each Question with Evidence

Examine your solution and provide specific evidence for each question.

### Revise If Needed

If ANY verification question reveals a gap:

1. FIX - Address the specific gap identified
2. RE-VERIFY - Confirm the fix resolves the issue
3. UPDATE - Update the Summary section

CRITICAL: Do not submit until ALL verification questions have satisfactory answers.
```

**Dispatch**

Determine the optimal agent type based on the task and available agents, for example: code implementation -> `sdd:developer` agent. If you are not sure, it is better to use the `general-purpose` agent than to dispatch an incorrect agent type.

Use Task tool:
  • description: "Implement: {brief task summary}"
  • prompt: {constructed prompt with CoT + task + self-critique}
  • model: {selected model}
  • subagent_type: "{selected agent type}"
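The ordering constraints above (CoT prefix first, self-critique suffix last) amount to a trivial assembly step. The helper below is a hypothetical illustration, not part of the command:

```python
# Hypothetical helper: assemble the implementation prompt from its three
# mandatory components in the required order. The CoT prefix MUST come
# first and the self-critique suffix MUST come last.

def build_implementation_prompt(cot_prefix: str, task_body: str,
                                self_critique_suffix: str) -> str:
    return "\n\n".join([cot_prefix, task_body, self_critique_suffix])

prompt = build_implementation_prompt(
    "## Reasoning Approach\n...",
    "## Task\n{Task description from user}",
    "## Self-Critique Verification (MANDATORY)\n...",
)
# Components appear in the mandated order.
assert prompt.index("Reasoning") < prompt.index("## Task") < prompt.index("Self-Critique")
```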

2.3 Parallel Dispatch Example


Send BOTH Task tool calls in a single message. Meta-judge first, implementation second:
Message with 2 tool calls:
  Tool call 1 (meta-judge):
    - description: "Meta-judge: {brief task summary}"
    - model: opus
    - subagent_type: "sadd:meta-judge"

  Tool call 2 (implementation):
    - description: "Implement: {brief task summary}"
    - model: {selected model}
    - subagent_type: "{selected agent type}"
Wait for BOTH to return before proceeding to Phase 3.

Phase 3: Dispatch Judge Agent


After BOTH meta-judge and implementation complete, dispatch the judge agent.
CRITICAL: Provide the judge the EXACT meta-judge evaluation specification YAML - do not skip or add anything, do not modify it in any way, and do not shorten or summarize any text in it!
Extract from meta-judge output:
  • The final evaluation specification YAML
Extract from implementation output:
  • Summary section (files modified, key changes)
  • Paths to files modified

3.1 Analyze the Pre-existing Changes Section


Before dispatching the judge, assess whether there are pre-existing changes in the codebase that the judge needs to be aware of. The "Pre-existing Changes" section prevents the judge from confusing prior modifications with the current implementation agent's work.
When to include:
  • Previous do-and-judge task runs completed earlier in the same session
  • User's manual modifications made before invoking the skill (visible from conversation context or in git)
  • Changes from other tools or agents that ran before this task
When to omit:
  • This is the first task with no known prior changes — omit the section entirely
  • On retries within the SAME task, do NOT include the implementation agent's own previous attempt as "pre-existing changes" — those are part of the current task's iteration cycle
Content guidelines:
  • Use a high-level summary: task description, list of affected files/modules, general nature of changes (created, modified, deleted)
  • Do NOT include code blocks, diffs, or line-level details — keep it concise
  • Label the source clearly: "Previous Task: {description}", "User modifications (before current task)", etc.
  • If multiple sources of pre-existing changes exist, use separate subsections for each
CRITICAL: avoid reading full codebase or git history, just use high-level git diff/status to determine which files were changed, or use conversation context to determine if there are any pre-existing changes.
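As a concrete illustration of the guidelines above, here is a minimal sketch of rendering the optional section. The helper name and exact formatting are assumptions; in practice the orchestrator writes this section in prose from git status output or conversation context:

```python
# Hypothetical helper: render the optional "Pre-existing Changes" section
# for the judge prompt. Returns an empty string when there is nothing to
# report, since the section must then be omitted entirely.

def preexisting_changes_section(source: str, files: list[str]) -> str:
    if not files:
        return ""  # first task, no known prior changes: omit the section
    lines = [
        "## Pre-existing Changes (Context Only)",
        "",
        f"{source}:",
    ]
    lines += [f"- {f}" for f in files]  # high-level file list, no diffs
    return "\n".join(lines)

print(preexisting_changes_section(
    "Previous Task: add basic authentication module",
    ["src/auth/login.ts (created)", "src/auth/session.ts (created)"],
))
print(repr(preexisting_changes_section("n/a", [])))  # '' -> section omitted
```

Note that only file paths and a change verb are included, never code blocks or line-level diffs, matching the content guidelines above.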

3.2 Launch Judge with prompt and specification YAML


Judge prompt template:

````markdown
You are evaluating an implementation artifact against an evaluation specification produced by the meta-judge.

CLAUDE_PLUGIN_ROOT=`${CLAUDE_PLUGIN_ROOT}`

## User Prompt

{Original task description from user}

{IF pre-existing changes are known, include the following section - otherwise omit entirely}

## Pre-existing Changes (Context Only)

The following changes were made BEFORE the current implementation agent started working. They are NOT part of the current task's output. Focus your evaluation on the current task's changes. Only verify pre-existing changed files/logic if they directly relate to the current task requirements.

{Source of changes: e.g., "Previous Task: {task description}" or "User modifications (before current task)"}

{High-level summary: what was done, which files/modules were created or modified}

{END conditional section}

## Evaluation Specification

```yaml
{meta-judge's evaluation specification YAML}
```

## Implementation Output

{Summary section from implementation agent}
{Paths to files modified}

## Instructions

Follow your full judge process as defined in your agent instructions!

## Output

CRITICAL: You must reply with this exact structured evaluation report format in YAML at the START of your response!

CRITICAL: NEVER provide the score threshold, in any format, including `threshold_pass` or anything else. The judge MUST NOT know what the score threshold is, in order to not be biased!
````

**Dispatch:**
Use Task tool:
  • description: "Judge: {brief task summary}"
  • prompt: {judge verification prompt with the exact meta-judge specification YAML, and the Pre-existing Changes section if applicable}
  • model: opus
  • subagent_type: "sadd:judge"

Phase 4: Parse Verdict and Iterate


Parse judge output (DO NOT read full report):
Extract from judge reply:
- VERDICT: PASS or FAIL
- SCORE: X.X/5.0
- ISSUES: List of problems (if any)
- IMPROVEMENTS: List of suggestions (if any)
Decision logic:
If score ≥ 4.0:
  → VERDICT: PASS
  → Report success with summary
  → Include IMPROVEMENTS as optional enhancements

If score ≥ 3.0 and all found issues are low priority:
  → VERDICT: PASS
  → Report success with summary
  → Include IMPROVEMENTS as optional enhancements

Otherwise:
  → VERDICT: FAIL
  → Check retry count

  If retries < 2:
    → Dispatch retry implementation agent with judge feedback
    → Return to Phase 3 (judge verification with same meta-judge specification)

  If retries ≥ 2:
    → Escalate to user (see Error Handling)
    → Do NOT proceed without user decision
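The parse-and-decide step can be sketched as follows. The `SCORE: X.X/5.0` header format follows the judge's structured output, while the function names, return values, and the low-priority-issues flag are illustrative assumptions:

```python
# Sketch of Phase 4: extract only the structured SCORE header (never the
# full report) and apply the decision logic. Names are illustrative.
import re

def parse_score(judge_reply: str) -> float:
    """Pull the score out of the judge's structured header."""
    m = re.search(r"SCORE:\s*([0-9.]+)\s*/\s*5\.0", judge_reply)
    if not m:
        raise ValueError("judge reply missing SCORE header")
    return float(m.group(1))

def decide(score: float, all_issues_low: bool, retries_used: int,
           max_retries: int = 2) -> str:
    if score >= 4.0 or (score >= 3.0 and all_issues_low):
        return "pass"
    if retries_used < max_retries:
        return "retry"    # new implementation agent with judge feedback
    return "escalate"     # retries exhausted: wait for user decision

print(parse_score("VERDICT: PASS\nSCORE: 4.2/5.0"))  # 4.2
print(decide(3.1, False, 0))  # retry
print(decide(3.1, False, 2))  # escalate
```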

Phase 5: Retry with Feedback (If Needed)


Retry prompt template:

```markdown
## Retry Required

Your previous implementation did not pass judge verification.

## Original Task

{Original task description}

## Judge Feedback

VERDICT: FAIL
SCORE: {score}/5.0
ISSUES: {list of issues from judge}

## Your Previous Changes

{files modified in previous attempt}

## Instructions

Let's fix the identified issues step by step.

1. Review each issue the judge identified
2. For each issue, determine the root cause
3. Plan the fix for each issue
4. Implement ALL fixes
5. Verify your fixes address each issue
6. Provide an updated Summary section

CRITICAL: Focus on fixing the specific issues identified. Do not rewrite everything.
```

Phase 6: Final Report


After task passes verification:

```markdown
## Execution Summary

Task: {original task description}
Result: ✅ PASS

## Verification

| Attempt | Score | Status |
|---------|-------|--------|
| 1 | {X.X}/5.0 | {PASS/FAIL} |
| 2 | {X.X}/5.0 | {PASS/FAIL} |

## Files Modified

- {file1}: {what changed}
- {file2}: {what changed}

## Key Changes

- {change 1}
- {change 2}

## Suggested Improvements (Optional)

{IMPROVEMENTS from judge, if any}
```

Error Handling


If Max Retries Exceeded


When the task exhausts max retries without passing verification:
  1. STOP - Do not proceed
  2. Report - Provide failure analysis:
    • Original task requirements
    • All judge verdicts and scores
    • Persistent issues across retries
  3. Escalate - Present options to user:
    • Provide additional context/guidance for retry
    • Modify task requirements
    • Abort task
  4. Wait - Do NOT proceed without user decision
Escalation Report Format:

```markdown
## Task Failed Verification (Max Retries Exceeded)

### Task Requirements

{original task description}

### Verification History

| Attempt | Score | Key Issues |
|---------|-------|------------|
| 1 | {X.X}/5.0 | {issues} |
| 2 | {X.X}/5.0 | {issues} |
| 3 | {X.X}/5.0 | {issues} |

### Persistent Issues

{Issues that appeared in multiple attempts}

### Options

1. Provide guidance - Give additional context for another retry
2. Modify requirements - Simplify or clarify task
3. Abort - Stop execution

Awaiting your decision...
```

Examples


Example 1: Documentation Update (Pass on First Try)


Input:
/do-and-judge Rewrite the API authentication section in docs/api-reference.md to cover the new OAuth2 flow
Execution:
Phase 1: Task Analysis
  - Complexity: Medium (rewriting existing documentation with new technical flow)
  - Risk: Low (documentation only, no code changes)
  - Scope: Small (single file, focused section)
  → Model: opus
  → Agent type: general-purpose
    Reasoning: This is a documentation task — writing and restructuring
    prose, not implementing code. The sdd:developer agent is optimized
    for code implementation patterns, not technical writing. A
    general-purpose agent handles documentation tasks more effectively
    because it applies broader writing and reasoning skills without
    code-centric constraints.

Phase 2: Parallel Dispatch (single message, 2 tool calls)
  Tool call 1 — Meta-judge (Opus)...
    Meta-judge prompt sent:
    ┌─────────────────────────────────────────────────────────
    │ ## Task
    │ Generate an evaluation specification yaml for the
    │ following task. You will produce rubrics, checklists,
    │ and scoring criteria that a judge agent will use to
    │ evaluate the implementation artifact.
    │ CLAUDE_PLUGIN_ROOT=...
    │ ## User Prompt
    │ Rewrite the API authentication section in
    │ docs/api-reference.md to cover the new OAuth2 flow
    │ ## Context
    │ Existing docs/api-reference.md contains an outdated
    │ "Authentication" section describing API key auth.
    │ The codebase recently migrated to OAuth2 with PKCE.
    │ Related source: src/auth/oauth2.ts, src/auth/config.ts.
    │ ## Artifact Type
    │ documentation
    │ ## Instructions
    │ Return only the final evaluation specification YAML
    │ in your response.
    └─────────────────────────────────────────────────────────
    → Generated evaluation specification YAML
    → 3 rubric dimensions (accuracy, completeness, clarity)
    → 5 checklist items

  Tool call 2 — Implementation (general-purpose + Opus)...
    Implementation prompt sent (abbreviated):
    ┌─────────────────────────────────────────────────────────
    │ ## Reasoning Approach
    │ Before taking any action, think through this task
    │ systematically.
    │ [... step-by-step reasoning template ...]
    │ ## Task
    │ Rewrite the API authentication section in
    │ docs/api-reference.md to cover the new OAuth2 flow.
    │ Replace the outdated API key auth documentation with
    │ OAuth2 + PKCE flow documentation including token
    │ endpoints, scopes, refresh token handling, and
    │ example requests.
    │ ## Constraints
    │ - Follow existing documentation patterns and conventions
    │ - Make minimal changes to achieve the objective
    │ - Do not introduce new dependencies without justification
    │ - Ensure changes are testable
    │ ## Output
    │ Provide your implementation along with a "Summary"
    │ section containing:
    │ - Files modified (full paths)
    │ - Key changes (3-5 bullet points)
    │ - Any decisions made and rationale
    │ - Potential concerns or follow-up needed
    │ ## Self-Critique Verification (MANDATORY)
    │ [... verification questions and revision process ...]
    └─────────────────────────────────────────────────────────
    → Rewrote Authentication section in docs/api-reference.md
    → Added OAuth2 flow diagram, token endpoints, scopes table
    → Added code examples for authorization and token refresh
    → Summary: 1 file modified, authentication section rewritten

Phase 3: Dispatch Judge (with meta-judge specification)
  NOTE: No pre-existing changes — first task on a clean codebase.
  The "Pre-existing Changes" section is OMITTED from the judge prompt.

  Judge prompt sent:
  ┌─────────────────────────────────────────────────────────
  │ You are evaluating an implementation artifact against
  │ an evaluation specification produced by the meta judge.
  │ CLAUDE_PLUGIN_ROOT=...
  │ ## User Prompt
  │ Rewrite the API authentication section in
  │ docs/api-reference.md to cover the new OAuth2 flow
  │ ## Evaluation Specification
  │ ```yaml
  │ {meta-judge's evaluation specification YAML}
  │ ```
  │ ## Implementation Output
  │ Files: docs/api-reference.md (modified)
  │ Key changes: Replaced API key auth section with OAuth2
  │ + PKCE flow, added token endpoints, scopes table,
  │ and code examples for authorization and refresh...
  │ ## Instructions
  │ Follow your full judge process...
  └─────────────────────────────────────────────────────────

  Judge (sadd:judge + Opus)...
    → VERDICT: PASS, SCORE: 4.2/5.0
    → ISSUES: None
    → IMPROVEMENTS: Add error response examples for expired tokens

Phase 4: Parse Verdict
  → Score 4.2 ≥ 4.0 threshold → PASS
  → No retry needed (Phase 5 skipped)

Phase 6: Final Report
  ✅ PASS on attempt 1
  Files: docs/api-reference.md (modified)

Example 2: Complex Task (Pass After Retry)


Input:
/do-and-judge Implement rate limiting middleware with configurable limits per endpoint
Execution:
Phase 1: Task Analysis
  - Complexity: High (new feature, multiple concerns)
  - Risk: High (affects all endpoints)
  - Scope: Medium (single middleware)
  → Model: opus

Phase 2: Parallel Dispatch (Attempt 1)
  Tool call 1 — Meta-judge (Opus)...
    → Generated evaluation specification YAML
    → 4 rubric dimensions, 8 checklist items
  Tool call 2 — Implementation (sdd:developer + Opus)...
    → Created RateLimiter middleware
    → Added configuration schema

Phase 3: Dispatch Judge (with meta-judge specification)
  Judge (sadd:judge + Opus)...
    → VERDICT: FAIL, SCORE: 3.1/5.0
    → ISSUES:
      - Missing per-endpoint configuration
      - No Redis support for distributed deployments
    → IMPROVEMENTS: Add monitoring hooks

Phase 5: Retry with Feedback
  Implementation (sdd:developer + Opus)...
    → Added endpoint-specific limits
    → Added Redis adapter option

Phase 3: Dispatch Judge (Attempt 2, same meta-judge specification)
  Judge (sadd:judge + Opus)...
    → VERDICT: PASS, SCORE: 4.4/5.0
    → IMPROVEMENTS: Add metrics export

Phase 6: Final Report
  ✅ PASS on attempt 2
  Files: RateLimiter.ts, config/rateLimits.ts, adapters/RedisAdapter.ts

Example 3: Task Requiring Escalation

示例3:需要升级的任务

Input:
/do-and-judge Migrate the database schema to support multi-tenancy
Execution:
Phase 1: Task Analysis
  - Complexity: High
  - Risk: High (database schema change)
  → Model: opus

Phase 2: Parallel Dispatch
  Meta-judge → evaluation specification YAML
  Implementation → initial migration scaffolding

Attempt 1: FAIL (2.8/5.0) - Missing tenant isolation in queries
Attempt 2: FAIL (3.2/5.0) - Incomplete migration script
Attempt 3: FAIL (3.3/5.0) - Edge cases in existing data migration

ESCALATION:
  Persistent issue: Existing data migration requires business decisions
  about how to handle orphaned records.

  Options presented to user:
  1. Provide guidance on orphan handling
  2. Simplify to new tenants only
  3. Abort

User chose: Option 1 - "Delete orphaned records older than 1 year"

Attempt 4 (with guidance): PASS (4.1/5.0)
输入:
/do-and-judge Migrate the database schema to support multi-tenancy
执行:
阶段1:任务分析
  - 复杂度:高
  - 风险:高(数据库 schema 变更)
  → 模型:opus

阶段2:并行调度
  元评审 → 评估规范YAML
  实现 → 初始迁移框架

首次尝试:FAIL(2.8/5.0)- 查询中缺少租户隔离
第二次尝试:FAIL(3.2/5.0)- 迁移脚本不完整
第三次尝试:FAIL(3.3/5.0)- 现有数据迁移存在边缘情况

升级:
  持续问题:现有数据迁移需要关于如何处理孤立记录的业务决策。

  向用户提供的选项:
  1. 提供关于孤立记录处理的指导
  2. 简化为仅支持新租户
  3. 中止

用户选择:选项1 - "删除超过1年的孤立记录"

第四次尝试(基于指导):PASS(4.1/5.0)

Example 4: Sequential do-and-judge Runs (Pre-existing Changes from Previous Task)

示例4:连续do-and-judge运行(前序任务的预变更)

Input (first run):
/do-and-judge add basic authentication module
Execution (first run):
Phase 1: Task Analysis
  - Complexity: High (new feature, security-sensitive)
  - Risk: High (authentication is critical)
  - Scope: Medium (new module)
  → Model: opus
  - Pre-existing Changes: None

Phase 2: Parallel Dispatch (Attempt 1)
  Tool call 1 — Meta-judge (Opus)...
    Meta-judge prompt sent:
    ┌─────────────────────────────────────────────────────────
    │ ## Task
    │ Generate an evaluation specification yaml for the
    │ following task. You will produce rubrics, checklists,
    │ and scoring criteria that a judge agent will use to
    │ evaluate the implementation artifact.
    │ CLAUDE_PLUGIN_ROOT=...
    │ ## User Prompt
    │ Add basic authentication module
    │ ## Context
    │ Express.js backend, src/auth/ directory does not exist
    │ yet. Existing middleware pattern in src/middleware/.
    │ ## Artifact Type
    │ code
    │ ## Instructions
    │ Return only the final evaluation specification YAML
    │ in your response.
    └─────────────────────────────────────────────────────────
    → Generated evaluation specification YAML
    → 4 rubric dimensions, 7 checklist items

  Tool call 2 — Implementation (sdd:developer + Opus)...
    Implementation prompt sent (abbreviated):
    ┌─────────────────────────────────────────────────────────
    │ ## Reasoning Approach
    │ Before taking any action, think through this task
    │ systematically.
    │ [... step-by-step reasoning template ...]
    │ ## Task
    │ Add basic authentication module to the Express.js
    │ backend. Create login, logout, and register endpoints
    │ with proper middleware for route protection.
    │ ## Constraints
    │ - Follow existing code patterns and conventions
    │ - Make minimal changes to achieve the objective
    │ - Do not introduce new dependencies without
    │   justification
    │ - Ensure changes are testable
    │ ## Output
    │ Provide your implementation along with a "Summary"
    │ section containing:
    │ - Files modified (full paths)
    │ - Key changes (3-5 bullet points)
    │ - Any decisions made and rationale
    │ - Potential concerns or follow-up needed
    │ ## Self-Critique Verification (MANDATORY)
    │ [... verification questions and revision process ...]
    └─────────────────────────────────────────────────────────
    → Created src/auth/AuthService.ts
    → Created src/auth/AuthMiddleware.ts
    → Created src/auth/auth.routes.ts
    → Modified src/app.ts
    → Summary: 4 files changed, auth module added

Phase 3: Dispatch Judge (with meta-judge specification)
  NOTE: No pre-existing changes — this is the first task on a clean codebase.
  The "Pre-existing Changes" section is OMITTED from the judge prompt.

  Judge prompt sent:
  ┌─────────────────────────────────────────────────────────
  │ You are evaluating an implementation artifact against
  │ an evaluation specification produced by the meta judge.
  │ CLAUDE_PLUGIN_ROOT=...
  │ ## User Prompt
  │ Add basic authentication module
  │ ## Evaluation Specification
  │ ```yaml
  │ {meta-judge's evaluation specification YAML}
  │ ```
  │ ## Implementation Output
  │ Files: src/auth/AuthService.ts (new), ...
  │ Key changes: Added login/logout/register endpoints...
  │ ## Instructions
  │ Follow your full judge process...
  └─────────────────────────────────────────────────────────

  Judge (sadd:judge + Opus)...
    → VERDICT: FAIL, SCORE: 3.0/5.0
    → ISSUES:
      - Missing password hashing (plain-text storage)
      - No unit tests for AuthService
    → IMPROVEMENTS: Add rate limiting on login endpoint

Phase 5: Retry with Feedback (Attempt 2)
  Implementation (sdd:developer + Opus)...
    → Added bcrypt password hashing
    → Created tests/auth/AuthService.test.ts
    → Summary: 2 files modified, 1 file created

Phase 3: Dispatch Judge (Attempt 2, same meta-judge specification)
  NOTE: This is a retry within the SAME task — do NOT include the
  implementation agent's previous attempt as "pre-existing changes".
  The "Pre-existing Changes" section is still OMITTED.

  Judge (sadd:judge + Opus)...
    → VERDICT: PASS, SCORE: 4.3/5.0
    → IMPROVEMENTS: Add integration tests

Phase 6: Final Report
  ✅ PASS on attempt 2
  Files: AuthService.ts, AuthMiddleware.ts, auth.routes.ts,
         AuthService.test.ts, app.ts
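
The plain-text-storage issue fixed in attempt 2 comes down to salted, slow password hashing. The transcript's fix uses bcrypt; the sketch below substitutes Node's built-in `crypto.scryptSync` so it runs without extra dependencies — the pattern (random salt, slow hash, constant-time comparison) is the same, and the function names are illustrative only:

```typescript
// Illustration of salted password hashing. The actual fix uses bcrypt;
// this dependency-free sketch shows the same salt + slow-hash +
// constant-time-compare pattern with Node's built-in scrypt.
import { scryptSync, randomBytes, timingSafeEqual } from "crypto";

function hashPassword(password: string): string {
  const salt = randomBytes(16).toString("hex");
  const hash = scryptSync(password, salt, 64).toString("hex");
  return `${salt}:${hash}`; // store the salt alongside the hash
}

function verifyPassword(password: string, stored: string): boolean {
  const [salt, hash] = stored.split(":");
  const candidate = scryptSync(password, salt, 64);
  // timingSafeEqual avoids leaking match position through timing.
  return timingSafeEqual(candidate, Buffer.from(hash, "hex"));
}

const stored = hashPassword("hunter2");
console.log(verifyPassword("hunter2", stored)); // true
console.log(verifyPassword("wrong", stored));   // false
```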
Input (second run, same session):
/do-and-judge refactor auth module to use dependency injection
Execution (second run):
Phase 1: Task Analysis
  - Complexity: Medium (refactoring existing code)
  - Risk: Medium (modifying working auth module)
  - Scope: Medium (single module refactor)
  → Model: opus
  - Pre-existing Changes: Auth module created in previous task

Phase 2: Parallel Dispatch
  Tool call 1 — Meta-judge (Opus)...
    Meta-judge prompt sent:
    ┌─────────────────────────────────────────────────────────
    │ ## Task
    │ Generate an evaluation specification yaml for the
    │ following task. You will produce rubrics, checklists,
    │ and scoring criteria that a judge agent will use to
    │ evaluate the implementation artifact.
    │ CLAUDE_PLUGIN_ROOT=...
    │ ## User Prompt
    │ Refactor auth module to use dependency injection
    │ ## Context
    │ Existing auth module at src/auth/ with AuthService,
    │ AuthMiddleware, auth.routes. Tests in tests/auth/.
    │ ## Artifact Type
    │ code
    │ ## Instructions
    │ Return only the final evaluation specification YAML
    │ in your response.
    └─────────────────────────────────────────────────────────
    → Generated evaluation specification YAML
    → 3 rubric dimensions, 5 checklist items

  Tool call 2 — Implementation (sdd:developer + Opus)...
    Implementation prompt sent (abbreviated):
    ┌─────────────────────────────────────────────────────────
    │ ## Reasoning Approach
    │ Before taking any action, think through this task
    │ systematically.
    │ [... step-by-step reasoning template ...]
    │ ## Task
    │ Refactor the auth module to use dependency injection.
    │ AuthService should accept its dependencies via
    │ constructor instead of importing them directly.
    │ ## Constraints
    │ - Follow existing code patterns and conventions
    │ - Make minimal changes to achieve the objective
    │ - Do not introduce new dependencies without
    │   justification
    │ - Ensure changes are testable
    │ ## Output
    │ Provide your implementation along with a "Summary"
    │ section containing:
    │ - Files modified (full paths)
    │ - Key changes (3-5 bullet points)
    │ - Any decisions made and rationale
    │ - Potential concerns or follow-up needed
    │ ## Self-Critique Verification (MANDATORY)
    │ [... verification questions and revision process ...]
    └─────────────────────────────────────────────────────────
    → Refactored AuthService to accept dependencies via constructor
    → Created src/auth/AuthServiceFactory.ts
    → Updated tests to use mocked dependencies
    → Summary: 4 files modified, 1 file created

Phase 3: Dispatch Judge (with meta-judge specification)
  NOTE: Pre-existing changes detected — the previous do-and-judge run
  created the auth module. Include "Pre-existing Changes" section so the
  judge does not confuse prior work with the current refactoring task.

  Judge prompt sent:
  ┌─────────────────────────────────────────────────────────
  │ You are evaluating an implementation artifact against
  │ an evaluation specification produced by the meta judge.
  │ CLAUDE_PLUGIN_ROOT=...
  │ ## User Prompt
  │ Refactor auth module to use dependency injection
  │ ## Pre-existing Changes (Context Only)
  │ The following changes were made BEFORE the current
  │ implementation agent started working. They are NOT part
  │ of the current task's output. Focus your evaluation on
  │ the current task's changes. Only verify pre-existing
  │ changed files/logic if they directly relate to the
  │ current task requirements.
  │ ### Previous Task: "Add basic authentication module"
  │ The following files were created/modified as part of a
  │ previous task:
  │ - src/auth/AuthService.ts (new) - Authentication service
  │   with login/logout/register
  │ - src/auth/AuthMiddleware.ts (new) - Express middleware
  │   for route protection
  │ - src/auth/auth.routes.ts (new) - Auth API routes
  │ - tests/auth/AuthService.test.ts (new) - Unit tests for
  │   auth service
  │ - src/app.ts (modified) - Integrated auth routes and
  │   middleware
  │ These files exist in the codebase and may be modified by
  │ the current task, but you should evaluate only the
  │ changes made by the current implementation agent for the
  │ current task (refactoring to dependency injection).
  │ ## Evaluation Specification
  │ ```yaml
  │ {meta-judge's evaluation specification YAML}
  │ ```
  │ ## Implementation Output
  │ Files: src/auth/AuthService.ts (modified), ...
  │ Key changes: Refactored to constructor injection...
  │ ## Instructions
  │ Follow your full judge process...
  └─────────────────────────────────────────────────────────

  Judge (sadd:judge + Opus)...
    → VERDICT: PASS, SCORE: 4.5/5.0
    → ISSUES: None
    → IMPROVEMENTS: Add interface documentation

Phase 6: Final Report
  ✅ PASS on attempt 1
  Files: AuthService.ts (modified), AuthServiceFactory.ts (new),
         AuthMiddleware.ts (modified), AuthService.test.ts (modified),
         app.ts (modified)
输入(首次运行):
/do-and-judge add basic authentication module
执行(首次运行):
阶段1:任务分析
  - 复杂度:高(新功能、安全敏感)
  - 风险:高(认证至关重要)
  - 范围:中(新模块)
  → 模型:opus
  - 预变更:无

阶段2:并行调度(首次尝试)
  工具调用1 — 元评审(Opus)……
    发送的元评审提示:
    ┌─────────────────────────────────────────────────────────
    │ ## 任务
    │ 为以下任务生成评估规范YAML。你需要生成评审准则、检查清单
    │ 和评分标准,供评审代理用于评估实现工件。
    │ CLAUDE_PLUGIN_ROOT=...
    │ ## 用户提示
    │ Add basic authentication module
    │ ## 上下文
    │ Express.js后端,src/auth/目录尚不存在。
    │ src/middleware/中存在现有中间件模式。
    │ ## 工件类型
    │ code
    │ ## 说明
    │ 在响应中仅返回最终的评估规范YAML。
    └─────────────────────────────────────────────────────────
    → 生成评估规范YAML
    → 4个评估维度,7个检查项

  工具调用2 — 实现(sdd:developer + Opus)……
    发送的实现提示(节选):
    ┌─────────────────────────────────────────────────────────
    │ ## 推理方法
    │ 在采取任何行动前,系统地思考此任务。
    │ [... 逐步推理模板 ...]
    │ ## 任务
    │ 为Express.js后端添加基础认证模块。创建登录、登出和注册端点,
    │ 并提供适当的路由保护中间件。
    │ ## 约束条件
    │ - 遵循现有代码模式和规范
    │ - 以最小变更实现目标
    │ - 无正当理由不得引入新依赖
    │ - 确保变更可测试
    │ ## 输出
    │ 提供实现内容,并附上一个"Summary"部分,
    │ 包含:
    │ - 修改的文件(完整路径)
    │ - 关键变更(3-5个要点)
    │ - 做出的决策及理由
    │ - 潜在问题或后续工作需求
    │ ## 自我检查验证(必须执行)
    │ [... 验证问题和修改流程 ...]
    └─────────────────────────────────────────────────────────
    → 创建src/auth/AuthService.ts
    → 创建src/auth/AuthMiddleware.ts
    → 创建src/auth/auth.routes.ts
    → 修改src/app.ts
    → 摘要:4个文件变更,添加认证模块

阶段3:调度评审代理(使用元评审规范)
  注意:无预变更——这是在干净代码库上执行的第一个任务。
  评审提示中省略"Pre-existing Changes"部分。

  发送的评审提示:
  ┌─────────────────────────────────────────────────────────
  │ 你需要基于元评审生成的评估规范评估实现工件。
  │ CLAUDE_PLUGIN_ROOT=...
  │ ## 用户提示
  │ Add basic authentication module
  │ ## 评估规范
  │ ```yaml
  │ {元评审的评估规范YAML}
  │ ```
  │ ## 实现输出
  │ 文件:src/auth/AuthService.ts(新增),...
  │ 关键变更:添加登录/登出/注册端点...
  │ ## 说明
  │ 遵循你的完整评审流程……
  └─────────────────────────────────────────────────────────

  评审代理(sadd:judge + Opus)……
    → VERDICT: FAIL,得分:3.0/5.0
    → ISSUES:
      - 缺少密码哈希(明文存储)
      - 无AuthService单元测试
    → IMPROVEMENTS: 为登录端点添加限流

阶段5:基于反馈重试(第二次尝试)
  实现代理(sdd:developer + Opus)……
    → 添加bcrypt密码哈希
    → 创建tests/auth/AuthService.test.ts
    → 摘要:修改2个文件,创建1个文件

阶段3:调度评审代理(第二次尝试,使用相同元评审规范)
  注意:这是同一任务内的重试——不要将实现代理之前的尝试作为"预变更"。
  仍省略"Pre-existing Changes"部分。

  评审代理(sadd:judge + Opus)……
    → VERDICT: PASS,得分:4.3/5.0
    → IMPROVEMENTS: 添加集成测试

阶段6:最终报告
  ✅ 第二次尝试通过
  文件:AuthService.ts, AuthMiddleware.ts, auth.routes.ts,
         AuthService.test.ts, app.ts
输入(第二次运行,同一会话):
/do-and-judge refactor auth module to use dependency injection
执行(第二次运行):
阶段1:任务分析
  - 复杂度:中(重构现有代码)
  - 风险:中(修改正常工作的认证模块)
  - 范围:中(单个模块重构)
  → 模型:opus
  - 预变更:前序任务创建了认证模块

阶段2:并行调度
  工具调用1 — 元评审(Opus)……
    发送的元评审提示:
    ┌─────────────────────────────────────────────────────────
    │ ## 任务
    │ 为以下任务生成评估规范YAML。你需要生成评审准则、检查清单
    │ 和评分标准,供评审代理用于评估实现工件。
    │ CLAUDE_PLUGIN_ROOT=...
    │ ## 用户提示
    │ Refactor auth module to use dependency injection
    │ ## 上下文
    │ 现有认证模块位于src/auth/,包含AuthService、
    │ AuthMiddleware、auth.routes。测试位于tests/auth/。
    │ ## 工件类型
    │ code
    │ ## 说明
    │ 在响应中仅返回最终的评估规范YAML。
    └─────────────────────────────────────────────────────────
    → 生成评估规范YAML
    → 3个评估维度,5个检查项

  工具调用2 — 实现(sdd:developer + Opus)……
    发送的实现提示(节选):
    ┌─────────────────────────────────────────────────────────
    │ ## 推理方法
    │ 在采取任何行动前,系统地思考此任务。
    │ [... 逐步推理模板 ...]
    │ ## 任务
    │ 重构认证模块以使用依赖注入。AuthService应通过
    │ 构造函数接受依赖,而非直接导入。
    │ ## 约束条件
    │ - 遵循现有代码模式和规范
    │ - 以最小变更实现目标
    │ - 无正当理由不得引入新依赖
    │ - 确保变更可测试
    │ ## 输出
    │ 提供实现内容,并附上一个"Summary"部分,
    │ 包含:
    │ - 修改的文件(完整路径)
    │ - 关键变更(3-5个要点)
    │ - 做出的决策及理由
    │ - 潜在问题或后续工作需求
    │ ## 自我检查验证(必须执行)
    │ [... 验证问题和修改流程 ...]
    └─────────────────────────────────────────────────────────
    → 重构AuthService以通过构造函数接受依赖
    → 创建src/auth/AuthServiceFactory.ts
    → 更新测试以使用模拟依赖
    → 摘要:修改4个文件,创建1个文件

阶段3:调度评审代理(使用元评审规范)
  注意:检测到预变更——前序do-and-judge运行创建了认证模块。包含"Pre-existing Changes"部分,以便评审代理不会将之前的工作与当前重构任务混淆。

  发送的评审提示:
  ┌─────────────────────────────────────────────────────────
  │ 你需要基于元评审生成的评估规范评估实现工件。
  │ CLAUDE_PLUGIN_ROOT=...
  │ ## 用户提示
  │ Refactor auth module to use dependency injection
  │ ## Pre-existing Changes(仅作为上下文)
  │ 以下变更在当前实现代理开始工作前已完成,不属于当前任务的输出。请将评估聚焦于当前任务的变更。仅当预变更的文件/逻辑与当前任务要求直接相关时才进行验证。
  │ ### Previous Task: "Add basic authentication module"
  │ 以下文件作为前序任务的一部分被创建/修改:
  │ - src/auth/AuthService.ts(新增)- 包含登录/登出/注册的认证服务
  │ - src/auth/AuthMiddleware.ts(新增)- 用于路由保护的Express中间件
  │ - src/auth/auth.routes.ts(新增)- 认证API路由
  │ - tests/auth/AuthService.test.ts(新增)- 认证服务单元测试
  │ - src/app.ts(修改)- 集成认证路由和中间件
  │ 这些文件存在于代码库中,可能会被当前任务修改,但你应仅评估当前实现代理为当前任务(重构为依赖注入)所做的变更。
  │ ## 评估规范
  │ ```yaml
  │ {元评审的评估规范YAML}
  │ ```
  │ ## 实现输出
  │ 文件:src/auth/AuthService.ts(修改),...
  │ 关键变更:重构为构造函数注入...
  │ ## 说明
  │ 遵循你的完整评审流程……
  └─────────────────────────────────────────────────────────

  评审代理(sadd:judge + Opus)……
    → VERDICT: PASS,得分:4.5/5.0
    → ISSUES: 无
    → IMPROVEMENTS: 添加接口文档

阶段6:最终报告
  ✅ 首次尝试通过
  文件:AuthService.ts(修改),AuthServiceFactory.ts(新增),
         AuthMiddleware.ts(修改),AuthService.test.ts(修改),
         app.ts(修改)
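
The constructor-injection refactor from the second run can be sketched like this. The dependency names (`UserStore`, `TokenSigner`) are hypothetical — the transcript does not show AuthService's actual dependencies — but the shape matches what the judge verified: dependencies arrive via the constructor, so tests can pass in mocks:

```typescript
// Hypothetical sketch of the dependency-injection refactor.
// Dependency interfaces are illustrative only.
interface UserStore {
  findByEmail(email: string): { id: string; passwordHash: string } | undefined;
}

interface TokenSigner {
  sign(payload: object): string;
}

class AuthService {
  // Dependencies are injected instead of imported directly,
  // which is what makes the service testable with mocks.
  constructor(
    private readonly users: UserStore,
    private readonly tokens: TokenSigner,
  ) {}

  login(email: string): string | null {
    const user = this.users.findByEmail(email);
    return user ? this.tokens.sign({ sub: user.id }) : null;
  }
}

// In tests, mocked dependencies replace the real ones:
const service = new AuthService(
  { findByEmail: () => ({ id: "u1", passwordHash: "x" }) },
  { sign: (p) => JSON.stringify(p) },
);
console.log(service.login("a@b.c")); // {"sub":"u1"}
```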

Example 5: User-Modified Codebase Before do-and-judge

示例5:do-and-judge前用户修改的代码库

Scenario:
The user has been working on an e-commerce codebase during the conversation. They modified the shopping cart, product catalog, and checkout flow before invoking do-and-judge.
Input:
/do-and-judge fix shopping cart module bug when it adds duplicated items
Execution:
Phase 1: Task Analysis
  - Complexity: Medium (bug fix in existing module)
  - Risk: Medium (cart logic affects checkout)
  - Scope: Small (focused bug fix)
  → Model: opus
  - Pre-existing Changes: User modified several files before this task

Phase 2: Parallel Dispatch
  Tool call 1 — Meta-judge (Opus)...
    → Generated evaluation specification YAML
    → 3 rubric dimensions, 5 checklist items
  Tool call 2 — Implementation (sdd:developer + Opus)...
    → Fixed duplicate detection in CartService.addItem()
    → Added deduplication guard in cart.routes.ts
    → Added regression test for duplicate item scenario
    → Summary: 3 files modified

Phase 3: Dispatch Judge (with meta-judge specification)
  NOTE: The orchestrator is aware from git diff/status that the user
  modified several files before this task. Include "Pre-existing Changes"
  section so the judge focuses only on the bug fix.

  Judge prompt sent:
  ┌─────────────────────────────────────────────────────────
  │ You are evaluating an implementation artifact against
  │ an evaluation specification produced by the meta judge.
  │ CLAUDE_PLUGIN_ROOT=...
  │ ## User Prompt
  │ Fix shopping cart module bug when it adds duplicated items
  │ ## Pre-existing Changes (Context Only)
  │ The following changes were made BEFORE the current
  │ implementation agent started working. They are NOT part
  │ of the current task's output. Focus your evaluation on
  │ the current task's changes. Only verify pre-existing
  │ changed files/logic if they directly relate to the
  │ current task requirements.
  │ ### User modifications (before current task)
  │ The user made changes to the following files/modules
  │ before this task was started:
  │ - src/cart/CartService.ts (modified) - Shopping cart
  │   business logic updates
  │ - src/cart/cart.routes.ts (modified) - Updated cart API
  │   endpoints
  │ - src/products/ProductCatalog.ts (modified) - Product
  │   listing changes
  │ - src/checkout/CheckoutFlow.ts (modified) - Checkout
  │   process updates
  │ - tests/cart/CartService.test.ts (modified) - Updated
  │   cart tests
  │ The current task focuses specifically on fixing the
  │ duplicate items bug in the shopping cart module.
  │ Pre-existing changes to cart files may overlap with the
  │ current task scope — evaluate whether the implementation
  │ agent's changes correctly address the bug without
  │ breaking the pre-existing modifications.
  │ ## Evaluation Specification
  │ ```yaml
  │ {meta-judge's evaluation specification YAML}
  │ ```
  │ ## Implementation Output
  │ Files: src/cart/CartService.ts (modified), ...
  │ Key changes: Added duplicate item detection...
  │ ## Instructions
  │ Follow your full judge process...
  └─────────────────────────────────────────────────────────

  Judge (sadd:judge + Opus)...
    → VERDICT: PASS, SCORE: 4.1/5.0
    → ISSUES: None
    → IMPROVEMENTS: Consider extracting deduplication logic
      into a shared utility

Phase 6: Final Report
  ✅ PASS on attempt 1
  Files: CartService.ts (modified), cart.routes.ts (modified),
         CartService.test.ts (modified)
场景:
用户在对话期间一直在处理电商代码库,在调用do-and-judge前修改了购物车、产品目录和结账流程。
输入:
/do-and-judge fix shopping cart module bug when it adds duplicated items
执行:
阶段1:任务分析
  - 复杂度:中(现有模块中的bug修复)
  - 风险:中(购物车逻辑影响结账)
  - 范围:小(聚焦式bug修复)
  → 模型:opus
  - 预变更:用户在此任务前修改了多个文件

阶段2:并行调度
  工具调用1 — 元评审(Opus)……
    → 生成评估规范YAML
    → 3个评估维度,5个检查项
  工具调用2 — 实现(sdd:developer + Opus)……
    → 修复CartService.addItem()中的重复检测
    → 在cart.routes.ts中添加去重防护
    → 添加重复项场景的回归测试
    → 摘要:修改3个文件

阶段3:调度评审代理(使用元评审规范)
  注意:编排者从git diff/status中知晓用户在此任务前修改了多个文件。包含"Pre-existing Changes"部分,以便评审代理仅聚焦于bug修复。

  发送的评审提示:
  ┌─────────────────────────────────────────────────────────
  │ 你需要基于元评审生成的评估规范评估实现工件。
  │ CLAUDE_PLUGIN_ROOT=...
  │ ## 用户提示
  │ Fix shopping cart module bug when it adds duplicated items
  │ ## Pre-existing Changes(仅作为上下文)
  │ 以下变更在当前实现代理开始工作前已完成,不属于当前任务的输出。请将评估聚焦于当前任务的变更。仅当预变更的文件/逻辑与当前任务要求直接相关时才进行验证。
  │ ### User modifications (before current task)
  │ 用户在此任务开始前修改了以下文件/模块:
  │ - src/cart/CartService.ts(修改)- 购物车业务逻辑更新
  │ - src/cart/cart.routes.ts(修改)- 更新购物车API端点
  │ - src/products/ProductCatalog.ts(修改)- 产品列表变更
  │ - src/checkout/CheckoutFlow.ts(修改)- 结账流程更新
  │ - tests/cart/CartService.test.ts(修改)- 更新购物车测试
  │ 当前任务专门聚焦于修复购物车模块中添加重复项的bug。购物车文件的预变更可能与当前任务范围重叠——评估实现代理的变更是否正确修复了bug且未破坏预变更。
  │ ## 评估规范
  │ ```yaml
  │ {元评审的评估规范YAML}
  │ ```
  │ ## 实现输出
  │ 文件:src/cart/CartService.ts(修改),...
  │ 关键变更:添加重复项检测...
  │ ## 说明
  │ 遵循你的完整评审流程……
  └─────────────────────────────────────────────────────────

  评审代理(sadd:judge + Opus)……
    → VERDICT: PASS,得分:4.1/5.0
    → ISSUES: 无
    → IMPROVEMENTS: 考虑将去重逻辑提取到共享工具中

阶段6:最终报告
  ✅ 首次尝试通过
  文件:CartService.ts(修改),cart.routes.ts(修改),
         CartService.test.ts(修改)

Best Practices

最佳实践

Model Selection

模型选择

  • When in doubt, use Opus - Quality matters more than cost for verified work
  • Match complexity - Don't use Opus for simple transformations
  • Consider risk - Higher risk = stronger model
  • 不确定时使用Opus - 对于需要验证的工作,质量比成本更重要
  • 匹配复杂度 - 不要将Opus用于简单转换
  • 考虑风险 - 风险越高,使用越强的模型

Meta-Judge + Judge Verification

元评审+评审验证

  • Never skip meta-judge - Tailored evaluation criteria produce better judgments than generic ones
  • Reuse meta-judge spec on retries - The evaluation specification stays constant across retry attempts; only the implementation changes
  • Parse only headers from judge - Don't read full reports to avoid context pollution
  • Trust the threshold - 4/5.0 is the quality gate
  • Include CLAUDE_PLUGIN_ROOT - Both meta-judge and judge need the resolved plugin root path
  • 绝不跳过元评审 - 定制化评估标准比通用标准能产生更好的评审结果
  • 重试时重用元评审规范 - 评估规范在重试尝试中保持不变;仅实现内容变更
  • 仅解析评审的标题部分 - 不要阅读完整报告以避免上下文污染
  • 信任阈值 - 4/5.0是质量门禁
  • 包含CLAUDE_PLUGIN_ROOT - 元评审和评审代理都需要解析后的插件根路径
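
The evaluation specification YAML is referenced throughout the examples but never shown in full. A minimal sketch of the shape such a spec might take — field names here are assumptions, the actual format is defined by the meta-judge agent — looks like:

```yaml
# Hypothetical shape of a meta-judge evaluation specification.
# Field names are illustrative; the real output format may differ.
task: "Implement rate limiting middleware with configurable limits per endpoint"
rubric:
  - dimension: correctness
    weight: 0.4
    description: Limits are enforced accurately per endpoint
  - dimension: configurability
    weight: 0.3
    description: Limits can be changed per endpoint without code changes
  - dimension: robustness
    weight: 0.3
    description: Behavior under concurrent and distributed load
checklist:
  - Per-endpoint limit configuration exists
  - A sensible default applies to unlisted endpoints
  - Rate-limit failure mode (429 response) is tested
scoring:
  scale: 1-5
  pass_threshold: 4.0
```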

Iteration

迭代

  • Focus fixes - Don't rewrite everything, fix specific issues
  • Pass feedback verbatim - Let the implementation agent see exact issues
  • Same meta-judge spec - Do NOT re-run meta-judge on retries; the evaluation criteria don't change
  • Escalate appropriately - Don't loop forever on fundamental problems
  • 聚焦修复 - 不要重写所有内容,修复具体问题
  • 逐字传递反馈 - 让实现代理看到确切的问题
  • 使用相同元评审规范 - 重试时不要重新运行元评审;评估标准不变
  • 适当升级 - 不要在根本性问题上无限循环
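
"Parse only headers" can be made concrete with a small sketch: the orchestrator extracts just the VERDICT/SCORE line from the judge's report and decides whether to retry. The regex assumes the `VERDICT: PASS, SCORE: 4.4/5.0` header format shown in the example transcripts:

```typescript
// Sketch of header-only verdict parsing, assuming the
// "VERDICT: PASS, SCORE: 4.4/5.0" format from the examples.
interface Verdict {
  pass: boolean;
  score: number;
}

function parseVerdict(report: string): Verdict | null {
  const m = report.match(/VERDICT:\s*(PASS|FAIL),\s*SCORE:\s*([\d.]+)\/5\.0/);
  if (!m) return null; // malformed report: escalate rather than guess
  return { pass: m[1] === "PASS", score: parseFloat(m[2]) };
}

// The quality gate: an explicit PASS verdict AND score >= 4.0.
function shouldRetry(v: Verdict, attempt: number, maxRetries = 2): boolean {
  return !(v.pass && v.score >= 4.0) && attempt <= maxRetries;
}

const v = parseVerdict("VERDICT: FAIL, SCORE: 3.1/5.0");
console.log(v);                      // { pass: false, score: 3.1 }
console.log(v && shouldRetry(v, 1)); // true
```

Reading only these header fields keeps the orchestrator's context clean; the full report body travels verbatim to the retry implementation agent, not into the orchestrator's own reasoning.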

Context Management

上下文管理

  • Keep it clean - You orchestrate, sub-agents implement
  • Summarize, don't copy - Pass summaries, not full file contents
  • Trust sub-agents - They can read files themselves
  • Meta-judge YAML - Pass only the meta-judge YAML to the judge, do not add any additional text or comments to it!
  • Track pre-existing changes - Pass context about prior modifications to the judge to prevent attribution confusion between pre-existing and current changes
  • 保持干净 - 你负责编排,子代理负责实现
  • 总结而非复制 - 传递摘要,而非完整文件内容
  • 信任子代理 - 它们可以自行读取文件
  • 元评审YAML - 仅将元评审YAML传递给评审代理,不要添加任何额外文本或注释!
  • 跟踪预变更 - 向评审代理传递关于之前修改的上下文,以避免混淆预变更和当前变更的归属