implement-task
Implement Task with Verification
Your job is to implement the solution at the highest quality using the task specification and sub-agents. You MUST NOT stop unless it is critically necessary or you are done! Avoid asking questions unless it is critically necessary! Launch the implementation agent and judges, iterate until issues are fixed, and then move to the next step!
Execute task implementation steps with automated quality verification using LLM-as-Judge for critical artifacts.
User Input
    $ARGUMENTS
Command Arguments
Parse the following arguments from `$ARGUMENTS`:
Argument Definitions
| Argument | Format | Default | Description |
|---|---|---|---|
| Task file (first positional argument) | Path or filename | Auto-detect | Task file name or path (e.g., …) |
| `--continue` | Flag | None | Continue implementation from last completed step. Launches judge first to verify state, then iterates with implementation agent. |
| `--refine` | Flag | `false` | Incremental refinement mode - detect changes against git and re-implement only affected steps (from modified step onwards). |
| `--human-in-the-loop` | Flag with optional step list | None | Steps after which to pause for human verification. If no steps specified, pauses after every step. |
| `--target-quality` | `X.X` or `X.X,Y.Y` | `4.0` (standard), `4.5` (critical) | Target threshold value (out of 5.0). Single value sets both. Two comma-separated values set standard,critical. |
| `--max-iterations` | Number or `unlimited` | `3` | Maximum fix→verify cycles per step. Default is 3 iterations. Set to `unlimited` to iterate until the quality threshold is met. |
| `--skip-judges` | Flag | `false` | Skip all judge validation checks - steps proceed without quality gates. |
Configuration Resolution
Parse `$ARGUMENTS` and resolve configuration as follows:
Extract task file (first positional argument, optional - auto-detect if not provided)
TASK_FILE = first argument that is a file path or filename
Parse --target-quality (supports single value or two comma-separated values)
if --target-quality has single value X.X:
    THRESHOLD_FOR_STANDARD_COMPONENTS = X.X
    THRESHOLD_FOR_CRITICAL_COMPONENTS = X.X
elif --target-quality has two values X.X,Y.Y:
    THRESHOLD_FOR_STANDARD_COMPONENTS = X.X
    THRESHOLD_FOR_CRITICAL_COMPONENTS = Y.Y
else:
    THRESHOLD_FOR_STANDARD_COMPONENTS = 4.0 # default
    THRESHOLD_FOR_CRITICAL_COMPONENTS = 4.5 # default
Initialize other defaults
MAX_ITERATIONS = --max-iterations || 3 # default is 3 iterations
HUMAN_IN_THE_LOOP_STEPS = --human-in-the-loop || [] (empty = none, "*" = all)
SKIP_JUDGES = --skip-judges || false
REFINE_MODE = --refine || false
CONTINUE_MODE = --continue || false
Special handling for --human-in-the-loop without step list
if --human-in-the-loop present without step numbers:
    HUMAN_IN_THE_LOOP_STEPS = "*" (all steps)
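The resolution rules above can be sketched in Python. This is a minimal illustration, not the command's actual parser: the `Config` class, the `parse_arguments` function, and the comma-separated step list format for `--human-in-the-loop` are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Config:
    """Hypothetical container for the resolved configuration."""
    task_file: Optional[str] = None
    standard_threshold: float = 4.0          # THRESHOLD_FOR_STANDARD_COMPONENTS
    critical_threshold: float = 4.5          # THRESHOLD_FOR_CRITICAL_COMPONENTS
    max_iterations: Optional[int] = 3        # None stands for "unlimited"
    human_steps: Union[List[int], str] = field(default_factory=list)  # "*" = all
    skip_judges: bool = False
    refine: bool = False
    continue_mode: bool = False


def parse_arguments(argv: List[str]) -> Config:
    cfg = Config()
    i = 0
    while i < len(argv):
        arg = argv[i]
        nxt = argv[i + 1] if i + 1 < len(argv) else None
        if arg == "--target-quality" and nxt is not None:
            parts = [float(p) for p in nxt.split(",")]
            cfg.standard_threshold = parts[0]
            cfg.critical_threshold = parts[-1]   # single value sets both
            i += 1
        elif arg == "--max-iterations" and nxt is not None:
            cfg.max_iterations = None if nxt == "unlimited" else int(nxt)
            i += 1
        elif arg == "--human-in-the-loop":
            # Assumed format: optional comma-separated step numbers, e.g. "2,4".
            if nxt is not None and nxt.replace(",", "").isdigit():
                cfg.human_steps = [int(s) for s in nxt.split(",") if s]
                i += 1
            else:
                cfg.human_steps = "*"            # no step list: pause after every step
        elif arg == "--skip-judges":
            cfg.skip_judges = True
        elif arg == "--refine":
            cfg.refine = True
        elif arg == "--continue":
            cfg.continue_mode = True
        elif not arg.startswith("--") and cfg.task_file is None:
            cfg.task_file = arg                  # first positional argument
        i += 1
    return cfg
```

For example, `parse_arguments(["my-task.feature.md", "--target-quality", "4.2,4.8"])` would resolve the standard threshold to 4.2 and the critical threshold to 4.8 while leaving the other defaults in place.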
Context Resolution for --continue
When `--continue` is used:
- Step Resolution:
  - Parse the task file for `[DONE]` markers on step titles
  - Identify the last incomplete step
  - Launch judge to verify the last INCOMPLETE step's artifacts
  - If judge PASS: Mark step as done and resume from the next step
  - If judge FAIL: Re-implement the step and iterate until PASS
- State Recovery:
  - Check task file location (`in-progress/`, `todo/`, `done/`)
  - If in `todo/`, move to `in-progress/` before continuing
  - Pre-populate captured values from existing artifacts
Refine Mode Behavior (--refine)
When `--refine` is used, it detects changes to project files (not the task file) and maps them to implementation steps to determine what needs re-verification.
- Detect Changed Project Files:
  First, determine what to compare against based on git state:
      # Check for staged changes
      STAGED=$(git diff --cached --name-only)
      # Check for unstaged changes
      UNSTAGED=$(git diff --name-only)
  Comparison logic:

| Staged | Unstaged | Compare Against | Command |
|---|---|---|---|
| Yes | Yes | Staged (unstaged only) | `git diff --name-only` |
| Yes | No | Last commit | `git diff HEAD --name-only` |
| No | Yes | Last commit | `git diff HEAD --name-only` |
| No | No | No changes | Exit with message |

  - If both staged AND unstaged: Compare working directory vs staging area (unstaged changes only)
  - If only staged OR only unstaged: Compare against last commit
  - This ensures refine operates on the most recent work in progress
- Map Changes to Implementation Steps:
  - Read the task file to get the list of implementation steps
  - For each changed file, determine which step created/modified it:
    - Check step's "Expected Output" section for file paths
    - Check step's subtasks for file references
    - Check step's artifacts in `#### Verification` section
  - Build a mapping: `{changed_file → step_number}`
- Determine Affected Steps:
  - Find all steps that have associated changed files
  - The earliest affected step is the starting point
  - All steps from that point onwards need re-verification
  - Earlier steps (unaffected) are preserved as-is
- Refine Execution:
  - For each affected step (in order):
    - Launch judge agent to verify the step's artifacts (including user's changes)
    - If judge PASS: Mark step done, proceed to next
    - If judge FAIL: Launch implementation agent with user's changes as context, then re-verify
  - User's manual fixes are preserved - implementation agent should build upon them, not overwrite
- Example:
      # User manually fixed src/validation/validation.service.ts
      # (This file was created in Step 2)
      /implement my-task.feature.md --refine
      # Detects: src/validation/validation.service.ts modified
      # Maps to: Step 2 (Create ValidationService)
      # Action: Launch judge for Step 2
      # - If PASS: User's fix is good, proceed to Step 3
      # - If FAIL: Implementation agent aligns the rest of the code with the user's changes, without overwriting them
      # Continues: Step 3, Step 4... (re-verify all subsequent steps)
- Multiple Files Changed:
      # User edited files from Step 2 AND Step 4
      /implement my-task.feature.md --refine
      # Detects: Files from Step 2 and Step 4 modified
      # Earliest affected: Step 2
      # Re-verifies: Step 2, Step 3, Step 4, Step 5...
      # (Step 3 re-verified even though no direct changes, because it depends on Step 2)
- Staged vs Unstaged Changes:
      # Scenario: User staged some changes, then made more edits
      # Staged: src/validation/validation.service.ts (git add done)
      # Unstaged: src/validation/validators/email.validator.ts (still editing)
      /implement my-task.feature.md --refine
      # Detects: Both staged AND unstaged changes exist
      # Mode: Compares unstaged only (working dir vs staging)
      # Only email.validator.ts is considered for refine
      # Staged changes are preserved, not re-verified
      # --
      # Scenario: User only has staged changes (ready to commit)
      # Staged: src/validation/validation.service.ts
      # Unstaged: none
      /implement my-task.feature.md --refine
      # Detects: Only staged changes
      # Mode: Compares against last commit
      # validation.service.ts changes are verified
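The staged/unstaged decision can be expressed as a small pure function. This is a sketch under the assumption that the two file lists have already been collected with `git diff --cached --name-only` and `git diff --name-only`; the function name `choose_comparison` is hypothetical, while the mode names follow the comparison modes used in Step 0.5.

```python
from typing import List, Optional, Tuple


def choose_comparison(staged: List[str],
                      unstaged: List[str]) -> Optional[Tuple[str, List[str]]]:
    """Return (comparison_mode, git_command), or None when nothing changed."""
    if staged and unstaged:
        # Both kinds of change exist: look at unstaged edits only
        # (working directory vs staging area).
        return "unstaged_only", ["git", "diff", "--name-only"]
    if staged or unstaged:
        # Only one kind of change: compare against the last commit.
        return "vs_last_commit", ["git", "diff", "HEAD", "--name-only"]
    # No changes: the caller should report a message and exit.
    return None
```

The returned command list could be passed to `subprocess.run` to obtain the changed file names; keeping the decision separate from the git call makes the table above easy to check row by row.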
Human-in-the-Loop Behavior
Human verification checkpoints occur:
- Trigger Conditions:
  - After implementation + judge verification PASS for a step in `HUMAN_IN_THE_LOOP_STEPS`
  - After implementation + judge + implementation retry (before the next judge retry)
  - If `HUMAN_IN_THE_LOOP_STEPS` is `"*"`, triggers after every step
- At Checkpoint:
  - Display current step results summary
  - Display generated artifacts with paths
  - Display judge score and feedback
  - Ask user: "Review step output. Continue? [Y/n/feedback]"
  - If user provides feedback, incorporate into next iteration or step
  - If user says "n", pause workflow
- Checkpoint Message Format:
      ---
      ## 🔍 Human Review Checkpoint - Step X
      **Step:** {step title}
      **Step Type:** {standard/critical}
      **Judge Score:** {score}/{threshold for step type} threshold
      **Status:** ✅ PASS / 🔄 ITERATING (attempt {n})
      **Artifacts Created/Modified:**
      - {artifact_path_1}
      - {artifact_path_2}
      **Judge Feedback:**
      {feedback summary}
      **Action Required:** Review the above artifacts and provide feedback or continue.
      > Continue? [Y/n/feedback]:
      ---
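As a rough illustration of the checkpoint logic, the sketch below decides whether a step needs a human checkpoint and interprets the reply to the `[Y/n/feedback]` prompt. The function names are hypothetical, and treating an empty reply as "continue" is an assumption based on the capitalized `Y` in the prompt.

```python
from typing import List, Optional, Tuple, Union


def needs_checkpoint(step: int, human_steps: Union[List[int], str]) -> bool:
    """True when the step is listed, or when "*" selects every step."""
    return human_steps == "*" or step in human_steps


def interpret_reply(reply: str) -> Tuple[str, Optional[str]]:
    """Map the user's answer to an action and optional feedback text."""
    text = reply.strip()
    if text == "" or text.lower() in ("y", "yes"):
        return "continue", None
    if text.lower() in ("n", "no"):
        return "pause", None          # user said "n": pause the workflow
    return "feedback", text           # anything else is feedback for the next iteration
```

With `HUMAN_IN_THE_LOOP_STEPS = [2, 4]`, `needs_checkpoint(3, [2, 4])` is false, so Step 3 would proceed without pausing.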
Task Selection and Status Management
Task Status Folders
Task status is managed by folder location:
- `.specs/tasks/todo/` - Tasks waiting to be implemented
- `.specs/tasks/in-progress/` - Tasks currently being worked on
- `.specs/tasks/done/` - Completed tasks
Status Transitions
| When | Action |
|---|---|
| Start implementation | Move task from `todo/` to `in-progress/` |
| Final verification PASS | Move task from `in-progress/` to `done/` |
| Implementation failure (user aborts) | Keep in `in-progress/` |
CRITICAL: You Are an ORCHESTRATOR ONLY
Your role is DISPATCH and AGGREGATE. You do NOT do the work.
Build each sub-agent's context properly!
CRITICAL: For each sub-agent (implementation and evaluation), you need to provide:
- Task file path
- Step number
- Item number (if applicable)
- Artifact path (if applicable)
- Value of `${CLAUDE_PLUGIN_ROOT}` so agents can resolve paths like `@${CLAUDE_PLUGIN_ROOT}/scripts/create-scratchpad.sh`
What You DO
- Read the task file ONCE (Phase 1 only)
- Launch sub-agents via Task tool
- Receive reports from sub-agents
- Mark stages complete after judge confirmation
- Aggregate results and report to user
What You NEVER Do
| Prohibited Action | Why | What To Do Instead |
|---|---|---|
| Read implementation outputs | Context bloat → command loss | Sub-agent reports what it created |
| Read reference files | Sub-agent's job to understand patterns | Include path in sub-agent prompt |
| Read artifacts to "check" them | Context bloat → forget verifications | Launch judge agent |
| Evaluate code quality yourself | Not your job, causes forgetting | Launch judge agent |
| Skip verification "because simple" | ALL verifications are mandatory | Launch judge agent anyway |
Anti-Rationalization Rules
If you think: "I should read this file to understand what was created"
→ STOP. The sub-agent's report tells you what was created. Use that information.
If you think: "I'll quickly verify this looks correct"
→ STOP. Launch a judge agent. That's not your job.
If you think: "This is too simple to need verification"
→ STOP. If the task specifies verification, launch the judge. No exceptions.
If you think: "I need to read the reference file to write a good prompt"
→ STOP. Put the reference file PATH in the sub-agent prompt. Sub-agent reads it.
Why This Matters
Orchestrators who read files themselves = context overflow = command loss = forgotten steps. Every time.
Orchestrators who "quickly verify" = skip judge agents = quality collapse = failed artifacts.
Your context window is precious. Protect it. Delegate everything.
CRITICAL
Configuration Rules
- Use `THRESHOLD_FOR_STANDARD_COMPONENTS` (default 4.0) for standard steps!
- Use `THRESHOLD_FOR_CRITICAL_COMPONENTS` (default 4.5) for steps marked as critical in task file!
- Default is 3 iterations - stop after 3 fix→verify cycles and proceed to next step (with warning)!
- If `MAX_ITERATIONS` is set to `unlimited`: Iterate until quality threshold is met (no limit)
- Trigger human-in-the-loop checkpoints ONLY after steps in `HUMAN_IN_THE_LOOP_STEPS` (or all steps if `"*"`)!
- If `SKIP_JUDGES` is true: Skip ALL judge validation - proceed directly to next step after each implementation completes!
- If `CONTINUE_MODE` is true: Skip to `RESUME_FROM_STEP` - do not re-implement already completed steps!
- If `REFINE_MODE` is true: Detect changed project files, map to steps, re-verify from `REFINE_FROM_STEP` - preserve user's fixes!
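The iteration rules above amount to a bounded fix→verify loop. The sketch below is one possible reading, not the command's implementation: `run_step`, and the `implement` and `judge` callables that stand in for launching sub-agents, are hypothetical names, and `max_iterations=None` is used to model `unlimited`.

```python
from typing import Callable, Optional, Tuple


def run_step(implement: Callable[[Optional[str]], None],
             judge: Callable[[], Tuple[float, str]],
             threshold: float,
             max_iterations: Optional[int] = 3,
             skip_judges: bool = False) -> Tuple[bool, int]:
    """Run one step and return (passed, fix_cycles_used)."""
    implement(None)                      # initial implementation, no feedback yet
    if skip_judges:
        return True, 0                   # SKIP_JUDGES: no quality gate at all
    score, feedback = judge()
    cycles = 0
    while score < threshold:
        if max_iterations is not None and cycles >= max_iterations:
            # Limit reached: warn and let the caller proceed to the next step.
            print(f"WARNING: step did not pass after {max_iterations} iterations")
            return False, cycles
        implement(feedback)              # fix using the judge's feedback
        score, feedback = judge()        # re-verify
        cycles += 1
    return True, cycles
```

The threshold passed in would be `THRESHOLD_FOR_CRITICAL_COMPONENTS` for steps marked critical and `THRESHOLD_FOR_STANDARD_COMPONENTS` otherwise.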
Execution & Evaluation Rules
- Use foreground agents only: Do not use background agents. Launch parallel agents when possible. Background agents constantly run into permission issues and other errors.
Relaunch the judge until you get valid results if any of the following happens:
- Reject Long Reports: If an agent returns a very long report instead of using the scratchpad as requested, reject the result. This indicates the agent failed to follow the "use scratchpad" instruction.
- Judge Score 5.0 is a Hallucination: If a judge returns a score of 5.0/5.0, treat it as a hallucination or lazy evaluation. Reject it and re-run the judge. Perfect scores are practically impossible in this rigorous framework.
- Reject Missing Scores: If a judge report is missing the numerical score, reject it. This indicates the judge failed to read or follow the rubric instructions.
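The three rejection rules can be checked mechanically before a judge result is accepted. The sketch below is illustrative only: the `validate_judge_report` name, the 4000-character length limit, and the assumption that scores appear in the report as `X.X/5.0` are not taken from the command itself.

```python
import re
from typing import Tuple

# Assumed score format: a number followed by "/5" or "/5.0", e.g. "4.3/5.0".
SCORE_RE = re.compile(r"(\d(?:\.\d+)?)\s*/\s*5(?:\.0)?")


def validate_judge_report(report: str, max_chars: int = 4000) -> Tuple[bool, str]:
    """Return (is_valid, reason); invalid reports mean the judge is relaunched."""
    if len(report) > max_chars:
        return False, "report too long"      # agent ignored the scratchpad instruction
    match = SCORE_RE.search(report)
    if match is None:
        return False, "missing score"        # judge did not follow the rubric
    if float(match.group(1)) >= 5.0:
        return False, "perfect score"        # treated as hallucination or lazy evaluation
    return True, "ok"
```

A caller would keep relaunching the judge while the first element of the result is `False`.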
Overview
This command orchestrates multi-step task implementation with:
- Sequential execution respecting step dependencies
- Parallel execution where dependencies allow
- Automated verification using judge agents for critical steps
- Panel of LLMs (PoLL) for high-stakes artifacts
- Aggregated voting with position bias mitigation
- Stage tracking with confirmation after each judge passes
Complete Workflow Overview
Phase 0: Select Task & Move to In-Progress
│
├─── Use provided task file name or auto-select from todo/ (if only 1 task)
├─── Move task: todo/ → in-progress/
│
▼
Phase 1: Load Task
│
▼
Phase 2: Execute Steps
│
├─── For each step in dependency order:
│ │
│ ▼
│ ┌─────────────────────────────────────────────────┐
│ │ Launch sdd:developer agent │
│ │ (implementation) │
│ └─────────────────┬───────────────────────────────┘
│ │
│ ▼
│ ┌─────────────────────────────────────────────────┐
│ │ Launch judge agent(s) │
│ │ (verification per #### Verification section) │
│ └─────────────────┬───────────────────────────────┘
│ │
│ ▼
│ ┌─────────────────────────────────────────────────┐
│ │ Judge PASS? → Mark step complete in task file │
│ │ Judge FAIL? → Fix and re-verify (max 2 retries) │
│ └─────────────────────────────────────────────────┘
│
▼
Phase 3: Final Verification
│
├─── Verify all Definition of Done items
│ │
│ ▼
│ ┌─────────────────────────────────────────────────┐
│ │ Launch judge agent │
│ │ (verify all DoD items) │
│ └─────────────────┬───────────────────────────────┘
│ │
│ ▼
│ ┌─────────────────────────────────────────────────┐
│ │ All PASS? → Proceed to Phase 4 │
│ │ Any FAIL? → Fix and re-verify (iterate) │
│ └─────────────────────────────────────────────────┘
│
▼
Phase 4: Move Task to Done
│
├─── Move task: in-progress/ → done/
│
▼
Phase 5: Final Report
Phase 0: Parse User Input and Select Task
Parse user input to get the task file path and arguments.
Step 0.1: Resolve Task File
If `$ARGUMENTS` is empty or only contains flags:
1. Check in-progress folder first:
       ls .specs/tasks/in-progress/*.md 2>/dev/null
   - If exactly 1 file → Set `$TASK_FILE` to that file, `$TASK_FOLDER` to `in-progress`
   - If multiple files → List them and ask user: "Multiple tasks in progress. Which one to continue?"
   - If no files → Continue to step 2
2. Check todo folder:
       ls .specs/tasks/todo/*.md 2>/dev/null
   - If exactly 1 file → Set `$TASK_FILE` to that file, `$TASK_FOLDER` to `todo`
   - If multiple files → List them and ask user: "Multiple tasks in todo. Which one to implement?"
   - If no files → Report "No tasks available. Create one with /add-task first." and STOP
If `$ARGUMENTS` contains a task file name:
- Search for the file in order: `in-progress/` → `todo/` → `done/`
- Set `$TASK_FILE` and `$TASK_FOLDER` accordingly
- If not found, report error and STOP
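The lookup order in Step 0.1 can be sketched as follows. This is an illustrative helper, not part of the command: `resolve_task_file` and its status strings (`found`, `ask_user`, `no_tasks`, `not_found`) are hypothetical, and the root folder is passed in so the logic can be exercised against a temporary directory.

```python
from pathlib import Path
from typing import Optional, Tuple

TASKS_ROOT = Path(".specs/tasks")


def resolve_task_file(name: Optional[str],
                      root: Path = TASKS_ROOT) -> Tuple[str, Optional[Path]]:
    """Return (status, path) following the Step 0.1 search order."""
    if name:
        # Explicit task name: search in-progress/, then todo/, then done/.
        for folder in ("in-progress", "todo", "done"):
            candidate = root / folder / Path(name).name
            if candidate.is_file():
                return "found", candidate
        return "not_found", None
    # No task name given: auto-detect, in-progress/ first, then todo/.
    for folder in ("in-progress", "todo"):
        files = sorted((root / folder).glob("*.md"))
        if len(files) == 1:
            return "found", files[0]
        if len(files) > 1:
            return "ask_user", root / folder   # caller lists the files and asks
    return "no_tasks", None
```

The folder of the returned path tells the caller whether the task still needs to be moved from `todo/` to `in-progress/` in Step 0.2.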
Step 0.2: Move to In-Progress (if needed)
**If task is in `todo/` folder:**
    git mv .specs/tasks/todo/$TASK_FILE .specs/tasks/in-progress/
    # Fallback if git not available: mv .specs/tasks/todo/$TASK_FILE .specs/tasks/in-progress/
Update `$TASK_PATH` to `.specs/tasks/in-progress/$TASK_FILE`
**If task is already in `in-progress/`:**
Set `$TASK_PATH` to `.specs/tasks/in-progress/$TASK_FILE`
Step 0.3: Parse Flags and Initialize Configuration
Parse all flags from `$ARGUMENTS` and initialize configuration.
Display resolved configuration:
Configuration
| Setting | Value |
|---|---|
| Task File | {TASK_PATH} |
| Standard Components Threshold | {THRESHOLD_FOR_STANDARD_COMPONENTS}/5.0 |
| Critical Components Threshold | {THRESHOLD_FOR_CRITICAL_COMPONENTS}/5.0 |
| Max Iterations | {MAX_ITERATIONS or "3"} |
| Human Checkpoints | {HUMAN_IN_THE_LOOP_STEPS as comma-separated or "All steps" or "None"} |
| Skip Judges | {SKIP_JUDGES} |
| Continue Mode | {CONTINUE_MODE} |
| Refine Mode | {REFINE_MODE} |
Step 0.4: Handle Continue Mode
If `CONTINUE_MODE` is true:
- Identify Last Completed Step:
  - Parse task file for `[DONE]` markers on step titles
  - Find the highest step number marked `[DONE]`
  - Set `LAST_COMPLETED_STEP` to that number (or 0 if none)
- Verify Last Completed Step (if any):
  - If `LAST_COMPLETED_STEP > 0`:
    - Launch judge agent to verify the artifacts from that step
    - If judge PASS: Set `RESUME_FROM_STEP = LAST_COMPLETED_STEP + 1`
    - If judge FAIL: Set `RESUME_FROM_STEP = LAST_COMPLETED_STEP` (re-implement)
- Skip to Resume Point:
  - In Phase 2, skip all steps before `RESUME_FROM_STEP`
  - Continue execution from `RESUME_FROM_STEP`
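A minimal sketch of the resume-point calculation, assuming step titles look like `### Step 1: Setup [DONE]` (the form shown in Pattern A). The function names are hypothetical, and starting from Step 1 when nothing is marked `[DONE]` is an assumption, since the text only defines the case `LAST_COMPLETED_STEP > 0`.

```python
import re

# Assumed heading shape: "### Step <N>: <title>", optionally ending in "[DONE]".
STEP_TITLE_RE = re.compile(r"^#{2,4}\s*Step\s+(\d+):.*$", re.MULTILINE)


def last_completed_step(task_text: str) -> int:
    """Highest step number whose title carries a [DONE] marker, or 0 if none."""
    done = [int(m.group(1))
            for m in STEP_TITLE_RE.finditer(task_text)
            if "[DONE]" in m.group(0)]
    return max(done, default=0)


def resume_from(last_done: int, judge_passed: bool) -> int:
    """Compute RESUME_FROM_STEP from the judge verdict on the last completed step."""
    if last_done == 0:
        return 1                       # assumption: nothing done yet, start at Step 1
    return last_done + 1 if judge_passed else last_done
```

For a task file with Steps 1 and 2 marked `[DONE]`, a passing judge yields a resume point of 3, while a failing judge sends Step 2 back for re-implementation.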
Step 0.5: Handle Refine Mode
If `REFINE_MODE` is true:
- Detect Changed Project Files:
      # Check for staged and unstaged changes
      STAGED=$(git diff --cached --name-only)
      UNSTAGED=$(git diff --name-only)
  Determine comparison mode:
      if STAGED is not empty AND UNSTAGED is not empty:
          # Both staged and unstaged - use unstaged only
          CHANGED_FILES = git diff --name-only  # working dir vs staging
          COMPARISON_MODE = "unstaged_only"
      elif STAGED is not empty OR UNSTAGED is not empty:
          # Only one type - compare against last commit
          CHANGED_FILES = git diff HEAD --name-only
          COMPARISON_MODE = "vs_last_commit"
      else:
          # No changes
          Report: "No project changes detected. Make edits first, then run --refine."
          Exit
- Load Task File and Extract Step→File Mapping:
  - Read the task file to get implementation steps
  - For each step, extract the files it creates/modifies from:
    - "Expected Output" sections
    - Subtask descriptions mentioning file paths
    - `#### Verification` artifact paths
  - Build mapping: `STEP_FILE_MAP = {step_number → [file_paths]}`
- Map Changed Files to Steps:
      AFFECTED_STEPS = []
      for each changed_file:
          for step_number, file_list in STEP_FILE_MAP:
              if changed_file matches any path in file_list:
                  AFFECTED_STEPS.append(step_number)
  - If no steps matched: "Changed files don't map to any implementation step. Verify manually."
- Determine Refine Scope:
  - `REFINE_FROM_STEP` = min(AFFECTED_STEPS) # earliest affected step
  - All steps from `REFINE_FROM_STEP` onwards need re-verification
  - Steps before `REFINE_FROM_STEP` are preserved as-is
- Store Changed Files Context:
  - `CHANGED_FILES` = list of changed file paths
  - `USER_CHANGES_CONTEXT` = git diff output for affected files
  - Pass this context to judge and implementation agents
  - Agents should build upon user's fixes, not overwrite them
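The mapping from changed files to the refine scope can be sketched like this. It is a simplified illustration: the function names are hypothetical, "matches" is reduced to exact path equality, and the step-to-file map in the usage note is made-up example data.

```python
from typing import Dict, List


def affected_steps(changed_files: List[str],
                   step_file_map: Dict[int, List[str]]) -> List[int]:
    """Steps whose file list contains at least one changed file (exact match)."""
    hits = {step
            for step, files in step_file_map.items()
            for changed in changed_files
            if changed in files}
    return sorted(hits)


def refine_scope(changed_files: List[str],
                 step_file_map: Dict[int, List[str]]) -> List[int]:
    """All steps from the earliest affected step onwards; empty if nothing maps."""
    hits = affected_steps(changed_files, step_file_map)
    if not hits:
        return []                      # caller reports: verify manually
    refine_from_step = hits[0]         # min(AFFECTED_STEPS)
    return [s for s in sorted(step_file_map) if s >= refine_from_step]
```

With a map where Step 2 owns `src/validation/validation.service.ts` and Step 4 owns a test file, editing both files gives a scope of Steps 2, 3, and 4: Step 3 is re-verified even though none of its files changed.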
Phase 1: Load and Analyze Task
This is the ONLY phase where you read a file.
Step 1.1: Load Task Details
Read the task file ONCE:
    Read $TASK_PATH
After this read, you MUST NOT read any other files for the rest of execution.
Step 1.2: Identify Implementation Steps
Parse the `## Implementation Process` section:
- List all steps with dependencies
- Identify which steps have `Parallel with:` annotations
- Classify each step's verification needs from `#### Verification` sections:
| Verification Level | When to Use | Judge Configuration |
|---|---|---|
| None | Simple operations (mkdir, delete) | Skip verification |
| Single Judge | Non-critical artifacts | 1 judge, threshold 4.0/5.0 |
| Panel of 2 Judges | Critical artifacts | 2 judges, median voting, threshold 4.5/5.0 |
| Per-Item Judges | Multiple similar items | 1 judge per item, parallel |
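Once step dependencies are known, the execution order can be grouped into "waves" of steps that may run in parallel. The sketch below uses Python's standard `graphlib` module; how `Parallel with:` annotations translate into the dependency map is an assumption, and `execution_waves` is a hypothetical helper name.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def execution_waves(deps: Dict[int, Set[int]]) -> List[List[int]]:
    """Group steps into waves; every step in a wave has all its dependencies done.

    `deps` maps a step number to the set of steps it depends on.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()                       # raises CycleError on circular dependencies
    waves: List[List[int]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready()) # steps whose dependencies are all complete
        waves.append(ready)
        sorter.done(*ready)                # mark the whole wave as finished
    return waves
```

For example, if Steps 2 and 3 both depend only on Step 1 and Step 4 depends on both, the function yields three waves: Step 1 alone, then Steps 2 and 3 together, then Step 4.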
Step 1.3: Create Todo List
Create TodoWrite with all implementation steps, marking verification requirements:
{
"todos": [
{"content": "Step 1: [Title] - [Verification Level]", "status": "pending", "activeForm": "Implementing Step 1"},
{"content": "Step 2: [Title] - [Verification Level]", "status": "pending", "activeForm": "Implementing Step 2"}
]
}
Phase 2: Execute Implementation Steps
For each step in dependency order:
Pattern A: Simple Step (No Verification)
1. Launch Developer Agent:
Use Task tool with:
- Agent Type: `sdd:developer`
- Model: As specified in step or `opus` by default
- Description: "Implement Step [N]: [Title]"
- Prompt:
    Implement Step [N]: [Step Title]
    Task File: $TASK_PATH
    Step Number: [N]
    Your task:
    - Execute ONLY Step [N]: [Step Title]
    - Do NOT execute any other steps
    - Follow the Expected Output and Success Criteria exactly
    When complete, report:
    1. What files were created/modified (paths)
    2. Confirmation that success criteria are met
    3. Any issues encountered
2. Use Agent's Report (No Verification)
- Agent reports what was created → Use this information
- DO NOT read the created files yourself
- This pattern has NO verification (simple operations)
3. Mark Step Complete
- Update task file:
  - Mark step title with `[DONE]` (e.g., `### Step 1: Setup [DONE]`)
  - Mark step's subtasks as `[X]` complete
- Update todo to `completed`
Pattern B: Critical Step (Panel of 2 Evaluations)
1. Launch Developer Agent:
Use Task tool with:
- Agent Type: `sdd:developer`
- Model: As specified in step or `opus` by default
- Description: "Implement Step [N]: [Title]"
- Prompt:
Implement Step [N]: [Step Title]
Task File: $TASK_PATH
Step Number: [N]
Your task:
- Execute ONLY Step [N]: [Step Title]
- Do NOT execute any other steps
- Follow the Expected Output and Success Criteria exactly
When complete, report:
1. What files were created/modified (paths)
2. Confirmation of completion
3. Self-critique summary2. Wait for Completion
- Receive the agent's report
- Note the artifact path(s) from the report
- DO NOT read the artifact yourself
3. Launch 2 Evaluation Agents in Parallel (MANDATORY):
⚠️ MANDATORY: This pattern requires launching evaluation agents. You MUST launch these evaluations. Do NOT skip. Do NOT verify yourself.
Use agent type for evaluations
sdd:developerEvaluation 1 & 2 (launch both in parallel with same prompt structure):
CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
Read @${CLAUDE_PLUGIN_ROOT}/prompts/judge.md for evaluation methodology.
Evaluate artifact at: [artifact_path from implementation agent report]
**Chain-of-Thought Requirement:** Justification MUST be provided BEFORE score for each criterion.
Rubric:
[paste rubric table from #### Verification section]
Context:
- Read $TASK_PATH
- Verify Step [N] ONLY: [Step Title]
- Threshold: [from #### Verification section]
- Reference pattern: [if specified in #### Verification section]
You can verify the artifact works - run tests, check imports, validate syntax.
Return: scores per criterion with evidence, overall weighted score, PASS/FAIL, improvements if FAIL.4. Aggregate Results:
- Calculate median score per criterion
- Flag high-variance criteria (std > 1.0)
- Pass if median overall ≥ threshold
5. Determine Threshold:
- Check if step is marked as critical in task file (in section or step metadata)
#### Verification - If critical: use
THRESHOLD_FOR_CRITICAL_COMPONENTS - If standard: use
THRESHOLD_FOR_STANDARD_COMPONENTS
6. On FAIL: Iterate Until PASS (max 3 iterations by default)
- Present issues to implementation agent with judge feedback
- Re-implement with judge feedback incorporated (align code with requirements, preserve user's changes if in refine mode)
- Re-verify with judge
- Iterate until PASS - continue fix → verify cycle until quality threshold is met or max iterations reached
- If reached (default 3):
MAX_ITERATIONS- Log warning: "Step [N] did not pass after {MAX_ITERATIONS} iterations"
- Proceed to next step (do not block indefinitely)
7. On PASS: Mark Step Complete
- Update task file:
- Mark step title with (e.g.,
[DONE])### Step 2: Create Service [DONE] - Mark step's subtasks as complete
[X]
- Mark step title with
- Update todo to
completed - Record judge scores in tracking
8. Human-in-the-Loop Checkpoint (if applicable):
Only after step PASSES, if step number is in (or ):
HUMAN_IN_THE_LOOP_STEPSHUMAN_IN_THE_LOOP_STEPS == "*"markdown
---
🔍 Human Review Checkpoint - Step [N]
Step: [Step Title]
Judge Score: [score]/[threshold for step type] threshold
Status: ✅ PASS
Artifacts Created/Modified:
- [artifact_path_1]
- [artifact_path_2]
Judge Feedback:
[feedback summary from judges]
Action Required: Review the above artifacts and provide feedback or continue.
Continue? [Y/n/feedback]:
- If user provides feedback: Store for next step or re-implement current step with feedback
- If user says "n": Pause workflow, report current progress
- If user says "Y" or continues: Proceed to next step
---
Pattern C: Multi-Item Step (Per-Item Evaluations)
For steps that create multiple similar items:
1. Launch Developer Agents in Parallel (one per item):
Use Task tool for EACH item (launch all in parallel):
- Agent Type: `sdd:developer`
- Model: As specified, or `opus` by default
- Description: "Implement Step [N], Item: [Name]"
- Prompt:
    Implement Step [N], Item: [Item Name]
    Task File: $TASK_PATH
    Step Number: [N]
    Item: [Item Name]
    Your task:
    - Create ONLY [item_name] from Step [N]
    - Do NOT create other items or steps
    - Follow the Expected Output and Success Criteria exactly
    When complete, report:
    1. File path created
    2. Confirmation of completion
    3. Self-critique summary
2. Wait for All Completions
- Collect all agent reports
- Note all artifact paths
- DO NOT read any of the created files yourself
3. Launch Evaluation Agents in Parallel (one per item)
⚠️ MANDATORY: Launch evaluation agents. Do NOT skip. Do NOT verify yourself.
Use `sdd:developer` agent type for evaluations
For each item:
    CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
    Read @${CLAUDE_PLUGIN_ROOT}/prompts/judge.md for evaluation methodology.
    Evaluate artifact at: [item_path from implementation agent report]
    **Chain-of-Thought Requirement:** Justification MUST be provided BEFORE score for each criterion.
    Rubric:
    [paste rubric from #### Verification section]
    Context:
    - Read $TASK_PATH
    - Verify Step [N]: [Step Title]
    - Verify ONLY this Item: [Item Name]
    - Threshold: [from #### Verification section]
    You can verify the artifact works - run tests, check syntax, confirm dependencies.
    Return: scores with evidence, overall score, PASS/FAIL, improvements if FAIL.
4. Collect All Results
5. Report Aggregate:
- Items passed: X/Y
- Items needing revision: [list with specific issues]
6. Determine Threshold:
- Check if step is marked as critical in task file (in `#### Verification` section or step metadata)
- If critical: use `THRESHOLD_FOR_CRITICAL_COMPONENTS`
- If standard: use `THRESHOLD_FOR_STANDARD_COMPONENTS`
7. If Any FAIL: Iterate Until ALL PASS
- Present failing items with judge feedback to implementation agent
- Re-implement only failing items with feedback incorporated (preserve user's changes if in refine mode)
- Re-verify failing items with judge
- Iterate until ALL PASS - continue fix → verify cycle until all items meet quality threshold or max iterations reached
- If `MAX_ITERATIONS` reached (default 3):
  - Log warning: "Step [N] has {X} items that did not pass after {MAX_ITERATIONS} iterations"
  - Proceed to next step (do not block indefinitely)
8. On ALL PASS: Mark Step Complete
- Update task file:
  - Mark step title with `[DONE]` (e.g., `### Step 3: Create Items [DONE]`)
  - Mark step's subtasks as complete with `[X]`
- Update todo to `completed`
- Record pass rate in tracking
9. Human-in-the-Loop Checkpoint (if applicable):
Only after ALL items PASS, if step number is in `HUMAN_IN_THE_LOOP_STEPS` (or `HUMAN_IN_THE_LOOP_STEPS == "*"`):
---
🔍 Human Review Checkpoint - Step [N]
Step: [Step Title]
Items Passed: X/Y
Status: ✅ ALL PASS
Artifacts Created:
- [item_1_path]
- [item_2_path]
- ...
Action Required: Review the above artifacts and provide feedback or continue.
Continue? [Y/n/feedback]:
- If user provides feedback: Store for next step or re-implement items with feedback
- If user says "n": Pause workflow, report current progress
- If user says "Y" or continues: Proceed to next step
---
⚠️ CHECKPOINT: Before Proceeding to Final Verification
Before moving to final verification, verify you followed the rules:
- Did you launch sdd:developer agents for ALL implementations?
- Did you launch evaluation agents for ALL verifications?
- Did you mark steps complete ONLY after judge PASS?
- Did you avoid reading ANY artifact files yourself?
If you read files other than the task file, you are doing it wrong. STOP and restart.
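The fix → verify cycle shared by Patterns B and C can be sketched as a small loop. This is a sketch under stated assumptions, not the command's actual implementation: `implement` and `judge` are hypothetical stand-ins for launching the developer and evaluation agents, and the first implementation is assumed to count as iteration 1 (which matches the "Iterations" column in the final report, where a first-try pass shows 1).

```python
from typing import Callable, Optional, Tuple


def run_step_with_verification(
    implement: Callable[[Optional[str]], str],
    judge: Callable[[str], Tuple[float, str]],
    threshold: float,
    max_iterations: Optional[int] = 3,
) -> dict:
    """Repeat implement -> judge until the score meets the threshold.

    implement(feedback) returns an artifact path; feedback is None on the
    first run. judge(artifact) returns (overall_score, feedback).
    max_iterations=None models "--max-iterations unlimited".
    """
    if max_iterations is not None and max_iterations < 1:
        raise ValueError("max_iterations must be at least 1 or None")
    feedback = None
    score = 0.0
    iteration = 0
    while max_iterations is None or iteration < max_iterations:
        iteration += 1
        artifact = implement(feedback)      # re-implement with judge feedback
        score, feedback = judge(artifact)   # re-verify
        if score >= threshold:
            return {"status": "PASS", "score": score, "iterations": iteration}
    # Limit reached: log a warning and let the caller move to the next step.
    return {
        "status": "MAX_ITER",
        "score": score,
        "iterations": iteration,
        "warning": f"Step did not pass after {max_iterations} iterations",
    }
```

Returning a `MAX_ITER` status instead of raising mirrors the "do not block indefinitely" rule: the orchestrator records the warning and continues.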
Phase 3: Final Verification
After all implementation steps are complete, verify the task meets all Definition of Done criteria.
Step 3.1: Launch Definition of Done Verification
Use Task tool with:
- Agent Type: `sdd:developer`
- Model: `opus`
- Description: "Verify Definition of Done"
- Prompt:
CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
Verify all Definition of Done items in the task file.
Task File: $TASK_PATH
Your task:
1. Read the task file and locate the "## Definition of Done (Task Level)" section
2. Go through each checkbox item one by one
3. For each item, verify if it passes by:
- Running appropriate tests (unit tests, E2E tests)
- Checking build/compilation status
- Verifying file existence and correctness
- Checking code patterns and linting
4. You MUST mark each item in task file that passed verification with `[X]`
5. Return a structured report:
- List ALL Definition of Done items
- Status for each:
- ✅ PASS - if the item is complete and verified
- ❌ FAIL - if the item fails verification, with specific reason why
- ⚠️ BLOCKED - if the item cannot be verified due to a blocker
- Evidence for each status
- Specific issues for any failures
- Overall pass rate
Be thorough - check everything the task requires.
Step 3.2: Review Verification Results
- Receive the verification report
- Note which items PASS and which FAIL
- If the judge reports that all items PASS, you MUST read the end of the task file to verify that all DoD items are marked with `[X]`
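The final check in Step 3.2 (confirming every DoD checkbox is marked `[X]`) could be automated along these lines. The section heading comes from the verification prompt above; the `- [ ]` / `- [X]` checkbox syntax and the helper name `unchecked_dod_items` are assumptions made for illustration.

```python
import re

DOD_HEADING = "## Definition of Done (Task Level)"
CHECKBOX = re.compile(r"^\s*[-*] \[( |x|X)\] (.*)$")


def unchecked_dod_items(task_text: str) -> list:
    """Return the text of every DoD checkbox that is not yet ticked."""
    items = []
    in_dod = False
    for line in task_text.splitlines():
        if line.strip() == DOD_HEADING:
            in_dod = True
            continue
        if in_dod and line.startswith("## "):
            break  # the next top-level section ends the DoD list
        match = CHECKBOX.match(line) if in_dod else None
        if match and match.group(1) == " ":
            items.append(match.group(2).strip())
    return items
```

An empty list means the task file agrees with the judge's "all PASS" report; any returned item is a discrepancy worth sending back through Step 3.3.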
Step 3.3: Fix Failing Items (If Any)
If any Definition of Done items FAIL:
1. Launch Developer Agent for Each Failing Item:
Fix Definition of Done item: [Item Description]
Task File: $TASK_PATH
Current Status:
[paste failure details from verification report]
Your task:
1. Fix the specific issue identified
2. Verify the fix resolves the problem
3. Ensure no regressions (all tests still pass)
Return:
- What was fixed
- Confirmation the item now passes
- Any related changes made
2. Re-verify After Fixes:
Launch the verification agent again (Step 3.1) to confirm all items now PASS.
3. Iterate if Needed:
Repeat fix → verify cycle until all Definition of Done items PASS.
Phase 4: Move Task to Done
Once ALL Definition of Done items PASS, move the task to the done folder.
Step 4.1: Verify Completion
Confirm all Definition of Done items are marked complete in the task file.
Step 4.2: Move Task
    # Extract just the filename from $TASK_PATH
    TASK_FILENAME=$(basename "$TASK_PATH")
    # Move from in-progress to done
    git mv ".specs/tasks/in-progress/$TASK_FILENAME" .specs/tasks/done/
    # Fallback if git not available: mv ".specs/tasks/in-progress/$TASK_FILENAME" .specs/tasks/done/
---
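For orchestrators written in Python, the same move with its fallback might look like the sketch below. The function name `move_task_to_done` and the `root` parameter are hypothetical; creating the `done/` directory if it is missing is an added convenience that the shell commands above do not perform.

```python
import shutil
import subprocess
from pathlib import Path


def move_task_to_done(task_path: str, root: str = ".") -> Path:
    """Move a task from in-progress/ to done/, preferring `git mv`."""
    name = Path(task_path).name  # same result as `basename "$TASK_PATH"`
    root_path = Path(root)
    rel_src = f".specs/tasks/in-progress/{name}"
    rel_dst = ".specs/tasks/done/"
    (root_path / rel_dst).mkdir(parents=True, exist_ok=True)
    try:
        subprocess.run(
            ["git", "mv", rel_src, rel_dst],
            cwd=root_path, check=True, capture_output=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is missing, this is not a repository, or the file is untracked:
        # fall back to a plain filesystem move.
        shutil.move(str(root_path / rel_src), str(root_path / rel_dst / name))
    return root_path / rel_dst / name
```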
Phase 5: Aggregation and Reporting
Panel Voting Algorithm
When using 2+ evaluations, follow these manual computation steps:
- Think in steps, output each step result separately!
- Do not skip steps!
Step 1: Collect Scores per Criterion
Create a table with each criterion and scores from all evaluations:
| Criterion | Eval 1 | Eval 2 | Median | Difference |
|---|---|---|---|---|
| [Name 1] | X.X | X.X | ? | ? |
| [Name 2] | X.X | X.X | ? | ? |
Step 2: Calculate Median for Each Criterion
For 2 evaluations: Median = (Score1 + Score2) / 2
For 3+ evaluations: Sort scores, take middle value (or average of two middle values if even count)
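The median rule above covers both the two-evaluation case and larger panels. A minimal Python sketch (the function name `criterion_median` is illustrative only):

```python
def criterion_median(scores):
    """Median of one criterion's scores across all evaluations."""
    if not scores:
        raise ValueError("at least one score is required")
    ordered = sorted(scores)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]  # odd count: the middle value
    # even count (including exactly 2): average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(criterion_median([4.0, 5.0]))        # 4.5
print(criterion_median([3.0, 5.0, 4.0]))   # 4.0
```

Python's standard `statistics.median` implements the same rule and could be used instead.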
Step 3: Check for High Variance
High variance = evaluators disagree significantly (difference > 2.0 points)
Formula: `|Eval1 - Eval2| > 2.0` → Flag as high variance
Step 4: Calculate Weighted Overall Score
Multiply each criterion's median by its weight and sum:
Overall = (Criterion1_Median × Weight1) + (Criterion2_Median × Weight2) + ...
Step 5: Determine Pass/Fail
Compare overall score to threshold:
- `Overall ≥ Threshold` → PASS ✅
- `Overall < Threshold` → FAIL ❌
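Steps 1 to 5 for a two-evaluation panel can be combined into one function. This is a sketch under assumptions: the name `aggregate_panel` and the dictionary shapes are hypothetical, and the weights are assumed to sum to 1.0 so the overall score stays on the 5-point scale.

```python
def aggregate_panel(eval1, eval2, weights, threshold, max_diff=2.0):
    """Aggregate two evaluations.

    eval1 / eval2 map criterion name -> score; weights map criterion
    name -> weight (assumed to sum to 1.0).
    """
    rows = []
    overall = 0.0
    for name, weight in weights.items():
        a, b = eval1[name], eval2[name]
        median = (a + b) / 2          # Step 2: median of two scores
        difference = abs(a - b)       # Step 3: disagreement size
        rows.append({
            "criterion": name,
            "median": median,
            "difference": difference,
            "high_variance": difference > max_diff,
        })
        overall += median * weight    # Step 4: weighted sum
    result = "PASS" if overall >= threshold else "FAIL"  # Step 5
    return {"rows": rows, "overall": overall, "result": result}


report = aggregate_panel(
    {"Correctness": 4.0, "Tests": 2.0},
    {"Correctness": 5.0, "Tests": 4.5},
    weights={"Correctness": 0.5, "Tests": 0.5},
    threshold=4.0,
)
print(report["overall"], report["result"])  # 3.875 FAIL
```

For two scores, the population standard deviation equals half their difference, so the "std > 1.0" shorthand in Pattern B and the "difference > 2.0" rule here flag the same criteria if "std" is read that way.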
Handling Disagreement
If evaluations significantly disagree (difference > 2.0 on any criterion):
- Flag the criterion
- Present both evaluators' reasoning
- Ask user: "Evaluators disagree on [criterion]. Review manually?"
- If yes: present evidence, get user decision
- If no: use median (conservative approach)
Final Report
After all steps complete and DoD verification passes:
Implementation Summary
Task Status
- Task Status: ✅ `done`
- All Definition of Done items: X/X PASS (100%)
Configuration Used
| Setting | Value |
|---|---|
| Standard Components Threshold | {THRESHOLD_FOR_STANDARD_COMPONENTS}/5.0 |
| Critical Components Threshold | {THRESHOLD_FOR_CRITICAL_COMPONENTS}/5.0 |
| Max Iterations | {MAX_ITERATIONS or "3"} |
| Human Checkpoints | {HUMAN_IN_THE_LOOP_STEPS or "None"} |
| Skip Judges | {SKIP_JUDGES} |
| Continue Mode | {CONTINUE_MODE} |
| Refine Mode | {REFINE_MODE} |
Steps Completed
| Step | Title | Status | Verification | Score | Iterations | Judge Confirmed |
|---|---|---|---|---|---|---|
| 1 | [Title] | ✅ | Skipped | N/A | 1 | - |
| 2 | [Title] | ✅ | Panel (2) | 4.5/5 | 1 | ✅ |
| 3 | [Title] | ✅ | Per-Item | 5/5 passed | 2 | ✅ |
| 4 | [Title] | ✅ | Single | 4.2/5 | 3 | ✅ |
Legend:
- ✅ PASS - Score >= threshold for step type
- ⚠️ MAX_ITER - Did not pass but MAX_ITERATIONS reached, proceeded anyway
- ⏭️ SKIPPED - Step skipped (continue/refine mode)
Verification Summary
- Total steps: X
- Steps with verification: Y
- Passed on first try: Z
- Required iteration: W
- Total iterations across all steps: V
- Final pass rate: 100%
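The counters in this summary can be derived from per-step tracking records. The record shape below (`verified`, `iterations`, `passed`) is an assumption made for illustration; the sample data mirrors the four rows of the "Steps Completed" table.

```python
def verification_summary(steps):
    """Compute the Verification Summary counters from per-step records."""
    verified = [s for s in steps if s["verified"]]
    passed = sum(1 for s in verified if s["passed"])
    return {
        "total_steps": len(steps),
        "steps_with_verification": len(verified),
        "passed_first_try": sum(
            1 for s in verified if s["passed"] and s["iterations"] == 1
        ),
        "required_iteration": sum(1 for s in verified if s["iterations"] > 1),
        "total_iterations": sum(s["iterations"] for s in steps),
        # None when nothing was verified (e.g. with --skip-judges)
        "final_pass_rate": 100.0 * passed / len(verified) if verified else None,
    }


steps = [
    {"verified": False, "iterations": 1, "passed": True},  # Step 1: skipped
    {"verified": True, "iterations": 1, "passed": True},   # Step 2: panel
    {"verified": True, "iterations": 2, "passed": True},   # Step 3: per-item
    {"verified": True, "iterations": 3, "passed": True},   # Step 4: single
]
summary = verification_summary(steps)
print(summary["total_iterations"], summary["final_pass_rate"])  # 7 100.0
```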
Definition of Done Verification
| Item | Status | Evidence |
|---|---|---|
| [DoD Item 1] | ✅ PASS | [Brief evidence] |
| [DoD Item 2] | ✅ PASS | [Brief evidence] |
| ... | ... | ... |
Issues Fixed During Verification:
- [Issue]: [How it was fixed]
- [Issue]: [How it was fixed]
High-Variance Criteria (Evaluators Disagreed)
- [Criterion] in [Step]: Eval 1 scored X, Eval 2 scored Y
Human Review Summary (if --human-in-the-loop used)
| Step | Checkpoint | User Action | Feedback Incorporated |
|---|---|---|---|
| 2 | After PASS | Continued | - |
| 4 | After iteration 2 | Feedback | "Improve error messages" |
| 6 | After PASS | Continued | - |
Task File Updated
- Task moved from `in-progress/` to `done/` folder
- All step titles marked `[DONE]`
- All step subtasks marked `[X]`
- All Definition of Done items marked `[X]`
Recommendations
- [Any follow-up actions]
- [Suggested improvements]
---
Execution Flow Diagram
┌──────────────────────────────────────────────────────────────┐
│ IMPLEMENT TASK WITH VERIFICATION │
├──────────────────────────────────────────────────────────────┤
│ │
│ Phase 0: Select Task │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Use provided name or auto-select from todo/ (if 1 task) │ │
│ │ → Move task from todo/ to in-progress/ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ Phase 1: Load Task │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Read $TASK_PATH → Parse steps │ │
│ │ → Extract #### Verification specs → Create TodoWrite │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ Phase 2: Execute Steps (Respecting Dependencies) │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ For each step: │ │
│ │ │ │
│ │ ┌──────────────┐ ┌───────────────┐ ┌───────────┐ │ │
│ │ │ developer │───▶│ Judge Agent │───▶│ PASS? │ │ │
│ │ │ Agent │ │ (verify) │ │ │ │ │
│ │ └──────────────┘ └───────────────┘ └───────────┘ │ │
│ │ │ │ │ │
│ │ Yes No │ │
│ │ │ │ │ │
│ │ ▼ ▼ │ │
│ │ ┌────────┐ Fix & │ │ │
│ │ │ Mark │ Retry │ │ │
│ │ │Complete│ ↺ │ │ │
│ │ └────────┘ │ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ Phase 3: Final Verification │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ ┌──────────────┐ ┌───────────────┐ ┌───────────┐ │ │
│ │ │ Judge Agent │───▶│ All DoD │───▶│ All PASS? │ │ │
│ │ │ (verify DoD) │ │ items checked │ │ │ │ │
│ │ └──────────────┘ └───────────────┘ └───────────┘ │ │
│ │ │ │ │ │
│ │ Yes No │ │
│ │ │ │ │ │
│ │ ▼ ▼ │ │
│ │ Fix & │ │
│ │ Retry │ │
│ │ ↺ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ Phase 4: Move Task to Done │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ mv in-progress/$TASK → done/$TASK │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ Phase 5: Aggregate & Report │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Collect all verification results │ │
│ │ → Calculate aggregate metrics │ │
│ │ → Generate final report │ │
│ │ → Present to user │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────┘
Usage Examples
Basic Usage
    # Implement a specific task
    /implement add-validation.feature.md
    # Auto-select task from todo/ or in-progress/ (if only 1 task)
    /implement
    # Continue from last completed step
    /implement add-validation.feature.md --continue
    # Refine after user fixes project files (detects changes, re-verifies affected steps)
    /implement add-validation.feature.md --refine
    # Human review after every step
    /implement add-validation.feature.md --human-in-the-loop
    # Human review after specific steps only
    /implement add-validation.feature.md --human-in-the-loop 2,4,6
    # Higher quality threshold (stricter) - sets both standard and critical to 4.5
    /implement add-validation.feature.md --target-quality 4.5
    # Different thresholds for standard (3.5) and critical (4.5) components
    /implement add-validation.feature.md --target-quality 3.5,4.5
    # Lower quality threshold for both (faster convergence)
    /implement add-validation.feature.md --target-quality 3.5
    # Unlimited iterations (default is 3)
    /implement add-validation.feature.md --max-iterations unlimited
    # Skip all judge verifications (fast but no quality gates)
    /implement add-validation.feature.md --skip-judges
    # Combined: continue with human review
    /implement add-validation.feature.md --continue --human-in-the-loop
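How the `--target-quality` and `--max-iterations` values shown above resolve into configuration can be sketched as follows. The helper names are hypothetical, and the 0.0 to 5.0 range check is an assumption based on thresholds being expressed out of 5.0.

```python
def parse_target_quality(value):
    """Return (standard, critical) thresholds from '4.5' or '3.5,4.5'."""
    parts = [p.strip() for p in value.split(",")]
    if len(parts) == 1:
        standard = critical = float(parts[0])   # single value sets both
    elif len(parts) == 2:
        standard, critical = float(parts[0]), float(parts[1])
    else:
        raise ValueError("expected one value or 'standard,critical'")
    for threshold in (standard, critical):
        if not 0.0 <= threshold <= 5.0:
            raise ValueError("thresholds are scored out of 5.0")
    return standard, critical


def parse_max_iterations(value):
    """Return an int limit, or None for 'unlimited'."""
    return None if value == "unlimited" else int(value)


print(parse_target_quality("4.5"))       # (4.5, 4.5)
print(parse_target_quality("3.5,4.5"))   # (3.5, 4.5)
```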
Example 1: Implementing a Feature
User: /implement add-validation.feature.md
Phase 0: Task Selection...
Found task in: .specs/tasks/todo/add-validation.feature.md
Moving to in-progress: .specs/tasks/in-progress/add-validation.feature.md
Phase 1: Loading task...
Task: "Add form validation service"
Steps identified: 4 steps
Verification plan (from #### Verification sections):
- Step 1: No verification (directory creation)
- Step 2: Panel of 2 evaluations (ValidationService)
- Step 3: Per-item evaluations (3 validators)
- Step 4: Single evaluation (integration)
Phase 2: Executing...
Step 1: Launching sdd:developer agent...
Agent: "Implement Step 1: Create Directory Structure..."
Result: ✅ Directories created
Verification: Skipped (simple operation)
Status: ✅ COMPLETE
Step 2: Launching sdd:developer agent...
Agent: "Implement Step 2: Create ValidationService..."
Result: Files created, tests passing
Launching 2 judge agents in parallel...
Judge 1: 4.3/5.0 - PASS
Judge 2: 4.5/5.0 - PASS
Panel Result: 4.4/5.0 ✅
Status: ✅ COMPLETE (Judge Confirmed)
[Continue for all steps...]
Phase 3: Final Verification...
Launching DoD verification agent...
Agent: "Verify all Definition of Done items..."
Result: 4/4 items PASS ✅
Phase 4: Moving task to done...
mv .specs/tasks/in-progress/add-validation.feature.md .specs/tasks/done/
Phase 5: Final Report
Implementation complete.
- 4/4 steps completed
- 6 artifacts verified
- All passed first try
- Definition of Done: 4/4 PASS
- Task location: .specs/tasks/done/add-validation.feature.md ✅
Example 2: Handling DoD Item Failure
[All steps complete...]
Phase 3: Final Verification...
Launching DoD verification agent...
Agent: "Verify all Definition of Done items..."
Result: 3/4 items PASS, 1 FAIL ❌
Failing item:
- "Code follows ESLint rules": 356 errors found
Should I attempt to fix this issue? [Y/n]
User: Y
Launching sdd:developer agent...
Agent: "Fix ESLint errors..."
Result: Fixed 356 errors, 0 warnings ✅
Re-launching DoD verification agent...
Agent: "Re-verify all Definition of Done items..."
Result: 4/4 items PASS ✅
Phase 4: Moving task to done...
All DoD checkboxes marked complete ✅
Phase 5: Final Report
Task verification complete.
- All DoD items now PASS
- 1 issue fixed (ESLint errors)
- Task location: .specs/tasks/done/ ✅
Example 3: Handling Verification Failure
Step 3 Implementation complete.
Launching judge agents...
Judge 1: 3.5/5.0 - FAIL (threshold 4.0)
Judge 2: 3.2/5.0 - FAIL
Issues found:
- Test Coverage: 2.5/5
Evidence: "Missing edge case tests for empty input"
Justification: "Success criteria requires edge case coverage"
- Pattern Adherence: 3.0/5
Evidence: "Uses custom Result type instead of project standard"
Justification: "Should use existing Result<T, E> from src/types"
Should I attempt to fix these issues? [Y/n]
User: Y
Launching sdd:developer agent with feedback...
Agent: "Fix Step 3: Address judge feedback..."
Result: Issues fixed, tests added
Re-launching judge agents...
Judge 1: 4.2/5.0 - PASS
Judge 2: 4.4/5.0 - PASS
Panel Result: 4.3/5.0 ✅
Status: ✅ COMPLETE (Judge Confirmed)
Example 4: Continue from Interruption
User: /implement add-validation.feature.md --continue
Phase 0: Parsing flags...
Configuration:
- Continue Mode: true
- Target Quality: 4.0/5.0 (default)
Scanning task file for completed steps...
Found: Step 1 [DONE], Step 2 [DONE]
Last completed: Step 2
Verifying Step 2 artifacts...
Launching judge agent for Step 2...
Judge: 4.3/5.0 - PASS ✅
Marking step as complete in task file...
Resuming from Step 3...
Step 3: Launching sdd:developer agent...
[continues normally from Step 3]
阶段0:解析标志...
配置:
- 继续模式: true
- 目标质量: 4.0/5.0(默认)
扫描任务文件查找已完成步骤...
发现:步骤1 [已完成],步骤2 [已完成]
最后完成的步骤:步骤2
验证步骤2的工件...
启动步骤3的评审代理...
评审:4.3/5.0 - 通过 ✅
在任务文件中标记步骤为已完成...
从步骤4继续...
步骤4:启动sdd:developer代理...
[正常从步骤4继续]
Example 5: Refine After User Fixes
示例5:用户修复后优化
User manually fixed src/validation/validation.service.ts
用户手动修复了src/validation/validation.service.ts
(This file was created in Step 2: Create ValidationService)
(该文件在步骤2:Create ValidationService中创建)
User: /implement add-validation.feature.md --refine
Phase 0: Parsing flags...
Configuration:
- Refine Mode: true
Detecting changed project files...
Changed files:
- src/validation/validation.service.ts (modified)
Mapping files to implementation steps...
- src/validation/validation.service.ts → Step 2 (Create ValidationService)
Earliest affected step: Step 2
Preserving: Step 1 (unchanged)
Re-verifying from: Step 2 onwards
Step 2: Launching judge to verify rest of logic with user's changes...
Judge: 4.3/5.0 - PASS ✅
Rest of logic is not affected, proceeding...
Step 3: Launching judge to verify...
Judge: typescript error detected in file
Launching implementation agent to fix the error and align logic with user's changes...
Launching judge to verify fixed logic...
Judge: 4.5/5.0 - PASS ✅
[continues verifying remaining steps...]
All steps verified with user's changes incorporated ✅
用户: /implement add-validation.feature.md --refine
阶段0:解析标志...
配置:
- 优化模式: true
检测变更的项目文件...
变更文件:
- src/validation/validation.service.ts(已修改)
将文件映射到执行步骤...
- src/validation/validation.service.ts → 步骤2(Create ValidationService)
最早受影响的步骤:步骤2
保留:步骤1(未变更)
从步骤2开始重新验证
步骤2:启动评审代理验证结合用户变更后的其余逻辑...
评审:4.3/5.0 - 通过 ✅
其余逻辑未受影响,继续执行...
步骤3:启动评审代理验证...
评审:检测到文件中的typescript错误
启动执行代理修复错误,并调整逻辑以匹配用户变更...
启动评审代理验证修复后的逻辑...
评审:4.5/5.0 - 通过 ✅
[继续验证剩余步骤...]
所有步骤已验证,用户变更已纳入 ✅
Example 6: Human-in-the-Loop Review
示例6:人工介入评审
User: /implement add-validation.feature.md --human-in-the-loop
Configuration:
- Human Checkpoints: All steps
Step 1: Launching sdd:developer agent...
Result: Directories created ✅
---
用户: /implement add-validation.feature.md --human-in-the-loop
配置:
- 人工检查点: 所有步骤
步骤1:启动sdd:developer代理...
结果:目录已创建 ✅
---
🔍 Human Review Checkpoint - Step 1
🔍 人工评审检查点 - 步骤1
Step: Create Directory Structure
Judge Score: N/A (no verification)
Status: ✅ COMPLETE
Artifacts Created:
- src/validation/
- src/validation/tests/
Action Required: Review the above artifacts and provide feedback or continue.
Continue? [Y/n/feedback]: Y
Step 2: Launching sdd:developer agent...
Result: ValidationService created ✅
Launching judge agents...
Judge 1: 4.5/5.0 - PASS
Judge 2: 4.3/5.0 - PASS
Panel Result: 4.4/5.0 ✅
步骤: 创建目录结构
评审分数: N/A(无验证)
状态: ✅ 完成
创建的工件:
- src/validation/
- src/validation/tests/
操作要求: 评审上述工件并提供反馈或继续执行。
是否继续?[Y/n/反馈]: Y
步骤2:启动sdd:developer代理...
结果:ValidationService已创建 ✅
启动评审代理...
评审1:4.5/5.0 - 通过
评审2:4.3/5.0 - 通过
小组结果:4.4/5.0 ✅
🔍 Human Review Checkpoint - Step 2
🔍 人工评审检查点 - 步骤2
Step: Create ValidationService
Judge Score: 4.4/5.0 (threshold: 4.0)
Status: ✅ PASS
Artifacts Created:
- src/validation/validation.service.ts
- src/validation/tests/validation.service.spec.ts
Judge Feedback:
- All criteria met
- Test coverage comprehensive
Action Required: Review the above artifacts and provide feedback or continue.
Continue? [Y/n/feedback]: The error messages could be more descriptive
Incorporating feedback: "error messages could be more descriptive"
Re-launching sdd:developer agent with feedback...
[iteration continues]
步骤: 创建ValidationService
评审分数: 4.4/5.0(阈值: 4.0)
状态: ✅ 通过
创建的工件:
- src/validation/validation.service.ts
- src/validation/tests/validation.service.spec.ts
评审反馈:
- 所有标准已满足
- 测试覆盖率全面
操作要求: 评审上述工件并提供反馈或继续执行。
是否继续?[Y/n/反馈]: 错误消息可以更具描述性
纳入反馈:"错误消息可以更具描述性"
结合反馈重新启动sdd:developer代理...
[迭代继续]
Example 7: Strict Quality Threshold
示例7:严格质量阈值
User: /implement critical-api.feature.md --target-quality 4.5
Configuration:
- Target Quality: 4.5/5.0
Step 2: Implementing critical API endpoint...
Result: Endpoint created
Launching judge agents...
Judge 1: 4.2/5.0 - FAIL (threshold: 4.5)
Judge 2: 4.3/5.0 - FAIL
Iteration 1: Re-implementing with feedback...
[fixes applied]
Launching judge agents...
Judge 1: 4.4/5.0 - FAIL
Judge 2: 4.5/5.0 - PASS
Iteration 2: Re-implementing with feedback...
[more fixes applied]
Launching judge agents...
Judge 1: 4.6/5.0 - PASS
Judge 2: 4.5/5.0 - PASS
Panel Result: 4.55/5.0 ✅
Status: ✅ COMPLETE (passed on iteration 2)
用户: /implement critical-api.feature.md --target-quality 4.5
配置:
- 目标质量: 4.5/5.0
步骤2:实现关键API端点...
结果:端点已创建
启动评审代理...
评审1:4.2/5.0 - 失败(阈值: 4.5)
评审2:4.3/5.0 - 失败
迭代1:结合反馈重新实现...
[已应用修复]
启动评审代理...
评审1:4.4/5.0 - 失败
评审2:4.5/5.0 - 通过
迭代2:结合反馈重新实现...
[已应用更多修复]
启动评审代理...
评审1:4.6/5.0 - 通过
评审2:4.5/5.0 - 通过
小组结果:4.55/5.0 ✅
状态:✅ 完成(第2次迭代通过)
Error Handling
错误处理
Implementation Failure
执行失败
If sdd:developer agent reports failure:
- Present the failure details to user
- Ask clarification questions that could help resolve
- Launch sdd:developer agent again with clarifications
若sdd:developer代理报告失败:
- 向用户呈现失败详情
- 询问有助于解决问题的澄清问题
- 结合澄清信息重新启动sdd:developer代理
Judge Disagreement
评审意见分歧
If judges disagree significantly (difference > 2.0):
- Present both perspectives with evidence
- Ask user to resolve: "Judges disagree. Your decision?"
- Proceed based on user decision
若评审意见显著分歧(差值>2.0):
- 呈现双方观点及证据
- 请用户决策:"Judges disagree. Your decision?"
- 根据用户决策继续执行
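The disagreement rule above can be sketched as a small check. This is a minimal illustration, not the command's actual implementation: the function name `judges_disagree` and the constant `DISAGREEMENT_LIMIT` are hypothetical, and only the "difference > 2.0" rule comes from the text.

```python
from typing import Sequence

# Taken from the rule above: judges disagree when scores differ by more than 2.0.
DISAGREEMENT_LIMIT = 2.0


def judges_disagree(scores: Sequence[float], limit: float = DISAGREEMENT_LIMIT) -> bool:
    """Return True when the gap between the highest and lowest score exceeds the limit."""
    if len(scores) < 2:
        # A single judge cannot disagree with itself.
        return False
    return max(scores) - min(scores) > limit


if __name__ == "__main__":
    # Scores from Example 3 are close, so no user decision is needed.
    print(judges_disagree([3.5, 3.2]))
    # A hypothetical wide split that would be escalated to the user.
    print(judges_disagree([4.5, 2.0]))
```

When the check returns True, the orchestrator would present both perspectives and ask the user to decide, as described above.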
Refine Mode: No Changes Detected
优化模式:未检测到变更
If --refine mode finds no git changes in the project:
- Report: "No project file changes detected since last commit."
- Suggest: "Make edits to project files first, then run --refine again."
- Alternatively: "Run without --refine to re-implement all steps."
若--refine模式未发现项目中的git变更:
- 报告:"No project file changes detected since last commit."
- 建议:"Make edits to project files first, then run --refine again."
- 备选方案:"Run without --refine to re-implement all steps."
Refine Mode: Changes Don't Map to Steps
优化模式:变更未映射到步骤
If --refine mode finds changed files but none map to implementation steps:
- Report: "Changed files don't match any implementation step's expected outputs."
- List the changed files detected
- Suggest: "Verify manually or run without --refine to re-verify all steps."
若--refine模式发现变更文件但未映射到任何执行步骤:
- 报告:"Changed files don't match any implementation step's expected outputs."
- 列出检测到的变更文件
- 建议:"Verify manually or run without --refine to re-verify all steps."
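The two refine-mode error cases, together with the "earliest affected step" rule from Example 5, can be sketched as follows. This is an illustration under stated assumptions: the function `earliest_affected_step` and the `step_outputs` mapping (step number to expected output paths) are hypothetical, and matching is done by exact path only. The report strings are the ones quoted above.

```python
from typing import Dict, List, Optional, Tuple


def earliest_affected_step(
    changed_files: List[str],
    step_outputs: Dict[int, List[str]],
) -> Tuple[Optional[int], str]:
    """Map changed files to steps and return (earliest step or None, report message)."""
    if not changed_files:
        return None, "No project file changes detected since last commit."

    # A step is affected when any changed file is one of its expected outputs.
    affected = [
        step
        for step, outputs in step_outputs.items()
        if any(path in outputs for path in changed_files)
    ]
    if not affected:
        return None, "Changed files don't match any implementation step's expected outputs."

    first = min(affected)
    return first, f"Earliest affected step: Step {first}"


if __name__ == "__main__":
    # Hypothetical step-to-output mapping modeled on Example 5.
    outputs = {
        1: ["src/validation/"],
        2: ["src/validation/validation.service.ts"],
        3: ["src/validation/tests/validation.service.spec.ts"],
    }
    print(earliest_affected_step(["src/validation/validation.service.ts"], outputs))
```

Steps before the returned number are preserved, and re-verification starts from that step onwards.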
Checklist
检查清单
Before completing implementation:
完成执行前:
Configuration Handling
配置处理
- Parsed all flags from $ARGUMENTS correctly
- Used THRESHOLD_FOR_STANDARD_COMPONENTS for standard steps
- Used THRESHOLD_FOR_CRITICAL_COMPONENTS for critical steps
- Iterated until quality threshold met (or MAX_ITERATIONS reached, default 3)
- Triggered human-in-the-loop checkpoints ONLY for steps in HUMAN_IN_THE_LOOP_STEPS
- If SKIP_JUDGES is true: Skipped ALL judge validation
- If CONTINUE_MODE is true: Verified last step and resumed correctly
- If REFINE_MODE is true: Detected changed project files, mapped to steps, re-verified from earliest affected step
- 正确解析$ARGUMENTS中的所有标志
- 标准步骤使用THRESHOLD_FOR_STANDARD_COMPONENTS
- 关键步骤使用THRESHOLD_FOR_CRITICAL_COMPONENTS
- 迭代直至达到质量阈值(或达到MAX_ITERATIONS,默认3次)
- 仅对HUMAN_IN_THE_LOOP_STEPS中的步骤触发人工介入检查点
- 若SKIP_JUDGES为true:跳过所有评审验证
- 若CONTINUE_MODE为true:正确验证最后一步并恢复执行
- 若REFINE_MODE为true:检测变更的项目文件,映射到步骤,从最早受影响的步骤开始重新验证
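As a rough sketch of how the two thresholds in the checklist above might be resolved from --target-quality: the argument definitions state that a single value sets both thresholds and that two comma-separated values set standard,critical. The function name `parse_target_quality` and the 0.0-5.0 range check are assumptions added for illustration.

```python
from typing import Tuple


def parse_target_quality(value: str) -> Tuple[float, float]:
    """Return (standard, critical) thresholds parsed from a --target-quality value."""
    parts = [part.strip() for part in value.split(",")]
    if len(parts) == 1:
        # Single value sets both thresholds.
        standard = critical = float(parts[0])
    elif len(parts) == 2:
        # Two values are read as standard,critical.
        standard, critical = float(parts[0]), float(parts[1])
    else:
        raise ValueError(f"expected one or two values, got {len(parts)}")

    # Assumed sanity check: scores are out of 5.0.
    for threshold in (standard, critical):
        if not 0.0 <= threshold <= 5.0:
            raise ValueError(f"threshold {threshold} is outside 0.0-5.0")
    return standard, critical


if __name__ == "__main__":
    print(parse_target_quality("4.5"))       # as in Example 7
    print(parse_target_quality("4.0, 4.5"))  # separate standard and critical values
```

The first element would feed THRESHOLD_FOR_STANDARD_COMPONENTS and the second THRESHOLD_FOR_CRITICAL_COMPONENTS.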
Context Protection (CRITICAL)
上下文保护(关键)
- Read ONLY the task file ($TASK_PATH in .specs/tasks/in-progress/) - no other files
- Did NOT read implementation outputs, reference files, or artifacts
- Used sub-agent reports for status - did NOT read files to "check"
- 仅读取任务文件(.specs/tasks/in-progress/中的$TASK_PATH)- 不读取其他文件
- 未读取执行输出、参考文件或工件
- 使用子代理报告获取状态 - 未读取文件进行"检查"
Delegation
委托
- ALL implementations done by sdd:developer agents via Task tool
- ALL evaluations done by sdd:developer agents via Task tool
- Did NOT perform any verification yourself
- Did NOT skip any verification steps (unless SKIP_JUDGES is true)
- 所有执行任务由sdd:developer代理通过任务工具完成
- 所有评审由sdd:developer代理通过任务工具完成
- 未自行执行任何验证
- 未跳过任何验证步骤(除非SKIP_JUDGES为true)
Stage Tracking
阶段跟踪
- Each step marked complete ONLY after judge PASS (or immediately if SKIP_JUDGES)
- Task file updated after each step completion:
  - Step title marked with [DONE]
  - Subtasks marked with [X]
- Todo list updated after each step completion
- 仅在评审通过后标记步骤完成(或SKIP_JUDGES为true时立即标记)
- 每个步骤完成后更新任务文件:
  - 步骤标题标记为[DONE]
  - 子任务标记为[X]
- 每个步骤完成后更新待办列表
Execution Quality
执行质量
- All steps executed in dependency order
- Parallel steps launched simultaneously (not sequentially)
- Each sdd:developer agent received focused prompt with exact step
- All critical artifacts evaluated by judges (unless SKIP_JUDGES)
- Panel voting used for high-stakes artifacts
- Chain-of-thought requirement included in all evaluation prompts
- Failed evaluations iterated until quality threshold met
- Final report generated with judge confirmation status
- User informed of any evaluator disagreements
- 所有步骤按依赖顺序执行
- 并行步骤同时启动(非顺序)
- 每个sdd:developer代理收到聚焦于具体步骤的提示
- 所有关键工件均由评审评估(除非SKIP_JUDGES为true)
- 高风险工件使用小组投票
- 所有评审提示包含链式思考要求
- 失败的评审迭代直至达到质量阈值
- 生成带评审确认状态的最终报告
- 告知用户任何评审意见分歧
Human-in-the-Loop (if enabled)
人工介入(若启用)
- Displayed checkpoint after each step in HUMAN_IN_THE_LOOP_STEPS
- Incorporated user feedback into subsequent iterations/steps
- Paused workflow when user requested
- 在HUMAN_IN_THE_LOOP_STEPS中的每个步骤后显示检查点
- 将用户反馈纳入后续迭代/步骤
- 用户要求时暂停工作流
Final Verification and Completion
最终验证与完成
- Definition of Done verification agent launched
- All DoD items verified (PASS/FAIL/BLOCKED status)
- Failing DoD items fixed via sdd:developer agents
- Re-verification performed after fixes
- Task moved from in-progress/ to done/ folder
- All DoD checkboxes marked [X] in task file
- Final verification report presented to user
- 启动完成定义验证代理
- 所有DoD项已验证(通过/失败/阻塞状态)
- 通过sdd:developer代理修复失败的DoD项
- 修复后重新验证
- 将任务从in-progress/移至done/目录
- 任务文件中所有DoD复选框标记为[X]
- 向用户呈现最终验证报告
Appendix A: Verification Specifications Reference
附录A:验证规范参考
This appendix documents how verification is specified in task files. During Phase 2 (Execute Steps), you will reference these specifications to understand how to verify each artifact.
本附录记录任务文件中如何指定验证。在阶段2(执行步骤)中,你将参考这些规范了解如何验证每个工件。
How Task Files Define Verification
任务文件如何定义验证
Task files define verification requirements in #### Verification sections within each implementation step. These sections specify:
任务文件在每个执行步骤的#### Verification部分定义验证要求。这些部分指定:
Required Elements
必填元素
- Level: Verification complexity
  - None - Simple operations (mkdir, delete) - skip verification
  - Single Judge - Non-critical artifacts - 1 judge, threshold 4.0/5.0
  - Panel of 2 Judges - Critical artifacts - 2 judges, median voting, threshold 4.0/5.0 or 4.5/5.0
  - Per-Item Judges - Multiple similar items - 1 judge per item, parallel execution
- Artifact(s): Path(s) to file(s) being verified
  - Example: src/decision/decision.service.ts, src/decision/tests/decision.service.spec.ts
- Threshold: Minimum passing score
  - Typically 4.0/5.0 for standard quality
  - Sometimes 4.5/5.0 for critical components
- Rubric: Weighted criteria table (see format below)
- Reference Pattern (Optional): Path to example of good implementation
  - Example: src/app.service.ts for NestJS service patterns
- 级别: 验证复杂度
  - None - 简单操作(mkdir、delete)- 跳过验证
  - Single Judge - 非关键工件 - 1个评审,阈值4.0/5.0
  - Panel of 2 Judges - 关键工件 - 2个评审,中位数投票,阈值4.0/5.0或4.5/5.0
  - Per-Item Judges - 多个相似项 - 每个项1个评审,并行执行
- 工件: 待验证文件的路径
  - 示例:src/decision/decision.service.ts、src/decision/tests/decision.service.spec.ts
- 阈值: 最低通过分数
  - 标准质量通常为4.0/5.0
  - 关键组件有时为4.5/5.0
- 评分规则: 加权标准表格(见以下格式)
- 参考模式(可选): 良好实现示例的路径
  - 示例:src/app.service.ts(NestJS服务模式)
Rubric Format
评分规则格式
Rubrics in task files use this markdown table format:
| Criterion | Weight | Description |
|-----------|--------|-------------|
| [Name 1] | 0.XX | [What to evaluate] |
| [Name 2] | 0.XX | [What to evaluate] |
| ... | ... | ... |
Requirements:
- Weights MUST sum to 1.0
- Each criterion has a clear, measurable description
- Typically 3-6 criteria per rubric
Example:
| Criterion | Weight | Description |
|-----------|--------|-------------|
| Type Correctness | 0.35 | Types match specification exactly |
| API Contract Alignment | 0.25 | Aligns with documented API contract |
| Export Structure | 0.20 | Barrel exports correctly expose all types |
| Code Quality | 0.20 | Follows project TypeScript conventions |
任务文件中的评分规则使用以下markdown表格格式:
| 标准 | 权重 | 描述 |
|-----------|--------|-------------|
| [名称1] | 0.XX | [评估内容] |
| [名称2] | 0.XX | [评估内容] |
| ... | ... | ... |
要求:
- 权重总和必须为1.0
- 每个标准有清晰、可衡量的描述
- 每个评分规则通常包含3-6个标准
示例:
| 标准 | 权重 | 描述 |
|-----------|--------|-------------|
| 类型正确性 | 0.35 | 类型与规范完全匹配 |
| API契约一致性 | 0.25 | 与文档化的API契约一致 |
| 导出结构 | 0.20 | 桶导出正确暴露所有类型 |
| 代码质量 | 0.20 | 遵循项目TypeScript约定 |
Scoring Scale
评分量表
When judges evaluate artifacts, they use this 5-point scale for each criterion:
- 1 (Poor): Does not meet requirements
  - Missing essential elements
  - Fundamental misunderstanding of requirements
- 2 (Below Average): Multiple issues, partially meets requirements
  - Some correct elements, but significant gaps
  - Would require substantial rework
- 3 (Adequate): Meets basic requirements
  - Functional but minimal
  - Room for improvement in quality or completeness
- 4 (Good): Meets all requirements, few minor issues
  - Solid implementation
  - Minor polish could improve it
- 5 (Excellent): Exceeds requirements
  - Exceptional quality
  - Goes beyond what was asked
  - Could serve as reference implementation
评审评估工件时,对每个标准使用以下5分制量表:
- 1(差): 未满足要求
  - 缺少基本元素
  - 对要求存在根本性误解
- 2(低于平均): 存在多个问题,部分满足要求
  - 有一些正确元素,但存在重大差距
  - 需要大量返工
- 3(合格): 满足基本要求
  - 可用但仅达最低标准
  - 在质量或完整性方面有改进空间
- 4(良好): 满足所有要求,仅有少量次要问题
  - 可靠的实现
  - 少量优化即可改进
- 5(优秀): 超出要求
  - 质量卓越
  - 超出要求范围
  - 可作为参考实现
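To make the link between rubric weights and the 5-point scale concrete, here is a minimal sketch that combines per-criterion scores into one overall score. It assumes the overall score is the weighted sum of criterion scores, which the text implies but does not state outright; the function name `weighted_score` is hypothetical. The weights come from the rubric example above, while the per-criterion scores are made up for illustration.

```python
import math
from typing import Dict


def weighted_score(weights: Dict[str, float], scores: Dict[str, float]) -> float:
    """Combine per-criterion scores (1-5) into one score using rubric weights."""
    if set(weights) != set(scores):
        raise ValueError("rubric and scores must cover the same criteria")
    # Requirement from the rubric format: weights MUST sum to 1.0.
    if not math.isclose(sum(weights.values()), 1.0, abs_tol=1e-9):
        raise ValueError("rubric weights must sum to 1.0")
    for name, score in scores.items():
        if not 1 <= score <= 5:
            raise ValueError(f"score for {name!r} is outside the 1-5 scale")
    return sum(weights[name] * scores[name] for name in weights)


if __name__ == "__main__":
    rubric = {
        "Type Correctness": 0.35,
        "API Contract Alignment": 0.25,
        "Export Structure": 0.20,
        "Code Quality": 0.20,
    }
    # Illustrative scores only, not taken from a real judge report.
    judged = {
        "Type Correctness": 5,
        "API Contract Alignment": 4,
        "Export Structure": 4,
        "Code Quality": 3,
    }
    print(round(weighted_score(rubric, judged), 2))
```

Validating the weight sum up front catches malformed rubrics before any judge output is aggregated.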
Using Verification Specs During Execution
执行期间使用验证规范
During Phase 2 (Execute Steps):
- After a sdd:developer agent completes implementation
- Read the step's #### Verification section in the task file
- Extract: Level, Artifact paths, Threshold, Rubric, Reference Pattern
- Launch appropriate judge agent(s) based on Level
- Provide judges with: Artifact path, Rubric, Threshold, Reference Pattern
- Aggregate judge results and determine PASS/FAIL
- If FAIL, launch sdd:developer agent to fix issues and re-verify
Example Verification Section in Task File:
阶段2(执行步骤)期间:
- sdd:developer代理完成执行后
- 读取任务文件中步骤的#### Verification部分
- 提取:级别、工件路径、阈值、评分规则、参考模式
- 根据级别启动相应的评审代理
- 向评审提供:工件路径、评分规则、阈值、参考模式
- 聚合评审结果并确定通过/失败
- 若失败,启动sdd:developer代理修复问题并重新验证
任务文件中的示例验证部分:
Verification
Level: Panel of 2 Judges with Aggregated Voting
Artifact: src/decision/decision.service.ts, src/decision/tests/decision.service.spec.ts
Rubric:
| Criterion | Weight | Description |
|---|---|---|
| Routing Logic | 0.20 | Correctly routes by customerType |
| Drip Feed Implementation | 0.25 | 2% random approval for rejected New customers only |
| Response Formatting | 0.20 | Correct decision outcome, triggeredRules preserved, ISO 8601 timestamp |
| Testability | 0.15 | Injectable randomGenerator enables deterministic testing |
| Test Coverage | 0.20 | Unit tests cover approval, rejection, drip feed, routing, timestamp |
Reference Pattern: NestJS service patterns, ZenEngineService API
This specification tells you to:
- Launch 2 judge agents in parallel
- Have them evaluate both service and test files
- Use the 5-criterion rubric with specified weights
- Do not pass the threshold to the judges; use it only to compare against the judges' average score
- Reference existing NestJS patterns for comparison
Level: Panel of 2 Judges with Aggregated Voting
Artifact: src/decision/decision.service.ts, src/decision/tests/decision.service.spec.ts
Rubric:
| Criterion | Weight | Description |
|---|---|---|
| 路由逻辑 | 0.20 | 按customerType正确路由 |
| drip Feed实现 | 0.25 | 仅对被拒绝的新客户随机批准2% |
| 响应格式 | 0.20 | 正确的决策结果,保留triggeredRules,ISO 8601时间戳 |
| 可测试性 | 0.15 | 可注入的randomGenerator支持确定性测试 |
| 测试覆盖率 | 0.20 | 单元测试覆盖批准、拒绝、drip feed、路由、时间戳 |
Reference Pattern: NestJS服务模式,ZenEngineService API
此规范要求你:
- 并行启动2个评审代理
- 让他们评估服务和测试文件
- 使用包含5个标准的评分规则及指定权重
- 不要将阈值传递给评审,仅将其与评审的平均分数比较
- 参考现有NestJS模式进行比较
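The panel decision described above (average the judges' scores, then compare with the threshold on the orchestrator side) can be sketched as follows. The function name `panel_verdict` is hypothetical, and treating a score equal to the threshold as a pass is an assumption based on the threshold being the "minimum passing score".

```python
from statistics import mean
from typing import Sequence, Tuple


def panel_verdict(judge_scores: Sequence[float], threshold: float) -> Tuple[float, bool]:
    """Return (panel score, passed) where the panel score is the judges' average."""
    if not judge_scores:
        raise ValueError("at least one judge score is required")
    panel = mean(judge_scores)
    # The threshold is applied here by the orchestrator; judges never see it.
    return panel, panel >= threshold


if __name__ == "__main__":
    # Scores from Example 3 after the fix, against the 4.0 threshold.
    score, passed = panel_verdict([4.2, 4.4], threshold=4.0)
    print(f"Panel Result: {score:.1f}/5.0 - {'PASS' if passed else 'FAIL'}")
```

Keeping the threshold out of the judge prompt means each judge scores against the rubric alone, and the pass/fail decision stays in one place.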