implement-task

Implement Task with Verification


Your job is to implement the solution at the highest quality using the task specification and sub-agents. You MUST NOT stop unless it is critically necessary or you are done! Avoid asking questions unless it is critically necessary! Launch the implementation agent and judges, iterate until issues are fixed, and then move to the next step!
Execute task implementation steps with automated quality verification using LLM-as-Judge for critical artifacts.

User Input


text
$ARGUMENTS


Command Arguments

Parse the following arguments from `$ARGUMENTS`:

Argument Definitions

| Argument | Format | Default | Description |
|---|---|---|---|
| `task-file` | Path or filename | Auto-detect | Task file name or path (e.g., `add-validation.feature.md`) |
| `--continue` | `--continue` | None | Continue implementation from last completed step. Launches judge first to verify state, then iterates with implementation agent. |
| `--refine` | `--refine` | `false` | Incremental refinement mode - detect changes against git and re-implement only affected steps (from modified step onwards). |
| `--human-in-the-loop` | `--human-in-the-loop [step1,step2,...]` | None | Steps after which to pause for human verification. If no steps specified, pauses after every step. |
| `--target-quality` | `--target-quality X.X` or `--target-quality X.X,Y.Y` | `4.0` (standard) / `4.5` (critical) | Target threshold value (out of 5.0). Single value sets both. Two comma-separated values set standard,critical. |
| `--max-iterations` | `--max-iterations N` | `3` | Maximum fix→verify cycles per step. Default is 3 iterations. Set to `unlimited` for no limit. |
| `--skip-judges` | `--skip-judges` | `false` | Skip all judge validation checks - steps proceed without quality gates. |

Configuration Resolution

Parse `$ARGUMENTS` and resolve configuration as follows:

    # Extract task file (first positional argument, optional - auto-detect if not provided)
    TASK_FILE = first argument that is a file path or filename

    # Parse --target-quality (supports single value or two comma-separated values)
    if --target-quality has single value X.X:
        THRESHOLD_FOR_STANDARD_COMPONENTS = X.X
        THRESHOLD_FOR_CRITICAL_COMPONENTS = X.X
    elif --target-quality has two values X.X,Y.Y:
        THRESHOLD_FOR_STANDARD_COMPONENTS = X.X
        THRESHOLD_FOR_CRITICAL_COMPONENTS = Y.Y
    else:
        THRESHOLD_FOR_STANDARD_COMPONENTS = 4.0  # default
        THRESHOLD_FOR_CRITICAL_COMPONENTS = 4.5  # default

    # Initialize other defaults
    MAX_ITERATIONS = --max-iterations || 3  # default is 3 iterations
    HUMAN_IN_THE_LOOP_STEPS = --human-in-the-loop || []  # empty = none, "*" = all
    SKIP_JUDGES = --skip-judges || false
    REFINE_MODE = --refine || false
    CONTINUE_MODE = --continue || false

    # Special handling for --human-in-the-loop without step list
    if --human-in-the-loop present without step numbers:
        HUMAN_IN_THE_LOOP_STEPS = "*"  # all steps
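The resolution rules above can be sketched in Python as follows. This is only an illustrative sketch: the `Config` dataclass, the `resolve_config` function, and whitespace tokenization of the argument string are assumptions for the example, not part of the command itself. A step list is assumed to be comma-separated integers.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Config:
    # Field names are hypothetical; defaults mirror the argument table.
    task_file: Optional[str] = None
    standard_threshold: float = 4.0
    critical_threshold: float = 4.5
    max_iterations: Optional[int] = 3  # None stands for "unlimited"
    human_steps: List[str] = field(default_factory=list)  # ["*"] = all steps
    skip_judges: bool = False
    refine: bool = False
    continue_mode: bool = False


def resolve_config(arguments: str) -> Config:
    cfg = Config()
    tokens = arguments.split()
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        has_value = nxt is not None and not nxt.startswith("--")
        if tok == "--target-quality" and has_value:
            parts = [float(p) for p in nxt.split(",")]
            cfg.standard_threshold = parts[0]
            # A single value sets both thresholds.
            cfg.critical_threshold = parts[1] if len(parts) > 1 else parts[0]
            i += 1
        elif tok == "--max-iterations" and has_value:
            cfg.max_iterations = None if nxt == "unlimited" else int(nxt)
            i += 1
        elif tok == "--human-in-the-loop":
            if nxt is not None and nxt.replace(",", "").isdigit():
                cfg.human_steps = nxt.split(",")
                i += 1
            else:
                cfg.human_steps = ["*"]  # bare flag: pause after every step
        elif tok == "--skip-judges":
            cfg.skip_judges = True
        elif tok == "--refine":
            cfg.refine = True
        elif tok == "--continue":
            cfg.continue_mode = True
        elif not tok.startswith("--") and cfg.task_file is None:
            cfg.task_file = tok  # first positional argument
        i += 1
    return cfg
```

For example, `resolve_config("my-task.feature.md --target-quality 4.2")` would set both thresholds to 4.2 and leave every other setting at its default.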

Context Resolution for `--continue`

When `--continue` is used:
  1. Step Resolution:
    • Parse the task file for `[DONE]` markers on step titles
    • Identify the last incomplete step
    • Launch judge to verify the last INCOMPLETE step's artifacts
    • If judge PASS: Mark step as done and resume from the next step
    • If judge FAIL: Re-implement the step and iterate until PASS
  2. State Recovery:
    • Check task file location (`in-progress/`, `todo/`, `done/`)
    • If in `todo/`, move to `in-progress/` before continuing
    • Pre-populate captured values from existing artifacts

Refine Mode Behavior (`--refine`)

When `--refine` is used, it detects changes to project files (not the task file) and maps them to implementation steps to determine what needs re-verification.
  1. Detect Changed Project Files:
    First, determine what to compare against based on git state:
    bash
    # Check for staged changes
    STAGED=$(git diff --cached --name-only)
    
    # Check for unstaged changes
    UNSTAGED=$(git diff --name-only)
    Comparison logic:
    | Staged | Unstaged | Compare Against | Command |
    |---|---|---|---|
    | Yes | Yes | Staged (unstaged only) | `git diff --name-only` |
    | Yes | No | Last commit | `git diff HEAD --name-only` |
    | No | Yes | Last commit | `git diff HEAD --name-only` |
    | No | No | No changes | Exit with message |
    • If both staged AND unstaged: Compare working directory vs staging area (unstaged changes only)
    • If only staged OR only unstaged: Compare against last commit
    • This ensures refine operates on the most recent work in progress
  2. Map Changes to Implementation Steps:
    • Read the task file to get the list of implementation steps
    • For each changed file, determine which step created/modified it:
      • Check step's "Expected Output" section for file paths
      • Check step's subtasks for file references
      • Check step's artifacts in `#### Verification` section
    • Build a mapping: `{changed_file → step_number}`
  3. Determine Affected Steps:
    • Find all steps that have associated changed files
    • The earliest affected step is the starting point
    • All steps from that point onwards need re-verification
    • Earlier steps (unaffected) are preserved as-is
  4. Refine Execution:
    • For each affected step (in order):
      • Launch judge agent to verify the step's artifacts (including user's changes)
      • If judge PASS: Mark step done, proceed to next
      • If judge FAIL: Launch implementation agent with user's changes as context, then re-verify
    • User's manual fixes are preserved - implementation agent should build upon them, not overwrite
  5. Example:
    bash
    # User manually fixed src/validation/validation.service.ts
    # (This file was created in Step 2)
    
    /implement my-task.feature.md --refine
    
    # Detects: src/validation/validation.service.ts modified
    # Maps to: Step 2 (Create ValidationService)
    # Action: Launch judge for Step 2
    #   - If PASS: User's fix is good, proceed to Step 3
    #   - If FAIL: Implementation agent aligns the rest of the code with the user's changes, without overwriting them
    # Continues: Step 3, Step 4... (re-verify all subsequent steps)
  6. Multiple Files Changed:
    bash
    # User edited files from Step 2 AND Step 4
    
    /implement my-task.feature.md --refine
    
    # Detects: Files from Step 2 and Step 4 modified
    # Earliest affected: Step 2
    # Re-verifies: Step 2, Step 3, Step 4, Step 5...
    # (Step 3 re-verified even though no direct changes, because it depends on Step 2)
  7. Staged vs Unstaged Changes:
    bash
    # Scenario: User staged some changes, then made more edits
    # Staged: src/validation/validation.service.ts (git add done)
    # Unstaged: src/validation/validators/email.validator.ts (still editing)
    
    /implement my-task.feature.md --refine
    
    # Detects: Both staged AND unstaged changes exist
    # Mode: Compares unstaged only (working dir vs staging)
    # Only email.validator.ts is considered for refine
    # Staged changes are preserved, not re-verified
    
    # --
    
    # Scenario: User only has staged changes (ready to commit)
    # Staged: src/validation/validation.service.ts
    # Unstaged: none
    
    /implement my-task.feature.md --refine
    
    # Detects: Only staged changes
    # Mode: Compares against last commit
    # validation.service.ts changes are verified
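The comparison logic described in this section can be expressed as a small decision function. The sketch below is illustrative only: `select_comparison` and `git_names` are hypothetical helper names, and the mode label `"no_changes"` is an assumption added for the empty case. The git invocations themselves are the ones listed in the comparison table.

```python
import subprocess
from typing import List, Optional, Tuple


def select_comparison(staged: List[str], unstaged: List[str]) -> Tuple[str, Optional[List[str]]]:
    """Pick the comparison mode and the git command that lists changed files."""
    if staged and unstaged:
        # Both present: look at unstaged work only (working dir vs staging area).
        return "unstaged_only", ["git", "diff", "--name-only"]
    if staged or unstaged:
        # Only one kind of change: compare against the last commit.
        return "vs_last_commit", ["git", "diff", "HEAD", "--name-only"]
    return "no_changes", None


def git_names(args: List[str]) -> List[str]:
    """Run a git command and return its non-empty output lines."""
    out = subprocess.run(args, capture_output=True, text=True, check=True).stdout
    return [line for line in out.splitlines() if line]
```

A caller would typically obtain `staged` from `git_names(["git", "diff", "--cached", "--name-only"])` and `unstaged` from `git_names(["git", "diff", "--name-only"])`, then run the returned command through `git_names` to get the changed files.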

Human-in-the-Loop Behavior


Human verification checkpoints occur:
  1. Trigger Conditions:
    • After implementation + judge verification PASS for a step in `HUMAN_IN_THE_LOOP_STEPS`
    • After implementation + judge + implementation retry (before the next judge retry)
    • If `HUMAN_IN_THE_LOOP_STEPS` is `"*"`, triggers after every step
  2. At Checkpoint:
    • Display current step results summary
    • Display generated artifacts with paths
    • Display judge score and feedback
    • Ask user: "Review step output. Continue? [Y/n/feedback]"
    • If user provides feedback, incorporate into next iteration or step
    • If user says "n", pause workflow
  3. Checkpoint Message Format:
    markdown
    ---
    ## 🔍 Human Review Checkpoint - Step X
    
    **Step:** {step title}
    **Step Type:** {standard/critical}
    **Judge Score:** {score}/{threshold for step type} threshold
    **Status:** ✅ PASS / 🔄 ITERATING (attempt {n})
    
    **Artifacts Created/Modified:**
    - {artifact_path_1}
    - {artifact_path_2}
    
    **Judge Feedback:**
    {feedback summary}
    
    **Action Required:** Review the above artifacts and provide feedback or continue.
    
    > Continue? [Y/n/feedback]:
    ---
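The checkpoint trigger and the `[Y/n/feedback]` prompt could be handled roughly as below. This is a sketch under assumptions: `should_pause` and `interpret_reply` are hypothetical names, an empty reply is assumed to mean "Y" (suggested by the capital letter in the prompt), and the action labels are invented for the example.

```python
from typing import Optional, Sequence, Tuple, Union


def should_pause(step: int, human_steps: Union[str, Sequence]) -> bool:
    """True when a human checkpoint is configured for this step."""
    if human_steps == "*":
        return True
    return str(step) in {str(s) for s in human_steps}


def interpret_reply(reply: str) -> Tuple[str, Optional[str]]:
    """Map the user's answer to an action and optional feedback text."""
    text = reply.strip()
    if text == "" or text.lower() in ("y", "yes"):
        return ("continue", None)
    if text.lower() in ("n", "no"):
        return ("pause", None)
    # Anything else is treated as feedback for the next iteration or step.
    return ("feedback", text)
```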


Task Selection and Status Management


Task Status Folders

Task status is managed by folder location:
  • `.specs/tasks/todo/` - Tasks waiting to be implemented
  • `.specs/tasks/in-progress/` - Tasks currently being worked on
  • `.specs/tasks/done/` - Completed tasks

Status Transitions

| When | Action |
|---|---|
| Start implementation | Move task from `todo/` to `in-progress/` |
| Final verification PASS | Move task from `in-progress/` to `done/` |
| Implementation failure (user aborts) | Keep in `in-progress/` |

CRITICAL: You Are an ORCHESTRATOR ONLY


Your role is DISPATCH and AGGREGATE. You do NOT do the work.
Build the context for each sub-agent properly!
CRITICAL: For each sub-agent (implementation and evaluation), you need to provide:
  • Task file path
  • Step number
  • Item number (if applicable)
  • Artifact path (if applicable)
  • Value of `${CLAUDE_PLUGIN_ROOT}` so agents can resolve paths like `@${CLAUDE_PLUGIN_ROOT}/scripts/create-scratchpad.sh`

What You DO


  • Read the task file ONCE (Phase 1 only)
  • Launch sub-agents via Task tool
  • Receive reports from sub-agents
  • Mark stages complete after judge confirmation
  • Aggregate results and report to user

What You NEVER Do

| Prohibited Action | Why | What To Do Instead |
|---|---|---|
| Read implementation outputs | Context bloat → command loss | Sub-agent reports what it created |
| Read reference files | Sub-agent's job to understand patterns | Include path in sub-agent prompt |
| Read artifacts to "check" them | Context bloat → forget verifications | Launch judge agent |
| Evaluate code quality yourself | Not your job, causes forgetting | Launch judge agent |
| Skip verification "because simple" | ALL verifications are mandatory | Launch judge agent anyway |

Anti-Rationalization Rules


If you think: "I should read this file to understand what was created" → STOP. The sub-agent's report tells you what was created. Use that information.
If you think: "I'll quickly verify this looks correct" → STOP. Launch a judge agent. That's not your job.
If you think: "This is too simple to need verification" → STOP. If the task specifies verification, launch the judge. No exceptions.
If you think: "I need to read the reference file to write a good prompt" → STOP. Put the reference file PATH in the sub-agent prompt. Sub-agent reads it.

Why This Matters


Orchestrators who read files themselves = context overflow = command loss = forgotten steps. Every time.
Orchestrators who "quickly verify" = skip judge agents = quality collapse = failed artifacts.
Your context window is precious. Protect it. Delegate everything.


CRITICAL

Configuration Rules

  • Use `THRESHOLD_FOR_STANDARD_COMPONENTS` (default 4.0) for standard steps!
  • Use `THRESHOLD_FOR_CRITICAL_COMPONENTS` (default 4.5) for steps marked as critical in task file!
  • Default is 3 iterations - stop after 3 fix→verify cycles and proceed to next step (with warning)!
  • If `MAX_ITERATIONS` is set to `unlimited`: Iterate until quality threshold is met (no limit)
  • Trigger human-in-the-loop checkpoints ONLY after steps in `HUMAN_IN_THE_LOOP_STEPS` (or all steps if `"*"`)!
  • If `SKIP_JUDGES` is true: Skip ALL judge validation - proceed directly to next step after each implementation completes!
  • If `CONTINUE_MODE` is true: Skip to `RESUME_FROM_STEP` - do not re-implement already completed steps!
  • If `REFINE_MODE` is true: Detect changed project files, map to steps, re-verify from `REFINE_FROM_STEP` - preserve user's fixes!
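The iteration rule can be pictured as a bounded loop around an implementation call and a judge call. The sketch below is a rough illustration, not the command's actual control flow: `run_step`, its callback parameters, and the result dictionary are hypothetical, each pass through the loop is assumed to count as one fix→verify cycle, and a score equal to the threshold is assumed to pass.

```python
from typing import Callable, Optional, Tuple


def run_step(
    implement: Callable[[Optional[str]], None],
    judge: Callable[[], Tuple[float, str]],
    threshold: float,
    max_iterations: Optional[int] = 3,  # None stands for "unlimited"
) -> dict:
    """Iterate implement -> judge until the threshold is met or cycles run out."""
    feedback: Optional[str] = None
    attempt = 0
    while True:
        attempt += 1
        implement(feedback)        # first call has no feedback yet
        score, feedback = judge()  # judge returns (score, feedback text)
        if score >= threshold:
            return {"passed": True, "attempts": attempt, "score": score}
        if max_iterations is not None and attempt >= max_iterations:
            # Limit reached: the caller proceeds to the next step with a warning.
            return {"passed": False, "attempts": attempt, "score": score}
```

With `SKIP_JUDGES` enabled, a caller would presumably bypass this loop entirely and invoke the implementation once.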

Execution & Evaluation Rules

  • Use foreground agents only: Do not use background agents. Launch parallel agents when possible. Background agents constantly run into permission issues and other errors.
Relaunch the judge until you get valid results if any of the following happens:
  • Reject Long Reports: If an agent returns a very long report instead of using the scratchpad as requested, reject the result. This indicates the agent failed to follow the "use scratchpad" instruction.
  • Judge Score 5.0 is a Hallucination: If a judge returns a score of 5.0/5.0, treat it as a hallucination or lazy evaluation. Reject it and re-run the judge. Perfect scores are practically impossible in this rigorous framework.
  • Reject Missing Scores: If a judge report is missing the numerical score, reject it. This indicates the judge failed to read or follow the rubric instructions.
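These three rejection rules amount to a validity check on each judge report. A minimal sketch follows; the function name `validate_judge_report`, the 2000-character length limit, and the assumption that scores appear in the report as `X.X/5.0` are all illustrative choices, not something the command specifies.

```python
import re
from typing import Optional, Tuple

# Assumed score format, e.g. "4.3/5.0" or "4/5".
SCORE_PATTERN = re.compile(r"(\d(?:\.\d+)?)\s*/\s*5(?:\.0)?")


def validate_judge_report(report: str, max_chars: int = 2000) -> Tuple[bool, Optional[float], str]:
    """Return (accepted, score, reason); rejected reports should trigger a re-run."""
    if len(report) > max_chars:
        return (False, None, "report too long")
    match = SCORE_PATTERN.search(report)
    if match is None:
        return (False, None, "missing score")
    score = float(match.group(1))
    if score >= 5.0:
        # A perfect score is treated as a hallucination or lazy evaluation.
        return (False, score, "perfect score rejected")
    return (True, score, "ok")
```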


Overview


This command orchestrates multi-step task implementation with:
  1. Sequential execution respecting step dependencies
  2. Parallel execution where dependencies allow
  3. Automated verification using judge agents for critical steps
  4. Panel of LLMs (PoLL) for high-stakes artifacts
  5. Aggregated voting with position bias mitigation
  6. Stage tracking with confirmation after each judge passes


Complete Workflow Overview


Phase 0: Select Task & Move to In-Progress
    ├─── Use provided task file name or auto-select from todo/ (if only 1 task)
    ├─── Move task: todo/ → in-progress/
Phase 1: Load Task
Phase 2: Execute Steps
    ├─── For each step in dependency order:
    │    │
    │    ▼
    │    ┌─────────────────────────────────────────────────┐
    │    │ Launch sdd:developer agent                      │
    │    │ (implementation)                                │
    │    └─────────────────┬───────────────────────────────┘
    │                      │
    │                      ▼
    │    ┌─────────────────────────────────────────────────┐
    │    │ Launch judge agent(s)                           │
    │    │ (verification per #### Verification section)    │
    │    └─────────────────┬───────────────────────────────┘
    │                      │
    │                      ▼
    │    ┌─────────────────────────────────────────────────┐
    │    │ Judge PASS? → Mark step complete in task file   │
    │    │ Judge FAIL? → Fix and re-verify (max 2 retries) │
    │    └─────────────────────────────────────────────────┘
Phase 3: Final Verification
    ├─── Verify all Definition of Done items
    │    │
    │    ▼
    │    ┌─────────────────────────────────────────────────┐
    │    │ Launch judge agent                              │
    │    │ (verify all DoD items)                          │
    │    └─────────────────┬───────────────────────────────┘
    │                      │
    │                      ▼
    │    ┌─────────────────────────────────────────────────┐
    │    │ All PASS? → Proceed to Phase 4                  │
    │    │ Any FAIL? → Fix and re-verify (iterate)         │
    │    └─────────────────────────────────────────────────┘
Phase 4: Move Task to Done
    ├─── Move task: in-progress/ → done/
Phase 5: Final Report


Phase 0: Parse User Input and Select Task


Parse user input to get the task file path and arguments.

Step 0.1: Resolve Task File

If `$ARGUMENTS` is empty or only contains flags:
  1. Check in-progress folder first:
    bash
    ls .specs/tasks/in-progress/*.md 2>/dev/null
    • If exactly 1 file → Set `$TASK_FILE` to that file, `$TASK_FOLDER` to `in-progress`
    • If multiple files → List them and ask user: "Multiple tasks in progress. Which one to continue?"
    • If no files → Continue to step 2
  2. Check todo folder:
    bash
    ls .specs/tasks/todo/*.md 2>/dev/null
    • If exactly 1 file → Set `$TASK_FILE` to that file, `$TASK_FOLDER` to `todo`
    • If multiple files → List them and ask user: "Multiple tasks in todo. Which one to implement?"
    • If no files → Report "No tasks available. Create one with /add-task first." and STOP
If `$ARGUMENTS` contains a task file name:
  1. Search for the file in order: `in-progress/`, `todo/`, `done/`
  2. Set `$TASK_FILE` and `$TASK_FOLDER` accordingly
  3. If not found, report error and STOP
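The lookup order above can be sketched with `pathlib`. The function name `resolve_task_file`, the `root` parameter, and the status labels in the returned tuple are hypothetical names chosen for this example; only the folder layout and the search order come from the text.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def resolve_task_file(root: Path, name: Optional[str]) -> Tuple[str, Optional[str], List[Path]]:
    """Return (status, folder, files) following the resolution order above."""
    base = root / ".specs" / "tasks"
    if name:
        # Explicit file name: search in-progress/, then todo/, then done/.
        for folder in ("in-progress", "todo", "done"):
            candidate = base / folder / Path(name).name
            if candidate.is_file():
                return ("found", folder, [candidate])
        return ("not_found", None, [])
    # No file name given: auto-detect from in-progress/ first, then todo/.
    for folder in ("in-progress", "todo"):
        files = sorted((base / folder).glob("*.md"))
        if len(files) == 1:
            return ("found", folder, files)
        if len(files) > 1:
            return ("ambiguous", folder, files)  # caller asks the user to choose
    return ("none", None, [])
```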

Step 0.2: Move to In-Progress (if needed)

If task is in `todo/` folder:
bash
git mv .specs/tasks/todo/$TASK_FILE .specs/tasks/in-progress/

Fallback if git is not available: `mv .specs/tasks/todo/$TASK_FILE .specs/tasks/in-progress/`

Update `$TASK_PATH` to `.specs/tasks/in-progress/$TASK_FILE`

**If task is already in `in-progress/`:**
Set `$TASK_PATH` to `.specs/tasks/in-progress/$TASK_FILE`

Step 0.3: Parse Flags and Initialize Configuration

Parse all flags from `$ARGUMENTS` and initialize configuration. Display resolved configuration:
markdown
Configuration

| Setting | Value |
|---|---|
| Task File | {TASK_PATH} |
| Standard Components Threshold | {THRESHOLD_FOR_STANDARD_COMPONENTS}/5.0 |
| Critical Components Threshold | {THRESHOLD_FOR_CRITICAL_COMPONENTS}/5.0 |
| Max Iterations | {MAX_ITERATIONS or "3"} |
| Human Checkpoints | {HUMAN_IN_THE_LOOP_STEPS as comma-separated or "All steps" or "None"} |
| Skip Judges | {SKIP_JUDGES} |
| Continue Mode | {CONTINUE_MODE} |
| Refine Mode | {REFINE_MODE} |

Step 0.4: Handle Continue Mode

If `CONTINUE_MODE` is true:
  1. Identify Last Completed Step:
    • Parse task file for `[DONE]` markers on step titles
    • Find the highest step number marked `[DONE]`
    • Set `LAST_COMPLETED_STEP` to that number (or 0 if none)
  2. Verify Last Completed Step (if any):
    • If `LAST_COMPLETED_STEP > 0`:
      • Launch judge agent to verify the artifacts from that step
      • If judge PASS: Set `RESUME_FROM_STEP = LAST_COMPLETED_STEP + 1`
      • If judge FAIL: Set `RESUME_FROM_STEP = LAST_COMPLETED_STEP` (re-implement)
  3. Skip to Resume Point:
    • In Phase 2, skip all steps before `RESUME_FROM_STEP`
    • Continue execution from `RESUME_FROM_STEP`
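Finding the resume point could look like the sketch below. It assumes step titles are markdown headings of the form `### Step N: Title [DONE]`; that heading shape, the regex, and the function names are assumptions for illustration, since the task file format is not shown here. Resuming from step 1 when nothing is marked done is likewise an assumption.

```python
import re

# Assumed heading shape: "### Step 2: Create ValidationService [DONE]"
STEP_TITLE = re.compile(r"^#{2,4}\s*Step\s+(\d+)\b(.*)$", re.MULTILINE)


def last_completed_step(task_text: str) -> int:
    """Highest step number whose title carries a [DONE] marker, or 0 if none."""
    done = [int(num) for num, rest in STEP_TITLE.findall(task_text) if "[DONE]" in rest]
    return max(done, default=0)


def resume_from(last_completed: int, judge_passed: bool) -> int:
    """Step to resume from, given the judge verdict on the last completed step."""
    if last_completed == 0:
        return 1  # nothing done yet: start at the first step
    return last_completed + 1 if judge_passed else last_completed
```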

Step 0.5: Handle Refine Mode

If `REFINE_MODE` is true:
  1. Detect Changed Project Files:
    bash
    # Check for staged and unstaged changes
    STAGED=$(git diff --cached --name-only)
    UNSTAGED=$(git diff --name-only)
    Determine comparison mode:
    if STAGED is not empty AND UNSTAGED is not empty:
        # Both staged and unstaged - use unstaged only
        CHANGED_FILES = git diff --name-only  # working dir vs staging
        COMPARISON_MODE = "unstaged_only"
    elif STAGED is not empty OR UNSTAGED is not empty:
        # Only one type - compare against last commit
        CHANGED_FILES = git diff HEAD --name-only
        COMPARISON_MODE = "vs_last_commit"
    else:
        # No changes
        Report: "No project changes detected. Make edits first, then run --refine."
        Exit
  2. Load Task File and Extract Step→File Mapping:
    • Read the task file to get implementation steps
    • For each step, extract the files it creates/modifies from:
      • "Expected Output" sections
      • Subtask descriptions mentioning file paths
      • `#### Verification` artifact paths
    • Build mapping: `STEP_FILE_MAP = {step_number → [file_paths]}`
  3. Map Changed Files to Steps:
    AFFECTED_STEPS = []
    for each changed_file:
        for step_number, file_list in STEP_FILE_MAP:
            if changed_file matches any path in file_list:
                AFFECTED_STEPS.append(step_number)
    • If no steps matched: "Changed files don't map to any implementation step. Verify manually."
  4. Determine Refine Scope:
    • `REFINE_FROM_STEP` = min(AFFECTED_STEPS) # earliest affected step
    • All steps from `REFINE_FROM_STEP` onwards need re-verification
    • Steps before `REFINE_FROM_STEP` are preserved as-is
  5. Store Changed Files Context:
    • `CHANGED_FILES` = list of changed file paths
    • `USER_CHANGES_CONTEXT` = git diff output for affected files
    • Pass this context to judge and implementation agents
    • Agents should build upon user's fixes, not overwrite them
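The mapping and scope rules in steps 3 and 4 translate directly into a few lines of Python. In this sketch, `affected_steps`, `refine_scope`, and the `total_steps` parameter are hypothetical names, and path matching is assumed to be exact string equality; the real matching rule is not specified in the text.

```python
from typing import Dict, List, Optional


def affected_steps(changed_files: List[str], step_file_map: Dict[int, List[str]]) -> List[int]:
    """Steps whose recorded files include at least one changed file."""
    hits = {
        step
        for step, paths in step_file_map.items()
        for changed in changed_files
        if changed in paths
    }
    return sorted(hits)


def refine_scope(
    changed_files: List[str], step_file_map: Dict[int, List[str]], total_steps: int
) -> Optional[List[int]]:
    """Steps to re-verify: from the earliest affected step through the last step."""
    hits = affected_steps(changed_files, step_file_map)
    if not hits:
        return None  # changed files do not map to any implementation step
    return list(range(hits[0], total_steps + 1))
```

With files from Step 2 and Step 4 changed in a five-step task, `refine_scope` would yield steps 2 through 5, matching the "Multiple Files Changed" example earlier in the document.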

Phase 1: Load and Analyze Task


This is the ONLY phase where you read a file.

Step 1.1: Load Task Details


Read the task file ONCE:
bash
Read $TASK_PATH
After this read, you MUST NOT read any other files for the rest of execution.

Step 1.2: Identify Implementation Steps

Parse the
## Implementation Process
section:
  • List all steps with dependencies
  • Identify which steps have
    Parallel with:
    annotations
  • Classify each step's verification needs from
    #### Verification
    sections:
| Verification Level | When to Use | Judge Configuration |
|---|---|---|
| None | Simple operations (mkdir, delete) | Skip verification |
| Single Judge | Non-critical artifacts | 1 judge, threshold 4.0/5.0 |
| Panel of 2 Judges | Critical artifacts | 2 judges, median voting, threshold 4.5/5.0 |
| Per-Item Judges | Multiple similar items | 1 judge per item, parallel |

Step 1.3: Create Todo List

Create TodoWrite with all implementation steps, marking verification requirements:
json
{
  "todos": [
    {"content": "Step 1: [Title] - [Verification Level]", "status": "pending", "activeForm": "Implementing Step 1"},
    {"content": "Step 2: [Title] - [Verification Level]", "status": "pending", "activeForm": "Implementing Step 2"}
  ]
}

Phase 2: Execute Implementation Steps

For each step in dependency order:

Pattern A: Simple Step (No Verification)

1. Launch Developer Agent:
Use Task tool with:
  • Agent Type:
    sdd:developer
  • Model: As specified in step or
    opus
    by default
  • Description: "Implement Step [N]: [Title]"
  • Prompt:
Implement Step [N]: [Step Title]

Task File: $TASK_PATH
Step Number: [N]

Your task:
- Execute ONLY Step [N]: [Step Title]
- Do NOT execute any other steps
- Follow the Expected Output and Success Criteria exactly

When complete, report:
1. What files were created/modified (paths)
2. Confirmation that success criteria are met
3. Any issues encountered
2. Use Agent's Report (No Verification)
  • Agent reports what was created → Use this information
  • DO NOT read the created files yourself
  • This pattern has NO verification (simple operations)
3. Mark Step Complete
  • Update task file:
    • Mark step title with
      [DONE]
      (e.g.,
      ### Step 1: Setup [DONE]
      )
    • Mark step's subtasks as
      [X]
      complete
  • Update todo to
    completed

Pattern B: Critical Step (Panel of 2 Evaluations)

1. Launch Developer Agent:
Use Task tool with:
  • Agent Type:
    sdd:developer
  • Model: As specified in step or
    opus
    by default
  • Description: "Implement Step [N]: [Title]"
  • Prompt:
Implement Step [N]: [Step Title]

Task File: $TASK_PATH
Step Number: [N]

Your task:
- Execute ONLY Step [N]: [Step Title]
- Do NOT execute any other steps
- Follow the Expected Output and Success Criteria exactly

When complete, report:
1. What files were created/modified (paths)
2. Confirmation of completion
3. Self-critique summary
2. Wait for Completion
  • Receive the agent's report
  • Note the artifact path(s) from the report
  • DO NOT read the artifact yourself
3. Launch 2 Evaluation Agents in Parallel (MANDATORY):
⚠️ MANDATORY: This pattern requires launching evaluation agents. You MUST launch these evaluations. Do NOT skip. Do NOT verify yourself.
Use
sdd:developer
agent type for evaluations
Evaluation 1 & 2 (launch both in parallel with same prompt structure):
CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}

Read @${CLAUDE_PLUGIN_ROOT}/prompts/judge.md for evaluation methodology.

Evaluate artifact at: [artifact_path from implementation agent report]

**Chain-of-Thought Requirement:** Justification MUST be provided BEFORE score for each criterion.

Rubric:
[paste rubric table from #### Verification section]

Context:
- Read $TASK_PATH
- Verify Step [N] ONLY: [Step Title]
- Threshold: [from #### Verification section]
- Reference pattern: [if specified in #### Verification section]

You can verify the artifact works - run tests, check imports, validate syntax.

Return: scores per criterion with evidence, overall weighted score, PASS/FAIL, improvements if FAIL.
4. Aggregate Results:
  • Calculate median score per criterion
  • Flag high-variance criteria (std > 1.0)
  • Pass if median overall ≥ threshold
5. Determine Threshold:
  • Check if step is marked as critical in task file (in
    #### Verification
    section or step metadata)
  • If critical: use
    THRESHOLD_FOR_CRITICAL_COMPONENTS
  • If standard: use
    THRESHOLD_FOR_STANDARD_COMPONENTS
6. On FAIL: Iterate Until PASS (max 3 iterations by default)
  • Present issues to implementation agent with judge feedback
  • Re-implement with judge feedback incorporated (align code with requirements, preserve user's changes if in refine mode)
  • Re-verify with judge
  • Iterate until PASS - continue fix → verify cycle until quality threshold is met or max iterations reached
  • If
    MAX_ITERATIONS
    reached (default 3):
    • Log warning: "Step [N] did not pass after {MAX_ITERATIONS} iterations"
    • Proceed to next step (do not block indefinitely)
7. On PASS: Mark Step Complete
  • Update task file:
    • Mark step title with
      [DONE]
      (e.g.,
      ### Step 2: Create Service [DONE]
      )
    • Mark step's subtasks as
      [X]
      complete
  • Update todo to
    completed
  • Record judge scores in tracking
8. Human-in-the-Loop Checkpoint (if applicable):
Only after step PASSES, if step number is in
HUMAN_IN_THE_LOOP_STEPS
(or
HUMAN_IN_THE_LOOP_STEPS == "*"
):
markdown
---

🔍 Human Review Checkpoint - Step [N]

Step: [Step Title]
Judge Score: [score]/[threshold for step type] threshold
Status: ✅ PASS
Artifacts Created/Modified:
  • [artifact_path_1]
  • [artifact_path_2]
Judge Feedback: [feedback summary from judges]
Action Required: Review the above artifacts and provide feedback or continue.
Continue? [Y/n/feedback]:


- If user provides feedback: Store for next step or re-implement current step with feedback
- If user says "n": Pause workflow, report current progress
- If user says "Y" or continues: Proceed to next step

---
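The fix → verify cycle from step 6 of Pattern B can be sketched as a bounded loop. This is a sketch under assumptions rather than the command's actual logic: `implement` and `judge` are hypothetical callables standing in for the developer and evaluation agents, and the first attempt is assumed to count toward `MAX_ITERATIONS`.

```python
from itertools import count


def run_step(implement, judge, threshold, max_iterations=3):
    """Repeat implement -> judge until the score meets the threshold.

    implement(feedback) and judge(artifact) are hypothetical stand-ins for
    the agents. max_iterations=None models the "unlimited" setting.
    """
    attempts = count(1) if max_iterations is None else range(1, max_iterations + 1)
    feedback, score = None, None
    for iteration in attempts:
        artifact = implement(feedback)     # re-implement using judge feedback
        score, feedback = judge(artifact)  # judge returns (score, feedback)
        if score >= threshold:
            return {"status": "PASS", "score": score, "iterations": iteration}
    # Limit reached: warn and move on instead of blocking indefinitely.
    print(f"Step did not pass after {max_iterations} iterations")
    return {"status": "MAX_ITER", "score": score, "iterations": max_iterations}


# Stub agents: every judge call returns the next canned score.
canned = iter([3.8, 4.2, 4.6])
result = run_step(
    implement=lambda feedback: "artifact",
    judge=lambda artifact: (next(canned), "tighten error handling"),
    threshold=4.5,
)
print(result)
```

Returning a `MAX_ITER` status instead of raising mirrors the rule that the workflow proceeds to the next step once the limit is reached.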

Pattern C: Multi-Item Step (Per-Item Evaluations)

For steps that create multiple similar items:
1. Launch Developer Agents in Parallel (one per item):
Use Task tool for EACH item (launch all in parallel):
  • Agent Type:
    sdd:developer
  • Model: As specified or
    opus
    by default
  • Description: "Implement Step [N], Item: [Name]"
  • Prompt:
Implement Step [N], Item: [Item Name]

Task File: $TASK_PATH
Step Number: [N]
Item: [Item Name]

Your task:
- Create ONLY [item_name] from Step [N]
- Do NOT create other items or steps
- Follow the Expected Output and Success Criteria exactly

When complete, report:
1. File path created
2. Confirmation of completion
3. Self-critique summary
2. Wait for All Completions
  • Collect all agent reports
  • Note all artifact paths
  • DO NOT read any of the created files yourself
3. Launch Evaluation Agents in Parallel (one per item)
⚠️ MANDATORY: Launch evaluation agents. Do NOT skip. Do NOT verify yourself.
Use
sdd:developer
agent type for evaluations
For each item:
CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}

Read @${CLAUDE_PLUGIN_ROOT}/prompts/judge.md for evaluation methodology.

Evaluate artifact at: [item_path from implementation agent report]

**Chain-of-Thought Requirement:** Justification MUST be provided BEFORE score for each criterion.

Rubric:
[paste rubric from #### Verification section]

Context:
- Read $TASK_PATH
- Verify Step [N]: [Step Title]
- Verify ONLY this Item: [Item Name]
- Threshold: [from #### Verification section]

You can verify the artifact works - run tests, check syntax, confirm dependencies.

Return: scores with evidence, overall score, PASS/FAIL, improvements if FAIL.
4. Collect All Results
5. Report Aggregate:
  • Items passed: X/Y
  • Items needing revision: [list with specific issues]
6. Determine Threshold:
  • Check if step is marked as critical in task file (in
    #### Verification
    section or step metadata)
  • If critical: use
    THRESHOLD_FOR_CRITICAL_COMPONENTS
  • If standard: use
    THRESHOLD_FOR_STANDARD_COMPONENTS
7. If Any FAIL: Iterate Until ALL PASS
  • Present failing items with judge feedback to implementation agent
  • Re-implement only failing items with feedback incorporated (preserve user's changes if in refine mode)
  • Re-verify failing items with judge
  • Iterate until ALL PASS - continue fix → verify cycle until all items meet quality threshold or max iterations reached
  • If
    MAX_ITERATIONS
    reached (default 3):
    • Log warning: "Step [N] has {X} items that did not pass after {MAX_ITERATIONS} iterations"
    • Proceed to next step (do not block indefinitely)
8. On ALL PASS: Mark Step Complete
  • Update task file:
    • Mark step title with
      [DONE]
      (e.g.,
      ### Step 3: Create Items [DONE]
      )
    • Mark step's subtasks as
      [X]
      complete
  • Update todo to
    completed
  • Record pass rate in tracking
9. Human-in-the-Loop Checkpoint (if applicable):
Only after ALL items PASS, if step number is in
HUMAN_IN_THE_LOOP_STEPS
(or
HUMAN_IN_THE_LOOP_STEPS == "*"
):
markdown
---

🔍 Human Review Checkpoint - Step [N]

Step: [Step Title]
Items Passed: X/Y
Status: ✅ ALL PASS
Artifacts Created:
  • [item_1_path]
  • [item_2_path]
  • ...
Action Required: Review the above artifacts and provide feedback or continue.
Continue? [Y/n/feedback]:


- If user provides feedback: Store for next step or re-implement items with feedback
- If user says "n": Pause workflow, report current progress
- If user says "Y" or continues: Proceed to next step

---

⚠️ CHECKPOINT: Before Proceeding to Final Verification

Before moving to final verification, verify you followed the rules:
  • Did you launch sdd:developer agents for ALL implementations?
  • Did you launch evaluation agents for ALL verifications?
  • Did you mark steps complete ONLY after judge PASS?
  • Did you avoid reading ANY artifact files yourself?
If you read files other than the task file, you are doing it wrong. STOP and restart.

Phase 3: Final Verification

After all implementation steps are complete, verify the task meets all Definition of Done criteria.

Step 3.1: Launch Definition of Done Verification

Use Task tool with:
  • Agent Type:
    sdd:developer
  • Model:
    opus
  • Description: "Verify Definition of Done"
  • Prompt:
CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}

Verify all Definition of Done items in the task file.

Task File: $TASK_PATH

Your task:
1. Read the task file and locate the "## Definition of Done (Task Level)" section
2. Go through each checkbox item one by one
3. For each item, verify if it passes by:
   - Running appropriate tests (unit tests, E2E tests)
   - Checking build/compilation status
   - Verifying file existence and correctness
   - Checking code patterns and linting
4. You MUST mark each item in task file that passed verification with `[X]`
5. Return a structured report:
- List ALL Definition of Done items
- Status for each:
   - ✅ PASS - if the item is complete and verified
   - ❌ FAIL - if the item fails verification, with specific reason why
   - ⚠️ BLOCKED - if the item cannot be verified due to a blocker
- Evidence for each status
- Specific issues for any failures
- Overall pass rate

Be thorough - check everything the task requires.

Step 3.2: Review Verification Results

  • Receive the verification report
  • Note which items PASS and which FAIL
  • If the judge reports that all items PASS, you MUST read the end of the task file to verify that all DoD items are marked with
    [X]
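The final check that every Definition of Done item is ticked can be sketched as below. The sketch assumes the items are standard markdown task-list entries (`- [ ]` / `- [X]`) under the `## Definition of Done (Task Level)` heading; the function name and the sample text are hypothetical.

```python
import re

DOD_HEADING = "## Definition of Done (Task Level)"
# Assumption: DoD items are markdown task-list entries such as "- [X] text".
CHECKBOX = re.compile(r"^\s*[-*] \[( |x|X)\] (.+)$")


def unchecked_dod_items(task_markdown):
    """Return the text of every DoD checkbox that is still unchecked."""
    items = []
    in_section = False
    for line in task_markdown.splitlines():
        if line.startswith("## "):
            # A level-2 heading opens or closes the DoD section.
            in_section = line.strip() == DOD_HEADING
            continue
        if in_section:
            match = CHECKBOX.match(line)
            if match and match.group(1) == " ":
                items.append(match.group(2).strip())
    return items


sample = """## Implementation Process
- [ ] not part of the DoD section
## Definition of Done (Task Level)
- [X] All unit tests pass
- [ ] Linting passes
"""
print(unchecked_dod_items(sample))
```

An empty list means every DoD item is marked `[X]`; anything else should send the workflow back to the fix step.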

Step 3.3: Fix Failing Items (If Any)

If any Definition of Done items FAIL:
1. Launch Developer Agent for Each Failing Item:
Fix Definition of Done item: [Item Description]

Task File: $TASK_PATH

Current Status:
[paste failure details from verification report]

Your task:
1. Fix the specific issue identified
2. Verify the fix resolves the problem
3. Ensure no regressions (all tests still pass)

Return:
- What was fixed
- Confirmation the item now passes
- Any related changes made
2. Re-verify After Fixes:
Launch the verification agent again (Step 3.1) to confirm all items now PASS.
3. Iterate if Needed:
Repeat fix → verify cycle until all Definition of Done items PASS.

Phase 4: Move Task to Done

Once ALL Definition of Done items PASS, move the task to the done folder.

Step 4.1: Verify Completion

Confirm all Definition of Done items are marked complete in the task file.

Step 4.2: Move Task

bash
# Extract just the filename from $TASK_PATH
TASK_FILENAME=$(basename $TASK_PATH)

# Move from in-progress to done
git mv .specs/tasks/in-progress/$TASK_FILENAME .specs/tasks/done/

# Fallback if git not available: mv .specs/tasks/in-progress/$TASK_FILENAME .specs/tasks/done/


---


Phase 5: Aggregation and Reporting

Panel Voting Algorithm

When using 2+ evaluations, follow these manual computation steps:
  • Think in steps, output each step result separately!
  • Do not skip steps!

Step 1: Collect Scores per Criterion

Create a table with each criterion and scores from all evaluations:
| Criterion | Eval 1 | Eval 2 | Median | Difference |
|---|---|---|---|---|
| [Name 1] | X.X | X.X | ? | ? |
| [Name 2] | X.X | X.X | ? | ? |

Step 2: Calculate Median for Each Criterion

For 2 evaluations: Median = (Score1 + Score2) / 2
For 3+ evaluations: Sort scores, take middle value (or average of two middle values if even count)
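Both median rules are what Python's standard `statistics.median` computes, so the per-criterion step can be sketched as below. The criterion names and scores are made-up illustration data, and `criterion_medians` is a hypothetical helper name.

```python
from statistics import median


def criterion_medians(scores_by_criterion):
    """Map each criterion to the median of its evaluation scores."""
    return {name: median(scores) for name, scores in scores_by_criterion.items()}


# Illustrative scores: two evaluations for one criterion, three for the other.
scores = {
    "Correctness": [4.0, 5.0],         # 2 evaluations -> (4.0 + 5.0) / 2
    "Test coverage": [3.5, 4.5, 4.0],  # 3 evaluations -> middle value after sorting
}
medians = criterion_medians(scores)
print(medians)
```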

Step 3: Check for High Variance

High variance = evaluators disagree significantly (difference > 2.0 points)
Formula:
|Eval1 - Eval2| > 2.0
→ Flag as high variance
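A minimal sketch of the two-evaluator check follows; the function name and sample scores are hypothetical. For exactly two scores, a gap above 2.0 is the same condition as a population standard deviation above 1.0, because the population standard deviation of two values is half their difference.

```python
from statistics import pstdev


def high_variance_criteria(eval1, eval2, limit=2.0):
    """Return the criteria where the two evaluators differ by more than limit."""
    return sorted(name for name in eval1 if abs(eval1[name] - eval2[name]) > limit)


# Illustrative scores from two evaluations of the same artifact.
eval1 = {"Correctness": 4.5, "Readability": 2.0, "Coverage": 2.5}
eval2 = {"Correctness": 4.0, "Readability": 4.5, "Coverage": 4.5}

flagged = high_variance_criteria(eval1, eval2)
print(flagged)

# Population standard deviation of the flagged pair is half the 2.5-point gap.
print(pstdev([eval1["Readability"], eval2["Readability"]]))
```

A gap of exactly 2.0 (the `Coverage` row) is not flagged, because the rule uses a strict inequality.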

Step 4: Calculate Weighted Overall Score

Multiply each criterion's median by its weight and sum:
Overall = (Criterion1_Median × Weight1) + (Criterion2_Median × Weight2) + ...
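As a worked instance of this formula, the sketch below assumes the rubric weights are fractions that sum to 1.0 (the text does not state this). The names, medians, and weights are illustrative only.

```python
def overall_score(medians, weights):
    """Weighted sum of per-criterion medians; weights are assumed to sum to 1.0."""
    return sum(medians[name] * weights[name] for name in weights)


# Illustrative medians and rubric weights.
medians = {"Correctness": 4.5, "Readability": 4.0}
weights = {"Correctness": 0.6, "Readability": 0.4}

# 4.5 * 0.6 + 4.0 * 0.4 = 2.7 + 1.6 = 4.3 (up to floating-point rounding)
overall = overall_score(medians, weights)
print(round(overall, 2))
```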

Step 5: Determine Pass/Fail

Compare overall score to threshold:
  • Overall ≥ Threshold
    PASS
  • Overall < Threshold
    FAIL
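The comparison can be sketched as below, using the default thresholds from the argument table (4.0 for standard steps, 4.5 for critical steps). The function and parameter names are hypothetical.

```python
def verdict(overall, critical, standard_threshold=4.0, critical_threshold=4.5):
    """Return "PASS" when the overall score reaches the step's threshold."""
    threshold = critical_threshold if critical else standard_threshold
    return "PASS" if overall >= threshold else "FAIL"


print(verdict(4.3, critical=False))  # standard step, threshold 4.0
print(verdict(4.3, critical=True))   # critical step, threshold 4.5
```

The same score can pass as a standard step and fail as a critical one, which is why the threshold has to be determined per step before comparing.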

Handling Disagreement

If evaluations significantly disagree (difference > 2.0 on any criterion):
  1. Flag the criterion
  2. Present both evaluators' reasoning
  3. Ask user: "Evaluators disagree on [criterion]. Review manually?"
  4. If yes: present evidence, get user decision
  5. If no: use median (conservative approach)

Final Report

After all steps complete and DoD verification passes:
markdown

Implementation Summary

Task Status

  • Task Status:
    done
  • All Definition of Done items: X/X PASS (100%)

Configuration Used

| Setting | Value |
|---|---|
| Standard Components Threshold | {THRESHOLD_FOR_STANDARD_COMPONENTS}/5.0 |
| Critical Components Threshold | {THRESHOLD_FOR_CRITICAL_COMPONENTS}/5.0 |
| Max Iterations | {MAX_ITERATIONS or "3"} |
| Human Checkpoints | {HUMAN_IN_THE_LOOP_STEPS or "None"} |
| Skip Judges | {SKIP_JUDGES} |
| Continue Mode | {CONTINUE_MODE} |
| Refine Mode | {REFINE_MODE} |

Steps Completed

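| Step | Title | Status | Verification | Score | Iterations | Judge Confirmed |
|---|---|---|---|---|---|---|
| 1 | [Title] | | Skipped | N/A | 1 | - |
| 2 | [Title] | | Panel (2) | 4.5/5 | 1 | |
| 3 | [Title] | | Per-Item | 5/5 passed | 2 | |
| 4 | [Title] | | Single | 4.2/5 | 3 | |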
Legend:
  • ✅ PASS - Score >= threshold for step type
  • ⚠️ MAX_ITER - Did not pass but MAX_ITERATIONS reached, proceeded anyway
  • ⏭️ SKIPPED - Step skipped (continue/refine mode)

Verification Summary

  • Total steps: X
  • Steps with verification: Y
  • Passed on first try: Z
  • Required iteration: W
  • Total iterations across all steps: V
  • Final pass rate: 100%

Definition of Done Verification

| Item | Status | Evidence |
|---|---|---|
| [DoD Item 1] | ✅ PASS | [Brief evidence] |
| [DoD Item 2] | ✅ PASS | [Brief evidence] |
| ... | ... | ... |
Issues Fixed During Verification:
  1. [Issue]: [How it was fixed]
  2. [Issue]: [How it was fixed]

High-Variance Criteria (Evaluators Disagreed)

  • [Criterion] in [Step]: Eval 1 scored X, Eval 2 scored Y

Human Review Summary (if --human-in-the-loop used)

| Step | Checkpoint | User Action | Feedback Incorporated |
|---|---|---|---|
| 2 | After PASS | Continued | - |
| 4 | After iteration 2 | Feedback | "Improve error messages" |
| 6 | After PASS | Continued | - |

Task File Updated

  • Task moved from
    in-progress/
    to
    done/
    folder
  • All step titles marked
    [DONE]
  • All step subtasks marked
    [X]
  • All Definition of Done items marked
    [X]

Recommendations

  1. [Any follow-up actions]
  2. [Suggested improvements]

---

Execution Flow Diagram

┌──────────────────────────────────────────────────────────────┐
│                IMPLEMENT TASK WITH VERIFICATION               │
├──────────────────────────────────────────────────────────────┤
│                                                               │
│  Phase 0: Select Task                                         │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │ Use provided name or auto-select from todo/ (if 1 task) │  │
│  │ → Move task from todo/ to in-progress/                  │  │
│  └─────────────────────────────────────────────────────────┘  │
│                           │                                   │
│                           ▼                                   │
│  Phase 1: Load Task                                           │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │ Read $TASK_PATH → Parse steps                           │  │
│  │ → Extract #### Verification specs → Create TodoWrite    │  │
│  └─────────────────────────────────────────────────────────┘  │
│                           │                                   │
│                           ▼                                   │
│  Phase 2: Execute Steps (Respecting Dependencies)             │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │                                                          │  │
│  │  For each step:                                          │  │
│  │                                                          │  │
│  │  ┌──────────────┐    ┌───────────────┐    ┌───────────┐ │  │
│  │  │ developer    │───▶│ Judge Agent   │───▶│ PASS?     │ │  │
│  │  │ Agent        │    │ (verify)      │    │           │ │  │
│  │  └──────────────┘    └───────────────┘    └───────────┘ │  │
│  │                                                │   │     │  │
│  │                                               Yes  No    │  │
│  │                                                │   │     │  │
│  │                                                ▼   ▼     │  │
│  │                                    ┌────────┐  Fix & │   │  │
│  │                                    │ Mark   │  Retry │   │  │
│  │                                    │Complete│  ↺     │   │  │
│  │                                    └────────┘        │   │  │
│  └─────────────────────────────────────────────────────────┘  │
│                           │                                   │
│                           ▼                                   │
│  Phase 3: Final Verification                                  │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │                                                         │  │
│  │  ┌──────────────┐    ┌───────────────┐    ┌───────────┐ │  │
│  │  │ Judge Agent  │───▶│ All DoD       │───▶│ All PASS? │ │  │
│  │  │ (verify DoD) │    │ items checked │    │           │ │  │
│  │  └──────────────┘    └───────────────┘    └───────────┘ │  │
│  │                                                │   │    │  │
│  │                                               Yes  No   │  │
│  │                                                │   │    │  │
│  │                                                ▼   ▼    │  │
│  │                                                Fix &    │  │
│  │                                                Retry    │  │
│  │                                                ↺        │  │
│  │                                                         │  │
│  └─────────────────────────────────────────────────────────┘  │
│                           │                                   │
│                           ▼                                   │
│  Phase 4: Move Task to Done                                   │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │ mv in-progress/$TASK → done/$TASK                       │  │
│  └─────────────────────────────────────────────────────────┘  │
│                           │                                   │
│                           ▼                                   │
│  Phase 5: Aggregate & Report                                  │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │ Collect all verification results                        │  │
│  │ → Calculate aggregate metrics                           │  │
│  │ → Generate final report                                 │  │
│  │ → Present to user                                       │  │
│  └─────────────────────────────────────────────────────────┘  │
│                                                               │
└──────────────────────────────────────────────────────────────┘

Usage Examples

使用示例

Basic Usage

基础用法

bash
# Implement a specific task
/implement add-validation.feature.md

# Auto-select task from todo/ or in-progress/ (if only 1 task)
/implement

# Continue from last completed step
/implement add-validation.feature.md --continue

# Refine after user fixes project files (detects changes, re-verifies affected steps)
/implement add-validation.feature.md --refine

# Human review after every step
/implement add-validation.feature.md --human-in-the-loop

# Human review after specific steps only
/implement add-validation.feature.md --human-in-the-loop 2,4,6

# Higher quality threshold (stricter) - sets both standard and critical to 4.5
/implement add-validation.feature.md --target-quality 4.5

# Different thresholds for standard (3.5) and critical (4.5) components
/implement add-validation.feature.md --target-quality 3.5,4.5

# Lower quality threshold for both (faster convergence)
/implement add-validation.feature.md --target-quality 3.5

# Unlimited iterations (default is 3)
/implement add-validation.feature.md --max-iterations unlimited

# Skip all judge verifications (fast but no quality gates)
/implement add-validation.feature.md --skip-judges

# Combined: continue with human review
/implement add-validation.feature.md --continue --human-in-the-loop

bash
# 实现特定任务
/implement add-validation.feature.md

# 从todo/或in-progress/自动选择任务(若仅1个任务)
/implement

# 从上一个已完成步骤继续
/implement add-validation.feature.md --continue

# 用户修复项目文件后进行优化(检测变更,重新验证受影响的步骤)
/implement add-validation.feature.md --refine

# 每个步骤后进行人工评审
/implement add-validation.feature.md --human-in-the-loop

# 仅在特定步骤后进行人工评审
/implement add-validation.feature.md --human-in-the-loop 2,4,6

# 更高质量阈值(更严格)- 将标准和关键阈值都设为4.5
/implement add-validation.feature.md --target-quality 4.5

# 标准组件(3.5)和关键组件(4.5)使用不同阈值
/implement add-validation.feature.md --target-quality 3.5,4.5

# 降低两者的质量阈值(更快收敛)
/implement add-validation.feature.md --target-quality 3.5

# 无限迭代(默认3次)
/implement add-validation.feature.md --max-iterations unlimited

# 跳过所有评审验证(快速但无质量门)
/implement add-validation.feature.md --skip-judges

# 组合使用:继续执行并进行人工评审
/implement add-validation.feature.md --continue --human-in-the-loop
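
The flag combinations above can be turned into a configuration object along these lines. This is a minimal sketch, not the command's actual parser: the `Config` dataclass, its field names, and `parse_arguments` are hypothetical, and only the flags and defaults (4.0 standard, 4.5 critical, 3 iterations) come from the argument definitions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Config:
    """Hypothetical container for the parsed flags (names are assumptions)."""
    task_file: Optional[str] = None
    continue_mode: bool = False
    refine_mode: bool = False
    skip_judges: bool = False
    # None = no checkpoints, [] = pause after every step, [2, 4] = only those steps
    human_in_the_loop: Optional[List[int]] = None
    standard_threshold: float = 4.0
    critical_threshold: float = 4.5
    max_iterations: Optional[int] = 3  # None means unlimited


def parse_target_quality(value: str) -> Tuple[float, float]:
    """A single value sets both thresholds; 'X.X,Y.Y' sets standard,critical."""
    parts = [float(p) for p in value.split(",")]
    if len(parts) == 1:
        return parts[0], parts[0]
    if len(parts) == 2:
        return parts[0], parts[1]
    raise ValueError(f"Expected X.X or X.X,Y.Y, got {value!r}")


def parse_arguments(arguments: str) -> Config:
    config = Config()
    tokens = arguments.split()
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token == "--continue":
            config.continue_mode = True
        elif token == "--refine":
            config.refine_mode = True
        elif token == "--skip-judges":
            config.skip_judges = True
        elif token == "--human-in-the-loop":
            config.human_in_the_loop = []
            # The step list is optional: consume it only if it looks like "2,4,6".
            if i + 1 < len(tokens) and all(p.isdigit() for p in tokens[i + 1].split(",")):
                i += 1
                config.human_in_the_loop = [int(p) for p in tokens[i].split(",")]
        elif token == "--target-quality":
            i += 1
            config.standard_threshold, config.critical_threshold = parse_target_quality(tokens[i])
        elif token == "--max-iterations":
            i += 1
            config.max_iterations = None if tokens[i] == "unlimited" else int(tokens[i])
        elif token.startswith("--"):
            raise ValueError(f"Unknown flag: {token}")
        else:
            config.task_file = token
        i += 1
    return config
```

For example, `parse_arguments("add-validation.feature.md --target-quality 3.5,4.5")` yields a standard threshold of 3.5 and a critical threshold of 4.5, while a bare `--human-in-the-loop` yields an empty step list, read here as "pause after every step".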

Example 1: Implementing a Feature

示例1:实现功能

User: /implement add-validation.feature.md

Phase 0: Task Selection...
Found task in: .specs/tasks/todo/add-validation.feature.md
Moving to in-progress: .specs/tasks/in-progress/add-validation.feature.md

Phase 1: Loading task...
Task: "Add form validation service"
Steps identified: 4 steps

Verification plan (from #### Verification sections):
- Step 1: No verification (directory creation)
- Step 2: Panel of 2 evaluations (ValidationService)
- Step 3: Per-item evaluations (3 validators)
- Step 4: Single evaluation (integration)

Phase 2: Executing...

Step 1: Launching sdd:developer agent...
  Agent: "Implement Step 1: Create Directory Structure..."
  Result: ✅ Directories created
  Verification: Skipped (simple operation)
  Status: ✅ COMPLETE

Step 2: Launching sdd:developer agent...
  Agent: "Implement Step 2: Create ValidationService..."
  Result: Files created, tests passing

  Launching 2 judge agents in parallel...
  Judge 1: 4.3/5.0 - PASS
  Judge 2: 4.5/5.0 - PASS
  Panel Result: 4.4/5.0 ✅
  Status: ✅ COMPLETE (Judge Confirmed)

[Continue for all steps...]

Phase 3: Final Verification...
Launching DoD verification agent...
  Agent: "Verify all Definition of Done items..."
  Result: 4/4 items PASS ✅

Phase 4: Moving task to done...
  mv .specs/tasks/in-progress/add-validation.feature.md .specs/tasks/done/

Phase 5: Final Report
Implementation complete.
- 4/4 steps completed
- 6 artifacts verified
- All passed first try
- Definition of Done: 4/4 PASS
- Task location: .specs/tasks/done/add-validation.feature.md ✅
用户: /implement add-validation.feature.md

阶段0:任务选择...
找到任务:.specs/tasks/todo/add-validation.feature.md
移至进行中目录:.specs/tasks/in-progress/add-validation.feature.md

阶段1:加载任务...
任务:"添加表单验证服务"
识别到步骤:4个步骤

验证计划(来自#### Verification部分):
- 步骤1:无验证(目录创建)
- 步骤2:由2次评审组成的小组(ValidationService)
- 步骤3:逐项评审(3个验证器)
- 步骤4:单个评审(集成)

阶段2:执行...

步骤1:启动sdd:developer代理...
  代理:"Implement Step 1: Create Directory Structure..."
  结果:✅ 目录已创建
  验证:跳过(简单操作)
  状态:✅ 完成

步骤2:启动sdd:developer代理...
  代理:"Implement Step 2: Create ValidationService..."
  结果:文件已创建,测试通过

  并行启动2个评审代理...
  评审1:4.3/5.0 - 通过
  评审2:4.5/5.0 - 通过
  小组结果:4.4/5.0 ✅
  状态:✅ 完成(评审确认)

[继续所有步骤...]

阶段3:最终验证...
启动DoD验证代理...
  代理:"Verify all Definition of Done items..."
  结果:4/4项通过 ✅

阶段4:将任务移至已完成目录...
  mv .specs/tasks/in-progress/add-validation.feature.md .specs/tasks/done/

阶段5:最终报告
执行完成。
- 4/4步骤已完成
- 6个工件已验证
- 所有步骤首次通过
- 完成定义:4/4通过
- 任务位置:.specs/tasks/done/add-validation.feature.md ✅

Example 2: Handling DoD Item Failure

示例2:处理DoD项失败

[All steps complete...]

Phase 3: Final Verification...
Launching DoD verification agent...
  Agent: "Verify all Definition of Done items..."
  Result: 3/4 items PASS, 1 FAIL ❌

Failing item:
- "Code follows ESLint rules": 356 errors found

Should I attempt to fix this issue? [Y/n]

User: Y

Launching sdd:developer agent...
  Agent: "Fix ESLint errors..."
  Result: Fixed 356 errors, 0 warnings ✅

Re-launching DoD verification agent...
  Agent: "Re-verify all Definition of Done items..."
  Result: 4/4 items PASS ✅

Phase 4: Moving task to done...
All DoD checkboxes marked complete ✅

Phase 5: Final Report
Task verification complete.
- All DoD items now PASS
- 1 issue fixed (ESLint errors)
- Task location: .specs/tasks/done/ ✅
[所有步骤完成...]

阶段3:最终验证...
启动DoD验证代理...
  代理:"Verify all Definition of Done items..."
  结果:3/4项通过,1项失败 ❌

失败项:
- "代码遵循ESLint规则": 发现356个错误

是否尝试修复此问题?[Y/n]

用户: Y

启动sdd:developer代理...
  代理:"Fix ESLint errors..."
  结果:修复356个错误,0个警告 ✅

重新启动DoD验证代理...
  代理:"Re-verify all Definition of Done items..."
  结果:4/4项通过 ✅

阶段4:将任务移至已完成目录...
所有DoD复选框标记为已完成 ✅

阶段5:最终报告
任务验证完成。
- 所有DoD项现在已通过
- 修复1个问题(ESLint错误)
- 任务位置:.specs/tasks/done/ ✅

Example 3: Handling Verification Failure

示例3:处理验证失败

Step 3 Implementation complete.
Launching judge agents...

Judge 1: 3.5/5.0 - FAIL (threshold 4.0)
Judge 2: 3.2/5.0 - FAIL

Issues found:
- Test Coverage: 2.5/5
  Evidence: "Missing edge case tests for empty input"
  Justification: "Success criteria requires edge case coverage"
- Pattern Adherence: 3.0/5
  Evidence: "Uses custom Result type instead of project standard"
  Justification: "Should use existing Result<T, E> from src/types"

Should I attempt to fix these issues? [Y/n]

User: Y

Launching sdd:developer agent with feedback...
Agent: "Fix Step 3: Address judge feedback..."
Result: Issues fixed, tests added

Re-launching judge agents...
Judge 1: 4.2/5.0 - PASS
Judge 2: 4.4/5.0 - PASS
Panel Result: 4.3/5.0 ✅
Status: ✅ COMPLETE (Judge Confirmed)
步骤3执行完成。
启动评审代理...

评审1:3.5/5.0 - 失败(阈值4.0)
评审2:3.2/5.0 - 失败

发现的问题:
- 测试覆盖率:2.5/5
  证据:"缺少空输入的边缘情况测试"
  理由:"成功标准要求覆盖边缘情况"
- 模式遵循度:3.0/5
  证据:"使用自定义Result类型而非项目标准类型"
  理由:"应使用src/types中的现有Result<T, E>"

是否尝试修复这些问题?[Y/n]

用户: Y

结合反馈启动sdd:developer代理...
代理:"Fix Step 3: Address judge feedback..."
结果:问题已修复,测试已添加

重新启动评审代理...
评审1:4.2/5.0 - 通过
评审2:4.4/5.0 - 通过
小组结果:4.3/5.0 ✅
状态:✅ 完成(评审确认)

Example 4: Continue from Interruption

示例4:从中断处继续

User: /implement add-validation.feature.md --continue

Phase 0: Parsing flags...
Configuration:
- Continue Mode: true
- Target Quality: 4.0/5.0 (default)

Scanning task file for completed steps...
Found: Step 1 [DONE], Step 2 [DONE]
Last completed: Step 2

Verifying Step 2 artifacts...
Launching judge agent for Step 2...
Judge: 4.3/5.0 - PASS ✅
Marking step as complete in task file...

Resuming from Step 3...

Step 3: Launching sdd:developer agent...
[continues normally from Step 3]
用户: /implement add-validation.feature.md --continue

阶段0:解析标志...
配置:
- 继续模式: true
- 目标质量: 4.0/5.0(默认)

扫描任务文件查找已完成步骤...
发现:步骤1 [已完成],步骤2 [已完成]
最后完成的步骤:步骤2

验证步骤2的工件...
启动步骤2的评审代理...
评审:4.3/5.0 - 通过 ✅
在任务文件中标记步骤为已完成...

从步骤3继续...

步骤3:启动sdd:developer代理...
[正常从步骤3继续]

Example 5: Refine After User Fixes

示例5:用户修复后优化


User manually fixed src/validation/validation.service.ts

用户手动修复了src/validation/validation.service.ts

(This file was created in Step 2: Create ValidationService)

(该文件在步骤2:Create ValidationService中创建)

User: /implement add-validation.feature.md --refine

Phase 0: Parsing flags...
Configuration:
  • Refine Mode: true

Detecting changed project files...
Changed files:
  • src/validation/validation.service.ts (modified)

Mapping files to implementation steps...
  • src/validation/validation.service.ts → Step 2 (Create ValidationService)

Earliest affected step: Step 2
Preserving: Step 1 (unchanged)
Re-verifying from: Step 2 onwards

Step 2: Launching judge to verify rest of logic with user's changes...
Judge: 4.3/5.0 - PASS ✅
Rest of logic is not affected, proceeding...

Step 3: Launching judge to verify...
Judge: TypeScript error detected in file
Launching implementation agent to fix the error and align logic with user's changes...
Launching judge to verify fixed logic...
Judge: 4.5/5.0 - PASS ✅

[continues verifying remaining steps...]

All steps verified with user's changes incorporated ✅
用户: /implement add-validation.feature.md --refine

阶段0:解析标志...
配置:
  • 优化模式: true

检测变更的项目文件...
变更文件:
  • src/validation/validation.service.ts(已修改)

将文件映射到执行步骤...
  • src/validation/validation.service.ts → 步骤2(Create ValidationService)

最早受影响的步骤:步骤2
保留:步骤1(未变更)
从步骤2开始重新验证

步骤2:启动评审代理验证结合用户变更后的其余逻辑...
评审:4.3/5.0 - 通过 ✅
其余逻辑未受影响,继续执行...

步骤3:启动评审代理验证...
评审:检测到文件中的TypeScript错误
启动执行代理修复错误,并调整逻辑以匹配用户变更...
启动评审代理验证修复后的逻辑...
评审:4.5/5.0 - 通过 ✅

[继续验证剩余步骤...]

所有步骤已验证,用户变更已纳入 ✅

Example 6: Human-in-the-Loop Review

示例6:人工介入评审

User: /implement add-validation.feature.md --human-in-the-loop

Configuration:
- Human Checkpoints: All steps

Step 1: Launching sdd:developer agent...
Result: Directories created ✅

---
用户: /implement add-validation.feature.md --human-in-the-loop

配置:
- 人工检查点: 所有步骤

步骤1:启动sdd:developer代理...
结果:目录已创建 ✅

---

🔍 Human Review Checkpoint - Step 1

🔍 人工评审检查点 - 步骤1

Step: Create Directory Structure
Judge Score: N/A (no verification)
Status: ✅ COMPLETE
Artifacts Created:
  • src/validation/
  • src/validation/tests/
Action Required: Review the above artifacts and provide feedback or continue.
Continue? [Y/n/feedback]: Y

Step 2: Launching sdd:developer agent...
Result: ValidationService created ✅
Launching judge agents...
Judge 1: 4.5/5.0 - PASS
Judge 2: 4.3/5.0 - PASS
Panel Result: 4.4/5.0 ✅

步骤: 创建目录结构
评审分数: N/A(无验证)
状态: ✅ 完成
创建的工件:
  • src/validation/
  • src/validation/tests/
操作要求: 评审上述工件并提供反馈或继续执行。
是否继续?[Y/n/反馈]: Y

步骤2:启动sdd:developer代理...
结果:ValidationService已创建 ✅
启动评审代理...
评审1:4.5/5.0 - 通过
评审2:4.3/5.0 - 通过
小组结果:4.4/5.0 ✅

🔍 Human Review Checkpoint - Step 2

🔍 人工评审检查点 - 步骤2

Step: Create ValidationService
Judge Score: 4.4/5.0 (threshold: 4.0)
Status: ✅ PASS
Artifacts Created:
  • src/validation/validation.service.ts
  • src/validation/tests/validation.service.spec.ts
Judge Feedback:
  • All criteria met
  • Test coverage comprehensive
Action Required: Review the above artifacts and provide feedback or continue.
Continue? [Y/n/feedback]: The error messages could be more descriptive

Incorporating feedback: "error messages could be more descriptive"
Re-launching sdd:developer agent with feedback...
[iteration continues]
步骤: 创建ValidationService
评审分数: 4.4/5.0(阈值: 4.0)
状态: ✅ 通过
创建的工件:
  • src/validation/validation.service.ts
  • src/validation/tests/validation.service.spec.ts
评审反馈:
  • 所有标准已满足
  • 测试覆盖率全面
操作要求: 评审上述工件并提供反馈或继续执行。
是否继续?[Y/n/反馈]: 错误消息可以更具描述性

纳入反馈:"错误消息可以更具描述性"
结合反馈重新启动sdd:developer代理...
[迭代继续]

Example 7: Strict Quality Threshold

示例7:严格质量阈值

User: /implement critical-api.feature.md --target-quality 4.5

Configuration:
- Target Quality: 4.5/5.0

Step 2: Implementing critical API endpoint...
Result: Endpoint created

Launching judge agents...
Judge 1: 4.2/5.0 - FAIL (threshold: 4.5)
Judge 2: 4.3/5.0 - FAIL

Iteration 1: Re-implementing with feedback...
[fixes applied]

Launching judge agents...
Judge 1: 4.4/5.0 - FAIL
Judge 2: 4.5/5.0 - PASS

Iteration 2: Re-implementing with feedback...
[more fixes applied]

Launching judge agents...
Judge 1: 4.6/5.0 - PASS
Judge 2: 4.5/5.0 - PASS
Panel Result: 4.55/5.0 ✅

Status: ✅ COMPLETE (passed on iteration 2)

用户: /implement critical-api.feature.md --target-quality 4.5

配置:
- 目标质量: 4.5/5.0

步骤2:实现关键API端点...
结果:端点已创建

启动评审代理...
评审1:4.2/5.0 - 失败(阈值: 4.5)
评审2:4.3/5.0 - 失败

迭代1:结合反馈重新实现...
[已应用修复]

启动评审代理...
评审1:4.4/5.0 - 失败
评审2:4.5/5.0 - 通过

迭代2:结合反馈重新实现...
[已应用更多修复]

启动评审代理...
评审1:4.6/5.0 - 通过
评审2:4.5/5.0 - 通过
小组结果:4.55/5.0 ✅

状态:✅ 完成(第2次迭代通过)
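
The fix→verify cycle in this example can be sketched as a small loop. This is an illustration under assumptions, not the command's implementation: `run_step`, `implement`, and `judge` are hypothetical names standing in for sub-agent launches, and the pass rule used here is "panel average meets the threshold".

```python
from typing import Callable, List, Optional, Tuple


def run_step(
    implement: Callable[[Optional[str]], None],
    judge: Callable[[], List[float]],
    threshold: float,
    max_iterations: Optional[int] = 3,
) -> Tuple[bool, int, float]:
    """Implement once, then run fix→verify cycles until the panel passes.

    Returns (passed, fix iterations used, final panel score).
    max_iterations=None models `--max-iterations unlimited`.
    """
    implement(None)  # initial implementation, no judge feedback yet
    iterations = 0
    while True:
        scores = judge()
        panel = sum(scores) / len(scores)
        if panel >= threshold:
            return True, iterations, panel
        if max_iterations is not None and iterations >= max_iterations:
            return False, iterations, panel
        iterations += 1
        # Hypothetical feedback string; a real run would pass the judges' findings.
        implement(f"Panel score {panel:.2f} is below threshold {threshold}")
```

Fed the judge scores from the transcript above (4.2/4.3, then 4.4/4.5, then 4.6/4.5) with a threshold of 4.5, the loop passes after two fix iterations with a panel score of 4.55.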

Error Handling

错误处理

Implementation Failure

执行失败

If sdd:developer agent reports failure:
  1. Present the failure details to user
  2. Ask clarifying questions that could help resolve the failure
  3. Launch sdd:developer agent again with clarifications
若sdd:developer代理报告失败:
  1. 向用户呈现失败详情
  2. 询问有助于解决问题的澄清问题
  3. 结合澄清信息重新启动sdd:developer代理

Judge Disagreement

评审意见分歧

If judges disagree significantly (difference > 2.0):
  1. Present both perspectives with evidence
  2. Ask user to resolve: "Judges disagree. Your decision?"
  3. Proceed based on user decision
若评审意见显著分歧(差值>2.0):
  1. 呈现双方观点及证据
  2. 请用户决策:"Judges disagree. Your decision?"
  3. 根据用户决策继续执行
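
The disagreement rule can be expressed as a one-line check on the spread of scores. This is a sketch: `judges_disagree` is a hypothetical helper, and only the "difference > 2.0" limit comes from the rule above.

```python
from typing import List

DISAGREEMENT_LIMIT = 2.0  # from the rule above: difference > 2.0


def judges_disagree(scores: List[float], limit: float = DISAGREEMENT_LIMIT) -> bool:
    """True when the gap between the highest and lowest judge score exceeds the limit."""
    if len(scores) < 2:
        return False  # a single judge cannot disagree with itself
    return max(scores) - min(scores) > limit
```

Scores of 4.5 and 2.0 would trigger the user prompt, while a gap of exactly 2.0 would not, because the comparison is strict.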

Refine Mode: No Changes Detected

优化模式:未检测到变更

If --refine mode finds no git changes in the project:
  1. Report: "No project file changes detected since last commit."
  2. Suggest: "Make edits to project files first, then run --refine again."
  3. Alternatively: "Run without --refine to re-implement all steps."
若 --refine 模式未发现项目中的git变更:
  1. 报告:"No project file changes detected since last commit."
  2. 建议:"Make edits to project files first, then run --refine again."
  3. 备选方案:"Run without --refine to re-implement all steps."

Refine Mode: Changes Don't Map to Steps

优化模式:变更未映射到步骤

If --refine mode finds changed files but none map to implementation steps:
  1. Report: "Changed files don't match any implementation step's expected outputs."
  2. List the changed files detected
  3. Suggest: "Verify manually or run without --refine to re-verify all steps."

若 --refine 模式发现变更文件但未映射到任何执行步骤:
  1. 报告:"Changed files don't match any implementation step's expected outputs."
  2. 列出检测到的变更文件
  3. 建议:"Verify manually or run without --refine to re-verify all steps."
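
Both refine-mode edge cases fall out of the same file-to-step mapping. In this sketch the changed-file list is assumed to come from something like `git diff --name-only HEAD`, and the `step_outputs` table (step number to expected output paths) is a hypothetical structure, not a format defined by the task file.

```python
from typing import Dict, List, Optional, Tuple


def earliest_affected_step(
    changed_files: List[str],
    step_outputs: Dict[int, List[str]],
) -> Tuple[Optional[int], str]:
    """Map changed files to steps and return (earliest affected step, message)."""
    if not changed_files:
        return None, "No project file changes detected since last commit."
    affected = sorted(
        step
        for step, outputs in step_outputs.items()
        if any(path in outputs for path in changed_files)
    )
    if not affected:
        return None, "Changed files don't match any implementation step's expected outputs."
    return affected[0], f"Re-verifying from: Step {affected[0]} onwards"
```

With `src/validation/validation.service.ts` listed under Step 2, a change to that file gives Step 2 as the restart point and leaves Step 1 untouched.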

Checklist

检查清单

Before completing implementation:
完成执行前:

Configuration Handling

配置处理

  • Parsed all flags from $ARGUMENTS correctly
  • Used THRESHOLD_FOR_STANDARD_COMPONENTS for standard steps
  • Used THRESHOLD_FOR_CRITICAL_COMPONENTS for critical steps
  • Iterated until quality threshold met (or MAX_ITERATIONS reached, default 3)
  • Triggered human-in-the-loop checkpoints ONLY for steps in HUMAN_IN_THE_LOOP_STEPS
  • If SKIP_JUDGES is true: Skipped ALL judge validation
  • If CONTINUE_MODE is true: Verified last step and resumed correctly
  • If REFINE_MODE is true: Detected changed project files, mapped to steps, re-verified from earliest affected step
  • 正确解析 $ARGUMENTS 中的所有标志
  • 标准步骤使用 THRESHOLD_FOR_STANDARD_COMPONENTS
  • 关键步骤使用 THRESHOLD_FOR_CRITICAL_COMPONENTS
  • 迭代直至达到质量阈值(或达到 MAX_ITERATIONS,默认3次)
  • 仅对 HUMAN_IN_THE_LOOP_STEPS 中的步骤触发人工介入检查点
  • 若 SKIP_JUDGES 为true:跳过所有评审验证
  • 若 CONTINUE_MODE 为true:正确验证最后一步并恢复执行
  • 若 REFINE_MODE 为true:检测变更的项目文件,映射到步骤,从最早受影响的步骤开始重新验证

Context Protection (CRITICAL)

上下文保护(关键)

  • Read ONLY the task file ($TASK_PATH in .specs/tasks/in-progress/) - no other files
  • Did NOT read implementation outputs, reference files, or artifacts
  • Used sub-agent reports for status - did NOT read files to "check"
  • 仅读取任务文件(.specs/tasks/in-progress/ 中的 $TASK_PATH)- 不读取其他文件
  • 未读取执行输出、参考文件或工件
  • 使用子代理报告获取状态 - 未读取文件进行"检查"

Delegation

委托

  • ALL implementations done by sdd:developer agents via Task tool
  • ALL evaluations done by sdd:developer agents via Task tool
  • Did NOT perform any verification yourself
  • Did NOT skip any verification steps (unless SKIP_JUDGES is true)
  • 所有执行任务由 sdd:developer 代理通过任务工具完成
  • 所有评审由 sdd:developer 代理通过任务工具完成
  • 未自行执行任何验证
  • 未跳过任何验证步骤(除非 SKIP_JUDGES 为true)

Stage Tracking

阶段跟踪

  • Each step marked complete ONLY after judge PASS (or immediately if SKIP_JUDGES is true)
  • Task file updated after each step completion:
    • Step title marked with [DONE]
    • Subtasks marked with [X]
  • Todo list updated after each step completion
  • 仅在评审通过后标记步骤完成(或 SKIP_JUDGES 为true时立即标记)
  • 每个步骤完成后更新任务文件:
    • 步骤标题标记为 [DONE]
    • 子任务标记为 [X]
  • 每个步骤完成后更新待办列表

Execution Quality

执行质量

  • All steps executed in dependency order
  • Parallel steps launched simultaneously (not sequentially)
  • Each sdd:developer agent received a focused prompt for its exact step
  • All critical artifacts evaluated by judges (unless SKIP_JUDGES is true)
  • Panel voting used for high-stakes artifacts
  • Chain-of-thought requirement included in all evaluation prompts
  • Failed evaluations iterated until quality threshold met
  • Final report generated with judge confirmation status
  • User informed of any evaluator disagreements
  • 所有步骤按依赖顺序执行
  • 并行步骤同时启动(非顺序)
  • 每个sdd:developer代理收到聚焦于具体步骤的提示
  • 所有关键工件均由评审评估(除非 SKIP_JUDGES 为true)
  • 高风险工件使用小组投票
  • 所有评审提示包含链式思考要求
  • 失败的评审迭代直至达到质量阈值
  • 生成带评审确认状态的最终报告
  • 告知用户任何评审意见分歧

Human-in-the-Loop (if enabled)

人工介入(若启用)

  • Displayed checkpoint after each step in HUMAN_IN_THE_LOOP_STEPS
  • Incorporated user feedback into subsequent iterations/steps
  • Paused workflow when user requested
  • 在 HUMAN_IN_THE_LOOP_STEPS 中的每个步骤后显示检查点
  • 将用户反馈纳入后续迭代/步骤
  • 用户要求时暂停工作流

Final Verification and Completion

最终验证与完成

  • Definition of Done verification agent launched
  • All DoD items verified (PASS/FAIL/BLOCKED status)
  • Failing DoD items fixed via sdd:developer agents
  • Re-verification performed after fixes
  • Task moved from in-progress/ to done/ folder
  • All DoD checkboxes marked [X] in task file
  • Final verification report presented to user

  • 启动完成定义验证代理
  • 所有DoD项已验证(通过/失败/阻塞状态)
  • 通过sdd:developer代理修复失败的DoD项
  • 修复后重新验证
  • 将任务从 in-progress/ 移至 done/ 目录
  • 任务文件中所有DoD复选框标记为 [X]
  • 向用户呈现最终验证报告

Appendix A: Verification Specifications Reference

附录A:验证规范参考

This appendix documents how verification is specified in task files. During Phase 2 (Execute Steps), you will reference these specifications to understand how to verify each artifact.
本附录记录任务文件中如何指定验证。在阶段2(执行步骤)中,你将参考这些规范了解如何验证每个工件。

How Task Files Define Verification

任务文件如何定义验证

Task files define verification requirements in #### Verification sections within each implementation step. These sections specify:
任务文件在每个执行步骤的 #### Verification 部分定义验证要求。这些部分指定:

Required Elements

必填元素

  1. Level: Verification complexity
    • None - Simple operations (mkdir, delete) - skip verification
    • Single Judge - Non-critical artifacts - 1 judge, threshold 4.0/5.0
    • Panel of 2 Judges - Critical artifacts - 2 judges, median voting, threshold 4.0/5.0 or 4.5/5.0
    • Per-Item Judges - Multiple similar items - 1 judge per item, parallel execution
  2. Artifact(s): Path(s) to file(s) being verified
    • Example: src/decision/decision.service.ts, src/decision/tests/decision.service.spec.ts
  3. Threshold: Minimum passing score
    • Typically 4.0/5.0 for standard quality
    • Sometimes 4.5/5.0 for critical components
  4. Rubric: Weighted criteria table (see format below)
  5. Reference Pattern (Optional): Path to example of good implementation
    • Example: src/app.service.ts for NestJS service patterns
  1. 级别: 验证复杂度
    • None - 简单操作(mkdir、delete)- 跳过验证
    • Single Judge - 非关键工件 - 1个评审,阈值4.0/5.0
    • Panel of 2 Judges - 关键工件 - 2个评审,中位数投票,阈值4.0/5.0或4.5/5.0
    • Per-Item Judges - 多个相似项 - 每个项1个评审,并行执行
  2. 工件: 待验证文件的路径
    • 示例: src/decision/decision.service.ts、src/decision/tests/decision.service.spec.ts
  3. 阈值: 最低通过分数
    • 标准质量通常为4.0/5.0
    • 关键组件有时为4.5/5.0
  4. 评分规则: 加权标准表格(见以下格式)
  5. 参考模式(可选): 良好实现示例的路径
    • 示例: src/app.service.ts(NestJS服务模式)
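
The Level element determines how many judge agents to launch. The mapping can be sketched as below; `judges_to_launch` is a hypothetical helper, and matching on the level name's prefix is an assumption made so that variants such as "Panel of 2 Judges with Aggregated Voting" are still recognized.

```python
from typing import List


def judges_to_launch(level: str, artifacts: List[str]) -> int:
    """Return the number of judge agents implied by a Verification level."""
    normalized = level.strip().lower()
    if normalized == "none":
        return 0  # simple operations skip verification
    if normalized.startswith("single judge"):
        return 1
    if normalized.startswith("panel of 2 judges"):
        return 2
    if normalized.startswith("per-item judges"):
        return len(artifacts)  # one judge per item, launched in parallel
    raise ValueError(f"Unknown verification level: {level!r}")
```

A step with three validators and level "Per-Item Judges" would therefore launch three judges, while "None" launches none.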

Rubric Format

评分规则格式

Rubrics in task files use this markdown table format:
markdown
| Criterion | Weight | Description |
|-----------|--------|-------------|
| [Name 1]  | 0.XX   | [What to evaluate] |
| [Name 2]  | 0.XX   | [What to evaluate] |
| ...       | ...    | ...         |
Requirements:
  • Weights MUST sum to 1.0
  • Each criterion has a clear, measurable description
  • Typically 3-6 criteria per rubric
Example:
markdown
| Criterion | Weight | Description |
|-----------|--------|-------------|
| Type Correctness | 0.35 | Types match specification exactly |
| API Contract Alignment | 0.25 | Aligns with documented API contract |
| Export Structure | 0.20 | Barrel exports correctly expose all types |
| Code Quality | 0.20 | Follows project TypeScript conventions |
任务文件中的评分规则使用以下markdown表格格式:
markdown
| 标准 | 权重 | 描述 |
|-----------|--------|-------------|
| [名称1]  | 0.XX   | [评估内容] |
| [名称2]  | 0.XX   | [评估内容] |
| ...       | ...    | ...         |
要求:
  • 权重总和必须为1.0
  • 每个标准有清晰、可衡量的描述
  • 每个评分规则通常包含3-6个标准
示例:
markdown
| 标准 | 权重 | 描述 |
|-----------|--------|-------------|
| 类型正确性 | 0.35 | 类型与规范完全匹配 |
| API契约一致性 | 0.25 | 与文档化的API契约一致 |
| 导出结构 | 0.20 | 桶导出正确暴露所有类型 |
| 代码质量 | 0.20 | 遵循项目TypeScript约定 |
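
The rubric requirements lend themselves to a mechanical check. The sketch below parses a table in the format shown, verifies that the weights sum to 1.0, and computes a weighted score from per-criterion scores on the 5-point scale. `parse_rubric` and `weighted_score` are hypothetical helpers, and the weighted-sum aggregation is an assumption about how a judge combines criterion scores.

```python
from typing import Dict, List, Tuple


def parse_rubric(table: str) -> List[Tuple[str, float]]:
    """Parse a 'Criterion | Weight | Description' table into (name, weight) pairs."""
    criteria = []
    for line in table.strip().splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Skip malformed rows, the header row, and the |---|---| separator row.
        if len(cells) < 3 or cells[0] == "Criterion" or set(cells[0]) <= {"-", ":"}:
            continue
        criteria.append((cells[0], float(cells[1])))
    total = sum(weight for _, weight in criteria)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"Rubric weights sum to {total}, expected 1.0")
    return criteria


def weighted_score(criteria: List[Tuple[str, float]], scores: Dict[str, float]) -> float:
    """Combine per-criterion scores (1-5) into one weighted score."""
    return sum(weight * scores[name] for name, weight in criteria)
```

For the example rubric above, criterion scores of 5, 4, 4, and 3 give 0.35×5 + 0.25×4 + 0.20×4 + 0.20×3 = 4.15, which would pass a 4.0 threshold but fail a 4.5 one.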

Scoring Scale

评分量表

When judges evaluate artifacts, they use this 5-point scale for each criterion:
  • 1 (Poor): Does not meet requirements
    • Missing essential elements
    • Fundamental misunderstanding of requirements
  • 2 (Below Average): Multiple issues, partially meets requirements
    • Some correct elements, but significant gaps
    • Would require substantial rework
  • 3 (Adequate): Meets basic requirements
    • Functional but minimal
    • Room for improvement in quality or completeness
  • 4 (Good): Meets all requirements, few minor issues
    • Solid implementation
    • Minor polish could improve it
  • 5 (Excellent): Exceeds requirements
    • Exceptional quality
    • Goes beyond what was asked
    • Could serve as reference implementation
评审评估工件时,对每个标准使用以下5分制量表:
  • 1(差): 未满足要求
    • 缺少基本元素
    • 对要求存在根本性误解
  • 2(低于平均): 存在多个问题,部分满足要求
    • 有一些正确元素,但存在重大差距
    • 需要大量返工
  • 3(合格): 满足基本要求
    • 可用但仅达最低标准
    • 在质量或完整性方面有改进空间
  • 4(良好): 满足所有要求,仅有少量次要问题
    • 可靠的实现
    • 少量优化即可改进
  • 5(优秀): 超出要求
    • 质量卓越
    • 超出要求范围
    • 可作为参考实现

Using Verification Specs During Execution

执行期间使用验证规范

During Phase 2 (Execute Steps):
  1. After an sdd:developer agent completes implementation
  2. Read the step's #### Verification section in the task file
  3. Extract: Level, Artifact paths, Threshold, Rubric, Reference Pattern
  4. Launch appropriate judge agent(s) based on Level
  5. Provide judges with: Artifact path, Rubric, Threshold, Reference Pattern
  6. Aggregate judge results and determine PASS/FAIL
  7. If FAIL, launch sdd:developer agent to fix issues and re-verify
Example Verification Section in Task File:
markdown
阶段2(执行步骤)期间:
  1. sdd:developer代理完成执行后
  2. 读取任务文件中步骤的 #### Verification 部分
  3. 提取:级别、工件路径、阈值、评分规则、参考模式
  4. 根据级别启动相应的评审代理
  5. 向评审提供:工件路径、评分规则、阈值、参考模式
  6. 聚合评审结果并确定通过/失败
  7. 若失败,启动sdd:developer代理修复问题并重新验证
任务文件中的示例验证部分:
markdown

Verification

Verification

Level: Panel of 2 Judges with Aggregated Voting
Artifact: src/decision/decision.service.ts, src/decision/tests/decision.service.spec.ts
Rubric:
| Criterion | Weight | Description |
|-----------|--------|-------------|
| Routing Logic | 0.20 | Correctly routes by customerType |
| Drip Feed Implementation | 0.25 | 2% random approval for rejected New customers only |
| Response Formatting | 0.20 | Correct decision outcome, triggeredRules preserved, ISO 8601 timestamp |
| Testability | 0.15 | Injectable randomGenerator enables deterministic testing |
| Test Coverage | 0.20 | Unit tests cover approval, rejection, drip feed, routing, timestamp |
Reference Pattern: NestJS service patterns, ZenEngineService API

This specification tells you to:

- Launch 2 judge agents in parallel
- Have them evaluate both service and test files
- Use the 5-criterion rubric with specified weights
- Do not pass the threshold to the judges; use it only to compare against the judges' average score
- Reference existing NestJS patterns for comparison
Level: Panel of 2 Judges with Aggregated Voting
Artifact: src/decision/decision.service.ts, src/decision/tests/decision.service.spec.ts
Rubric:
| Criterion | Weight | Description |
|-----------|--------|-------------|
| 路由逻辑 | 0.20 | 按customerType正确路由 |
| Drip Feed实现 | 0.25 | 仅对被拒绝的新客户随机批准2% |
| 响应格式 | 0.20 | 正确的决策结果,保留triggeredRules,ISO 8601时间戳 |
| 可测试性 | 0.15 | 可注入的randomGenerator支持确定性测试 |
| 测试覆盖率 | 0.20 | 单元测试覆盖批准、拒绝、drip feed、路由、时间戳 |
Reference Pattern: NestJS服务模式,ZenEngineService API

此规范要求你:

- 并行启动2个评审代理
- 让他们评估服务和测试文件
- 使用包含5个标准的评分规则及指定权重
- 不要将阈值传递给评审;仅用它与评审的平均分数进行比较
- 参考现有NestJS模式进行比较