plan-task

Refine Task Workflow

Role

You are a task refinement orchestrator. Take a draft task file created by `/add-task` and refine it through a coordinated multi-agent workflow with quality gates after each phase.

Goal

This workflow command refines an existing draft task through:
  1. Parallel Analysis - Research, codebase analysis, and business analysis in parallel
  2. Architecture Synthesis - Combine findings into architectural overview
  3. Decomposition - Break into implementation steps with risks
  4. Parallelize - Reorganize steps for maximum parallel execution
  5. Verify - Add LLM-as-Judge verification sections
  6. Promote - Move refined task from `draft/` to `todo/`
All phases include judge validation to prevent error propagation and ensure quality thresholds are met.

User Input

text
$ARGUMENTS

Command Arguments

Parse the following arguments from `$ARGUMENTS`:

Argument Definitions

| Argument | Format | Default | Description |
|----------|--------|---------|-------------|
| `task-file` | Path to task file | Required | Path to draft task file (e.g., `.specs/tasks/draft/add-validation.feature.md`) |
| `--continue` | `--continue [stage]` | None | Continue refining from a specific stage. Stage is optional - resolve from context if not provided. |
| `--target-quality` | `--target-quality X.X` | `3.5` | Target threshold value (out of 5.0) for judge pass/fail decisions. |
| `--max-iterations` | `--max-iterations N` | `3` | Maximum implementation + judge retry cycles per phase before moving to next stage (regardless of pass/fail). |
| `--included-stages` | `--included-stages stage1,stage2,...` | All stages | Comma-separated list of stages to include. |
| `--skip` | `--skip stage1,stage2,...` | None | Comma-separated list of stages to exclude. |
| `--fast` | `--fast` | N/A | Alias for `--target-quality 3.0 --max-iterations 1 --included-stages business analysis,decomposition,verifications` |
| `--one-shot` | `--one-shot` | N/A | Alias for `--included-stages business analysis,decomposition --skip-judges` - minimal refinement without quality gates. |
| `--human-in-the-loop` | `--human-in-the-loop phase1,phase2,...` | None | Phases after which to pause for human verification. |
| `--skip-judges` | `--skip-judges` | `false` | Skip all judge validation checks - phases proceed without quality gates. |
| `--refine` | `--refine` | `false` | Incremental refinement mode - detect changes against git and re-run only affected stages (top-to-bottom propagation). |

Stage Names (for `--included-stages` / `--skip`)

| Stage Name | Phase | Description |
|------------|-------|-------------|
| `research` | 2a | Gather relevant resources, documentation, libraries |
| `codebase analysis` | 2b | Identify affected files, interfaces, integration points |
| `business analysis` | 2c | Refine description and create acceptance criteria |
| `architecture synthesis` | 3 | Synthesize research and analysis into architecture |
| `decomposition` | 4 | Break into implementation steps with risks |
| `parallelize` | 5 | Reorganize steps for parallel execution |
| `verifications` | 6 | Add LLM-as-Judge verification rubrics |

Configuration Resolution

Parse `$ARGUMENTS` and resolve configuration as follows:
text
# Extract task file path (first positional argument, required)
TASK_FILE = first argument that is a file path (must exist in .specs/tasks/draft/)

# Parse alias flags first (they set multiple defaults)
if --fast present:
    THRESHOLD = 3.0
    MAX_ITERATIONS = 1
    INCLUDED_STAGES = ["business analysis", "decomposition", "verifications"]
if --one-shot present:
    INCLUDED_STAGES = ["business analysis", "decomposition"]
    SKIP_JUDGES = true

# Initialize defaults (?= assigns only if an alias has not already set the value)
THRESHOLD ?= --target-quality || 3.5
MAX_ITERATIONS ?= --max-iterations || 3
INCLUDED_STAGES ?= --included-stages || ["research", "codebase analysis", "business analysis", "architecture synthesis", "decomposition", "parallelize", "verifications"]
SKIP_STAGES = --skip || []
HUMAN_IN_THE_LOOP_PHASES = --human-in-the-loop || []
SKIP_JUDGES ?= --skip-judges || false
REFINE_MODE = --refine || false
CONTINUE_STAGE = null
if --continue [stage] present:
    CONTINUE_STAGE = stage or resolve from context

# Compute final active stages
ACTIVE_STAGES = INCLUDED_STAGES - SKIP_STAGES
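
The resolution order above can be sketched in Python. This is a minimal illustration, not the command's actual parser: `resolve_config`, the dictionary keys, and the rule that a flag's value runs until the next `--` token (so unquoted stage names containing spaces stay intact) are all assumptions, as is the requirement that the task file appears before any flag. The sketch also keeps `SKIP_JUDGES` when `--one-shot` has already set it.

```python
import shlex

ALL_STAGES = [
    "research", "codebase analysis", "business analysis",
    "architecture synthesis", "decomposition", "parallelize", "verifications",
]


def resolve_config(arguments: str) -> dict:
    """Resolve workflow settings from the raw $ARGUMENTS string (hypothetical helper)."""
    tokens = shlex.split(arguments)
    task_file = None
    flags: dict[str, str] = {}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("--"):
            # Assumption: a flag's value is everything up to the next flag.
            j = i + 1
            while j < len(tokens) and not tokens[j].startswith("--"):
                j += 1
            flags[tok[2:]] = " ".join(tokens[i + 1:j])
            i = j
        else:
            if task_file is None:
                task_file = tok  # first positional argument
            i += 1
    if task_file is None:
        raise ValueError("task-file is required")

    def csv(name: str) -> list:
        raw = flags.get(name, "")
        return [part.strip() for part in raw.split(",") if part.strip()]

    cfg: dict = {"task_file": task_file}
    # Alias flags first: they set several values at once.
    if "fast" in flags:
        cfg.update(threshold=3.0, max_iterations=1,
                   included_stages=["business analysis", "decomposition", "verifications"])
    if "one-shot" in flags:
        cfg.update(included_stages=["business analysis", "decomposition"],
                   skip_judges=True)
    # Defaults: setdefault mirrors `?=` (assign only when not already set).
    cfg.setdefault("threshold", float(flags.get("target-quality") or 3.5))
    cfg.setdefault("max_iterations", int(flags.get("max-iterations") or 3))
    cfg.setdefault("included_stages", csv("included-stages") or list(ALL_STAGES))
    cfg.setdefault("skip_judges", "skip-judges" in flags)
    cfg["skip_stages"] = csv("skip")
    cfg["human_in_the_loop_phases"] = csv("human-in-the-loop")
    cfg["refine_mode"] = "refine" in flags
    # None: no --continue; "": --continue without a stage (resolve from context).
    cfg["continue_stage"] = flags.get("continue")
    cfg["active_stages"] = [s for s in cfg["included_stages"]
                            if s not in cfg["skip_stages"]]
    return cfg
```

For example, `resolve_config(".specs/tasks/draft/quick-fix.bug.md --fast")` yields a threshold of 3.0, one iteration, and three active stages.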

Context Resolution for `--continue`

When `--continue` is used without explicit stage:
  1. Stage Resolution:
    • Parse the task file for completion markers (e.g., `[x]` checkboxes)
    • Identify the last completed phase/judge
    • Resume from the next incomplete phase
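
As a rough sketch of the stage resolution above, the function below scans checkbox lines and returns the stage that follows the last completed one. The checkbox layout (a list item whose label contains the stage name) and the name `next_incomplete_stage` are assumptions; the real task file may mark completion differently.

```python
import re

STAGE_ORDER = [
    "research", "codebase analysis", "business analysis",
    "architecture synthesis", "decomposition", "parallelize", "verifications",
]

# Matches markdown task-list items such as "- [x] Phase 2a: research".
CHECKBOX = re.compile(r"^\s*[-*]\s*\[(?P<mark>[ xX])\]\s*(?P<label>.+)$")


def next_incomplete_stage(task_text, stage_order=STAGE_ORDER):
    """Return the stage after the last completed one, or None if all are done."""
    done = set()
    for line in task_text.splitlines():
        match = CHECKBOX.match(line)
        if match and match.group("mark").lower() == "x":
            label = match.group("label").lower()
            for stage in stage_order:
                if stage in label:
                    done.add(stage)
    last_done = max((stage_order.index(s) for s in done), default=-1)
    if last_done + 1 >= len(stage_order):
        return None
    return stage_order[last_done + 1]
```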

Refine Mode Behavior (`--refine`)

When `--refine` is used:
  1. Change Detection:
    • First check file status: `git status --porcelain -- <TASK_FILE>`
    • Compare current task file against last git commit: `git diff HEAD -- <TASK_FILE>`
      • This captures both staged and unstaged changes vs HEAD
    • If file is untracked or has no git history, compare against the original task structure
    • Identify which sections have been modified by the user
    • Look for `//` comment markers indicating user feedback/corrections
  2. Top-to-Bottom Propagation:
    • Determine the earliest modified section (highest in document)
    • Re-run only stages that correspond to or come after the modified section
    • Earlier stages (above the modification) are preserved as-is
  3. Section-to-Stage Mapping:
    | Modified Section | Re-run From Stage |
    |------------------|-------------------|
    | Description / Acceptance Criteria | `business analysis` (Phase 2c) |
    | Architecture Overview | `architecture synthesis` (Phase 3) |
    | Implementation Process / Steps | `decomposition` (Phase 4) |
    | Parallelization / Dependencies | `parallelize` (Phase 5) |
    | Verification sections | `verifications` (Phase 6) |
  4. Refine Execution:
    • Skip research (2a) and codebase analysis (2b) unless explicitly requested
    • Pass user modifications and `//` comments as additional context to agents
    • Agents should incorporate user feedback while preserving unchanged content
  5. Example:
    bash
    # User edited the Architecture Overview section
    /plan .specs/tasks/todo/my-task.feature.md --refine
    
    # Detects Architecture section changed → re-runs from Phase 3 onwards
    # Skips: research, codebase analysis, business analysis
    # Runs: architecture synthesis, decomposition, parallelize, verifications
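
The top-to-bottom propagation and the section-to-stage mapping can be illustrated with a small lookup. This is a sketch under assumptions: `stages_to_rerun` is a hypothetical helper, and the mapping keys are the table's row labels rather than the exact headings found in a task file.

```python
STAGE_ORDER = [
    "research", "codebase analysis", "business analysis",
    "architecture synthesis", "decomposition", "parallelize", "verifications",
]

# Row labels from the Section-to-Stage Mapping table.
SECTION_TO_STAGE = {
    "Description / Acceptance Criteria": "business analysis",
    "Architecture Overview": "architecture synthesis",
    "Implementation Process / Steps": "decomposition",
    "Parallelization / Dependencies": "parallelize",
    "Verification sections": "verifications",
}


def stages_to_rerun(modified_sections, include_analysis=False):
    """Return the stages to re-run, starting at the earliest modified section."""
    starts = [STAGE_ORDER.index(SECTION_TO_STAGE[section])
              for section in modified_sections if section in SECTION_TO_STAGE]
    if not starts:
        return []  # nothing recognised as modified
    stages = STAGE_ORDER[min(starts):]
    if include_analysis:
        # Research (2a) and codebase analysis (2b) run only on explicit request.
        stages = STAGE_ORDER[:2] + stages
    return stages
```

Calling `stages_to_rerun(["Architecture Overview"])` reproduces the example above: architecture synthesis, decomposition, parallelize, verifications.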

Human-in-the-Loop Behavior

Human verification checkpoints occur:
  1. Trigger Conditions:
    • After implementation + judge verification PASS for a phase in `HUMAN_IN_THE_LOOP_PHASES`
    • After implementation + judge + implementation retry (before the next judge retry)
  2. At Checkpoint:
    • Display current phase results summary
    • Display generated artifacts with paths
    • Display judge score and feedback
    • Ask user: "Review phase output. Continue? [Y/n/feedback]"
    • If user provides feedback, incorporate into next iteration
    • If user says "n", pause workflow
  3. Checkpoint Message Format:
    markdown
    ---
    ## 🔍 Human Review Checkpoint - Phase X
    
    **Phase:** {phase name}
    **Judge Score:** {score}/{THRESHOLD} threshold
    **Status:** ✅ PASS / ⚠️ RETRY {n}/{MAX_ITERATIONS}
    
    **Artifacts:**
    - {artifact_path_1}
    - {artifact_path_2}
    
    **Judge Feedback:**
    {feedback summary}
    
    **Action Required:** Review the above artifacts and provide feedback or continue.
    
    > Continue? [Y/n/feedback]:
    ---

Usage Examples

bash
# Refine a draft task with all stages
/plan .specs/tasks/draft/add-validation.feature.md

# Fast refinement with minimal stages
/plan .specs/tasks/draft/quick-fix.bug.md --fast

# Continue from a specific stage
/plan .specs/tasks/draft/complex-feature.feature.md --continue decomposition

# High-quality refinement with checkpoints
/plan .specs/tasks/draft/critical-api.feature.md --target-quality 4.5 --human-in-the-loop 2,3,4,5,6

# Incremental refinement after user edits (re-runs only affected stages)
/plan .specs/tasks/todo/my-task.feature.md --refine

Pre-Flight Checks

Before starting workflow:
  1. Validate task file exists:
    • If `REFINE_MODE` is false: Check that `TASK_FILE` exists in `.specs/tasks/draft/`
    • If `REFINE_MODE` is true: Check that `TASK_FILE` exists in `.specs/tasks/todo/` or `.specs/tasks/draft/`
    • If not found, show error and exit
  2. Parse and display resolved configuration:
    markdown
    ### Configuration
    
    | Setting | Value |
    |---------|-------|
    | **Task File** | {TASK_FILE} |
    | **Target Quality** | {THRESHOLD}/5.0 |
    | **Max Iterations** | {MAX_ITERATIONS} |
    | **Active Stages** | {ACTIVE_STAGES as comma-separated list} |
    | **Human Checkpoints** | Phase {HUMAN_IN_THE_LOOP_PHASES as comma-separated} |
    | **Skip Judges** | {SKIP_JUDGES} |
    | **Refine Mode** | {REFINE_MODE} |
    | **Continue From** | {CONTINUE_STAGE} or "Start" |
  3. Handle `--continue` mode:
    If `CONTINUE_STAGE` is set:
    • Read the task file to get current state
    • Identify completed phases from task file content
    • Skip to `CONTINUE_STAGE` (or auto-detected next incomplete stage)
    • Pre-populate captured values from existing artifacts
    • Resume workflow from the appropriate phase
  4. Handle `--refine` mode:
    If `REFINE_MODE` is true:
    • Check file status: `git status --porcelain -- <TASK_FILE>`
      • `M` in the first status column (staged), `M` in the second column (unstaged), or `MM` (both) → proceed with diff
      • `??` (untracked) → error: "File not tracked by git, cannot detect changes"
      • Empty output → no changes detected
    • Run `git diff HEAD -- <TASK_FILE>` to get all changes (staged + unstaged) vs last commit
    • Parse diff to identify modified sections
    • Collect any `//` comment markers as user feedback
    • Determine earliest modified section using Section-to-Stage Mapping
    • Set `ACTIVE_STAGES` to include only stages from the determined starting point onwards
    • Pass detected changes and user comments as additional context to agents
    • If no changes detected, inform user: "No changes detected in task file. Edit the file first, then run --refine." and exit
  5. Extract task info from file:
    • Read task file to extract title and type from filename
    • Parse frontmatter for title and depends_on
  6. Initialize workflow progress tracking using TodoWrite:
    Only include todos for phases in `ACTIVE_STAGES`. If continuing, mark completed phases as `completed`.
    json
    {
      "todos": [
        {"content": "Ensure directories exist", "status": "pending", "activeForm": "Ensuring directories exist"},
        {"content": "Phase 2a: Research relevant resources and documentation", "status": "pending", "activeForm": "Researching resources"},
        {"content": "Judge 2a: PASS research quality (> {THRESHOLD})", "status": "pending", "activeForm": "Validating research"},
        {"content": "Phase 2b: Analyze codebase impact and affected files", "status": "pending", "activeForm": "Analyzing codebase impact"},
        {"content": "Judge 2b: PASS codebase analysis (> {THRESHOLD})", "status": "pending", "activeForm": "Validating codebase analysis"},
        {"content": "Phase 2c: Business analysis and acceptance criteria", "status": "pending", "activeForm": "Analyzing business requirements"},
        {"content": "Judge 2c: PASS business analysis (> {THRESHOLD})", "status": "pending", "activeForm": "Validating business analysis"},
        {"content": "Phase 3: Architecture synthesis from research and analysis", "status": "pending", "activeForm": "Synthesizing architecture"},
        {"content": "Judge 3: PASS architecture synthesis (> {THRESHOLD})", "status": "pending", "activeForm": "Validating architecture"},
        {"content": "Phase 4: Decompose into implementation steps", "status": "pending", "activeForm": "Decomposing into steps"},
        {"content": "Judge 4: PASS decomposition (> {THRESHOLD})", "status": "pending", "activeForm": "Validating decomposition"},
        {"content": "Phase 5: Parallelize implementation steps", "status": "pending", "activeForm": "Parallelizing steps"},
        {"content": "Judge 5: PASS parallelization (> {THRESHOLD})", "status": "pending", "activeForm": "Validating parallelization"},
        {"content": "Phase 6: Define verification rubrics", "status": "pending", "activeForm": "Defining verifications"},
        {"content": "Judge 6: PASS verifications (> {THRESHOLD})", "status": "pending", "activeForm": "Validating verifications"},
        {"content": "Move task to todo folder", "status": "pending", "activeForm": "Promoting task"},
        {"content": "Human checkpoint reviews", "status": "pending", "activeForm": "Awaiting human review"}
      ]
    }
    Note: Filter todos based on configuration:
    • If `SKIP_JUDGES` is true, omit ALL Judge todos (Judge 2a, 2b, 2c, 3, 4, 5, 6)
    • If `research` not in `ACTIVE_STAGES`, omit Phase 2a and Judge 2a todos
    • If `codebase analysis` not in `ACTIVE_STAGES`, omit Phase 2b and Judge 2b todos
    • If `business analysis` not in `ACTIVE_STAGES`, omit Phase 2c and Judge 2c todos
    • If `architecture synthesis` not in `ACTIVE_STAGES`, omit Phase 3 and Judge 3 todos
    • If `decomposition` not in `ACTIVE_STAGES`, omit Phase 4 and Judge 4 todos
    • If `parallelize` not in `ACTIVE_STAGES`, omit Phase 5 and Judge 5 todos
    • If `verifications` not in `ACTIVE_STAGES`, omit Phase 6 and Judge 6 todos
    • If `HUMAN_IN_THE_LOOP_PHASES` is empty, omit human checkpoint todo
  7. Ensure directories exist:
    Run the folder creation script to create task directories and configure gitignore:
    bash
    bash ${CLAUDE_PLUGIN_ROOT}/scripts/create-folders.sh
    This creates:
    • `.specs/tasks/draft/` - New tasks awaiting analysis
    • `.specs/tasks/todo/` - Tasks ready to implement
    • `.specs/tasks/in-progress/` - Currently being worked on
    • `.specs/tasks/done/` - Completed tasks
    • `.specs/scratchpad/` - Temporary working files (gitignored)
    • `.specs/analysis/` - Codebase impact analysis files
    • `.claude/skills/` - Reusable skill documents
Update each todo to `in_progress` when starting a phase and `completed` when judge passes.
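
The todo filtering rules in step 6 amount to a single pass over the phase list. The sketch below is illustrative only: `build_todos` and the `PHASES` table are hypothetical names, and the `activeForm` strings are simplified compared with the JSON template above.

```python
PHASES = [
    # (stage name, phase id, phase todo text, judge todo subject)
    ("research", "2a", "Research relevant resources and documentation", "research quality"),
    ("codebase analysis", "2b", "Analyze codebase impact and affected files", "codebase analysis"),
    ("business analysis", "2c", "Business analysis and acceptance criteria", "business analysis"),
    ("architecture synthesis", "3", "Architecture synthesis from research and analysis", "architecture synthesis"),
    ("decomposition", "4", "Decompose into implementation steps", "decomposition"),
    ("parallelize", "5", "Parallelize implementation steps", "parallelization"),
    ("verifications", "6", "Define verification rubrics", "verifications"),
]


def todo(content, active_form):
    return {"content": content, "status": "pending", "activeForm": active_form}


def build_todos(active_stages, skip_judges, human_phases, threshold):
    """Build the TodoWrite payload, omitting skipped stages and judges."""
    todos = [todo("Ensure directories exist", "Ensuring directories exist")]
    for stage, phase_id, phase_text, judge_text in PHASES:
        if stage not in active_stages:
            continue  # omit both the phase and its judge
        todos.append(todo(f"Phase {phase_id}: {phase_text}", f"Running phase {phase_id}"))
        if not skip_judges:
            todos.append(todo(f"Judge {phase_id}: PASS {judge_text} (> {threshold})",
                              f"Validating phase {phase_id}"))
    todos.append(todo("Move task to todo folder", "Promoting task"))
    if human_phases:
        todos.append(todo("Human checkpoint reviews", "Awaiting human review"))
    return todos
```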

CRITICAL

  • Do not mark PASS for any judge if it did not pass the rubric. Retry the judge after each implementation change until it passes the check!
  • Do not read task files in .claude or .specs directories; your job is to orchestrate agents that will do the work, not to do it yourself!
  • Use `THRESHOLD` (default 3.5) for all judge pass/fail decisions, not hardcoded values!
  • Use `MAX_ITERATIONS` (default 3) for retry limits, not hardcoded values!
  • After `MAX_ITERATIONS` reached: PROCEED to next stage automatically - do NOT ask user unless phase is in `HUMAN_IN_THE_LOOP_PHASES`!
  • Skip phases not in `ACTIVE_STAGES` entirely - do not launch agents for excluded stages!
  • Trigger human-in-the-loop checkpoints ONLY after phases in `HUMAN_IN_THE_LOOP_PHASES`!
  • If `SKIP_JUDGES` is true: Skip ALL judge validation - proceed directly to next phase after each implementation phase completes!
  • Task file must exist in `.specs/tasks/draft/` before running this command (unless `--refine` mode)!
  • If `REFINE_MODE` is true: Detect changes via git diff, skip unchanged stages, pass user feedback to agents!

Execution & Evaluation Rules

  • Use foreground agents only: Do not use background agents. Launch parallel agents when possible. Background agents constantly run into permission issues and other errors.
Relaunch the judge until you get valid results if any of the following happens:
  • Reject Long Reports: If an agent returns a very long report instead of using the scratchpad as requested, reject the result. This indicates the agent failed to follow the "use scratchpad" instruction.
  • Judge Score 5.0 is a Hallucination: If a judge returns a score of 5.0/5.0, treat it as a hallucination or lazy evaluation. Reject it and re-run the judge. Perfect scores are practically impossible in this rigorous framework.
  • Reject Missing Scores: If a judge report is missing the numerical score, reject it. This indicates the judge failed to read or follow the rubric instructions.
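
These rejection rules can be expressed as a small validator wrapped in a relaunch loop. The sketch rests on several assumptions that the text does not specify: the score appears as `X.X/5.0` in the report, "very long" means more than `max_chars` characters, and relaunching stops after `max_relaunches` attempts. `validate_judge_report` and `run_judge` are hypothetical names.

```python
import re

# Assumed score format: "4.2/5.0" or "4/5".
SCORE = re.compile(r"(\d+(?:\.\d+)?)\s*/\s*5(?:\.0)?\b")


def validate_judge_report(report, max_chars=4000):
    """Return (score, None) for a usable report, else (None, reason)."""
    if len(report) > max_chars:
        return None, "report too long; the agent should have used the scratchpad"
    match = SCORE.search(report)
    if match is None:
        return None, "missing numerical score"
    score = float(match.group(1))
    if score >= 5.0:
        return None, "perfect 5.0 score treated as hallucination"
    return score, None


def run_judge(launch_judge, threshold=3.5, max_relaunches=5):
    """Relaunch the judge until its report is valid; return (score, passed)."""
    for _ in range(max_relaunches):
        score, reason = validate_judge_report(launch_judge())
        if reason is None:
            # Pass criterion follows the "> THRESHOLD" wording used in this workflow.
            return score, score > threshold
    raise RuntimeError("judge never produced a valid report")
```

Here `launch_judge` stands in for whatever mechanism starts the judge agent and returns its report text.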

Workflow Execution

You MUST launch a separate agent for each step instead of performing all steps yourself.
CRITICAL: For each agent you MUST:
  1. Use the Agent type and Model specified in the step
  2. Provide the task file path and user input as context
  3. Provide the value of `${CLAUDE_PLUGIN_ROOT}` so agents can resolve paths like `@${CLAUDE_PLUGIN_ROOT}/scripts/create-scratchpad.sh`
  4. Require agent to implement exactly that step, not more, not less
  5. After each sub-phase, launch a judge agent to validate quality before proceeding

Complete Workflow Overview

Note: Phases not in `ACTIVE_STAGES` are skipped. If `SKIP_JUDGES` is true, all judge steps are skipped entirely. Human checkpoints (🔍) occur after phases in `HUMAN_IN_THE_LOOP_PHASES`.
Input: Draft Task File (.specs/tasks/draft/*.md)
    ▼
Phase 2: Parallel Analysis
    ├─────────────────────┬─────────────────────┐
    ▼                     ▼                     ▼
Phase 2a:             Phase 2b:             Phase 2c:
Research              Codebase Analysis     Business Analysis
[sdd:researcher       [sdd:code-explorer    [sdd:business-analyst
 sonnet]               sonnet]               opus]
Judge 2a              Judge 2b              Judge 2c
(pass: >THRESHOLD)    (pass: >THRESHOLD)    (pass: >THRESHOLD)
    │                     │                     │
    └─────────────────────┴─────────────────────┘
                          ▼
                    Phase 3: Architecture Synthesis
                    [sdd:software-architect opus]
                    Judge 3 (pass: >THRESHOLD)
                          ▼
                    Phase 4: Decomposition
                    [sdd:tech-lead opus]
                    Judge 4 (pass: >THRESHOLD)
                          ▼
                    Phase 5: Parallelize
                    [sdd:team-lead opus]
                    Judge 5 (pass: >THRESHOLD)
                          ▼
                    Phase 6: Verifications
                    [sdd:qa-engineer opus]
                    Judge 6 (pass: >THRESHOLD)
                          ▼
                    Move task: draft/ → todo/
                          ▼
                    Complete

Phase 2: Parallel Analysis

Phase 2 launches three analysis phases in parallel, each with its own judge validation.

Phase 2a/2b/2c: Parallel Sub-Phases

Launch these three phases in parallel immediately:

Phase 2a: Research

Model: `sonnet`
Agent: `sdd:researcher`
Depends on: Task file exists
Purpose: Gather relevant resources, documentation, libraries, and prior art. Creates or updates a reusable skill.
Launch agent:
  • Description: "Research task resources and create/update skill"
  • Prompt:
    CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
    
    Task File: <TASK_FILE>
    Task Title: <title from task file>
    
    CRITICAL: DO NOT OUTPUT YOUR RESEARCH, ONLY CREATE THE SCRATCHPAD AND SKILL FILE.
Capture:
  • Skill file path (e.g., `.claude/skills/<skill-name>/SKILL.md`)
  • Skill action (Created new / Updated existing)
  • Scratchpad file path (e.g., `.specs/scratchpad/<hex-id>.md`)
  • Number of resources gathered
  • Key recommendation summary
CRITICAL: If expected files not created, launch the agent again with the same prompt.
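
The relaunch rule can be sketched as a check that every artifact path the agent reports actually exists on disk. This is an assumption-laden illustration: `launch_agent` is a hypothetical callable that starts the agent and returns the paths it reports, and the `max_attempts` cap is not part of the workflow text, which only says to launch again.

```python
from pathlib import Path


def missing_artifacts(paths):
    """Return the reported paths that do not exist as files."""
    return [p for p in paths if not Path(p).is_file()]


def launch_until_created(launch_agent, max_attempts=3):
    """Relaunch the agent with the same prompt until its reported files exist."""
    for _ in range(max_attempts):
        reported = launch_agent()
        if reported and not missing_artifacts(reported):
            return reported
    raise RuntimeError("agent did not create the expected files")
```

The check only tests for file existence, so the orchestrator still leaves reading the artifacts to the agents.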

Phase 2b: Codebase Impact Analysis

Model: `sonnet`
Agent: `sdd:code-explorer`
Depends on: Task file exists
Purpose: Identify affected files, interfaces, and integration points
Launch agent:
  • Description: "Analyze codebase impact"
  • Prompt:
    text
    CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
    
    Task File: <TASK_FILE>
    Task Title: <title from task file>
    
    CRITICAL: DO NOT OUTPUT YOUR ANALYSIS, ONLY CREATE THE SCRATCHPAD AND ANALYSIS FILE.
Capture:
  • Analysis file path (e.g., `.specs/analysis/analysis-{name}.md`)
  • Scratchpad file path (e.g., `.specs/scratchpad/<hex-id>.md`)
  • Files affected count (modify/create/delete)
  • Risk level assessment
  • Key integration points
CRITICAL: If expected files not created, launch the agent again with the same prompt.

模型:
sonnet
Agent:
sdd:code-explorer
依赖条件: 任务文件存在
目的: 识别受影响的文件、接口和集成点
启动Agent:
  • 描述: "分析代码库影响"
  • 提示:
    text
    CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
    
    Task File: <TASK_FILE>
    Task Title: <任务文件中的标题>
    
    关键:不要输出你的分析内容,仅创建SCRATCHPAD和分析文件。
捕获信息:
  • 分析文件路径(例如:
    .specs/analysis/analysis-{name}.md
  • Scratchpad文件路径(例如:
    .specs/scratchpad/<hex-id>.md
  • 受影响文件数量(修改/创建/删除)
  • 风险等级评估
  • 关键集成点
关键:如果未创建预期文件,使用相同提示重新启动Agent。

Phase 2c: Business Analysis

阶段2c:业务分析

Model:
opus
Agent:
sdd:business-analyst
Depends on: Task file exists
Purpose: Refine description and create acceptance criteria
Launch agent:
  • Description: "Business analysis"
  • Prompt:
    CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
    
    Read ${CLAUDE_PLUGIN_ROOT}/skills/plan-task/analyse-business-requirements.md and execute it exactly as is!
    
    Task File: <TASK_FILE>
    Task Title: <title from task file>
    
    CRITICAL: DO NOT OUTPUT YOUR BUSINESS ANALYSIS, ONLY CREATE THE SCRATCHPAD AND UPDATE THE TASK FILE.
Capture:
  • Scratchpad file path (e.g.,
    .specs/scratchpad/<hex-id>.md
    )
  • Acceptance criteria count
  • Scope defined (yes/no)
  • User scenarios documented

模型:
opus
Agent:
sdd:business-analyst
依赖条件: 任务文件存在
目的: 细化描述并创建验收标准
启动Agent:
  • 描述: "业务分析"
  • 提示:
    CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
    
    阅读${CLAUDE_PLUGIN_ROOT}/skills/plan-task/analyse-business-requirements.md并严格执行!
    
    Task File: <TASK_FILE>
    Task Title: <任务文件中的标题>
    
    关键:不要输出你的业务分析内容,仅创建SCRATCHPAD并更新任务文件。
捕获信息:
  • Scratchpad文件路径(例如:
    .specs/scratchpad/<hex-id>.md
  • 验收标准数量
  • 是否明确定义范围(是/否)
  • 是否记录用户场景

Judge 2a/2b/2c: Validate Parallel Phases

评审2a/2b/2c:验证并行阶段

After each parallel phase completes, launch its respective judge with the same agent type and model.
每个并行阶段完成后,启动对应的相同Agent类型和模型的评审。

Judge 2a: Validate Research/Skill

评审2a:验证调研/Skill

Model:
sonnet
Agent:
sdd:researcher
Depends on: Phase 2a completion
Purpose: Validate skill completeness and relevance
Launch judge:
  • Description: "Judge skill quality"
  • Prompt:
    CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
    
    Read @${CLAUDE_PLUGIN_ROOT}/prompts/judge.md for evaluation methodology and execute.
    
    ### Artifact Path
    {path to skill file from Phase 2a}
    
    ### Context
    This is a skill document for task: {task title}. Evaluate comprehensiveness and reusability.
    
    ### Rubric
    1. Resource Coverage (weight: 0.30)
       - Documentation and references gathered?
       - Libraries and tools identified with recommendations?
       - 1=Missing critical resources, 2=Basic coverage, 3=Adequate, 4=Comprehensive, 5=Excellent
    
    2. Pattern Relevance (weight: 0.25)
       - Are identified patterns applicable?
       - Are recommendations actionable?
       - 1=Irrelevant, 2=Somewhat useful, 3=Adequate, 4=Well-targeted, 5=Perfect fit
    
    3. Issue Anticipation (weight: 0.20)
       - Common pitfalls identified with solutions?
       - 1=None identified, 2=Few issues, 3=Adequate, 4=Good coverage, 5=Comprehensive
    
    4. Reusability (weight: 0.15)
       - Is the skill general enough to help multiple tasks?
       - Does it avoid task-specific details?
       - 1=Too specific, 2=Limited reuse, 3=Adequate, 4=Good, 5=Highly reusable
    
    5. Task Integration (weight: 0.10)
       - Was task file updated with skill reference?
       - 1=Not updated, 3=Updated, 5=Updated with clear instructions
CRITICAL: Use the prompt exactly as is. Do not add anything else, including the output of the implementation agent!
Decision Logic:
  • PASS (score >=
    THRESHOLD
    ): Research complete, proceed
  • FAIL (score <
    THRESHOLD
    ): Re-launch Phase 2a with feedback
  • MAX_ITERATIONS reached: Proceed to next stage regardless of score (log warning)

模型:
sonnet
Agent:
sdd:researcher
依赖条件: 阶段2a完成
目的: 验证skill的完整性和相关性
启动评审:
  • 描述: "评审skill质量"
  • 提示:
    CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
    
    阅读@${CLAUDE_PLUGIN_ROOT}/prompts/judge.md获取评估方法并执行。
    
    ### 工件路径
    {阶段2a生成的skill文件路径}
    
    ### 上下文
    这是任务{任务标题}的skill文档。评估其全面性和可复用性。
    
    ### 规则
    1. 资源覆盖(权重:0.30)
       - 是否收集了文档和参考资料?
       - 是否识别了库和工具并给出建议?
       - 1=缺少关键资源,2=基础覆盖,3=足够,4=全面,5=优秀
    
    2. 模式相关性(权重:0.25)
       - 识别的模式是否适用?
       - 建议是否可执行?
       - 1=不相关,2=有些用处,3=足够,4=针对性强,5=完美适配
    
    3. 问题预判(权重:0.20)
       - 是否识别了常见陷阱并给出解决方案?
       - 1=未识别任何问题,2=识别少量问题,3=足够,4=覆盖良好,5=全面覆盖
    
    4. 可复用性(权重:0.15)
       - skill是否足够通用以帮助多个任务?
       - 是否避免了任务特定细节?
       - 1=过于特定,2=复用性有限,3=足够,4=良好,5=高度可复用
    
    5. 任务集成(权重:0.10)
       - 任务文件是否更新了skill引用?
       - 1=未更新,3=已更新,5=已更新并包含清晰说明
关键:严格使用上述提示,不要添加任何其他内容,包括实施Agent的输出!!!
决策逻辑:
  • 通过(分数 >=
    THRESHOLD
    ):调研完成,继续
  • 不通过(分数 <
    THRESHOLD
    ):结合反馈重新启动阶段2a
  • 达到最大迭代次数:无论分数如何,进入下一阶段(记录警告)
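The rubric above assigns each criterion a weight and a 1-5 score, and the decision logic compares a single score against THRESHOLD. How the judge actually aggregates the criteria is defined in `judge.md`, which is not shown here, so the following is only a minimal sketch that assumes a plain weighted sum. The names `JUDGE_2A_RUBRIC`, `weighted_score`, and `passes` are hypothetical, and the example scores are made up for illustration.

```python
# Hypothetical sketch: assumes the judge combines criteria by weighted sum.
# The real aggregation rule lives in judge.md and may differ.

# Weights copied from the Judge 2a rubric.
JUDGE_2A_RUBRIC = {
    "Resource Coverage": 0.30,
    "Pattern Relevance": 0.25,
    "Issue Anticipation": 0.20,
    "Reusability": 0.15,
    "Task Integration": 0.10,
}


def weighted_score(scores, rubric):
    """Return the weighted sum of per-criterion scores (each on a 1-5 scale)."""
    missing = set(rubric) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(rubric[name] * scores[name] for name in rubric)


def passes(score, threshold=3.5):
    """PASS when the score is at or above the threshold (3.5 is the documented default)."""
    return score >= threshold


# Made-up scores, for illustration only.
example_scores = {
    "Resource Coverage": 4,
    "Pattern Relevance": 3,
    "Issue Anticipation": 4,
    "Reusability": 3,
    "Task Integration": 5,
}

total = weighted_score(example_scores, JUDGE_2A_RUBRIC)
print(f"score={total:.2f} verdict={'PASS' if passes(total) else 'FAIL'}")
```

With these example scores the weighted sum is 1.20 + 0.75 + 0.80 + 0.45 + 0.50 = 3.70, which clears the default threshold of 3.5.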

Judge 2b: Validate Codebase Analysis

评审2b:验证代码库分析

Model:
sonnet
Agent:
sdd:code-explorer
Depends on: Phase 2b completion
Purpose: Validate file identification accuracy and integration mapping
Launch judge:
  • Description: "Judge codebase analysis quality"
  • Prompt:
    CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
    
    Read @${CLAUDE_PLUGIN_ROOT}/prompts/judge.md for evaluation methodology and execute.
    
    ### Artifact Path
    {path to analysis file from Phase 2b}
    
    ### Context
    This is codebase impact analysis for task: {task title}. Evaluate accuracy and completeness.
    
    ### Rubric
    1. File Identification Accuracy (weight: 0.35)
       - All affected files identified with specific paths?
       - New files and modifications distinguished?
       - 1=Major files missing, 2=Mostly correct, 3=Adequate, 4=Precise, 5=Complete
    
    2. Interface Documentation (weight: 0.25)
       - Key functions/classes documented with signatures?
       - Change requirements clear?
       - 1=Missing, 2=Partial, 3=Adequate, 4=Good, 5=Complete
    
    3. Integration Point Mapping (weight: 0.25)
       - Integration points identified with impact?
       - Similar patterns in codebase found?
       - 1=Missing, 2=Partial, 3=Adequate, 4=Good, 5=Comprehensive
    
    4. Risk Assessment (weight: 0.15)
       - High risk areas identified with mitigations?
       - 1=No assessment, 2=Basic, 3=Adequate, 4=Good, 5=Thorough
CRITICAL: Use the prompt exactly as is. Do not add anything else, including the output of the implementation agent!
Decision Logic:
  • PASS (score >=
    THRESHOLD
    ): Analysis complete, proceed
  • FAIL (score <
    THRESHOLD
    ): Re-launch Phase 2b with feedback
  • MAX_ITERATIONS reached: Proceed to next stage regardless of score (log warning)

模型:
sonnet
Agent:
sdd:code-explorer
依赖条件: 阶段2b完成
目的: 验证文件识别准确性和集成映射
启动评审:
  • 描述: "评审代码库分析质量"
  • 提示:
    CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
    
    阅读@${CLAUDE_PLUGIN_ROOT}/prompts/judge.md获取评估方法并执行。
    
    ### 工件路径
    {阶段2b生成的分析文件路径}
    
    ### 上下文
    这是任务{任务标题}的代码库影响分析。评估其准确性和完整性。
    
    ### 规则
    1. 文件识别准确性(权重:0.35)
       - 是否识别了所有受影响的文件并提供了具体路径?
       - 是否区分了新文件和修改文件?
       - 1=缺少主要文件,2=大部分正确,3=足够,4=精确,5=完整
    
    2. 接口文档(权重:0.25)
       - 是否记录了关键函数/类及其签名?
       - 变更要求是否清晰?
       - 1=缺失,2=部分记录,3=足够,4=良好,5=完整
    
    3. 集成点映射(权重:0.25)
       - 是否识别了集成点及其影响?
       - 是否在代码库中找到类似模式?
       - 1=缺失,2=部分记录,3=足够,4=良好,5=全面
    
    4. 风险评估(权重:0.15)
       - 是否识别了高风险区域并给出缓解措施?
       - 1=未评估,2=基础评估,3=足够,4=良好,5=全面
关键:严格使用上述提示,不要添加任何其他内容,包括实施Agent的输出!!!
决策逻辑:
  • 通过(分数 >=
    THRESHOLD
    ):分析完成,继续
  • 不通过(分数 <
    THRESHOLD
    ):结合反馈重新启动阶段2b
  • 达到最大迭代次数:无论分数如何,进入下一阶段(记录警告)

Judge 2c: Validate Business Analysis

评审2c:验证业务分析

Model:
opus
Agent:
sdd:business-analyst
Depends on: Phase 2c completion
Purpose: Validate acceptance criteria quality and scope definition
Launch judge:
  • Description: "Judge business analysis quality"
  • Prompt:
    CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
    
    Read @${CLAUDE_PLUGIN_ROOT}/prompts/judge.md for evaluation methodology and execute.
    
    ### Artifact Path
    {path to task file from Phase 2c}
    
    ### Context
    This is business analysis output. Evaluate description clarity and acceptance criteria quality.
    
    ### Rubric
    1. Description Clarity (weight: 0.30)
       - What/Why clearly explained?
       - Scope boundaries defined?
       - 1=Vague, 2=Basic, 3=Adequate, 4=Clear, 5=Excellent
    
    2. Acceptance Criteria Quality (weight: 0.35)
       - Criteria specific and testable?
       - Given/When/Then format for complex criteria?
       - 1=Missing/vague, 2=Basic, 3=Adequate, 4=Good, 5=Excellent
    
    3. Scenario Coverage (weight: 0.20)
       - Primary flow documented?
       - Error scenarios considered?
       - 1=Missing, 2=Basic, 3=Adequate, 4=Good, 5=Comprehensive
    
    4. Scope Definition (weight: 0.15)
       - In-scope/out-of-scope explicit?
       - No implementation details in description?
       - 1=Missing, 2=Partial, 3=Adequate, 4=Good, 5=Clear
CRITICAL: Use the prompt exactly as is. Do not add anything else, including the output of the implementation agent!
Decision Logic:
  • PASS (score >=
    THRESHOLD
    ): Business analysis complete, proceed
  • FAIL (score <
    THRESHOLD
    ): Re-launch Phase 2c with feedback
  • MAX_ITERATIONS reached: Proceed to next stage regardless of score (log warning)

模型:
opus
Agent:
sdd:business-analyst
依赖条件: 阶段2c完成
目的: 验证验收标准质量和范围定义
启动评审:
  • 描述: "评审业务分析质量"
  • 提示:
    CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
    
    阅读@${CLAUDE_PLUGIN_ROOT}/prompts/judge.md获取评估方法并执行。
    
    ### 工件路径
    {阶段2c后的任务文件路径}
    
    ### 上下文
    这是业务分析输出。评估描述清晰度和验收标准质量。
    
    ### 规则
    1. 描述清晰度(权重:0.30)
       - 是否清晰解释了做什么/为什么做?
       - 是否定义了范围边界?
       - 1=模糊,2=基础,3=足够,4=清晰,5=优秀
    
    2. 验收标准质量(权重:0.35)
       - 标准是否具体且可测试?
       - 复杂标准是否使用Given/When/Then格式?
       - 1=缺失/模糊,2=基础,3=足够,4=良好,5=优秀
    
    3. 场景覆盖(权重:0.20)
       - 是否记录了主流程?
       - 是否考虑了错误场景?
       - 1=缺失,2=基础,3=足够,4=良好,5=全面
    
    4. 范围定义(权重:0.15)
       - 是否明确界定了范围内/范围外内容?
       - 描述中是否不含实现细节?
       - 1=缺失,2=部分定义,3=足够,4=良好,5=清晰
关键:严格使用上述提示,不要添加任何其他内容,包括实施Agent的输出!!!
决策逻辑:
  • 通过(分数 >=
    THRESHOLD
    ):业务分析完成,继续
  • 不通过(分数 <
    THRESHOLD
    ):结合反馈重新启动阶段2c
  • 达到最大迭代次数:无论分数如何,进入下一阶段(记录警告)

Synchronization Point

同步点

Wait for ALL three parallel phases (2a, 2b, 2c) AND their judges to PASS (or to reach MAX_ITERATIONS) before proceeding to Phase 3.

等待所有三个并行阶段(2a、2b、2c)及其评审都通过(或达到最大迭代次数)后,再进入阶段3。
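The synchronization point is a fan-out/fan-in pattern: three phase-plus-judge pipelines start together, and Phase 3 begins only after all of them have finished. A minimal sketch with `asyncio.gather` follows. The function `run_phase_with_judge` is a hypothetical stand-in for launching the real agents, and it always returns "PASS" here purely for illustration.

```python
import asyncio


async def run_phase_with_judge(name):
    """Hypothetical stand-in: launch a phase agent, then its judge."""
    await asyncio.sleep(0)  # placeholder for the real agent and judge calls
    return name, "PASS"


async def parallel_analysis():
    """Run phases 2a, 2b, and 2c concurrently and wait for all of them."""
    results = await asyncio.gather(
        run_phase_with_judge("2a"),
        run_phase_with_judge("2b"),
        run_phase_with_judge("2c"),
    )
    return dict(results)


verdicts = asyncio.run(parallel_analysis())
print(verdicts)  # Phase 3 would start only after this point
```

`asyncio.gather` returns results in the order the awaitables were passed, so the verdicts map cleanly back to their phases regardless of which one finishes first.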

Phase 3: Architecture Synthesis

阶段3:架构合成

Model:
opus
Agent:
sdd:software-architect
Depends on: Phase 2a + Judge 2a PASS, Phase 2b + Judge 2b PASS, Phase 2c + Judge 2c PASS
Purpose: Synthesize research, analysis, and business requirements into architectural overview
Launch agent:
  • Description: "Architecture synthesis"
  • Prompt:
    CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
    
    Task File: <TASK_FILE>
    Skill File: <skill file path from Phase 2a>
    Analysis File: <analysis file path from Phase 2b>
    
    CRITICAL: DO NOT OUTPUT YOUR ARCHITECTURE SYNTHESIS, ONLY CREATE THE SCRATCHPAD AND UPDATE THE TASK FILE.
Capture:
  • Scratchpad file path (e.g.,
    .specs/scratchpad/<hex-id>.md
    )
  • Sections added to task file
  • Key architectural decisions count
  • Components identified (if applicable)
  • Contracts defined (if applicable)

模型:
opus
Agent:
sdd:software-architect
依赖条件: 阶段2a + 评审2a通过,阶段2b + 评审2b通过,阶段2c + 评审2c通过
目的: 将调研、分析和业务需求整合为架构概述
启动Agent:
  • 描述: "架构合成"
  • 提示:
    CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
    
    Task File: <TASK_FILE>
    Skill File: <阶段2a生成的skill文件路径>
    Analysis File: <阶段2b生成的分析文件路径>
    
    关键:不要输出你的架构合成内容,仅创建SCRATCHPAD并更新任务文件。
捕获信息:
  • Scratchpad文件路径(例如:
    .specs/scratchpad/<hex-id>.md
  • 添加到任务文件的部分
  • 关键架构决策数量
  • 识别的组件(如适用)
  • 定义的契约(如适用)

Judge 3: Validate Architecture Synthesis

评审3:验证架构合成

Model:
opus
Agent:
sdd:software-architect
Depends on: Phase 3 completion
Purpose: Validate architectural coherence and completeness
Launch judge:
  • Description: "Judge architecture synthesis quality"
  • Prompt:
    CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
    
    Read @${CLAUDE_PLUGIN_ROOT}/prompts/judge.md for evaluation methodology and execute.
    
    ### Artifact Path
    {path to task file after Phase 3}
    
    ### Context
    This is architecture synthesis output. The Architecture Overview section should contain
    solution strategy, key decisions, and only relevant architectural sections.
    
    ### Rubric
    1. Solution Strategy Clarity (weight: 0.30)
       - Approach clearly explained?
       - Key decisions documented with reasoning?
       - Trade-offs stated?
       - 1=Missing/unclear, 2=Basic, 3=Adequate, 4=Clear, 5=Excellent
    
    2. Reference Integration (weight: 0.20)
       - Links to research and analysis files?
       - Insights from both integrated?
       - 1=No links, 2=Partial, 3=Adequate, 4=Good, 5=Fully integrated
    
    3. Section Relevance (weight: 0.25)
       - Only relevant sections included (not all)?
       - Sections appropriate for task complexity?
       - 1=Wrong sections, 2=Mostly appropriate, 3=Adequate, 4=Good, 5=Precisely targeted
    
    4. Expected Changes Accuracy (weight: 0.25)
       - Files to create/modify listed?
       - Consistent with codebase analysis?
       - 1=Missing/inconsistent, 2=Partial, 3=Adequate, 4=Good, 5=Complete
    
CRITICAL: Use the prompt exactly as is. Do not add anything else, including the output of the implementation agent!
Decision Logic:
  • PASS (score >=
    THRESHOLD
    ): Architecture synthesis complete, proceed
  • FAIL (score <
    THRESHOLD
    ): Re-launch Phase 3 with feedback
  • MAX_ITERATIONS reached: Proceed to Phase 4 regardless of score (log warning)
Wait for PASS (or MAX_ITERATIONS reached) before Phase 4.

模型:
opus
Agent:
sdd:software-architect
依赖条件: 阶段3完成
目的: 验证架构的连贯性和完整性
启动评审:
  • 描述: "评审架构合成质量"
  • 提示:
    CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
    
    阅读@${CLAUDE_PLUGIN_ROOT}/prompts/judge.md获取评估方法并执行。
    
    ### 工件路径
    {阶段3后的任务文件路径}
    
    ### 上下文
    这是架构合成输出。架构概述部分应包含解决方案策略、关键决策和仅相关的架构部分。
    
    ### 规则
    1. 解决方案策略清晰度(权重:0.30)
       - 是否清晰解释了方法?
       - 是否记录了关键决策及其理由?
       - 是否说明了权衡?
       - 1=缺失/模糊,2=基础,3=足够,4=清晰,5=优秀
    
    2. 参考集成(权重:0.20)
       - 是否链接到调研和分析文件?
       - 是否整合了两者的见解?
       - 1=无链接,2=部分集成,3=足够,4=良好,5=完全集成
    
    3. 部分相关性(权重:0.25)
       - 是否仅包含相关部分(而非全部)?
       - 部分是否适合任务复杂度?
       - 1=错误部分,2=大部分合适,3=足够,4=良好,5=精准匹配
    
    4. 预期变更准确性(权重:0.25)
       - 是否列出了要创建/修改的文件?
       - 是否与代码库分析一致?
       - 1=缺失/不一致,2=部分一致,3=足够,4=良好,5=完全一致
    
关键:严格使用上述提示,不要添加任何其他内容,包括实施Agent的输出!!!
决策逻辑:
  • 通过(分数 >=
    THRESHOLD
    ):架构合成完成,继续
  • 不通过(分数 <
    THRESHOLD
    ):结合反馈重新启动阶段3
  • 达到最大迭代次数:无论分数如何,进入阶段4(记录警告)
等待通过(或达到最大迭代次数)后进入阶段4。
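Every judge in this workflow shares the same decision logic: pass at or above THRESHOLD, otherwise re-launch the phase with feedback, and proceed with a warning once MAX_ITERATIONS is reached. The loop can be sketched as below. `run_with_quality_gate`, `run_phase`, and `run_judge` are hypothetical names; the two callables stand in for launching the phase agent and the judge, and the stubbed scores are invented for the demo.

```python
def run_with_quality_gate(run_phase, run_judge, threshold=3.5, max_iterations=3):
    """Run a phase, judge it, and retry with feedback until PASS or the cap.

    run_phase(feedback) -> artifact    (feedback is None on the first attempt)
    run_judge(artifact) -> (score, feedback)
    Returns (verdict, last_score, iterations_used).
    """
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    feedback = None
    score = 0.0
    for iteration in range(1, max_iterations + 1):
        artifact = run_phase(feedback)
        score, feedback = run_judge(artifact)
        if score >= threshold:
            return "PASS", score, iteration
    # Cap reached: proceed regardless of score, but log a warning.
    print(f"warning: score {score} below {threshold} after {max_iterations} iterations")
    return "PROCEEDED", score, max_iterations


# Demo with stubbed agents: the judge's score improves on each attempt.
_demo_scores = iter([2.8, 3.2, 3.9])
result = run_with_quality_gate(
    run_phase=lambda feedback: "task-file.md",
    run_judge=lambda artifact: (next(_demo_scores), "tighten section X"),
)
print(result)
```

Returning the iteration count alongside the verdict makes it straightforward to distinguish a first-try pass from one that needed retries when summarizing the run.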

Phase 4: Decomposition

阶段4:任务分解

Model:
opus
Agent:
sdd:tech-lead
Depends on: Phase 3 + Judge 3 PASS
Purpose: Break architecture into implementation steps with success criteria and risks
Launch agent:
  • Description: "Decompose into implementation steps"
  • Prompt:
    CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
    
    Task File: <TASK_FILE>
    
    CRITICAL: DO NOT OUTPUT YOUR DECOMPOSITION, ONLY CREATE THE SCRATCHPAD AND UPDATE THE TASK FILE.
Capture:
  • Scratchpad file path (e.g.,
    .specs/scratchpad/<hex-id>.md
    )
  • Implementation steps count
  • Total subtasks count
  • Critical path steps
  • High priority risks count

模型:
opus
Agent:
sdd:tech-lead
依赖条件: 阶段3 + 评审3通过
目的: 将架构拆分为带有成功标准和风险的实施步骤
启动Agent:
  • 描述: "拆分为实施步骤"
  • 提示:
    CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
    
    Task File: <TASK_FILE>
    
    关键:不要输出你的分解内容,仅创建SCRATCHPAD并更新任务文件。
捕获信息:
  • Scratchpad文件路径(例如:
    .specs/scratchpad/<hex-id>.md
  • 实施步骤数量
  • 子任务总数
  • 关键路径步骤
  • 高优先级风险数量

Judge 4: Validate Decomposition

评审4:验证任务分解

Model:
opus
Agent:
sdd:tech-lead
Depends on: Phase 4 completion
Purpose: Validate implementation steps quality and completeness
Launch judge:
  • Description: "Judge decomposition quality"
  • Prompt:
    CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
    
    Read @${CLAUDE_PLUGIN_ROOT}/prompts/judge.md for evaluation methodology and execute.
    
    ### Artifact Path
    {path to task file after Phase 4}
    
    ### Context
    This is decomposition output. The Implementation Process section should contain
    ordered steps with success criteria, subtasks, blockers, and risks.
    
    ### Rubric
    1. Step Quality (weight: 0.30)
       - Each step has clear goal, output, success criteria?
       - Steps ordered by dependency?
       - No step too large (>Large estimate)?
       - 1=Vague/missing, 2=Basic, 3=Adequate, 4=Good, 5=Excellent
    
    2. Success Criteria Testability (weight: 0.25)
       - Criteria specific and verifiable?
       - Use actual file paths, function names?
       - Subtasks clearly defined with actionable descriptions?
       - 1=Vague, 2=Partially testable, 3=Adequate, 4=Good, 5=All testable
    
    3. Risk Coverage (weight: 0.25)
       - Blockers identified with resolutions?
       - Risks identified with mitigations?
       - High-risk tasks identified with decomposition recommendations?
       - 1=None, 2=Basic, 3=Adequate, 4=Good, 5=Comprehensive
    
    4. Completeness (weight: 0.20)
       - All architecture components have corresponding steps?
       - Implementation summary table present?
       - Definition of Done included?
       - Phases organized: Setup → Foundational → User Stories → Polish?
       - 1=Incomplete, 2=Partial, 3=Adequate, 4=Good, 5=Complete
CRITICAL: Use the prompt exactly as is. Do not add anything else, including the output of the implementation agent!
Decision Logic:
  • PASS (score >=
    THRESHOLD
    ): Decomposition complete, proceed to Phase 5
  • FAIL (score <
    THRESHOLD
    ): Re-launch Phase 4 with feedback
  • MAX_ITERATIONS reached: Proceed to Phase 5 regardless of score (log warning)
Wait for PASS (or MAX_ITERATIONS reached) before Phase 5.

模型:
opus
Agent:
sdd:tech-lead
依赖条件: 阶段4完成
目的: 验证实施步骤的质量和完整性
启动评审:
  • 描述: "评审任务分解质量"
  • 提示:
    CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
    
    阅读@${CLAUDE_PLUGIN_ROOT}/prompts/judge.md获取评估方法并执行。
    
    ### 工件路径
    {阶段4后的任务文件路径}
    
    ### 上下文
    这是任务分解输出。实施流程部分应包含有序步骤、成功标准、子任务、阻塞点和风险。
    
    ### 规则
    1. 步骤质量(权重:0.30)
       - 每个步骤是否有明确目标、输出和成功标准?
       - 步骤是否按依赖关系排序?
       - 是否没有过大的步骤(>大估算)?
       - 1=模糊/缺失,2=基础,3=足够,4=良好,5=优秀
    
    2. 成功标准可测试性(权重:0.25)
       - 标准是否具体且可验证?
       - 是否使用实际文件路径、函数名?
       - 子任务是否有清晰的可执行描述?
       - 1=模糊,2=部分可测试,3=足够,4=良好,5=全部可测试
    
    3. 风险覆盖(权重:0.25)
       - 是否识别了阻塞点并给出解决方案?
       - 是否识别了风险并给出缓解措施?
       - 是否识别了高风险任务并给出分解建议?
       - 1=无,2=基础,3=足够,4=良好,5=全面
    
    4. 完整性(权重:0.20)
       - 所有架构组件是否都有对应的步骤?
       - 是否有实施摘要表?
       - 是否包含完成定义?
       - 是否按阶段组织:设置→基础→用户故事→优化?
       - 1=不完整,2=部分完整,3=足够,4=良好,5=完整
关键:严格使用上述提示,不要添加任何其他内容,包括实施Agent的输出!!!
决策逻辑:
  • 通过(分数 >=
    THRESHOLD
    ):任务分解完成,进入阶段5
  • 不通过(分数 <
    THRESHOLD
    ):结合反馈重新启动阶段4
  • 达到最大迭代次数:无论分数如何,进入阶段5(记录警告)
等待通过(或达到最大迭代次数)后进入阶段5。

Phase 5: Parallelize Steps

阶段5:并行化步骤

Model:
opus
Agent:
sdd:team-lead
Depends on: Phase 4 + Judge 4 PASS
Purpose: Reorganize implementation steps for maximum parallel execution
Launch agent:
  • Description: "Parallelize implementation steps"
  • Prompt:
    CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
    
    Task File: <TASK_FILE>
    
    Use agents only from this list: {list ALL available agents with plugin prefix if available, e.g. sdd:developer, review:bug-hunter. Also include general agents: opus, sonnet, haiku}
    
    CRITICAL: DO NOT OUTPUT YOUR PARALLELIZATION, ONLY CREATE THE SCRATCHPAD AND UPDATE THE TASK FILE.
Capture:
  • Scratchpad file path (e.g.,
    .specs/scratchpad/<hex-id>.md
    )
  • Number of steps reorganized
  • Maximum parallelization depth
  • Agent distribution summary

模型:
opus
Agent:
sdd:team-lead
依赖条件: 阶段4 + 评审4通过
目的: 重新组织实施步骤以实现最大程度的并行执行
启动Agent:
  • 描述: "并行化实施步骤"
  • 提示:
    CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
    
    Task File: <TASK_FILE>
    
    仅使用以下列表中的Agent:{列出所有可用Agent,包括插件前缀(如sdd:developer, review:bug-hunter),同时包括通用Agent:opus, sonnet, haiku}
    
    关键:不要输出你的并行化内容,仅创建SCRATCHPAD并更新任务文件。
捕获信息:
  • Scratchpad文件路径(例如:
    .specs/scratchpad/<hex-id>.md
  • 重新组织的步骤数量
  • 最大并行化深度
  • Agent分配摘要

Judge 5: Validate Parallelization

评审5:验证并行化

Model:
opus
Agent:
sdd:team-lead
Depends on: Phase 5 completion
Purpose: Validate dependency accuracy and parallelization optimization
Launch judge:
  • Description: "Judge parallelization quality"
  • Prompt:
    CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
    
    Read @${CLAUDE_PLUGIN_ROOT}/prompts/judge.md for evaluation methodology and execute.
    
    ### Artifact Path
    {path to parallelized task file from Phase 5}
    
    ### Context
    This is the output of Phase 5: Parallelize Steps. The artifact should contain implementation steps
    reorganized for maximum parallel execution with explicit dependencies, agent assignments, and
    parallelization diagram.
    
    Use agents only from this list: {list ALL available agents with plugin prefix if available, e.g. sdd:developer, review:bug-hunter. Also include general agents: opus, sonnet, haiku}
    
    ### Rubric
    1. Dependency Accuracy (weight: 0.35)
       - Are step dependencies correctly identified?
       - No false dependencies (steps marked dependent when they're not)?
       - No missing dependencies (steps that actually depend on others)?
       - 1=Major dependency errors, 2=Mostly correct, 3=Acceptable, 5=Precise dependencies
    
    2. Parallelization Maximized (weight: 0.30)
       - Are parallelizable steps correctly marked with "Parallel with:"?
       - Is the parallelization diagram logical?
       - 1=No parallelization/wrong, 2=Some optimization, 3=Acceptable, 5=Maximum parallelization
    
    3. Agent Selection Correctness (weight: 0.20)
       - Are agent types appropriate for outputs (opus by default, haiku for trivial, sonnet for simple but high in volume)?
       - Does selection follow the Agent Selection Guide?
       - Are only agents from the provided available agents list used?
       - 1=Wrong agents, 2=Mostly appropriate, 3=Acceptable, 4=Optimal selection, 5=Perfect selection
    
    4. Execution Directive Present (weight: 0.15)
       - Is the sub-agent execution directive present?
       - Are "MUST" requirements for parallel execution clear?
       - 1=Missing directive, 2=Partial, 3=Acceptable, 4=Complete directive, 5=Perfect directive
CRITICAL: Use the prompt exactly as is. Do not add anything else, including the output of the implementation agent!
Decision Logic:
  • PASS (score >=
    THRESHOLD
    ): Proceed to Phase 6
  • FAIL (score <
    THRESHOLD
    ): Re-launch Phase 5 with feedback
  • MAX_ITERATIONS reached: Proceed to Phase 6 regardless of score (log warning)
Wait for PASS (or MAX_ITERATIONS reached) before Phase 6.

模型:
opus
Agent:
sdd:team-lead
依赖条件: 阶段5完成
目的: 验证依赖关系准确性和并行化优化
启动评审:
  • 描述: "评审并行化质量"
  • 提示:
    CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
    
    阅读@${CLAUDE_PLUGIN_ROOT}/prompts/judge.md获取评估方法并执行。
    
    ### 工件路径
    {阶段5生成的并行化任务文件路径}
    
    ### 上下文
    这是阶段5的输出:并行化步骤。工件应包含重新组织以实现最大并行执行的实施步骤,带有明确的依赖关系、Agent分配和并行化图。
    
    仅使用以下列表中的Agent:{列出所有可用Agent,包括插件前缀(如sdd:developer, review:bug-hunter),同时包括通用Agent:opus, sonnet, haiku}
    
    ### 规则
    1. 依赖关系准确性(权重:0.35)
       - 是否正确识别了步骤依赖关系?
       - 是否没有虚假依赖(标记为依赖但实际不依赖的步骤)?
       - 是否没有缺失依赖(实际依赖其他步骤但未标记的步骤)?
       - 1=严重依赖错误,2=大部分正确,3=可接受,5=精确依赖
    
    2. 并行化最大化(权重:0.30)
       - 可并行的步骤是否正确标记为"Parallel with:"?
       - 并行化图是否合理?
       - 1=无并行化/错误,2=部分优化,3=可接受,5=最大程度并行化
    
    3. Agent选择正确性(权重:0.20)
       - Agent类型是否适合输出(默认opus,简单任务用haiku,简单但量大的任务用sonnet)?
       - 选择是否遵循Agent选择指南?
       - 是否仅使用提供的可用Agent列表中的Agent?
       - 1=错误Agent,2=大部分合适,3=可接受,4=最优选择,5=完美选择
    
    4. 执行指令是否存在(权重:0.15)
       - 是否存在子Agent执行指令?
       - 并行执行的"MUST"要求是否清晰?
       - 1=缺失指令,2=部分存在,3=可接受,4=完整指令,5=完美指令
关键:严格使用上述提示,不要添加任何其他内容,包括实施Agent的输出!!!
决策逻辑:
  • 通过(分数 >=
    THRESHOLD
    ):进入阶段6
  • 不通过(分数 <
    THRESHOLD
    ):结合反馈重新启动阶段5
  • 达到最大迭代次数:无论分数如何,进入阶段6(记录警告)
等待通过(或达到最大迭代次数)后进入阶段6。

Phase 6: Define Verifications

阶段6:定义验证规则

Model:
opus
Agent:
sdd:qa-engineer
Depends on: Phase 5 + Judge 5 PASS
Purpose: Add LLM-as-Judge verification sections with rubrics
Launch agent:
  • Description: "Define verification rubrics"
  • Prompt:
    CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
    
    Task File: <TASK_FILE>
    
    CRITICAL: DO NOT OUTPUT YOUR VERIFICATIONS, ONLY CREATE THE SCRATCHPAD AND UPDATE THE TASK FILE.
Capture:
  • Scratchpad file path (e.g.,
    .specs/scratchpad/<hex-id>.md
    )
  • Number of steps with verification
  • Total evaluations defined
  • Verification breakdown (Panel/Per-Item/None)

模型:
opus
Agent:
sdd:qa-engineer
依赖条件: 阶段5 + 评审5通过
目的: 添加带有规则的LLM-as-Judge验证部分
启动Agent:
  • 描述: "定义验证规则"
  • 提示:
    CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
    
    Task File: <TASK_FILE>
    
    关键:不要输出你的验证内容,仅创建SCRATCHPAD并更新任务文件。
捕获信息:
  • Scratchpad文件路径(例如:
    .specs/scratchpad/<hex-id>.md
  • 带有验证的步骤数量
  • 定义的评估总数
  • 验证分类(Panel/Per-Item/None)

Judge 6: Validate Verifications

评审6:验证规则定义

Model:
opus
Agent:
sdd:qa-engineer
Depends on: Phase 6 completion
Purpose: Validate verification rubrics and thresholds
Launch judge:
  • Description: "Judge verification quality"
  • Prompt:
    CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
    
    Read @${CLAUDE_PLUGIN_ROOT}/prompts/judge.md for evaluation methodology and execute.
    
    ### Artifact Path
    {path to task file with verifications from Phase 6}
    
    ### Context
    This is the output of Phase 6: Define Verifications. The artifact should contain LLM-as-Judge
    verification sections for each implementation step, including verification levels, custom rubrics,
    thresholds, and a verification summary table.
    
    ### Rubric
    1. Verification Level Appropriateness (weight: 0.30)
       - Do verification levels match artifact criticality?
       - HIGH criticality → Panel, MEDIUM → Single/Per-Item, LOW/NONE → None?
       - 1=Mismatched levels, 2=Mostly appropriate, 3=Acceptable, 5=Precisely calibrated
    
    2. Rubric Quality (weight: 0.30)
       - Are criteria specific to the artifact type (not generic)?
       - Do weights sum to 1.0?
       - Are descriptions clear and measurable?
       - 1=Generic/broken rubrics, 2=Adequate, 3=Acceptable, 5=Excellent custom rubrics
    
    3. Threshold Appropriateness (weight: 0.20)
       - Are thresholds reasonable (typically 4.0/5.0)?
       - Higher for critical, lower for experimental?
       - 1=Wrong thresholds, 2=Standard applied, 3=Acceptable, 5=Context-appropriate
    
    4. Coverage Completeness (weight: 0.20)
       - Does every step have a Verification section?
       - Is the Verification Summary table present?
       - 1=Missing verifications, 2=Most covered, 3=Acceptable, 5=100% coverage
CRITICAL: Use the prompt exactly as is. Do not add anything else, including the output of the implementation agent!
Decision Logic:
  • PASS (score >=
    THRESHOLD
    ): Workflow complete, promote task
  • FAIL (score <
    THRESHOLD
    ): Re-launch Phase 6 with feedback
  • MAX_ITERATIONS reached: Complete workflow regardless of score (log warning)

模型:
opus
Agent:
sdd:qa-engineer
依赖条件: 阶段6完成
目的: 验证验证规则和阈值
启动评审:
  • 描述: "评审验证质量"
  • 提示:
    CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
    
    阅读@${CLAUDE_PLUGIN_ROOT}/prompts/judge.md获取评估方法并执行。
    
    ### 工件路径
    {阶段6添加验证后的任务文件路径}
    
    ### 上下文
    这是阶段6的输出:定义验证规则。工件应包含每个实施步骤的LLM-as-Judge验证部分,包括验证级别、自定义规则、阈值和验证摘要表。
    
    ### 规则
    1. 验证级别适配性(权重:0.30)
       - 验证级别是否与工件关键程度匹配?
       - 高关键程度→Panel,中等→Single/Per-Item,低/无→None?
       - 1=级别不匹配,2=大部分合适,3=可接受,5=精准校准
    
    2. 规则质量(权重:0.30)
       - 标准是否针对工件类型(而非通用)?
       - 权重总和是否为1.0?
       - 描述是否清晰可衡量?
       - 1=通用/无效规则,2=足够,3=可接受,5=优秀自定义规则
    
    3. 阈值适配性(权重:0.20)
       - 阈值是否合理(通常4.0/5.0)?
       - 关键任务阈值更高,实验性任务阈值更低?
       - 1=错误阈值,2=应用标准阈值,3=可接受,5=适配上下文
    
    4. 覆盖完整性(权重:0.20)
       - 每个步骤是否都有验证部分?
       - 是否有验证摘要表?
       - 1=缺失验证,2=大部分覆盖,3=可接受,5=100%覆盖
关键:严格使用上述提示,不要添加任何其他内容,包括实施Agent的输出!!!
决策逻辑:
  • 通过(分数 >=
    THRESHOLD
    ):工作流完成,升级任务
  • 不通过(分数 <
    THRESHOLD
    ):结合反馈重新启动阶段6
  • 达到最大迭代次数:无论分数如何,完成工作流(记录警告)
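The Rubric Quality criterion above asks whether a rubric's weights sum to 1.0. Because the weights are decimal fractions, an exact `==` comparison on floats can fail for sums that are mathematically correct, so a tolerance-based check is safer. The helper name `weights_sum_to_one` is hypothetical; the weights are the ones listed in the Judge 6 rubric.

```python
import math


def weights_sum_to_one(weights, tol=1e-9):
    """Return True when the weights sum to 1.0 within a small tolerance."""
    return math.isclose(sum(weights), 1.0, abs_tol=tol)


# Weights from the Judge 6 rubric: 0.30 + 0.30 + 0.20 + 0.20.
JUDGE_6_WEIGHTS = [0.30, 0.30, 0.20, 0.20]

print(weights_sum_to_one(JUDGE_6_WEIGHTS))
print(weights_sum_to_one([0.30, 0.30, 0.20]))  # one criterion missing
```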

Phase 7: Promote Task

阶段7:升级任务

Purpose: Move the refined task from draft to todo folder
After all phases complete:
  1. Move task file from draft to todo:
    bash
    git mv <TASK_FILE> .specs/tasks/todo/
    # Fallback if git not available: mv <TASK_FILE> .specs/tasks/todo/
  2. Update any references in research and analysis files if needed

目的: 将细化后的任务从draft目录移至todo目录
所有阶段完成后:
  1. 将任务文件从draft移至todo:
    bash
    git mv <TASK_FILE> .specs/tasks/todo/
    # 如果git不可用,备用方案:mv <TASK_FILE> .specs/tasks/todo/
  2. 更新调研和分析文件中的引用(如有需要)
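The promotion step tries `git mv` first and falls back to a plain move when git is unavailable. A scripted version might look like the sketch below. The function name `promote_task` is hypothetical, and treating any `git mv` failure (for example, an untracked file or a directory outside a repository) as a reason to fall back is an assumption that goes slightly beyond the "git not available" case named above.

```python
import shutil
import subprocess
from pathlib import Path


def promote_task(task_file, todo_dir):
    """Move a task file into todo_dir, preferring `git mv` so history is kept."""
    task_file = Path(task_file)
    todo_dir = Path(todo_dir)
    todo_dir.mkdir(parents=True, exist_ok=True)
    target = todo_dir / task_file.name
    try:
        subprocess.run(
            ["git", "mv", str(task_file), str(target)],
            check=True,
            capture_output=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        # Assumption: fall back to a plain move when git is missing or refuses.
        shutil.move(str(task_file), str(target))
    return target
```

For example, `promote_task(".specs/tasks/draft/add-validation.feature.md", ".specs/tasks/todo")` would return the new path under `todo/`.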

Completion

完成

After all executed phases and judges complete:
  1. Use the git tool to stage the task file, skill file, analysis file, and scratchpad files (only those that were created)
  2. Summarize the workflow results and output to user:
markdown
所有执行的阶段和评审完成后:
  1. 使用git工具暂存任务文件、skill文件、分析文件和scratchpad文件(仅创建的文件)
  2. 总结工作流结果并输出给用户:
markdown

Task Refined

任务已细化

| Property | Value |
|---|---|
| Original File | <original TASK_FILE path> |
| Final Location | .specs/tasks/todo/<filename> (ready for implementation) |
| Title | <task title> |
| Type | <feature/bug/refactor/test/docs/chore/ci> (from filename) |
| Skill | <skill file path or "Skipped"> |
| Skill Action | <Created new / Updated existing / Skipped> |
| Analysis | <analysis file path or "Skipped"> |
| Scratchpad | <scratchpad file path> |
| Implementation Steps | <count or "N/A"> |
| Parallelization Depth | <max parallel agents or "N/A"> |
| Total Verifications | <count or "N/A"> |
| 属性 | 值 |
|---|---|
| 原始文件 | <原始TASK_FILE路径> |
| 最终位置 | .specs/tasks/todo/<文件名> (可实施) |
| 标题 | <任务标题> |
| 类型 | <feature/bug/refactor/test/docs/chore/ci> (来自文件名) |
| Skill | <skill文件路径或"已跳过"> |
| Skill操作 | <新建/更新现有/已跳过> |
| 分析 | <分析文件路径或"已跳过"> |
| Scratchpad | <scratchpad文件路径> |
| 实施步骤 | <数量或"N/A"> |
| 并行化深度 | <最大并行Agent数量或"N/A"> |
| 总验证数 | <数量或"N/A"> |

Configuration Used

使用的配置

| Setting | Value |
|---|---|
| Target Quality | {THRESHOLD}/5.0 |
| Max Iterations | {MAX_ITERATIONS} |
| Active Stages | {ACTIVE_STAGES as comma-separated list} |
| Skipped Stages | {SKIP_STAGES or stages not in ACTIVE_STAGES} |
| Human Checkpoints | Phase {HUMAN_IN_THE_LOOP_PHASES as comma-separated} |
| Skip Judges | {SKIP_JUDGES} |
| Refine Mode | {REFINE_MODE} |
| 设置项 | 值 |
|---|---|
| 目标质量 | {THRESHOLD}/5.0 |
| 最大迭代次数 | {MAX_ITERATIONS} |
| 激活阶段 | {ACTIVE_STAGES逗号分隔列表} |
| 跳过的阶段 | {SKIP_STAGES或不在ACTIVE_STAGES中的阶段} |
| 人工检查点 | 阶段{HUMAN_IN_THE_LOOP_PHASES逗号分隔列表} |
| 跳过评审 | {SKIP_JUDGES} |
| 细化模式 | {REFINE_MODE} |

Quality Gates Summary

质量关卡摘要

| Phase | Judge Score | Verdict |
|---|---|---|
| Phase 2a: Research | X.X/5.0 | ✅ PASS / ⚠️ PROCEEDED (max iter) / ⏭️ SKIPPED |
| Phase 2b: Codebase Analysis | X.X/5.0 | ✅ PASS / ⚠️ PROCEEDED (max iter) / ⏭️ SKIPPED |
| Phase 2c: Business Analysis | X.X/5.0 | ✅ PASS / ⚠️ PROCEEDED (max iter) / ⏭️ SKIPPED |
| Phase 3: Architecture Synthesis | X.X/5.0 | ✅ PASS / ⚠️ PROCEEDED (max iter) / ⏭️ SKIPPED |
| Phase 4: Decomposition | X.X/5.0 | ✅ PASS / ⚠️ PROCEEDED (max iter) / ⏭️ SKIPPED |
| Phase 5: Parallelize | X.X/5.0 | ✅ PASS / ⚠️ PROCEEDED (max iter) / ⏭️ SKIPPED |
| Phase 6: Verify | X.X/5.0 | ✅ PASS / ⚠️ PROCEEDED (max iter) / ⏭️ SKIPPED |
Threshold Used: {THRESHOLD}/5.0 (or N/A if SKIP_JUDGES)
Legend:
  • ✅ PASS - Score >= THRESHOLD
  • ⚠️ PROCEEDED (max iter) - Score < THRESHOLD but MAX_ITERATIONS reached, proceeded anyway
  • ⏭️ SKIPPED - Stage not in ACTIVE_STAGES
| 阶段 | 评审分数 | 结论 |
|---|---|---|
| 阶段2a:调研 | X.X/5.0 | ✅ 通过 / ⚠️ 已推进(达到最大迭代次数) / ⏭️ 已跳过 |
| 阶段2b:代码库分析 | X.X/5.0 | ✅ 通过 / ⚠️ 已推进(达到最大迭代次数) / ⏭️ 已跳过 |
| 阶段2c:业务分析 | X.X/5.0 | ✅ 通过 / ⚠️ 已推进(达到最大迭代次数) / ⏭️ 已跳过 |
| 阶段3:架构合成 | X.X/5.0 | ✅ 通过 / ⚠️ 已推进(达到最大迭代次数) / ⏭️ 已跳过 |
| 阶段4:任务分解 | X.X/5.0 | ✅ 通过 / ⚠️ 已推进(达到最大迭代次数) / ⏭️ 已跳过 |
| 阶段5:并行化重组 | X.X/5.0 | ✅ 通过 / ⚠️ 已推进(达到最大迭代次数) / ⏭️ 已跳过 |
| 阶段6:验证 | X.X/5.0 | ✅ 通过 / ⚠️ 已推进(达到最大迭代次数) / ⏭️ 已跳过 |
使用的阈值: {THRESHOLD}/5.0(如果SKIP_JUDGES则为N/A)
图例:
  • ✅ 通过 - 分数 >= 阈值
  • ⚠️ 已推进(达到最大迭代次数) - 分数 < 阈值但已达到最大迭代次数,仍推进
  • ⏭️ 已跳过 - 阶段不在ACTIVE_STAGES中
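The legend above maps each stage to one of three verdicts. Since the summary is produced after the workflow has finished, a stage that ran and still scores below the threshold must have exhausted its iterations; that inference is an assumption of this sketch, as is the hypothetical function name `summary_verdict`. Emoji are left out of the return values to keep the strings easy to compare.

```python
def summary_verdict(score, threshold=3.5, stage_active=True):
    """Return the Quality Gates Summary verdict label for one stage."""
    if not stage_active:
        return "SKIPPED"  # stage not in ACTIVE_STAGES
    if score >= threshold:
        return "PASS"
    # Assumption: a finished stage below threshold hit MAX_ITERATIONS.
    return "PROCEEDED (max iter)"


print(summary_verdict(4.2))
print(summary_verdict(3.1))
print(summary_verdict(0.0, stage_active=False))
```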

Artifacts Generated

生成的工件


.claude/
└── skills/
    └── <skill-name>/
        └── SKILL.md             # Reusable skill document (if research stage ran)

.specs/
├── tasks/
│   ├── draft/                   # Draft tasks (source - now empty for this task)
│   ├── todo/
│   │   └── <name>.<type>.md     # Complete task specification (ready for implementation)
│   ├── in-progress/             # Tasks being implemented (empty)
│   └── done/                    # Completed tasks (empty)
├── analysis/
│   └── analysis-<name>.md       # Codebase impact analysis (if codebase analysis stage ran)
└── scratchpad/
    └── <hex-id>.md              # Architecture thinking scratchpad


Task Status Management

Task status is managed by folder location:
  • draft/ - Tasks created but not yet refined
  • todo/ - Tasks ready for implementation
  • in-progress/ - Tasks currently being worked on
  • done/ - Completed tasks
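
Because status lives in the folder name rather than in the file, reading or changing a task's status is a path operation. The sketch below illustrates that idea under the directory layout shown in this document. The helpers `task_status` and `promoted_path` are hypothetical names, and the sketch only computes paths; it does not move any files.

```python
from pathlib import PurePosixPath

# Status folders as listed in this document.
STATUS_FOLDERS = ("draft", "todo", "in-progress", "done")


def task_status(task_path: str) -> str:
    """Return the status implied by the folder that holds the task file."""
    folder = PurePosixPath(task_path).parent.name
    if folder not in STATUS_FOLDERS:
        raise ValueError(f"not inside a status folder: {task_path}")
    return folder


def promoted_path(task_path: str, new_status: str) -> str:
    """Compute where the task file would live after a status change."""
    if new_status not in STATUS_FOLDERS:
        raise ValueError(f"unknown status: {new_status}")
    path = PurePosixPath(task_path)
    # Swap only the status folder; keep the tasks root and the file name.
    return str(path.parent.parent / new_status / path.name)
```

Promoting `.specs/tasks/draft/add-validation.feature.md` to `todo` would yield `.specs/tasks/todo/add-validation.feature.md`, which matches the Promote step of the workflow.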

Next Steps

  1. Review task: .specs/tasks/todo/<filename>
    • Edit the task file directly to make corrections
    • Add // comments to lines that need clarification or changes
    • Run /plan again with --refine to incorporate your feedback. It detects changes against git and propagates updates top-to-bottom (editing a section only affects the sections below it, not those above)
  2. If everything is fine, begin implementation: /implement (will auto-select the task from todo/)

---

Error Handling

Phase Agent Failure (Exception/Crash)

If any phase agent fails unexpectedly:
  1. Report the failure together with the agent's output
  2. Ask the user clarifying questions that can help resolve the issue
  3. Launch the phase agent again, passing it the list of questions and answers

Judge Returns FAIL

If any judge returns FAIL (score < THRESHOLD):
  1. Automatic retry: Re-launch the phase agent with the judge's feedback
  2. Human-in-the-loop check: If the phase is in HUMAN_IN_THE_LOOP_PHASES, trigger a human checkpoint before the next judge retry (after the implementation retry but before re-judging)
  3. After MAX_ITERATIONS is reached: Proceed to the next stage automatically (do NOT ask the user unless --human-in-the-loop includes this phase)
  4. Log a warning in the completion summary: ⚠️ Phase X did not pass quality threshold (X.X/THRESHOLD) after MAX_ITERATIONS iterations

Retry Flow

Implementation → Judge FAIL → Implementation Retry → Judge Retry
                              PASS → Continue to next stage
                              FAIL → Repeat until MAX_ITERATIONS
                              MAX_ITERATIONS reached → Proceed to next stage (with warning)
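
The flow above amounts to a bounded implement-then-judge loop. Here is a minimal sketch of that loop, assuming the phase agent and judge can be modelled as plain callables. `run_phase`, `implement`, and `judge` are hypothetical names, and the defaults mirror the documented `--target-quality 3.5` and `--max-iterations 3`.

```python
from typing import Callable, Dict, Optional, Tuple


def run_phase(
    implement: Callable[[Optional[str]], str],
    judge: Callable[[str], Tuple[float, str]],
    threshold: float = 3.5,
    max_iterations: int = 3,
) -> Dict[str, object]:
    """Run implement/judge cycles until the judge passes or iterations run out."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    feedback: Optional[str] = None
    score = 0.0
    for iteration in range(1, max_iterations + 1):
        # The first attempt has no feedback; retries receive the judge's notes.
        artifact = implement(feedback)
        score, feedback = judge(artifact)
        if score >= threshold:
            return {"passed": True, "score": score,
                    "iterations": iteration, "warning": None}
    # Iterations exhausted: proceed anyway, but record a warning.
    warning = (f"Phase did not pass quality threshold "
               f"({score:.1f}/{threshold}) after {max_iterations} iterations")
    return {"passed": False, "score": score,
            "iterations": max_iterations, "warning": warning}
```

The caller continues to the next stage in both cases; the only difference is whether the returned warning ends up in the completion summary.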

Retry Flow with Human-in-the-Loop

When the phase is in HUMAN_IN_THE_LOOP_PHASES:
Implementation → Judge FAIL → Implementation Retry
                    🔍 Human Checkpoint (optional feedback)
                              Judge Retry
                    PASS → Continue | FAIL → Repeat until MAX_ITERATIONS
                              MAX_ITERATIONS → 🔍 Final Human Checkpoint
                                    User confirms → Proceed to next stage
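
The ordering in this variant matters: the checkpoint sits between an implementation retry and the re-judging, and a final checkpoint follows once iterations are exhausted. The sketch below records that ordering as a trace. It is an illustration only: `run_phase_with_checkpoints` and the `checkpoint` callable are hypothetical, and merging the human's note into the feedback for the next attempt is an assumption, since the document says only that feedback is optional.

```python
from typing import Callable, List, Optional, Tuple


def run_phase_with_checkpoints(
    implement: Callable[[Optional[str]], str],
    judge: Callable[[str], Tuple[float, str]],
    checkpoint: Callable[[str, str], Optional[str]],
    threshold: float = 3.5,
    max_iterations: int = 3,
) -> Tuple[bool, List[str]]:
    """Return (passed, trace), where trace lists the steps in execution order."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    trace: List[str] = []
    feedback: Optional[str] = None
    artifact = ""
    for iteration in range(1, max_iterations + 1):
        artifact = implement(feedback)
        trace.append("implement")
        human_note: Optional[str] = None
        if iteration > 1:
            # Retry path: pause for the human before the judge runs again.
            human_note = checkpoint("retry", artifact)
            trace.append("checkpoint")
        score, feedback = judge(artifact)
        trace.append("judge")
        if score >= threshold:
            return True, trace
        if human_note:
            # Assumption: human feedback is passed on to the next attempt.
            feedback = f"{feedback}\nHuman feedback: {human_note}"
    # Iterations exhausted: the user confirms before the workflow proceeds.
    checkpoint("final", artifact)
    trace.append("final-checkpoint")
    return False, trace
```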