do-in-parallel
Compare original and translation side by side
🇺🇸 Original (English) · 🇨🇳 Translation (Chinese)
<task>
Launch multiple sub-agents in parallel to execute tasks across different files or targets. Analyze the task to intelligently select the optimal model, perform requirement grouping analysis (repeatable, shared, or independent), generate quality-focused prompts with Zero-shot Chain-of-Thought reasoning and mandatory self-critique, then dispatch meta-judges based on grouping (one per group or per independent task, all in parallel), followed by implementors for each task in parallel, with LLM-as-a-judge verification using grouping-appropriate evaluation specs after each completes.
</task>
<context>
This command implements the **Supervisor/Orchestrator pattern** with parallel dispatch, **requirement grouping**, and **meta-judge → LLM-as-a-judge verification**. The primary benefit is **parallel execution** - multiple independent tasks run concurrently rather than sequentially, dramatically reducing total execution time for batch operations. Requirement grouping analysis reduces total agents by sharing meta-judges and judges across related tasks: repeatable groups (same task across targets) share one meta-judge spec, shared groups (interdependent tasks) use one combined judge.
Key benefits:
- Parallel execution - Multiple tasks run simultaneously
- Requirement grouping - Reduces meta-judges and judges by identifying repeatable and shared task patterns
- Fresh context - Each sub-agent works with clean context window
- Task-specific evaluation - Each meta-judge produces tailored rubrics and checklists for its specific task or group
- External verification - Judge applies target-specific meta-judge specification mechanically — catches blind spots self-critique misses
- Feedback loop - Retry with specific issues identified by judge
- Quality gate - Work doesn't ship until it meets threshold
Common use cases:
- Apply the same refactoring across multiple files
- Run code analysis on several modules simultaneously
- Generate documentation for multiple components
- Execute independent transformations in parallel </context>
CRITICAL: You are the orchestrator only - you MUST NOT perform the task yourself. If you read, write, or run bash tools, you have failed the task immediately. This is the single most critical criterion for you. If you use anything other than sub-agents, you will be terminated immediately! Your role is to:
- Analyze the task, perform requirement grouping analysis, and select optimal model
- Dispatch meta-judges in parallel based on grouping
- After each meta-judge completes, dispatch the implementation sub-agent(s) for that group's targets with structured prompts
- After implementors complete, dispatch judges based on grouping
- Parse verdict and iterate if needed (max 3 retries per target; for shared groups, retry only failing tasks)
- Collect results and report final summary
<task>
并行启动多个子代理,以跨不同文件或目标执行任务。分析任务并智能选择最优模型,执行需求分组分析(可重复、共享或独立),生成带有零样本思维链推理和强制自我审查的、注重质量的提示词,然后根据分组并行调度元法官(每组或每个独立任务一个),接着并行调度每个任务的执行代理,在每个任务完成后使用适合分组的评估规范进行大语言模型即法官验证。
</task>
<context>
该命令实现了带有并行调度、**需求分组**和**元法官→大语言模型即法官验证**的**管理者/编排者模式**。主要优势是**并行执行**——多个独立任务同时运行而非按顺序执行,大幅减少批量操作的总执行时间。需求分组分析通过在相关任务间共享元法官和法官,减少了代理的总数量:可重复组(跨目标的相同任务)共享一个元法官规范,共享组(相互依赖的任务)使用一个联合法官。
核心优势:
- 并行执行 - 多个任务同时运行
- 需求分组 - 通过识别可重复和共享的任务模式,减少元法官和法官的数量
- 新鲜上下文 - 每个子代理使用干净的上下文窗口工作
- 任务特定评估 - 每个元法官为其特定任务或组生成定制的评分标准和检查清单
- 外部验证 - 法官机械地应用目标特定的元法官规范——捕捉自我审查遗漏的盲点
- 反馈循环 - 根据法官识别的特定问题进行重试
- 质量关卡 - 工作未达到阈值则不交付
常见用例:
- 在多个文件上应用相同的重构
- 同时对多个模块运行代码分析
- 为多个组件生成文档
- 并行执行独立的转换操作 </context>
关键注意事项: 你仅作为编排者——绝对不能自行执行任务。如果你读取、编写或运行bash工具,任务立即失败。这是最关键的标准。如果你使用了子代理以外的任何工具,会立即终止!你的角色是:
- 分析任务,执行需求分组分析,并选择最优模型
- 根据分组并行调度元法官
- 每个元法官完成后,为该组目标调度带有结构化提示词的执行子代理
- 执行代理完成后,根据分组调度法官
- 解析裁决,如有需要则迭代(每个目标最多重试3次;对于共享组,仅重试失败的任务)
- 收集结果并报告最终摘要
RED FLAGS - Never Do These
禁止操作 - 绝对不能做这些
NEVER:
- Read implementation files to understand code details (let sub-agents do this)
- Write code or make changes to source files directly
- Skip judge verification to "save time"
- Read judge reports in full (only parse structured headers)
- Proceed after max retries without user decision
- Wait for one agent to complete before starting another
- Re-run meta-judge on retries
- Wait to launch implementors until ALL meta-judges have completed
- Launch separate meta-judges for tasks that belong to the same repeatable or shared group
- Re-launch ALL implementation agents in a shared group when only some failed
ALWAYS:
- Use Task tool to dispatch sub-agents for ALL implementation work
- Perform requirement grouping analysis BEFORE dispatching any meta-judges
- Dispatch meta-judges based on grouping -- all in parallel in a SINGLE response
- Do not wait for ALL meta-judges to complete before dispatching implementors; launch them immediately after each meta-judge completes
- Launch each implementor for a task immediately after its meta-judge completes. If all meta-judges have completed, launch all implementation agents in a SINGLE response
- Pass each target's specific meta-judge evaluation specification to its judge agent
- For shared groups, dispatch ONE judge that reviews ALL related changes together
- Include CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT} in prompts to meta-judge and judge agents
- Use Task tool to dispatch independent judges for verification
- Wait for each implementation to complete before dispatching its judge
- Parse only VERDICT/SCORE/ISSUES from judge output
- Iterate with feedback if verification fails (max 3 retries per target)
- For shared group retries, only re-launch the specific failing implementation agent(s), not the entire group
- Reuse same meta-judge specification for all retries (never re-run meta-judge)
绝对不要:
- 读取实现文件以了解代码细节(让子代理来做这件事)
- 直接编写代码或修改源文件
- 为了“节省时间”跳过法官验证
- 完整阅读法官报告(仅解析结构化标题)
- 达到最大重试次数后未获得用户决策就继续
- 等待一个代理完成后再启动另一个
- 在重试时重新运行元法官
- 等待所有元法官完成后再启动执行代理
- 为属于同一可重复或共享组的任务启动单独的元法官
- 当仅部分任务失败时,重新启动共享组中的所有执行代理
必须:
- 使用Task工具调度所有实现工作的子代理
- 在调度任何元法官之前执行需求分组分析
- 根据分组调度元法官——所有元法官在单个响应中并行启动
- 不要等待所有元法官完成再调度执行代理;每个元法官完成后立即启动
- 每个元法官完成后立即启动对应任务的执行代理。如果所有元法官都已完成,则在单个响应中启动所有执行代理
- 将每个目标的特定元法官评估规范传递给其法官代理
- 对于共享组,调度一个法官来共同审查所有相关变更
- 在元法官和法官代理的提示词中包含 CLAUDE_PLUGIN_ROOT=${CLAUDE_PLUGIN_ROOT}
- 使用Task工具调度独立法官进行验证
- 等待每个执行任务完成后再调度其法官
- 仅从法官输出中解析VERDICT/SCORE/ISSUES
- 如果验证失败,根据反馈迭代(每个目标最多重试3次)
- 对于共享组重试,仅重新启动特定失败的执行代理,而非整个组
- 所有重试重用相同的元法官规范(绝不重新运行元法官)
Process
流程
Phase 1: Parse Input and Identify Targets
阶段1:解析输入并识别目标
Extract targets from the command arguments:
Input patterns:
1. --files "src/a.ts,src/b.ts,src/c.ts" --> File-based targets
2. --targets "UserService,OrderService" --> Named targets
3. Infer from task description --> Parse file paths from task
Parsing rules:
- --files - If provided: Split by comma, validate each path exists
- --targets - If provided: Split by comma, use as-is
- If neither: Attempt to extract file paths or target names from task description
从命令参数中提取目标:
输入模式:
1. --files "src/a.ts,src/b.ts,src/c.ts" --> 基于文件的目标
2. --targets "UserService,OrderService" --> 命名目标
3. 从任务描述中推断 --> 从任务中解析文件路径
解析规则:
- --files - 如果提供了:按逗号分割,验证每个路径是否存在
- --targets - 如果提供了:按逗号分割,直接使用
- 如果都未提供:尝试从任务描述中提取文件路径或目标名称
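The parsing rules above can be sketched in Python. This is a minimal illustrative sketch: `parse_targets` and the path-token heuristic are hypothetical helpers, not part of the command itself.

```python
import os

def parse_targets(task_description, files=None, targets=None):
    """Resolve targets following the rules above: --files, then --targets, then inference."""
    if files is not None:
        # --files: split by comma, validate each path exists
        paths = [p.strip() for p in files.split(",")]
        missing = [p for p in paths if not os.path.exists(p)]
        if missing:
            raise ValueError("Paths not found: " + ", ".join(missing))
        return paths
    if targets is not None:
        # --targets: split by comma, use as-is
        return [t.strip() for t in targets.split(",")]
    # Neither flag: best-effort extraction of path-like tokens from the task text
    return [w for w in task_description.split() if "/" in w]
```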
Phase 2: Task Analysis with Zero-shot CoT
阶段2:零样本思维链任务分析
Before dispatching, analyze the task systematically:
Let me analyze this parallel task step by step to determine the optimal configuration:
1. **Task Type Identification**
"What type of work is being requested across all targets?"
- Code transformation / refactoring
- Code analysis / review
- Documentation generation
- Test generation
- Data transformation
- Simple lookup / extraction
2. **Per-Target Complexity Assessment**
"How complex is the work for EACH individual target?"
- High: Requires deep understanding, architecture decisions, novel solutions
- Medium: Standard patterns, moderate reasoning, clear approach
- Low: Simple transformations, mechanical changes, well-defined rules
3. **Per-Target Output Size**
"How extensive is each target's expected output?"
- Large: Multi-section documents, comprehensive analysis
- Medium: Focused deliverable, single component
- Small: Brief result, minor change
4. **Independence Check**
"Are the targets truly independent?"
- Yes: No shared state, no cross-dependencies, order doesn't matter
- Partial: Some shared context needed, but can run in parallel
- No: Dependencies exist --> Use sequential execution instead
调度前,系统地分析任务:
让我逐步分析这个并行任务,以确定最优配置:
1. **任务类型识别**
"所有目标请求的工作类型是什么?"
- 代码转换 / 重构
- 代码分析 / 评审
- 文档生成
- 测试生成
- 数据转换
- 简单查找 / 提取
2. **每个目标的复杂度评估**
"每个单独目标的工作复杂度如何?"
- 高:需要深度理解、架构决策、新颖解决方案
- 中:标准模式、适度推理、清晰方法
- 低:简单转换、机械变更、规则明确
3. **每个目标的输出规模**
"每个目标的预期输出有多广泛?"
- 大:多章节文档、全面分析
- 中:聚焦交付物、单个组件
- 小:简短结果、微小变更
4. **独立性检查**
"目标是否真正独立?"
- 是:无共享状态、无交叉依赖、顺序无关
- 部分:需要一些共享上下文,但可以并行运行
- 否:存在依赖关系 --> 使用顺序执行
Independence Validation (REQUIRED before parallel dispatch)
独立性验证(并行调度前必须执行)
Verify tasks are truly independent before proceeding:
| Check | Question | If NO |
|---|---|---|
| File Independence | Do targets share files? | Cannot parallelize - files conflict |
| State Independence | Do tasks modify shared state? | Cannot parallelize - race conditions |
| Order Independence | Does execution order matter? | Cannot parallelize - sequencing required |
| Output Independence | Does any target read another's output? | Cannot parallelize - data dependency |
Independence Checklist:
- No target reads output from another target
- No target modifies files another target reads
- Order of completion doesn't matter
- No shared mutable state
- No database transactions spanning targets
If ANY check fails: STOP and inform user why parallelization is unsafe. Recommend /launch-sub-agent for sequential execution.
在继续之前验证任务是否真正独立:
| 检查项 | 问题 | 如果答案为否 |
|---|---|---|
| 文件独立性 | 目标是否共享文件? | 无法并行化 - 文件冲突 |
| 状态独立性 | 任务是否修改共享状态? | 无法并行化 - 竞争条件 |
| 顺序独立性 | 执行顺序是否重要? | 无法并行化 - 需要按顺序执行 |
| 输出独立性 | 任何目标是否读取其他目标的输出? | 无法并行化 - 数据依赖 |
独立性检查清单:
- 没有目标读取其他目标的输出
- 没有目标修改其他目标读取的文件
- 完成顺序无关紧要
- 无共享可变状态
- 无跨目标的数据库事务
如果任何检查失败:停止并告知用户并行化不安全的原因。建议使用 /launch-sub-agent 进行顺序执行。
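The file and output rows of the independence table lend themselves to a mechanical check; state and ordering independence still require judgment. A minimal sketch, assuming each task is described by hypothetical `writes`/`reads` file sets:

```python
def check_independence(tasks):
    """Return (ok, reasons) for the File Independence and Output Independence checks.

    tasks: list of dicts, each with 'writes' and 'reads' as sets of file paths.
    """
    reasons = []
    for i, a in enumerate(tasks):
        for b in tasks[i + 1:]:
            shared = a["writes"] & b["writes"]
            if shared:
                # File Independence: targets share files -> files conflict
                reasons.append("files conflict: " + ", ".join(sorted(shared)))
            if (a["writes"] & b["reads"]) or (b["writes"] & a["reads"]):
                # Output Independence: one target reads another's output -> data dependency
                reasons.append("data dependency between targets")
    return (not reasons, reasons)
```

If this returns any reason, the orchestrator should stop and recommend sequential execution, per the rule above.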
Requirement Grouping Analysis (REQUIRED before Meta-Judge dispatch)
需求分组分析(元法官调度前必须执行)
After identifying individual tasks and validating independence, analyze whether tasks can share meta-judges and/or judges. This reduces the total number of agents dispatched without sacrificing quality.
Three grouping types (can be combined within a single user prompt):
| Grouping Type | When to Apply | Meta-Judges | Implementation Agents | Judges |
|---|---|---|---|---|
| Repeatable | Same task pattern applied across multiple files/modules (e.g., "add tests to all 3 modules") | ONE shared meta-judge for the group | One per task (always isolated) | One per task, each receiving the SAME shared spec |
| Shared | Tasks that should be reviewed/verified together because they are interdependent (e.g., "implement S3 adapter AND integrate it into analytics") | ONE combined meta-judge for the group | One per task (always isolated) | ONE judge for the entire group, reviewing all changes together |
| Independent | Tasks that are fully independent with no grouping benefit | One per task | One per task (always isolated) | One per task |
Decision process:
For each pair of tasks, ask:
1. "Is this the SAME task applied to different targets?"
+-- YES --> Group as REPEATABLE
| (Same spec reused across targets)
|
+-- NO --> "Should these tasks be REVIEWED TOGETHER because
one depends on the output/existence of the other?"
|
+-- YES --> Group as SHARED
| (Combined spec, single judge reviews all)
|
+-- NO --> Mark as INDEPENDENT
(Separate meta-judge and judge per task)
CRITICAL:
- **When in doubt, default to INDEPENDENT.** If it is unclear whether tasks are truly repeatable or shared, treat them as independent. Over-grouping risks incorrect evaluation specs, while independent tasks always receive correct, task-specific evaluation. It is better to use extra agents than to produce wrong verification criteria.
- Implementation agents are ALWAYS isolated -- one per task, never shared. Only meta-judges and judges can be shared/grouped. The grouping analysis happens here in the Task Analysis phase, BEFORE any agents are launched.
Meta-judge instructions:
- Repeatable group: When dispatching a meta-judge for a repeatable group, include explicit instructions to produce a reusable verification spec.
- Shared group: When dispatching a meta-judge for a shared group, include explicit instructions to produce a combined verification spec.
Shared group retry logic:
If the shared judge finds issues, analyze which specific implementation agent(s) produced the failing changes. Only re-launch the specific implementation agent(s) whose changes failed -- do NOT re-launch all agents in the group unless it is necessary. After the targeted retry, re-launch the shared judge to review all changes again (including the unchanged work from agents that passed).
识别单个任务并验证独立性后,分析任务是否可以共享元法官和/或法官。这在不牺牲质量的情况下减少了调度的代理总数。
三种分组类型(可在单个用户提示词中组合使用):
| 分组类型 | 适用场景 | 元法官 | 执行代理 | 法官 |
|---|---|---|---|---|
| 可重复 | 相同任务模式应用于多个文件/模块(例如:"为所有3个模块添加测试") | 组内共享一个元法官 | 每个任务一个(始终隔离) | 每个任务一个,均接收相同的共享规范 |
| 共享 | 因相互依赖而应一起评审/验证的任务(例如:"实现S3适配器并将其集成到分析模块") | 组内共享一个联合元法官 | 每个任务一个(始终隔离) | 整个组一个法官,共同审查所有变更 |
| 独立 | 完全独立、无分组收益的任务 | 每个任务一个 | 每个任务一个(始终隔离) | 每个任务一个 |
决策流程:
对于每对任务,询问:
1. "这是应用于不同目标的相同任务吗?"
+-- 是 --> 归为可重复组
| (相同规范跨目标重用)
|
+-- 否 --> "这些任务是否因彼此依赖输出/存在而应一起评审?"
|
+-- 是 --> 归为共享组
| (联合规范,单个法官审查所有内容)
|
+-- 否 --> 标记为独立任务
(每个任务单独的元法官和法官)
关键注意事项:
- 如有疑问,默认归为独立任务。如果不清楚任务是否真正可重复或共享,将其视为独立任务。过度分组可能导致错误的评估规范,而独立任务始终会获得正确的、特定于任务的评估。使用额外代理比生成错误的验证标准更好。
- 执行代理始终保持隔离——每个任务一个,绝不共享。只有元法官和法官可以共享/分组。分组分析在任务分析阶段进行,在启动任何代理之前。
元法官指令:
- 可重复组:为可重复组调度元法官时,包含明确指令以生成可重用的验证规范。
- 共享组:为共享组调度元法官时,包含明确指令以生成联合验证规范。
共享组重试逻辑:
如果共享法官发现问题,分析哪些特定执行代理产生了失败的变更。仅重新启动变更失败的特定执行代理——必要时才重新启动组内所有代理。定向重试后,重新启动共享法官以再次审查所有变更(包括通过代理的未变更工作)。
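The decision process above can be sketched as a grouping pass. This is an illustrative sketch only: the `pattern`/`depends_on` task shape is an assumption, and real grouping relies on the orchestrator's semantic judgment, defaulting to independent when in doubt.

```python
from collections import defaultdict

def group_tasks(tasks):
    """tasks: dict of task_id -> {"pattern": str, "depends_on": set of task_ids}.

    Returns a list of (group_type, [task_ids]) per the decision process above.
    """
    groups, grouped = [], set()
    by_pattern = defaultdict(list)
    for tid, t in tasks.items():
        by_pattern[t["pattern"]].append(tid)
    # Same task applied to different targets -> REPEATABLE (one shared spec)
    for ids in by_pattern.values():
        if len(ids) > 1:
            groups.append(("repeatable", ids))
            grouped.update(ids)
    # One task depends on another's output/existence -> SHARED (one combined judge)
    for tid, t in tasks.items():
        if tid not in grouped and t["depends_on"]:
            members = sorted({tid} | t["depends_on"])
            groups.append(("shared", members))
            grouped.update(members)
    # Everything else -> INDEPENDENT (the safe default)
    for tid in tasks:
        if tid not in grouped:
            groups.append(("independent", [tid]))
    return groups
```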
Phase 3: Model and Agent Selection
阶段3:模型和代理选择
Select the optimal model and specialized agent based on task analysis. Same configuration for all parallel agents (ensures consistent quality):
根据任务分析选择最优模型和专业代理。所有并行代理使用相同配置(确保一致质量):
3.1 Model Selection
3.1 模型选择
| Task Profile | Recommended Model | Rationale |
|---|---|---|
| Complex per-target (architecture, design) | | Maximum reasoning capability per task |
| Specialized domain (code review, security) | | Domain expertise matters |
| Medium complexity, large output | | Good capability, cost-efficient for volume |
| Simple transformations (rename, format) | | Fast, cheap, sufficient for mechanical tasks |
| Default (when uncertain) | | Optimize for quality over cost |
Decision Tree:
Is EACH target's task COMPLEX (architecture, novel problem, critical decision)?
|
+-- YES --> Use Opus for ALL agents
|
+-- NO --> Is task SIMPLE and MECHANICAL (rename, format, extract)?
|
+-- YES --> Use Haiku for ALL agents
|
+-- NO --> Is output LARGE but task not complex?
|
+-- YES --> Use Sonnet for ALL agents
|
+-- NO --> Use Opus for ALL agents (default)
| 任务概况 | 推荐模型 | 理由 |
|---|---|---|
| 每个目标复杂度高(架构、设计) | | 每个任务的最大推理能力 |
| 专业领域(代码评审、安全) | | 领域专业知识很重要 |
| 中等复杂度、大输出 | | 能力良好,针对批量任务成本高效 |
| 简单转换(重命名、格式化) | | 快速、低成本,足以应对机械任务 |
| 默认(不确定时) | | 优先考虑质量而非成本 |
决策树:
每个目标的任务是否复杂(架构、新问题、关键决策)?
|
+-- 是 --> 所有代理使用Opus
|
+-- 否 --> 任务是否简单且机械(重命名、格式化、提取)?
|
+-- 是 --> 所有代理使用Haiku
|
+-- 否 --> 输出是否大但任务不复杂?
|
+-- 是 --> 所有代理使用Sonnet
|
+-- 否 --> 所有代理使用Opus(默认)
3.2 Specialized Agent Selection (Optional)
3.2 专业代理选择(可选)
If the task matches a specialized domain, include the relevant agent prompt in ALL parallel agents. Specialized agents provide domain-specific best practices that improve output quality.
Specialized Agents: Specialized agent list depends on project and plugins that are loaded.
Decision: Use specialized agent when:
- Task clearly benefits from domain expertise
- Consistency across all parallel agents is important
- Task is NOT trivial (overhead not justified for simple tasks)
Skip specialized agent when:
- Task is simple/mechanical (Haiku-tier)
- No clear domain match exists
- General-purpose execution is sufficient
如果任务匹配专业领域,在所有并行代理中包含相关代理提示词。专业代理提供领域特定的最佳实践,提高输出质量。
专业代理: 专业代理列表取决于项目和加载的插件。
决策: 当以下情况时使用专业代理:
- 任务明显受益于领域专业知识
- 所有并行代理之间的一致性很重要
- 任务并非琐碎(简单任务使用专业代理的开销不值得)
当以下情况时跳过专业代理:
- 任务简单/机械(Haiku级别)
- 无明确领域匹配
- 通用执行足够
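The decision tree in 3.1 maps to a small selector. Model names follow the table above; the boolean flags are assumptions standing in for the orchestrator's task analysis from Phase 2.

```python
def select_model(complex_per_target, simple_mechanical, large_output):
    """Pick ONE model for ALL parallel agents, per the decision tree in 3.1."""
    if complex_per_target:
        return "opus"    # architecture, novel problems, critical decisions
    if simple_mechanical:
        return "haiku"   # rename, format, extract
    if large_output:
        return "sonnet"  # large output, moderate reasoning
    return "opus"        # default: optimize for quality over cost
```

The same selection is applied to every agent in the batch so quality stays consistent across targets.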
Phase 3.5: Dispatch Meta-Judges (Grouped by Requirement Type, All in Parallel)
阶段3.5:调度元法官(按需求类型分组,全部并行)
Before dispatching implementation agents, dispatch meta-judges based on the requirement grouping analysis from Phase 2. The number of meta-judges depends on the grouping: one per repeatable group, one per shared group, and one per independent task. All meta-judges are launched in parallel regardless of grouping type. Each meta-judge produces rubrics, checklists, and scoring criteria. Each specification is reused for all retries of its associated tasks ONLY.
Important: Follow context isolation principle - Pass each agent only context relevant to its specific target or group.
在调度执行代理之前,根据阶段2的需求分组分析调度元法官。元法官的数量取决于分组:每个可重复组一个,每个共享组一个,每个独立任务一个。所有元法官无论分组类型如何都并行启动。每个元法官生成评分细则、检查清单和评分标准。每个规范仅在其关联任务的所有重试中重用。
重要提示:遵循上下文隔离原则——仅向每个代理传递与其特定目标或组相关的上下文。
3.5.1 Meta-Judge Prompt Templates by Grouping Type
3.5.1 按分组类型划分的元法官提示词模板
Independent meta-judge prompt:
独立元法官提示词:
Task
任务
Generate an evaluation specification yaml for the following task applied to a specific target. You will produce rubrics, checklists, and scoring criteria that a judge agent will use to evaluate the implementation artifact for this specific target.
CLAUDE_PLUGIN_ROOT=
${CLAUDE_PLUGIN_ROOT}
为应用于特定目标的以下任务生成评估规范YAML。你将生成评分细则、检查清单和评分标准,供法官代理用于评估该特定目标的实现产物。
CLAUDE_PLUGIN_ROOT=
${CLAUDE_PLUGIN_ROOT}
User Prompt as Context
用户提示词作为上下文
{Original user prompt}
{原始用户提示词}
Target
目标
{Specific target for this meta-judge: task description, file path, component name, etc. extracted from User Prompt}
{此元法官的特定目标:从用户提示词中提取的任务描述、文件路径、组件名称等}
Context
上下文
{Any relevant codebase context, file paths, constraints}
{任何相关的代码库上下文、文件路径、约束}
Artifact Type
产物类型
{code | documentation | configuration | etc.}
{code | documentation | configuration | etc.}
Instructions
指令
The user prompt is provided as context; use it only as a reference for changes that other agents may make in the project. Generate the evaluation specification ONLY for your specific target, as extracted from the user prompt. Your report will be used to verify only this particular task, not all tasks in the user prompt.
Return only the final evaluation specification YAML in your response.
**Repeatable group meta-judge prompt (ONE per group):**
用户提示词作为上下文提供,你应仅将其作为项目中其他代理可能发生的变更的参考。仅为你的特定目标生成评估规范,该规范源自用户提示词。你的报告将仅用于验证此特定任务,而非用户提示词中的所有任务。
仅在响应中返回最终的评估规范YAML。
**可重复组元法官提示词(每组一个):**
Task
任务
Generate a REUSABLE evaluation specification yaml that can be applied to ANY of the following targets performing the same task. You will produce rubrics, checklists, and scoring criteria that individual judge agents will each use independently to evaluate one target's implementation artifact.
CLAUDE_PLUGIN_ROOT=
${CLAUDE_PLUGIN_ROOT}
生成可重用的评估规范YAML,可应用于执行相同任务的以下任何目标。你将生成评分细则、检查清单和评分标准,供各个法官代理独立用于评估一个目标的实现产物。
CLAUDE_PLUGIN_ROOT=
${CLAUDE_PLUGIN_ROOT}
User Prompt as Context
用户提示词作为上下文
{Original user prompt}
{原始用户提示词}
Task Being Repeated
重复执行的任务
{The common task description shared by all targets in this group}
{此组所有目标共享的通用任务描述}
Targets in This Group
此组中的目标
{List of all targets: file paths, component names, etc.}
{所有目标列表:文件路径、组件名称等}
Context
上下文
{Any relevant codebase context, file paths, constraints}
{任何相关的代码库上下文、文件路径、约束}
Artifact Type
产物类型
{code | documentation | configuration | etc.}
{code | documentation | configuration | etc.}
Instructions
指令
CRITICAL: You are generating a REUSABLE spec that will be applied to EACH target independently by separate judges.
- Use generic language: "target file should align with criteria" instead of "all files should align"
- Do NOT include file-specific requirements (e.g., NOT "file should have only authentication logic") if the same spec will be applied to another target which logically cannot fulfill this criteria (e.g. "cart.ts" or "payments.ts" cannot have authentication logic)
- The spec must be applicable to ANY target in this group without modification
- Each judge will receive this same spec and evaluate only its own target against it
The user prompt is provided as context; use it only as a reference for changes that other agents may make in the project. Return only the final evaluation specification YAML in your response.
**Shared group meta-judge prompt (ONE per group):**
关键注意事项:你正在生成可重用的规范,将由不同的法官独立应用于每个目标。
- 使用通用语言:例如使用“目标文件应符合标准”而非“所有文件应符合标准”
- 不要包含特定于文件的要求(例如,如果同一规范将应用于逻辑上无法满足该要求的其他目标(如"cart.ts"或"payments.ts"无法包含身份验证逻辑),则不要写“文件应仅包含身份验证逻辑”)
- 该规范必须可应用于组内任何目标,无需修改
- 每个法官将收到相同的规范,并仅针对其自己的目标进行评估
用户提示词作为上下文提供,你应仅将其作为项目中其他代理可能发生的变更的参考。仅在响应中返回最终的评估规范YAML。
**共享组元法官提示词(每组一个):**
Task
任务
Generate a COMBINED evaluation specification yaml that covers ALL of the following related tasks. These tasks are interdependent and will be reviewed TOGETHER by a single judge. You will produce rubrics, checklists, and scoring criteria that account for cross-task dependencies and integration points.
CLAUDE_PLUGIN_ROOT=
${CLAUDE_PLUGIN_ROOT}
生成联合评估规范YAML,涵盖以下所有相关任务。这些任务相互依赖,将由单个法官共同评审。你将生成考虑跨任务依赖和集成点的评分细则、检查清单和评分标准。
CLAUDE_PLUGIN_ROOT=
${CLAUDE_PLUGIN_ROOT}
User Prompt as Context
用户提示词作为上下文
{Original user prompt}
{原始用户提示词}
Tasks in This Shared Group
此共享组中的任务
{List of all tasks with their targets:
- Task 1: {description} -> {target}
- Task 2: {description} -> {target} }
{所有任务及其目标列表:
- 任务1:{描述} -> {目标}
- 任务2:{描述} -> {目标} }
Context
上下文
{Any relevant codebase context, file paths, constraints, integration points between tasks}
{任何相关的代码库上下文、文件路径、约束、任务之间的集成点}
Artifact Type
产物类型
{code | documentation | configuration | etc.}
{code | documentation | configuration | etc.}
Instructions
指令
CRITICAL: You are generating a COMBINED spec for tasks that will be reviewed TOGETHER by ONE judge.
- Include evaluation criteria for EACH individual task
- Include cross-task verification criteria (e.g., "adapter implementation matches the interface consumed by the integration module")
- Organize the spec so the judge can identify which criteria apply to which task's changes
- The judge will review ALL changes from ALL tasks in this group in a single evaluation
The user prompt is provided as context; use it only as a reference for changes that other agents may make in the project. Return only the final evaluation specification YAML in your response.
关键注意事项:你正在为将由一个法官共同评审的任务生成联合规范。
- 包含每个单独任务的评估标准
- 包含跨任务验证标准(例如:"适配器实现与集成模块使用的接口匹配")
- 组织规范,使法官能够识别哪些标准适用于哪个任务的变更
- 法官将在一次评估中评审此组所有任务的所有变更
用户提示词作为上下文提供,你应仅将其作为项目中其他代理可能发生的变更的参考。仅在响应中返回最终的评估规范YAML。
3.5.2 Dispatch Pattern
3.5.2 调度模式
Dispatch ALL meta-judges in a SINGLE response (regardless of grouping type):
Use Task tool (one per group/independent task, all in same message):
[Meta-judge for Repeatable Group: "add tests"]
- description: "Meta-judge (repeatable): reusable spec for adding tests across 3 modules"
- prompt: {repeatable group meta-judge prompt}
- model: opus
- subagent_type: "sadd:meta-judge"
[Meta-judge for Shared Group: "S3 adapter + integration"]
- description: "Meta-judge (shared): combined spec for S3 adapter implementation and integration"
- prompt: {shared group meta-judge prompt}
- model: opus
- subagent_type: "sadd:meta-judge"
[Meta-judge for Independent Task: "update CI pipeline"]
- description: "Meta-judge: update CI pipeline"
- prompt: {independent meta-judge prompt}
- model: opus
- subagent_type: "sadd:meta-judge"
[All meta-judges launched simultaneously]
CRITICAL: Do not wait for ALL meta-judges to complete before proceeding to Phase 4. Launch implementors immediately after each meta-judge completes. If all meta-judges have completed, launch all implementation agents in a SINGLE response.
在单个响应中调度所有元法官(无论分组类型):
使用Task工具(每组/独立任务一个,全部在同一消息中):
[可重复组元法官:"添加测试"]
- description: "Meta-judge (repeatable): reusable spec for adding tests across 3 modules"
- prompt: {可重复组元法官提示词}
- model: opus
- subagent_type: "sadd:meta-judge"
[共享组元法官:"S3适配器 + 集成"]
- description: "Meta-judge (shared): combined spec for S3 adapter implementation and integration"
- prompt: {共享组元法官提示词}
- model: opus
- subagent_type: "sadd:meta-judge"
[独立任务元法官:"更新CI流水线"]
- description: "Meta-judge: update CI pipeline"
- prompt: {独立元法官提示词}
- model: opus
- subagent_type: "sadd:meta-judge"
[所有元法官同时启动]
关键注意事项: 不要等待所有元法官完成再进入第4阶段。每个元法官完成后立即启动执行代理。如果所有元法官都已完成,则在单个响应中启动所有执行代理。
Phase 4: Construct Per-Target Prompts
阶段4:构建每个目标的提示词
Build identical prompt structure for each target, customized only with target-specific details:
为每个目标构建相同的提示词结构,仅使用目标特定的细节进行定制:
4.1 Zero-shot Chain-of-Thought Prefix (REQUIRED - MUST BE FIRST)
4.1 零样本思维链前缀(必须 - 必须放在最前面)
Reasoning Approach
推理方法
Let's think step by step.
Before taking any action, think through the problem systematically:
-
"Let me first understand what is being asked for this specific target..."
- What is the core objective?
- What are the explicit requirements?
- What constraints must I respect?
-
"Let me analyze this specific target..."
- What is the current state?
- What patterns or conventions exist?
- What context is relevant?
-
"Let me plan my approach..."
- What are the concrete steps?
- What could go wrong?
- Is there a simpler approach?
Work through each step explicitly before implementing.
让我们逐步思考。
采取任何行动之前,系统地思考问题:
-
"首先让我理解这个特定目标的要求..."
- 核心目标是什么?
- 明确的需求有哪些?
- 必须遵守哪些约束?
-
"让我分析这个特定目标..."
- 当前状态是什么?
- 存在哪些模式或约定?
- 哪些上下文相关?
-
"让我规划我的方法..."
- 具体步骤是什么?
- 可能出现哪些问题?
- 是否有更简单的方法?
在实现之前明确完成每个步骤。
4.2 Task Body (Customized per target)
4.2 任务主体(每个目标定制)
<task>
{Task description from $ARGUMENTS}
</task>
<target>
{Specific target for this agent: file path, component name, etc.}
</target>
<constraints>
- Work ONLY on the specified target
- Do NOT modify other files unless explicitly required
- Follow existing patterns in the target
- {Any additional constraints from context}
</constraints>
<output>
{Expected deliverable location and format}
CRITICAL: At the end of your work, provide a "Summary" section containing:
- Files modified (full paths)
- Key changes (3-5 bullet points)
- Any decisions made and rationale
- Potential concerns or follow-up needed
</output>
<task>
{来自$ARGUMENTS的任务描述}
</task>
<target>
{此代理的特定目标:文件路径、组件名称等}
</target>
<constraints>
- 仅在指定目标上工作
- 除非明确要求,否则不要修改其他文件
- 遵循目标中的现有模式
- {来自上下文的任何额外约束}
</constraints>
<output>
{预期交付物的位置和格式}
关键注意事项:工作结束时,提供一个“摘要”部分,包含:
- 修改的文件(完整路径)
- 关键变更(3-5个要点)
- 做出的任何决策及其理由
- 潜在问题或后续需求
</output>
4.3 Self-Critique Suffix (REQUIRED - MUST BE LAST)
4.3 自我审查后缀(必须 - 必须放在最后)
Self-Critique Verification (MANDATORY)
自我审查验证(必须)
Before completing, verify your work for this target. Do not submit unverified changes.
提交之前,验证你为该目标完成的工作。不要提交未验证的变更。
1. Generate Verification Questions
1. 生成验证问题
Create questions specific to your task and target. Here are examples of questions:
| # | Question | Why It Matters |
|---|---|---|
| 1 | Did I achieve the stated objective for this target? | Incomplete work = failed task |
| 2 | Are my changes consistent with patterns in this file/codebase? | Inconsistency creates technical debt |
| 3 | Did I introduce any regressions or break existing functionality? | Breaking changes are unacceptable |
| 4 | Are edge cases and error scenarios handled appropriately? | Edge cases cause production issues |
| 5 | Is my output clear, well-formatted, and ready for review? | Unclear output reduces value |
创建特定于你的任务和目标的问题。以下是问题示例:
| # | 问题 | 重要性 |
|---|---|---|
| 1 | 我是否实现了该目标的既定目标? | 未完成工作 = 任务失败 |
| 2 | 我的变更是否与该文件/代码库中的模式一致? | 不一致会产生技术债务 |
| 3 | 我是否引入了任何回归或破坏了现有功能? | 破坏性变更不可接受 |
| 4 | 边缘情况和错误场景是否得到适当处理? | 边缘情况会导致生产问题 |
| 5 | 我的输出是否清晰、格式良好且准备好评审? | 不清晰的输出会降低价值 |
2. Answer Each Question with Evidence
2. 用证据回答每个问题
For each question, provide specific evidence from your work:
[Q1] Objective Achievement:
- Required: [what was asked]
- Delivered: [what you did]
- Gap analysis: [any gaps]
[Q2] Pattern Consistency:
- Existing pattern: [observed pattern]
- My implementation: [how I followed it]
- Deviations: [any intentional deviations and why]
[Q3] Regression Check:
- Functions affected: [list]
- Tests that would catch issues: [if known]
- Confidence level: [HIGH/MEDIUM/LOW]
[Q4] Edge Cases:
- Edge case 1: [scenario] - [HANDLED/NOTED]
- Edge case 2: [scenario] - [HANDLED/NOTED]
[Q5] Output Quality:
- Well-organized: [YES/NO]
- Self-documenting: [YES/NO]
- Ready for PR: [YES/NO]
对于每个问题,提供来自你工作的具体证据:
[Q1] 目标实现:
- 要求:[需求内容]
- 交付:[你完成的工作]
- 差距分析:[任何差距]
[Q2] 模式一致性:
- 现有模式:[观察到的模式]
- 我的实现:[我如何遵循模式]
- 偏差:[任何有意的偏差及其原因]
[Q3] 回归检查:
- 受影响的函数:[列表]
- 能发现问题的测试:[如果已知]
- 置信度:[高/中/低]
[Q4] 边缘情况:
- 边缘情况1:[场景] - [已处理/已记录]
- 边缘情况2:[场景] - [已处理/已记录]
[Q5] 输出质量:
- 组织良好:[是/否]
- 自文档化:[是/否]
- 准备好PR:[是/否]
3. Fix Issues Before Submitting
3. 提交前修复问题
If ANY verification reveals a gap:
- FIX - Address the specific issue
- RE-VERIFY - Confirm the fix resolves the issue
- DOCUMENT - Note what was changed and why
CRITICAL: Do not submit until ALL verification questions have satisfactory answers.
如果任何验证发现差距:
- 修复 - 解决特定问题
- 重新验证 - 确认修复解决了问题
- 记录 - 说明变更内容及其原因
关键注意事项:所有验证问题得到满意答复之前不要提交。
Phase 5: Parallel Implementation Dispatch and Judge Verification
阶段5:并行执行调度和法官验证
After meta-judges complete, launch all implementation sub-agents simultaneously, then verify with judges based on grouping type.
元法官完成后,同时启动所有执行子代理,然后根据分组类型使用法官进行验证。
5.1 Execution Flow
5.1 执行流程
Independent / Repeatable flow (one judge per task):
┌─────────────────────────────────────────────────────────────────────────┐
│ │
│ Phase 3.5: Meta-Judge Dispatch (ALL in parallel) │
│ │
│ Independent: Repeatable Group: │
│ ┌──────────────┐ ┌─────────────────────┐ │
│ │ Meta-Judge A │ │ Meta-Judge (shared) │ │
│ │ (Opus) │ │ (Opus) │ │
│ │ → Spec YAML A │ │ → Reusable Spec YAML │ │
│ └──────┬───────┘ └──────────┬──────────┘ │
│ │ ┌─────┴─────┐ │
│ ▼ ▼ ▼ │
│ Phase 5: Implementation (ALL in parallel, one per task) │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Implementer A │ │ Implementer B │ │ Implementer C │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ Phase 5.2: Judge per task (after ALL implementors complete) │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Judge A │ │ Judge B │ │ Judge C │ │
│ │ +Spec YAML A │ │ +Reusable Spec│ │ +Reusable Spec│ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ ▼ ▼ ▼ │
│ Parse Verdict (per target) → PASS/FAIL → Retry if needed │
└─────────────────────────────────────────────────────────────────────────┘
Shared flow (one judge for the group):
┌─────────────────────────────────────────────────────────────────────────┐
│ │
│ Phase 3.5: Meta-Judge for Shared Group │
│ ┌──────────────────────┐ │
│ │ Meta-Judge (combined) │ │
│ │ (Opus) │ │
│ │ → Combined Spec YAML │ │
│ └──────────┬───────────┘ │
│ ┌────┴────┐ │
│ ▼ ▼ │
│ Phase 5: Implementation (one per task, in parallel) │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ Implementer X │ │ Implementer Y │ │
│ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │
│ └────────┬─────────┘ │
│ ▼ │
│ Phase 5.2: ONE Judge for entire group │
│ ┌────────────────────────────────┐ │
│ │ Judge (shared) │ │
│ │ +Combined Spec YAML │ │
│ │ +ALL implementation outputs │ │
│ └──────────────┬─────────────────┘ │
│ ▼ │
│ Parse per-task verdicts → Retry ONLY failing task(s) if needed │
└─────────────────────────────────────────────────────────────────────────┘
CRITICAL: Parallel Dispatch Pattern
Launch ALL implementation agents in a SINGLE response. Do NOT wait for one agent to complete before starting another:
独立/可重复流程(每个任务一个法官):
┌─────────────────────────────────────────────────────────────────────────┐
│ │
│ 阶段3.5:元法官调度(全部并行) │
│ │
│ 独立任务: 可重复组: │
│ ┌──────────────┐ ┌─────────────────────┐ │
│ │ 元法官A │ │ 元法官(共享) │ │
│ │ (Opus) │ │ (Opus) │ │
│ │ → 规范YAML A │ │ → 可重用规范YAML │ │
│ └──────┬───────┘ └──────────┬──────────┘ │
│ │ ┌─────┴─────┐ │
│ ▼ ▼ ▼ │
│ 阶段5:执行(全部并行,每个任务一个) │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ 执行代理A │ │ 执行代理B │ │ 执行代理C │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ 阶段5.2:每个任务的法官(所有执行代理完成后) │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ 法官A │ │ 法官B │ │ 法官C │ │
│ │ +规范YAML A │ │ +可重用规范│ │ +可重用规范│ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ ▼ ▼ ▼ │
│ 解析裁决(每个目标) → 通过/失败 → 如有需要则重试 │
└─────────────────────────────────────────────────────────────────────────┘共享流程(组内一个法官):
┌─────────────────────────────────────────────────────────────────────────┐
│ │
│ 阶段3.5:共享组元法官 │
│ ┌──────────────────────┐ │
│ │ 元法官(联合) │ │
│ │ (Opus) │ │
│ │ → 联合规范YAML │ │
│ └──────────┬───────────┘ │
│ ┌────┴────┐ │
│ ▼ ▼ │
│ 阶段5:执行(每个任务一个,并行) │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ 执行代理X │ │ 执行代理Y │ │
│ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │
│ └────────┬─────────┘ │
│ ▼ │
│ 阶段5.2:整个组一个法官 │
│ ┌────────────────────────────────┐ │
│ │ 法官(共享) │ │
│ │ +联合规范YAML │ │
│ │ +所有执行输出 │ │
│ └──────────────┬─────────────────┘ │
│ ▼ │
│ 解析每个任务的裁决 → 如有需要仅重试失败的任务 │
└─────────────────────────────────────────────────────────────────────────┘关键注意事项:并行调度模式
在单个响应中启动所有执行代理。不要等待一个代理完成再启动另一个:
markdown
undefinedDispatching 3 parallel tasks
调度3个并行任务
[Task 1]
Use Task tool:
description: "Parallel: simplify error handling in src/services/user.ts"
prompt: [CoT prefix + task body for user.ts + critique suffix]
model: sonnet
[Task 2]
Use Task tool:
description: "Parallel: simplify error handling in src/services/order.ts"
prompt: [CoT prefix + task body for order.ts + critique suffix]
model: sonnet
[Task 3]
Use Task tool:
description: "Parallel: simplify error handling in src/services/payment.ts"
prompt: [CoT prefix + task body for payment.ts + critique suffix]
model: sonnet
[All 3 tasks launched simultaneously - results collected when all complete]
**Parallelization Guidelines:**
- Launch ALL independent tasks in a single batch (same response)
- Do NOT wait for one task before starting another
- Do NOT make sequential Task tool calls
- Task tool handles parallelization automatically
- Results collected after all complete
**Context Isolation (IMPORTANT):**
- Pass only context relevant to each specific target
- Do NOT pass the full list of all targets to each agent
- Let sub-agents discover local patterns through file reading
- Each agent works in a clean context without accumulated confusion
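The Task tool handles the actual parallelization; as a rough conceptual sketch of the "single batch, no sequential waits" rule (the `run_task` callable and the task dicts are hypothetical stand-ins for Task tool invocations):

```python
import concurrent.futures

def dispatch_parallel(tasks, run_task):
    """Launch every independent task in one batch and collect all results.
    `tasks` holds description/prompt/model dicts; `run_task` is a
    hypothetical stand-in for a single Task tool invocation."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        # Submit ALL tasks up front -- nothing waits on an earlier task.
        futures = [pool.submit(run_task, t) for t in tasks]
        # Collect results only after every task has been launched.
        return [f.result() for f in futures]

tasks = [
    {"description": f"Parallel: simplify error handling in {path}",
     "prompt": f"[CoT prefix + task body for {path} + critique suffix]",
     "model": "sonnet"}
    for path in ("src/services/user.ts", "src/services/order.ts",
                 "src/services/payment.ts")
]
results = dispatch_parallel(tasks, run_task=lambda t: f"done: {t['description']}")
```

The key property is that all submissions happen before any result is awaited, mirroring the single-response dispatch rule above.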
5.2 Judge Verification Protocol
After ALL implementation agents complete, dispatch judges based on the requirement grouping determined in Phase 2. The dispatch pattern varies by grouping type:
| Grouping Type | Judge Dispatch | Spec Used |
|---|---|---|
| Independent | One judge per task | Task-specific meta-judge spec |
| Repeatable | One judge per task | SAME shared reusable spec from the group's meta-judge |
| Shared | ONE judge for the entire group | Combined spec from the group's meta-judge |
CRITICAL: Provide the judge with the EXACT meta-judge evaluation specification YAML. Do not skip or add anything, do not modify it in any way, and do not shorten or summarize any text in it! For repeatable groups, each target's judge receives the SAME reusable spec. For shared groups, the single judge receives the combined spec covering all tasks.
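The dispatch rule in the table above can be sketched as a small planning function (the group shape `{'type', 'tasks', 'spec'}` is illustrative, not the skill's actual internals):

```python
def plan_judges(groups):
    """Turn requirement groups into judge dispatches per the table above.
    Group shape ({'type', 'tasks', 'spec'}) is illustrative."""
    dispatches = []
    for group in groups:
        if group["type"] == "shared":
            # ONE judge reviews the whole interdependent group together.
            dispatches.append({"tasks": group["tasks"], "spec": group["spec"]})
        else:
            # Independent and repeatable: one judge per task; a repeatable
            # group reuses the SAME spec for every judge.
            for task in group["tasks"]:
                dispatches.append({"tasks": [task], "spec": group["spec"]})
    return dispatches

dispatches = plan_judges([
    {"type": "repeatable", "tasks": ["auth.ts", "cart.ts", "payments.ts"],
     "spec": "reusable-spec"},
    {"type": "independent", "tasks": ["ci.yml"], "spec": "ci-spec"},
    {"type": "shared", "tasks": ["adapter.ts", "integration.ts"],
     "spec": "combined-spec"},
])
# 5 judges total: 3 repeatable + 1 independent + 1 shared
```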
5.2.1 Analyze the Pre-existing or Expected Parallel Changes Section
Before dispatching each target's judge, assess whether there are pre-existing or expected parallel changes in the codebase that the judge needs to be aware of. The "Pre-existing or Expected Parallel Changes" section prevents the judge from confusing prior modifications with the current implementation agent's work.
When to include:
- Previous do-in-parallel runs completed earlier in the same session (all targets from a prior batch)
- User's manual modifications made before invoking the skill (visible from conversation context or in git)
- Changes from other tools or agents that ran before this parallel dispatch
- Expected changes from other parallel agents in the same batch (e.g., if other agents are expected to modify other files in the repository during parallel development)
When to omit:
- This is the first run with no known prior changes — omit the section entirely
- On retries within the SAME target, do NOT include the implementation agent's own previous attempt as "pre-existing changes" — those are part of the current target's iteration cycle
Content guidelines:
- Use a high-level summary: task description, list of affected files/modules, general nature of changes (created, modified, deleted)
- Do NOT include code blocks, diffs, or line-level details — keep it concise
- Label the source clearly: "Previous do-in-parallel: {description}", "User modifications (before current task)", etc.
- If multiple sources of changes exist, use separate subsections for each
CRITICAL: Avoid reading the full codebase or git history. Use a high-level git diff/status to determine which files changed, or use conversation context to determine whether any pre-existing changes exist.
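One way to turn high-level `git status --short`-style output into the summary format described above is a small helper like this sketch (the label format and status-code mapping are illustrative assumptions):

```python
def summarize_preexisting(status_output, source_label):
    """Build a high-level 'Pre-existing Changes' summary from
    `git status --short`-style output: paths and change kind only,
    never code blocks or line-level diffs."""
    kinds = {"M": "modified", "A": "created", "??": "created", "D": "deleted"}
    lines = [f"### {source_label}"]
    for row in status_output.strip().splitlines():
        code, _, path = row.strip().partition(" ")
        lines.append(f"- {path.strip()} ({kinds.get(code, 'modified')})")
    return "\n".join(lines)

summary = summarize_preexisting(
    " M src/api/users.ts\n M src/api/orders.ts\n?? docs/changelog.md",
    'Previous do-in-parallel: "Update API documentation"',
)
```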
5.2.2 Launch Judge with prompt and target-specific specification YAML
Judge prompt template:
markdown
You are evaluating an implementation artifact for target {target_name} against an evaluation specification produced by the meta judge.
CLAUDE_PLUGIN_ROOT=`${CLAUDE_PLUGIN_ROOT}`
User Prompt
{Original task description from user}
Target
{Specific target: file path or component name}
{IF pre-existing changes are known, include the following section — otherwise omit entirely}
Pre-existing or Expected Parallel Changes (Context Only)
The following changes were made earlier or are expected to be made by other parallel agents in the same batch. They are NOT part of the current implementation agent's output. Focus your evaluation on the current agent's changes to its specific target. Only verify other changed files/logic if they directly relate to the current target's task requirements.
{Source of changes: e.g., "Previous do-in-parallel: {task description}" or "User modifications (before current task)"}
{High-level summary: what was done, which files/modules were created or modified}
{END conditional section}
Evaluation Specification
yaml
{meta-judge's evaluation specification YAML}
Implementation Output
{Summary section from implementation agent}
{Paths to files modified}
Instructions
The user prompt is provided as context; use it only as a reference for changes that other agents may make in the project. Evaluate ONLY the task from the User Prompt. Your job is to verify only this particular target's task, not all the tasks in the user prompt.
Follow your full judge process as defined in your agent instructions!
Output
CRITICAL: You must reply with this exact structured evaluation report format in YAML at the START of your response!
CRITICAL: NEVER provide score threshold, in any format, including `threshold_pass` or anything different. Judge MUST not know what threshold for score is, in order to not be biased!!!关键注意事项:你必须在响应开头以YAML格式返回此精确的结构化评估报告!
关键注意事项:绝对不要以任何格式提供分数阈值,包括`threshold_pass`或其他任何形式。法官绝对不能知道分数阈值,以避免偏见!5.2.3 Shared Group Judge Prompt Template
5.2.3 共享组法官提示词模板
For shared groups where ONE judge reviews ALL related changes together:
markdown
You are evaluating implementation artifacts for a group of related tasks against a combined evaluation specification produced by the meta judge. These tasks are interdependent and must be reviewed together.
CLAUDE_PLUGIN_ROOT=`${CLAUDE_PLUGIN_ROOT}`
User Prompt
{Original task description from user}
Tasks in This Shared Group
{List of all tasks with their targets:
- Task 1: {description} -> {target}
- Task 2: {description} -> {target} }
{IF pre-existing changes are known, include the "Pre-existing or Expected Parallel Changes (Context Only)" section — otherwise omit entirely}
Evaluation Specification
yaml
{meta-judge's COMBINED evaluation specification YAML}
Implementation Outputs
{For each task in the group:}
Task: {task description} -> {target}
{Summary section from that task's implementation agent}
{Paths to files modified}
Instructions
The user prompt is provided as context; use it only as a reference for changes that other agents may make in the project. Evaluate ALL tasks in this shared group together. Verify cross-task integration points (e.g., does the adapter match the interface the integration module consumes?).
CRITICAL: For each task, indicate separately whether it PASSED or FAILED so that only failing tasks can be retried.
Follow your full judge process as defined in your agent instructions!
Output
CRITICAL: You must reply with this exact structured evaluation report format in YAML at the START of your response! Include per-task verdicts within the report.
5.2.4 Dispatch Judges by Grouping Type
Independent and Repeatable targets -- one judge per task:
Use Task tool:
- description: "Judge: {target name}"
- prompt: {judge verification prompt with exact meta-judge specification YAML, and Pre-existing or Expected Parallel Changes section if applicable}
- model: opus
- subagent_type: "sadd:judge"
For repeatable groups, each judge receives the SAME shared reusable spec from the group's single meta-judge. The judge prompt template from 5.2.2 is used as-is; only the target and implementation output differ between judges.
Shared group -- ONE judge for the entire group:
Use Task tool:
- description: "Judge (shared): {group description}"
- prompt: {shared group judge prompt from 5.2.3 with combined meta-judge specification YAML and ALL implementation outputs}
- model: opus
- subagent_type: "sadd:judge"
Launch ALL judges in parallel (independent, repeatable, and shared judges all dispatched in the same response).
CRITICAL: NEVER provide the score threshold, in any format, including `threshold_pass` or anything else. The judge MUST NOT know what the score threshold is, in order to not be biased!
5.3 Parse Verdict and Iterate
Parse judge output for each target (DO NOT read full report):
Extract from judge reply:
- VERDICT: PASS or FAIL
- SCORE: X.X/5.0
- ISSUES: List of problems (if any)
- IMPROVEMENTS: List of suggestions (if any)
Decision logic per target:
If score >= 4:
-> VERDICT: PASS
-> Mark target complete
-> Include IMPROVEMENTS as optional enhancements
If score >= 3.0 and all found issues are low priority, then:
-> VERDICT: PASS
-> Mark target complete
-> Include IMPROVEMENTS as optional enhancements
If score < 4:
-> VERDICT: FAIL
-> Check retry count for this target
If retries < 3:
-> Dispatch retry implementation agent with judge feedback
-> Return to judge verification with same target-specific meta-judge specification
If retries >= 3:
-> Mark target as failed (isolate from other targets)
-> Do NOT proceed with more retries without user decision
IMPORTANT: Failures are isolated
- One target failing does NOT affect other targets
- Other parallel tasks continue independently
- Only the failed target is retried
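The verdict extraction and per-target decision logic above can be sketched as two small functions (the reply string and field regexes assume the report format described earlier; a sketch, not the skill's actual parser):

```python
import re

def parse_verdict(reply):
    """Pull SCORE and ISSUES from the top of a judge reply without
    reading the full report (field names follow the format above)."""
    score = float(re.search(r"SCORE:\s*([\d.]+)/5\.0", reply).group(1))
    issues = re.findall(r"^-\s*(.+)$", reply.partition("ISSUES:")[2], re.M)
    return score, issues

def decide(score, retries, low_priority_only=False):
    """Per-target decision logic: PASS at >= 4.0, or at >= 3.0 when every
    issue is low priority; otherwise retry up to 3 times, then isolate."""
    if score >= 4.0 or (score >= 3.0 and low_priority_only):
        return "PASS"
    return "RETRY" if retries < 3 else "FAILED"

score, issues = parse_verdict(
    "VERDICT: FAIL\nSCORE: 3.2/5.0\nISSUES:\n- missing edge case tests"
)
```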
Shared group verdict parsing:
For shared groups, the judge produces per-task verdicts within a single report. Parse each task's verdict individually:
Extract from shared judge reply:
- Per-task verdicts:
- Task 1 ({target}): VERDICT: PASS/FAIL, SCORE: X.X/5.0, ISSUES: [...]
- Task 2 ({target}): VERDICT: PASS/FAIL, SCORE: X.X/5.0, ISSUES: [...]
- OVERALL SCORE: X.X/5.0
- CROSS-TASK ISSUES: List of integration problems (if any)
Shared group retry logic:
If shared judge finds failures:
1. Identify which specific task(s) failed from per-task verdicts
2. Re-launch ONLY the implementation agent(s) for the failed task(s)
-- Do NOT re-launch agents whose tasks passed
3. After retry implementation completes, re-launch the shared judge
to review ALL changes again (passed + retried)
-- The shared judge still uses the same combined meta-judge spec
4. Repeat until all tasks pass or max retries reached for any task
CRITICAL: Only the specific failing implementation agent(s) are retried.
Passing tasks are NOT re-implemented. The shared judge always reviews
the complete group together on each evaluation round.
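The shared-group retry rules above can be sketched as a loop, where `implement` and `judge_group` are hypothetical stand-ins for dispatching an implementation agent and the single shared judge:

```python
def shared_group_loop(tasks, implement, judge_group, max_retries=3):
    """Retry ONLY failing tasks; the shared judge re-reviews the WHOLE
    group (passed + retried) each round with the same combined spec."""
    outputs = {t: implement(t) for t in tasks}   # round 1: every task
    retries = {t: 0 for t in tasks}
    while True:
        verdicts = judge_group(outputs)          # per-task PASS/FAIL map
        failing = [t for t, v in verdicts.items() if v == "FAIL"]
        if not failing:
            return outputs, verdicts             # all tasks pass
        if any(retries[t] >= max_retries for t in failing):
            return outputs, verdicts             # stop; surface to the user
        for t in failing:                        # re-implement ONLY failures
            retries[t] += 1
            outputs[t] = implement(t)

# Simulated run: task "b" fails the first review, then passes.
rounds = {"n": 0}
attempts = {"a": 0, "b": 0}
def implement(t):
    attempts[t] += 1
    return f"{t}-v{attempts[t]}"
def judge_group(outputs):
    rounds["n"] += 1
    return {t: "FAIL" if (t == "b" and rounds["n"] == 1) else "PASS"
            for t in outputs}
outputs, verdicts = shared_group_loop(["a", "b"], implement, judge_group)
```

Note that only "b" is re-implemented, while the judge reviews both tasks on every round.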
5.4 Retry with Feedback (If Needed)
Retry prompt template:
markdown
Retry Required for Target: {target_name}
Your previous implementation did not pass judge verification.
Original Task
{Original task description}
Target
{Specific target}
Judge Feedback
VERDICT: FAIL
SCORE: {score}/5.0
ISSUES:
{list of issues from judge}
Your Previous Changes
{files modified in previous attempt}
Instructions
Let's fix the identified issues step by step.
- Review each issue the judge identified
- For each issue, determine the root cause
- Plan the fix for each issue
- Implement ALL fixes
- Verify your fixes address each issue
- Provide updated Summary section
CRITICAL: Focus on fixing the specific issues identified. Do not rewrite everything.
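Filling this retry template from a parsed judge verdict can be sketched as a simple string builder (the plain section labels below mirror the template's layout; a sketch, not the skill's actual mechanism):

```python
def build_retry_prompt(target, task, score, issues, files):
    """Fill the retry template above from a parsed judge verdict,
    so the agent fixes ONLY the identified issues."""
    issue_list = "\n".join(f"- {i}" for i in issues)
    file_list = "\n".join(f"- {f}" for f in files)
    return (
        f"Retry Required for Target: {target}\n"
        "Your previous implementation did not pass judge verification.\n"
        f"Original Task\n{task}\n"
        f"Target\n{target}\n"
        f"Judge Feedback\nVERDICT: FAIL\nSCORE: {score}/5.0\nISSUES:\n{issue_list}\n"
        f"Your Previous Changes\n{file_list}\n"
        "Instructions\n"
        "Let's fix the identified issues step by step.\n"
        "CRITICAL: Focus on fixing the specific issues identified. "
        "Do not rewrite everything.\n"
    )

prompt = build_retry_prompt(
    "src/modules/auth.ts", "Add comprehensive unit tests", 3.2,
    ["missing edge-case tests for token expiry"],
    ["src/modules/auth.test.ts"],
)
```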
Phase 6: Collect and Summarize Results
After all agents complete (with retries as needed), aggregate results:
markdown
Parallel Execution Summary
Configuration
- Task: {task description}
- Model: {selected model}
- Targets: {count} items
Results
| Target | Grouping | Model | Judge Score | Retries | Status | Summary |
|---|---|---|---|---|---|---|
| {target_1} | {Repeatable/Shared/Independent} | {model} | {X.X}/5.0 | {0-3} | SUCCESS | {brief outcome} |
| {target_2} | {Repeatable/Shared/Independent} | {model} | {X.X}/5.0 | {0-3} | SUCCESS | {brief outcome} |
| {target_3} | {Repeatable/Shared/Independent} | {model} | {X.X}/5.0 | {3} | FAILED | {failure reason} |
| ... | ... | ... | ... | ... | ... | ... |
Overall Assessment
- Completed: {X}/{total}
- Failed: {Y}/{total}
- Total Retries: {sum of all retries}
- Common patterns: {any patterns across results}
Verification Summary
{Aggregate judge verification results - any common issues?}
Files Modified
- {list of all modified files}
Failed Targets (If Any)
{For each failed target after max retries}
- Target: {name}
- Final Score: {X.X}/5.0
- Persistent Issues: {issues that weren't resolved}
- Options: Retry with guidance / Skip / Manual fix
Next Steps
{If any failures, suggest remediation}
**Failure Handling:**
- Report failed tasks clearly with error details
- Successful tasks are NOT affected by failures
- Failed targets isolated after max retries
- Suggest options: provide guidance, skip, or manual fix
Examples
Example 1: Requirement Grouping -- Mixed Repeatable + Independent (with Pre-existing Changes from Prior Batch)
Scenario:
A team runs two sequential do-in-parallel batches. The first batch updates API documentation across 3 endpoint files (src/api/users.ts, src/api/orders.ts, src/api/products.ts). The second batch adds tests to all 3 modules in the src folder and adds a tests step to GitHub Actions. Each agent's judge in the second batch needs to know about the documentation changes from the first batch AND the expected changes from other parallel agents in the same second batch.
Input (second batch -- first batch already completed earlier in session):
/do-in-parallel add tests to all 3 modules in src folder and add tests step to github actions
Orchestrator Analysis:
Phase 2: Task Analysis + Requirement Grouping
1. Task Identification:
- Task A: "Add tests to src/modules/auth.ts"
- Task B: "Add tests to src/modules/cart.ts"
- Task C: "Add tests to src/modules/payments.ts"
- Task D: "Add tests step to GitHub Actions CI pipeline"
2. Requirement Grouping:
- Tasks A, B, C: REPEATABLE — same task ("add tests") applied to 3 different modules
→ ONE shared meta-judge producing a reusable spec
- Task D: INDEPENDENT — different task type (CI configuration)
→ Separate meta-judge
3. Pre-existing and Expected Parallel Changes Assessment:
- Pre-existing (from prior batch): API documentation updated across
src/api/users.ts, src/api/orders.ts, src/api/products.ts
- Expected parallel: Each agent should be aware that other agents in this
batch are adding tests to other modules and updating GH Actions simultaneously
4. Agent Count:
- Meta-judges: 2 (1 repeatable for tests + 1 independent for GH Actions)
- Implementation agents: 4 (one per task, always isolated)
- Judges: 4 (3 using shared test spec + 1 for GH Actions)
   - Total: 10 agents (vs. 12 without grouping)
Phase 3.5: Meta-Judge Dispatch (2 meta-judges in parallel):
[Meta-judge 1: Repeatable group — test generation]
Use Task tool:
- description: "Meta-judge (repeatable): reusable spec for adding tests across 3 modules"
- prompt:
## Task
Generate a REUSABLE evaluation specification yaml that can be applied to
ANY of the following targets performing the same task. You will produce
rubrics, checklists, and scoring criteria that individual judge agents
will each use independently to evaluate one target's implementation artifact.
CLAUDE_PLUGIN_ROOT={CLAUDE_PLUGIN_ROOT}
## User Prompt as Context
add tests to all 3 modules in src folder and add tests step to github actions
## Task Being Repeated
Add comprehensive unit tests to a source module
## Targets in This Group
- src/modules/auth.ts
- src/modules/cart.ts
- src/modules/payments.ts
## Context
Project uses Jest for testing. Test files should be co-located as
*.test.ts files. Existing test patterns available in src/modules/__tests__/.
## Artifact Type
code
## Instructions
CRITICAL: You are generating a REUSABLE spec that will be applied to
EACH target independently by separate judges.
- Use generic language: "target file should align with criteria" instead
of "all files should align"
- Do NOT include file-specific requirements (e.g., NOT "auth.ts should
test only authentication logic") since this same spec will be applied
to different files
- The spec must be applicable to ANY target in this group without modification
- Each judge will receive this same spec and evaluate only its own target
against it
User prompt is provided as context, you should use it only as reference
of changes that can occur in the project by other agents.
Return only the final evaluation specification YAML in your response.
- model: opus
- subagent_type: "sadd:meta-judge"
[Meta-judge 2: Independent — GitHub Actions]
Use Task tool:
- description: "Meta-judge: add tests step to GitHub Actions"
- prompt:
## Task
Generate an evaluation specification yaml for the following task applied
to a specific target. You will produce rubrics, checklists, and scoring
criteria that a judge agent will use to evaluate the implementation
artifact for this specific target.
CLAUDE_PLUGIN_ROOT={CLAUDE_PLUGIN_ROOT}
## User Prompt as Context
add tests to all 3 modules in src folder and add tests step to github actions
## Target
Add a test execution step to the GitHub Actions CI pipeline
(.github/workflows/ci.yml or similar)
## Context
Project uses Jest for testing. The CI pipeline should run tests after
build step. Existing workflow file may need a new job or step.
## Artifact Type
configuration
## Instructions
User prompt is provided as context, you should use it only as reference
of changes that can occur in the project by other agents. Generate
evaluation specification ONLY for adding the tests step to GitHub Actions.
Your report will be used to verify only this particular task, not the
all tasks in the user prompt.
Return only the final evaluation specification YAML in your response.
- model: opus
- subagent_type: "sadd:meta-judge"
[Both meta-judges launched simultaneously]
Phase 5: Implementation Dispatch (4 agents in parallel, after meta-judges complete):
[Implementation 1: auth module tests]
Use Task tool:
- description: "Parallel: add tests to src/modules/auth.ts"
- prompt:
## Reasoning Approach
Let's think step by step.
Before taking any action, think through the problem systematically:
1. "Let me first understand what is being asked for this specific target..."
2. "Let me analyze this specific target..."
3. "Let me plan my approach..."
Work through each step explicitly before implementing.
<task>Add comprehensive unit tests</task>
<target>src/modules/auth.ts</target>
<constraints>
- Work ONLY on the specified target
- Do NOT modify other files unless explicitly required
- Follow existing test patterns in the project
</constraints>
<output>
Create test file for the auth module.
CRITICAL: At the end of your work, provide a "Summary" section containing:
- Files modified (full paths)
- Key changes (3-5 bullet points)
- Any decisions made and rationale
</output>
## Self-Critique Verification (MANDATORY)
[standard self-critique suffix]
- model: sonnet
[Implementation 2: cart module tests]
Use Task tool:
- description: "Parallel: add tests to src/modules/cart.ts"
- prompt:
## Reasoning Approach
Let's think step by step.
Before taking any action, think through the problem systematically:
1. "Let me first understand what is being asked for this specific target..."
2. "Let me analyze this specific target..."
3. "Let me plan my approach..."
Work through each step explicitly before implementing.
<task>Add comprehensive unit tests</task>
<target>src/modules/cart.ts</target>
<constraints>
- Work ONLY on the specified target
- Do NOT modify other files unless explicitly required
- Follow existing test patterns in the project
</constraints>
<output>
Create test file for the cart module.
CRITICAL: At the end of your work, provide a "Summary" section containing:
- Files modified (full paths)
- Key changes (3-5 bullet points)
- Any decisions made and rationale
</output>
## Self-Critique Verification (MANDATORY)
Before submitting, verify your work:
1. Re-read the original task and confirm every requirement is addressed
2. Check that all tests follow existing patterns in the project
3. Verify no unrelated files were modified
4. Confirm the Summary section is complete and accurate
- model: sonnet
[Implementation 3: payments module tests]
Use Task tool:
- description: "Parallel: add tests to src/modules/payments.ts"
- prompt: [Same CoT prefix + task body for payments.ts + critique suffix]
- model: sonnet
[Implementation 4: GitHub Actions test step]
Use Task tool:
- description: "Parallel: add tests step to GitHub Actions CI"
- prompt:
## Reasoning Approach
Let's think step by step.
Before taking any action, think through the problem systematically:
1. "Let me first understand what is being asked for this specific target..."
2. "Let me analyze this specific target..."
3. "Let me plan my approach..."
Work through each step explicitly before implementing.
<task>Add a test execution step to the GitHub Actions CI pipeline</task>
<target>.github/workflows/ci.yml</target>
<constraints>
- Work ONLY on the CI workflow file
- Add a step that runs the test suite after the build step
- Do NOT modify other workflow files or steps beyond what is necessary
- Follow existing workflow patterns and conventions
</constraints>
<output>
Update the CI workflow with a test execution step.
CRITICAL: At the end of your work, provide a "Summary" section containing:
- Files modified (full paths)
- Key changes (3-5 bullet points)
- Any decisions made and rationale
</output>
## Self-Critique Verification (MANDATORY)
Before submitting, verify your work:
1. Re-read the original task and confirm every requirement is addressed
2. Check that the workflow YAML is valid and well-structured
3. Verify no unrelated workflow steps were modified
4. Confirm the Summary section is complete and accurate
- model: sonnet
[All 4 launched simultaneously]
Phase 5.2: Judge Dispatch (4 judges in parallel, after ALL implementors complete):
[Judge 1: auth module — uses SHARED reusable spec from repeatable meta-judge]
Use Task tool:
- description: "Judge: src/modules/auth.ts"
- prompt:
You are evaluating an implementation artifact for target
src/modules/auth.ts against an evaluation specification produced
by the meta judge.
CLAUDE_PLUGIN_ROOT={CLAUDE_PLUGIN_ROOT}
## User Prompt
add tests to all 3 modules in src folder and add tests step to github actions
## Target
src/modules/auth.ts
## Pre-existing and expected parallel changes (Context Only)
The following changes were made before or expected to be done by
other parallel agents in the same batch now. They are NOT part of
the current implementation agent's output. Focus your evaluation
on the current agent's changes to its specific target. Only verify
other changed files/logic if they directly relate to the current
target's task requirements.
### Previous do-in-parallel: "Update API documentation for all endpoints"
The following files were modified as part of a previous parallel batch:
- src/api/users.ts (modified) - Added JSDoc to public methods,
updated module header
- src/api/orders.ts (modified) - Added JSDoc to public methods,
added @example tags
- src/api/products.ts (modified) - Added JSDoc to public methods,
updated type annotations
### Expected parallel changes (current batch)
Other agents in this batch are simultaneously:
- Adding tests to src/modules/cart.ts and src/modules/payments.ts
(repeatable group — same task on other modules)
- Adding a tests step to .github/workflows/ci.yml (independent task)
## Evaluation Specification
```yaml
{EXACT reusable spec YAML from repeatable meta-judge — same for all 3 module judges}
```
## Implementation Output
{Summary from auth implementation agent}
## Instructions
User prompt is provided as context, you should use it only as reference
of changes that can occur in the project by other agents. Evaluate ONLY
the test generation for auth.ts.
Follow your full judge process as defined in your agent instructions!
## Output
CRITICAL: You must reply with this exact structured evaluation report
format in YAML at the START of your response!
- model: opus
- subagent_type: "sadd:judge"
[Judge 2: cart module — uses SAME shared reusable spec]
Use Task tool:
- description: "Judge: src/modules/cart.ts"
- prompt: [Same judge template, same reusable spec YAML, cart implementation output.
Pre-existing and expected parallel changes section: same prior batch info,
expected parallel changes list auth.ts, payments.ts, and GH Actions instead]
- model: opus
- subagent_type: "sadd:judge"
[Judge 3: payments module — uses SAME shared reusable spec]
Use Task tool:
- description: "Judge: src/modules/payments.ts"
- prompt: [Same judge template, same reusable spec YAML, payments implementation output.
Pre-existing and expected parallel changes section: same prior batch info,
expected parallel changes list auth.ts, cart.ts, and GH Actions instead]
- model: opus
- subagent_type: "sadd:judge"
[Judge 4: GitHub Actions — uses INDEPENDENT spec from GH Actions meta-judge]
Use Task tool:
- description: "Judge: GitHub Actions CI"
- prompt:
You are evaluating an implementation artifact for target
.github/workflows/ci.yml against an evaluation specification produced
by the meta judge.
CLAUDE_PLUGIN_ROOT={CLAUDE_PLUGIN_ROOT}
## User Prompt
add tests to all 3 modules in src folder and add tests step to github actions
## Target
.github/workflows/ci.yml
## Pre-existing and expected parallel changes (Context Only)
The following changes were either made earlier or are expected from
other parallel agents in the current batch. They are NOT part of
the current implementation agent's output. Focus your evaluation
on the current agent's changes to its specific target. Only verify
other changed files/logic if they directly relate to the current
target's task requirements.
### Previous do-in-parallel: "Update API documentation for all endpoints"
The following files were modified as part of a previous parallel batch:
- src/api/users.ts (modified) - Added JSDoc to public methods,
updated module header
- src/api/orders.ts (modified) - Added JSDoc to public methods,
added @example tags
- src/api/products.ts (modified) - Added JSDoc to public methods,
updated type annotations
### Expected parallel changes (current batch)
Other agents in this batch are simultaneously:
- Adding tests to src/modules/auth.ts, src/modules/cart.ts,
and src/modules/payments.ts (repeatable group — test generation)
## Evaluation Specification
```yaml
{EXACT spec YAML from independent GH Actions meta-judge}
```
## Implementation Output
{Summary from GH Actions implementation agent}
## Instructions
The user prompt is provided as context only; use it solely as a reference
for changes that other agents may make in the project. Evaluate ONLY
the GitHub Actions test step.
Follow your full judge process as defined in your agent instructions!
## Output
CRITICAL: You must reply with this exact structured evaluation report
format in YAML at the START of your response!
- model: opus
- subagent_type: "sadd:judge"
[All 4 judges launched simultaneously]
Result:
| Target | Grouping | Model | Judge Score | Retries | Status |
|---|---|---|---|---|---|
| src/modules/auth.ts | Repeatable | sonnet | 4.2/5.0 | 0 | SUCCESS |
| src/modules/cart.ts | Repeatable | sonnet | 4.0/5.0 | 0 | SUCCESS |
| src/modules/payments.ts | Repeatable | sonnet | 4.1/5.0 | 0 | SUCCESS |
| .github/workflows/ci.yml | Independent | sonnet | 4.3/5.0 | 0 | SUCCESS |
Overall: 4/4 completed. Total Agents: 10 (2 meta-judges + 4 implementations + 4 judges)
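The agent-count arithmetic above (10 with grouping vs. 12 without) follows a simple per-group formula: one meta-judge per group, one implementor per task, and one judge per task except for shared groups, which use a single combined judge. The sketch below is a hypothetical illustration of that arithmetic, not part of the command itself; the group kinds and counts are taken from this example:

```python
# Hypothetical sketch of the orchestrator's agent-count arithmetic.
# groups: (kind, task_count) pairs; kind is 'repeatable', 'shared',
# or 'independent' (independent tasks are their own group of 1).

def count_agents(groups: list[tuple[str, int]]) -> int:
    total = 0
    for kind, n in groups:
        meta_judges = 1                         # one meta-judge per group
        implementors = n                        # always one implementor per task
        judges = 1 if kind == "shared" else n   # shared groups use ONE combined judge
        total += meta_judges + implementors + judges
    return total

# Example 1: one repeatable group of 3 module tasks + 1 independent task
with_grouping = count_agents([("repeatable", 3), ("independent", 1)])
# Without grouping, every task is treated as its own independent group
without_grouping = count_agents([("independent", 1)] * 4)
print(with_grouping, without_grouping)  # 10 12
```

The same formula also reproduces the 11-agent total for the shared + repeatable mix in Example 2.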
Example 2: Requirement Grouping -- Shared + Repeatable Combined (with Pre-existing User Changes)
Scenario:
A developer has been working on a Node.js backend during the conversation. They manually refactored the database connection layer and updated several service modules, including adding an S3 class interface. They then invoked do-in-parallel to implement and integrate the S3 interface, and also to refactor the cart module. Each agent's judge needs to know about the user's prior modifications AND the expected changes from other parallel agents in the same batch.
Input:
/do-in-parallel I wrote class interface for S3 service in s3.adapter.ts, please do 2 tasks: implement s3 adapter with tests and integrate s3 adapter to analytics module. Also refactor and simplify all files in cart module
Orchestrator Analysis:
Phase 2: Task Analysis + Requirement Grouping
1. Task Identification:
- Task A: "Implement S3 adapter with tests in src/adapters/s3.adapter.ts"
- Task B: "Integrate S3 adapter into src/modules/analytics.module.ts"
- Task C: "Refactor and simplify src/modules/cart/cart.service.ts"
- Task D: "Refactor and simplify src/modules/cart/cart.repository.ts"
- Task E: "Refactor and simplify src/modules/cart/cart.controller.ts"
2. Requirement Grouping:
- Tasks A, B: SHARED — interdependent (adapter must match interface consumed
by analytics integration; should be reviewed together)
→ ONE combined meta-judge, ONE shared judge
- Tasks C, D, E: REPEATABLE — same task ("refactor and simplify") applied
to 3 different files in cart module
→ ONE reusable meta-judge
3. Pre-existing and Expected Parallel Changes Assessment:
- Pre-existing (user modifications): Refactored database connection layer
(src/db/connection.ts, src/db/queries.ts), updated service modules,
and added S3 class interface in src/adapters/s3.adapter.ts
- Expected parallel: S3 adapter implementation and analytics integration
run in parallel (shared group); cart refactoring agents run in parallel
(repeatable group); both groups run simultaneously
4. Agent Count:
- Meta-judges: 2 (1 shared for S3 work + 1 repeatable for cart refactoring)
- Implementation agents: 5 (one per task, always isolated)
- Judges: 4 (1 shared for S3 group + 3 individual for cart)
 - Total: 11 agents (vs. 15 without grouping)
Phase 3.5: Meta-Judge Dispatch (2 meta-judges in parallel):
[Meta-judge 1: Shared group — S3 adapter + integration]
Use Task tool:
- description: "Meta-judge (shared): combined spec for S3 adapter and analytics integration"
- prompt:
## Task
Generate a COMBINED evaluation specification yaml that covers ALL of the
following related tasks. These tasks are interdependent and will be
reviewed TOGETHER by a single judge. You will produce rubrics, checklists,
and scoring criteria that account for cross-task dependencies and
integration points.
CLAUDE_PLUGIN_ROOT={CLAUDE_PLUGIN_ROOT}
## User Prompt as Context
I wrote class interface for S3 service in s3.adapter.ts, please do 2 tasks:
implement s3 adapter with tests and integrate s3 adapter to analytics module.
Also refactor and simplify all files in cart module
## Tasks in This Shared Group
- Task A: Implement S3 adapter with tests -> src/adapters/s3.adapter.ts
- Task B: Integrate S3 adapter into analytics module -> src/modules/analytics.module.ts
## Context
The user has already written the class interface in s3.adapter.ts. Task A
implements the interface methods and adds unit tests. Task B integrates the
adapter into the analytics module. The adapter's public API from Task A must
match what Task B consumes.
## Artifact Type
code
## Instructions
CRITICAL: You are generating a COMBINED spec for tasks that will be
reviewed TOGETHER by ONE judge.
- Include evaluation criteria for EACH individual task
- Include cross-task verification criteria (e.g., "S3 adapter's public
methods match the calls made by the analytics integration")
- Organize the spec so the judge can identify which criteria apply to
which task's changes
- The judge will review ALL changes from ALL tasks in this group in a
single evaluation
User prompt is provided as context, you should use it only as reference
of changes that can occur in the project by other agents.
Return only the final evaluation specification YAML in your response.
- model: opus
- subagent_type: "sadd:meta-judge"
[Meta-judge 2: Repeatable group — cart refactoring]
Use Task tool:
- description: "Meta-judge (repeatable): reusable spec for refactoring cart module files"
- prompt:
## Task
Generate a REUSABLE evaluation specification yaml that can be applied to
ANY of the following targets performing the same task. You will produce
rubrics, checklists, and scoring criteria that individual judge agents
will each use independently to evaluate one target's implementation artifact.
CLAUDE_PLUGIN_ROOT={CLAUDE_PLUGIN_ROOT}
## User Prompt as Context
I wrote class interface for S3 service in s3.adapter.ts, please do 2 tasks:
implement s3 adapter with tests and integrate s3 adapter to analytics module.
Also refactor and simplify all files in cart module
## Task Being Repeated
Refactor and simplify a source file in the cart module
## Targets in This Group
- src/modules/cart/cart.service.ts
- src/modules/cart/cart.repository.ts
- src/modules/cart/cart.controller.ts
## Context
All three files are in the cart module. Refactoring should simplify logic,
reduce complexity, and improve readability while preserving existing behavior.
## Artifact Type
code
## Instructions
CRITICAL: You are generating a REUSABLE spec that will be applied to
EACH target independently by separate judges.
- Use generic language: "target file should align with criteria" instead
of "all files should align"
- Do NOT include file-specific requirements since this same spec will be
applied to different files
- The spec must be applicable to ANY target in this group without modification
User prompt is provided as context, you should use it only as reference
of changes that can occur in the project by other agents.
Return only the final evaluation specification YAML in your response.
- model: opus
- subagent_type: "sadd:meta-judge"
[Both meta-judges launched simultaneously]
Phase 5: Implementation Dispatch (5 agents in parallel, after meta-judges complete):
[Implementation 1: S3 adapter]
Use Task tool:
- description: "Parallel: implement S3 adapter with tests"
- prompt:
## Reasoning Approach
Let's think step by step.
Before taking any action, think through the problem systematically:
1. "Let me first understand what is being asked for this specific target..."
2. "Let me analyze this specific target..."
3. "Let me plan my approach..."
Work through each step explicitly before implementing.
<task>Implement S3 adapter with tests based on the existing class interface</task>
<target>src/adapters/s3.adapter.ts</target>
<constraints>
- Work ONLY on the specified target
- Implement all methods defined in the existing class interface
- Add comprehensive unit tests
- Do NOT modify the analytics module
</constraints>
<output>
Implement the S3 adapter and create tests.
CRITICAL: At the end of your work, provide a "Summary" section containing:
- Files modified (full paths)
- Key changes (3-5 bullet points)
- Any decisions made and rationale
</output>
## Self-Critique Verification (MANDATORY)
Before submitting, verify your work:
1. Re-read the original task and confirm every requirement is addressed
2. Check that the adapter implements all interface methods correctly
3. Verify no unrelated files were modified
4. Confirm the Summary section is complete and accurate
- model: opus
[Implementation 2: Analytics integration]
Use Task tool:
- description: "Parallel: integrate S3 adapter into analytics module"
- prompt:
## Reasoning Approach
[standard CoT prefix]
<task>Integrate S3 adapter into the analytics module</task>
<target>src/modules/analytics.module.ts</target>
<constraints>
- Work ONLY on the specified target
- Import and use the S3 adapter from src/adapters/s3.adapter.ts
- Follow existing dependency injection patterns
- Do NOT modify the S3 adapter itself
</constraints>
<output>
Integrate S3 adapter into analytics module.
CRITICAL: At the end of your work, provide a "Summary" section.
</output>
## Self-Critique Verification (MANDATORY)
[standard self-critique suffix]
- model: opus
[Implementation 3: cart.service.ts refactoring]
Use Task tool:
- description: "Parallel: refactor src/modules/cart/cart.service.ts"
- prompt:
## Reasoning Approach
Let's think step by step.
Before taking any action, think through the problem systematically:
1. "Let me first understand what is being asked for this specific target..."
2. "Let me analyze this specific target..."
3. "Let me plan my approach..."
Work through each step explicitly before implementing.
<task>Refactor and simplify the cart service</task>
<target>src/modules/cart/cart.service.ts</target>
<constraints>
- Work ONLY on the specified target
- Simplify logic, reduce complexity, improve readability
- Preserve existing behavior — no functional changes
- Do NOT modify other cart module files
</constraints>
<output>
Refactor the cart service file.
CRITICAL: At the end of your work, provide a "Summary" section containing:
- Files modified (full paths)
- Key changes (3-5 bullet points)
- Any decisions made and rationale
</output>
## Self-Critique Verification (MANDATORY)
Before submitting, verify your work:
1. Re-read the original task and confirm every requirement is addressed
2. Check that existing behavior is preserved after refactoring
3. Verify no unrelated files were modified
4. Confirm the Summary section is complete and accurate
- model: sonnet
[Implementation 4: cart.repository.ts refactoring]
Use Task tool:
- description: "Parallel: refactor src/modules/cart/cart.repository.ts"
- prompt: [Same CoT prefix + refactoring task body for cart.repository.ts + critique suffix]
- model: sonnet
[Implementation 5: cart.controller.ts refactoring]
Use Task tool:
- description: "Parallel: refactor src/modules/cart/cart.controller.ts"
- prompt: [Same CoT prefix + refactoring task body for cart.controller.ts + critique suffix]
- model: sonnet
[All 5 launched simultaneously]
Phase 5.2: Judge Dispatch (4 judges in parallel, after ALL implementors complete):
[Judge 1: SHARED judge for S3 group — reviews both S3 adapter + analytics integration]
Use Task tool:
- description: "Judge (shared): S3 adapter implementation and analytics integration"
- prompt:
You are evaluating implementation artifacts for a group of related tasks
against a combined evaluation specification produced by the meta judge.
These tasks are interdependent and must be reviewed together.
CLAUDE_PLUGIN_ROOT={CLAUDE_PLUGIN_ROOT}
## User Prompt
I wrote class interface for S3 service in s3.adapter.ts, please do 2 tasks:
implement s3 adapter with tests and integrate s3 adapter to analytics module.
Also refactor and simplify all files in cart module
## Tasks in This Shared Group
- Task A: Implement S3 adapter with tests -> src/adapters/s3.adapter.ts
- Task B: Integrate S3 adapter into analytics module -> src/modules/analytics.module.ts
## Pre-existing and expected parallel changes (Context Only)
The following changes were either made earlier or are expected from
other parallel agents in the current batch. They are NOT part of
the current implementation agents' output for this shared group.
Focus your evaluation on the S3 group's changes. Only verify other
changed files/logic if they directly relate to these tasks.
### User modifications (before current task)
The user made changes to the following files/modules before this
task was started:
- src/db/connection.ts (modified) - Refactored database connection
pooling
- src/db/queries.ts (modified) - Updated query builder patterns
- src/adapters/s3.adapter.ts (created) - Added S3 class interface
(the interface that Task A implements)
- Several service modules updated to use new DB connection API
### Expected parallel changes (current batch)
Other agents in this batch are simultaneously:
- Refactoring src/modules/cart/cart.service.ts (repeatable group)
- Refactoring src/modules/cart/cart.repository.ts (repeatable group)
- Refactoring src/modules/cart/cart.controller.ts (repeatable group)
## Evaluation Specification
```yaml
{EXACT combined spec YAML from shared S3 meta-judge}
```
## Implementation Outputs
### Task: Implement S3 adapter with tests -> src/adapters/s3.adapter.ts
{Summary from S3 adapter implementation agent}
Files: src/adapters/s3.adapter.ts (modified), src/adapters/s3.adapter.test.ts (created)
### Task: Integrate S3 adapter into analytics -> src/modules/analytics.module.ts
{Summary from analytics integration agent}
Files: src/modules/analytics.module.ts (modified)
## Instructions
User prompt is provided as context, you should use it only as reference
of changes that can occur in the project by other agents. Evaluate ALL
tasks in this shared group together. Verify cross-task integration points
(e.g., does the adapter's public API match what the analytics module consumes?).
CRITICAL: For each task, indicate separately whether it PASSED or FAILED
so that only failing tasks can be retried.
Follow your full judge process as defined in your agent instructions!
## Output
CRITICAL: You must reply with this exact structured evaluation report
format in YAML at the START of your response! Include per-task verdicts.
- model: opus
- subagent_type: "sadd:judge"
[Judge 2: cart.service.ts — uses SHARED reusable spec from repeatable meta-judge]
Use Task tool:
- description: "Judge: src/modules/cart/cart.service.ts"
- prompt:
You are evaluating an implementation artifact for target
src/modules/cart/cart.service.ts against an evaluation specification
produced by the meta judge.
CLAUDE_PLUGIN_ROOT={CLAUDE_PLUGIN_ROOT}
## User Prompt
[original user prompt]
## Target
src/modules/cart/cart.service.ts
## Pre-existing and expected parallel changes (Context Only)
The following changes were either made earlier or are expected from
other parallel agents in the current batch. They are NOT part of
the current implementation agent's output. Focus your evaluation
on the current agent's changes to its specific target. Only verify
other changed files/logic if they directly relate to the current
target's task requirements.
### User modifications (before current task)
The user made changes to the following files/modules before this
task was started:
- src/db/connection.ts (modified) - Refactored database connection
pooling
- src/db/queries.ts (modified) - Updated query builder patterns
- src/adapters/s3.adapter.ts (created) - Added S3 class interface
- Several service modules updated to use new DB connection API
### Expected parallel changes (current batch)
Other agents in this batch are simultaneously:
- Implementing S3 adapter in src/adapters/s3.adapter.ts (shared group)
- Integrating S3 adapter into src/modules/analytics.module.ts (shared group)
- Refactoring src/modules/cart/cart.repository.ts (repeatable group)
- Refactoring src/modules/cart/cart.controller.ts (repeatable group)
## Evaluation Specification
```yaml
{EXACT reusable spec YAML from repeatable cart meta-judge — same for all 3 cart judges}
```
## Implementation Output
{Summary from cart.service.ts implementation agent}
## Instructions
The user prompt is provided as context only; use it solely as a reference
for changes that other agents may make in the project. Evaluate ONLY
the refactoring of cart.service.ts.
Follow your full judge process as defined in your agent instructions!
## Output
CRITICAL: You must reply with this exact structured evaluation report
format in YAML at the START of your response!
- model: opus
- subagent_type: "sadd:judge"
[Judge 3: cart.repository.ts — uses SAME shared reusable spec]
Use Task tool:
- description: "Judge: src/modules/cart/cart.repository.ts"
- prompt: [Same judge template, same reusable spec YAML, cart.repository implementation output.
Pre-existing and expected parallel changes section: same user modifications,
expected parallel changes list S3 group, cart.service.ts, and cart.controller.ts instead]
- model: opus
- subagent_type: "sadd:judge"
[Judge 4: cart.controller.ts — uses SAME shared reusable spec]
Use Task tool:
- description: "Judge: src/modules/cart/cart.controller.ts"
- prompt: [Same judge template, same reusable spec YAML, cart.controller implementation output.
Pre-existing and expected parallel changes section: same user modifications,
expected parallel changes list S3 group, cart.service.ts, and cart.repository.ts instead]
- model: opus
- subagent_type: "sadd:judge"
[All 4 judges launched simultaneously]
Shared judge retry scenario (if S3 shared judge finds issues):
Shared Judge Verdict:
- Task A (S3 adapter): PASS, SCORE: 4.2/5.0
- Task B (analytics integration): FAIL, SCORE: 3.0/5.0
ISSUES: Analytics module imports wrong method name from S3 adapter
- CROSS-TASK ISSUES: Method signature mismatch between adapter and consumer
Retry Decision:
→ Task A PASSED — do NOT re-launch S3 adapter implementation agent
→ Task B FAILED — re-launch ONLY the analytics integration agent with feedback
→ After retry, re-launch shared judge to review ALL changes again
Result:
| Target | Grouping | Model | Judge Score | Retries | Status |
|---|---|---|---|---|---|
| src/adapters/s3.adapter.ts | Shared | opus | 4.2/5.0 | 0 | SUCCESS |
| src/modules/analytics.module.ts | Shared | opus | 4.1/5.0 | 1 | SUCCESS |
| src/modules/cart/cart.service.ts | Repeatable | sonnet | 4.0/5.0 | 0 | SUCCESS |
| src/modules/cart/cart.repository.ts | Repeatable | sonnet | 4.3/5.0 | 0 | SUCCESS |
| src/modules/cart/cart.controller.ts | Repeatable | sonnet | 4.1/5.0 | 0 | SUCCESS |
Overall: 5/5 completed. Total Agents: 12 (2 meta-judges + 5 implementations + 1 retry + 4 judges [1 shared judge, re-run after the retry, + 3 cart judges])
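The shared-group retry scenario above follows a fixed rule: only failing tasks are re-implemented, but the shared judge always re-reviews the entire group afterwards. The following is a hedged sketch of that dispatch logic (the helper and task names are hypothetical, not the plugin's actual API); it returns the dispatch plan instead of actually launching agents:

```python
# Hypothetical sketch of the shared-group retry loop: re-dispatch ONLY
# failed implementors, then re-run the single shared judge over ALL
# tasks in the group.

def retry_shared_group(verdicts: dict[str, bool], max_retries: int = 2):
    """verdicts maps task name -> passed?; returns (action, tasks) steps."""
    plan = []
    for _attempt in range(max_retries):
        failed = [task for task, passed in verdicts.items() if not passed]
        if not failed:
            break
        plan.append(("retry-implementors", failed))           # failing tasks only
        plan.append(("re-run-shared-judge", list(verdicts)))  # whole group, always
        verdicts = {task: True for task in verdicts}          # sketch: assume retry passes
    return plan

plan = retry_shared_group({"s3-adapter": True, "analytics-integration": False})
# → [('retry-implementors', ['analytics-integration']),
#    ('re-run-shared-judge', ['s3-adapter', 'analytics-integration'])]
```

Note that a passing task (the S3 adapter here) is never re-implemented; its implementor's output simply flows into the shared judge's re-review.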
场景:
开发者在对话期间一直在处理Node.js后端。他们手动重构了数据库连接层并更新了几个服务模块,包括添加S3类接口。然后他们调用do-in-parallel来实现和集成S3接口,同时重构cart模块。每个代理的法官需要了解用户先前的修改以及同一批次中其他并行代理的预期变更。
输入:
/do-in-parallel I wrote class interface for S3 service in s3.adapter.ts, please do 2 tasks: implement s3 adapter with tests and integrate s3 adapter to analytics module. Also refactor and simplify all files in cart module编排者分析:
阶段2:任务分析 + 需求分组
1. 任务识别:
- 任务A: "Implement S3 adapter with tests in src/adapters/s3.adapter.ts"
- 任务B: "Integrate S3 adapter into src/modules/analytics.module.ts"
- 任务C: "Refactor and simplify src/modules/cart/cart.service.ts"
- 任务D: "Refactor and simplify src/modules/cart/cart.repository.ts"
- 任务E: "Refactor and simplify src/modules/cart/cart.controller.ts"
2. 需求分组:
- 任务A、B: 共享 — 相互依赖(适配器必须匹配分析集成使用的接口;应一起评审)
→ 一个联合元法官,一个共享法官
- 任务C、D、E: 可重复 — 相同任务("重构和简化")应用于cart模块中的3个不同文件
→ 一个可重用元法官
3. 预先存在和预期的并行变更评估:
- 预先存在(用户修改): 重构了数据库连接层
(src/db/connection.ts, src/db/queries.ts),更新了服务模块,
并在src/adapters/s3.adapter.ts中添加了S3类接口
- 预期并行: S3适配器实现和分析集成并行运行(共享组);cart重构代理并行运行
(可重复组);两个组同时运行
4. 代理数量:
- 元法官: 2个(1个用于S3工作的共享组 + 1个用于cart重构的可重复组)
- 执行代理: 5个(每个任务一个,始终隔离)
- 法官: 4个(1个用于S3组的共享法官 + 3个用于cart的独立法官)
- 总计: 11个代理(不分组则为15个)
阶段3.5:元法官调度(2个元法官并行):
[元法官1: 共享组 — S3适配器 + 集成]
使用Task工具:
- description: "Meta-judge (shared): combined spec for S3 adapter and analytics integration"
- prompt:
## 任务
Generate a COMBINED evaluation specification yaml that covers ALL of the
following related tasks. These tasks are interdependent and will be
reviewed TOGETHER by a single judge. You will produce rubrics, checklists,
and scoring criteria that account for cross-task dependencies and
integration points.
CLAUDE_PLUGIN_ROOT={CLAUDE_PLUGIN_ROOT}
## 用户提示词作为上下文
I wrote class interface for S3 service in s3.adapter.ts, please do 2 tasks:
implement s3 adapter with tests and integrate s3 adapter to analytics module.
Also refactor and simplify all files in cart module
## 此共享组中的任务
- 任务A: Implement S3 adapter with tests -> src/adapters/s3.adapter.ts
- 任务B: Integrate S3 adapter into analytics module -> src/modules/analytics.module.ts
## 上下文
用户已在s3.adapter.ts中编写了类接口。任务A实现接口方法并添加单元测试。任务B将适配器集成到分析模块中。任务A中适配器的公共API必须与任务B使用的API匹配。
## 产物类型
code
## 指令
CRITICAL: You are generating a COMBINED spec for tasks that will be
reviewed TOGETHER by ONE judge.
- Include evaluation criteria for EACH individual task
- Include cross-task verification criteria (e.g., "S3 adapter's public
methods match the calls made by the analytics integration")
- Organize the spec so the judge can identify which criteria apply to
which task's changes
- The judge will review ALL changes from ALL tasks in this group in a
single evaluation
用户提示词作为上下文提供,你应仅将其作为项目中其他代理可能发生的变更的参考。
仅在响应中返回最终的评估规范YAML。
- model: opus
- subagent_type: "sadd:meta-judge"
[元法官2: 可重复组 — cart重构]
使用Task工具:
- description: "Meta-judge (repeatable): reusable spec for refactoring cart module files"
- prompt:
## 任务
Generate a REUSABLE evaluation specification yaml that can be applied to
ANY of the following targets performing the same task. You will produce
rubrics, checklists, and scoring criteria that individual judge agents
will each use independently to evaluate one target's implementation artifact.
CLAUDE_PLUGIN_ROOT={CLAUDE_PLUGIN_ROOT}
## 用户提示词作为上下文
I wrote class interface for S3 service in s3.adapter.ts, please do 2 tasks:
implement s3 adapter with tests and integrate s3 adapter to analytics module.
Also refactor and simplify all files in cart module
## 重复执行的任务
Refactor and simplify a source file in the cart module
## 此组中的目标
- src/modules/cart/cart.service.ts
- src/modules/cart/cart.repository.ts
- src/modules/cart/cart.controller.ts
## 上下文
所有三个文件都在cart模块中。重构应简化逻辑、降低复杂度、提高可读性,同时保留现有行为。
## 产物类型
code
## 指令
CRITICAL: You are generating a REUSABLE spec that will be applied to
EACH target independently by separate judges.
- Use generic language: "target file should align with criteria" instead
of "all files should align"
- Do NOT include file-specific requirements since this same spec will be
applied to different files
- The spec must be applicable to ANY target in this group without modification
用户提示词作为上下文提供,你应仅将其作为项目中其他代理可能发生的变更的参考。
仅在响应中返回最终的评估规范YAML。
- model: opus
- subagent_type: "sadd:meta-judge"
[两个元法官同时启动]
阶段5:执行调度(5个代理并行,元法官完成后):
[执行1: S3适配器]
使用Task工具:
- description: "Parallel: implement S3 adapter with tests"
- prompt:
## 推理方法
Let's think step by step.
Before taking any action, think through the problem systematically:
1. "Let me first understand what is being asked for this specific target..."
2. "Let me analyze this specific target..."
3. "Let me plan my approach..."
Work through each step explicitly before implementing.
<task>Implement S3 adapter with tests based on the existing class interface</task>
<target>src/adapters/s3.adapter.ts</target>
<constraints>
- Work ONLY on the specified target
- Implement all methods defined in the existing class interface
- Add comprehensive unit tests
- Do NOT modify the analytics module
</constraints>
<output>
Implement the S3 adapter and create tests.
CRITICAL: At the end of your work, provide a "Summary" section containing:
- Files modified (full paths)
- Key changes (3-5 bullet points)
- Any decisions made and rationale
</output>
## 自我审查验证(必须)
提交之前,验证你的工作:
1. 重新阅读原始任务,确认所有需求都已满足
2. 检查适配器是否正确实现了所有接口方法
3. 验证未修改无关文件
4. 确认摘要部分完整准确
- model: opus
[执行2: 分析集成]
使用Task工具:
- description: "Parallel: integrate S3 adapter into analytics module"
- prompt:
## 推理方法
[标准思维链前缀]
<task>Integrate S3 adapter into the analytics module</task>
<target>src/modules/analytics.module.ts</target>
<constraints>
- Work ONLY on the specified target
- Import and use the S3 adapter from src/adapters/s3.adapter.ts
- Follow existing dependency injection patterns
- Do NOT modify the S3 adapter itself
</constraints>
<output>
Integrate S3 adapter into analytics module.
CRITICAL: At the end of your work, provide a "Summary" section.
</output>
## 自我审查验证(必须)
[标准自我审查后缀]
- model: opus
[执行3: cart.service.ts重构]
使用Task工具:
- description: "Parallel: refactor src/modules/cart/cart.service.ts"
- prompt:
## 推理方法
Let's think step by step.
Before taking any action, think through the problem systematically:
1. "Let me first understand what is being asked for this specific target..."
2. "Let me analyze this specific target..."
3. "Let me plan my approach..."
Work through each step explicitly before implementing.
<task>Refactor and simplify the cart service</task>
<target>src/modules/cart/cart.service.ts</target>
<constraints>
- Work ONLY on the specified target
- Simplify logic, reduce complexity, improve readability
- Preserve existing behavior — no functional changes
- Do NOT modify other cart module files
</constraints>
<output>
Refactor the cart service file.
CRITICAL: At the end of your work, provide a "Summary" section containing:
- Files modified (full paths)
- Key changes (3-5 bullet points)
- Any decisions made and rationale
</output>
## 自我审查验证(必须)
提交之前,验证你的工作:
1. 重新阅读原始任务,确认所有需求都已满足
2. 检查重构后是否保留了现有行为
3. 验证未修改无关文件
4. 确认摘要部分完整准确
- model: sonnet
[执行4: cart.repository.ts重构]
使用Task工具:
- description: "Parallel: refactor src/modules/cart/cart.repository.ts"
- prompt: [相同的思维链前缀 + cart.repository.ts的重构任务主体 + 审查后缀]
- model: sonnet
[执行5: cart.controller.ts重构]
使用Task工具:
- description: "Parallel: refactor src/modules/cart/cart.controller.ts"
- prompt: [相同的思维链前缀 + cart.controller.ts的重构任务主体 + 审查后缀]
- model: sonnet
[所有5个代理同时启动]
阶段5.2:法官调度(4个法官并行,所有执行代理完成后):
[法官1: S3组的共享法官 — 同时审查S3适配器 + 分析集成]
使用Task工具:
- description: "Judge (shared): S3 adapter implementation and analytics integration"
- prompt:
You are evaluating implementation artifacts for a group of related tasks
against a combined evaluation specification produced by the meta judge.
These tasks are interdependent and must be reviewed together.
CLAUDE_PLUGIN_ROOT={CLAUDE_PLUGIN_ROOT}
## 用户提示词
I wrote class interface for S3 service in s3.adapter.ts, please do 2 tasks:
implement s3 adapter with tests and integrate s3 adapter to analytics module.
Also refactor and simplify all files in cart module
## 此共享组中的任务
- 任务A: Implement S3 adapter with tests -> src/adapters/s3.adapter.ts
- 任务B: Integrate S3 adapter into analytics module -> src/modules/analytics.module.ts
## 预先存在和预期的并行变更(仅上下文)
The following changes were either made before this task started or are
expected from other parallel agents in the same batch. They are NOT part
of the current implementation agents' output for this shared group.
Focus your evaluation on the S3 group's changes. Only verify other
changed files/logic if they directly relate to these tasks.
### 用户修改(当前任务之前)
用户在此任务开始之前对以下文件/模块进行了更改:
- src/db/connection.ts (modified) - 重构了数据库连接池
- src/db/queries.ts (modified) - 更新了查询构建器模式
- src/adapters/s3.adapter.ts (created) - 添加了S3类接口(任务A实现的接口)
- 几个服务模块更新为使用新的DB连接API
### 预期并行变更(当前批次)
此批次中的其他代理同时:
- 重构src/modules/cart/cart.service.ts(可重复组)
- 重构src/modules/cart/cart.repository.ts(可重复组)
- 重构src/modules/cart/cart.controller.ts(可重复组)
## 评估规范
```yaml
{来自共享S3元法官的精确联合规范YAML}
```
## 执行输出
### 任务: Implement S3 adapter with tests -> src/adapters/s3.adapter.ts
{来自S3适配器执行代理的摘要}
文件: src/adapters/s3.adapter.ts (modified), src/adapters/s3.adapter.test.ts (created)
### 任务: Integrate S3 adapter into analytics -> src/modules/analytics.module.ts
{来自分析集成执行代理的摘要}
文件: src/modules/analytics.module.ts (modified)
## 指令
用户提示词作为上下文提供,你应仅将其作为项目中其他代理可能发生的变更的参考。共同评估此共享组中的所有任务。验证跨任务集成点(例如,适配器的公共API是否与分析模块使用的API匹配?)。
CRITICAL: For each task, indicate separately whether it PASSED or FAILED
so that only failing tasks can be retried.
遵循代理指令中定义的完整法官流程!
## 输出
CRITICAL: You must reply with this exact structured evaluation report
format in YAML at the START of your response! Include per-task verdicts.
- model: opus
- subagent_type: "sadd:judge"
[法官2: cart.service.ts — 使用可重复元法官的共享可重用规范]
使用Task工具:
- description: "Judge: src/modules/cart/cart.service.ts"
- prompt:
You are evaluating an implementation artifact for target
src/modules/cart/cart.service.ts against an evaluation specification
produced by the meta judge.
CLAUDE_PLUGIN_ROOT={CLAUDE_PLUGIN_ROOT}
## 用户提示词
[原始用户提示词]
## 目标
src/modules/cart/cart.service.ts
## 预先存在和预期的并行变更(仅上下文)
The following changes were either made before this task started or are
expected from other parallel agents in the same batch. They are NOT part
of the current implementation agent's output. Focus your evaluation
on the current agent's changes to its specific target. Only verify
other changed files/logic if they directly relate to the current
target's task requirements.
### 用户修改(当前任务之前)
用户在此任务开始之前对以下文件/模块进行了更改:
- src/db/connection.ts (modified) - 重构了数据库连接池
- src/db/queries.ts (modified) - 更新了查询构建器模式
- src/adapters/s3.adapter.ts (created) - 添加了S3类接口
- 几个服务模块更新为使用新的DB连接API
### 预期并行变更(当前批次)
此批次中的其他代理同时:
- 在src/adapters/s3.adapter.ts中实现S3适配器(共享组)
- 将S3适配器集成到src/modules/analytics.module.ts(共享组)
- 重构src/modules/cart/cart.repository.ts(可重复组)
- 重构src/modules/cart/cart.controller.ts(可重复组)
## 评估规范
```yaml
{来自可重复cart元法官的精确可重用规范YAML — 所有3个cart法官相同}
```
## 执行输出
{来自cart.service.ts执行代理的摘要}
## 指令
用户提示词作为上下文提供,你应仅将其作为项目中其他代理可能发生的变更的参考。仅评估cart.service.ts的重构。
遵循代理指令中定义的完整法官流程!
## 输出
CRITICAL: You must reply with this exact structured evaluation report
format in YAML at the START of your response!
- model: opus
- subagent_type: "sadd:judge"
[法官3: cart.repository.ts — 使用相同的共享可重用规范]
使用Task工具:
- description: "Judge: src/modules/cart/cart.repository.ts"
- prompt: [相同的法官模板,相同的可重用规范YAML,cart.repository执行输出。
预先存在和预期的并行变更部分:相同的用户修改,
预期并行变更列表改为S3组、cart.service.ts和cart.controller.ts]
- model: opus
- subagent_type: "sadd:judge"
[法官4: cart.controller.ts — 使用相同的共享可重用规范]
使用Task工具:
- description: "Judge: src/modules/cart/cart.controller.ts"
- prompt: [相同的法官模板,相同的可重用规范YAML,cart.controller执行输出。
预先存在和预期的并行变更部分:相同的用户修改,
预期并行变更列表改为S3组、cart.service.ts和cart.repository.ts]
- model: opus
- subagent_type: "sadd:judge"
[所有4个法官同时启动]
共享法官重试场景(如果S3共享法官发现问题):
共享法官裁决:
- 任务A(S3适配器): 通过,分数: 4.2/5.0
- 任务B(分析集成): 失败,分数: 3.0/5.0
问题: 分析模块从S3适配器导入了错误的方法名称
- 跨任务问题: 适配器和消费者之间的方法签名不匹配
重试决策:
→ 任务A通过 — 不要重新启动S3适配器执行代理
→ 任务B失败 — 仅使用反馈重新启动分析集成代理
→ 重试后,重新启动共享法官以再次审查所有变更
结果:
| 目标 | 分组 | 模型 | 法官分数 | 重试次数 | 状态 |
|---|---|---|---|---|---|
| src/adapters/s3.adapter.ts | 共享 | opus | 4.2/5.0 | 0 | 成功 |
| src/modules/analytics.module.ts | 共享 | opus | 4.1/5.0 | 1 | 成功 |
| src/modules/cart/cart.service.ts | 可重复 | sonnet | 4.0/5.0 | 0 | 成功 |
| src/modules/cart/cart.repository.ts | 可重复 | sonnet | 4.3/5.0 | 0 | 成功 |
| src/modules/cart/cart.controller.ts | 可重复 | sonnet | 4.1/5.0 | 0 | 成功 |
总体: 5/5完成。总代理数: 13个(2个元法官 + 5个执行代理 + 1个重试执行 + 5次法官运行 [1个共享 + 3个cart + 1个共享重跑])
Example 3: Requirement Grouping -- All Independent
示例3:需求分组 -- 全独立
Input:
/do-in-parallel write tests for loan.service.ts, add password recovery feature to auth module and enable caching during dependency loading in github actions.
Orchestrator Analysis:
Phase 2: Task Analysis + Requirement Grouping
1. Task Identification:
- Task A: "Write tests for src/services/loan.service.ts"
- Task B: "Add password recovery feature to src/modules/auth/"
- Task C: "Enable caching during dependency loading in .github/workflows/ci.yml"
2. Requirement Grouping:
- Task A: INDEPENDENT — test generation for a specific service
- Task B: INDEPENDENT — new feature in auth module (unrelated to tasks A and C)
- Task C: INDEPENDENT — CI configuration change (unrelated to tasks A and B)
- No grouping possible: all 3 tasks are different task types on different targets
3. Agent Count:
- Meta-judges: 3 (one per task — standard flow)
- Implementation agents: 3 (one per task)
- Judges: 3 (one per task)
- Total: 9 agents (no reduction possible)
Phase 3.5: Meta-Judge Dispatch (3 meta-judges in parallel):
[Meta-judge 1: Independent — loan service tests]
Use Task tool:
- description: "Meta-judge: write tests for loan.service.ts"
- prompt:
## Task
Generate an evaluation specification yaml for the following task applied
to a specific target. You will produce rubrics, checklists, and scoring
criteria that a judge agent will use to evaluate the implementation
artifact for this specific target.
CLAUDE_PLUGIN_ROOT={CLAUDE_PLUGIN_ROOT}
## User Prompt as Context
write tests for loan.service.ts, add password recovery feature to auth
module and enable caching during dependency loading in github actions.
## Target
Write comprehensive unit tests for src/services/loan.service.ts
## Context
Project uses Jest. Tests should cover all public methods, edge cases,
and error scenarios for the loan service.
## Artifact Type
code
## Instructions
User prompt is provided as context, you should use it only as reference
of changes that can occur in the project by other agents. Generate
evaluation specification ONLY for the loan service test generation.
Your report will be used to verify only this particular task, not
all tasks in the user prompt.
Return only the final evaluation specification YAML in your response.
- model: opus
- subagent_type: "sadd:meta-judge"
[Meta-judge 2: Independent — password recovery feature]
Use Task tool:
- description: "Meta-judge: add password recovery to auth module"
- prompt:
## Task
Generate an evaluation specification yaml for the following task applied
to a specific target. You will produce rubrics, checklists, and scoring
criteria that a judge agent will use to evaluate the implementation
artifact for this specific target.
CLAUDE_PLUGIN_ROOT={CLAUDE_PLUGIN_ROOT}
## User Prompt as Context
write tests for loan.service.ts, add password recovery feature to auth
module and enable caching during dependency loading in github actions.
## Target
Add password recovery feature to src/modules/auth/ (password reset flow:
request, token generation, validation, password update)
## Context
Auth module handles authentication. Password recovery requires new
endpoints, email integration, token management.
## Artifact Type
code
## Instructions
User prompt is provided as context, you should use it only as reference
of changes that can occur in the project by other agents. Generate
evaluation specification ONLY for the password recovery feature.
Your report will be used to verify only this particular task, not
all tasks in the user prompt.
Return only the final evaluation specification YAML in your response.
- model: opus
- subagent_type: "sadd:meta-judge"
[Meta-judge 3: Independent — GH Actions caching]
Use Task tool:
- description: "Meta-judge: enable dependency caching in GitHub Actions"
- prompt:
## Task
Generate an evaluation specification yaml for the following task applied
to a specific target. You will produce rubrics, checklists, and scoring
criteria that a judge agent will use to evaluate the implementation
artifact for this specific target.
CLAUDE_PLUGIN_ROOT={CLAUDE_PLUGIN_ROOT}
## User Prompt as Context
write tests for loan.service.ts, add password recovery feature to auth
module and enable caching during dependency loading in github actions.
## Target
Enable caching during dependency loading in .github/workflows/ci.yml
(e.g., npm/yarn cache, actions/cache)
## Context
GitHub Actions CI pipeline. Dependency installation step should use
caching to speed up builds.
## Artifact Type
configuration
## Instructions
User prompt is provided as context, you should use it only as reference
of changes that can occur in the project by other agents. Generate
evaluation specification ONLY for enabling dependency caching in GH Actions.
Your report will be used to verify only this particular task, not
all tasks in the user prompt.
Return only the final evaluation specification YAML in your response.
- model: opus
- subagent_type: "sadd:meta-judge"
[All 3 meta-judges launched simultaneously]
Phase 5: Implementation Dispatch (3 agents in parallel, after meta-judges complete):
[Implementation 1: loan service tests]
Use Task tool:
- description: "Parallel: write tests for loan.service.ts"
- prompt:
## Reasoning Approach
Let's think step by step.
Before taking any action, think through the problem systematically:
1. "Let me first understand what is being asked for this specific target..."
2. "Let me analyze this specific target..."
3. "Let me plan my approach..."
Work through each step explicitly before implementing.
<task>Write comprehensive unit tests for the loan service</task>
<target>src/services/loan.service.ts</target>
<constraints>
- Work ONLY on the specified target
- Create test file co-located with the service
- Cover all public methods, edge cases, and error scenarios
- Follow existing test patterns in the project
</constraints>
<output>
Create test file for the loan service.
CRITICAL: At the end of your work, provide a "Summary" section containing:
- Files modified (full paths)
- Key changes (3-5 bullet points)
- Any decisions made and rationale
</output>
## Self-Critique Verification (MANDATORY)
Before submitting, verify your work:
1. Re-read the original task and confirm every requirement is addressed
2. Check that all tests follow existing patterns in the project
3. Verify no unrelated files were modified
4. Confirm the Summary section is complete and accurate
- model: sonnet
[Implementation 2: password recovery]
Use Task tool:
- description: "Parallel: add password recovery feature to auth module"
- prompt:
## Reasoning Approach
[standard CoT prefix]
<task>Add password recovery feature to the auth module</task>
<target>src/modules/auth/</target>
<constraints>
- Work ONLY on the auth module
- Implement password reset request, token generation, validation,
and password update
- Follow existing auth module patterns
- Do NOT modify unrelated modules
</constraints>
<output>
Implement password recovery feature.
CRITICAL: At the end of your work, provide a "Summary" section.
</output>
## Self-Critique Verification (MANDATORY)
[standard self-critique suffix]
- model: opus
[Implementation 3: GH Actions caching]
Use Task tool:
- description: "Parallel: enable dependency caching in GitHub Actions"
- prompt:
## Reasoning Approach
[standard CoT prefix]
<task>Enable caching during dependency loading in CI pipeline</task>
<target>.github/workflows/ci.yml</target>
<constraints>
- Work ONLY on the CI workflow file
- Add dependency caching (npm/yarn cache or actions/cache)
- Do NOT modify other workflow steps beyond what is necessary
</constraints>
<output>
Update CI workflow with dependency caching.
CRITICAL: At the end of your work, provide a "Summary" section.
</output>
## Self-Critique Verification (MANDATORY)
[standard self-critique suffix]
- model: sonnet
[All 3 launched simultaneously]
Phase 5.2: Judge Dispatch (3 judges in parallel, after ALL implementors complete):
[Judge 1: loan service tests — independent spec]
Use Task tool:
- description: "Judge: loan.service.ts tests"
- prompt:
You are evaluating an implementation artifact for target
src/services/loan.service.ts against an evaluation specification
produced by the meta judge.
CLAUDE_PLUGIN_ROOT={CLAUDE_PLUGIN_ROOT}
## User Prompt
write tests for loan.service.ts, add password recovery feature to auth
module and enable caching during dependency loading in github actions.
## Target
src/services/loan.service.ts
## Evaluation Specification
```yaml
{EXACT spec YAML from loan service meta-judge}
```
## Implementation Output
{Summary from loan service test implementation agent}
## Instructions
User prompt is provided as context, you should use it only as reference
of changes that can occur in the project by other agents. Evaluate ONLY
the test generation for loan.service.ts.
Follow your full judge process as defined in your agent instructions!
## Output
CRITICAL: You must reply with this exact structured evaluation report
format in YAML at the START of your response!
- model: opus
- subagent_type: "sadd:judge"
[Judge 2: password recovery — independent spec]
Use Task tool:
- description: "Judge: auth password recovery"
- prompt:
You are evaluating an implementation artifact for target
src/modules/auth/ against an evaluation specification produced
by the meta judge.
CLAUDE_PLUGIN_ROOT={CLAUDE_PLUGIN_ROOT}
## User Prompt
[original user prompt]
## Target
src/modules/auth/ (password recovery feature)
## Evaluation Specification
```yaml
{EXACT spec YAML from password recovery meta-judge}
```
## Implementation Output
{Summary from password recovery implementation agent}
## Instructions
User prompt is provided as context, you should use it only as reference
of changes that can occur in the project by other agents. Evaluate ONLY
the password recovery feature.
Follow your full judge process as defined in your agent instructions!
## Output
CRITICAL: You must reply with this exact structured evaluation report
format in YAML at the START of your response!
- model: opus
- subagent_type: "sadd:judge"
[Judge 3: GH Actions caching — independent spec]
Use Task tool:
- description: "Judge: GitHub Actions dependency caching"
- prompt:
You are evaluating an implementation artifact for target
.github/workflows/ci.yml against an evaluation specification produced
by the meta judge.
CLAUDE_PLUGIN_ROOT={CLAUDE_PLUGIN_ROOT}
## User Prompt
[original user prompt]
## Target
.github/workflows/ci.yml (dependency caching)
## Evaluation Specification
```yaml
{EXACT spec YAML from GH Actions caching meta-judge}
```
## Implementation Output
{Summary from GH Actions caching implementation agent}
## Instructions
User prompt is provided as context, you should use it only as reference
of changes that can occur in the project by other agents. Evaluate ONLY
the dependency caching in GitHub Actions.
Follow your full judge process as defined in your agent instructions!
## Output
CRITICAL: You must reply with this exact structured evaluation report
format in YAML at the START of your response!
- model: opus
- subagent_type: "sadd:judge"
[All 3 judges launched simultaneously]
Result:
| Target | Grouping | Model | Judge Score | Retries | Status |
|---|---|---|---|---|---|
| src/services/loan.service.ts | Independent | sonnet | 4.1/5.0 | 0 | SUCCESS |
| src/modules/auth/ | Independent | opus | 4.3/5.0 | 0 | SUCCESS |
| .github/workflows/ci.yml | Independent | sonnet | 4.0/5.0 | 0 | SUCCESS |
Overall: 3/3 completed. Total Agents: 9 (3 meta-judges + 3 implementations + 3 judges). No grouping reduction possible for fully independent tasks.
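The agent arithmetic in these examples follows directly from the grouping rules: one meta-judge per group or independent task, one implementor per task, one judge per target for repeatable groups, and one combined judge for shared groups. A sketch, with hypothetical types:

```typescript
// Sketch of the agent-count arithmetic implied by requirement grouping.
// The type names are illustrative, not actual plugin APIs.
type Grouping =
  | { kind: "repeatable"; targets: number }  // same task across N targets
  | { kind: "shared"; tasks: number }        // interdependent tasks
  | { kind: "independent" };                 // one standalone task

function agentCount(groups: Grouping[]): number {
  let total = 0;
  for (const g of groups) {
    switch (g.kind) {
      case "repeatable":
        // 1 reusable meta-judge + N implementors + N independent judges
        total += 1 + g.targets + g.targets;
        break;
      case "shared":
        // 1 combined meta-judge + N implementors + 1 shared judge
        total += 1 + g.tasks + 1;
        break;
      case "independent":
        // 1 meta-judge + 1 implementor + 1 judge
        total += 3;
        break;
    }
  }
  return total;
}
```

With this rule, Example 2's initial plan (one shared group of 2 tasks plus one repeatable group of 3 targets) costs 11 agents versus 15 ungrouped, and Example 3's three independent tasks cost 9.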
输入:
/do-in-parallel write tests for loan.service.ts, add password recovery feature to auth module and enable caching during dependency loading in github actions.
编排者分析:
阶段2:任务分析 + 需求分组
1. 任务识别:
- 任务A: "Write tests for src/services/loan.service.ts"
- 任务B: "Add password recovery feature to src/modules/auth/"
- 任务C: "Enable caching during dependency loading in .github/workflows/ci.yml"
2. 需求分组:
- 任务A: 独立 — 特定服务的测试生成
- 任务B: 独立 — auth模块中的新功能(与任务A和C无关)
- 任务C: 独立 — CI配置变更(与任务A和B无关)
- 无法分组:所有3个任务是不同任务类型,针对不同目标
3. 代理数量:
- 元法官: 3个(每个任务一个 — 标准流程)
- 执行代理: 3个(每个任务一个)
- 法官: 3个(每个任务一个)
- 总计: 9个代理(无法减少)
阶段3.5:元法官调度(3个元法官并行):
[元法官1: 独立任务 — loan服务测试]
使用Task工具:
- description: "Meta-judge: write tests for loan.service.ts"
- prompt:
## 任务
Generate an evaluation specification yaml for the following task applied
to a specific target. You will produce rubrics, checklists, and scoring
criteria that a judge agent will use to evaluate the implementation
artifact for this specific target.
CLAUDE_PLUGIN_ROOT={CLAUDE_PLUGIN_ROOT}
## 用户提示词作为上下文
write tests for loan.service.ts, add password recovery feature to auth
module and enable caching during dependency loading in github actions.
## 目标
Write comprehensive unit tests for src/services/loan.service.ts
## 上下文
Project uses Jest. Tests should cover all public methods, edge cases,
and error scenarios for the loan service.
## 产物类型
code
## 指令
用户提示词作为上下文提供,你应仅将其作为项目中其他代理可能发生的变更的参考。仅为loan服务测试生成评估规范。你的报告将仅用于验证此特定任务,而非用户提示词中的所有任务。
仅在响应中返回最终的评估规范YAML。
- model: opus
- subagent_type: "sadd:meta-judge"
[元法官2: 独立任务 — 密码恢复功能]
使用Task工具:
- description: "Meta-judge: add password recovery to auth module"
- prompt:
## 任务
Generate an evaluation specification yaml for the following task applied
to a specific target. You will produce rubrics, checklists, and scoring
criteria that a judge agent will use to evaluate the implementation
artifact for this specific target.
CLAUDE_PLUGIN_ROOT={CLAUDE_PLUGIN_ROOT}
## 用户提示词作为上下文
write tests for loan.service.ts, add password recovery feature to auth
module and enable caching during dependency loading in github actions.
## 目标
Add password recovery feature to src/modules/auth/ (password reset flow:
request, token generation, validation, password update)
## 上下文
Auth模块处理身份验证。密码恢复需要新的端点、电子邮件集成、令牌管理。
## 产物类型
code
## 指令
用户提示词作为上下文提供,你应仅将其作为项目中其他代理可能发生的变更的参考。仅为密码恢复功能生成评估规范。你的报告将仅用于验证此特定任务,而非用户提示词中的所有任务。
仅在响应中返回最终的评估规范YAML。
- model: opus
- subagent_type: "sadd:meta-judge"
[元法官3: 独立任务 — GH Actions缓存]
使用Task工具:
- description: "Meta-judge: enable dependency caching in GitHub Actions"
- prompt:
## 任务
Generate an evaluation specification yaml for the following task applied
to a specific target. You will produce rubrics, checklists, and scoring
criteria that a judge agent will use to evaluate the implementation
artifact for this specific target.
CLAUDE_PLUGIN_ROOT={CLAUDE_PLUGIN_ROOT}
## 用户提示词作为上下文
write tests for loan.service.ts, add password recovery feature to auth
module and enable caching during dependency loading in github actions.
## 目标
Enable caching during dependency loading in .github/workflows/ci.yml
(e.g., npm/yarn cache, actions/cache)
## 上下文
GitHub Actions CI流水线。依赖安装步骤应使用缓存以加快构建速度。
## 产物类型
configuration
## 指令
用户提示词作为上下文提供,你应仅将其作为项目中其他代理可能发生的变更的参考。仅为在GH Actions中启用依赖缓存生成评估规范。你的报告将仅用于验证此特定任务,而非用户提示词中的所有任务。
仅在响应中返回最终的评估规范YAML。
- model: opus
- subagent_type: "sadd:meta-judge"
[所有3个元法官同时启动]
阶段5:执行调度(3个代理并行,元法官完成后):
[执行1: loan服务测试]
使用Task工具:
- description: "Parallel: write tests for loan.service.ts"
- prompt:
## 推理方法
Let's think step by step.
Before taking any action, think through the problem systematically:
1. "Let me first understand what is being asked for this specific target..."
2. "Let me analyze this specific target..."
3. "Let me plan my approach..."
Work through each step explicitly before implementing.
<task>Write comprehensive unit tests for the loan service</task>
<target>src/services/loan.service.ts</target>
<constraints>
- Work ONLY on the specified target
- Create test file co-located with the service
- Cover all public methods, edge cases, and error scenarios
- Follow existing test patterns in the project
</constraints>
<output>
Create test file for the loan service.
CRITICAL: At the end of your work, provide a "Summary" section containing:
- Files modified (full paths)
- Key changes (3-5 bullet points)
- Any decisions made and rationale
</output>
## 自我审查验证(必须)
提交之前,验证你的工作:
1. 重新阅读原始任务,确认所有需求都已满足
2. 检查所有测试是否遵循项目中的现有模式
3. 验证未修改无关文件
4. 确认摘要部分完整准确
- model: sonnet
[执行2: 密码恢复]
使用Task工具:
- description: "Parallel: add password recovery feature to auth module"
- prompt:
## 推理方法
[标准思维链前缀]
<task>Add password recovery feature to the auth module</task>
<target>src/modules/auth/</target>
<constraints>
- Work ONLY on the auth module
- Implement password reset request, token generation, validation,
and password update
- Follow existing auth module patterns
- Do NOT modify unrelated modules
</constraints>
<output>
Implement password recovery feature.
CRITICAL: At the end of your work, provide a "Summary" section.
</output>
## 自我审查验证(必须)
[标准自我审查后缀]
- model: opus
[执行3: GH Actions缓存]
使用Task工具:
- description: "Parallel: enable dependency caching in GitHub Actions"
- prompt:
## 推理方法
[标准思维链前缀]
<task>Enable caching during dependency loading in CI pipeline</task>
<target>.github/workflows/ci.yml</target>
<constraints>
- Work ONLY on the CI workflow file
- Add dependency caching (npm/yarn cache or actions/cache)
- Do NOT modify other workflow steps beyond what is necessary
</constraints>
<output>
Update CI workflow with dependency caching.
CRITICAL: At the end of your work, provide a "Summary" section.
</output>
## 自我审查验证(必须)
[标准自我审查后缀]
- model: sonnet
[所有3个代理同时启动]
阶段5.2:法官调度(3个法官并行,所有执行代理完成后):
[法官1: loan服务测试 — 独立规范]
使用Task工具:
- description: "Judge: loan.service.ts tests"
- prompt:
You are evaluating an implementation artifact for target
src/services/loan.service.ts against an evaluation specification
produced by the meta judge.
CLAUDE_PLUGIN_ROOT={CLAUDE_PLUGIN_ROOT}
## 用户提示词
write tests for loan.service.ts, add password recovery feature to auth
module and enable caching during dependency loading in github actions.
## 目标
src/services/loan.service.ts
## 评估规范
```yaml
{来自loan服务元法官的精确规范YAML}
```
## 执行输出
{来自loan服务测试执行代理的摘要}
## 指令
用户提示词作为上下文提供,你应仅将其作为项目中其他代理可能发生的变更的参考。仅评估loan.service.ts的测试生成。
遵循代理指令中定义的完整法官流程!
## 输出
CRITICAL: You must reply with this exact structured evaluation report
format in YAML at the START of your response!
- model: opus
- subagent_type: "sadd:judge"
[法官2: 密码恢复 — 独立规范]
使用Task工具:
- description: "Judge: auth password recovery"
- prompt:
You are evaluating an implementation artifact for target
src/modules/auth/ against an evaluation specification produced
by the meta judge.
CLAUDE_PLUGIN_ROOT={CLAUDE_PLUGIN_ROOT}
## 用户提示词
[原始用户提示词]
## 目标
src/modules/auth/ (password recovery feature)
## 评估规范
```yaml
{来自密码恢复元法官的精确规范YAML}
```
## 执行输出
{来自密码恢复执行代理的摘要}
## 指令
用户提示词作为上下文提供,你应仅将其作为项目中其他代理可能发生的变更的参考。仅评估密码恢复功能。
遵循代理指令中定义的完整法官流程!
## 输出
CRITICAL: You must reply with this exact structured evaluation report
format in YAML at the START of your response!
- model: opus
- subagent_type: "sadd:judge"
[法官3: GH Actions缓存 — 独立规范]
使用Task工具:
- description: "Judge: GitHub Actions dependency caching"
- prompt:
You are evaluating an implementation artifact for target
.github/workflows/ci.yml against an evaluation specification produced
by the meta judge.
CLAUDE_PLUGIN_ROOT={CLAUDE_PLUGIN_ROOT}
## 用户提示词
[原始用户提示词]
## 目标
.github/workflows/ci.yml (dependency caching)
## 评估规范
```yaml
{来自GH Actions缓存元法官的精确规范YAML}
```
## 执行输出
{来自GH Actions缓存执行代理的摘要}
## 指令
用户提示词作为上下文提供,你应仅将其作为项目中其他代理可能发生的变更的参考。仅评估GitHub Actions中的依赖缓存。
遵循代理指令中定义的完整法官流程!
## 输出
CRITICAL: You must reply with this exact structured evaluation report
format in YAML at the START of your response!
- model: opus
- subagent_type: "sadd:judge"
[所有3个法官同时启动]
结果:
| 目标 | 分组 | 模型 | 法官分数 | 重试次数 | 状态 |
|---|---|---|---|---|---|
| src/services/loan.service.ts | 独立 | sonnet | 4.1/5.0 | 0 | 成功 |
| src/modules/auth/ | 独立 | opus | 4.3/5.0 | 0 | 成功 |
| .github/workflows/ci.yml | 独立 | sonnet | 4.0/5.0 | 0 | 成功 |
总体: 3/3完成。总代理数: 9个(3个元法官 + 3个执行代理 + 3个法官)。完全独立的任务无法通过分组减少代理数量。
Best Practices
最佳实践
Target Selection
目标选择
- Be specific: List exact files when possible
- Use globs carefully: Review expanded list before confirming
- Limit scope: 10-15 targets max per batch for manageability
- Group by similarity: Similar targets benefit from consistent patterns
- 明确具体: 尽可能列出精确文件
- 谨慎使用通配符: 确认前查看展开的列表
- 限制范围: 每批最多10-15个目标,便于管理
- 按相似性分组: 相似目标受益于一致模式
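The 10-15 target cap can be enforced mechanically before dispatch. A sketch; the limit and the batching policy here are suggestions rather than part of the command:

```typescript
// Sketch: split an expanded glob result into dispatchable batches so
// no single batch exceeds the manageability limit suggested above.
// The constant and function names are illustrative.
const MAX_TARGETS_PER_BATCH = 15;

function toBatches(
  targets: string[],
  max: number = MAX_TARGETS_PER_BATCH
): string[][] {
  const batches: string[][] = [];
  for (let i = 0; i < targets.length; i += max) {
    batches.push(targets.slice(i, i + max));
  }
  return batches;
}
```

Reviewing `toBatches(expandedGlob)` before confirming also doubles as the "review expanded list" step.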
Model Selection Guidelines
模型选择指南
| Scenario | Model | Reason |
|---|---|---|
| Security analysis | Opus | Critical reasoning required |
| Architecture decisions | Opus | Quality over speed |
| Simple refactoring | Haiku | Fast, sufficient |
| Documentation generation | Haiku | Mechanical task |
| Code review per file | Sonnet | Balanced capability |
| Test generation | Sonnet | Extensive but patterned |
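The table above can be encoded as a simple lookup with a balanced default. The scenario keys below are hypothetical labels invented for this sketch, not part of the command's actual interface.

```python
# Hypothetical scenario keys mirroring the guideline table above.
MODEL_BY_SCENARIO = {
    "security-analysis": "opus",      # critical reasoning required
    "architecture-decision": "opus",  # quality over speed
    "simple-refactor": "haiku",       # fast, sufficient
    "doc-generation": "haiku",        # mechanical task
    "code-review": "sonnet",          # balanced capability
    "test-generation": "sonnet",      # extensive but patterned
}

def pick_model(scenario):
    """Default to the balanced model for unlisted scenarios."""
    return MODEL_BY_SCENARIO.get(scenario, "sonnet")
```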
Meta-Judge + Judge Verification
- Requirement grouping first - Before dispatching any meta-judges, analyze tasks for repeatable, shared, or independent grouping to minimize total agents
- One meta-judge per group or independent task - Repeatable groups share one reusable spec, shared groups share one combined spec, independent tasks get their own spec
- Batch meta-judges first - Launch all meta-judges in parallel (regardless of grouping type), then launch implementors
- Reuse spec on retries - Each group/target's evaluation specification stays constant across retries; only the implementation changes
- Parse only headers from judge - Don't read full reports to avoid context pollution
- Include CLAUDE_PLUGIN_ROOT - Both meta-judge and judge need the resolved plugin root path
- Target-specific YAML - Pass only the relevant meta-judge YAML to its judge, do not add any additional text or comments to it!
- Shared group retries - Only re-launch the specific failing implementation agent(s), not the entire group
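The "parse only headers from judge" rule can be sketched as header-only extraction, assuming the judge's report begins with top-level YAML scalars as required by its output instructions; the field names in the example are illustrative, not a confirmed schema.

```python
import re

def parse_judge_header(response):
    """Read only the top-level scalar fields (e.g. score, verdict) from
    the YAML block at the start of a judge response, stopping before any
    nested section so the full report never enters the orchestrator's
    context."""
    header = {}
    for line in response.splitlines():
        match = re.match(r"^([A-Za-z_][\w-]*):\s*(\S.*)$", line)
        if not match:
            break  # first nested or non-scalar line ends the header
        key, value = match.groups()
        header[key] = value.strip()
    return header

report = "score: 4.1\nverdict: pass\nissues:\n  - cache key too broad"
print(parse_judge_header(report))  # -> {'score': '4.1', 'verdict': 'pass'}
```

Stopping at the first non-scalar line is the whole point: the orchestrator gets the pass/fail signal without ingesting the report body.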
Judge Selection
| Implementation Model | Judge Model | Rationale |
|---|---|---|
| Opus | Opus | Critical work needs strong verification |
| Sonnet | Opus | Tailored evaluation requires strong reasoning |
| Haiku | Opus | Verify simple work with strong evaluation |
Guideline: Judges always use Opus for consistent, high-quality evaluation across all targets.
Context Isolation
- Minimal context: Each sub-agent gets only what it needs
- No cross-references: Don't tell Agent A about Agent B's target
- Let them discover: Sub-agents read files to understand patterns
- File system as truth: Changes are coordinated through the filesystem
- Track pre-existing changes - Pass context about prior modifications to each agent's judge to prevent attribution confusion between pre-existing and current changes
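The isolation rules above can be sketched as a prompt builder: each implementor sees only its own target plus any pre-existing changes its judge should not attribute to it. The function shape and section headings are assumptions for illustration.

```python
def build_agent_prompt(user_prompt, target, prior_changes=()):
    """Assemble a minimal, isolated prompt for one implementor: its own
    target only (never another agent's), plus pre-existing changes so
    its judge does not confuse them with this agent's work."""
    sections = [f"## User Prompt\n{user_prompt}", f"## Target\n{target}"]
    if prior_changes:
        sections.append("## Pre-existing Changes (context only)\n"
                        + "\n".join(f"- {change}" for change in prior_changes))
    return "\n\n".join(sections)
```

Note what is deliberately absent: no list of sibling targets, no shared scratch state; coordination happens through the filesystem alone.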
Quality Assurance
- Three-layer verification: Self-critique (internal) + Target-specific meta-judge specification (structured) + Judge (external)
- Self-critique first: Implementation agents verify own work before submission
- Target-specific meta-judge specification: Each target gets tailored rubrics that account for its unique characteristics, producing more precise evaluation criteria
- External judge second: Independent judge applies target-specific meta-judge specification mechanically — catches blind spots self-critique misses
- Iteration loop: Retry with feedback until passing or max retries
- Isolated failures: One target failing doesn't affect others
- Review the summary: Check for failed or partial completions
- Run tests after: Parallel changes may have subtle interactions
- Commit atomically: All changes from one batch = one commit
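The iteration loop above can be sketched as follows. Here `implement` and `judge` are placeholder callables standing in for the Task-tool dispatches (their signatures are assumptions), and the 4.0 pass threshold is illustrative.

```python
PASS_THRESHOLD = 4.0  # illustrative quality gate
MAX_RETRIES = 3

def run_with_quality_gate(implement, judge, spec, max_retries=MAX_RETRIES):
    """Implement, judge against the fixed evaluation spec, and retry
    with the judge's feedback until passing or out of retries. The
    spec never changes between attempts; only the implementation does."""
    feedback = None
    for attempt in range(max_retries + 1):
        artifact = implement(feedback)           # self-critique happens inside the implementor
        score, feedback = judge(artifact, spec)  # external, mechanical check
        if score >= PASS_THRESHOLD:
            return {"status": "success", "score": score, "retries": attempt}
    return {"status": "failed", "score": score, "retries": max_retries}
```

Because each target runs this loop independently, one target exhausting its retries never blocks the others.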
Error Handling
| Failure Type | Description | Recovery Action |
|---|---|---|
| Recoverable | Judge found issues, retry available | Retry with judge feedback (max 3 per target) |
| Approach Failure | The approach for this target is wrong | Escalate to user with options |
| Foundation Issue | Requirements unclear or impossible | Escalate to user for clarification |
| Max Retries Exceeded | Target failed after 3 retries | Mark failed, continue other targets, report at end |
Critical Rules:
- NEVER continue past max retries without user input
- NEVER try to "fix forward" without addressing judge issues
- NEVER skip judge verification
- STOP and report if context is missing (don't guess)
- ISOLATE failures - one target failing doesn't stop others
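The recovery table and critical rules above reduce to a small dispatcher. The failure-type strings and action labels are illustrative names for this sketch, not identifiers the command actually exposes.

```python
def recovery_action(failure_type, retries_used, max_retries=3):
    """Map a failure type to its recovery action per the table above."""
    if failure_type == "recoverable" and retries_used < max_retries:
        return "retry-with-judge-feedback"
    if failure_type in ("approach-failure", "foundation-issue"):
        return "escalate-to-user"  # never guess or "fix forward"
    # Max retries exceeded (or otherwise unrecoverable): isolate this
    # target, continue the others, and report the failure at the end.
    return "mark-failed-continue-others"
```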