do-competitively
<task>
Execute tasks through competitive multi-agent generation, meta-judge evaluation specification, multi-judge evaluation, and evidence-based synthesis to produce superior results by combining the best elements from parallel implementations.
</task>
<context>
This command implements the Generate-Critique-Synthesize (GCS) pattern with adaptive strategy selection for high-stakes tasks where quality matters more than speed. It combines competitive generation with meta-judge evaluation specification and multi-perspective evaluation, then intelligently selects the optimal synthesis strategy based on results.
Key features:
- Self-critique loops in generation (Constitutional AI)
- Structured evaluation - Meta-judge produces tailored rubrics before judging
- Verification loops in evaluation (Chain-of-Verification)
- Adaptive strategy: polish clear winners, synthesize split decisions, redesign failures
- Average 15-20% cost savings through intelligent strategy selection
</context>
CRITICAL: You are not an implementation agent or a judge; you must not read files provided as context for sub-agents or tasks. Do not read reports, and do not overwhelm your context with unnecessary information. You MUST follow the process step by step. Any deviation will be considered a failure and you will be killed!
Pattern: Generate-Critique-Synthesize (GCS)
This command implements a multi-phase adaptive competitive orchestration pattern:
```
Phase 1: Competitive Generation with Self-Critique + Meta-Judge (IN PARALLEL)
         ┌─ Meta-Judge → Evaluation Specification YAML ─────────────┐
Task ────┼─ Agent 1 → Draft → Critique → Revise → Solution A ───┐   │
         ├─ Agent 2 → Draft → Critique → Revise → Solution B ───┼───┤
         └─ Agent 3 → Draft → Critique → Revise → Solution C ───┘   │
                                                                    │
Phase 2: Multi-Judge Evaluation with Verification                   │
         ┌─ Judge 1 → Evaluate → Verify → Revise → Report A ─┐      │
         ├─ Judge 2 → Evaluate → Verify → Revise → Report B ─┼──────┤
         └─ Judge 3 → Evaluate → Verify → Revise → Report C ─┘      │
                                                                    │
Phase 2.5: Adaptive Strategy Selection                              │
Analyze Consensus ──────────────────────────────────────────────────┤
         ├─ Clear Winner? → SELECT_AND_POLISH                       │
         ├─ All Flawed (<3.0)? → REDESIGN (return to Phase 1)       │
         └─ Split Decision? → FULL_SYNTHESIS                        │
                  │                                                 │
Phase 3: Evidence-Based Synthesis                                   │
(Only if FULL_SYNTHESIS)  │                                         │
Synthesizer ──────────────┴─────────────────────────────────────────┴─→ Final Solution
```
Process
Setup: Create Reports Directory
Before starting, ensure the reports directory exists:
```bash
mkdir -p .specs/reports
```

Report naming convention: `.specs/reports/{solution-name}-{YYYY-MM-DD}.[1|2|3].md`

Where:
- `{solution-name}` - Derived from the output path (e.g., output `specs/api/users.md` results in `users-api`)
- `{YYYY-MM-DD}` - Current date
- `[1|2|3]` - Judge number

Note: Solutions remain in their specified output locations; only evaluation reports go to `.specs/reports/`.
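The report-naming convention can be sketched in Python. The exact `{solution-name}` derivation rule (file stem joined with the parent directory name) is an assumption inferred from the `specs/api/users.md` → `users-api` example; `report_paths` is a hypothetical helper, not part of the command itself.

```python
from datetime import date
from pathlib import Path

def report_paths(output_path: str, judges: int = 3) -> list[str]:
    """Build .specs/reports/{solution-name}-{YYYY-MM-DD}.[1|2|3].md paths."""
    p = Path(output_path)
    # Assumed derivation rule: "specs/api/users.md" -> "users" + "api" -> "users-api"
    solution_name = f"{p.stem}-{p.parent.name}"
    today = date.today().isoformat()  # YYYY-MM-DD
    return [f".specs/reports/{solution_name}-{today}.{i}.md"
            for i in range(1, judges + 1)]
```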
Phase 1: Competitive Generation + Meta-Judge (IN PARALLEL)
Launch 3 independent generator agents AND 1 meta-judge agent in parallel (4 agents total; Opus recommended for all, for quality):
The meta-judge runs in parallel with the 3 generators because it does not need their output — it only needs the task description to generate evaluation criteria.
CRITICAL: Dispatch all 4 agents in a single message using 4 Task tool calls as foreground agents. The meta-judge MUST be the first tool call in the dispatch order, because it needs to collect codebase context before the generators modify it.
Meta-Judge Agent (1 agent)
The meta-judge generates an evaluation specification YAML (rubrics, checklists, scoring criteria) tailored to this specific task. It returns the evaluation specification YAML that all 3 judges will use.
Prompt template for meta-judge:
```markdown
# Task
Generate an evaluation specification YAML for the following task. You will produce rubrics, checklists, and scoring criteria that judge agents will use to evaluate and compare competitive implementation artifacts.

CLAUDE_PLUGIN_ROOT=`${CLAUDE_PLUGIN_ROOT}`

# User Prompt
{Original task description from user}

# Context
{Any relevant codebase context, file paths, constraints}

# Artifact Type
{code | documentation | configuration | etc.}

# Number of Solutions
3 (competitive implementations to be compared)

# Instructions
Return only the final evaluation specification YAML in your response.
The specification should support comparative evaluation across multiple solutions.
```

**Dispatch:**
Use Task tool:
- description: "Meta-judge: {brief task summary}"
- prompt: {meta-judge prompt}
- model: opus
- subagent_type: "sadd:meta-judge"

Generator Agents (3 agents)
- Each agent receives identical task description and context
- Agents work independently without seeing each other's work
- Each produces a complete solution to the same problem
- Solutions are saved to distinct files (e.g., `{solution-file}.[a|b|c].[ext]`)

Solution naming convention: `{solution-file}.[a|b|c].[ext]`
Where:
- `{solution-file}` - Derived from the task (e.g., "create users.ts" results in `users` as the solution file)
- `[a|b|c]` - Unique identifier per sub-agent
- `[ext]` - File extension (e.g., `md`, `ts`, etc.)

Key principle: Diversity through independence - agents explore different approaches.
CRITICAL: You MUST provide the filename with the [a|b|c] identifier to agents and judges!!! Missing it will result in your immediate TERMINATION!
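The candidate filenames can be derived mechanically by inserting the per-agent identifier before the extension. A minimal sketch; `candidate_paths` is a hypothetical helper for illustration, not part of the command:

```python
from pathlib import Path

def candidate_paths(solution_file: str) -> list[str]:
    """Insert the [a|b|c] identifier before the extension:
    'specs/api/users.md' -> 'specs/api/users.a.md', '...b.md', '...c.md'."""
    p = Path(solution_file)
    return [f"{p.parent / p.stem}.{tag}{p.suffix}" for tag in "abc"]
```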
Prompt template for generators:
```markdown
<task>
{task_description}
</task>
<constraints>
{constraints_if_any}
</constraints>
<context>
{relevant_context}
</context>
<output>
{define expected output following the pattern {solution-file}.[a|b|c].[ext] based on the task description and context. Each [a|b|c] is a unique identifier per sub-agent. You MUST provide the filename with it!!!}
</output>
Instructions:
Let's approach this systematically to produce the best possible solution.
1. First, analyze the task carefully - what is being asked and what are the key requirements?
2. Consider multiple approaches - what are the different ways to solve this?
3. Think through the tradeoffs step by step and choose the approach you believe is best
4. Implement it completely
5. Generate 5 verification questions about critical aspects
6. Answer your own questions:
   - Review the solution against each question
   - Identify gaps or weaknesses
7. Revise the solution:
   - Fix identified issues
8. Explain what was changed and why
```

Parallel Dispatch Example
Send ALL 4 Task tool calls in a single message. Meta-judge first, then generators:
Message with 4 tool calls:

Tool call 1 (meta-judge):
- description: "Meta-judge: {brief task summary}"
- model: opus
- subagent_type: "sadd:meta-judge"

Tool call 2 (generator A):
- description: "Generate solution A: {brief task summary}"
- model: opus

Tool call 3 (generator B):
- description: "Generate solution B: {brief task summary}"
- model: opus

Tool call 4 (generator C):
- description: "Generate solution C: {brief task summary}"
- model: opus

Wait for ALL 4 to return before proceeding to Phase 2.
Phase 2: Multi-Judge Evaluation
Launch 3 independent judges in parallel (recommended: Opus for rigor):
CRITICAL: Wait for ALL Phase 1 agents (meta-judge + 3 generators) to complete before dispatching judges.
CRITICAL: Provide to each judge the EXACT meta-judge evaluation specification YAML. Do not skip or add anything, do not modify it in any way, do not shorten or summarize any text in it!
- Each judge receives the meta-judge evaluation specification YAML and paths to ALL candidate solutions (A, B, C)
- Judges evaluate against the meta-judge's criteria (not hardcoded criteria)
- Each judge produces:
- Comparative analysis (which solution excels where)
- Evidence-based ratings (with specific quotes/examples)
- Final vote (which solution they prefer and why)
- Reports saved to distinct files (e.g., `.specs/reports/{solution-name}-{date}.[1|2|3].md`)
Key principle: Multiple independent evaluations reduce bias and catch different issues.
Prompt template for judges:
````markdown
You are evaluating {number} competitive solutions against an evaluation specification produced by the meta-judge.

CLAUDE_PLUGIN_ROOT=`${CLAUDE_PLUGIN_ROOT}`

# Task
{task_description}

# Solutions
{list of paths to all candidate solutions}

# Evaluation Specification
```yaml
{meta-judge's evaluation specification YAML}
```

# Output
Write the full report to: {.specs/reports/{solution-name}-{date}.[1|2|3].md - each judge gets a unique number identifier}

CRITICAL: You must reply with this exact structured header format:
VOTE: [Solution A/B/C]
SCORES:
Solution A: [X.X]/5.0
Solution B: [X.X]/5.0
Solution C: [X.X]/5.0
CRITERIA:
- {criterion_1}: [X.X]/5.0
- {criterion_2}: [X.X]/5.0 ...
[Summary of your evaluation]

# Instructions
Follow your full judge process as defined in your agent instructions!
CRITICAL: Base your evaluation on evidence, not impressions. Quote specific text.
````

Output

CRITICAL: The judge must reply with this exact structured evaluation report format at the START of its response!
CRITICAL: NEVER provide the score threshold to judges. A judge MUST NOT know what the score threshold is, in order not to be biased!!!

**Dispatch:**
Use Task tool (3 calls in a single message):
- description: "Judge [1|2|3]: {brief task summary}"
- prompt: {judge prompt with the exact meta-judge specification YAML}
- model: opus
- subagent_type: "sadd:judge"
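The structured header is what the orchestrator later parses in Phase 2.5. A minimal parsing sketch, assuming each reply follows the header format exactly; `parse_judge_header` is a hypothetical helper, not part of the command:

```python
import re

def parse_judge_header(reply: str) -> dict:
    """Extract VOTE and per-solution SCORES from a judge's structured header."""
    vote = re.search(r"VOTE:\s*\[?Solution ([ABC])\]?", reply).group(1)
    scores = {
        m.group(1): float(m.group(2))
        for m in re.finditer(r"Solution ([ABC]):\s*\[?(\d+(?:\.\d+)?)\]?/5\.0", reply)
    }
    return {"vote": vote, "scores": scores}
```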
Phase 2.5: Adaptive Strategy Selection (Early Return)
The orchestrator (not a subagent) analyzes judge outputs to determine the optimal strategy.
Decision Logic
Step 1: Parse structured headers from judge replies
Parse the judges' replies.
CRITICAL: Do not read the report files themselves; they can overflow your context.
Step 2: Check for unanimous winner
Compare all three VOTE values:
- If Judge 1 VOTE = Judge 2 VOTE = Judge 3 VOTE (same solution):
- Strategy: SELECT_AND_POLISH
- Reason: Clear consensus - all three judges prefer same solution
Step 3: Check if all solutions are fundamentally flawed
If no unanimous vote, calculate average scores:
- Average Solution A scores: (Judge1_A + Judge2_A + Judge3_A) / 3
- Average Solution B scores: (Judge1_B + Judge2_B + Judge3_B) / 3
- Average Solution C scores: (Judge1_C + Judge2_C + Judge3_C) / 3
If (avg_A < 3.0) AND (avg_B < 3.0) AND (avg_C < 3.0):
- Strategy: REDESIGN
- Reason: All solutions below quality threshold, fundamental approach issues
Step 4: Default to full synthesis
If none of the above conditions met:
- Strategy: FULL_SYNTHESIS
- Reason: Split decision with merit, synthesis needed to combine best elements
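The decision logic above can be sketched as a small function. The input shape (one dict per judge holding `vote` and `scores`, as parsed from the structured headers) is an assumption for illustration:

```python
def select_strategy(judges: list[dict]) -> tuple[str, str]:
    """judges: e.g. [{"vote": "A", "scores": {"A": 4.5, "B": 3.2, "C": 2.8}}, ...]"""
    votes = [j["vote"] for j in judges]
    # Unanimous winner -> polish it
    if len(set(votes)) == 1:
        return "SELECT_AND_POLISH", f"clear consensus on Solution {votes[0]}"
    # All solutions below the 3.0 quality threshold -> redesign
    avg = {s: sum(j["scores"][s] for j in judges) / len(judges) for s in "ABC"}
    if all(v < 3.0 for v in avg.values()):
        return "REDESIGN", "all solutions below quality threshold"
    # Default: split decision with merit -> full synthesis
    return "FULL_SYNTHESIS", "split decision, synthesis needed"
```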
Strategy 1: SELECT_AND_POLISH
When: Clear winner (unanimous votes)
Process:
- Select the winning solution as the base
- Launch subagent to apply specific improvements from judge feedback
- Cherry-pick 1-2 best elements from runner-up solutions
- Document what was added and why
Benefits:
- Saves synthesis cost (simpler than full synthesis)
- Preserves proven quality of winning solution
- Focused improvements rather than full reconstruction
Prompt template:
```markdown
You are polishing the winning solution based on judge feedback.
<task>
{task_description}
</task>
<winning_solution>
{path_to_winning_solution}
Score: {winning_score}/5.0
Judge consensus: {why_it_won}
</winning_solution>
<runner_up_solutions>
{list of paths to all runner-up solutions}
</runner_up_solutions>
<judge_feedback>
{list of paths to all evaluation reports}
</judge_feedback>
<output>
{final_solution_path}
</output>
Instructions:
Let's work through this step by step to polish the winning solution effectively.
1. Take the winning solution as your base (do NOT rewrite it)
2. First, carefully review all judge feedback to understand what needs improvement
3. Apply improvements based on judge feedback:
   - Fix identified weaknesses
   - Add missing elements judges noted
4. Next, examine the runner-up solutions for standout elements
5. Cherry-pick 1-2 specific elements from runners-up if judges praised them
6. Document changes made:
   - What was changed and why
   - What was added from other solutions
CRITICAL: Preserve the winning solution's core approach. Make targeted improvements only.
```
Strategy 2: REDESIGN
When: All solutions scored <3.0/5.0 (fundamental issues across the board)
Process:
- Launch new agent to analyze the failure modes and lessons learned. Ask the agent to:
- Think through step by step: what went wrong with each solution?
- Analyze common failure modes across all solutions
- Extract lessons learned (what NOT to do)
- Identify the root causes of why all approaches failed
- Generate new task decomposition or constraints based on these insights
- Return to Phase 1 and provide the new implementation agents with the lessons learned and the new constraints.
Prompt template for new implementation:
```markdown
You are analyzing why all solutions failed to meet quality standards, and implementing a new solution based on that analysis.
<task>
{task_description}
</task>
<constraints>
{constraints_if_any}
</constraints>
<context>
{relevant_context}
</context>
<failed_solutions>
{list of paths to all candidate solutions}
</failed_solutions>
<evaluation_reports>
{list of paths to all evaluation reports with low scores}
</evaluation_reports>
Instructions:
Let's break this down systematically to understand what went wrong and how to design a new solution based on it.
1. First, analyze the task carefully - what is being asked and what are the key requirements?
2. Read through each solution and its evaluation report
3. For each solution, think step by step about:
   - What was the core approach?
   - What specific issues did judges identify?
   - Why did this approach fail to meet the quality threshold?
4. Identify common failure patterns across all solutions:
   - Are there shared misconceptions?
   - Are there missing requirements that all solutions overlooked?
   - Are there fundamental constraints that weren't considered?
5. Extract lessons learned:
   - What approaches should be avoided?
   - What constraints must be addressed?
6. Generate improved guidance for the next iteration:
   - New constraints to add
   - Specific approaches to try - what are the different ways to solve this?
   - Key requirements to emphasize
7. Think through the tradeoffs step by step and choose the approach you believe is best
8. Implement it completely
9. Generate 5 verification questions about critical aspects
10. Answer your own questions:
    - Review the solution against each question
    - Identify gaps or weaknesses
11. Revise the solution:
    - Fix identified issues
12. Explain what was changed and why
```
Strategy 3: FULL_SYNTHESIS (Default)
When: No clear winner AND solutions have merit (scores >=3.0)
Process: Proceed to Phase 3 (Evidence-Based Synthesis)
Phase 3: Evidence-Based Synthesis
Only executed when Strategy 3 (FULL_SYNTHESIS) selected in Phase 2.5
Launch 1 synthesis agent (recommended: Opus for quality):
- Agent receives:
- All candidate solutions (A, B, C)
- All evaluation reports (1, 2, 3)
- Agent analyzes:
- Which elements each judge praised (consensus on strengths)
- Which issues each judge identified (consensus on weaknesses)
- Where solutions differed in approach
- Agent produces final solution by:
- Copying superior sections when one solution clearly wins
- Combining approaches when hybrid is better
- Fixing identified issues that all judges caught
- Documenting decisions (what was taken from where and why)
Key principle: Evidence-based synthesis leverages collective intelligence.
Prompt template for synthesizer:
```markdown
You are synthesizing the best solution from competitive implementations and evaluations.
<task>
{task_description}
</task>
<solutions>
{list of paths to all candidate solutions}
</solutions>
<evaluation_reports>
{list of paths to all evaluation reports}
</evaluation_reports>
<output>
{define expected output following the pattern solution.md based on the task description and context. The result should be a complete solution to the task.}
</output>
Instructions:
Let's think through this synthesis step by step to create the best possible combined solution.
1. First, read all solutions and evaluation reports carefully
2. Map out the consensus:
   - What strengths did multiple judges praise in each solution?
   - What weaknesses did multiple judges criticize in each solution?
3. For each major component or section, think through:
   - Which solution handles this best and why?
   - Could a hybrid approach work better?
4. Create the best possible solution by:
   - Copying text directly when one solution is clearly superior
   - Combining approaches when a hybrid would be better
   - Fixing all identified issues
   - Preserving the best elements from each
5. Explain your synthesis decisions:
   - What you took from each solution
   - Why you made those choices
   - How you addressed identified weaknesses
CRITICAL: Do not create something entirely new. Synthesize the best from what exists.
```
Outputs (All Strategies)
- Candidate solutions: `{solution-file}.[a|b|c].[ext]` (in the specified output location)
- Evaluation reports: `.specs/reports/{solution-name}-{date}.[1|2|3].md`
- Resulting solution: `{output_path}`
Strategy-Specific Outputs
- SELECT_AND_POLISH: Polished solution based on the winning solution
- REDESIGN: Do not stop; return to Phase 1, which should eventually finish via the SELECT_AND_POLISH or FULL_SYNTHESIS strategy
- FULL_SYNTHESIS: Synthesized solution combining the best from all
Orchestrator Reply
Once command execution is complete, reply to the user with the following structure:
```markdown
# Execution Summary
Original Task: {task_description}
Strategy Used: {strategy} ({reason})

## Results
| Phase | Agents | Models | Status |
|---|---|---|---|
| Phase 1: Competitive Generation + Meta-Judge | 4 (3 generators + 1 meta-judge) | opus x 4 | [Complete / Failed] |
| Phase 2: Multi-Judge Evaluation | 3 | opus x 3 | [Complete / Failed] |
| Phase 2.5: Adaptive Strategy Selection | orchestrator | - | {strategy} |
| Phase 3: [Synthesis/Polish/Redesign] | [N] | [model] | [Complete / Failed] |

## Files Created
Final Solution:
- {output_path} - Synthesized production-ready command
Candidate Solutions:
- {solution-file}.[a|b|c].[ext] (Score: [X.X]/5.0)
Evaluation Reports:
- .specs/reports/{solution-file}-{date}.[1|2|3].md (Vote: [Solution A/B/C])

## Synthesis Decisions
| Element | Source | Rationale |
|---|---|---|
| [element] | Solution [B/A/C] | [rationale] |
```

Best Practices
Meta-Judge + Judge Verification
- Never skip meta-judge - Tailored evaluation criteria produce better judgments than generic ones
- Meta-judge runs once - Same specification for all 3 judges
- Include CLAUDE_PLUGIN_ROOT - Both meta-judge and judges need the resolved plugin root path
- Meta-judge YAML - Pass only the meta-judge YAML to judges, do not add any additional text or comments to it!
Common Pitfalls
- Using it for trivial tasks - Overhead not justified
- Vague task descriptions - Leads to incomparable solutions
- Insufficient context - Agents can't produce quality work
- Forcing synthesis when clear winner exists - Wastes cost and risks degrading quality
- Synthesizing fundamentally flawed solutions - Better to redesign than polish garbage
- Skipping meta-judge - Hardcoded criteria are less effective than tailored ones
- Modifying meta-judge YAML before passing to judges - Judges must receive exact specification
Do:
- Well-defined task with clear constraints
- Rich context for informed decisions
- Trust adaptive strategy selection
- Polish clear winners, synthesize split decisions, redesign failures
- Dispatch meta-judge in parallel with generators for speed
Examples
Example 1: API Design (Clear Winner - SELECT_AND_POLISH)
```bash
/do-competitively "Design REST API for user management (CRUD + auth)" \
  --output "specs/api/users.md" \
  --criteria "RESTfulness,security,scalability,developer-experience"
```

Phase 1 outputs (4 parallel agents):
- Meta-judge: evaluation specification YAML with 5 criteria dimensions, comparative rubrics
- `specs/api/users.a.md` - Resource-based design with nested routes
- `specs/api/users.b.md` - Action-based design with RPC-style endpoints
- `specs/api/users.c.md` - Minimal design, missing auth consideration

Phase 2 outputs (assuming date 2025-01-15, 3 judges using the meta-judge specification):
- `.specs/reports/users-api-2025-01-15.1.md`: VOTE: Solution A, SCORES: A=4.5/5.0, B=3.2/5.0, C=2.8/5.0 - "Most RESTful, good security"
- `.specs/reports/users-api-2025-01-15.2.md`: VOTE: Solution A, SCORES: A=4.3/5.0, B=3.5/5.0, C=2.6/5.0 - "Clean resource design, scalable"
- `.specs/reports/users-api-2025-01-15.3.md`: VOTE: Solution A, SCORES: A=4.6/5.0, B=3.0/5.0, C=2.9/5.0 - "Best practices, clear structure"

Phase 2.5 decision (orchestrator parses headers):
- Unanimous vote: A, A, A
- Average scores: A=4.47, B=3.23, C=2.77
- Strategy: SELECT_AND_POLISH
- Reason: Unanimous winner with a >1.0 point gap

Phase 3 output:
- `specs/api/users.md` - Solution A polished with:
  - Added rate limiting documentation (from B)
  - Simplified nested routes (judge feedback)
- Total cost: 8 agents (4 Phase 1 + 3 judges + 1 polish)
Example 2: Algorithm Selection (Split Decision - FULL_SYNTHESIS)
```bash
/do-competitively "Design caching strategy for high-traffic API" \
  --output "specs/caching.md" \
  --criteria "performance,memory-efficiency,simplicity,reliability"
```

Phase 1 outputs (4 parallel agents):
- Meta-judge: evaluation specification YAML with 4 criteria dimensions, comparative rubrics
- `specs/caching.a.md` - Redis with LRU eviction
- `specs/caching.b.md` - Multi-tier cache (memory + Redis)
- `specs/caching.c.md` - CDN + application cache

Phase 2 outputs (assuming date 2025-01-15, 3 judges using the meta-judge specification):
- `.specs/reports/caching-2025-01-15.1.md`: VOTE: Solution B, SCORES: A=3.8/5.0, B=4.2/5.0, C=3.9/5.0 - "Best performance, complex"
- `.specs/reports/caching-2025-01-15.2.md`: VOTE: Solution A, SCORES: A=4.0/5.0, B=3.9/5.0, C=3.7/5.0 - "Simple, reliable, proven"
- `.specs/reports/caching-2025-01-15.3.md`: VOTE: Solution C, SCORES: A=3.6/5.0, B=4.0/5.0, C=4.1/5.0 - "Global reach, cost-effective"

Phase 2.5 decision (orchestrator parses headers):
- Split votes: B, A, C (no consensus)
- Average scores: A=3.80, B=4.03, C=3.90
- Score gap: 4.03 - 3.90 = 0.13 (<1.0 threshold)
- Strategy: FULL_SYNTHESIS
- Reason: Split decision, all solutions >=3.0, no clear winner

Phase 3 output:
- `specs/caching.md` - Hybrid approach:
  - Multi-tier architecture (from B)
  - Simple LRU policy (from A)
  - CDN for static content (from C)
- Total cost: 8 agents (4 Phase 1 + 3 judges + 1 synthesis)
Example 3: Authentication Design (All Flawed - REDESIGN)
```bash
/do-competitively "Design authentication system with social login" \
  --output "specs/auth.md" \
  --criteria "security,user-experience,maintainability"
```

Phase 1 outputs (4 parallel agents):
- Meta-judge: evaluation specification YAML with 3 criteria dimensions, comparative rubrics
- `specs/auth.a.md` - Custom OAuth2 implementation
- `specs/auth.b.md` - Session-based with social providers
- `specs/auth.c.md` - JWT with password-only auth

Phase 2 outputs (assuming date 2025-01-15, 3 judges using the meta-judge specification):
- `.specs/reports/auth-2025-01-15.1.md`: VOTE: Solution A, SCORES: A=2.5/5.0, B=2.2/5.0, C=2.3/5.0 - "Security risks, reinventing the wheel"
- `.specs/reports/auth-2025-01-15.2.md`: VOTE: Solution B, SCORES: A=2.4/5.0, B=2.8/5.0, C=2.1/5.0 - "Sessions don't scale, missing requirements"
- `.specs/reports/auth-2025-01-15.3.md`: VOTE: Solution C, SCORES: A=2.6/5.0, B=2.5/5.0, C=2.3/5.0 - "No social login, security concerns"

Phase 2.5 decision (orchestrator parses headers):
- Split votes: A, B, C (no consensus)
- Average scores: A=2.50, B=2.50, C=2.23 (ALL <3.0)
- Strategy: REDESIGN
- Reason: All solutions below the 3.0 threshold; fundamental issues
- Do not stop; return to Phase 1, which should eventually finish via the SELECT_AND_POLISH or FULL_SYNTHESIS strategy