Agent Collaboration
Agent协作
Orchestrate multiple AI models as specialized agents — each assigned to what it does best. One model plans, another codes, another reviews. The planner stays in the loop, re-entering after every phase to evaluate and redirect.
将多个AI模型编排为专业Agent——每个Agent专注于其最擅长的领域。一个模型负责规划,另一个负责编码,还有一个负责评审。规划Agent全程参与,在每个阶段结束后重新介入,评估结果并调整方向。
When to Use
适用场景
- Complex projects requiring planning, implementation, and review
- Tasks where a single model's blind spots are a risk
- When you want adversarial review to catch what self-review cannot
- Research-heavy work requiring web search, synthesis, and validation
- Math or science tasks requiring specialized reasoning
- Any task benefiting from a plan → execute → review → replan loop
- When the user explicitly asks for multi-model collaboration
- 需要规划、实现和评审的复杂项目
- 单一模型存在盲点风险的任务
- 需要对抗性评审来发现自我评审无法察觉问题的场景
- 需要网页搜索、信息综合和验证的研究型工作
- 需要专业推理的数学或科学任务
- 任何能从「规划→执行→评审→重新规划」循环中获益的任务
- 用户明确要求多模型协作的情况
When NOT to Use
不适用场景
- One-line fixes, typo corrections, simple questions
- Tasks fully within a single model's strength
- When speed matters more than thoroughness
- Exploratory conversations without a concrete deliverable
- 单行修复、拼写纠正、简单问题
- 完全在单一模型能力范围内的任务
- 速度优先于完备性的场景
- 没有具体交付成果的探索性对话
Philosophy
核心理念
One model cannot be the best at everything. Benchmarks consistently show different model families excel at different tasks. Claude Opus excels at planning and abstract reasoning. GPT-5.4 leads at code implementation. Gemini 3.1 Pro dominates math, science, and knowledge retrieval. Grok 4 brings contrarian perspective. Combining these as specialized agents outperforms any single model on complex tasks.
The planner is the conductor. It decomposes, delegates, evaluates, and replans. Every other agent reports back to the planner. The planner never edits files — it reads code for context and spawns sub-agents, but its authority comes from directing, not doing.
Adversarial review is not optional. A different model family reviewing the work catches failure modes that self-review cannot. The adversarial reviewer's job is to find problems, not to be diplomatic.
Agents are disposable, context is not. Each agent may be stateless, but the handoff between agents must preserve all relevant context. The planner is responsible for ensuring no information is lost between phases.
没有任何一个模型能做到全能。 基准测试持续显示,不同模型家族在不同任务上各有所长。Claude Opus擅长规划和抽象推理,GPT-5.4在代码实现方面领先,Gemini 3.1 Pro在数学、科学和知识检索领域表现突出,Grok 4则能提供逆向视角。将这些模型作为专业Agent组合使用,在复杂任务上的表现优于任何单一模型。
规划Agent是指挥者。 它负责分解任务、分配工作、评估结果和重新规划。其他所有Agent都向规划Agent汇报。规划Agent从不直接编辑文件——它仅读取代码获取上下文,并生成子Agent,其权威性来自指令下达而非直接执行。
对抗性评审不可或缺。 由不同模型家族进行评审,能发现自我评审无法察觉的失效模式。对抗性评审者的职责是找出问题,而非保持外交姿态。
Agent可丢弃,但上下文不可丢失。 每个Agent可能是无状态的,但Agent之间的交接必须保留所有相关上下文。规划Agent负责确保各阶段之间没有信息丢失。
The Seven Agents
七大Agent角色
1. Planner
1. 规划Agent
- Role: Decompose complex tasks into subtasks, assign each to the right agent, define success criteria, evaluate results, replan when needed
- Primary model: Claude Opus 4.6 (extended thinking)
- Fallback: Claude Opus 4.5, GPT-5.4 (high reasoning)
- Why Opus: #1 Arena overall (1504 Elo), #1 Hard Prompts, best abstract reasoning (ARC-AGI 2: 68.8%). Extended thinking excels at structured decomposition and multi-step planning
- Tools: No file edits. The planner reads code, runs read-only shell commands (git log, ls), and spawns sub-agents — but never writes or edits files
- Output: Structured plan in YAML with subtask assignments, dependencies, and success criteria
- 职责: 将复杂任务分解为子任务,为每个子任务分配合适的Agent,定义成功标准,评估结果,必要时重新规划
- 主模型: Claude Opus 4.6(扩展思考模式)
- 备选模型: Claude Opus 4.5、GPT-5.4(高推理能力)
- 选择Opus的原因: Arena总排名第1(1504 Elo),难题排名第1,抽象推理能力最强(ARC-AGI 2得分68.8%)。扩展思考模式擅长结构化分解和多步骤规划
- 工具权限: 无文件编辑权限。规划Agent可读取代码、执行只读Shell命令(如git log、ls),并生成子Agent,但从不写入或编辑文件
- 输出: 包含子任务分配、依赖关系和成功标准的结构化YAML规划
2. Coder
2. 编码Agent
- Role: Implement code changes, write tests, fix bugs, refactor. Follows the plan exactly
- Primary model: GPT-5.4 (high reasoning)
- Fallback: Claude Sonnet 4.5, Claude Sonnet 4.6
- Why GPT-5.4: Leads Aider coding leaderboard (88%). Fast, precise, excellent at turning plans into working code
- Why Sonnet 4.5 as fallback: Leads SWE-bench Verified (82%). Strong at real-world software engineering tasks
- Tools: Full file system access — read, write, edit, terminal, package managers
- Output: Changed files, test results, implementation summary
- 职责: 实现代码变更、编写测试、修复Bug、重构代码。严格遵循规划执行
- 主模型: GPT-5.4(高推理能力)
- 备选模型: Claude Sonnet 4.5、Claude Sonnet 4.6
- 选择GPT-5.4的原因: Aider编码排行榜第1(得分88%),速度快、精度高,擅长将规划转化为可运行代码
- 选择Sonnet 4.5作为备选的原因: SWE-bench Verified排名第1(得分82%),在实际软件工程任务中表现出色
- 工具权限: 完整文件系统访问权限——读取、写入、编辑、终端操作、包管理器使用
- 输出: 修改后的文件、测试结果、实现总结
3. Researcher
3. 研究Agent
- Role: Web search, documentation lookup, API exploration, literature review, competitive analysis, summarization
- Primary model: Gemini 3.1 Pro
- Fallback: Claude Opus 4.6
- Why Gemini 3.1 Pro: Leads Humanity's Last Exam (45.8%), top MMMLU (91.8%). Exceptional at finding and synthesizing information across broad knowledge domains
- Why Opus as fallback: Best on BrowseComp (web research synthesis). Excels at connecting disparate information
- Tools: Web search, web fetch, file read. No file edits — the researcher reports, it doesn't implement
- Output: Research summary with source attribution, key findings, decision-relevant tradeoffs
- 职责: 网页搜索、文档查阅、API探索、文献综述、竞品分析、信息汇总
- 主模型: Gemini 3.1 Pro
- 备选模型: Claude Opus 4.6
- 选择Gemini 3.1 Pro的原因: Humanity's Last Exam得分45.8%,MMMLU得分91.8%,在跨广泛知识领域查找和综合信息方面表现卓越
- 选择Opus作为备选的原因: BrowseComp(网页研究综合)表现最佳,擅长连接分散的信息
- 工具权限: 网页搜索、网页抓取、文件读取。无文件编辑权限——研究Agent仅汇报结果,不负责实现
- 输出: 带有来源标注、关键发现和决策相关权衡的研究总结
4. Scientist
4. 科研Agent
- Role: Mathematical reasoning, formal proofs, statistical modeling, data analysis, algorithm verification, scientific computation
- Primary model: Gemini 3 Pro
- Fallback: GPT-5.2, Claude Opus 4.6
- Why Gemini 3 Pro: Scores 100% on AIME 2025, 94.3% GPQA Diamond. Exceptional at step-by-step mathematical reasoning and formal proofs
- Why GPT-5.2 as fallback: Also 100% on AIME 2025, 92.4% GPQA Diamond
- Tools: Code execution (for computation and verification), file read/write for results. Web access not typically needed
- Output: Formal analysis, proofs, computed results with methodology
- 职责: 数学推理、形式化证明、统计建模、数据分析、算法验证、科学计算
- 主模型: Gemini 3 Pro
- 备选模型: GPT-5.2、Claude Opus 4.6
- 选择Gemini 3 Pro的原因: AIME 2025得分100%,GPQA Diamond得分94.3%,在分步数学推理和形式化证明方面表现卓越
- 选择GPT-5.2作为备选的原因: AIME 2025同样得分100%,GPQA Diamond得分92.4%
- 工具权限: 代码执行(用于计算和验证)、结果文件读写。通常无需网页访问权限
- 输出: 形式化分析、证明、带方法论的计算结果
5. Visual Analyst
5. 视觉分析Agent
- Role: Image analysis, UI/UX review, diagram interpretation, screenshot analysis, visual regression detection, design system compliance
- Primary model: Claude Opus 4.6
- Fallback: Gemini 3.1 Pro
- Why Opus: ARC-AGI 2: 68.8% (dominant lead in abstract visual reasoning). Strong multimodal understanding with structured output
- Why Gemini as fallback: MMMU-Pro 80.5%. Excellent at interpreting complex visual content
- Tools: Image reading, screenshot capture, file read. No file edits — reports visual findings
- Output: Visual analysis report with specific observations, issues, and recommendations
- 职责: 图像分析、UI/UX评审、图表解读、截图分析、视觉回归检测、设计系统合规性检查
- 主模型: Claude Opus 4.6
- 备选模型: Gemini 3.1 Pro
- 选择Opus的原因: ARC-AGI 2得分68.8%(在抽象视觉推理领域占据主导优势),具备强大的多模态理解能力和结构化输出能力
- 选择Gemini作为备选的原因: MMMU-Pro得分80.5%,擅长解读复杂视觉内容
- 工具权限: 图像读取、截图捕获、文件读取。无文件编辑权限——仅汇报视觉发现
- 输出: 包含具体观察结果、问题和建议的视觉分析报告
6. Adversarial Reviewer
6. 对抗性评审Agent
- Role: Find flaws, security vulnerabilities, edge cases, logical errors, incorrect assumptions, race conditions, and performance problems. Challenge every decision. Assume the code is broken until proven otherwise
- Primary model: Grok 4
- Fallback: Gemini 3.1 Pro, Claude Opus 4.6
- Why Grok 4: #4 Arena overall with a direct, contrarian communication style. Using a fundamentally different model family than the coder ensures genuine adversarial perspective, not self-congratulatory review
- Why a different model family matters: Models from the same family share similar blind spots. Cross-family review catches what same-family review misses
- Tools: Read-only. The adversarial reviewer never edits — it produces a list of issues ranked by severity
- Output: Issues list with severity (critical/high/medium/low), reproduction steps, and suggested fixes
- 职责: 查找缺陷、安全漏洞、边缘情况、逻辑错误、错误假设、竞争条件和性能问题。质疑每一个决策。在被证明正确之前,默认代码存在问题
- 主模型: Grok 4
- 备选模型: Gemini 3.1 Pro、Claude Opus 4.6
- 选择Grok 4的原因: Arena总排名第4,沟通风格直接且逆向。与编码Agent使用完全不同的模型家族,确保真正的对抗性视角,而非自我吹捧式评审
- 选择不同模型家族的重要性: 同一模型家族的模型存在相似的盲点。跨家族评审能发现同家族评审遗漏的问题
- 工具权限: 只读权限。对抗性评审Agent从不编辑文件——仅生成按严重性排序的问题列表
- 输出: 包含严重性(关键/高/中/低)、复现步骤和建议修复方案的问题列表
7. Peer Reviewer
7. 同行评审Agent
- Role: Quality assessment, architecture review, style consistency, best practices, documentation review, maintainability analysis
- Primary model: Claude Opus 4.6
- Fallback: GPT-4o
- Why Opus: Excels at structured, thorough analysis. Balances pragmatism with quality standards
- Why GPT-4o as fallback: Shows least positivity bias in peer review (per AI Scientist research, Sakana AI). Honest without being hostile
- Tools: Read-only. Produces a review with an explicit verdict: approve, request changes, or reject
- Output: Structured review with verdict, praise for good decisions, and specific change requests
- 职责: 质量评估、架构评审、风格一致性检查、最佳实践验证、文档评审、可维护性分析
- 主模型: Claude Opus 4.6
- 备选模型: GPT-4o
- 选择Opus的原因: 擅长结构化、全面的分析,在务实性和质量标准之间取得平衡
- 选择GPT-4o作为备选的原因: 在同行评审中表现出最低的积极偏见(根据AI Scientist研究,Sakana AI),诚实且不带有敌意
- 工具权限: 只读权限。生成带有明确结论的评审:批准、要求修改或拒绝
- 输出: 包含结论、对正确决策的认可以及具体修改要求的结构化评审报告
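Taken together, the seven role definitions reduce to a small routing table. A minimal Python sketch of that table and a fallback-aware model picker (the function name and the availability check are illustrative, not part of the skill):

```python
# Role → model preference lists, mirroring the seven agent definitions above.
# Order matters: index 0 is the primary model, the rest are fallbacks.
AGENT_MODELS = {
    "planner":              ["claude-opus-4-6", "claude-opus-4-5", "gpt-5.4"],
    "coder":                ["gpt-5.4", "claude-sonnet-4-5", "claude-sonnet-4-6"],
    "researcher":           ["gemini-3.1-pro", "claude-opus-4-6"],
    "scientist":            ["gemini-3-pro", "gpt-5.2", "claude-opus-4-6"],
    "visual_analyst":       ["claude-opus-4-6", "gemini-3.1-pro"],
    "adversarial_reviewer": ["grok-4", "gemini-3.1-pro", "claude-opus-4-6"],
    "peer_reviewer":        ["claude-opus-4-6", "gpt-4o"],
}

def pick_model(role: str, available: set[str]) -> str:
    """Return the first model for `role` that is actually reachable."""
    for model in AGENT_MODELS[role]:
        if model in available:
            return model
    raise LookupError(f"no configured model available for role {role!r}")
```

When a primary provider is down or rate-limited, the planner reassigns the role to the next entry rather than stalling the whole loop.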
How It Works
工作原理
This is a manual dispatch workflow — you (or your primary agent session) are the dispatcher. The agents do not self-orchestrate. You follow the orchestration loop below, invoking each agent as needed and passing context between them using the handoff protocol. The skill provides the workflow patterns, agent definitions, and handoff formats. You provide the judgment calls.
这是一个手动调度工作流——你(或你的主Agent会话)是调度者。Agent不会自我编排。你需要遵循以下编排循环,根据需要调用每个Agent,并使用交接协议在Agent之间传递上下文。本技能提供工作流模式、Agent定义和交接格式,你负责做出判断。
The Orchestration Loop
编排循环
Every complex task follows this loop. The planner is always the entry and exit point.
┌──────────────────────────────────────────────┐
│                   PLANNER                    │
│          Claude Opus 4.6 (thinking)          │
│                                              │
│  1. Analyze the full task and constraints    │
│  2. Break into concrete subtasks             │
│  3. Assign each subtask to an agent role     │
│  4. Define success criteria per subtask      │
│  5. Specify execution order and dependencies │
│  6. Identify which subtasks can run parallel │
└──────────────────┬───────────────────────────┘
                   │
                   ▼
┌──────────────────────────────────────────────┐
│               EXECUTION PHASE                │
│       (parallel where no dependencies)       │
│                                              │
│  ┌──────────┐  ┌───────────┐  ┌──────────┐   │
│  │  Coder   │  │Researcher │  │Scientist │   │
│  │ GPT-5.4  │  │Gemini 3.1 │  │Gemini 3  │   │
│  └────┬─────┘  └─────┬─────┘  └────┬─────┘   │
│       │              │             │         │
│       ▼              ▼             ▼         │
│  ┌────────────────────────────────────────┐  │
│  │          Results + Artifacts           │  │
│  └────────────────────────────────────────┘  │
└──────────────────┬───────────────────────────┘
                   │
                   ▼
┌──────────────────────────────────────────────┐
│                 REVIEW PHASE                 │
│ (both reviewers in parallel for Patterns A-D)│
│                                              │
│   ┌───────────────┐    ┌───────────────┐     │
│   │  Adversarial  │    │     Peer      │     │
│   │    Grok 4     │    │  Claude Opus  │     │
│   └───────┬───────┘    └───────┬───────┘     │
│           │                    │             │
│           ▼                    ▼             │
│  ┌────────────────────────────────────────┐  │
│  │     Review Verdicts + Issue Lists      │  │
│  └────────────────────────────────────────┘  │
└──────────────────┬───────────────────────────┘
                   │
                   ▼
┌──────────────────────────────────────────────┐
│              PLANNER RE-ENTERS               │
│                                              │
│  Evaluates all review feedback:              │
│                                              │
│  • All clear → Accept and complete           │
│  • Minor issues → Send back to coder         │
│  • Major issues → Replan from scratch        │
│  • Reviewers disagree → Planner adjudicates  │
│  • New information → Update plan, re-execute │
└──────────────────────────────────────────────┘

每个复杂任务都遵循此循环。规划Agent始终是入口和出口点。
┌──────────────────────────────────────────────┐
│                  规划Agent                   │
│         Claude Opus 4.6(思考模式)          │
│                                              │
│  1. 分析完整任务和约束条件                   │
│  2. 分解为具体子任务                         │
│  3. 为每个子任务分配Agent角色                │
│  4. 定义每个子任务的成功标准                 │
│  5. 指定执行顺序和依赖关系                   │
│  6. 识别可并行执行的子任务                   │
└──────────────────┬───────────────────────────┘
                   │
                   ▼
┌──────────────────────────────────────────────┐
│                   执行阶段                   │
│        (无依赖关系的任务可并行执行)        │
│                                              │
│  ┌──────────┐  ┌───────────┐  ┌──────────┐   │
│  │ 编码Agent│  │ 研究Agent │  │科研Agent │   │
│  │ GPT-5.4  │  │Gemini 3.1 │  │Gemini 3  │   │
│  └────┬─────┘  └─────┬─────┘  └────┬─────┘   │
│       │              │             │         │
│       ▼              ▼             ▼         │
│  ┌────────────────────────────────────────┐  │
│  │              结果 + 工件               │  │
│  └────────────────────────────────────────┘  │
└──────────────────┬───────────────────────────┘
                   │
                   ▼
┌──────────────────────────────────────────────┐
│                   评审阶段                   │
│     (模式A-D中两个评审Agent可并行执行)     │
│                                              │
│   ┌───────────────┐    ┌───────────────┐     │
│   │对抗性评审Agent│    │ 同行评审Agent │     │
│   │    Grok 4     │    │  Claude Opus  │     │
│   └───────┬───────┘    └───────┬───────┘     │
│           │                    │             │
│           ▼                    ▼             │
│  ┌────────────────────────────────────────┐  │
│  │          评审结论 + 问题列表           │  │
│  └────────────────────────────────────────┘  │
└──────────────────┬───────────────────────────┘
                   │
                   ▼
┌──────────────────────────────────────────────┐
│              规划Agent重新介入               │
│                                              │
│  评估所有评审反馈:                          │
│                                              │
│  • 无问题 → 接受并完成任务                   │
│  • 轻微问题 → 返回给编码Agent修复            │
│  • 重大问题 → 从头重新规划                   │
│  • 评审意见不一致 → 规划Agent裁决            │
│  • 出现新信息 → 更新规划,重新执行           │
└──────────────────────────────────────────────┘

Maximum Iterations
最大迭代次数
To prevent infinite loops, enforce these limits:
- Code → Review cycles: Maximum 3 iterations. If the coder hasn't satisfied reviewers after 3 rounds, the planner must simplify the approach or escalate to the user
- Full replan: Maximum 2 replans per task. After 2, the planner presents what it has with known issues documented
- Individual agent timeout: If any agent hasn't produced useful output after a reasonable effort, the planner reassigns to fallback model or simplifies the subtask
为防止无限循环,强制执行以下限制:
- 编码→评审循环: 最多3次迭代。如果编码Agent在3轮后仍未满足评审要求,规划Agent必须简化方案或向用户升级问题
- 完整重新规划: 每个任务最多重新规划2次。2次后,规划Agent需提交现有成果并记录已知问题
- 单个Agent超时: 如果任何Agent在合理时间内未产生有用输出,规划Agent需重新分配给备选模型或简化子任务
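These caps can be enforced mechanically. A sketch of the loop with both limits wired in; the callables and the verdict shape are placeholders for however you actually invoke agents, not an API the skill ships:

```python
MAX_REVIEW_CYCLES = 3  # code → review iterations before giving up on fixes
MAX_REPLANS = 2        # full replans before presenting with known issues

def orchestrate(task, plan_fn, execute_fn, review_fn, escalate_fn):
    """Planner-driven loop with hard iteration caps (illustrative)."""
    result = None
    for _replan in range(MAX_REPLANS + 1):   # initial plan + up to 2 replans
        plan = plan_fn(task)
        result = execute_fn(plan, fixes=None)
        for _cycle in range(MAX_REVIEW_CYCLES):
            verdicts = review_fn(result)     # adversarial + peer in parallel
            if all(v["decision"] == "approve" for v in verdicts):
                return result                # all clear → accept and complete
            if any(v["decision"] == "reject" for v in verdicts):
                break                        # major issues → replan from scratch
            fixes = [i for v in verdicts for i in v.get("issues", [])]
            result = execute_fn(plan, fixes=fixes)  # minor issues → back to coder
        else:
            break  # 3 cycles without approval: stop iterating and escalate
    return escalate_fn(result)  # present what exists, known issues documented
```

The `for/else` makes the two exit paths explicit: a `reject` triggers a replan, while three unapproved fix cycles end the task with an escalation instead of looping forever.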
Handoff Protocol
交接协议
When agents pass work to each other, the handoff must be structured. The planner constructs each handoff — agents don't communicate directly.
Pragmatic note: The YAML formats below are aspirational templates, not strict contracts. Real models will not always output perfect YAML. The planner should extract the relevant information from whatever format the agent produces — structured YAML, markdown, or free text. What matters is that the information flows correctly between phases, not that the formatting is exact. If an agent returns free text instead of YAML, the planner should extract the key fields (status, summary, files changed, issues found) and construct the next handoff manually.
当Agent之间传递工作时,交接必须结构化。规划Agent负责构建每个交接——Agent之间不直接通信。
实用提示: 以下YAML格式是理想模板,而非严格契约。实际模型并不总是输出完美的YAML。规划Agent应从Agent生成的任何格式(结构化YAML、Markdown或自由文本)中提取相关信息。重要的是信息在各阶段之间正确流动,而非格式完全准确。如果Agent返回自由文本而非YAML,规划Agent应提取关键字段(状态、总结、修改的文件、发现的问题)并手动构建下一个交接。
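One way to make that tolerance concrete: scan whatever the agent returned for `key: value` lines and keep only the fields the planner cares about. A sketch (the field names follow the handoff formats in this section; `extract_fields` itself is illustrative, not part of the skill):

```python
import re

KEY_FIELDS = {"status", "summary", "files_changed", "issues", "decision"}

def extract_fields(agent_output: str) -> dict:
    """Best-effort extraction of handoff fields from YAML-ish or free text.

    Matches simple `key: value` lines, case-insensitively, and drops inline
    comments. Fields that do not appear stay absent, signalling the planner
    to read the raw output itself.
    """
    fields = {}
    for line in agent_output.splitlines():
        m = re.match(r"^\s*(\w+)\s*:\s*(.+?)\s*$", line)
        if m and m.group(1).lower() in KEY_FIELDS:
            value = m.group(2).split("#", 1)[0].strip().strip("\"'")
            if value:
                fields[m.group(1).lower()] = value
    return fields
```

The same function works on a well-formed YAML result block and on prose like "Status: complete", which is exactly the degradation the note above anticipates.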
Planner → Execution Agent
规划Agent → 执行Agent
```yaml
handoff:
  to: coder  # agent role
  task_id: 2
  description: "Implement OAuth2 PKCE flow"
  context: |
    The codebase uses JWT tokens stored in httpOnly cookies.
    Middleware at /api/auth/middleware.ts validates tokens on every request.
    The researcher found 3 existing auth patterns (see research summary below).
    Extend the JWT pattern — do not replace it.
  dependencies_resolved:
    - task_id: 1
      agent: researcher
      summary: "Found JWT, session, and API key auth patterns. JWT is most recent."
      key_files:
        - /api/auth/jwt.ts (lines 1-45)
        - /api/auth/middleware.ts (lines 12-30)
  constraints:
    - "Must extend existing JWT pattern, not replace it"
    - "Must be backward-compatible with existing middleware"
    - "Must include tests"
  success_criteria:
    - "OAuth2 PKCE flow works end-to-end"
    - "Existing auth tests still pass"
    - "New tests cover PKCE-specific scenarios"
```

```yaml
handoff:
  to: coder  # Agent角色
  task_id: 2
  description: "实现OAuth2 PKCE流程"
  context: |
    代码库使用存储在httpOnly Cookie中的JWT令牌。
    /api/auth/middleware.ts中的中间件在每个请求上验证令牌。
    研究Agent发现了3种现有认证模式(见下方研究总结)。
    扩展JWT模式——不要替换它。
  dependencies_resolved:
    - task_id: 1
      agent: researcher
      summary: "发现JWT、会话和API密钥认证模式。JWT是最新的。"
      key_files:
        - /api/auth/jwt.ts(第1-45行)
        - /api/auth/middleware.ts(第12-30行)
  constraints:
    - "必须扩展现有JWT模式,不得替换"
    - "必须与现有中间件向后兼容"
    - "必须包含测试"
  success_criteria:
    - "OAuth2 PKCE流程端到端可用"
    - "现有认证测试仍通过"
    - "新测试覆盖PKCE特定场景"
```

Execution Agent → Planner (Result)
执行Agent → 规划Agent(结果)
```yaml
result:
  from: coder
  task_id: 2
  status: complete  # complete | partial | failed | blocked
  summary: "Implemented PKCE flow in 3 files, added 8 tests"
  artifacts:
    files_changed:
      - /api/auth/pkce.ts (new, 120 lines)
      - /api/auth/middleware.ts (modified, added PKCE validation)
      - /api/auth/__tests__/pkce.test.ts (new, 85 lines)
    test_results: "8 passed, 0 failed"
  notes: |
    Used crypto.subtle for code verifier generation (Web Crypto API).
    The middleware change is backward-compatible — existing JWT auth still works.
  concerns:
    - "Code verifier storage uses session — may need Redis for horizontal scaling"
```

```yaml
result:
  from: coder
  task_id: 2
  status: complete  # complete | partial | failed | blocked
  summary: "在3个文件中实现了PKCE流程,添加了8个测试"
  artifacts:
    files_changed:
      - /api/auth/pkce.ts(新增,120行)
      - /api/auth/middleware.ts(修改,添加了PKCE验证)
      - /api/auth/__tests__/pkce.test.ts(新增,85行)
    test_results: "8个通过,0个失败"
  notes: |
    使用crypto.subtle生成代码验证器(Web Crypto API)。
    中间件修改向后兼容——现有JWT认证仍可正常工作。
  concerns:
    - "代码验证器存储使用会话——水平扩展可能需要Redis"
```

Planner → Review Agent
规划Agent → 评审Agent
```yaml
handoff:
  to: adversarial_reviewer
  task_id: 4
  description: "Security audit of OAuth2 PKCE implementation"
  context: |
    The coder implemented a PKCE flow. Review for security vulnerabilities,
    edge cases, and correctness. Be especially critical of:
    - Cryptographic operations (code verifier, code challenge)
    - Token storage and transmission
    - CSRF and replay attack vectors
    - Error handling in auth flows
  artifacts_to_review:
    - /api/auth/pkce.ts
    - /api/auth/middleware.ts
    - /api/auth/__tests__/pkce.test.ts
  implementation_summary: |
    Uses crypto.subtle for code verifier. Session-based storage.
    Middleware validates PKCE alongside existing JWT.
```

```yaml
handoff:
  to: adversarial_reviewer
  task_id: 4
  description: "OAuth2 PKCE实现的安全审计"
  context: |
    编码Agent实现了PKCE流程。评审其安全性漏洞、
    边缘情况和正确性。特别关注:
    - 加密操作(代码验证器、代码挑战)
    - 令牌存储和传输
    - CSRF和重放攻击向量
    - 认证流程中的错误处理
  artifacts_to_review:
    - /api/auth/pkce.ts
    - /api/auth/middleware.ts
    - /api/auth/__tests__/pkce.test.ts
  implementation_summary: |
    使用crypto.subtle生成代码验证器。基于会话的存储。
    中间件在现有JWT验证之外添加了PKCE验证。
```

Review Agent → Planner (Verdict)
评审Agent → 规划Agent(结论)
```yaml
verdict:
  from: adversarial_reviewer
  task_id: 4
  decision: request_changes  # approve | request_changes | reject
  critical_issues:
    - severity: high
      location: /api/auth/pkce.ts:45
      issue: "Code verifier stored in plaintext session — if session is compromised, PKCE is defeated"
      suggestion: "Hash the verifier before storage, compare hashes on validation"
    - severity: medium
      location: /api/auth/pkce.ts:78
      issue: "No expiration on code challenge — replay attack window is unlimited"
      suggestion: "Add 10-minute TTL on challenge, clean up expired entries"
  minor_issues:
    - severity: low
      location: /api/auth/__tests__/pkce.test.ts
      issue: "No test for expired challenge scenario"
  positive_observations:
    - "Good use of crypto.subtle over Math.random for verifier generation"
    - "Backward compatibility with existing JWT flow is well-handled"
```

```yaml
verdict:
  from: adversarial_reviewer
  task_id: 4
  decision: request_changes  # approve | request_changes | reject
  critical_issues:
    - severity: high
      location: /api/auth/pkce.ts:45
      issue: "代码验证器以明文形式存储在会话中——如果会话被泄露,PKCE将失效"
      suggestion: "存储前对验证器进行哈希,验证时比较哈希值"
    - severity: medium
      location: /api/auth/pkce.ts:78
      issue: "代码挑战没有过期时间——重放攻击窗口无限"
      suggestion: "为挑战添加10分钟TTL,清理过期条目"
  minor_issues:
    - severity: low
      location: /api/auth/__tests__/pkce.test.ts
      issue: "没有针对过期挑战场景的测试"
  positive_observations:
    - "使用crypto.subtle而非Math.random生成验证器,做法良好"
    - "与现有JWT流程的向后兼容处理得当"
```

Workflow Patterns
工作流模式
Pattern A: Plan → Code → Review (Default)
模式A:规划→编码→评审(默认)
The bread-and-butter for most development tasks.
Planner → Coder → [Adversarial + Peer Review] → Planner

Use when: Adding features, fixing bugs, refactoring code. Most tasks start here.
Planner behavior: Produces a single plan with clear subtasks. After review, decides whether to accept, revise, or restart.
大多数开发任务的标准流程。
规划Agent → 编码Agent → [对抗性评审 + 同行评审] → 规划Agent

适用场景: 添加功能、修复Bug、重构代码。大多数任务从这里开始。
规划Agent行为: 生成带有明确子任务的单一规划。评审后,决定是否接受、修改或重新开始。
Pattern B: Research → Plan → Code → Review
模式B:研究→规划→编码→评审
When the task requires understanding before implementation.
Planner → Researcher → Planner (replan) → Coder → [Review] → Planner

Use when: Working with unfamiliar APIs, choosing between architectural approaches, integration tasks, anything where you need information before you can plan.
Planner behavior: First plan is "research phase only." After research completes, planner creates a new, informed implementation plan.
任务需要先理解再实现的场景。
规划Agent → 研究Agent → 规划Agent(重新规划)→ 编码Agent → [评审] → 规划Agent

适用场景: 处理不熟悉的API、选择架构方案、集成任务、任何需要先获取信息才能规划的场景。
规划Agent行为: 初始规划仅为「研究阶段」。研究完成后,规划Agent创建新的、基于研究结果的实现规划。
Pattern C: Deep Analysis
模式C:深度分析
For math-heavy, scientific, or visual reasoning tasks.
Planner → [Scientist + Visual Analyst + Researcher] → Planner → Coder → [Review] → Planner

Use when: Data pipelines, ML models, algorithm implementation, visual regression testing, anything requiring formal correctness.
Planner behavior: Gathers analysis from multiple specialist agents before creating the implementation plan. The scientist's output directly constrains what the coder can do.
适用于数学密集型、科学或视觉推理任务。
规划Agent → [科研Agent + 视觉分析Agent + 研究Agent] → 规划Agent → 编码Agent → [评审] → 规划Agent

适用场景: 数据管道、ML模型、算法实现、视觉回归测试、任何需要形式化正确性的任务。
规划Agent行为: 在创建实现规划前,收集多个专业Agent的分析结果。科研Agent的输出直接约束编码Agent的工作范围。
Pattern D: Full Pipeline
模式D:完整流水线
The complete workflow for large, complex tasks.
Planner → Researcher → Planner (replan) → [Coder + Scientist] → Visual Analyst → [Adversarial + Peer Review] → Planner

Use when: Major features, system design, architecture changes, anything high-stakes.
Planner behavior: Multiple replan cycles. Visual analyst checks UI after implementation. Full review before acceptance.
大型复杂任务的完整工作流。
规划Agent → 研究Agent → 规划Agent(重新规划)→ [编码Agent + 科研Agent] → 视觉分析Agent → [对抗性评审 + 同行评审] → 规划Agent

适用场景: 主要功能、系统设计、架构变更、任何高风险任务。
规划Agent行为: 多次重新规划循环。视觉分析Agent在实现后检查UI。接受前进行全面评审。
Pattern E: Rapid Iteration
模式E:快速迭代
For quick fixes where full review would be overkill.
Planner → Coder → Adversarial Reviewer → Planner

Use when: Small bug fixes, minor refactors, documentation updates. Skip the peer reviewer — the adversarial pass catches security and correctness issues, which is enough for small changes.
Planner behavior: Lightweight plan, single review pass, fast completion.
适用于快速修复,全面评审过于繁琐的场景。
规划Agent → 编码Agent → 对抗性评审Agent → 规划Agent

适用场景: 小型Bug修复、轻微重构、文档更新。跳过同行评审——对抗性评审可发现安全和正确性问题,这对小型变更已足够。
规划Agent行为: 轻量级规划、单次评审、快速完成。
Pattern F: Research-Only
模式F:仅研究
When you need information, not implementation.
Planner → [Researcher + Scientist] → Planner → Summary

Use when: Technical investigations, feasibility studies, competitive analysis, decision support.
Planner behavior: Synthesizes research and analysis into a decision-ready summary. No code is written.
仅需要信息,不需要实现的场景。
规划Agent → [研究Agent + 科研Agent] → 规划Agent → 总结

适用场景: 技术调查、可行性研究、竞品分析、决策支持。
规划Agent行为: 将研究和分析结果合成为可用于决策的总结。不编写代码。
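The "use when" guidance for patterns A-F can be read as a decision ladder. A toy Python rendering of that ladder (the boolean task traits are illustrative simplifications, not part of the skill):

```python
def choose_pattern(needs_code: bool, needs_research: bool,
                   needs_analysis: bool, high_stakes: bool,
                   small_change: bool) -> str:
    """Decision ladder over workflow patterns A-F (toy rendering)."""
    if not needs_code:
        return "F"   # research-only: information, not implementation
    if small_change:
        return "E"   # rapid iteration: single adversarial pass
    if high_stakes:
        return "D"   # full pipeline for major, high-stakes work
    if needs_analysis:
        return "C"   # deep analysis feeds the implementation plan
    if needs_research:
        return "B"   # research first, then an informed plan
    return "A"       # default plan → code → review
```

The ordering matters: the cheap exits (F, E) come first, and A is the fall-through default, mirroring "most tasks start here."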
Planner Output Format
规划Agent输出格式
The planner produces a structured plan that other agents can follow. Use this format:
```yaml
plan:
  task: "Description of the overall task"
  pattern: A  # Which workflow pattern (A-F)
  subtasks:
    - id: 1
      description: "Research existing auth patterns in the codebase"
      agent: researcher
      depends_on: []
      success_criteria: "Summary of auth patterns with file locations and recommendations"
    - id: 2
      description: "Implement OAuth2 PKCE flow extending existing JWT auth"
      agent: coder
      depends_on: [1]
      success_criteria: "Working OAuth2 PKCE flow with tests passing, backward-compatible"
    - id: 3
      description: "Verify cryptographic correctness of PKCE implementation"
      agent: scientist
      depends_on: [2]
      success_criteria: "Formal verification that entropy, hashing, and timing are correct"
    - id: 4
      description: "Security audit — find vulnerabilities and edge cases"
      agent: adversarial_reviewer
      depends_on: [2]
      success_criteria: "Security audit with no unaddressed critical or high issues"
    - id: 5
      description: "Architecture and quality review"
      agent: peer_reviewer
      depends_on: [2]
      success_criteria: "Approved or specific changes requested"
  execution_order:
    - phase: 1
      parallel: [1]
    - phase: 2
      parallel: [2]
    - phase: 3
      parallel: [3, 4, 5]
  notes: |
    Subtasks 3, 4, 5 can run in parallel since they all review the same output.
    If review finds critical issues, we loop back to subtask 2 with fixes.
```

规划Agent生成其他Agent可遵循的结构化规划。使用以下格式:

```yaml
plan:
  task: "整体任务描述"
  pattern: A  # 工作流模式(A-F)
  subtasks:
    - id: 1
      description: "研究代码库中的现有认证模式"
      agent: researcher
      depends_on: []
      success_criteria: "包含文件位置和建议的认证模式总结"
    - id: 2
      description: "扩展现有JWT认证,实现OAuth2 PKCE流程"
      agent: coder
      depends_on: [1]
      success_criteria: "可运行的OAuth2 PKCE流程,测试通过,向后兼容"
    - id: 3
      description: "验证PKCE实现的加密正确性"
      agent: scientist
      depends_on: [2]
      success_criteria: "对熵、哈希和计时的形式化验证正确"
    - id: 4
      description: "安全审计——查找漏洞和边缘情况"
      agent: adversarial_reviewer
      depends_on: [2]
      success_criteria: "安全审计无未解决的关键或高风险问题"
    - id: 5
      description: "架构和质量评审"
      agent: peer_reviewer
      depends_on: [2]
      success_criteria: "批准或提出具体修改要求"
  execution_order:
    - phase: 1
      parallel: [1]
    - phase: 2
      parallel: [2]
    - phase: 3
      parallel: [3, 4, 5]
  notes: |
    子任务3、4、5可并行执行,因为它们都评审相同的输出。
    如果评审发现关键问题,我们将循环回到子任务2进行修复。
```

Setup by Tool
按工具设置
OpenCode (Recommended for Multi-Provider)
OpenCode(推荐用于多提供商场景)
OpenCode natively supports 75+ LLM providers with per-agent model overrides. No gateway needed — OpenCode IS the gateway.
OpenCode原生支持75+ LLM提供商,并允许按Agent覆盖模型。无需网关——OpenCode本身就是网关。
API Keys
API密钥
Set provider API keys as environment variables. You only need keys for the providers you plan to use — pick the direct providers OR cloud providers (Bedrock/Azure), or mix and match:
```bash
# --- Direct providers ---
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."
export GOOGLE_API_KEY="..."    # or GOOGLE_GENERATIVE_AI_API_KEY
export XAI_API_KEY="..."

# --- Amazon Bedrock (alternative for Claude + other models) ---
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_REGION="us-east-1"  # or us-west-2, eu-west-1, etc.

# --- Microsoft Azure OpenAI (alternative for GPT models) ---
export AZURE_API_KEY="..."
export AZURE_RESOURCE_NAME="your-resource"
export AZURE_DEPLOYMENT_NAME="your-deployment"
export AZURE_API_VERSION="2024-12-01-preview"

# --- Google Vertex AI (alternative for Gemini models) ---
export GOOGLE_CLOUD_PROJECT="your-project"
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
export GOOGLE_GENAI_USE_VERTEXAI=true
```

将提供商API密钥设置为环境变量。你只需要为计划使用的提供商设置密钥——选择直接提供商或云提供商(Bedrock/Azure),或混合使用:

```bash
# --- 直接提供商 ---
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."
export GOOGLE_API_KEY="..."    # 或 GOOGLE_GENERATIVE_AI_API_KEY
export XAI_API_KEY="..."

# --- Amazon Bedrock(Claude及其他模型的替代方案) ---
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_REGION="us-east-1"  # 或 us-west-2、eu-west-1等

# --- Microsoft Azure OpenAI(GPT模型的替代方案) ---
export AZURE_API_KEY="..."
export AZURE_RESOURCE_NAME="your-resource"
export AZURE_DEPLOYMENT_NAME="your-deployment"
export AZURE_API_VERSION="2024-12-01-preview"

# --- Google Vertex AI(Gemini模型的替代方案) ---
export GOOGLE_CLOUD_PROJECT="your-project"
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
export GOOGLE_GENAI_USE_VERTEXAI=true
```

Provider Configuration (opencode.json)
提供商配置(opencode.json)
Configure the providers you use. You don't need all of them — pick what matches your infrastructure:
```json
{
  "provider": {
    "anthropic": {
      "api_key": "{env:ANTHROPIC_API_KEY}"
    },
    "openai": {
      "api_key": "{env:OPENAI_API_KEY}"
    },
    "google": {
      "api_key": "{env:GOOGLE_API_KEY}"
    },
    "xai": {
      "api_key": "{env:XAI_API_KEY}"
    },
    "bedrock": {
      "aws_access_key_id": "{env:AWS_ACCESS_KEY_ID}",
      "aws_secret_access_key": "{env:AWS_SECRET_ACCESS_KEY}",
      "aws_region": "{env:AWS_REGION}"
    },
    "azure": {
      "api_key": "{env:AZURE_API_KEY}",
      "resource_name": "{env:AZURE_RESOURCE_NAME}"
    },
    "vertex": {
      "project": "{env:GOOGLE_CLOUD_PROJECT}"
    }
  }
}
```

配置你使用的提供商。不需要全部配置——选择与你的基础设施匹配的提供商:

```json
{
  "provider": {
    "anthropic": {
      "api_key": "{env:ANTHROPIC_API_KEY}"
    },
    "openai": {
      "api_key": "{env:OPENAI_API_KEY}"
    },
    "google": {
      "api_key": "{env:GOOGLE_API_KEY}"
    },
    "xai": {
      "api_key": "{env:XAI_API_KEY}"
    },
    "bedrock": {
      "aws_access_key_id": "{env:AWS_ACCESS_KEY_ID}",
      "aws_secret_access_key": "{env:AWS_SECRET_ACCESS_KEY}",
      "aws_region": "{env:AWS_REGION}"
    },
    "azure": {
      "api_key": "{env:AZURE_API_KEY}",
      "resource_name": "{env:AZURE_RESOURCE_NAME}"
    },
    "vertex": {
      "project": "{env:GOOGLE_CLOUD_PROJECT}"
    }
  }
}
```

Agent Definitions
Agent定义
Agent definitions are generated from canonical templates using the setup script. Run from the skill directory:
```bash
# Project-level (recommended)
sh agents/setup.sh opencode

# Or specify a custom target directory
sh agents/setup.sh opencode ~/.config/opencode/agents
```

Agent定义通过设置脚本从标准模板生成。从技能目录运行:

```bash
# 项目级(推荐)
sh agents/setup.sh opencode

# 或指定自定义目标目录
sh agents/setup.sh opencode ~/.config/opencode/agents
```
This generates 7 agent `.md` files with the correct OpenCode frontmatter (`model: provider/id`, `permission: edit: deny`) in `.opencode/agents/`.
Each agent uses a different provider/model. OpenCode routes to the correct provider automatically based on the model prefix.
**Default assignments (direct providers):**
| Agent | Model ID | Provider |
|-------|---------|----------|
| Planner | `anthropic/claude-opus-4-6` | Anthropic |
| Coder | `openai/gpt-5.4` | OpenAI |
| Researcher | `google/gemini-3.1-pro` | Google |
| Scientist | `google/gemini-3-pro` | Google |
| Visual Analyst | `anthropic/claude-opus-4-6` | Anthropic |
| Adversarial Reviewer | `xai/grok-4` | xAI |
| Peer Reviewer | `anthropic/claude-opus-4-6` | Anthropic |
**Amazon Bedrock alternatives** — swap the model ID in the agent `.md` file to route through Bedrock instead:
| Agent | Bedrock Model ID |
|-------|-----------------|
| Planner | `bedrock/anthropic.claude-opus-4-6-v1` |
| Coder | `bedrock/anthropic.claude-sonnet-4-5-v1` |
| Researcher | `bedrock/amazon.nova-pro-v1` |
| Scientist | `bedrock/anthropic.claude-opus-4-6-v1` |
| Visual Analyst | `bedrock/anthropic.claude-opus-4-6-v1` |
| Adversarial Reviewer | `bedrock/amazon.nova-pro-v1` |
| Peer Reviewer | `bedrock/anthropic.claude-opus-4-6-v1` |
Note: Bedrock gives you Claude models without a separate Anthropic API key (billed through AWS). Gemini and Grok are not available on Bedrock — use Amazon Nova or Claude as alternatives, or mix Bedrock with direct providers.
**Microsoft Azure OpenAI alternatives** — for organizations on Azure:
| Agent | Azure Model ID |
|-------|---------------|
| Planner | `azure/claude-opus-4-6` |
| Coder | `azure/gpt-5.4` |
| Researcher | `azure/gpt-5.4` |
| Scientist | `azure/gpt-5.4` |
| Visual Analyst | `azure/claude-opus-4-6` |
| Adversarial Reviewer | `azure/gpt-4o` |
| Peer Reviewer | `azure/claude-opus-4-6` |
Note: Azure OpenAI model IDs depend on your deployment names. The IDs above assume deployments matching the model names. Azure gives you GPT and Claude models (via Azure AI Foundry) billed through your Azure subscription. Gemini and Grok are not available on Azure — use GPT-4o or Claude as alternatives, or mix Azure with direct providers.
**Google Vertex AI alternatives** — for organizations on GCP:
| Agent | Vertex AI Model ID |
|-------|-------------------|
| Researcher | `vertex/gemini-3.1-pro` |
| Scientist | `vertex/gemini-3-pro` |
| Visual Analyst | `vertex/gemini-3.1-pro` |
Note: Vertex AI gives you Gemini models billed through GCP. Claude is also available via Vertex AI Model Garden.
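Since provider switching is just a frontmatter edit, it can be scripted. A hedged sketch that rewrites the `model:` line of a generated agent file (the helper name and file paths are examples, not part of the setup script; the frontmatter layout follows the `model: provider/id` convention above):

```python
import re
from pathlib import Path

def set_agent_model(agent_md: Path, new_model: str) -> None:
    """Point an agent's `model: provider/id` frontmatter line at new_model,
    e.g. anthropic/claude-opus-4-6 → bedrock/anthropic.claude-opus-4-6-v1."""
    text = agent_md.read_text(encoding="utf-8")
    updated, n = re.subn(r"(?m)^model:\s*\S+$", f"model: {new_model}",
                         text, count=1)
    if n != 1:
        raise ValueError(f"no model line found in {agent_md}")
    agent_md.write_text(updated, encoding="utf-8")
```

Run it once per agent file you want to move onto Bedrock, Azure, or Vertex; the rest of the frontmatter (permissions, prompt) is left untouched.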
**Mixing providers is the recommended approach.** You don't have to pick one cloud — use Bedrock for Claude, Azure for GPT, direct for Gemini and Grok. Just change the model prefix in each agent's `.md` file.

Invoking Agents
In OpenCode, invoke agents by name. The orchestrating agent (you, the primary agent) follows the workflow patterns above:

```
@planner Break down this task: implement user authentication with OAuth2 PKCE
```

Then follow the plan, invoking each agent as directed:

```
@researcher Find existing auth patterns in this codebase
@coder Implement PKCE flow based on the research findings: [paste context]
@adversarial-reviewer Review this PKCE implementation for security issues: [paste context]
@peer-reviewer Review code quality and architecture: [paste context]
```

Claude Code
Claude Code has a mature sub-agent architecture but is limited to Anthropic models for sub-agents. Three approaches depending on whether you want cross-provider access:
Approach 1: Anthropic-Only (No Gateway)
All agents use Anthropic models. You lose cross-family adversarial review but gain simplicity.
| Agent | Claude Code Model | Notes |
|---|---|---|
| Planner | | Extended thinking, no file edits |
| Coder | | Fast, strong at code |
| Researcher | | Strong at synthesis, use with web tools |
| Scientist | | Reasonable math capability |
| Visual Analyst | | Best multimodal in Anthropic family |
| Adversarial Reviewer | | Different from opus but same family — weaker adversarial benefit |
| Peer Reviewer | | Structured analysis |
Limitation: Adversarial review from the same model family is less effective. The adversarial reviewer using Sonnet with a strong adversarial prompt partially compensates, but same-family blind spots persist.
Agent definition files: Generate from canonical templates using the setup script:

```bash
sh agents/setup.sh claude-code
```

This generates 7 agent `.md` files with Anthropic-specific frontmatter (`model: opus/sonnet`, `allowed-tools`, `effort`) in `.claude/agents/`.

Approach 2: With OpenRouter Gateway
Use an MCP server or script to call external models for specific agents. This gives you true cross-family adversarial review.

Step 1: Set up OpenRouter API key:

```bash
export OPENROUTER_API_KEY="sk-or-..."
```

Step 2: Create the gateway script directory and script:

```bash
mkdir -p .claude/scripts
cat > .claude/scripts/call-model.sh << 'SCRIPT'
#!/bin/bash
# Usage: call-model.sh <model> <prompt-file>
# Reads the prompt from a file to avoid shell argument length limits.
# If no file is given, reads from stdin; otherwise $2 is used as the prompt.
MODEL="$1"
PROMPT_FILE="$2"

if [ -z "$OPENROUTER_API_KEY" ]; then
  echo "Error: OPENROUTER_API_KEY not set" >&2
  exit 1
fi

if [ -n "$PROMPT_FILE" ] && [ -f "$PROMPT_FILE" ]; then
  PROMPT=$(cat "$PROMPT_FILE")
elif [ ! -t 0 ]; then
  PROMPT=$(cat)
else
  PROMPT="$2"
fi

RESPONSE=$(curl -s https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"$MODEL\", \"messages\": [{\"role\": \"user\", \"content\": $(echo "$PROMPT" | jq -Rs .)}]}")

ERROR=$(echo "$RESPONSE" | jq -r '.error.message // empty')
if [ -n "$ERROR" ]; then
  echo "API Error: $ERROR" >&2
  exit 1
fi

echo "$RESPONSE" | jq -r '.choices[0].message.content'
SCRIPT
chmod +x .claude/scripts/call-model.sh
```

**Step 3:** In Claude Code, the orchestrating agent can use this script for cross-provider calls:

```bash
# Call Grok for adversarial review (short prompt as argument)
bash .claude/scripts/call-model.sh "xai/grok-4" "Review this code for security issues: ..."

# Call Gemini for research (long prompt via stdin)
echo "Research the best approach for implementing OAuth2 PKCE..." | \
  bash .claude/scripts/call-model.sh "google/gemini-3.1-pro"
```

This hybrid approach uses Claude Code's native sub-agents for Anthropic models and the gateway script for other providers.
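The routing split can be made explicit with a tiny dispatcher; a sketch under this setup's assumptions (the prefix rule is illustrative, `call-model.sh` as created above):

```shell
#!/bin/bash
# route(): decide whether a model runs as a native Claude Code sub-agent
# or must go through the OpenRouter gateway script.
route() {
  case "$1" in
    anthropic/*) echo "native" ;;   # Claude Code spawns these directly
    *)           echo "gateway" ;;  # everything else via .claude/scripts/call-model.sh
  esac
}

route "anthropic/claude-opus-4-6"   # native
route "xai/grok-4"                  # gateway
```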
Approach 3: With Vercel AI Gateway
Same as OpenRouter but using Vercel's gateway endpoint:

```bash
#!/bin/bash
# .claude/scripts/call-model.sh (Vercel AI Gateway version)
MODEL="$1"
PROMPT="$2"

curl -s https://ai-gateway.vercel.sh/v1/chat/completions \
  -H "Authorization: Bearer $AI_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"$MODEL\", \"messages\": [{\"role\": \"user\", \"content\": $(echo "$PROMPT" | jq -Rs .)}]}" | jq -r '.choices[0].message.content'
```

**Vercel AI Gateway advantages:** Zero token markup, 40+ providers, OIDC auth for Vercel-deployed apps (no key management).

---

Cursor
Cursor does not support programmatic sub-agent spawning. Use sequential model switching:
- Select Claude Opus in model picker → Plan the task
- Switch to GPT-5.4 → Implement the plan
- Switch to Grok 4 or Gemini → Review the implementation
- Switch back to Claude Opus → Evaluate reviews and decide next steps
Cursor Rules (.cursor/rules/)
Create rules that guide each phase. Place in `.cursor/rules/agent-collaboration.mdc`:

```markdown
---
description: Multi-model agent collaboration workflow
globs: ["**/*"]
---

# Agent Collaboration Workflow

When working on complex tasks, follow this workflow:

## Planning Phase (use Claude Opus)
- Break the task into subtasks with clear success criteria
- Identify dependencies between subtasks
- Assign each subtask to an execution phase

## Implementation Phase (use GPT-5.4 or Claude Sonnet)
- Follow the plan exactly
- Write tests for new functionality
- Report what changed and why

## Review Phase (switch model for fresh perspective)
- Review for security vulnerabilities, edge cases, and logical errors
- Check architecture, style, and best practices
- Provide explicit verdict: approve or request changes

## Replan Phase (use Claude Opus)
- Evaluate review feedback
- Decide: accept, revise, or restart
- If revising, specify exact changes for the coder
```

---

Codex CLI
Codex CLI supports the Agent Skills standard and primarily uses OpenAI models. For multi-model:
- Codex handles implementation (GPT-5.4 natively)
- Use the `call-model.sh` gateway script for other models (same approach as Claude Code)

Place agent skills in `.codex/skills/` following the standard format.

Environment Setup
```bash
# Codex uses your ChatGPT account or API key
export OPENAI_API_KEY="sk-..."

# For cross-provider calls via gateway
export OPENROUTER_API_KEY="sk-or-..."
```

---

Gemini CLI
Gemini CLI is single-agent, Gemini-only. Use sequential mode:
- Use Gemini 3.1 Pro for planning and research (it's strong at both)
- Use gateway script for coding (call GPT-5.4 via OpenRouter)
- Use Gemini 3.1 Pro for review (strong at adversarial analysis)
GEMINI.md Integration
Add workflow instructions to your `GEMINI.md`:

```markdown
# Agent Collaboration

For complex tasks, follow this multi-phase workflow:

- PLAN: Break the task into subtasks with success criteria
- RESEARCH: Search the web and documentation for relevant context
- IMPLEMENT: Write code following the plan (call external model if needed)
- REVIEW: Critically review the implementation for flaws
- REPLAN: Evaluate and decide next steps
```

---

Aider
Aider has a built-in dual-model workflow that maps naturally to planner + coder:
`.aider.conf.yml`:

```yaml
# Architect model = Planner (proposes the approach)
model: anthropic/claude-opus-4-6

# Editor model = Coder (implements the changes)
editor-model: openai/gpt-5.4

# Weak model = Fast tasks (commit messages, summaries)
weak-model: google/gemini-3-flash

# Enable architect mode
edit-format: architect
```

**What you get:** Claude Opus plans the approach, GPT-5.4 implements the edits. This covers Pattern A (Plan → Code) natively.

**What you don't get:** Review phase. For review, run a separate aider session:

```bash
# Review session with a different model
aider --model xai/grok-4 --no-auto-commits --message "Review the recent changes for security issues and edge cases"
```

**Provider support:** Aider uses LiteLLM under the hood, supporting 100+ providers. Any `provider/model-id` format works.

---

Gateway Configuration
No Gateway: Direct Provider APIs
For tools with native multi-provider support (OpenCode) or when using a single provider:

```bash
# --- Direct providers ---
export ANTHROPIC_API_KEY="sk-ant-..."   # Claude models
export OPENAI_API_KEY="sk-..."          # GPT models
export GOOGLE_API_KEY="..."             # Gemini models
export XAI_API_KEY="..."                # Grok models

# --- Cloud providers (alternative or additional) ---

# Amazon Bedrock — Claude, Amazon Nova, Mistral, Llama, etc.
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_REGION="us-east-1"

# Microsoft Azure OpenAI — GPT, Claude (via AI Foundry)
export AZURE_API_KEY="..."
export AZURE_RESOURCE_NAME="your-resource"
export AZURE_API_VERSION="2024-12-01-preview"

# Google Vertex AI — Gemini, Claude (via Model Garden)
export GOOGLE_CLOUD_PROJECT="your-project"
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"

# --- Other direct providers ---
export MISTRAL_API_KEY="..."
export DEEPSEEK_API_KEY="..."
export GROQ_API_KEY="..."
```

**Best for:** OpenCode (native routing), Aider (LiteLLM routing), any tool where you have direct provider API access.

**Mixing providers is normal.** Use `bedrock/` for Claude (billed through AWS), `azure/` for GPT (billed through Azure), and `google/` or `vertex/` for Gemini directly. Each agent's model ID prefix determines which provider is used — no gateway needed.
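With this many keys in play, it helps to fail fast before a run. A small preflight sketch (the variable list mirrors the exports above; trim it to the providers you actually use):

```shell
#!/bin/bash
# preflight: report any unset keys from the given list;
# returns non-zero if at least one is missing.
preflight() {
  local missing=0
  for var in "$@"; do
    if [ -z "$(printenv "$var")" ]; then
      echo "Missing: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Typical call before a multi-provider run:
preflight ANTHROPIC_API_KEY OPENAI_API_KEY GOOGLE_API_KEY XAI_API_KEY \
  || echo "Set the missing keys before orchestrating agents."
```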
OpenRouter: Unified API
Single API key, single endpoint, 100+ models from all providers:

```bash
export OPENROUTER_API_KEY="sk-or-..."
```

Endpoint: `https://openrouter.ai/api/v1/chat/completions`

Model format: `provider/model-name` — e.g. `anthropic/claude-opus-4-6`, `openai/gpt-5.4`, `google/gemini-3.1-pro`, `xai/grok-4`, `google/gemini-3-pro`, `openai/gpt-4o`

Gateway script for CLI tools:

```bash
#!/bin/bash
# gateway-openrouter.sh — call any model via OpenRouter
# Usage: gateway-openrouter.sh <model> <system_prompt> <user_prompt>
MODEL="$1"
SYSTEM="$2"
PROMPT="$3"

if [ -z "$OPENROUTER_API_KEY" ]; then
  echo "Error: OPENROUTER_API_KEY not set" >&2
  exit 1
fi

RESPONSE=$(curl -s https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -H "HTTP-Referer: https://skills.sh/pascalorg" \
  -H "X-OpenRouter-Title: Agent Collaboration" \
  -d "{\"model\": \"$MODEL\", \"messages\": [{\"role\": \"system\", \"content\": $(echo "$SYSTEM" | jq -Rs .)}, {\"role\": \"user\", \"content\": $(echo "$PROMPT" | jq -Rs .)}], \"temperature\": 0.3}")

echo "$RESPONSE" | jq -r '.choices[0].message.content'
```

**Best for:** Claude Code, Codex CLI, Gemini CLI — any tool locked to a single provider that needs cross-provider access via script.
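The `jq -Rs .` calls are what keep the request body valid JSON when prompts contain quotes or newlines. You can sanity-check that escaping locally with no API call:

```shell
#!/bin/bash
# Build a payload from a deliberately awkward prompt, then parse it back.
PROMPT='He said "hi" and
then left.'

PAYLOAD="{\"model\": \"xai/grok-4\", \"messages\": [{\"role\": \"user\", \"content\": $(echo "$PROMPT" | jq -Rs .)}]}"

# jq exits non-zero if PAYLOAD is malformed JSON
echo "$PAYLOAD" | jq -e '.messages[0].role' > /dev/null && echo "payload OK"
```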
Vercel AI Gateway: Zero Markup
40+ providers, zero token markup, managed infrastructure:

```bash
export AI_GATEWAY_API_KEY="..."  # From Vercel Dashboard
```

Endpoint: `https://ai-gateway.vercel.sh/v1/chat/completions`

Model format: Same as OpenRouter — `provider/model-name`

Provider ordering and fallbacks (when using Vercel AI SDK):

```typescript
import { gateway } from '@ai-sdk/gateway';
import { generateText } from 'ai';

const result = await generateText({
  model: gateway('anthropic/claude-opus-4-6'),
  prompt: 'Plan this task...',
  providerOptions: {
    gateway: {
      order: ['anthropic', 'bedrock'], // Try Anthropic first, fall back to Bedrock
      caching: 'auto', // Automatic provider-appropriate caching
    }
  }
});
```

Gateway script for CLI tools (same pattern as OpenRouter, different endpoint):

```bash
#!/bin/bash
# gateway-vercel.sh — call any model via Vercel AI Gateway
# Usage: gateway-vercel.sh <model> <system_prompt> <user_prompt>
MODEL="$1"
SYSTEM="$2"
PROMPT="$3"

if [ -z "$AI_GATEWAY_API_KEY" ]; then
  echo "Error: AI_GATEWAY_API_KEY not set" >&2
  exit 1
fi

RESPONSE=$(curl -s https://ai-gateway.vercel.sh/v1/chat/completions \
  -H "Authorization: Bearer $AI_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"$MODEL\", \"messages\": [{\"role\": \"system\", \"content\": $(echo "$SYSTEM" | jq -Rs .)}, {\"role\": \"user\", \"content\": $(echo "$PROMPT" | jq -Rs .)}], \"temperature\": 0.3}")

echo "$RESPONSE" | jq -r '.choices[0].message.content'
```

**Best for:** Vercel projects, TypeScript codebases using AI SDK, teams wanting managed gateway with no token markup.

---

Escalation Rules
The planner re-enters the loop at defined checkpoints. These rules are non-negotiable:
When the Planner MUST Re-Enter
- After every execution phase — The planner evaluates all results before sending to review
- After every review phase — The planner synthesizes review feedback and decides next steps
- When any agent fails — The planner decides: retry with same model, reassign to fallback model, or simplify the subtask
- When reviewers disagree — The planner evaluates both positions, considers the evidence, and makes a final decision
- When new information emerges — If any agent discovers something that invalidates the plan, the planner replans
- When scope changes — User requirements change, the planner re-decomposes
When the Planner Steps Aside
- During execution — Agents execute independently within their assigned scope
- During review — Reviewers form opinions independently without planner influence
- For trivial tasks — Pattern E (rapid iteration) minimizes planner involvement
What the Planner NEVER Does
- Writes code (that's the coder's job)
- Does research (that's the researcher's job)
- Makes mathematical claims without the scientist
- Approves its own plans (that's the reviewer's job)
- Overrules both reviewers simultaneously (if both say no, the code needs work)
Anti-Patterns
Using one model for everything
Why it's wrong: You lose the adversarial advantage. Same model, same blind spots. A model reviewing its own code is like a writer proofreading their own manuscript — they'll miss the same errors.
Fix: At minimum, use a different model family for the adversarial reviewer.
Skipping adversarial review
Why it's wrong: Peer review is constructive by default — it finds improvements but misses security issues and logical flaws. The adversarial reviewer exists specifically to find what politeness misses.
Fix: Always include adversarial review. Use Pattern E (rapid iteration) for small tasks — it still includes the adversarial pass.
Letting the coder review its own code
Why it's wrong: If the coder didn't see the bug while writing, it won't see it while reviewing. Same context, same assumptions, same errors.
Fix: Always use a different model (ideally different family) for review.
Over-orchestrating simple tasks
Why it's wrong: A one-line fix doesn't need seven agents. The overhead of full orchestration exceeds the benefit for trivial changes.
Fix: Use Pattern E for small tasks. Use your judgment — if the fix is obvious and low-risk, just do it.
Ignoring success criteria
Why it's wrong: Without criteria, agents don't know when they're done. They either under-deliver or gold-plate. The planner's success criteria are the contract.
Fix: The planner must define success criteria for every subtask. Agents must verify their work against criteria before reporting completion.
Giving the planner file edit access
Why it's wrong: If the planner writes code, it can't objectively evaluate the result. Separation of concerns is the foundation of this workflow.
Fix: The planner never edits files. It reads code and runs read-only commands (git log, ls) for context, and spawns sub-agents to do the actual work.
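In OpenCode, this constraint is enforced declaratively in the planner's frontmatter (field names as generated by the setup script described earlier; a sketch, not a full agent file):

```yaml
# planner.md frontmatter: read everything, edit nothing
model: anthropic/claude-opus-4-6
permission:
  edit: deny
```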
Passing insufficient context in handoffs
Why it's wrong: If the coder doesn't know what the researcher found, it'll re-research or guess. If the reviewer doesn't know the constraints, it'll flag intentional tradeoffs as bugs.
Fix: Follow the handoff protocol. The planner is responsible for ensuring every agent has the context it needs.
Letting agents argue directly
Why it's wrong: Agents don't have shared context. A "debate" between agents without the planner mediating leads to circular arguments and wasted tokens.
Fix: All communication goes through the planner. The planner synthesizes, decides, and directs.
Model Map
Current recommended model assignments based on benchmarks as of April 2026. These will evolve as new models launch.
Primary Assignments
| Role | Model | Provider ID | Benchmark Evidence |
|---|---|---|---|
| Planner | Claude Opus 4.6 (thinking) | | Arena #1 (1504 Elo), ARC-AGI 2 68.8% |
| Coder | GPT-5.4 (high) | | Aider leaderboard #1 (88%), strong SWE-bench |
| Researcher | Gemini 3.1 Pro | | HLE 45.8%, MMMLU 91.8% |
| Scientist | Gemini 3 Pro | | AIME 2025 100%, GPQA Diamond 94.3% |
| Visual Analyst | Claude Opus 4.6 | | ARC-AGI 2 68.8%, strong MMMU |
| Adversarial Reviewer | Grok 4 | | Arena #4, contrarian style, different family |
| Peer Reviewer | Claude Opus 4.6 | | Structured analysis, low positivity bias |
Fallback Assignments
| Role | Fallback 1 | Fallback 2 |
|---|---|---|
| Planner | | |
| Coder | | |
| Researcher | | |
| Scientist | | |
| Visual Analyst | | |
| Adversarial Reviewer | | |
| Peer Reviewer | | |
Budget-Conscious Assignments
For teams optimizing cost while maintaining the multi-model advantage:
| Role | Budget Model | Provider ID | Tradeoff |
|---|---|---|---|
| Planner | Claude Sonnet 4.6 | | Slightly less nuanced planning |
| Coder | GPT-4.1 | | Good coding, lower cost |
| Researcher | Gemini 3 Flash | | Fast research, less depth |
| Scientist | Gemini 3 Flash | | Good math, less formal rigor |
| Visual Analyst | Gemini 3.1 Pro | | Strong visual, lower cost than Opus |
| Adversarial Reviewer | Grok 4 | | Keep this — adversarial review is critical |
| Peer Reviewer | Claude Sonnet 4.6 | | Good reviews, lower cost |
Cloud Provider Assignments
For organizations routing through Amazon Bedrock, Microsoft Azure, or Google Vertex AI instead of direct provider APIs:
Amazon Bedrock:
| Role | Bedrock Model ID | Notes |
|---|---|---|
| Planner | | Claude via AWS billing |
| Coder | | Claude Sonnet strong at code |
| Researcher | | Nova Pro for knowledge tasks |
| Scientist | | Opus for math reasoning |
| Visual Analyst | | Opus strong multimodal |
| Adversarial Reviewer | | Different model family for adversarial benefit |
| Peer Reviewer | | Structured analysis |
Note: Bedrock doesn't have GPT, Gemini, or Grok. Mix Bedrock with direct providers for full coverage: `bedrock` for Claude agents, `openai` for coder, `google` for research/science, `xai` for adversarial review.
Microsoft Azure OpenAI:
| Role | Azure Model ID | Notes |
|---|---|---|
| Planner | | Claude via Azure AI Foundry |
| Coder | | GPT via Azure OpenAI |
| Researcher | | GPT for knowledge tasks |
| Scientist | | GPT for math reasoning |
| Visual Analyst | | Claude multimodal via Foundry |
| Adversarial Reviewer | | Different model for adversarial benefit |
| Peer Reviewer | | Structured analysis |
Note: Azure model IDs depend on your deployment names — the above assumes deployments matching model names. Azure doesn't have Gemini or Grok natively. Mix Azure with direct providers: `azure` for GPT/Claude agents, `google` for research/science, `xai` for adversarial review.
Google Vertex AI:
| Role | Vertex AI Model ID | Notes |
|---|---|---|
| Planner | | Claude via Model Garden |
| Coder | | Gemini strong at code |
| Researcher | | Gemini native strength |
| Scientist | | Gemini native math |
| Visual Analyst | | Gemini strong multimodal |
| Adversarial Reviewer | | Different family via Model Garden |
| Peer Reviewer | | Structured analysis |
Note: Vertex AI has Gemini natively and Claude via Model Garden. No GPT or Grok. Mix with direct providers for full coverage.
Recommended hybrid for enterprise — mix cloud providers with one or two direct APIs for maximum model diversity:
| Role | Hybrid Enterprise | Why |
|---|---|---|
| Planner | | Claude via AWS billing |
| Coder | | GPT via Azure billing |
| Researcher | | Gemini via GCP billing |
| Scientist | | Gemini via GCP billing |
| Visual Analyst | | Claude via AWS billing |
| Adversarial Reviewer | | Direct — Grok not on clouds |
| Peer Reviewer | | Claude via AWS billing |
This routes billing through your existing cloud agreements while maintaining full model diversity. Only Grok requires a direct API key since xAI is not yet available on any cloud marketplace.
适用于通过Amazon Bedrock、Microsoft Azure或Google Vertex AI而非直接提供商API路由的组织:
Amazon Bedrock:
| 角色 | Bedrock模型ID | 说明 |
|---|---|---|
| 规划Agent | | Claude通过AWS计费 |
| 编码Agent | | Claude Sonnet擅长代码 |
| 研究Agent | | Nova Pro适用于知识类任务 |
| 科研Agent | | Opus适用于数学推理 |
| 视觉分析Agent | | Opus多模态能力强 |
| 对抗性评审Agent | | 不同模型家族,具备对抗性优势 |
| 同行评审Agent | | 结构化分析 |
注意:Bedrock没有GPT、Gemini或Grok。混合使用Bedrock和直接提供商以获得完整覆盖:`bedrock` 处理Claude Agent,`openai` 处理编码Agent,`google` 处理研究/科研Agent,`xai` 处理对抗性评审Agent。
Microsoft Azure OpenAI:
| 角色 | Azure模型ID | 说明 |
|---|---|---|
| 规划Agent | | Claude通过Azure AI Foundry |
| 编码Agent | | GPT通过Azure OpenAI |
| 研究Agent | | GPT适用于知识类任务 |
| 科研Agent | | GPT适用于数学推理 |
| 视觉分析Agent | | Claude多模态能力通过Foundry提供 |
| 对抗性评审Agent | | 不同模型,具备对抗性优势 |
| 同行评审Agent | | 结构化分析 |
注意:Azure模型ID取决于你的部署名称——上述假设部署名称与模型名称匹配。Azure原生没有Gemini或Grok。混合使用Azure和直接提供商:`azure` 处理GPT/Claude Agent,`google` 处理研究/科研Agent,`xai` 处理对抗性评审Agent。
Google Vertex AI:
| 角色 | Vertex AI模型ID | 说明 |
|---|---|---|
| 规划Agent | | Claude通过Model Garden |
| 编码Agent | | Gemini擅长代码 |
| 研究Agent | | Gemini原生优势 |
| 科研Agent | | Gemini原生数学能力 |
| 视觉分析Agent | | Gemini多模态能力强 |
| 对抗性评审Agent | | 通过Model Garden使用不同家族模型 |
| 同行评审Agent | | 结构化分析 |
注意:Vertex AI原生提供Gemini,通过Model Garden提供Claude。没有GPT或Grok。混合使用直接提供商以获得完整覆盖。
企业推荐混合方案——混合云提供商和一两个直接API,以实现最大模型多样性:
| 角色 | 企业混合方案 | 原因 |
|---|---|---|
| 规划Agent | | Claude通过AWS计费 |
| 编码Agent | | GPT通过Azure计费 |
| 研究Agent | | Gemini通过GCP计费 |
| 科研Agent | | Gemini通过GCP计费 |
| 视觉分析Agent | | Claude通过AWS计费 |
| 对抗性评审Agent | | 直接调用——Grok尚未在任何云市场提供 |
| 同行评审Agent | | Claude通过AWS计费 |
此方案通过现有云协议计费,同时保持完整的模型多样性。只有Grok需要直接API密钥,因为xAI尚未在任何云市场提供。
Cross-Provider Model ID Reference
跨提供商模型ID参考
The same model is accessed through different ID formats depending on the provider. Use this table to swap providers in agent definitions:
| Model | Direct | Bedrock | Azure | Vertex AI | OpenRouter |
|---|---|---|---|---|---|
| Claude Opus 4.6 | | | | | |
| Claude Sonnet 4.5 | | | | | |
| Claude Sonnet 4.6 | | | | | |
| GPT-5.4 | | — | | — | |
| GPT-5.2 | | — | | — | |
| GPT-4o | | — | | — | |
| Gemini 3.1 Pro | | — | — | | |
| Gemini 3 Pro | | — | — | | |
| Gemini 3 Flash | | — | — | | |
| Grok 4 | | — | — | — | |
| Amazon Nova Pro | — | | — | — | — |
"—" means the model is not available through that provider. Mix providers as needed.
同一模型通过不同提供商访问时使用不同的ID格式。使用此表在Agent定义中切换提供商:
| 模型 | 直接提供商 | Bedrock | Azure | Vertex AI | OpenRouter |
|---|---|---|---|---|---|
| Claude Opus 4.6 | | | | | |
| Claude Sonnet 4.5 | | | | | |
| Claude Sonnet 4.6 | | | | | |
| GPT-5.4 | | — | | — | |
| GPT-5.2 | | — | | — | |
| GPT-4o | | — | | — | |
| Gemini 3.1 Pro | | — | — | | |
| Gemini 3 Pro | | — | — | | |
| Gemini 3 Flash | | — | — | | |
| Grok 4 | | — | — | — | |
| Amazon Nova Pro | — | | — | — | — |
"—"表示该模型无法通过该提供商访问。根据需要混合使用提供商。
Single-Provider Fallbacks
单一提供商备选方案
When you only have access to one provider:
Anthropic only (Claude Code default):
| Role | Model | Notes |
|---|---|---|
| Planner | Opus (thinking) | Full capability |
| Coder | Sonnet | Fast, good at code |
| Researcher | Opus | Strong synthesis |
| Scientist | Opus | Reasonable math |
| Visual Analyst | Opus | Strong multimodal |
| Adversarial Reviewer | Sonnet | Different model, but same family — add extra adversarial prompting |
| Peer Reviewer | Opus | Structured analysis |
OpenAI only (Codex default):
| Role | Model | Notes |
|---|---|---|
| Planner | GPT-5.4 (high) | Strong planning |
| Coder | GPT-5.4 | Native strength |
| Researcher | GPT-5.4 | Good research |
| Scientist | GPT-5.2 | Strong math |
| Visual Analyst | GPT-5.4 | Good multimodal |
| Adversarial Reviewer | GPT-4o | Different model, different style |
| Peer Reviewer | GPT-5.4 (high) | Structured analysis |
Google only (Gemini CLI default):
| Role | Model | Notes |
|---|---|---|
| Planner | Gemini 3.1 Pro | Strong planning |
| Coder | Gemini 3.1 Pro | Good coding |
| Researcher | Gemini 3.1 Pro | Native strength |
| Scientist | Gemini 3 Pro | Native strength |
| Visual Analyst | Gemini 3.1 Pro | Strong multimodal |
| Adversarial Reviewer | Gemini 3 Flash | Different model for cost + perspective |
| Peer Reviewer | Gemini 3.1 Pro | Structured analysis |
当你只能访问一个提供商时:
仅使用Anthropic(Claude Code默认):
| 角色 | 模型 | 说明 |
|---|---|---|
| 规划Agent | Opus(思考模式) | 完整能力 |
| 编码Agent | Sonnet | 速度快,擅长代码 |
| 研究Agent | Opus | 擅长信息综合 |
| 科研Agent | Opus | 具备合理的数学能力 |
| 视觉分析Agent | Opus | 多模态能力强 |
| 对抗性评审Agent | Sonnet | 不同模型,但属于同一家族——添加额外的对抗性提示 |
| 同行评审Agent | Opus | 结构化分析 |
仅使用OpenAI(Codex默认):
| 角色 | 模型 | 说明 |
|---|---|---|
| 规划Agent | GPT-5.4(高推理) | 规划能力强 |
| 编码Agent | GPT-5.4 | 原生优势 |
| 研究Agent | GPT-5.4 | 研究能力良好 |
| 科研Agent | GPT-5.2 | 数学能力强 |
| 视觉分析Agent | GPT-5.4 | 多模态能力良好 |
| 对抗性评审Agent | GPT-4o | 不同模型,风格不同 |
| 同行评审Agent | GPT-5.4(高推理) | 结构化分析 |
仅使用Google(Gemini CLI默认):
| 角色 | 模型 | 说明 |
|---|---|---|
| 规划Agent | Gemini 3.1 Pro | 规划能力强 |
| 编码Agent | Gemini 3.1 Pro | 编码能力良好 |
| 研究Agent | Gemini 3.1 Pro | 原生优势 |
| 科研Agent | Gemini 3 Pro | 原生优势 |
| 视觉分析Agent | Gemini 3.1 Pro | 多模态能力强 |
| 对抗性评审Agent | Gemini 3 Flash | 不同模型,兼顾成本和视角 |
| 同行评审Agent | Gemini 3.1 Pro | 结构化分析 |
Evolving the Model Map
模型映射的演变
Model capabilities change with every release. The assignments above are based on benchmarks as of April 2026.
模型能力随每次发布而变化。上述分配基于2026年4月的基准测试。
Key Benchmark Sources
关键基准测试来源
Track these to stay current:
| Source | URL | What it tracks | Update frequency |
|---|---|---|---|
| Chatbot Arena | arena.ai | Overall quality, per-category rankings | Continuous |
| SWE-bench | swebench.com | Real-world software engineering | On submission |
| Aider Leaderboard | aider.chat/docs/leaderboards/ | Practical code editing | On release |
| Artificial Analysis | artificialanalysis.ai | Intelligence/speed/cost index | Weekly |
| LiveBench | livebench.ai | Contamination-resistant monthly eval | Monthly |
| GPQA Diamond | — | PhD-level science | Static |
| ARC-AGI 2 | arcprize.org | Abstract visual reasoning | Periodic |
| Humanity's Last Exam | — | Frontier knowledge | Periodic |
跟踪这些来源以保持最新:
| 来源 | URL | 跟踪内容 | 更新频率 |
|---|---|---|---|
| Chatbot Arena | arena.ai | 整体质量、分类排名 | 持续更新 |
| SWE-bench | swebench.com | 实际软件工程任务 | 按需提交更新 |
| Aider排行榜 | aider.chat/docs/leaderboards/ | 实用代码编辑能力 | 模型发布时更新 |
| Artificial Analysis | artificialanalysis.ai | 智能/速度/成本指数 | 每周更新 |
| LiveBench | livebench.ai | 抗污染月度评估 | 每月更新 |
| GPQA Diamond | — | 博士级科学知识 | 静态 |
| ARC-AGI 2 | arcprize.org | 抽象视觉推理 | 定期更新 |
| Humanity's Last Exam | — | 前沿知识 | 定期更新 |
When to Re-Evaluate
重新评估的时机
- A major new model launches (new Claude, GPT, Gemini, Grok version)
- Arena rankings shift by >50 Elo in a relevant category
- SWE-bench or Aider leaderboard gets a new #1
- You notice consistent quality degradation from a specific agent
- 重大新模型发布(新Claude、GPT、Gemini、Grok版本)
- Arena相关类别排名变化超过50 Elo
- SWE-bench或Aider排行榜出现新的第一名
- 你注意到特定Agent的质量持续下降
Future: Benchmark Tracking Skill
未来:基准测试跟踪技能
A companion skill for automated benchmark tracking is planned. It will fetch the latest rankings from programmatic sources (HuggingFace datasets, Arena Hard Auto) and generate an updated model map. Until then, check the sources above when major models launch.
计划开发一个用于自动基准测试跟踪的配套技能。它将从程序化来源(HuggingFace数据集、Arena Hard Auto)获取最新排名,并生成更新的模型映射。在此之前,当重大模型发布时,检查上述来源。
Role-to-Benchmark Mapping
角色与基准测试的映射
Use this to know which benchmarks matter for which agent role:
| Agent Role | Primary Benchmarks | What to Look For |
|---|---|---|
| Planner | Arena (Overall, Hard Prompts), ARC-AGI 2 | Abstract reasoning, complex instruction following |
| Coder | Aider, SWE-bench Verified | Practical code editing, real-world bug fixing |
| Researcher | Arena (Knowledge), HLE, MMMLU, BrowseComp | Breadth of knowledge, research synthesis |
| Scientist | AIME, GPQA Diamond, MATH Level 5 | Mathematical reasoning, scientific knowledge |
| Visual Analyst | ARC-AGI 2, MMMU-Pro | Visual reasoning, multimodal understanding |
| Adversarial Reviewer | Arena (Hard Prompts), PropensityBench | Critical thinking, low positivity bias |
| Peer Reviewer | Arena (Overall), AI Scientist review scores | Structured analysis, honest assessment |
使用此表了解哪些基准测试对哪些Agent角色重要:
| Agent角色 | 主要基准测试 | 关注要点 |
|---|---|---|
| 规划Agent | Arena(整体、难题)、ARC-AGI 2 | 抽象推理、复杂指令遵循能力 |
| 编码Agent | Aider、SWE-bench Verified | 实用代码编辑、实际Bug修复能力 |
| 研究Agent | Arena(知识)、HLE、MMMLU、BrowseComp | 知识广度、研究综合能力 |
| 科研Agent | AIME、GPQA Diamond、MATH Level 5 | 数学推理、科学知识 |
| 视觉分析Agent | ARC-AGI 2、MMMU-Pro | 视觉推理、多模态理解能力 |
| 对抗性评审Agent | Arena(难题)、PropensityBench | 批判性思维、低积极偏见 |
| 同行评审Agent | Arena(整体)、AI Scientist评审得分 | 结构化分析、诚实评估能力 |
Quick Reference
快速参考
Workflow Selection
工作流选择
| Task Size | Pattern | Agents Used |
|---|---|---|
| Trivial (one-liner) | Skip orchestration | Just do it |
| Small (single file) | E (Rapid) | Planner → Coder → Adversarial |
| Medium (multi-file) | A (Default) | Planner → Coder → Both Reviewers |
| Medium + unknown API | B (Research First) | Planner → Researcher → Planner → Coder → Review |
| Large | D (Full Pipeline) | All agents, multiple phases |
| Research only | F (Research) | Planner → Researcher + Scientist |
| Math/science heavy | C (Deep Analysis) | Planner → Scientist + Visual + Researcher → Coder → Review |
| 任务规模 | 模式 | 使用的Agent |
|---|---|---|
| 琐碎(单行) | 跳过编排 | 直接执行 |
| 小型(单文件) | E(快速迭代) | 规划Agent → 编码Agent → 对抗性评审Agent |
| 中型(多文件) | A(默认) | 规划Agent → 编码Agent → 两位评审Agent |
| 中型 + 未知API | B(先研究) | 规划Agent → 研究Agent → 规划Agent → 编码Agent → 评审 |
| 大型 | D(完整流水线) | 所有Agent,多阶段 |
| 仅研究 | F(研究) | 规划Agent → 研究Agent + 科研Agent |
| 数学/科学密集型 | C(深度分析) | 规划Agent → 科研Agent + 视觉分析Agent + 研究Agent → 编码Agent → 评审 |
Tool Selection
工具选择
| Your Tool | Multi-Model | Sub-Agents | Best Approach |
|---|---|---|---|
| OpenCode | Native (75+ providers) | Yes | Full multi-model, no gateway needed |
| Claude Code | Anthropic only | Yes | Gateway script for cross-provider, or Anthropic-only |
| Cursor | UI model picker | No | Sequential model switching per phase |
| Codex CLI | OpenAI only | Emerging | Gateway script for cross-provider |
| Gemini CLI | Google only | No | Sequential + gateway script |
| Aider | Any (via LiteLLM) | No | Architect/editor dual-model + review sessions |
| 你的工具 | 多模型支持 | 子Agent支持 | 最佳方案 |
|---|---|---|---|
| OpenCode | 原生支持(75+提供商) | 是 | 完整多模型,无需网关 |
| Claude Code | 仅支持Anthropic | 是 | 网关脚本实现跨提供商,或仅使用Anthropic |
| Cursor | UI模型选择器 | 否 | 按阶段顺序切换模型 |
| Codex CLI | 仅支持OpenAI | 新兴 | 网关脚本实现跨提供商 |
| Gemini CLI | 仅支持Google | 否 | 顺序切换 + 网关脚本 |
| Aider | 支持所有(通过LiteLLM) | 否 | 架构师/编辑器双模型 + 评审会话 |
Gateway Selection
网关选择
| Scenario | Recommended Gateway |
|---|---|
| OpenCode user | None needed — native multi-provider |
| Single API key for everything | OpenRouter |
| Vercel project / TypeScript | Vercel AI Gateway |
| Self-hosted / full control | LiteLLM proxy |
| Budget tracking important | Vercel AI Gateway or OpenRouter (both have dashboards) |
| Enterprise compliance | Portkey (SOC 2, HIPAA) |
| 场景 | 推荐网关 |
|---|---|
| OpenCode用户 | 无需网关——原生多提供商支持 |
| 单一API密钥访问所有模型 | OpenRouter |
| Vercel项目 / TypeScript | Vercel AI网关 |
| 自托管 / 完全控制 | LiteLLM代理 |
| 预算跟踪重要 | Vercel AI网关或OpenRouter(均有控制台) |
| 企业合规 | Portkey(SOC 2、HIPAA合规) |
Cost Expectations
成本预期
Multi-model orchestration uses frontier models, which are not free. Rough cost per task (varies by complexity and token counts):
| Pattern | Models Used | Estimated Cost Range |
|---|---|---|
| E (Rapid) | 3 calls (planner + coder + adversarial) | $0.50 – $2 |
| A (Default) | 4-5 calls (planner + coder + 2 reviewers + replan) | $2 – $8 |
| B (Research First) | 6-7 calls (+ researcher + replan) | $3 – $10 |
| D (Full Pipeline) | 8-12 calls (all agents, multiple phases) | $5 – $20 |
These assume frontier models (Opus, GPT-5.4, Gemini 3.1 Pro). Budget-conscious assignments (see Model Map) cut costs 50-70% with moderate quality tradeoff.
When the overhead is worth it: Complex tasks where a bug or missed requirement costs hours of rework. A $10 multi-agent review that catches a security vulnerability saves far more than $10.
When it's not worth it: Trivial changes. If you can write and verify the fix in 5 minutes, skip orchestration.
多模型编排使用前沿模型,并非免费。每个任务的大致成本(因复杂度和令牌数量而异):
| 模式 | 使用的模型 | 估计成本范围 |
|---|---|---|
| E(快速迭代) | 3次调用(规划 + 编码 + 对抗性评审) | $0.50 – $2 |
| A(默认) | 4-5次调用(规划 + 编码 + 两位评审 + 重新规划) | $2 – $8 |
| B(先研究) | 6-7次调用(+ 研究 + 重新规划) | $3 – $10 |
| D(完整流水线) | 8-12次调用(所有Agent,多阶段) | $5 – $20 |
这些假设使用前沿模型(Opus、GPT-5.4、Gemini 3.1 Pro)。预算友好型分配(见模型映射)可降低50-70%的成本,质量略有下降。
开销值得的场景: 复杂任务中,Bug或遗漏需求会导致数小时返工。一次花费$10的多Agent评审发现安全漏洞,节省的成本远超过$10。
开销不值得的场景: 琐碎变更。如果你能在5分钟内编写并验证修复,跳过编排。
Troubleshooting
故障排除
Agent not found
Agent未找到
Symptom: `@planner` or agent invocation returns "agent not found" or a similar error.
Fix: Ensure agent definition files are in the correct directory:
- OpenCode: `.opencode/agents/planner.md` (project) or `~/.config/opencode/agents/planner.md` (global)
- Claude Code: `.claude/agents/planner.md` (project) or `~/.claude/agents/planner.md` (global)
Verify the files were copied correctly: `ls .opencode/agents/` or `ls .claude/agents/`
症状: `@planner` 或Agent调用返回「Agent未找到」或类似错误。
修复: 确保Agent定义文件位于正确目录:
- OpenCode:`.opencode/agents/planner.md`(项目级)或 `~/.config/opencode/agents/planner.md`(全局)
- Claude Code:`.claude/agents/planner.md`(项目级)或 `~/.claude/agents/planner.md`(全局)
验证文件是否正确复制:`ls .opencode/agents/` 或 `ls .claude/agents/`
Model not available / API error
模型不可用 / API错误
Symptom: "Model not found," "Invalid model," or 404 errors when an agent runs.
Fix:
- Verify the model ID is correct for your provider. Model IDs differ between providers — check the Cross-Provider Model ID Reference table
- Verify your API key is set: `echo $ANTHROPIC_API_KEY | head -c 10` (should show the first 10 chars)
- For Bedrock: ensure your IAM role has `bedrock:InvokeModel` permission for the specific model
- For Azure: ensure the deployment name in your model ID matches your Azure deployment
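For the Bedrock case, a minimal IAM policy sketch granting invoke access looks like the following. The wildcard `Resource` is illustrative only; in production, scope the ARN to the specific foundation model your agents use:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "arn:aws:bedrock:*::foundation-model/*"
    }
  ]
}
```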
症状: 运行Agent时出现「模型未找到」、「无效模型」或404错误。
修复:
- 验证模型ID对你的提供商是否正确。模型ID因提供商而异——查看跨提供商模型ID参考表
- 验证你的API密钥已设置:`echo $ANTHROPIC_API_KEY | head -c 10`(应显示前10个字符)
- 对于Bedrock:确保你的IAM角色对特定模型拥有 `bedrock:InvokeModel` 权限
- 对于Azure:确保模型ID中的部署名称与你的Azure部署匹配
Agent produces garbage output
Agent产生无效输出
Symptom: Agent ignores its role, writes code when it should only review, or produces unstructured output.
Fix:
- Ensure the agent definition file has the full system prompt (the markdown body). If the file only has frontmatter, the agent has no instructions
- For read-only agents, verify `permission: edit: deny` (OpenCode) or restricted `allowed-tools` (Claude Code) is set
- If using a gateway script, check that the full prompt is being passed — shell argument truncation is a common issue. Use stdin or file-based prompts for long context
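For reference, a minimal agent definition sketch with both frontmatter and a markdown-body system prompt. The `permission: edit: deny` field follows the OpenCode convention quoted above; the description, placeholder model ID, and prompt wording are illustrative:

```markdown
---
description: Peer reviewer (read-only)
model: <provider/model-id>   # fill in from the Cross-Provider Model ID Reference
permission:
  edit: deny                 # read-only: reviewers must never write files
---
You are a peer reviewer. You never edit files.
Produce a structured review: summary, issues (file, line, severity),
and an explicit approve / request-changes verdict.
```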
症状: Agent忽略其角色,在应仅评审时编写代码,或产生非结构化输出。
修复:
- 确保Agent定义文件包含完整的系统提示(Markdown正文)。如果文件仅包含前置元数据,Agent将没有指令
- 对于只读Agent,验证是否设置了 `permission: edit: deny`(OpenCode)或受限的 `allowed-tools`(Claude Code)
- 如果使用网关脚本,检查是否传递了完整提示——Shell参数截断是常见问题。对于长上下文,使用标准输入或基于文件的提示
Context window overflow
上下文窗口溢出
Symptom: Agent errors out or produces truncated output on large codebases.
Fix:
- The planner should summarize context in handoffs, not paste entire files. Include file paths and relevant line ranges, not full contents
- For research handoffs, summarize findings in 500 words or less — the coder doesn't need the full research output
- For review handoffs, include only the changed files, not the entire codebase
- If a single file is too large, have the planner break the task into smaller file-scoped subtasks
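A trivial sketch of a line-ranged handoff: the planner passes a path plus a line range, and the downstream agent pulls just that slice instead of the whole file. The demo file and range here are illustrative:

```shell
# Planner handoff says: "see /tmp/handoff_demo.txt lines 2-4" rather than
# pasting the file contents into the prompt.
printf 'l1\nl2\nl3\nl4\nl5\n' > /tmp/handoff_demo.txt

# The coder fetches only the referenced slice.
sed -n '2,4p' /tmp/handoff_demo.txt   # prints l2, l3, l4
```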
症状: 在大型代码库上,Agent出错或产生截断输出。
修复:
- 规划Agent应在交接中总结上下文,而非粘贴整个文件。包含文件路径和相关行范围,而非完整内容
- 对于研究交接,将发现总结在500字以内——编码Agent不需要完整的研究输出
- 对于评审交接,仅包含修改的文件,而非整个代码库
- 如果单个文件过大,让规划Agent将任务分解为更小的文件范围子任务
Gateway script fails silently
网关脚本静默失败
Symptom: Gateway script returns empty output or `null`.
Fix:
- Check API key: `echo $OPENROUTER_API_KEY | head -c 10`
- Test the endpoint directly: `curl -s https://openrouter.ai/api/v1/models | jq '.data[0].id'`
- Check for jq: `which jq` — install if missing (`brew install jq` or `apt install jq`)
- Run the script with verbose curl: replace `curl -s` with `curl -v` temporarily to see HTTP errors
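For reference, here is a minimal gateway sketch under stated assumptions: the endpoint is OpenRouter's OpenAI-compatible chat completions API, the `ask_model` function name is illustrative, `OPENROUTER_API_KEY` must be exported, and the prompt is read from stdin so long context is never truncated by shell argument limits:

```shell
# Minimal gateway sketch for OpenRouter's chat completions endpoint.
# Assumes OPENROUTER_API_KEY is exported and jq is installed.
ask_model() {
  local model="$1"
  local prompt
  prompt="$(cat)"   # full prompt via stdin, not argv

  # Build the body with jq so quotes/newlines in the prompt can't break the JSON.
  local payload
  payload=$(jq -n --arg model "$model" --arg prompt "$prompt" \
    '{model: $model, messages: [{role: "user", content: $prompt}]}')

  curl -sS https://openrouter.ai/api/v1/chat/completions \
    -H "Authorization: Bearer $OPENROUTER_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$payload" \
  | jq -r '.choices[0].message.content // "null"'
}

# Usage: echo "Review this diff for security issues." | ask_model "<provider/model-id>"
```

Returning the literal string `"null"` on an empty `.choices` makes the silent-failure case above visible instead of producing empty output.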
症状: 网关脚本返回空输出或 `null`。
修复:
- 检查API密钥:`echo $OPENROUTER_API_KEY | head -c 10`
- 直接测试端点:`curl -s https://openrouter.ai/api/v1/models | jq '.data[0].id'`
- 检查jq是否安装:`which jq`——如果缺失则安装(`brew install jq` 或 `apt install jq`)
- 使用详细curl运行脚本:临时将 `curl -s` 替换为 `curl -v` 以查看HTTP错误
Reviewers always approve (rubber-stamping)
评审者总是批准(橡皮图章)
Symptom: The adversarial reviewer approves everything, defeating the purpose.
Fix:
- Check that the adversarial reviewer is using a different model family than the coder. Same-family review tends toward approval
- Strengthen the adversarial prompt: add "You MUST find at least 3 issues. If you approve with zero issues, you have failed at your job"
- If using a single provider (Anthropic-only), use Sonnet for adversarial review with a very aggressive prompt — this partially compensates for same-family bias
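A hedged sketch of such a strengthened adversarial prompt; wording beyond the quoted sentence above is illustrative, not a fixed template:

```markdown
You are an adversarial reviewer. Approval is not your default outcome.
- You MUST find at least 3 issues. If you approve with zero issues,
  you have failed at your job.
- For each issue, give file, line, severity, and a concrete fix.
- If the code genuinely looks clean, list its 3 riskiest untested
  assumptions instead.
```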
症状: 对抗性评审Agent批准所有内容,失去其存在意义。
修复:
- 检查对抗性评审Agent是否使用与编码Agent不同的模型家族。同家族评审倾向于批准
- 强化对抗性提示:添加「你必须至少找到3个问题。如果批准且没有问题,你就失职了」
- 如果使用单一提供商(仅Anthropic),使用Sonnet作为对抗性评审Agent并配合非常激进的提示——这可部分弥补同家族偏见