Agent Collaboration
Agent协作
Orchestrate multiple AI models as specialized agents — each assigned to what it does best. One model plans, another codes, another reviews. The planner stays in the loop, re-entering after every phase to evaluate and redirect.
将多个AI模型编排为专业Agent——每个Agent专注于其最擅长的领域。一个模型负责规划,另一个负责编码,还有一个负责评审。规划Agent全程参与,在每个阶段结束后重新介入,评估结果并调整方向。
When to Use
适用场景
- Complex projects requiring planning, implementation, and review
- Tasks where a single model's blind spots are a risk
- When you want adversarial review to catch what self-review cannot
- Research-heavy work requiring web search, synthesis, and validation
- Math or science tasks requiring specialized reasoning
- Any task benefiting from a plan → execute → review → replan loop
- When the user explicitly asks for multi-model collaboration
- 需要规划、实现和评审的复杂项目
- 单一模型存在盲点风险的任务
- 需要对抗性评审来发现自我评审无法察觉问题的场景
- 需要网页搜索、信息综合和验证的研究型工作
- 需要专业推理的数学或科学任务
- 任何能从「规划→执行→评审→重新规划」循环中获益的任务
- 用户明确要求多模型协作的情况
When NOT to Use
不适用场景
- One-line fixes, typo corrections, simple questions
- Tasks fully within a single model's strength
- When speed matters more than thoroughness
- Exploratory conversations without a concrete deliverable
- 单行修复、拼写纠正、简单问题
- 完全在单一模型能力范围内的任务
- 速度优先于完备性的场景
- 没有具体交付成果的探索性对话
Philosophy
核心理念
One model cannot be the best at everything. Benchmarks consistently show different model families excel at different tasks. Claude Opus excels at planning and abstract reasoning. GPT-5.4 leads at code implementation. Gemini 3.1 Pro dominates math, science, and knowledge retrieval. Grok 4 brings contrarian perspective. Combining these as specialized agents outperforms any single model on complex tasks.
The planner is the conductor. It decomposes, delegates, evaluates, and replans. Every other agent reports back to the planner. The planner never edits files — it reads code for context and spawns sub-agents, but its authority comes from directing, not doing.
Adversarial review is not optional. A different model family reviewing the work catches failure modes that self-review cannot. The adversarial reviewer's job is to find problems, not to be diplomatic.
Agents are disposable, context is not. Each agent may be stateless, but the handoff between agents must preserve all relevant context. The planner is responsible for ensuring no information is lost between phases.
没有任何一个模型能做到全能。 基准测试持续显示,不同模型家族在不同任务上各有所长。Claude Opus擅长规划和抽象推理,GPT-5.4在代码实现方面领先,Gemini 3.1 Pro在数学、科学和知识检索领域表现突出,Grok 4则能提供逆向视角。将这些模型作为专业Agent组合使用,在复杂任务上的表现优于任何单一模型。
规划Agent是指挥者。 它负责分解任务、分配工作、评估结果和重新规划。其他所有Agent都向规划Agent汇报。规划Agent从不直接编辑文件——它仅读取代码获取上下文,并生成子Agent,其权威性来自指令下达而非直接执行。
对抗性评审不可或缺。 由不同模型家族进行评审,能发现自我评审无法察觉的失效模式。对抗性评审者的职责是找出问题,而非保持外交姿态。
Agent可丢弃,但上下文不可丢失。 每个Agent可能是无状态的,但Agent之间的交接必须保留所有相关上下文。规划Agent负责确保各阶段之间没有信息丢失。
The Seven Agents
七大Agent角色
1. Planner
1. 规划Agent
- Role: Decompose complex tasks into subtasks, assign each to the right agent, define success criteria, evaluate results, replan when needed
- Primary model: Claude Opus 4.6 (extended thinking)
- Fallback: Claude Opus 4.5, GPT-5.4 (high reasoning)
- Why Opus: #1 Arena overall (1504 Elo), #1 Hard Prompts, best abstract reasoning (ARC-AGI 2: 68.8%). Extended thinking excels at structured decomposition and multi-step planning
- Tools: No file edits. The planner reads code, runs read-only shell commands (git log, ls), and spawns sub-agents — but never writes or edits files
- Output: Structured plan in YAML with subtask assignments, dependencies, and success criteria
- 职责: 将复杂任务分解为子任务,为每个子任务分配合适的Agent,定义成功标准,评估结果,必要时重新规划
- 主模型: Claude Opus 4.6(扩展思考模式)
- 备选模型: Claude Opus 4.5、GPT-5.4(高推理能力)
- 选择Opus的原因: Arena总排名第1(1504 Elo),难题排名第1,抽象推理能力最强(ARC-AGI 2得分68.8%)。扩展思考模式擅长结构化分解和多步骤规划
- 工具权限: 无文件编辑权限。规划Agent可读取代码、执行只读Shell命令(如git log、ls),并生成子Agent,但从不写入或编辑文件
- 输出: 包含子任务分配、依赖关系和成功标准的结构化YAML规划
2. Coder
2. 编码Agent
- Role: Implement code changes, write tests, fix bugs, refactor. Follows the plan exactly
- Primary model: GPT-5.4 (high reasoning)
- Fallback: Claude Sonnet 4.5, Claude Sonnet 4.6
- Why GPT-5.4: Leads Aider coding leaderboard (88%). Fast, precise, excellent at turning plans into working code
- Why Sonnet 4.5 as fallback: Leads SWE-bench Verified (82%). Strong at real-world software engineering tasks
- Tools: Full file system access — read, write, edit, terminal, package managers
- Output: Changed files, test results, implementation summary
- 职责: 实现代码变更、编写测试、修复Bug、重构代码。严格遵循规划执行
- 主模型: GPT-5.4(高推理能力)
- 备选模型: Claude Sonnet 4.5、Claude Sonnet 4.6
- 选择GPT-5.4的原因: Aider编码排行榜第1(得分88%),速度快、精度高,擅长将规划转化为可运行代码
- 选择Sonnet 4.5作为备选的原因: SWE-bench Verified排名第1(得分82%),在实际软件工程任务中表现出色
- 工具权限: 完整文件系统访问权限——读取、写入、编辑、终端操作、包管理器使用
- 输出: 修改后的文件、测试结果、实现总结
3. Researcher
3. 研究Agent
- Role: Web search, documentation lookup, API exploration, literature review, competitive analysis, summarization
- Primary model: Gemini 3.1 Pro
- Fallback: Claude Opus 4.6
- Why Gemini 3.1 Pro: Leads Humanity's Last Exam (45.8%), top MMMLU (91.8%). Exceptional at finding and synthesizing information across broad knowledge domains
- Why Opus as fallback: Best on BrowseComp (web research synthesis). Excels at connecting disparate information
- Tools: Web search, web fetch, file read. No file edits — the researcher reports, it doesn't implement
- Output: Research summary with source attribution, key findings, decision-relevant tradeoffs
- 职责: 网页搜索、文档查阅、API探索、文献综述、竞品分析、信息汇总
- 主模型: Gemini 3.1 Pro
- 备选模型: Claude Opus 4.6
- 选择Gemini 3.1 Pro的原因: Humanity's Last Exam得分45.8%,MMMLU得分91.8%,在跨广泛知识领域查找和综合信息方面表现卓越
- 选择Opus作为备选的原因: BrowseComp(网页研究综合)表现最佳,擅长连接分散的信息
- 工具权限: 网页搜索、网页抓取、文件读取。无文件编辑权限——研究Agent仅汇报结果,不负责实现
- 输出: 带有来源标注、关键发现和决策相关权衡的研究总结
4. Scientist
4. 科研Agent
- Role: Mathematical reasoning, formal proofs, statistical modeling, data analysis, algorithm verification, scientific computation
- Primary model: Gemini 3 Pro
- Fallback: GPT-5.2, Claude Opus 4.6
- Why Gemini 3 Pro: Scores 100% on AIME 2025, 94.3% GPQA Diamond. Exceptional at step-by-step mathematical reasoning and formal proofs
- Why GPT-5.2 as fallback: Also 100% on AIME 2025, 92.4% GPQA Diamond
- Tools: Code execution (for computation and verification), file read/write for results. Web access not typically needed
- Output: Formal analysis, proofs, computed results with methodology
- 职责: 数学推理、形式化证明、统计建模、数据分析、算法验证、科学计算
- 主模型: Gemini 3 Pro
- 备选模型: GPT-5.2、Claude Opus 4.6
- 选择Gemini 3 Pro的原因: AIME 2025得分100%,GPQA Diamond得分94.3%,在分步数学推理和形式化证明方面表现卓越
- 选择GPT-5.2作为备选的原因: AIME 2025同样得分100%,GPQA Diamond得分92.4%
- 工具权限: 代码执行(用于计算和验证)、结果文件读写。通常无需网页访问权限
- 输出: 形式化分析、证明、带方法论的计算结果
5. Visual Analyst
5. 视觉分析Agent
- Role: Image analysis, UI/UX review, diagram interpretation, screenshot analysis, visual regression detection, design system compliance
- Primary model: Claude Opus 4.6
- Fallback: Gemini 3.1 Pro
- Why Opus: ARC-AGI 2: 68.8% (dominant lead in abstract visual reasoning). Strong multimodal understanding with structured output
- Why Gemini as fallback: MMMU-Pro 80.5%. Excellent at interpreting complex visual content
- Tools: Image reading, screenshot capture, file read. No file edits — reports visual findings
- Output: Visual analysis report with specific observations, issues, and recommendations
- 职责: 图像分析、UI/UX评审、图表解读、截图分析、视觉回归检测、设计系统合规性检查
- 主模型: Claude Opus 4.6
- 备选模型: Gemini 3.1 Pro
- 选择Opus的原因: ARC-AGI 2得分68.8%(在抽象视觉推理领域占据主导优势),具备强大的多模态理解能力和结构化输出能力
- 选择Gemini作为备选的原因: MMMU-Pro得分80.5%,擅长解读复杂视觉内容
- 工具权限: 图像读取、截图捕获、文件读取。无文件编辑权限——仅汇报视觉发现
- 输出: 包含具体观察结果、问题和建议的视觉分析报告
6. Adversarial Reviewer
6. 对抗性评审Agent
- Role: Find flaws, security vulnerabilities, edge cases, logical errors, incorrect assumptions, race conditions, and performance problems. Challenge every decision. Assume the code is broken until proven otherwise
- Primary model: Grok 4
- Fallback: Gemini 3.1 Pro, Claude Opus 4.6
- Why Grok 4: #4 Arena overall with a direct, contrarian communication style. Using a fundamentally different model family than the coder ensures genuine adversarial perspective, not self-congratulatory review
- Why a different model family matters: Models from the same family share similar blind spots. Cross-family review catches what same-family review misses
- Tools: Read-only. The adversarial reviewer never edits — it produces a list of issues ranked by severity
- Output: Issues list with severity (critical/high/medium/low), reproduction steps, and suggested fixes
- 职责: 查找缺陷、安全漏洞、边缘情况、逻辑错误、错误假设、竞争条件和性能问题。质疑每一个决策。在被证明正确之前,默认代码存在问题
- 主模型: Grok 4
- 备选模型: Gemini 3.1 Pro、Claude Opus 4.6
- 选择Grok 4的原因: Arena总排名第4,沟通风格直接且逆向。与编码Agent使用完全不同的模型家族,确保真正的对抗性视角,而非自我吹捧式评审
- 选择不同模型家族的重要性: 同一模型家族的模型存在相似的盲点。跨家族评审能发现同家族评审遗漏的问题
- 工具权限: 只读权限。对抗性评审Agent从不编辑文件——仅生成按严重性排序的问题列表
- 输出: 包含严重性(关键/高/中/低)、复现步骤和建议修复方案的问题列表
7. Peer Reviewer
7. 同行评审Agent
- Role: Quality assessment, architecture review, style consistency, best practices, documentation review, maintainability analysis
- Primary model: Claude Opus 4.6
- Fallback: GPT-4o
- Why Opus: Excels at structured, thorough analysis. Balances pragmatism with quality standards
- Why GPT-4o as fallback: Shows least positivity bias in peer review (per AI Scientist research, Sakana AI). Honest without being hostile
- Tools: Read-only. Produces a review with an explicit verdict: approve, request changes, or reject
- Output: Structured review with verdict, praise for good decisions, and specific change requests
- 职责: 质量评估、架构评审、风格一致性检查、最佳实践验证、文档评审、可维护性分析
- 主模型: Claude Opus 4.6
- 备选模型: GPT-4o
- 选择Opus的原因: 擅长结构化、全面的分析,在务实性和质量标准之间取得平衡
- 选择GPT-4o作为备选的原因: 在同行评审中表现出最低的积极偏见(根据AI Scientist研究,Sakana AI),诚实且不带有敌意
- 工具权限: 只读权限。生成带有明确结论的评审:批准、要求修改或拒绝
- 输出: 包含结论、对正确决策的认可以及具体修改要求的结构化评审报告
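Taken together, the seven role definitions reduce to a small routing table. A minimal Python sketch of that table and a fallback-aware model picker (the function name and the availability check are illustrative, not part of the skill):

```python
# Role → model preference lists, mirroring the seven agent definitions above.
# Order matters: index 0 is the primary model, the rest are fallbacks.
AGENT_MODELS = {
    "planner":              ["claude-opus-4-6", "claude-opus-4-5", "gpt-5.4"],
    "coder":                ["gpt-5.4", "claude-sonnet-4-5", "claude-sonnet-4-6"],
    "researcher":           ["gemini-3.1-pro", "claude-opus-4-6"],
    "scientist":            ["gemini-3-pro", "gpt-5.2", "claude-opus-4-6"],
    "visual_analyst":       ["claude-opus-4-6", "gemini-3.1-pro"],
    "adversarial_reviewer": ["grok-4", "gemini-3.1-pro", "claude-opus-4-6"],
    "peer_reviewer":        ["claude-opus-4-6", "gpt-4o"],
}

def pick_model(role: str, available: set[str]) -> str:
    """Return the first model for `role` that is actually reachable."""
    for model in AGENT_MODELS[role]:
        if model in available:
            return model
    raise LookupError(f"no configured model available for role {role!r}")
```

When a primary provider is down or rate-limited, the planner reassigns the role to the next entry rather than stalling the whole loop.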
How It Works
工作原理
This is a manual dispatch workflow — you (or your primary agent session) are the dispatcher. The agents do not self-orchestrate. You follow the orchestration loop below, invoking each agent as needed and passing context between them using the handoff protocol. The skill provides the workflow patterns, agent definitions, and handoff formats. You provide the judgment calls.
这是一个手动调度工作流——你(或你的主Agent会话)是调度者。Agent不会自我编排。你需要遵循以下编排循环,根据需要调用每个Agent,并使用交接协议在Agent之间传递上下文。本技能提供工作流模式、Agent定义和交接格式,你负责做出判断。
The Orchestration Loop
编排循环
Every complex task follows this loop. The planner is always the entry and exit point.
┌──────────────────────────────────────────────┐
│                   PLANNER                    │
│          Claude Opus 4.6 (thinking)          │
│                                              │
│  1. Analyze the full task and constraints    │
│  2. Break into concrete subtasks             │
│  3. Assign each subtask to an agent role     │
│  4. Define success criteria per subtask      │
│  5. Specify execution order and dependencies │
│  6. Identify which subtasks can run parallel │
└──────────────────┬───────────────────────────┘
                   │
                   ▼
┌──────────────────────────────────────────────┐
│               EXECUTION PHASE                │
│       (parallel where no dependencies)       │
│                                              │
│  ┌──────────┐  ┌───────────┐  ┌──────────┐   │
│  │  Coder   │  │Researcher │  │Scientist │   │
│  │ GPT-5.4  │  │Gemini 3.1 │  │Gemini 3  │   │
│  └────┬─────┘  └─────┬─────┘  └────┬─────┘   │
│       │              │             │         │
│       ▼              ▼             ▼         │
│  ┌────────────────────────────────────────┐  │
│  │          Results + Artifacts           │  │
│  └────────────────────────────────────────┘  │
└──────────────────┬───────────────────────────┘
                   │
                   ▼
┌──────────────────────────────────────────────┐
│                 REVIEW PHASE                 │
│ (both reviewers in parallel for Patterns A-D)│
│                                              │
│   ┌───────────────┐    ┌───────────────┐     │
│   │  Adversarial  │    │     Peer      │     │
│   │    Grok 4     │    │  Claude Opus  │     │
│   └───────┬───────┘    └───────┬───────┘     │
│           │                    │             │
│           ▼                    ▼             │
│  ┌────────────────────────────────────────┐  │
│  │     Review Verdicts + Issue Lists      │  │
│  └────────────────────────────────────────┘  │
└──────────────────┬───────────────────────────┘
                   │
                   ▼
┌──────────────────────────────────────────────┐
│              PLANNER RE-ENTERS               │
│                                              │
│  Evaluates all review feedback:              │
│                                              │
│  • All clear → Accept and complete           │
│  • Minor issues → Send back to coder         │
│  • Major issues → Replan from scratch        │
│  • Reviewers disagree → Planner adjudicates  │
│  • New information → Update plan, re-execute │
└──────────────────────────────────────────────┘

每个复杂任务都遵循此循环。规划Agent始终是入口和出口点。
┌──────────────────────────────────────────────┐
│                  规划Agent                   │
│         Claude Opus 4.6(思考模式)          │
│                                              │
│  1. 分析完整任务和约束条件                   │
│  2. 分解为具体子任务                         │
│  3. 为每个子任务分配Agent角色                │
│  4. 定义每个子任务的成功标准                 │
│  5. 指定执行顺序和依赖关系                   │
│  6. 识别可并行执行的子任务                   │
└──────────────────┬───────────────────────────┘
                   │
                   ▼
┌──────────────────────────────────────────────┐
│                   执行阶段                   │
│        (无依赖关系的任务可并行执行)        │
│                                              │
│  ┌──────────┐  ┌───────────┐  ┌──────────┐   │
│  │ 编码Agent│  │ 研究Agent │  │科研Agent │   │
│  │ GPT-5.4  │  │Gemini 3.1 │  │Gemini 3  │   │
│  └────┬─────┘  └─────┬─────┘  └────┬─────┘   │
│       │              │             │         │
│       ▼              ▼             ▼         │
│  ┌────────────────────────────────────────┐  │
│  │              结果 + 工件               │  │
│  └────────────────────────────────────────┘  │
└──────────────────┬───────────────────────────┘
                   │
                   ▼
┌──────────────────────────────────────────────┐
│                   评审阶段                   │
│     (模式A-D中两个评审Agent可并行执行)     │
│                                              │
│   ┌───────────────┐    ┌───────────────┐     │
│   │对抗性评审Agent│    │ 同行评审Agent │     │
│   │    Grok 4     │    │  Claude Opus  │     │
│   └───────┬───────┘    └───────┬───────┘     │
│           │                    │             │
│           ▼                    ▼             │
│  ┌────────────────────────────────────────┐  │
│  │          评审结论 + 问题列表           │  │
│  └────────────────────────────────────────┘  │
└──────────────────┬───────────────────────────┘
                   │
                   ▼
┌──────────────────────────────────────────────┐
│              规划Agent重新介入               │
│                                              │
│  评估所有评审反馈:                          │
│                                              │
│  • 无问题 → 接受并完成任务                   │
│  • 轻微问题 → 返回给编码Agent修复            │
│  • 重大问题 → 从头重新规划                   │
│  • 评审意见不一致 → 规划Agent裁决            │
│  • 出现新信息 → 更新规划,重新执行           │
└──────────────────────────────────────────────┘

Maximum Iterations
最大迭代次数
To prevent infinite loops, enforce these limits:
- Code → Review cycles: Maximum 3 iterations. If the coder hasn't satisfied reviewers after 3 rounds, the planner must simplify the approach or escalate to the user
- Full replan: Maximum 2 replans per task. After 2, the planner presents what it has with known issues documented
- Individual agent timeout: If any agent hasn't produced useful output after a reasonable effort, the planner reassigns to fallback model or simplifies the subtask
为防止无限循环,强制执行以下限制:
- 编码→评审循环: 最多3次迭代。如果编码Agent在3轮后仍未满足评审要求,规划Agent必须简化方案或向用户升级问题
- 完整重新规划: 每个任务最多重新规划2次。2次后,规划Agent需提交现有成果并记录已知问题
- 单个Agent超时: 如果任何Agent在合理时间内未产生有用输出,规划Agent需重新分配给备选模型或简化子任务
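These caps can be enforced mechanically. A sketch of the loop with both limits wired in; the callables and the verdict shape are placeholders for however you actually invoke agents, not an API the skill ships:

```python
MAX_REVIEW_CYCLES = 3  # code → review iterations before giving up on fixes
MAX_REPLANS = 2        # full replans before presenting with known issues

def orchestrate(task, plan_fn, execute_fn, review_fn, escalate_fn):
    """Planner-driven loop with hard iteration caps (illustrative)."""
    result = None
    for _replan in range(MAX_REPLANS + 1):   # initial plan + up to 2 replans
        plan = plan_fn(task)
        result = execute_fn(plan, fixes=None)
        for _cycle in range(MAX_REVIEW_CYCLES):
            verdicts = review_fn(result)     # adversarial + peer in parallel
            if all(v["decision"] == "approve" for v in verdicts):
                return result                # all clear → accept and complete
            if any(v["decision"] == "reject" for v in verdicts):
                break                        # major issues → replan from scratch
            fixes = [i for v in verdicts for i in v.get("issues", [])]
            result = execute_fn(plan, fixes=fixes)  # minor issues → back to coder
        else:
            break  # 3 cycles without approval: stop iterating and escalate
    return escalate_fn(result)  # present what exists, known issues documented
```

The `for/else` makes the two exit paths explicit: a `reject` triggers a replan, while three unapproved fix cycles end the task with an escalation instead of looping forever.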
Handoff Protocol
交接协议
When agents pass work to each other, the handoff must be structured. The planner constructs each handoff — agents don't communicate directly.
Pragmatic note: The YAML formats below are aspirational templates, not strict contracts. Real models will not always output perfect YAML. The planner should extract the relevant information from whatever format the agent produces — structured YAML, markdown, or free text. What matters is that the information flows correctly between phases, not that the formatting is exact. If an agent returns free text instead of YAML, the planner should extract the key fields (status, summary, files changed, issues found) and construct the next handoff manually.
当Agent之间传递工作时,交接必须结构化。规划Agent负责构建每个交接——Agent之间不直接通信。
实用提示: 以下YAML格式是理想模板,而非严格契约。实际模型并不总是输出完美的YAML。规划Agent应从Agent生成的任何格式(结构化YAML、Markdown或自由文本)中提取相关信息。重要的是信息在各阶段之间正确流动,而非格式完全准确。如果Agent返回自由文本而非YAML,规划Agent应提取关键字段(状态、总结、修改的文件、发现的问题)并手动构建下一个交接。
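One way to make that tolerance concrete: scan whatever the agent returned for `key: value` lines and keep only the fields the planner cares about. A sketch (the field names follow the handoff formats in this section; `extract_fields` itself is illustrative, not part of the skill):

```python
import re

KEY_FIELDS = {"status", "summary", "files_changed", "issues", "decision"}

def extract_fields(agent_output: str) -> dict:
    """Best-effort extraction of handoff fields from YAML-ish or free text.

    Matches simple `key: value` lines, case-insensitively, and drops inline
    comments. Fields that do not appear stay absent, signalling the planner
    to read the raw output itself.
    """
    fields = {}
    for line in agent_output.splitlines():
        m = re.match(r"^\s*(\w+)\s*:\s*(.+?)\s*$", line)
        if m and m.group(1).lower() in KEY_FIELDS:
            value = m.group(2).split("#", 1)[0].strip().strip("\"'")
            if value:
                fields[m.group(1).lower()] = value
    return fields
```

The same function works on a well-formed YAML result block and on prose like "Status: complete", which is exactly the degradation the note above anticipates.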
Planner → Execution Agent
规划Agent → 执行Agent
```yaml
handoff:
  to: coder  # agent role
  task_id: 2
  description: "Implement OAuth2 PKCE flow"
  context: |
    The codebase uses JWT tokens stored in httpOnly cookies.
    Middleware at /api/auth/middleware.ts validates tokens on every request.
    The researcher found 3 existing auth patterns (see research summary below).
    Extend the JWT pattern — do not replace it.
  dependencies_resolved:
    - task_id: 1
      agent: researcher
      summary: "Found JWT, session, and API key auth patterns. JWT is most recent."
      key_files:
        - /api/auth/jwt.ts (lines 1-45)
        - /api/auth/middleware.ts (lines 12-30)
  constraints:
    - "Must extend existing JWT pattern, not replace it"
    - "Must be backward-compatible with existing middleware"
    - "Must include tests"
  success_criteria:
    - "OAuth2 PKCE flow works end-to-end"
    - "Existing auth tests still pass"
    - "New tests cover PKCE-specific scenarios"
```

```yaml
handoff:
  to: coder  # Agent角色
  task_id: 2
  description: "实现OAuth2 PKCE流程"
  context: |
    代码库使用存储在httpOnly Cookie中的JWT令牌。
    /api/auth/middleware.ts中的中间件在每个请求上验证令牌。
    研究Agent发现了3种现有认证模式(见下方研究总结)。
    扩展JWT模式——不要替换它。
  dependencies_resolved:
    - task_id: 1
      agent: researcher
      summary: "发现JWT、会话和API密钥认证模式。JWT是最新的。"
      key_files:
        - /api/auth/jwt.ts(第1-45行)
        - /api/auth/middleware.ts(第12-30行)
  constraints:
    - "必须扩展现有JWT模式,不得替换"
    - "必须与现有中间件向后兼容"
    - "必须包含测试"
  success_criteria:
    - "OAuth2 PKCE流程端到端可用"
    - "现有认证测试仍通过"
    - "新测试覆盖PKCE特定场景"
```

Execution Agent → Planner (Result)
执行Agent → 规划Agent(结果)
```yaml
result:
  from: coder
  task_id: 2
  status: complete  # complete | partial | failed | blocked
  summary: "Implemented PKCE flow in 3 files, added 8 tests"
  artifacts:
    files_changed:
      - /api/auth/pkce.ts (new, 120 lines)
      - /api/auth/middleware.ts (modified, added PKCE validation)
      - /api/auth/__tests__/pkce.test.ts (new, 85 lines)
    test_results: "8 passed, 0 failed"
  notes: |
    Used crypto.subtle for code verifier generation (Web Crypto API).
    The middleware change is backward-compatible — existing JWT auth still works.
  concerns:
    - "Code verifier storage uses session — may need Redis for horizontal scaling"
```

```yaml
result:
  from: coder
  task_id: 2
  status: complete  # complete | partial | failed | blocked
  summary: "在3个文件中实现了PKCE流程,添加了8个测试"
  artifacts:
    files_changed:
      - /api/auth/pkce.ts(新增,120行)
      - /api/auth/middleware.ts(修改,添加了PKCE验证)
      - /api/auth/__tests__/pkce.test.ts(新增,85行)
    test_results: "8个通过,0个失败"
  notes: |
    使用crypto.subtle生成代码验证器(Web Crypto API)。
    中间件修改向后兼容——现有JWT认证仍可正常工作。
  concerns:
    - "代码验证器存储使用会话——水平扩展可能需要Redis"
```

Planner → Review Agent
规划Agent → 评审Agent
```yaml
handoff:
  to: adversarial_reviewer
  task_id: 4
  description: "Security audit of OAuth2 PKCE implementation"
  context: |
    The coder implemented a PKCE flow. Review for security vulnerabilities,
    edge cases, and correctness. Be especially critical of:
    - Cryptographic operations (code verifier, code challenge)
    - Token storage and transmission
    - CSRF and replay attack vectors
    - Error handling in auth flows
  artifacts_to_review:
    - /api/auth/pkce.ts
    - /api/auth/middleware.ts
    - /api/auth/__tests__/pkce.test.ts
  implementation_summary: |
    Uses crypto.subtle for code verifier. Session-based storage.
    Middleware validates PKCE alongside existing JWT.
```

```yaml
handoff:
  to: adversarial_reviewer
  task_id: 4
  description: "OAuth2 PKCE实现的安全审计"
  context: |
    编码Agent实现了PKCE流程。评审其安全性漏洞、
    边缘情况和正确性。特别关注:
    - 加密操作(代码验证器、代码挑战)
    - 令牌存储和传输
    - CSRF和重放攻击向量
    - 认证流程中的错误处理
  artifacts_to_review:
    - /api/auth/pkce.ts
    - /api/auth/middleware.ts
    - /api/auth/__tests__/pkce.test.ts
  implementation_summary: |
    使用crypto.subtle生成代码验证器。基于会话的存储。
    中间件在现有JWT验证之外添加了PKCE验证。
```

Review Agent → Planner (Verdict)
评审Agent → 规划Agent(结论)
```yaml
verdict:
  from: adversarial_reviewer
  task_id: 4
  decision: request_changes  # approve | request_changes | reject
  critical_issues:
    - severity: high
      location: /api/auth/pkce.ts:45
      issue: "Code verifier stored in plaintext session — if session is compromised, PKCE is defeated"
      suggestion: "Hash the verifier before storage, compare hashes on validation"
    - severity: medium
      location: /api/auth/pkce.ts:78
      issue: "No expiration on code challenge — replay attack window is unlimited"
      suggestion: "Add 10-minute TTL on challenge, clean up expired entries"
  minor_issues:
    - severity: low
      location: /api/auth/__tests__/pkce.test.ts
      issue: "No test for expired challenge scenario"
  positive_observations:
    - "Good use of crypto.subtle over Math.random for verifier generation"
    - "Backward compatibility with existing JWT flow is well-handled"
```

```yaml
verdict:
  from: adversarial_reviewer
  task_id: 4
  decision: request_changes  # approve | request_changes | reject
  critical_issues:
    - severity: high
      location: /api/auth/pkce.ts:45
      issue: "代码验证器以明文形式存储在会话中——如果会话被泄露,PKCE将失效"
      suggestion: "存储前对验证器进行哈希,验证时比较哈希值"
    - severity: medium
      location: /api/auth/pkce.ts:78
      issue: "代码挑战没有过期时间——重放攻击窗口无限"
      suggestion: "为挑战添加10分钟TTL,清理过期条目"
  minor_issues:
    - severity: low
      location: /api/auth/__tests__/pkce.test.ts
      issue: "没有针对过期挑战场景的测试"
  positive_observations:
    - "使用crypto.subtle而非Math.random生成验证器,做法良好"
    - "与现有JWT流程的向后兼容处理得当"
```

Workflow Patterns
工作流模式
Pattern A: Plan → Code → Review (Default)
模式A:规划→编码→评审(默认)
The bread-and-butter for most development tasks.
Planner → Coder → [Adversarial + Peer Review] → Planner

Use when: Adding features, fixing bugs, refactoring code. Most tasks start here.
Planner behavior: Produces a single plan with clear subtasks. After review, decides whether to accept, revise, or restart.
大多数开发任务的标准流程。
规划Agent → 编码Agent → [对抗性评审 + 同行评审] → 规划Agent

适用场景: 添加功能、修复Bug、重构代码。大多数任务从这里开始。
规划Agent行为: 生成带有明确子任务的单一规划。评审后,决定是否接受、修改或重新开始。
Pattern B: Research → Plan → Code → Review
模式B:研究→规划→编码→评审
When the task requires understanding before implementation.
Planner → Researcher → Planner (replan) → Coder → [Review] → Planner

Use when: Working with unfamiliar APIs, choosing between architectural approaches, integration tasks, anything where you need information before you can plan.
Planner behavior: First plan is "research phase only." After research completes, planner creates a new, informed implementation plan.
任务需要先理解再实现的场景。
规划Agent → 研究Agent → 规划Agent(重新规划)→ 编码Agent → [评审] → 规划Agent

适用场景: 处理不熟悉的API、选择架构方案、集成任务、任何需要先获取信息才能规划的场景。
规划Agent行为: 初始规划仅为「研究阶段」。研究完成后,规划Agent创建新的、基于研究结果的实现规划。
Pattern C: Deep Analysis
模式C:深度分析
For math-heavy, scientific, or visual reasoning tasks.
Planner → [Scientist + Visual Analyst + Researcher] → Planner → Coder → [Review] → Planner

Use when: Data pipelines, ML models, algorithm implementation, visual regression testing, anything requiring formal correctness.
Planner behavior: Gathers analysis from multiple specialist agents before creating the implementation plan. The scientist's output directly constrains what the coder can do.
适用于数学密集型、科学或视觉推理任务。
规划Agent → [科研Agent + 视觉分析Agent + 研究Agent] → 规划Agent → 编码Agent → [评审] → 规划Agent

适用场景: 数据管道、ML模型、算法实现、视觉回归测试、任何需要形式化正确性的任务。
规划Agent行为: 在创建实现规划前,收集多个专业Agent的分析结果。科研Agent的输出直接约束编码Agent的工作范围。
Pattern D: Full Pipeline
模式D:完整流水线
The complete workflow for large, complex tasks.
Planner → Researcher → Planner (replan) → [Coder + Scientist] → Visual Analyst → [Adversarial + Peer Review] → Planner

Use when: Major features, system design, architecture changes, anything high-stakes.
Planner behavior: Multiple replan cycles. Visual analyst checks UI after implementation. Full review before acceptance.
大型复杂任务的完整工作流。
规划Agent → 研究Agent → 规划Agent(重新规划)→ [编码Agent + 科研Agent] → 视觉分析Agent → [对抗性评审 + 同行评审] → 规划Agent

适用场景: 主要功能、系统设计、架构变更、任何高风险任务。
规划Agent行为: 多次重新规划循环。视觉分析Agent在实现后检查UI。接受前进行全面评审。
Pattern E: Rapid Iteration
模式E:快速迭代
For quick fixes where full review would be overkill.
Planner → Coder → Adversarial Reviewer → Planner

Use when: Small bug fixes, minor refactors, documentation updates. Skip the peer reviewer — the adversarial pass catches security and correctness issues, which is enough for small changes.
Planner behavior: Lightweight plan, single review pass, fast completion.
适用于快速修复,全面评审过于繁琐的场景。
规划Agent → 编码Agent → 对抗性评审Agent → 规划Agent

适用场景: 小型Bug修复、轻微重构、文档更新。跳过同行评审——对抗性评审可发现安全和正确性问题,这对小型变更已足够。
规划Agent行为: 轻量级规划、单次评审、快速完成。
Pattern F: Research-Only
模式F:仅研究
When you need information, not implementation.
Planner → [Researcher + Scientist] → Planner → Summary

Use when: Technical investigations, feasibility studies, competitive analysis, decision support.
Planner behavior: Synthesizes research and analysis into a decision-ready summary. No code is written.
仅需要信息,不需要实现的场景。
规划Agent → [研究Agent + 科研Agent] → 规划Agent → 总结

适用场景: 技术调查、可行性研究、竞品分析、决策支持。
规划Agent行为: 将研究和分析结果合成为可用于决策的总结。不编写代码。
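The "use when" guidance for patterns A-F can be read as a decision ladder. A toy Python rendering of that ladder (the boolean task traits are illustrative simplifications, not part of the skill):

```python
def choose_pattern(needs_code: bool, needs_research: bool,
                   needs_analysis: bool, high_stakes: bool,
                   small_change: bool) -> str:
    """Decision ladder over workflow patterns A-F (toy rendering)."""
    if not needs_code:
        return "F"   # research-only: information, not implementation
    if small_change:
        return "E"   # rapid iteration: single adversarial pass
    if high_stakes:
        return "D"   # full pipeline for major, high-stakes work
    if needs_analysis:
        return "C"   # deep analysis feeds the implementation plan
    if needs_research:
        return "B"   # research first, then an informed plan
    return "A"       # default plan → code → review
```

The ordering matters: the cheap exits (F, E) come first, and A is the fall-through default, mirroring "most tasks start here."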
Planner Output Format
规划Agent输出格式
The planner produces a structured plan that other agents can follow. Use this format:
```yaml
plan:
  task: "Description of the overall task"
  pattern: A  # Which workflow pattern (A-F)
  subtasks:
    - id: 1
      description: "Research existing auth patterns in the codebase"
      agent: researcher
      depends_on: []
      success_criteria: "Summary of auth patterns with file locations and recommendations"
    - id: 2
      description: "Implement OAuth2 PKCE flow extending existing JWT auth"
      agent: coder
      depends_on: [1]
      success_criteria: "Working OAuth2 PKCE flow with tests passing, backward-compatible"
    - id: 3
      description: "Verify cryptographic correctness of PKCE implementation"
      agent: scientist
      depends_on: [2]
      success_criteria: "Formal verification that entropy, hashing, and timing are correct"
    - id: 4
      description: "Security audit — find vulnerabilities and edge cases"
      agent: adversarial_reviewer
      depends_on: [2]
      success_criteria: "Security audit with no unaddressed critical or high issues"
    - id: 5
      description: "Architecture and quality review"
      agent: peer_reviewer
      depends_on: [2]
      success_criteria: "Approved or specific changes requested"
  execution_order:
    - phase: 1
      parallel: [1]
    - phase: 2
      parallel: [2]
    - phase: 3
      parallel: [3, 4, 5]
  notes: |
    Subtasks 3, 4, 5 can run in parallel since they all review the same output.
    If review finds critical issues, we loop back to subtask 2 with fixes.
```

规划Agent生成其他Agent可遵循的结构化规划。使用以下格式:

```yaml
plan:
  task: "整体任务描述"
  pattern: A  # 工作流模式(A-F)
  subtasks:
    - id: 1
      description: "研究代码库中的现有认证模式"
      agent: researcher
      depends_on: []
      success_criteria: "包含文件位置和建议的认证模式总结"
    - id: 2
      description: "扩展现有JWT认证,实现OAuth2 PKCE流程"
      agent: coder
      depends_on: [1]
      success_criteria: "可运行的OAuth2 PKCE流程,测试通过,向后兼容"
    - id: 3
      description: "验证PKCE实现的加密正确性"
      agent: scientist
      depends_on: [2]
      success_criteria: "对熵、哈希和计时的形式化验证正确"
    - id: 4
      description: "安全审计——查找漏洞和边缘情况"
      agent: adversarial_reviewer
      depends_on: [2]
      success_criteria: "安全审计无未解决的关键或高风险问题"
    - id: 5
      description: "架构和质量评审"
      agent: peer_reviewer
      depends_on: [2]
      success_criteria: "批准或提出具体修改要求"
  execution_order:
    - phase: 1
      parallel: [1]
    - phase: 2
      parallel: [2]
    - phase: 3
      parallel: [3, 4, 5]
  notes: |
    子任务3、4、5可并行执行,因为它们都评审相同的输出。
    如果评审发现关键问题,我们将循环回到子任务2进行修复。
```

Setup by Tool
按工具设置
OpenCode (Recommended for Multi-Provider)
OpenCode(推荐用于多提供商场景)
OpenCode natively supports 75+ LLM providers with per-agent model overrides. No gateway needed — OpenCode IS the gateway.
OpenCode原生支持75+ LLM提供商,并允许按Agent覆盖模型。无需网关——OpenCode本身就是网关。
API Keys
API密钥
Set provider API keys as environment variables. You only need keys for the providers you plan to use — pick the direct providers OR cloud providers (Bedrock/Azure), or mix and match:
```bash
# --- Direct providers ---
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."
export GOOGLE_API_KEY="..."    # or GOOGLE_GENERATIVE_AI_API_KEY
export XAI_API_KEY="..."

# --- Amazon Bedrock (alternative for Claude + other models) ---
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_REGION="us-east-1"  # or us-west-2, eu-west-1, etc.

# --- Microsoft Azure OpenAI (alternative for GPT models) ---
export AZURE_API_KEY="..."
export AZURE_RESOURCE_NAME="your-resource"
export AZURE_DEPLOYMENT_NAME="your-deployment"
export AZURE_API_VERSION="2024-12-01-preview"

# --- Google Vertex AI (alternative for Gemini models) ---
export GOOGLE_CLOUD_PROJECT="your-project"
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
export GOOGLE_GENAI_USE_VERTEXAI=true
```

将提供商API密钥设置为环境变量。你只需要为计划使用的提供商设置密钥——选择直接提供商或云提供商(Bedrock/Azure),或混合使用:

```bash
# --- 直接提供商 ---
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."
export GOOGLE_API_KEY="..."    # 或 GOOGLE_GENERATIVE_AI_API_KEY
export XAI_API_KEY="..."

# --- Amazon Bedrock(Claude及其他模型的替代方案) ---
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_REGION="us-east-1"  # 或 us-west-2、eu-west-1等

# --- Microsoft Azure OpenAI(GPT模型的替代方案) ---
export AZURE_API_KEY="..."
export AZURE_RESOURCE_NAME="your-resource"
export AZURE_DEPLOYMENT_NAME="your-deployment"
export AZURE_API_VERSION="2024-12-01-preview"

# --- Google Vertex AI(Gemini模型的替代方案) ---
export GOOGLE_CLOUD_PROJECT="your-project"
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
export GOOGLE_GENAI_USE_VERTEXAI=true
```

Provider Configuration (opencode.json)
提供商配置(opencode.json)
Configure the providers you use. You don't need all of them — pick what matches your infrastructure:
```json
{
  "provider": {
    "anthropic": {
      "api_key": "{env:ANTHROPIC_API_KEY}"
    },
    "openai": {
      "api_key": "{env:OPENAI_API_KEY}"
    },
    "google": {
      "api_key": "{env:GOOGLE_API_KEY}"
    },
    "xai": {
      "api_key": "{env:XAI_API_KEY}"
    },
    "bedrock": {
      "aws_access_key_id": "{env:AWS_ACCESS_KEY_ID}",
      "aws_secret_access_key": "{env:AWS_SECRET_ACCESS_KEY}",
      "aws_region": "{env:AWS_REGION}"
    },
    "azure": {
      "api_key": "{env:AZURE_API_KEY}",
      "resource_name": "{env:AZURE_RESOURCE_NAME}"
    },
    "vertex": {
      "project": "{env:GOOGLE_CLOUD_PROJECT}"
    }
  }
}
```

配置你使用的提供商。不需要全部配置——选择与你的基础设施匹配的提供商:

```json
{
  "provider": {
    "anthropic": {
      "api_key": "{env:ANTHROPIC_API_KEY}"
    },
    "openai": {
      "api_key": "{env:OPENAI_API_KEY}"
    },
    "google": {
      "api_key": "{env:GOOGLE_API_KEY}"
    },
    "xai": {
      "api_key": "{env:XAI_API_KEY}"
    },
    "bedrock": {
      "aws_access_key_id": "{env:AWS_ACCESS_KEY_ID}",
      "aws_secret_access_key": "{env:AWS_SECRET_ACCESS_KEY}",
      "aws_region": "{env:AWS_REGION}"
    },
    "azure": {
      "api_key": "{env:AZURE_API_KEY}",
      "resource_name": "{env:AZURE_RESOURCE_NAME}"
    },
    "vertex": {
      "project": "{env:GOOGLE_CLOUD_PROJECT}"
    }
  }
}
```

Agent Definitions
Agent定义
Agent definitions are generated from canonical templates using the setup script. Run from the skill directory:
```bash
# Project-level (recommended)
sh agents/setup.sh opencode

# Or specify a custom target directory
sh agents/setup.sh opencode ~/.config/opencode/agents
```

Agent定义通过设置脚本从标准模板生成。从技能目录运行:

```bash
# 项目级(推荐)
sh agents/setup.sh opencode

# 或指定自定义目标目录
sh agents/setup.sh opencode ~/.config/opencode/agents
```
This generates 7 agent `.md` files with the correct OpenCode frontmatter (`model: provider/id`, `permission: edit: deny`) in `.opencode/agents/`.
Each agent uses a different provider/model. OpenCode routes to the correct provider automatically based on the model prefix.
**Default assignments (direct providers):**
| Agent | Model ID | Provider |
|-------|---------|----------|
| Planner | `anthropic/claude-opus-4-6` | Anthropic |
| Coder | `openai/gpt-5.4` | OpenAI |
| Researcher | `google/gemini-3.1-pro` | Google |
| Scientist | `google/gemini-3-pro` | Google |
| Visual Analyst | `anthropic/claude-opus-4-6` | Anthropic |
| Adversarial Reviewer | `xai/grok-4` | xAI |
| Peer Reviewer | `anthropic/claude-opus-4-6` | Anthropic |
**Amazon Bedrock alternatives** — swap the model ID in the agent `.md` file to route through Bedrock instead:
| Agent | Bedrock Model ID |
|-------|-----------------|
| Planner | `bedrock/anthropic.claude-opus-4-6-v1` |
| Coder | `bedrock/anthropic.claude-sonnet-4-5-v1` |
| Researcher | `bedrock/amazon.nova-pro-v1` |
| Scientist | `bedrock/anthropic.claude-opus-4-6-v1` |
| Visual Analyst | `bedrock/anthropic.claude-opus-4-6-v1` |
| Adversarial Reviewer | `bedrock/amazon.nova-pro-v1` |
| Peer Reviewer | `bedrock/anthropic.claude-opus-4-6-v1` |
Note: Bedrock gives you Claude models without a separate Anthropic API key (billed through AWS). Gemini and Grok are not available on Bedrock — use Amazon Nova or Claude as alternatives, or mix Bedrock with direct providers.
**Microsoft Azure OpenAI alternatives** — for organizations on Azure:
| Agent | Azure Model ID |
|-------|---------------|
| Planner | `azure/claude-opus-4-6` |
| Coder | `azure/gpt-5.4` |
| Researcher | `azure/gpt-5.4` |
| Scientist | `azure/gpt-5.4` |
| Visual Analyst | `azure/claude-opus-4-6` |
| Adversarial Reviewer | `azure/gpt-4o` |
| Peer Reviewer | `azure/claude-opus-4-6` |
Note: Azure OpenAI model IDs depend on your deployment names. The IDs above assume deployments matching the model names. Azure gives you GPT and Claude models (via Azure AI Foundry) billed through your Azure subscription. Gemini and Grok are not available on Azure — use GPT-4o or Claude as alternatives, or mix Azure with direct providers.
**Google Vertex AI alternatives** — for organizations on GCP:
| Agent | Vertex AI Model ID |
|-------|-------------------|
| Researcher | `vertex/gemini-3.1-pro` |
| Scientist | `vertex/gemini-3-pro` |
| Visual Analyst | `vertex/gemini-3.1-pro` |
Note: Vertex AI gives you Gemini models billed through GCP. Claude is also available via Vertex AI Model Garden.
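Since provider switching is just a frontmatter edit, it can be scripted. A hedged sketch that rewrites the `model:` line of a generated agent file (the helper name and file paths are examples, not part of the setup script; the frontmatter layout follows the `model: provider/id` convention above):

```python
import re
from pathlib import Path

def set_agent_model(agent_md: Path, new_model: str) -> None:
    """Point an agent's `model: provider/id` frontmatter line at new_model,
    e.g. anthropic/claude-opus-4-6 → bedrock/anthropic.claude-opus-4-6-v1."""
    text = agent_md.read_text(encoding="utf-8")
    updated, n = re.subn(r"(?m)^model:\s*\S+$", f"model: {new_model}",
                         text, count=1)
    if n != 1:
        raise ValueError(f"no model line found in {agent_md}")
    agent_md.write_text(updated, encoding="utf-8")
```

Run it once per agent file you want to move onto Bedrock, Azure, or Vertex; the rest of the frontmatter (permissions, prompt) is left untouched.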
**Mixing providers is the recommended approach.** You don't have to pick one cloud — use Bedrock for Claude, Azure for GPT, direct for Gemini and Grok. Just change the model prefix in each agent's `.md` file.

Invoking Agents
In OpenCode, invoke agents by name. The orchestrating agent (you, the primary agent) follows the workflow patterns above:

```
@planner Break down this task: implement user authentication with OAuth2 PKCE
```

Then follow the plan, invoking each agent as directed:

```
@researcher Find existing auth patterns in this codebase
@coder Implement PKCE flow based on the research findings: [paste context]
@adversarial-reviewer Review this PKCE implementation for security issues: [paste context]
@peer-reviewer Review code quality and architecture: [paste context]
```

Claude Code
Claude Code has a mature sub-agent architecture but is limited to Anthropic models for sub-agents. Three approaches depending on whether you want cross-provider access:
Approach 1: Anthropic-Only (No Gateway)
All agents use Anthropic models. You lose cross-family adversarial review but gain simplicity.
| Agent | Claude Code Model | Notes |
|---|---|---|
| Planner | | Extended thinking, no file edits |
| Coder | | Fast, strong at code |
| Researcher | | Strong at synthesis, use with web tools |
| Scientist | | Reasonable math capability |
| Visual Analyst | | Best multimodal in Anthropic family |
| Adversarial Reviewer | | Different from opus but same family — weaker adversarial benefit |
| Peer Reviewer | | Structured analysis |
Limitation: Adversarial review from the same model family is less effective. The adversarial reviewer using Sonnet with a strong adversarial prompt partially compensates, but same-family blind spots persist.
Agent definition files: Generate from canonical templates using the setup script:

```bash
sh agents/setup.sh claude-code
```

This generates 7 agent `.md` files with Anthropic-specific frontmatter (`model: opus/sonnet`, `allowed-tools`, `effort`) in `.claude/agents/`.

Approach 2: With OpenRouter Gateway
Use an MCP server or script to call external models for specific agents. This gives you true cross-family adversarial review.

Step 1: Set up OpenRouter API key:

```bash
export OPENROUTER_API_KEY="sk-or-..."
```

Step 2: Create the gateway script directory and script:

```bash
mkdir -p .claude/scripts
cat > .claude/scripts/call-model.sh << 'SCRIPT'
#!/bin/bash
# Usage: call-model.sh <model> <prompt-file>
# Reads the prompt from a file to avoid shell argument length limits.
# If no file is given, reads from stdin; otherwise $2 is used as the prompt.
MODEL="$1"
PROMPT_FILE="$2"

if [ -z "$OPENROUTER_API_KEY" ]; then
  echo "Error: OPENROUTER_API_KEY not set" >&2
  exit 1
fi

if [ -n "$PROMPT_FILE" ] && [ -f "$PROMPT_FILE" ]; then
  PROMPT=$(cat "$PROMPT_FILE")
elif [ ! -t 0 ]; then
  PROMPT=$(cat)
else
  PROMPT="$2"
fi

RESPONSE=$(curl -s https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"$MODEL\", \"messages\": [{\"role\": \"user\", \"content\": $(echo "$PROMPT" | jq -Rs .)}]}")

ERROR=$(echo "$RESPONSE" | jq -r '.error.message // empty')
if [ -n "$ERROR" ]; then
  echo "API Error: $ERROR" >&2
  exit 1
fi

echo "$RESPONSE" | jq -r '.choices[0].message.content'
SCRIPT
chmod +x .claude/scripts/call-model.sh
```

**Step 3:** In Claude Code, the orchestrating agent can use this script for cross-provider calls:

```bash
# Call Grok for adversarial review (short prompt as argument)
bash .claude/scripts/call-model.sh "xai/grok-4" "Review this code for security issues: ..."

# Call Gemini for research (long prompt via stdin)
echo "Research the best approach for implementing OAuth2 PKCE..." | \
  bash .claude/scripts/call-model.sh "google/gemini-3.1-pro"
```

This hybrid approach uses Claude Code's native sub-agents for Anthropic models and the gateway script for other providers.
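The routing split can be made explicit with a tiny dispatcher; a sketch under this setup's assumptions (the prefix rule is illustrative, `call-model.sh` as created above):

```shell
#!/bin/bash
# route(): decide whether a model runs as a native Claude Code sub-agent
# or must go through the OpenRouter gateway script.
route() {
  case "$1" in
    anthropic/*) echo "native" ;;   # Claude Code spawns these directly
    *)           echo "gateway" ;;  # everything else via .claude/scripts/call-model.sh
  esac
}

route "anthropic/claude-opus-4-6"   # native
route "xai/grok-4"                  # gateway
```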
Approach 3: With Vercel AI Gateway
Same as OpenRouter but using Vercel's gateway endpoint:

```bash
#!/bin/bash
# .claude/scripts/call-model.sh (Vercel AI Gateway version)
MODEL="$1"
PROMPT="$2"

curl -s https://ai-gateway.vercel.sh/v1/chat/completions \
  -H "Authorization: Bearer $AI_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"$MODEL\", \"messages\": [{\"role\": \"user\", \"content\": $(echo "$PROMPT" | jq -Rs .)}]}" | jq -r '.choices[0].message.content'
```

**Vercel AI Gateway advantages:** Zero token markup, 40+ providers, OIDC auth for Vercel-deployed apps (no key management).

---

Cursor
Cursor does not support programmatic sub-agent spawning. Use sequential model switching:
- Select Claude Opus in model picker → Plan the task
- Switch to GPT-5.4 → Implement the plan
- Switch to Grok 4 or Gemini → Review the implementation
- Switch back to Claude Opus → Evaluate reviews and decide next steps
Cursor Rules (.cursor/rules/)
Create rules that guide each phase. Place in `.cursor/rules/agent-collaboration.mdc`:

```markdown
---
description: Multi-model agent collaboration workflow
globs: ["**/*"]
---

# Agent Collaboration Workflow

When working on complex tasks, follow this workflow:

## Planning Phase (use Claude Opus)
- Break the task into subtasks with clear success criteria
- Identify dependencies between subtasks
- Assign each subtask to an execution phase

## Implementation Phase (use GPT-5.4 or Claude Sonnet)
- Follow the plan exactly
- Write tests for new functionality
- Report what changed and why

## Review Phase (switch model for fresh perspective)
- Review for security vulnerabilities, edge cases, and logical errors
- Check architecture, style, and best practices
- Provide explicit verdict: approve or request changes

## Replan Phase (use Claude Opus)
- Evaluate review feedback
- Decide: accept, revise, or restart
- If revising, specify exact changes for the coder
```

---

Codex CLI
Codex CLI supports the Agent Skills standard and primarily uses OpenAI models. For multi-model:
- Codex handles implementation (GPT-5.4 natively)
- Use the `call-model.sh` gateway script for other models (same approach as Claude Code)

Place agent skills in `.codex/skills/` following the standard format.

Environment Setup
```bash
# Codex uses your ChatGPT account or API key
export OPENAI_API_KEY="sk-..."

# For cross-provider calls via gateway
export OPENROUTER_API_KEY="sk-or-..."
```

---

Gemini CLI
Gemini CLI is single-agent, Gemini-only. Use sequential mode:
- Use Gemini 3.1 Pro for planning and research (it's strong at both)
- Use gateway script for coding (call GPT-5.4 via OpenRouter)
- Use Gemini 3.1 Pro for review (strong at adversarial analysis)
GEMINI.md Integration
Add workflow instructions to your `GEMINI.md`:

```markdown
# Agent Collaboration

For complex tasks, follow this multi-phase workflow:

- PLAN: Break the task into subtasks with success criteria
- RESEARCH: Search the web and documentation for relevant context
- IMPLEMENT: Write code following the plan (call external model if needed)
- REVIEW: Critically review the implementation for flaws
- REPLAN: Evaluate and decide next steps
```

---

Aider
Aider has a built-in dual-model workflow that maps naturally to planner + coder:
`.aider.conf.yml`:

```yaml
# Architect model = Planner (proposes the approach)
model: anthropic/claude-opus-4-6

# Editor model = Coder (implements the changes)
editor-model: openai/gpt-5.4

# Weak model = Fast tasks (commit messages, summaries)
weak-model: google/gemini-3-flash

# Enable architect mode
edit-format: architect
```

**What you get:** Claude Opus plans the approach, GPT-5.4 implements the edits. This covers Pattern A (Plan → Code) natively.

**What you don't get:** Review phase. For review, run a separate aider session:

```bash
# Review session with a different model
aider --model xai/grok-4 --no-auto-commits --message "Review the recent changes for security issues and edge cases"
```

**Provider support:** Aider uses LiteLLM under the hood, supporting 100+ providers. Any `provider/model-id` format works.

---

Gateway Configuration
No Gateway: Direct Provider APIs
For tools with native multi-provider support (OpenCode) or when using a single provider:

```bash
# --- Direct providers ---
export ANTHROPIC_API_KEY="sk-ant-..."   # Claude models
export OPENAI_API_KEY="sk-..."          # GPT models
export GOOGLE_API_KEY="..."             # Gemini models
export XAI_API_KEY="..."                # Grok models

# --- Cloud providers (alternative or additional) ---

# Amazon Bedrock — Claude, Amazon Nova, Mistral, Llama, etc.
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_REGION="us-east-1"

# Microsoft Azure OpenAI — GPT, Claude (via AI Foundry)
export AZURE_API_KEY="..."
export AZURE_RESOURCE_NAME="your-resource"
export AZURE_API_VERSION="2024-12-01-preview"

# Google Vertex AI — Gemini, Claude (via Model Garden)
export GOOGLE_CLOUD_PROJECT="your-project"
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"

# --- Other direct providers ---
export MISTRAL_API_KEY="..."
export DEEPSEEK_API_KEY="..."
export GROQ_API_KEY="..."
```

**Best for:** OpenCode (native routing), Aider (LiteLLM routing), any tool where you have direct provider API access.

**Mixing providers is normal.** Use `bedrock/` for Claude (billed through AWS), `azure/` for GPT (billed through Azure), and `google/` or `vertex/` for Gemini directly. Each agent's model ID prefix determines which provider is used — no gateway needed.
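With this many keys in play, it helps to fail fast before a run. A small preflight sketch (the variable list mirrors the exports above; trim it to the providers you actually use):

```shell
#!/bin/bash
# preflight: report any unset keys from the given list;
# returns non-zero if at least one is missing.
preflight() {
  local missing=0
  for var in "$@"; do
    if [ -z "$(printenv "$var")" ]; then
      echo "Missing: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Typical call before a multi-provider run:
preflight ANTHROPIC_API_KEY OPENAI_API_KEY GOOGLE_API_KEY XAI_API_KEY \
  || echo "Set the missing keys before orchestrating agents."
```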
OpenRouter: Unified API
Single API key, single endpoint, 100+ models from all providers:

```bash
export OPENROUTER_API_KEY="sk-or-..."
```

Endpoint: `https://openrouter.ai/api/v1/chat/completions`

Model format: `provider/model-name` — e.g. `anthropic/claude-opus-4-6`, `openai/gpt-5.4`, `google/gemini-3.1-pro`, `xai/grok-4`, `google/gemini-3-pro`, `openai/gpt-4o`

Gateway script for CLI tools:

```bash
#!/bin/bash
# gateway-openrouter.sh — call any model via OpenRouter
# Usage: gateway-openrouter.sh <model> <system_prompt> <user_prompt>
MODEL="$1"
SYSTEM="$2"
PROMPT="$3"

if [ -z "$OPENROUTER_API_KEY" ]; then
  echo "Error: OPENROUTER_API_KEY not set" >&2
  exit 1
fi

RESPONSE=$(curl -s https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -H "HTTP-Referer: https://skills.sh/pascalorg" \
  -H "X-OpenRouter-Title: Agent Collaboration" \
  -d "{\"model\": \"$MODEL\", \"messages\": [{\"role\": \"system\", \"content\": $(echo "$SYSTEM" | jq -Rs .)}, {\"role\": \"user\", \"content\": $(echo "$PROMPT" | jq -Rs .)}], \"temperature\": 0.3}")

echo "$RESPONSE" | jq -r '.choices[0].message.content'
```

**Best for:** Claude Code, Codex CLI, Gemini CLI — any tool locked to a single provider that needs cross-provider access via script.
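The `jq -Rs .` calls are what keep the request body valid JSON when prompts contain quotes or newlines. You can sanity-check that escaping locally with no API call:

```shell
#!/bin/bash
# Build a payload from a deliberately awkward prompt, then parse it back.
PROMPT='He said "hi" and
then left.'

PAYLOAD="{\"model\": \"xai/grok-4\", \"messages\": [{\"role\": \"user\", \"content\": $(echo "$PROMPT" | jq -Rs .)}]}"

# jq exits non-zero if PAYLOAD is malformed JSON
echo "$PAYLOAD" | jq -e '.messages[0].role' > /dev/null && echo "payload OK"
```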
Vercel AI Gateway: Zero Markup
40+ providers, zero token markup, managed infrastructure:

```bash
export AI_GATEWAY_API_KEY="..."  # From Vercel Dashboard
```

Endpoint: `https://ai-gateway.vercel.sh/v1/chat/completions`

Model format: Same as OpenRouter — `provider/model-name`

Provider ordering and fallbacks (when using Vercel AI SDK):

```typescript
import { gateway } from '@ai-sdk/gateway';
import { generateText } from 'ai';

const result = await generateText({
  model: gateway('anthropic/claude-opus-4-6'),
  prompt: 'Plan this task...',
  providerOptions: {
    gateway: {
      order: ['anthropic', 'bedrock'], // Try Anthropic first, fall back to Bedrock
      caching: 'auto', // Automatic provider-appropriate caching
    }
  }
});
```

Gateway script for CLI tools (same pattern as OpenRouter, different endpoint):

```bash
#!/bin/bash
# gateway-vercel.sh — call any model via Vercel AI Gateway
# Usage: gateway-vercel.sh <model> <system_prompt> <user_prompt>
MODEL="$1"
SYSTEM="$2"
PROMPT="$3"

if [ -z "$AI_GATEWAY_API_KEY" ]; then
  echo "Error: AI_GATEWAY_API_KEY not set" >&2
  exit 1
fi

RESPONSE=$(curl -s https://ai-gateway.vercel.sh/v1/chat/completions \
  -H "Authorization: Bearer $AI_GATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"$MODEL\", \"messages\": [{\"role\": \"system\", \"content\": $(echo "$SYSTEM" | jq -Rs .)}, {\"role\": \"user\", \"content\": $(echo "$PROMPT" | jq -Rs .)}], \"temperature\": 0.3}")

echo "$RESPONSE" | jq -r '.choices[0].message.content'
```

**Best for:** Vercel projects, TypeScript codebases using AI SDK, teams wanting managed gateway with no token markup.

---

Escalation Rules
The planner re-enters the loop at defined checkpoints. These rules are non-negotiable:
When the Planner MUST Re-Enter
- After every execution phase — The planner evaluates all results before sending to review
- After every review phase — The planner synthesizes review feedback and decides next steps
- When any agent fails — The planner decides: retry with same model, reassign to fallback model, or simplify the subtask
- When reviewers disagree — The planner evaluates both positions, considers the evidence, and makes a final decision
- When new information emerges — If any agent discovers something that invalidates the plan, the planner replans
- When scope changes — User requirements change, the planner re-decomposes
When the Planner Steps Aside
- During execution — Agents execute independently within their assigned scope
- During review — Reviewers form opinions independently without planner influence
- For trivial tasks — Pattern E (rapid iteration) minimizes planner involvement
What the Planner NEVER Does
- Writes code (that's the coder's job)
- Does research (that's the researcher's job)
- Makes mathematical claims without the scientist
- Approves its own plans (that's the reviewer's job)
- Overrules both reviewers simultaneously (if both say no, the code needs work)
Anti-Patterns
Using one model for everything
Why it's wrong: You lose the adversarial advantage. Same model, same blind spots. A model reviewing its own code is like a writer proofreading their own manuscript — they'll miss the same errors.
Fix: At minimum, use a different model family for the adversarial reviewer.
Skipping adversarial review
Why it's wrong: Peer review is constructive by default — it finds improvements but misses security issues and logical flaws. The adversarial reviewer exists specifically to find what politeness misses.
Fix: Always include adversarial review. Use Pattern E (rapid iteration) for small tasks — it still includes the adversarial pass.
Letting the coder review its own code
Why it's wrong: If the coder didn't see the bug while writing, it won't see it while reviewing. Same context, same assumptions, same errors.
Fix: Always use a different model (ideally different family) for review.
Over-orchestrating simple tasks
Why it's wrong: A one-line fix doesn't need seven agents. The overhead of full orchestration exceeds the benefit for trivial changes.
Fix: Use Pattern E for small tasks. Use your judgment — if the fix is obvious and low-risk, just do it.
Ignoring success criteria
Why it's wrong: Without criteria, agents don't know when they're done. They either under-deliver or gold-plate. The planner's success criteria are the contract.
Fix: The planner must define success criteria for every subtask. Agents must verify their work against criteria before reporting completion.
Giving the planner file edit access
Why it's wrong: If the planner writes code, it can't objectively evaluate the result. Separation of concerns is the foundation of this workflow.
Fix: The planner never edits files. It reads code and runs read-only commands (git log, ls) for context, and spawns sub-agents to do the actual work.
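In OpenCode, this constraint is enforced declaratively in the planner's frontmatter (field names as generated by the setup script described earlier; a sketch, not a full agent file):

```yaml
# planner.md frontmatter: read everything, edit nothing
model: anthropic/claude-opus-4-6
permission:
  edit: deny
```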
Passing insufficient context in handoffs
Why it's wrong: If the coder doesn't know what the researcher found, it'll re-research or guess. If the reviewer doesn't know the constraints, it'll flag intentional tradeoffs as bugs.
Fix: Follow the handoff protocol. The planner is responsible for ensuring every agent has the context it needs.
Letting agents argue directly
Why it's wrong: Agents don't have shared context. A "debate" between agents without the planner mediating leads to circular arguments and wasted tokens.
Fix: All communication goes through the planner. The planner synthesizes, decides, and directs.
Model Map
Current recommended model assignments based on benchmarks as of April 2026. These will evolve as new models launch.
Primary Assignments
| Role | Model | Provider ID | Benchmark Evidence |
|---|---|---|---|
| Planner | Claude Opus 4.6 (thinking) | | Arena #1 (1504 Elo), ARC-AGI 2 68.8% |
| Coder | GPT-5.4 (high) | | Aider leaderboard #1 (88%), strong SWE-bench |
| Researcher | Gemini 3.1 Pro | | HLE 45.8%, MMMLU 91.8% |
| Scientist | Gemini 3 Pro | | AIME 2025 100%, GPQA Diamond 94.3% |
| Visual Analyst | Claude Opus 4.6 | | ARC-AGI 2 68.8%, strong MMMU |
| Adversarial Reviewer | Grok 4 | | Arena #4, contrarian style, different family |
| Peer Reviewer | Claude Opus 4.6 | | Structured analysis, low positivity bias |
Fallback Assignments
| Role | Fallback 1 | Fallback 2 |
|---|---|---|
| Planner | | |
| Coder | | |
| Researcher | | |
| Scientist | | |
| Visual Analyst | | |
| Adversarial Reviewer | | |
| Peer Reviewer | | |
Budget-Conscious Assignments
For teams optimizing cost while maintaining the multi-model advantage:
| Role | Budget Model | Provider ID | Tradeoff |
|---|---|---|---|
| Planner | Claude Sonnet 4.6 | | Slightly less nuanced planning |
| Coder | GPT-4.1 | | Good coding, lower cost |
| Researcher | Gemini 3 Flash | | Fast research, less depth |
| Scientist | Gemini 3 Flash | | Good math, less formal rigor |
| Visual Analyst | Gemini 3.1 Pro | | Strong visual, lower cost than Opus |
| Adversarial Reviewer | Grok 4 | | Keep this — adversarial review is critical |
| Peer Reviewer | Claude Sonnet 4.6 | | Good reviews, lower cost |
Cloud Provider Assignments
For organizations routing through Amazon Bedrock, Microsoft Azure, or Google Vertex AI instead of direct provider APIs:
Amazon Bedrock:
| Role | Bedrock Model ID | Notes |
|---|---|---|
| Planner | | Claude via AWS billing |
| Coder | | Claude Sonnet strong at code |
| Researcher | | Nova Pro for knowledge tasks |
| Scientist | | Opus for math reasoning |
| Visual Analyst | | Opus strong multimodal |
| Adversarial Reviewer | | Different model family for adversarial benefit |
| Peer Reviewer | | Structured analysis |
Note: Bedrock doesn't have GPT, Gemini, or Grok. Mix Bedrock with direct providers for full coverage: `bedrock` for Claude agents, `openai` for coder, `google` for research/science, `xai` for adversarial review.
Microsoft Azure OpenAI:
| Role | Azure Model ID | Notes |
|---|---|---|
| Planner | | Claude via Azure AI Foundry |
| Coder | | GPT via Azure OpenAI |
| Researcher | | GPT for knowledge tasks |
| Scientist | | GPT for math reasoning |
| Visual Analyst | | Claude multimodal via Foundry |
| Adversarial Reviewer | | Different model for adversarial benefit |
| Peer Reviewer | | Structured analysis |
Note: Azure model IDs depend on your deployment names — the above assumes deployments matching model names. Azure doesn't have Gemini or Grok natively. Mix Azure with direct providers: `azure` for GPT/Claude agents, `google` for research/science, `xai` for adversarial review.
Google Vertex AI:
| Role | Vertex AI Model ID | Notes |
|---|---|---|
| Planner | | Claude via Model Garden |
| Coder | | Gemini strong at code |
| Researcher | | Gemini native strength |
| Scientist | | Gemini native math |
| Visual Analyst | | Gemini strong multimodal |
| Adversarial Reviewer | | Different family via Model Garden |
| Peer Reviewer | | Structured analysis |
Note: Vertex AI has Gemini natively and Claude via Model Garden. No GPT or Grok. Mix with direct providers for full coverage.
Recommended hybrid for enterprise — mix cloud providers with one or two direct APIs for maximum model diversity:
| Role | Hybrid Enterprise | Why |
|---|---|---|
| Planner | | Claude via AWS billing |
| Coder | | GPT via Azure billing |
| Researcher | | Gemini via GCP billing |
| Scientist | | Gemini via GCP billing |
| Visual Analyst | | Claude via AWS billing |
| Adversarial Reviewer | | Direct — Grok not on clouds |
| Peer Reviewer | | Claude via AWS billing |
This routes billing through your existing cloud agreements while maintaining full model diversity. Only Grok requires a direct API key since xAI is not yet available on any cloud marketplace.
适用于通过Amazon Bedrock、Microsoft Azure或Google Vertex AI而非直接提供商API路由的组织:
Amazon Bedrock:
| 角色 | Bedrock模型ID | 说明 |
|---|---|---|
| 规划Agent | | Claude通过AWS计费 |
| 编码Agent | | Claude Sonnet擅长代码 |
| 研究Agent | | Nova Pro适用于知识类任务 |
| 科研Agent | | Opus适用于数学推理 |
| 视觉分析Agent | | Opus多模态能力强 |
| 对抗性评审Agent | | 不同模型家族,具备对抗性优势 |
| 同行评审Agent | | 结构化分析 |
注意:Bedrock没有GPT、Gemini或Grok。混合使用Bedrock和直接提供商以获得完整覆盖:`bedrock` 处理Claude Agent,`openai` 处理编码Agent,`google` 处理研究/科研Agent,`xai` 处理对抗性评审Agent。
Microsoft Azure OpenAI:
| 角色 | Azure模型ID | 说明 |
|---|---|---|
| 规划Agent | | Claude通过Azure AI Foundry |
| 编码Agent | | GPT通过Azure OpenAI |
| 研究Agent | | GPT适用于知识类任务 |
| 科研Agent | | GPT适用于数学推理 |
| 视觉分析Agent | | Claude多模态能力通过Foundry提供 |
| 对抗性评审Agent | | 不同模型,具备对抗性优势 |
| 同行评审Agent | | 结构化分析 |
注意:Azure模型ID取决于你的部署名称——上述假设部署名称与模型名称匹配。Azure原生没有Gemini或Grok。混合使用Azure和直接提供商:`azure` 处理GPT/Claude Agent,`google` 处理研究/科研Agent,`xai` 处理对抗性评审Agent。
Google Vertex AI:
| 角色 | Vertex AI模型ID | 说明 |
|---|---|---|
| 规划Agent | | Claude通过Model Garden |
| 编码Agent | | Gemini擅长代码 |
| 研究Agent | | Gemini原生优势 |
| 科研Agent | | Gemini原生数学能力 |
| 视觉分析Agent | | Gemini多模态能力强 |
| 对抗性评审Agent | | 通过Model Garden使用不同家族模型 |
| 同行评审Agent | | 结构化分析 |
注意:Vertex AI原生提供Gemini,通过Model Garden提供Claude。没有GPT或Grok。混合使用直接提供商以获得完整覆盖。
企业推荐混合方案——混合云提供商和一两个直接API,以实现最大模型多样性:
| 角色 | 企业混合方案 | 原因 |
|---|---|---|
| 规划Agent | | Claude通过AWS计费 |
| 编码Agent | | GPT通过Azure计费 |
| 研究Agent | | Gemini通过GCP计费 |
| 科研Agent | | Gemini通过GCP计费 |
| 视觉分析Agent | | Claude通过AWS计费 |
| 对抗性评审Agent | | 直接调用——Grok尚未在任何云市场提供 |
| 同行评审Agent | | Claude通过AWS计费 |
此方案通过现有云协议计费,同时保持完整的模型多样性。只有Grok需要直接API密钥,因为xAI尚未在任何云市场提供。
Cross-Provider Model ID Reference
跨提供商模型ID参考
The same model is accessed through different ID formats depending on the provider. Use this table to swap providers in agent definitions:
| Model | Direct | Bedrock | Azure | Vertex AI | OpenRouter |
|---|---|---|---|---|---|
| Claude Opus 4.6 | | | | | |
| Claude Sonnet 4.5 | | | | | |
| Claude Sonnet 4.6 | | | | | |
| GPT-5.4 | | — | | — | |
| GPT-5.2 | | — | | — | |
| GPT-4o | | — | | — | |
| Gemini 3.1 Pro | | — | — | | |
| Gemini 3 Pro | | — | — | | |
| Gemini 3 Flash | | — | — | | |
| Grok 4 | | — | — | — | |
| Amazon Nova Pro | — | | — | — | — |
"—" means the model is not available through that provider. Mix providers as needed.
同一模型通过不同提供商访问时使用不同的ID格式。使用此表在Agent定义中切换提供商:
| 模型 | 直接提供商 | Bedrock | Azure | Vertex AI | OpenRouter |
|---|---|---|---|---|---|
| Claude Opus 4.6 | | | | | |
| Claude Sonnet 4.5 | | | | | |
| Claude Sonnet 4.6 | | | | | |
| GPT-5.4 | | — | | — | |
| GPT-5.2 | | — | | — | |
| GPT-4o | | — | | — | |
| Gemini 3.1 Pro | | — | — | | |
| Gemini 3 Pro | | — | — | | |
| Gemini 3 Flash | | — | — | | |
| Grok 4 | | — | — | — | |
| Amazon Nova Pro | — | | — | — | — |
"—"表示该模型无法通过该提供商访问。根据需要混合使用提供商。
Single-Provider Fallbacks
单一提供商备选方案
When you only have access to one provider:
Anthropic only (Claude Code default):
| Role | Model | Notes |
|---|---|---|
| Planner | Opus (thinking) | Full capability |
| Coder | Sonnet | Fast, good at code |
| Researcher | Opus | Strong synthesis |
| Scientist | Opus | Reasonable math |
| Visual Analyst | Opus | Strong multimodal |
| Adversarial Reviewer | Sonnet | Different model, but same family — add extra adversarial prompting |
| Peer Reviewer | Opus | Structured analysis |
OpenAI only (Codex default):
| Role | Model | Notes |
|---|---|---|
| Planner | GPT-5.4 (high) | Strong planning |
| Coder | GPT-5.4 | Native strength |
| Researcher | GPT-5.4 | Good research |
| Scientist | GPT-5.2 | Strong math |
| Visual Analyst | GPT-5.4 | Good multimodal |
| Adversarial Reviewer | GPT-4o | Different model, different style |
| Peer Reviewer | GPT-5.4 (high) | Structured analysis |
Google only (Gemini CLI default):
| Role | Model | Notes |
|---|---|---|
| Planner | Gemini 3.1 Pro | Strong planning |
| Coder | Gemini 3.1 Pro | Good coding |
| Researcher | Gemini 3.1 Pro | Native strength |
| Scientist | Gemini 3 Pro | Native strength |
| Visual Analyst | Gemini 3.1 Pro | Strong multimodal |
| Adversarial Reviewer | Gemini 3 Flash | Different model for cost + perspective |
| Peer Reviewer | Gemini 3.1 Pro | Structured analysis |
当你只能访问一个提供商时:
仅使用Anthropic(Claude Code默认):
| 角色 | 模型 | 说明 |
|---|---|---|
| 规划Agent | Opus(思考模式) | 完整能力 |
| 编码Agent | Sonnet | 速度快,擅长代码 |
| 研究Agent | Opus | 擅长信息综合 |
| 科研Agent | Opus | 具备合理的数学能力 |
| 视觉分析Agent | Opus | 多模态能力强 |
| 对抗性评审Agent | Sonnet | 不同模型,但属于同一家族——添加额外的对抗性提示 |
| 同行评审Agent | Opus | 结构化分析 |
仅使用OpenAI(Codex默认):
| 角色 | 模型 | 说明 |
|---|---|---|
| 规划Agent | GPT-5.4(高推理) | 规划能力强 |
| 编码Agent | GPT-5.4 | 原生优势 |
| 研究Agent | GPT-5.4 | 研究能力良好 |
| 科研Agent | GPT-5.2 | 数学能力强 |
| 视觉分析Agent | GPT-5.4 | 多模态能力良好 |
| 对抗性评审Agent | GPT-4o | 不同模型,风格不同 |
| 同行评审Agent | GPT-5.4(高推理) | 结构化分析 |
仅使用Google(Gemini CLI默认):
| 角色 | 模型 | 说明 |
|---|---|---|
| 规划Agent | Gemini 3.1 Pro | 规划能力强 |
| 编码Agent | Gemini 3.1 Pro | 编码能力良好 |
| 研究Agent | Gemini 3.1 Pro | 原生优势 |
| 科研Agent | Gemini 3 Pro | 原生优势 |
| 视觉分析Agent | Gemini 3.1 Pro | 多模态能力强 |
| 对抗性评审Agent | Gemini 3 Flash | 不同模型,兼顾成本和视角 |
| 同行评审Agent | Gemini 3.1 Pro | 结构化分析 |
Evolving the Model Map
模型映射的演变
Model capabilities change with every release. The assignments above are based on benchmarks as of April 2026.
模型能力随每次发布而变化。上述分配基于2026年4月的基准测试。
Key Benchmark Sources
关键基准测试来源
Track these to stay current:
| Source | URL | What it tracks | Update frequency |
|---|---|---|---|
| Chatbot Arena | arena.ai | Overall quality, per-category rankings | Continuous |
| SWE-bench | swebench.com | Real-world software engineering | On submission |
| Aider Leaderboard | aider.chat/docs/leaderboards/ | Practical code editing | On release |
| Artificial Analysis | artificialanalysis.ai | Intelligence/speed/cost index | Weekly |
| LiveBench | livebench.ai | Contamination-resistant monthly eval | Monthly |
| GPQA Diamond | — | PhD-level science | Static |
| ARC-AGI 2 | arcprize.org | Abstract visual reasoning | Periodic |
| Humanity's Last Exam | — | Frontier knowledge | Periodic |
跟踪这些来源以保持最新:
| 来源 | URL | 跟踪内容 | 更新频率 |
|---|---|---|---|
| Chatbot Arena | arena.ai | 整体质量、分类排名 | 持续更新 |
| SWE-bench | swebench.com | 实际软件工程任务 | 按需提交更新 |
| Aider排行榜 | aider.chat/docs/leaderboards/ | 实用代码编辑能力 | 模型发布时更新 |
| Artificial Analysis | artificialanalysis.ai | 智能/速度/成本指数 | 每周更新 |
| LiveBench | livebench.ai | 抗污染月度评估 | 每月更新 |
| GPQA Diamond | — | 博士级科学知识 | 静态 |
| ARC-AGI 2 | arcprize.org | 抽象视觉推理 | 定期更新 |
| Humanity's Last Exam | — | 前沿知识 | 定期更新 |
When to Re-Evaluate
重新评估的时机
- A major new model launches (new Claude, GPT, Gemini, Grok version)
- Arena rankings shift by >50 Elo in a relevant category
- SWE-bench or Aider leaderboard gets a new #1
- You notice consistent quality degradation from a specific agent
- 重大新模型发布(新Claude、GPT、Gemini、Grok版本)
- Arena相关类别排名变化超过50 Elo
- SWE-bench或Aider排行榜出现新的第一名
- 你注意到特定Agent的质量持续下降
Future: Benchmark Tracking Skill
未来:基准测试跟踪技能
A companion skill for automated benchmark tracking is planned. It will fetch the latest rankings from programmatic sources (HuggingFace datasets, Arena Hard Auto) and generate an updated model map. Until then, check the sources above when major models launch.
计划开发一个用于自动基准测试跟踪的配套技能。它将从程序化来源(HuggingFace数据集、Arena Hard Auto)获取最新排名,并生成更新的模型映射。在此之前,当重大模型发布时,检查上述来源。
Role-to-Benchmark Mapping
角色与基准测试的映射
Use this to know which benchmarks matter for which agent role:
| Agent Role | Primary Benchmarks | What to Look For |
|---|---|---|
| Planner | Arena (Overall, Hard Prompts), ARC-AGI 2 | Abstract reasoning, complex instruction following |
| Coder | Aider, SWE-bench Verified | Practical code editing, real-world bug fixing |
| Researcher | Arena (Knowledge), HLE, MMMLU, BrowseComp | Breadth of knowledge, research synthesis |
| Scientist | AIME, GPQA Diamond, MATH Level 5 | Mathematical reasoning, scientific knowledge |
| Visual Analyst | ARC-AGI 2, MMMU-Pro | Visual reasoning, multimodal understanding |
| Adversarial Reviewer | Arena (Hard Prompts), PropensityBench | Critical thinking, low positivity bias |
| Peer Reviewer | Arena (Overall), AI Scientist review scores | Structured analysis, honest assessment |
使用此表了解哪些基准测试对哪些Agent角色重要:
| Agent角色 | 主要基准测试 | 关注要点 |
|---|---|---|
| 规划Agent | Arena(整体、难题)、ARC-AGI 2 | 抽象推理、复杂指令遵循能力 |
| 编码Agent | Aider、SWE-bench Verified | 实用代码编辑、实际Bug修复能力 |
| 研究Agent | Arena(知识)、HLE、MMMLU、BrowseComp | 知识广度、研究综合能力 |
| 科研Agent | AIME、GPQA Diamond、MATH Level 5 | 数学推理、科学知识 |
| 视觉分析Agent | ARC-AGI 2、MMMU-Pro | 视觉推理、多模态理解能力 |
| 对抗性评审Agent | Arena(难题)、PropensityBench | 批判性思维、低积极偏见 |
| 同行评审Agent | Arena(整体)、AI Scientist评审得分 | 结构化分析、诚实评估能力 |
Quick Reference
快速参考
Workflow Selection
工作流选择
| Task Size | Pattern | Agents Used |
|---|---|---|
| Trivial (one-liner) | Skip orchestration | Just do it |
| Small (single file) | E (Rapid) | Planner → Coder → Adversarial |
| Medium (multi-file) | A (Default) | Planner → Coder → Both Reviewers |
| Medium + unknown API | B (Research First) | Planner → Researcher → Planner → Coder → Review |
| Large | D (Full Pipeline) | All agents, multiple phases |
| Research only | F (Research) | Planner → Researcher + Scientist |
| Math/science heavy | C (Deep Analysis) | Planner → Scientist + Visual + Researcher → Coder → Review |
| 任务规模 | 模式 | 使用的Agent |
|---|---|---|
| 琐碎(单行) | 跳过编排 | 直接执行 |
| 小型(单文件) | E(快速迭代) | 规划Agent → 编码Agent → 对抗性评审Agent |
| 中型(多文件) | A(默认) | 规划Agent → 编码Agent → 两位评审Agent |
| 中型 + 未知API | B(先研究) | 规划Agent → 研究Agent → 规划Agent → 编码Agent → 评审 |
| 大型 | D(完整流水线) | 所有Agent,多阶段 |
| 仅研究 | F(研究) | 规划Agent → 研究Agent + 科研Agent |
| 数学/科学密集型 | C(深度分析) | 规划Agent → 科研Agent + 视觉分析Agent + 研究Agent → 编码Agent → 评审 |
Tool Selection
工具选择
| Your Tool | Multi-Model | Sub-Agents | Best Approach |
|---|---|---|---|
| OpenCode | Native (75+ providers) | Yes | Full multi-model, no gateway needed |
| Claude Code | Anthropic only | Yes | Gateway script for cross-provider, or Anthropic-only |
| Cursor | UI model picker | No | Sequential model switching per phase |
| Codex CLI | OpenAI only | Emerging | Gateway script for cross-provider |
| Gemini CLI | Google only | No | Sequential + gateway script |
| Aider | Any (via LiteLLM) | No | Architect/editor dual-model + review sessions |
| 你的工具 | 多模型支持 | 子Agent支持 | 最佳方案 |
|---|---|---|---|
| OpenCode | 原生支持(75+提供商) | 是 | 完整多模型,无需网关 |
| Claude Code | 仅支持Anthropic | 是 | 网关脚本实现跨提供商,或仅使用Anthropic |
| Cursor | UI模型选择器 | 否 | 按阶段顺序切换模型 |
| Codex CLI | 仅支持OpenAI | 新兴 | 网关脚本实现跨提供商 |
| Gemini CLI | 仅支持Google | 否 | 顺序切换 + 网关脚本 |
| Aider | 支持所有(通过LiteLLM) | 否 | 架构师/编辑器双模型 + 评审会话 |
Gateway Selection
网关选择
| Scenario | Recommended Gateway |
|---|---|
| OpenCode user | None needed — native multi-provider |
| Single API key for everything | OpenRouter |
| Vercel project / TypeScript | Vercel AI Gateway |
| Self-hosted / full control | LiteLLM proxy |
| Budget tracking important | Vercel AI Gateway or OpenRouter (both have dashboards) |
| Enterprise compliance | Portkey (SOC 2, HIPAA) |
| 场景 | 推荐网关 |
|---|---|
| OpenCode用户 | 无需网关——原生多提供商支持 |
| 单一API密钥访问所有模型 | OpenRouter |
| Vercel项目 / TypeScript | Vercel AI网关 |
| 自托管 / 完全控制 | LiteLLM代理 |
| 预算跟踪重要 | Vercel AI网关或OpenRouter(均有控制台) |
| 企业合规 | Portkey(SOC 2、HIPAA合规) |
Cost Expectations
成本预期
Multi-model orchestration uses frontier models, which are not free. Rough cost per task (varies by complexity and token counts):
| Pattern | Models Used | Estimated Cost Range |
|---|---|---|
| E (Rapid) | 3 calls (planner + coder + adversarial) | $0.50 – $2 |
| A (Default) | 4-5 calls (planner + coder + 2 reviewers + replan) | $2 – $8 |
| B (Research First) | 6-7 calls (+ researcher + replan) | $3 – $10 |
| D (Full Pipeline) | 8-12 calls (all agents, multiple phases) | $5 – $20 |
These assume frontier models (Opus, GPT-5.4, Gemini 3.1 Pro). Budget-conscious assignments (see Model Map) cut costs 50-70% with moderate quality tradeoff.
When the overhead is worth it: Complex tasks where a bug or missed requirement costs hours of rework. A $10 multi-agent review that catches a security vulnerability saves far more than $10.
When it's not worth it: Trivial changes. If you can write and verify the fix in 5 minutes, skip orchestration.
多模型编排使用前沿模型,并非免费。每个任务的大致成本(因复杂度和令牌数量而异):
| 模式 | 使用的模型 | 估计成本范围 |
|---|---|---|
| E(快速迭代) | 3次调用(规划 + 编码 + 对抗性评审) | $0.50 – $2 |
| A(默认) | 4-5次调用(规划 + 编码 + 两位评审 + 重新规划) | $2 – $8 |
| B(先研究) | 6-7次调用(+ 研究 + 重新规划) | $3 – $10 |
| D(完整流水线) | 8-12次调用(所有Agent,多阶段) | $5 – $20 |
这些假设使用前沿模型(Opus、GPT-5.4、Gemini 3.1 Pro)。预算友好型分配(见模型映射)可降低50-70%的成本,质量略有下降。
开销值得的场景: 复杂任务中,Bug或遗漏需求会导致数小时返工。一次花费$10的多Agent评审发现安全漏洞,节省的成本远超过$10。
开销不值得的场景: 琐碎变更。如果你能在5分钟内编写并验证修复,跳过编排。
Troubleshooting
故障排除
Agent not found
Agent未找到
Symptom: `@planner` or agent invocation returns "agent not found" or a similar error.
Fix: Ensure agent definition files are in the correct directory:
- OpenCode: `.opencode/agents/planner.md` (project) or `~/.config/opencode/agents/planner.md` (global)
- Claude Code: `.claude/agents/planner.md` (project) or `~/.claude/agents/planner.md` (global)
Verify the files were copied correctly: `ls .opencode/agents/` or `ls .claude/agents/`
症状: `@planner` 或Agent调用返回「Agent未找到」或类似错误。
修复: 确保Agent定义文件位于正确目录:
- OpenCode:`.opencode/agents/planner.md`(项目级)或 `~/.config/opencode/agents/planner.md`(全局)
- Claude Code:`.claude/agents/planner.md`(项目级)或 `~/.claude/agents/planner.md`(全局)
验证文件是否正确复制:`ls .opencode/agents/` 或 `ls .claude/agents/`
Model not available / API error
模型不可用 / API错误
Symptom: "Model not found," "Invalid model," or 404 errors when an agent runs.
Fix:
- Verify the model ID is correct for your provider. Model IDs differ between providers — check the Cross-Provider Model ID Reference table
- Verify your API key is set: `echo $ANTHROPIC_API_KEY | head -c 10` (should show the first 10 chars)
- For Bedrock: ensure your IAM role has `bedrock:InvokeModel` permission for the specific model
- For Azure: ensure the deployment name in your model ID matches your Azure deployment
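For the Bedrock case, a minimal IAM policy sketch granting invoke access looks like the following. The wildcard `Resource` is illustrative only; in production, scope the ARN to the specific foundation model your agents use:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "arn:aws:bedrock:*::foundation-model/*"
    }
  ]
}
```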
症状: 运行Agent时出现「模型未找到」、「无效模型」或404错误。
修复:
- 验证模型ID对你的提供商是否正确。模型ID因提供商而异——查看跨提供商模型ID参考表
- 验证你的API密钥已设置:`echo $ANTHROPIC_API_KEY | head -c 10`(应显示前10个字符)
- 对于Bedrock:确保你的IAM角色对特定模型拥有 `bedrock:InvokeModel` 权限
- 对于Azure:确保模型ID中的部署名称与你的Azure部署匹配
Agent produces garbage output
Agent产生无效输出
Symptom: Agent ignores its role, writes code when it should only review, or produces unstructured output.
Fix:
- Ensure the agent definition file has the full system prompt (the markdown body). If the file only has frontmatter, the agent has no instructions
- For read-only agents, verify `permission: edit: deny` (OpenCode) or restricted `allowed-tools` (Claude Code) is set
- If using a gateway script, check that the full prompt is being passed — shell argument truncation is a common issue. Use stdin or file-based prompts for long context
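For reference, a minimal agent definition sketch with both frontmatter and a markdown-body system prompt. The `permission: edit: deny` field follows the OpenCode convention quoted above; the description, placeholder model ID, and prompt wording are illustrative:

```markdown
---
description: Peer reviewer (read-only)
model: <provider/model-id>   # fill in from the Cross-Provider Model ID Reference
permission:
  edit: deny                 # read-only: reviewers must never write files
---
You are a peer reviewer. You never edit files.
Produce a structured review: summary, issues (file, line, severity),
and an explicit approve / request-changes verdict.
```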
症状: Agent忽略其角色,在应仅评审时编写代码,或产生非结构化输出。
修复:
- 确保Agent定义文件包含完整的系统提示(Markdown正文)。如果文件仅包含前置元数据,Agent将没有指令
- 对于只读Agent,验证是否设置了 `permission: edit: deny`(OpenCode)或受限的 `allowed-tools`(Claude Code)
- 如果使用网关脚本,检查是否传递了完整提示——Shell参数截断是常见问题。对于长上下文,使用标准输入或基于文件的提示
Context window overflow
上下文窗口溢出
Symptom: Agent errors out or produces truncated output on large codebases.
Fix:
- The planner should summarize context in handoffs, not paste entire files. Include file paths and relevant line ranges, not full contents
- For research handoffs, summarize findings in 500 words or less — the coder doesn't need the full research output
- For review handoffs, include only the changed files, not the entire codebase
- If a single file is too large, have the planner break the task into smaller file-scoped subtasks
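A trivial sketch of a line-ranged handoff: the planner passes a path plus a line range, and the downstream agent pulls just that slice instead of the whole file. The demo file and range here are illustrative:

```shell
# Planner handoff says: "see /tmp/handoff_demo.txt lines 2-4" rather than
# pasting the file contents into the prompt.
printf 'l1\nl2\nl3\nl4\nl5\n' > /tmp/handoff_demo.txt

# The coder fetches only the referenced slice.
sed -n '2,4p' /tmp/handoff_demo.txt   # prints l2, l3, l4
```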
症状: 在大型代码库上,Agent出错或产生截断输出。
修复:
- 规划Agent应在交接中总结上下文,而非粘贴整个文件。包含文件路径和相关行范围,而非完整内容
- 对于研究交接,将发现总结在500字以内——编码Agent不需要完整的研究输出
- 对于评审交接,仅包含修改的文件,而非整个代码库
- 如果单个文件过大,让规划Agent将任务分解为更小的文件范围子任务
Gateway script fails silently
网关脚本静默失败
Symptom: Gateway script returns empty output or `null`.
Fix:
- Check API key: `echo $OPENROUTER_API_KEY | head -c 10`
- Test the endpoint directly: `curl -s https://openrouter.ai/api/v1/models | jq '.data[0].id'`
- Check for jq: `which jq` — install if missing (`brew install jq` or `apt install jq`)
- Run the script with verbose curl: replace `curl -s` with `curl -v` temporarily to see HTTP errors
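For reference, here is a minimal gateway sketch under stated assumptions: the endpoint is OpenRouter's OpenAI-compatible chat completions API, the `ask_model` function name is illustrative, `OPENROUTER_API_KEY` must be exported, and the prompt is read from stdin so long context is never truncated by shell argument limits:

```shell
# Minimal gateway sketch for OpenRouter's chat completions endpoint.
# Assumes OPENROUTER_API_KEY is exported and jq is installed.
ask_model() {
  local model="$1"
  local prompt
  prompt="$(cat)"   # full prompt via stdin, not argv

  # Build the body with jq so quotes/newlines in the prompt can't break the JSON.
  local payload
  payload=$(jq -n --arg model "$model" --arg prompt "$prompt" \
    '{model: $model, messages: [{role: "user", content: $prompt}]}')

  curl -sS https://openrouter.ai/api/v1/chat/completions \
    -H "Authorization: Bearer $OPENROUTER_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$payload" \
  | jq -r '.choices[0].message.content // "null"'
}

# Usage: echo "Review this diff for security issues." | ask_model "<provider/model-id>"
```

Returning the literal string `"null"` on an empty `.choices` makes the silent-failure case above visible instead of producing empty output.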
症状: 网关脚本返回空输出或 `null`。
修复:
- 检查API密钥:`echo $OPENROUTER_API_KEY | head -c 10`
- 直接测试端点:`curl -s https://openrouter.ai/api/v1/models | jq '.data[0].id'`
- 检查jq是否安装:`which jq`——如果缺失则安装(`brew install jq` 或 `apt install jq`)
- 使用详细curl运行脚本:临时将 `curl -s` 替换为 `curl -v` 以查看HTTP错误
Reviewers always approve (rubber-stamping)
评审者总是批准(橡皮图章)
Symptom: The adversarial reviewer approves everything, defeating the purpose.
Fix:
- Check that the adversarial reviewer is using a different model family than the coder. Same-family review tends toward approval
- Strengthen the adversarial prompt: add "You MUST find at least 3 issues. If you approve with zero issues, you have failed at your job"
- If using a single provider (Anthropic-only), use Sonnet for adversarial review with a very aggressive prompt — this partially compensates for same-family bias
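A hedged sketch of such a strengthened adversarial prompt; wording beyond the quoted sentence above is illustrative, not a fixed template:

```markdown
You are an adversarial reviewer. Approval is not your default outcome.
- You MUST find at least 3 issues. If you approve with zero issues,
  you have failed at your job.
- For each issue, give file, line, severity, and a concrete fix.
- If the code genuinely looks clean, list its 3 riskiest untested
  assumptions instead.
```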
症状: 对抗性评审Agent批准所有内容,失去其存在意义。
修复:
- 检查对抗性评审Agent是否使用与编码Agent不同的模型家族。同家族评审倾向于批准
- 强化对抗性提示:添加「你必须至少找到3个问题。如果批准且没有问题,你就失职了」
- 如果使用单一提供商(仅Anthropic),使用Sonnet作为对抗性评审Agent并配合非常激进的提示——这可部分弥补同家族偏见