review-chain

Review Chain


Meta — Dynamic Multi-Agent. Fresh-eyes review chain for post-implementation quality.
Core Question: "What would a senior reviewer with no sunk-cost bias catch?"

Critical Gates — Read First


  1. Reviewer has NO access to implementation reasoning — only the output and the requirements. This is intentional: fresh eyes, no bias.
  2. Resolver sees BOTH original + review — it synthesizes, not just patches.
  3. Max 2 loops — if code isn't clean after 2 review cycles, flag to the user. There may be a deeper design problem that review can't fix.
  4. Auto-trigger for critical code — security, auth, crypto, data mutations, money, PII. Don't wait to be asked.

Inputs Required


  • Code, artifact, or output to verify
  • The original requirements or prompt that produced it
  • Relevant context (surrounding files, API contracts, tests)

Output


  • `.agents/meta/review-chain-report.md` — verdict, issues found/fixed/declined, changes made

Chain Position


  • After: Any domain skill — system-architecture, task-breakdown, code-cleanup, or raw implementation
  • Together with discover: discover before build, review-chain after build

Orchestration Pattern: Dynamic Agent Spawning


This skill uses runtime-defined agents (reviewer, resolver), NOT the static agent roster pattern. Agent prompts are constructed per-use based on the artifact being reviewed. There is no `agents/` directory.


Execution


1. Identify what to verify


Determine what output needs verification:
  • Code just written — the most common case. You just implemented something, now verify it.
  • Architecture/design decision — verify a plan before implementing.
  • User-provided code — user asks you to review their code with this pattern.
  • Any prior output — user says "double-check that" or "verify this".
Gather the full artifact to review:
  • The code/output itself
  • The original requirements or prompt that produced it
  • Any relevant context (surrounding files, API contracts, tests)

2. Spawn the Reviewer


Spawn a single reviewer agent with fresh context. The reviewer has NO access to the implementation reasoning — only the output and the requirements.
Agent config:
  • model: "sonnet"
    (default — use opus if the code is complex or security-critical)
Learned rules: Before constructing the reviewer prompt, read `.agents/meta/learned-rules.md`. If any rules are relevant to the code being reviewed, append them to the CONTEXT section of the reviewer prompt.
Reviewer prompt:
You are a senior code reviewer with fresh eyes. You did NOT write this code.
Your job is to find problems.

ORIGINAL REQUIREMENTS:
{what the code was supposed to do}

CODE/OUTPUT TO REVIEW:
{the full artifact}

CONTEXT:
{surrounding code, API contracts, types, or other relevant files}

Review for:
1. **Correctness** — Does it actually do what the requirements ask? Are there logic errors?
2. **Edge cases** — What inputs or states would break this? Empty arrays, null values,
   concurrent access, network failures?
3. **Simplification** — Is anything over-engineered? Can any code be removed or simplified
   without losing functionality?
4. **Security** — SQL injection, XSS, command injection, auth bypasses, secrets in code?
5. **Consistency** — Does it match the patterns and conventions of the surrounding codebase?
6. **Input Quality** — Was this built on solid ground? Check what context the implementation
   had access to. Rate each:
   - Product/domain context: Rich (from research/spec) | Thin (user-provided, minimal) | Missing (improvised)
   - Requirements clarity: Precise (specific acceptance criteria) | Vague (general direction only) | Absent
   - Upstream artifacts: Fresh (< 30 days) | Stale (> 30 days) | None
   This is not about the code quality — it is about whether the RIGHT thing was built.
   A perfectly crafted solution to the wrong problem is still wrong.

Respond in this exact format:

VERDICT: PASS | ISSUES_FOUND | CRITICAL

ISSUES (if any):
For each issue:
- SEVERITY: critical | major | minor | nit
- CONFIDENCE: [1-10] (how certain you are this is a real problem — 10 = proven, 7 = likely, 4 = possible, 1 = speculative)
- LOCATION: {file:line or section}
- PROBLEM: {what's wrong}
- FIX: {concrete fix — show the corrected code, not just "fix this"}

SIMPLIFICATIONS (if any):
- {what can be removed or simplified, with the simpler version}

SUMMARY: {one paragraph — overall assessment}

**Confidence rules:**
- Suppress findings below 5/10 — don't include them at all.
- Caveat findings 5-7/10 — include them but mark as "UNCERTAIN — may be a false positive."
- Full-weight findings 8+/10 — these are real issues.
- If you can cite a specific line, test, or proof, confidence should be 8+.
- If you're pattern-matching without verification, confidence should be 5-7.

**Verification rules (signal vs noise):**
Before reporting any issue, verify it is SIGNAL not NOISE:
- CHECK if the problem is already handled elsewhere in the code (a different file,
  a wrapper, a middleware, a test). If handled, it is noise — do not report it.
- CHECK if the fix already exists (the "improvement" you would suggest is already
  implemented under a different name or in a different location). If it exists, it is noise.
- ASK "has this actually caused a problem, or is it theoretical?" If purely theoretical
  with no plausible trigger path, downgrade to nit or suppress.
- ASK "will this fix actually change runtime behavior?" If the fix is cosmetic or
  the code path is equivalent, it is noise.
Issues that survive verification are signal. Issues that fail any check are noise —
suppress them entirely. Do not pad the report with noise to appear thorough.

Be ruthless. Better to flag a false positive than miss a real bug.
But don't invent problems that don't exist — if the code is clean, say PASS.

Write your response directly — do not write to any files.
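The confidence rules above can be applied mechanically when post-processing the reviewer's findings. A minimal sketch, assuming a simple `Finding` record; the type and field names are illustrative, not part of any real API:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    severity: str      # "critical" | "major" | "minor" | "nit"
    confidence: int    # 1-10
    problem: str
    uncertain: bool = False

def apply_confidence_rules(findings):
    """Suppress findings below 5/10, caveat 5-7/10, keep 8+ at full weight."""
    kept = []
    for f in findings:
        if f.confidence < 5:
            continue            # suppressed entirely, never reported
        if f.confidence <= 7:
            f.uncertain = True  # mark as "UNCERTAIN — may be a false positive"
        kept.append(f)
    return kept
```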

3. Evaluate the review


Read the reviewer's output. Three paths:
Path A: PASS (no issues) The reviewer found nothing wrong. You're done. Report to the user:
  • "Verified by independent reviewer — no issues found."
  • Include the reviewer's summary as confirmation.
Path B: ISSUES_FOUND (non-critical) The reviewer found real issues but nothing catastrophic. Before proceeding to the Resolve step, classify each finding:
  • AUTO_FIX: confidence 9+ AND severity minor/nit → resolver applies without asking
  • ASK: everything else → resolver presents these to the user for judgment after fixing
Path C: CRITICAL The reviewer found a critical bug (security vulnerability, data loss, completely wrong logic). Flag immediately to the user before resolving — they may want to change approach entirely.
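The Path B triage rule is small enough to state as code. A sketch of the classification, with illustrative names:

```python
def classify(severity: str, confidence: int) -> str:
    """AUTO_FIX only for high-confidence, low-severity findings; everything else is ASK."""
    if confidence >= 9 and severity in ("minor", "nit"):
        return "AUTO_FIX"  # resolver applies without asking
    return "ASK"           # fixed, then surfaced to the user for judgment
```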

4. Spawn the Resolver (if issues found)


The resolver sees BOTH the original implementation AND the review. Its job is to produce a corrected version.
Agent config:
  • model: "sonnet"
    (match the reviewer's model)
Resolver prompt:
You are a senior engineer resolving code review feedback. You have two inputs:

1. ORIGINAL CODE:
{the original implementation}

2. REVIEW FEEDBACK:
{the reviewer's full response}

Your job:
- Fix every issue marked "critical" or "major"
- Fix "minor" issues unless the fix would add complexity disproportionate to the benefit
- Apply simplifications where the reviewer's suggestion is genuinely simpler
- Ignore "nit" level feedback unless trivial to address
- Issues marked AUTO_FIX (confidence 9+, severity minor/nit) should be fixed without discussion
- Issues marked ASK should be fixed but flagged clearly so the orchestrator can present them to the user
- Do NOT introduce new features or refactor beyond what the review requested

For each issue, either:
- FIXED: {show the fix}
- DECLINED: {explain why the reviewer's suggestion doesn't apply or would make things worse}

Then output the COMPLETE corrected code/output — not a diff, the full thing.
The orchestrator will use this to replace the original.

Write your response directly — do not write to any files.

5. Apply the resolution


Read the resolver's output. You (the orchestrator) apply the corrected code to disk.
Before applying, sanity-check:
  • Did the resolver address all critical/major issues?
  • Did the resolver break anything the original got right?
  • Are any "DECLINED" decisions reasonable?
Self-regulation gate: Track cumulative changes. If any of these trigger, STOP and flag to the user instead of applying:
  • The resolver modified >30% of the original artifact — "This artifact may need a redesign rather than incremental fixes."
  • The resolver addressed >10 findings in a single pass — too many changes at once increases regression risk.
  • The resolver's output introduces new issues that the reviewer didn't find in the original (regression) — "The resolver is making things worse. Stopping."
If the resolver's output looks good and passes the self-regulation gate, apply it.
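The self-regulation gate can be sketched as a pure check over simple counts; the function and parameter names here are illustrative, not a real API:

```python
def self_regulation_gate(changed_lines: int, total_lines: int,
                         findings_addressed: int, new_regressions: int):
    """Return a stop reason if the resolution should be escalated, else None."""
    if total_lines > 0 and changed_lines / total_lines > 0.30:
        return "This artifact may need a redesign rather than incremental fixes."
    if findings_addressed > 10:
        return "Too many changes at once increases regression risk."
    if new_regressions > 0:
        return "The resolver is making things worse. Stopping."
    return None  # gate passed: safe to apply
```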

6. Optional: Loop (for critical or complex code)


For high-stakes code (auth, payments, data migrations), run a second verification loop on the resolver's output. This catches issues the resolver might have introduced.
Max loops: 2. If the code isn't clean after 2 review cycles, stop and flag to the user.
Round 1: Implement → Review → Resolve
Round 2: Resolve output → Review → Resolve (if needed)
Done.
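The loop above reduces to a bounded cycle. A sketch of the control flow, where `review()` and `resolve()` stand in for spawning the reviewer and resolver agents described earlier; their signatures are illustrative only:

```python
def review_chain(artifact, review, resolve, max_loops=2):
    """Run up to max_loops review cycles; escalate if still not clean."""
    for _ in range(max_loops):
        verdict, feedback = review(artifact)
        if verdict == "PASS":
            return artifact, "PASS"
        artifact = resolve(artifact, feedback)
    # Not clean after max_loops cycles: flag to the user
    return artifact, "ESCALATE"
```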

7. Write the report


Write to `.agents/meta/review-chain-report.md`:

```markdown
---
skill: review-chain
version: 1
date: {YYYY-MM-DD}
status: final
---

# Review Chain Report

**Artifact:** {what was reviewed}
**Date:** {date}
**Rounds:** {how many review cycles}

## Verdict: {PASS | FIXED | CRITICAL}

## Issues Found

| # | Severity | Confidence | Location   | Problem            | Status               |
|---|----------|------------|------------|--------------------|----------------------|
| 1 | major    | 9/10       | file.ts:42 | Off-by-one in loop | Fixed                |
| 2 | minor    | 8/10       | file.ts:15 | Unused import      | Fixed                |
| 3 | nit      | 6/10       | file.ts:8  | Naming convention  | Declined (uncertain) |

## Input Quality Assessment

| Input                  | Rating                 | Evidence             |
|------------------------|------------------------|----------------------|
| Product/domain context | {Rich/Thin/Missing}    | {what was available} |
| Requirements clarity   | {Precise/Vague/Absent} | {source}             |
| Upstream artifacts     | {Fresh/Stale/None}     | {what existed}       |

## Simplifications Applied

{What was simplified and why}

## Changes Made

{Summary of what changed between original and final version}

## Reviewer's Summary

{The reviewer's overall assessment}

## Resolver's Notes

{Any "DECLINED" decisions and reasoning}
```

Next Step


If PASS: run `ship` to create a PR. If ISSUES_FOUND: resolve and re-run. If more than 2 cycles: escalate to the user.

8. Deliver results


Present to the user:
  • Verdict — PASS (clean) or FIXED (issues found and resolved) or CRITICAL (flagged)
  • Issue count — X issues found, Y fixed, Z declined
  • Key fix — the most important thing that was caught
  • File path to report

When to Trigger Automatically


Use review-chain proactively (without the user asking) when:
  • Writing security-sensitive code (auth, crypto, access control)
  • Writing data-mutation code (migrations, bulk updates, deletes)
  • The implementation was complex or you felt uncertain
  • The code handles money or PII
Do NOT auto-trigger for:
  • Trivial changes (typos, config tweaks, adding a log line)
  • Code the user explicitly said "just do it quick"
  • Read-only operations

Specialist Dispatch Mode (--thorough)


When invoked with `--thorough`, or when the code touches security/auth/payments/data-mutations, replace the single generalist reviewer with 3 specialist reviewers running in parallel:
Specialist roles:
| Specialist | Focus | What it catches that generalists miss |
|---|---|---|
| Security reviewer | Auth bypasses, injection, secrets, access control, input validation | Deep knowledge of attack patterns — doesn't just check "is there auth?" but "can the auth be bypassed?" |
| Performance reviewer | N+1 queries, unbounded loops, missing pagination, memory leaks, caching | Traces data flow through the call stack looking for scale problems |
| Correctness reviewer | Logic errors, edge cases, race conditions, error handling, type safety | Reads the code as a state machine — "what happens if X is null AND Y fails?" |
How it works:
  1. Spawn all 3 specialists in parallel with the same code and requirements. Each specialist uses the same prompt structure as the generalist reviewer (Section 2), but replace the "Review for:" instructions with the specialist's focus area. For example, the security reviewer gets: "Review ONLY for: auth bypasses, injection, secrets exposure, access control, input validation. Ignore style, naming, and performance."
  2. Each returns findings in the standard format (SEVERITY + CONFIDENCE + LOCATION + PROBLEM + FIX)
  3. Merge all findings, deduplicate (same location + same problem = one finding, keep higher confidence)
  4. Proceed to resolver with the merged findings
When to auto-escalate to specialist mode (without user asking):
  • Code modifies auth, sessions, or access control
  • Code handles payments or financial data
  • Code performs database migrations or bulk data mutations
  • Code processes PII or sensitive user data
  • Total diff exceeds 500 lines (sum of all files changed, not per-file)
Cost: 3x single reviewer cost. Still cheap relative to catching a production bug.
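Step 3 of the dispatch (merge and deduplicate) can be sketched as follows, assuming each finding is a dict with `location`, `problem`, and `confidence` keys; the shapes are illustrative:

```python
def merge_findings(specialist_reports):
    """Merge parallel specialist findings: same location + same problem is one
    finding, keeping whichever copy has the higher confidence."""
    best = {}
    for report in specialist_reports:
        for f in report:
            key = (f["location"], f["problem"])
            if key not in best or f["confidence"] > best[key]["confidence"]:
                best[key] = f
    return list(best.values())
```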

Scope Drift Detection


When `.agents/tasks.md` or `.agents/spec.md` exists, the reviewer adds a scope check:
After reviewing code quality, compare the implementation against the stated requirements:
  • Read `.agents/tasks.md` — are all tasks addressed? Are there changes that don't map to any task?
  • Read `.agents/spec.md` — does the implementation match the spec? Are there requirements that were missed or scope additions that weren't planned?
Report scope drift findings separately:
SCOPE DRIFT:
- MISSING: [requirement from spec/tasks not found in the code]
- UNPLANNED: [code change that doesn't map to any requirement — may be scope creep]
Scope drift findings are informational (not blocking) — the user decides if they're intentional.
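The MISSING/UNPLANNED split is a set difference in both directions. A sketch, treating requirements and implemented changes as sets of identifiers (an assumption for illustration):

```python
def scope_drift(required, implemented):
    """Informational scope check: requirements with no matching change (MISSING)
    and changes that map to no requirement (UNPLANNED)."""
    return {
        "MISSING": sorted(set(required) - set(implemented)),
        "UNPLANNED": sorted(set(implemented) - set(required)),
    }
```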

Configuration


| Parameter | Default | Description |
|---|---|---|
| model | sonnet | Model for reviewer and resolver |
| max_loops | 1 | Review cycles (set to 2 for critical code) |
| severity_threshold | minor | Minimum severity to fix (minor, major, critical) |
| auto_apply | true | Apply fixes automatically or show diff first |
| thorough | false | Use specialist dispatch (3 parallel reviewers) instead of generalist |
User can override: "review this with opus", "do 2 rounds of verification", or "review this thoroughly".

Cost Considerations


  • 1 round (reviewer + resolver) with sonnet: ~$0.10-0.20
  • 1 round with opus: ~$0.50-1.00
  • 2 rounds doubles the cost
  • Very cheap relative to the quality improvement — default to running 1 round for non-trivial code

Edge Cases


  • Reviewer finds no issues: PASS. Don't force a resolve step.
  • Reviewer hallucinates issues: The resolver catches this — if the "fix" doesn't make sense, the resolver DECLINES it. If both agents agree on a non-issue, you catch it in your sanity check.
  • Resolver introduces new bugs: This is why round 2 exists for critical code.
  • Reviewer and resolver disagree: You (the orchestrator) break the tie.
  • Code is too large: Split into logical chunks and review each separately. Don't send 2000 lines in one prompt.
  • Existing report: Overwrite `.agents/meta/review-chain-report.md` — these are ephemeral process artifacts, not archives.
  • Reviewer or resolver agent fails: If the reviewer crashes or returns garbage, retry once with the same prompt. If it fails again, fall back to your own review (single-agent mode). Note the failure in the report.
  • Architecture or design review (not code): Adjust the reviewer prompt — replace "code" references with "design" or "architecture". The 5 review categories still apply (Correctness, Edge cases, Simplification, Security, Consistency).

Output Files


| File | Description |
|---|---|
| .agents/meta/review-chain-report.md | Verification report with issues and resolutions |

Previous reports are overwritten — these are ephemeral quality tools, not archives.