review-loop

Review Loop

An iterative worker-reviewer cycle within a single session. You do the work, spawn a reviewer subagent to critique it, revise based on the feedback, and repeat until the quality gate is met.

Core principle: First drafts are never final. Iterative critique produces better output than a single pass.

Platform note: This skill works best with agents that support subagent spawning. On platforms without that capability, simulate the reviewer step by opening a fresh chat context, pasting only the work product (no prior reasoning), and asking it to score 1-10 with specific feedback.

Quick Start

Say:

"implement X, use review-loop"

"run review-loop on the file I just wrote"

The agent does the work (or reads existing work)
A separate critic subagent scores it 1-10 with specific feedback
The agent revises and repeats until score >= 8 (default quality gate)
A loop summary is delivered with the final output

When to Use

User says "use review-loop", "polish this", "iterate on this", "/review-loop"
Complex implementations where quality matters more than speed
Design docs, specs, or technical writing
Code that needs to be robust (security, data pipelines, financial logic)
When user wants adversarial critique baked into the process

When NOT to Use

Simple fixes, quick edits, one-liner changes
Tasks where tests are the quality gate (use TDD + CI instead)
When the user just wants it done fast
Exploratory/spike work where the goal is learning, not shipping
Tasks with no clear acceptance criteria — define those first, then use review-loop

Defaults

Setting	Default
Min loops	2
Max loops	4
Quality gate	8/10
Worker model	(your current model)
Reviewer model	(your current model or fast/balanced alternative)

Model Selection

The reviewer task evaluates logic and constraints. The worker task writes and modifies code. Pick subagent capabilities accordingly.

CRITICAL RULE: The Reviewer must always be EQUAL TO or MORE POWERFUL than the Worker. If the reviewer is weaker than the worker, it cannot properly critique complex logic or catch subtle regressions.

Task Complexity	Worker Capability	Reviewer Capability	Rationale
Simple/mechanical (CRUD, formatting, boilerplate)	Fast / Lightweight	Balanced	Lightweight worker is fast, balanced reviewer easily catches issues
Standard (features, refactors, docs)	Balanced	Balanced	Good mix of cost, speed, and quality
Complex (multi-file, integration, design)	Balanced	Advanced / Reasoning	Advanced reviewer catches subtle architecture issues; balanced worker can implement the fixes
Very complex (security, quant, distributed systems)	Advanced / Reasoning	Advanced / Reasoning	Both need full context and reasoning power. Reviewer MUST match worker power.

The golden rule: The worker must be smart enough to ACT on the reviewer's feedback. If the reviewer says "fix the race condition with a channel-based semaphore" and the worker can't reason about concurrency, the loop won't converge.

Escalation signal: If the score does not improve across 2 consecutive loops and the reviewer keeps giving the same feedback, the worker model is probably too weak for the task. Escalate:

Option A: Ask the user to switch to a more capable model
Option B: Lower the quality gate temporarily and note the gap to the user
Option C: Break the task into smaller sub-tasks and run review-loop on each
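
The escalation signal can be checked mechanically. The following is a minimal sketch, assuming each loop is recorded as a score plus a set of normalized issue summaries. The `LoopResult` type, the `is_stalled` name, and the exact-match comparison of feedback are illustrative assumptions, not part of the skill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopResult:
    """One reviewer pass (hypothetical record type for this sketch)."""
    score: int
    feedback: frozenset  # normalized issue summaries, e.g. {"race condition"}


def is_stalled(history, window=2):
    """Return True if the last `window` loops show no score gain
    and the reviewer repeated the same feedback in each of them."""
    if window < 2:
        raise ValueError("window must be at least 2")
    if len(history) < window:
        return False
    recent = history[-window:]
    # Score never went up from one loop to the next.
    no_gain = all(b.score <= a.score for a, b in zip(recent, recent[1:]))
    # Every loop in the window carries identical feedback.
    same_feedback = all(r.feedback == recent[0].feedback for r in recent)
    return no_gain and same_feedback
```

When `is_stalled` returns True, pick one of the options above rather than running another loop. In practice the feedback comparison would likely need fuzzy matching, since a reviewer rarely words the same issue identically twice.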

Default behavior: Since you (the main agent) ARE the worker, spawn a reviewer subagent that matches or exceeds your current capability based on the task:

Most tasks → Standard/Balanced reviewer
Specialized/hard tasks → Advanced/Reasoning reviewer
Quick checks → Fast/Lightweight reviewer (only if you are also acting as a lightweight worker)
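
The "reviewer is never weaker than the worker" rule reduces to a comparison between ordered tiers. A minimal sketch, assuming three capability tiers and the complexity labels from the table above; the `Tier` enum, the `SUGGESTED_REVIEWER` mapping, and `pick_reviewer` are hypothetical names introduced for illustration.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Capability tiers, ordered so comparisons reflect relative power."""
    FAST = 1
    BALANCED = 2
    ADVANCED = 3


# Suggested reviewer tier per task complexity, following the table above.
SUGGESTED_REVIEWER = {
    "simple": Tier.BALANCED,
    "standard": Tier.BALANCED,
    "complex": Tier.ADVANCED,
    "very_complex": Tier.ADVANCED,
}


def pick_reviewer(worker: Tier, complexity: str) -> Tier:
    """Pick a reviewer tier that is never weaker than the worker."""
    suggested = SUGGESTED_REVIEWER[complexity]
    # If the worker already outranks the suggestion, match the worker.
    return max(worker, suggested)
```

For example, `pick_reviewer(Tier.ADVANCED, "standard")` returns `Tier.ADVANCED`, because a balanced reviewer would be weaker than the worker it is critiquing.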

User Overrides

Users can override any default inline with their request. Parse these naturally:

"implement X with review-loop, quality gate 9, use advanced model for review, max 5 loops"
"polish this, 2 loops minimum, gate at 8"
"run review-loop with fast reviewer, max 2 loops"
"use review-loop, reasoning reviewer, quality gate 9, min 3 max 6"

Parsing rules:

"quality gate N" or "gate N" → quality_gate = N
"[model] reviewer" or "review with [model]" → reviewer model override
"max N loops" → max_loops = N
"min N loops" → min_loops = N
If user doesn't specify, use defaults
If user says "thorough" or "strict" → interpret as quality_gate 8, min_loops 2
If user says "quick" or "fast" → interpret as max_loops 2, quality_gate 6
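
The parsing rules above are meant to be applied by the agent in natural language, but they can be approximated with a few regular expressions. This is a rough sketch under stated assumptions: `LoopConfig` and `parse_overrides` are hypothetical names, the reviewer label `"default"` is a placeholder, and only the literal patterns listed above (plus "gate at N" and "min N" / "max N" without the word "loops") are recognized.

```python
import re
from dataclasses import dataclass

# Tier words used in this document; matched only next to "reviewer"/"review with".
TIERS = r"(fast|lightweight|balanced|standard|advanced|reasoning)"


@dataclass
class LoopConfig:
    min_loops: int = 2
    max_loops: int = 4
    quality_gate: int = 8
    reviewer_model: str = "default"  # placeholder label, not a real model name


def parse_overrides(request: str) -> LoopConfig:
    cfg = LoopConfig()
    text = request.lower()

    # "[model] reviewer" or "review with [model]" -> reviewer override.
    m = re.search(rf"\b{TIERS}\s+reviewer\b|\breview\s+with\s+{TIERS}\b", text)
    if m:
        cfg.reviewer_model = m.group(1) or m.group(2)
        # Drop the phrase so "fast reviewer" is not read as "fast" shorthand.
        text = text[:m.start()] + text[m.end():]

    # Shorthand words first, so explicit numbers below can override them.
    if re.search(r"\b(thorough|strict)\b", text):
        cfg.quality_gate, cfg.min_loops = 8, 2
    if re.search(r"\b(quick|fast)\b", text):
        cfg.max_loops, cfg.quality_gate = 2, 6

    m = re.search(r"\bgate(?:\s+at)?\s+(\d+)", text)
    if m:
        cfg.quality_gate = int(m.group(1))
    m = re.search(r"\bmax\s+(\d+)", text)
    if m:
        cfg.max_loops = int(m.group(1))
    m = re.search(r"\bmin\s+(\d+)", text)
    if m:
        cfg.min_loops = int(m.group(1))
    return cfg
```

Phrasings outside these patterns, such as "2 loops minimum" or "use advanced model for review", fall through to the defaults here, which is why the skill leaves the final interpretation to the agent.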

The Process

mermaid

flowchart TD
    A[Do the work] --> B[Spawn reviewer subagent]
    B --> C{Min loops met?}
    C -- no, always continue --> E[Revise based on feedback]
    C -- yes --> D{Score >= quality gate?}
    D -- yes --> F[Final polish pass]
    D -- no --> G{Max loops reached?}
    G -- yes, stop anyway --> F
    G -- no --> E
    E --> B
    F --> H([Done])

Step-by-Step

Step 1: Do the Work

Complete the task as you normally would. Write the code, create the spec, implement the feature. Don't hold back — produce your best first attempt.

Step 2: Spawn Reviewer Subagent

Use the Agent tool to dispatch a reviewer. The reviewer must:

Be a separate subagent (fresh context, no anchoring to your reasoning)
Receive only the work product (files, diffs) — not your thought process
Score 1-10 with specific, actionable feedback
Use a balanced/standard model by default (or an advanced reasoning model for complex/specialized tasks)

Reviewer prompt template:

You are a critical reviewer. Score the following work 1-10 and provide specific, actionable feedback.

What was done

{brief description of the task}

Review criteria

{task-specific criteria — what matters for THIS task}

Examples of well-written criteria:

For a REST API endpoint:
- Correct HTTP status codes used
- Input validation present on all parameters
- Auth enforced; no unauthenticated access
- No N+1 query patterns

For a design doc:
- Problem statement is unambiguous
- Alternatives considered with trade-offs
- No hand-waving on implementation complexity
- Success metrics are measurable

For a data pipeline:
- Idempotent (safe to re-run)
- Schema changes handled gracefully
- Failure modes documented
- PII handling addressed

Instructions

Read the work carefully
Score 1-10 where:
- 1-3: Fundamentally broken or missing major requirements
- 4-6: Works but has significant issues
- 7-8: Good, minor issues only
- 9-10: Excellent, ready to ship
List specific issues with file:line references where applicable
For each issue, explain WHY it matters and WHAT to fix
Do NOT be polite — be honest and direct
State your score clearly as "Score: N/10"

Files to review

{paste file contents or list file paths with relevant excerpts}
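
The template above can be filled in programmatically before it is handed to the reviewer subagent. A minimal sketch, assuming the three placeholders are supplied as a task description, a list of criteria, and a mapping of file paths to contents; `REVIEWER_TEMPLATE` and `build_reviewer_prompt` are hypothetical names, and the instruction wording is condensed from the template.

```python
REVIEWER_TEMPLATE = """\
You are a critical reviewer. Score the following work 1-10 and provide specific, actionable feedback.

What was done
{task}

Review criteria
{criteria}

Instructions
- Read the work carefully
- Score 1-10 (1-3 fundamentally broken, 4-6 works with significant issues, 7-8 good with minor issues, 9-10 ready to ship)
- List specific issues with file:line references where applicable
- For each issue, explain WHY it matters and WHAT to fix
- Do NOT be polite; be honest and direct
- State your score clearly as "Score: N/10"

Files to review
{files}
"""


def build_reviewer_prompt(task, criteria, files):
    """Fill the template with the work product only (no worker reasoning).

    task:     short description of what was done
    criteria: list of task-specific review criteria
    files:    dict mapping file path -> file contents or excerpt
    """
    criteria_text = "\n".join(f"- {c}" for c in criteria)
    files_text = "\n\n".join(f"{path}\n{content}" for path, content in files.items())
    return REVIEWER_TEMPLATE.format(task=task, criteria=criteria_text, files=files_text)
```

Keeping the function's inputs limited to the task, criteria, and files makes it harder to leak the worker's reasoning into the reviewer's context by accident.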

Step 3: Parse Feedback

From the reviewer response, extract:

Score (the number)
Issues (categorized as Critical / Important / Minor)
Specific fixes (what to change)

Report to the user:

Loop {N}/{max}: Score {X}/10
- {summary of key feedback points}
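
Extracting the score is straightforward because the reviewer is told to emit it as "Score: N/10". The sketch below also sorts issues by severity, but that part rests on an assumption: it expects the reviewer to prefix each issue line with `Critical:`, `Important:`, or `Minor:`, which the prompt template does not require. `parse_review` is a hypothetical helper name.

```python
import re


def parse_review(response: str):
    """Return (score, issues) from a reviewer response.

    issues maps each severity label to a list of issue descriptions.
    """
    m = re.search(r"Score:\s*(\d{1,2})\s*/\s*10", response)
    if m is None:
        raise ValueError("reviewer response has no 'Score: N/10' line")
    score = int(m.group(1))
    if not 1 <= score <= 10:
        raise ValueError(f"score out of range: {score}")

    issues = {"Critical": [], "Important": [], "Minor": []}
    for line in response.splitlines():
        # Assumed line shape: "- Critical: description" (bullet optional).
        hit = re.match(r"\s*[-*]?\s*(Critical|Important|Minor)\s*:\s*(.+)", line)
        if hit:
            issues[hit.group(1)].append(hit.group(2).strip())
    return score, issues
```

If the reviewer does not label severities, the issue lists stay empty and the agent has to categorize the feedback itself, as Step 3 describes.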

Step 4: Check Stop Conditions

In this order:

If loops completed < min_loops → continue (always)
If score >= quality_gate → stop, go to final polish
If loops completed >= max_loops → stop, go to final polish
Otherwise → revise and loop
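
The order of these checks matters: the minimum-loop rule wins even when the score already clears the gate. A minimal sketch of the four checks as a function, with the defaults from the Defaults table; `next_action` and its string return values are hypothetical names.

```python
def next_action(loops_completed, score, min_loops=2, max_loops=4, quality_gate=8):
    """Decide what to do after a reviewer pass, checking conditions in order."""
    if loops_completed < min_loops:
        return "revise"          # always continue until the minimum is met
    if score >= quality_gate:
        return "final_polish"    # gate met
    if loops_completed >= max_loops:
        return "final_polish"    # out of budget, stop anyway
    return "revise"
```

With the defaults, a score of 9 on the first loop still returns `"revise"`, since only one of the two required loops has run.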

Step 5: Revise

Address the reviewer's feedback. Fix Critical and Important issues. Minor issues are optional. Then go back to Step 2.

Important: Each revision should be targeted. Don't rewrite everything — fix what the reviewer flagged. Maintain a mental list of ALL prior feedback to avoid regressions.

Step 6: Final Polish

Once the loop exits (quality gate met or max loops hit):

Address any remaining minor issues if trivial
Verify the final output is coherent (no artifacts from revision cycles)
Report final score and loop count to user
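
Steps 2 through 6 can be sketched as a single driver loop. This is an illustration under assumptions, not the skill's implementation: `review` and `revise` are hypothetical callables standing in for the reviewer subagent call and the worker's revision, and the final polish is left to the caller.

```python
def run_review_loop(work, review, revise, min_loops=2, max_loops=4, quality_gate=8):
    """Run review/revise cycles until a stop condition is met.

    review(work)           -> (score, feedback)   # stands in for the subagent call
    revise(work, feedback) -> revised work        # stands in for the worker
    Returns (final_work, history), where history is a list of (score, feedback).
    """
    history = []
    while True:
        score, feedback = review(work)
        history.append((score, feedback))
        loops = len(history)
        # Stop only once the minimum is met AND the gate or the cap is hit.
        if loops >= min_loops and (score >= quality_gate or loops >= max_loops):
            break
        work = revise(work, feedback)
    return work, history
```

A usage example with stub callables: if `review` returns scores 6 and then 8, the loop revises once and exits after the second pass, and `history` holds both results for the final report.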

Adapting Reviewer Criteria by Task Type

Task Type	Reviewer Should Focus On
Code	Correctness, edge cases, error handling, readability, no security issues
Spec/Design	Completeness, feasibility, no hand-waving, implementability
Refactor	No behavior changes, no regressions, cleaner than before
Writing	Clarity, structure, audience-appropriate, no fluff
Bug fix	Root cause addressed, regression test exists, no side effects
Infrastructure / IaC	Idempotency, least privilege, no hardcoded secrets, destroy safety
Database migration	Reversibility, index strategy, data loss risk, performance at scale
API design	Backward compatibility, auth, versioning, error contract
Test suite	Edge case coverage, no test interdependency, meaningful assertions

Two Modes of Operation

Mode A: "Do and Review" (full cycle)

The user gives a task and asks for review-loop. You do the work AND run the review loop.

User: "Implement the caching layer. Use review-loop, quality gate 8."

You:
1. Implement caching layer
2. Spawn reviewer → Score 6, feedback: missing eviction, no TTL
3. Revise: add eviction + TTL
4. Spawn reviewer → Score 8, feedback: minor naming nit
5. Final polish, done

Mode B: "Review Existing" (review only)

The work already exists, written either by the user or by you. Just run the review loop on what exists.

User: "Run review-loop on the auth module I just wrote"

You:
1. Read the auth module
2. Spawn reviewer → Score 5, feedback: SQL injection, no rate limiting
3. Fix: parameterize queries, add rate limiter
4. Spawn reviewer → Score 8, approved
5. Done

Red Flags

Never:

Self-review instead of spawning a subagent (anchoring bias)
Skip a revision cycle because "the feedback is wrong" without justification
Inflate your own score ("I think this is actually a 9")
Continue past max_loops without user consent
Spawn the reviewer with your full reasoning/history (they must review the WORK, not your intent)

If reviewer is wrong:

Push back with evidence (show code/tests that disprove the feedback)
Skip that specific point in revision
Note it in your report to the user
Do NOT lower the quality bar to compensate

Example Report Format

--- Review Loop: {task name} ---

Loop 1/4: Score 6/10
  Reviewer: Missing input validation, no error handling for network timeout,
            function too long (80 lines).
  Action: Fixing all three issues.

Loop 2/4: Score 8/10
  Reviewer: Clean. Minor: variable name `d` could be more descriptive.
  Action: Quality gate met (8 >= 8). Final polish.

Result: 8/10 in 2 loops. Done.
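
A report in this shape can be assembled from the per-loop records. A minimal sketch, assuming each loop is stored as a `(score, reviewer_summary, action)` tuple; `render_report` is a hypothetical helper, and the layout simply mirrors the example above.

```python
def render_report(task, max_loops, loops):
    """Render the loop summary.

    loops: non-empty list of (score, reviewer_summary, action) tuples,
           one per completed loop, in order.
    """
    if not loops:
        raise ValueError("at least one completed loop is required")
    lines = [f"--- Review Loop: {task} ---", ""]
    for i, (score, reviewer, action) in enumerate(loops, start=1):
        lines.append(f"Loop {i}/{max_loops}: Score {score}/10")
        lines.append(f"  Reviewer: {reviewer}")
        lines.append(f"  Action: {action}")
        lines.append("")
    final_score = loops[-1][0]
    noun = "loop" if len(loops) == 1 else "loops"
    lines.append(f"Result: {final_score}/10 in {len(loops)} {noun}. Done.")
    return "\n".join(lines)
```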

Known Limitations

The reviewer subagent has no memory of prior loops — include prior feedback context explicitly in each reviewer prompt to avoid repeating resolved issues
Score inflation is possible if the reviewer prompt criteria are too vague — invest time in writing specific, measurable criteria
This skill does not replace human code review for security-critical or compliance-sensitive code; treat the output as a strong first pass
Loop convergence is not guaranteed if the task is underspecified — define clear acceptance criteria before starting
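
The first limitation can be worked around by appending earlier feedback to each new reviewer prompt. A minimal sketch, assuming prior rounds are kept as `(score, issues)` pairs; `with_prior_feedback` and the wording of the appended section are assumptions made for illustration.

```python
def with_prior_feedback(prompt, prior_rounds):
    """Append feedback from earlier loops so a fresh reviewer sees it.

    prior_rounds: list of (score, issues) pairs, oldest first,
                  where issues is a list of short issue descriptions.
    """
    if not prior_rounds:
        return prompt
    lines = ["Prior review feedback (check whether each item is now resolved):"]
    for n, (score, issues) in enumerate(prior_rounds, start=1):
        lines.append(f"Loop {n} (score {score}/10):")
        lines.extend(f"- {issue}" for issue in issues)
    return prompt + "\n\n" + "\n".join(lines)
```

Only the reviewer's own earlier findings are passed along. The worker's reasoning still stays out of the prompt, so the fresh-context rule from Step 2 is preserved.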

Cost and Speed

Each loop = 1 reviewer subagent call
Budget roughly 1–2x the time of a single implementation pass for a full 3-loop cycle
This is cheap compared to shipping buggy code, vague specs, or triggering a late-stage review cycle
Use your default/standard model for most reviews; only upgrade to advanced reasoning models for specialized domains (security audits, distributed systems, quant finance)

License

MIT — free to use, adapt, and redistribute with any AI tool or platform. Attribution appreciated. Contributions welcome.
