review
Review
Run an adversarial review in an isolated context. The critic must have no access to the current conversation history — this is what makes the critique independent, not which model is used.
Critic Selection
- Use the first available isolated critic in priority order:
  - Claude subagent (via Task tool or similar) with a clean context: preferred when no external MCP is configured; the same model is fine
  - codex:codex (Codex MCP): isolated by the MCP boundary
  - auto-review-loop-llm skill (any OpenAI-compatible endpoint)
  - auto-review-loop-minimax skill (MiniMax)
- Record which critic was used in RESEARCH.md Context alongside the score.
- If no isolated context can be established, say so and stop. Do not review within the same conversation context.
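The selection rule above can be sketched as a first-match scan over the priority list. This is a minimal sketch, not the skill's actual implementation: the `is_available` probe and the `claude-subagent` label are hypothetical, and the list order is assumed to be the priority order.

```python
from typing import Callable, Optional

# Priority order copied from the list above. "claude-subagent" is a
# hypothetical label for the Task-tool subagent with a clean context.
CRITIC_PRIORITY = [
    "claude-subagent",           # Task tool or similar
    "codex:codex",               # Codex MCP
    "auto-review-loop-llm",      # any OpenAI-compatible endpoint
    "auto-review-loop-minimax",  # MiniMax
]


def select_critic(is_available: Callable[[str], bool]) -> Optional[str]:
    """Return the first critic that can run in an isolated context, else None."""
    for name in CRITIC_PRIORITY:
        if is_available(name):
            return name
    return None


def start_review(is_available: Callable[[str], bool]) -> str:
    """Pick a critic, or stop rather than review in the same conversation."""
    critic = select_critic(is_available)
    if critic is None:
        raise RuntimeError("No isolated context can be established; stopping.")
    return critic
```

Raising instead of falling back mirrors the rule that a review must never run inside the current conversation context.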
Review Rubric (Fixed — Do Not Modify)
- Send the work to the critic with this exact rubric:
VERIFIABLE (graded on evidence, not opinion):
- Every claim maps to an experiment result in RESEARCH.md Context. Flag claims that don't.
- No hallucinated citations. Check every DOI and arXiv ID. Flag any that don't resolve.
- Ablations support stated conclusions. Check the numbers match.
SUBJECTIVE (graded by the critic):
- Clarity of problem statement (1–5)
- Novelty relative to cited work (1–5)
- Writing quality (1–5)
- The critic must return a structured result matching the schema in RUBRIC.md. See RUBRIC.md for field definitions, the PROCEED/REFINE/PIVOT decision rules, and the verbatim integrity instruction to include in every critic prompt.
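As an illustration of what a structured critic result might look like, here is a hedged sketch. The authoritative schema lives in RUBRIC.md and is not reproduced here. The field names `score`, `verifiable_failures`, and `recommended_action` are taken from the example in this document; the `subjective` mapping and its key names are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List

ACTIONS = {"PROCEED", "REFINE", "PIVOT"}
# Hypothetical key names for the three subjective rubric items.
SUBJECTIVE_KEYS = {"clarity", "novelty", "writing"}


@dataclass
class CriticResult:
    score: int
    verifiable_failures: List[str]
    recommended_action: str
    subjective: Dict[str, int] = field(default_factory=dict)

    def validate(self) -> None:
        """Reject results that cannot match the rubric as described above."""
        if self.recommended_action not in ACTIONS:
            raise ValueError(f"unknown action: {self.recommended_action}")
        for key, value in self.subjective.items():
            if key not in SUBJECTIVE_KEYS:
                raise ValueError(f"unknown subjective item: {key}")
            if not 1 <= value <= 5:
                raise ValueError(f"{key} must be in 1-5, got {value}")
```

Validating the result before acting on it keeps a malformed critic reply from silently driving the PROCEED/REFINE/PIVOT decision.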
Integrity Checks
- If the score rises between rounds without corresponding verifiable improvements (new experiments, fixed citations), flag it as potential reward hacking. Write a warning to RESEARCH.md Context.
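The integrity check above amounts to comparing two consecutive rounds. A minimal sketch, assuming each round records how many new experiments and fixed citations were added since the previous round (the `Round` type and its field names are hypothetical):

```python
from dataclasses import dataclass


@dataclass
class Round:
    score: float
    new_experiments: int = 0   # experiments added since the previous round
    fixed_citations: int = 0   # citations corrected since the previous round


def is_potential_reward_hacking(prev: Round, curr: Round) -> bool:
    """True when the score rose with no verifiable improvement behind it."""
    verifiable_improvement = curr.new_experiments > 0 or curr.fixed_citations > 0
    return curr.score > prev.score and not verifiable_improvement
```

For instance, a jump from 5 to 7 with zero new experiments and zero fixed citations would be flagged, while the same jump accompanied by one fixed citation would not.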
Limits
- Run at most 4 review rounds per session. After round 4, escalate to the user regardless of score.
- After each round, write the score and top weaknesses to RESEARCH.md Context.
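The two limits above can be sketched as a bounded loop. This is an illustration under stated assumptions: `run_round` and `write_context` are hypothetical callbacks (the latter standing in for whatever appends to RESEARCH.md Context), and stopping early on any recommendation other than REFINE is an assumption, since the decision rules live in RUBRIC.md.

```python
from typing import Callable, List, Tuple

MAX_ROUNDS = 4
# (score, top weaknesses, recommended action) for one review round
RoundResult = Tuple[float, List[str], str]


def run_session(
    run_round: Callable[[int], RoundResult],
    write_context: Callable[[str], None],
) -> str:
    """Run up to MAX_ROUNDS rounds, logging each one; escalate after the last."""
    for round_no in range(1, MAX_ROUNDS + 1):
        score, weaknesses, action = run_round(round_no)
        top = "; ".join(weaknesses) or "none"
        write_context(f"Round {round_no}: score={score}, top weaknesses: {top}")
        # Assumption: only REFINE asks for another round before the cap.
        if round_no < MAX_ROUNDS and action != "REFINE":
            return action
    # After round 4 the user decides, regardless of score.
    return "ESCALATE"
```

Passing the writer in as a callback keeps the loop testable without touching a real RESEARCH.md file.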
Example
Input: paper.md with 3 quantitative claims; 1 claim lacks a matching RESEARCH.md result.
Output: score=6/10, verifiable_failures=["Table 2 improvement not in Context"], recommended_action=REFINE.
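The verifiable part of this example can be sketched as a lookup from claims to recorded results. The claim labels other than "Table 2 improvement" and the result keys such as `exp-01` are hypothetical, and the sketch does not attempt to derive the score.

```python
from typing import Dict, List, Set


def unmatched_claims(claims: Dict[str, str], context_results: Set[str]) -> List[str]:
    """Return labels of claims whose result key is absent from Context."""
    return [label for label, key in claims.items() if key not in context_results]


# Three quantitative claims, each pointing at the result that should back it.
claims = {
    "Table 1 accuracy": "exp-01",
    "Table 2 improvement": "exp-02",
    "Figure 3 latency": "exp-03",
}
# Results actually recorded in RESEARCH.md Context (hypothetical keys).
context_results = {"exp-01", "exp-03"}

verifiable_failures = [
    f"{label} not in Context" for label in unmatched_claims(claims, context_results)
]
```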