Review


Run an adversarial review in an isolated context. The critic must have no access to the current conversation history. That isolation, not the choice of model, is what makes the critique independent.

Critic Selection


  1. Use the first available isolated critic, in priority order:
    • Claude subagent with a clean context (via the Task tool or similar). Preferred when no external MCP is configured; using the same model is fine.
    • codex:codex (Codex MCP), isolated by the MCP boundary
    • auto-review-loop-llm skill (any OpenAI-compatible endpoint)
    • auto-review-loop-minimax skill (MiniMax)
  2. Record which critic was used, alongside the score, in the Context section of RESEARCH.md.
  3. If no isolated context can be established, tell the user and stop. Do not review within the same conversation context.
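The selection rule can be sketched as a first-match search that refuses to fall back to a same-context review. This is a minimal illustration, not part of the skill: the string labels, the `is_available` probe, and the `NoIsolatedCritic` exception are hypothetical, since how a harness detects a Task tool or a configured MCP server varies. The sketch follows the listed order literally.

```python
from typing import Callable, Sequence

# Priority order copied from the list above; the labels themselves are illustrative.
CRITIC_PRIORITY = (
    "claude-subagent",           # clean-context subagent via Task or similar
    "codex:codex",               # Codex MCP
    "auto-review-loop-llm",      # any OpenAI-compatible endpoint
    "auto-review-loop-minimax",  # MiniMax
)


class NoIsolatedCritic(RuntimeError):
    """Raised when no isolated review context can be established."""


def select_critic(is_available: Callable[[str], bool],
                  priority: Sequence[str] = CRITIC_PRIORITY) -> str:
    """Return the first critic whose (hypothetical) availability probe succeeds.

    There is deliberately no fallback to reviewing in the current conversation.
    """
    for name in priority:
        if is_available(name):
            return name
    raise NoIsolatedCritic(
        "No isolated critic available; tell the user and stop instead of "
        "reviewing in the current conversation context."
    )


# Usage: only the Codex MCP is configured in this hypothetical session.
configured = {"codex:codex"}
print(select_critic(lambda name: name in configured))  # prints codex:codex
```

Raising instead of returning a default keeps step 3 explicit: the caller has to handle the "no isolated critic" case rather than silently reviewing in place.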

Review Rubric (Fixed — Do Not Modify)


  1. Send the work to the critic with this exact rubric:
     VERIFIABLE (graded on evidence, not opinion):
       • Every claim maps to an experiment result in RESEARCH.md Context. Flag claims that don't.
       • No hallucinated citations. Check every DOI and arXiv ID. Flag any that don't resolve.
       • Ablations support stated conclusions. Check the numbers match.
     SUBJECTIVE (graded by the critic):
       • Clarity of problem statement (1–5)
       • Novelty relative to cited work (1–5)
       • Writing quality (1–5)
  2. The critic must return a structured result matching the schema in RUBRIC.md. See RUBRIC.md for field definitions, the PROCEED/REFINE/PIVOT decision rules, and the verbatim integrity instruction to include in every critic prompt.
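For the citation check, a critic first needs the list of identifiers to resolve. The sketch below only extracts them; it is an assumption about one reasonable first pass, not the skill's prescribed method. Resolution itself (for example requesting https://doi.org/<DOI> or https://arxiv.org/abs/<ID> and flagging failures) is left to the critic's tools. The DOI pattern covers modern `10.<registrant>/<suffix>` DOIs; old-style arXiv IDs such as `hep-th/9901001` are not matched.

```python
import re
from typing import Dict, List

# Modern-DOI pattern (10.<registrant>/<suffix>), in the style of Crossref's guidance.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")
# New-style arXiv IDs (YYMM.NNNN or YYMM.NNNNN, optional vN), only when prefixed by
# "arXiv:" or an arxiv.org/abs/ URL, so ordinary decimals are not picked up.
ARXIV_RE = re.compile(r"(?:arXiv:|arxiv\.org/abs/)(\d{4}\.\d{4,5}(?:v\d+)?)",
                      re.IGNORECASE)


def extract_citation_ids(text: str) -> Dict[str, List[str]]:
    """Collect unique DOIs and arXiv IDs in order of first appearance."""
    dois: List[str] = []
    for match in DOI_RE.finditer(text):
        doi = match.group(0).rstrip(".,;:")  # drop trailing sentence punctuation
        if doi not in dois:
            dois.append(doi)
    arxiv_ids: List[str] = []
    for match in ARXIV_RE.finditer(text):
        if match.group(1) not in arxiv_ids:
            arxiv_ids.append(match.group(1))
    return {"doi": dois, "arxiv": arxiv_ids}


sample = ("See doi:10.1000/xyz123. Compare arXiv:2106.09685v2 "
          "and https://arxiv.org/abs/1706.03762.")
print(extract_citation_ids(sample))
# prints {'doi': ['10.1000/xyz123'], 'arxiv': ['2106.09685v2', '1706.03762']}
```

Requiring an explicit `arXiv:` or URL prefix trades recall for precision: a bare `2106.09685` in running text is ignored rather than misread as a citation.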

Integrity Checks


  1. If the score rises between rounds without corresponding verifiable improvements (new experiments, fixed citations), flag it as potential reward hacking and write a warning to the Context section of RESEARCH.md.
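The integrity rule compares two consecutive rounds. A minimal sketch, assuming each round can report how many new experiment results and fixed citations it introduced; the `RoundSummary` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RoundSummary:
    """Hypothetical per-round record; the field names are illustrative."""
    score: float
    new_experiments: int = 0   # results added to the Context section since last round
    fixed_citations: int = 0   # citations repaired since last round


def reward_hacking_warning(prev: RoundSummary, curr: RoundSummary) -> Optional[str]:
    """Return a warning if the score rose with no verifiable improvement, else None."""
    score_rose = curr.score > prev.score
    verifiable_progress = curr.new_experiments > 0 or curr.fixed_citations > 0
    if score_rose and not verifiable_progress:
        return (f"WARNING: score rose {prev.score} -> {curr.score} with no new "
                "experiments or fixed citations; possible reward hacking.")
    return None


# Usage: the prose was polished, nothing verifiable changed, yet the score went up.
print(reward_hacking_warning(RoundSummary(6), RoundSummary(7)))
```

A flat or falling score never triggers the warning; only an unexplained rise does.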

Limits


  1. Run at most 4 review rounds per session. After round 4, escalate to the user regardless of score.
  2. After each round, write the score and the top weaknesses to the Context section of RESEARCH.md.
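The two limits combine into a bounded loop. This is a sketch under stated assumptions: `review_once` is a hypothetical critic call returning (score, top weaknesses, recommended action), a list stands in for the Context section of RESEARCH.md, and ending early on PROCEED is assumed, since the real decision rules live in RUBRIC.md. Round 4 escalates even on PROCEED, reading "regardless of score" literally.

```python
from typing import Callable, List, Tuple

MAX_ROUNDS = 4  # per-session cap from the Limits section

# Hypothetical critic call: round number -> (score, top weaknesses, recommended action).
ReviewFn = Callable[[int], Tuple[float, List[str], str]]


def run_review_session(review_once: ReviewFn, context_log: List[str]) -> Tuple[str, int]:
    """Run at most MAX_ROUNDS rounds, logging score and weaknesses after each one."""
    for round_no in range(1, MAX_ROUNDS + 1):
        score, weaknesses, action = review_once(round_no)
        # Stand-in for appending to the Context section of RESEARCH.md.
        context_log.append(
            f"Round {round_no}: score={score}; top weaknesses: {'; '.join(weaknesses) or 'none'}"
        )
        # Early exit is an assumption; the final round always escalates.
        if action == "PROCEED" and round_no < MAX_ROUNDS:
            return "proceed", round_no
    return "escalate_to_user", MAX_ROUNDS


log: List[str] = []
outcome = run_review_session(lambda r: (5.0, ["baseline missing"], "REFINE"), log)
print(outcome, len(log))  # prints ('escalate_to_user', 4) 4
```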

Example


Input: paper.md with 3 quantitative claims; 1 claim lacks a matching RESEARCH.md result.
Output: score=6/10, verifiable_failures=["Table 2 improvement not in Context"], recommended_action=REFINE.
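The example can be reproduced with a small result type and a claim-to-result check. RUBRIC.md defines the actual schema, so the `ReviewResult` class below is a hypothetical shape built only from the fields visible in the example. The claim labels and result IDs (`exp-01` and so on) are also invented for illustration. The score is taken as given from the critic, not computed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

ACTIONS = {"PROCEED", "REFINE", "PIVOT"}


@dataclass
class ReviewResult:
    """Hypothetical result shape built from the example's fields; RUBRIC.md is authoritative."""
    score: int                       # reported by the critic (6 on a 10-point scale in the example)
    recommended_action: str
    verifiable_failures: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.recommended_action not in ACTIONS:
            raise ValueError(f"unknown action: {self.recommended_action}")


def unmatched_claims(claims: Dict[str, str], context_results: Set[str]) -> List[str]:
    """Flag claims whose supporting result ID is absent from the Context section."""
    return [f"{claim} not in Context"
            for claim, result_id in claims.items()
            if result_id not in context_results]


# The example: three quantitative claims, one without a matching result.
claims = {"Table 1 accuracy": "exp-01",
          "Table 2 improvement": "exp-02",
          "Ablation A": "exp-03"}
context = {"exp-01", "exp-03"}
result = ReviewResult(score=6, recommended_action="REFINE",
                      verifiable_failures=unmatched_claims(claims, context))
print(result.verifiable_failures)  # prints ['Table 2 improvement not in Context']
```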