self-review

Auto-loop dev-stage self-review on the current branch. Runs `codex` (OpenAI GPT) cross-model review following pr-review's multi-role methodology, applies fixes per-finding, re-reviews, and loops until findings converge or a stop condition fires.
Single-shot mode (`review-only`) is available as opt-in: it produces findings without auto-fix.
<HARD-GATE> This skill auto-modifies code on the current branch. Constraints:
Loop semantics:
  • Per-finding atomic commits — every fix gets its own conventional commit, easy to revert
  • Stop conditions are NON-NEGOTIABLE (see Stop Conditions below) — when one fires, the loop STOPS and the remaining work surfaces to the user, period
  • User can interrupt anytime with ctrl-c; main session catches and reports partial state
Context-mix prevention:
  • Codex runs in a fresh process every iteration — no shared conversation context with main session
  • Codex output (verbose finding text) goes to a temp file, NOT into chat — only a structured JSON summary enters main session memory, per-iteration
  • Main session implementing fixes MUST execute the codex finding's `Mitigation:` field literally, mechanically — NO additional reasoning about whether the fix is right, NO substitution of "a better approach", NO scope expansion
  • If you cannot fix the finding from the `Mitigation:` field alone (because it's ambiguous, requires design judgment, or touches code outside the scope of this finding) → SKIP that finding for this iter and surface it to user at end
Verdict-reasoning forbidden:
  • This skill does not produce wontfix decisions. Findings are either fixed (auto) or surfaced (escalation).
  • Do NOT decide "this finding doesn't matter" → all findings are fixed unless they hit the "cannot fix from Mitigation alone" gate above
  • Do NOT fill out Wontfix Template fields from main-session memory (see `pr-babysit` § 4.6)
Quality gate:
  • Tests run after each iter's fixes: `pnpm test` (or the detected per-repo command). Failure → STOP and escalate
  • This is the natural safety net for "fixes that break things"
Author bias on fix step:
  • Codex (cross-model) generates findings → bias-isolated at finding-generation layer
  • Main session writes fix → not bias-isolated, but follows codex Mitigation literally → bias attack surface is "mechanical execution accuracy", not "verdict judgment"
  • If you catch yourself reasoning "I think codex is wrong about this" → that's verdict reasoning, NOT allowed in this skill. Surface to user at end. </HARD-GATE>

Modes

  • `loop` (default): codex review → per-finding fix + commit → tests → re-review → loop until stop
  • `review-only`: single codex pass, present findings, stop. No fix, no commit, no test run. Use when you want a manual review without auto-modification.
Mode selection:
  • User says "self review" or "review my branch" → `loop` (default)
  • User says "just show me findings" or "self review without fixing" → `review-only`
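The mode-selection rule can be sketched as a small shell function. This is only an illustration: `select_mode` is a hypothetical helper, and the phrase list is taken from the examples above rather than from any real matcher in the skill.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a user request to a self-review mode.
# The phrases mirror the examples in the text; real matching may be looser.
select_mode() {
  local request
  # Lowercase the request so matching is case-insensitive.
  request=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$request" in
    *"just show me findings"*|*"without fixing"*) echo "review-only" ;;
    *) echo "loop" ;;   # default mode
  esac
}

select_mode "self review"                 # prints: loop
select_mode "self review without fixing"  # prints: review-only
```

Anything that does not explicitly ask for findings only falls through to `loop`, which matches the stated default.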

When to use

  • Just wrote code on a feature branch, working tree is clean and committed, want auto cross-model review + fix before push
  • Iterating on a feature, want to converge on a clean state quickly
  • Pre-PR sanity pass — catch the obvious stuff codex spots before sending to humans

When NOT to use

  • Live PR review with sticky comment + inline threads → `/cadence:pr-review` (default mode)
  • PR babysit work after PR is open → `/cadence:pr-babysit` (handles thread reply, dedup, CI gates)
  • Working tree has uncommitted changes — STOP and ask user to commit or stash first. Loop assumes clean working tree at start so per-finding commits are atomic
  • You want manual verdict control → use `mode=review-only` and decide yourself
  • First-time use on a new branch → consider `mode=review-only` first to see what codex produces before letting it auto-fix

Setup check

Run these checks at start. STOP on any failure with the listed message — do NOT attempt auto-install.

```bash
# Codex CLI
codex --version 2>/dev/null || { echo "STOP: Install codex — npm install -g @openai/codex"; exit 1; }
codex login --status 2>/dev/null || { echo "STOP: Run 'codex login' to authenticate"; exit 1; }

# Plugin install root: needed so codex can locate the pr-review methodology prompts
[ -n "${CLAUDE_PLUGIN_ROOT:-}" ] && [ -d "${CLAUDE_PLUGIN_ROOT}/skills/pr-review" ] || {
  echo "STOP: CLAUDE_PLUGIN_ROOT must point at cadence's install root (so codex can read ./skills/pr-review/*-prompt.md). Set it explicitly if your harness doesn't export it (e.g. export CLAUDE_PLUGIN_ROOT=~/.claude/plugins/cache/cadence/cadence/<version>)."
  exit 1
}

# Clean working tree
git diff --quiet && git diff --cached --quiet || { echo "STOP: Working tree has uncommitted changes. Commit or stash, then re-invoke."; exit 1; }

# Detect base branch
BASE=$(git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's|refs/remotes/origin/||')
[ -z "$BASE" ] && BASE=main
echo "BASE: origin/$BASE"

# Detect test command (best-effort, repo-specific)
if [ -f package.json ] && jq -e '.scripts.test' package.json >/dev/null 2>&1; then
  TEST_CMD="pnpm test"
elif [ -f Makefile ] && grep -q '^test:' Makefile; then
  TEST_CMD="make test"
else
  TEST_CMD=""
  echo "WARN: No test command detected — quality gate disabled"
fi
echo "TEST_CMD: ${TEST_CMD:-<none>}"
```

Loop algorithm (mode=loop)

Maintain in main session memory:
  • `ITER`: current iteration number (starts at 1)
  • `MAX_ITERS = 3`: hard cap. Empirically codex's high-value findings (real reachable bugs, the kind a same-model reviewer misses) land in iters 1–3; iters 4–5 trend to hygiene-tier nits + the occasional false positive that the controller then has to spend judgment rejecting. The real stop is SC0 (severity floor) below; `MAX_ITERS` is the blunt backstop.
  • `FINDING_HISTORY`: list of `file:line:slug` fingerprints from prior iterations (for repetition detection)

Step 1: Run codex review

Same prompt construction as Step 2 of `mode=review-only` (see Codex Prompt section below). Output goes to `/tmp/self-review-iter-$ITER.md`. Do NOT inline the verbose codex output into main session — only parse the JSON summary block.
Codex prompt MUST end with a summary JSON block (described in Codex Prompt section below) so main session can drive the loop without parsing prose.

```bash
PROMPT_FILE=$(mktemp /tmp/self-review-prompt-XXXXXX.md)
OUTPUT_FILE="/tmp/self-review-iter-$ITER.md"
JSON_FILE="/tmp/self-review-iter-$ITER.json"

# Write prompt (see Codex Prompt section)
write_codex_prompt > "$PROMPT_FILE"

# Run codex (5-min timeout); stdout goes to the per-iteration output file
_REPO_ROOT=$(git rev-parse --show-toplevel)
codex exec "$(cat "$PROMPT_FILE")" \
  -C "$_REPO_ROOT" \
  -s read-only \
  -c 'model_reasoning_effort="high"' \
  --enable web_search_cached \
  > "$OUTPUT_FILE" 2>/tmp/self-review-err

# Extract JSON summary block between unique sentinels
sed -n '/<!-- SELF-REVIEW-JSON-START -->/,/<!-- SELF-REVIEW-JSON-END -->/{
  /<!-- SELF-REVIEW-JSON-/d
  p
}' "$OUTPUT_FILE" > "$JSON_FILE"

# Guard: codex must emit a parseable JSON block; empty or malformed is NOT zero findings
if [ ! -s "$JSON_FILE" ]; then
  echo "STOP: codex did not emit a JSON summary block. Output saved to $OUTPUT_FILE for manual review."
  break
fi
if ! jq -e '.findings | type == "array"' "$JSON_FILE" >/dev/null 2>&1; then
  echo "STOP: codex JSON summary malformed. Output: $OUTPUT_FILE, JSON: $JSON_FILE"
  break
fi

# Sanity: confirm line_source field is "source" on every finding (catches diff-line confusion)
BAD_LINE_SOURCE=$(jq -r '[.findings[] | select(.line_source != "source")] | length' "$JSON_FILE")
if [ "$BAD_LINE_SOURCE" != "0" ]; then
  echo "STOP: codex emitted findings with line_source != 'source' (likely diff-line numbers, not source-file lines). Review $OUTPUT_FILE manually."
  break
fi
```
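To see what the sentinel extraction does, here is a small self-contained run against a made-up codex report. The helper name `extract_summary` and the sample text are assumptions for illustration; only the `sed` expression comes from the step above.

```bash
#!/usr/bin/env bash
# Sketch: pull the JSON summary out of a codex output file by sentinel markers.
# The sample report below is invented for illustration.
extract_summary() {
  # Print only the lines strictly between the START and END sentinels.
  sed -n '/<!-- SELF-REVIEW-JSON-START -->/,/<!-- SELF-REVIEW-JSON-END -->/{
    /<!-- SELF-REVIEW-JSON-/d
    p
  }' "$1"
}

sample=$(mktemp)
cat > "$sample" <<'EOF'
Security
No findings.
<!-- SELF-REVIEW-JSON-START -->
{"findings": []}
<!-- SELF-REVIEW-JSON-END -->
EOF

extract_summary "$sample"   # prints: {"findings": []}
rm -f "$sample"
```

If the sentinels are missing, the function prints nothing, which is the case the `-s "$JSON_FILE"` guard is meant to catch.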

Step 2: Convergence check

Read findings from `$JSON_FILE`. Apply stop conditions IN ORDER — first match wins:

```bash
FINDINGS_COUNT=$(jq '.findings | length' "$JSON_FILE")
```

SC1 — Success: `FINDINGS_COUNT == 0` → STOP, report success.

SC0 — Severity floor (converged-enough): an iteration whose findings are ALL hygiene-tier → STOP, report as converged. "Hygiene-tier" = no finding is both (`severity` in `Blocker|Factual`) AND (`justification` in `Reachable|Asymmetric`). I.e. nothing left that is a real, reachable bug — only `Suggestion` / `Question`, or `Factual` findings resting on `Precedent` / `Historical` speculation.

```bash
REAL_BUGS=$(jq -r '[.findings[]
  | select((.severity == "Blocker" or .severity == "Factual")
           and (.justification == "Reachable" or .justification == "Asymmetric"))]
  | length' "$JSON_FILE")
```

`REAL_BUGS == 0` (with `FINDINGS_COUNT > 0`) → STOP. The remaining hygiene findings are surfaced to the user, not auto-fixed — they are below the bar that justifies another codex round. This is the intended stop in a healthy run; it usually fires at iter 2–3. `MAX_ITERS` only catches runs where codex keeps producing real-bug findings that far out (itself a signal the change is too big and should be split).

Why SC0 is checked before SC2 but after SC1: zero findings is unambiguous success; a hygiene-only iteration is converged-enough success; the `MAX_ITERS` cap is the unhappy backstop. Naming it SC0 keeps it visually adjacent to SC1 (both success-class) without renumbering SC2–SC6.

SC2 — Cap reached: `ITER > MAX_ITERS` → STOP, escalate with current findings. Do NOT apply more fixes.
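The ordering of the first three stop conditions can be sketched as one function. `first_stop_condition` is a hypothetical helper, and it takes the counts as plain arguments instead of reading `$JSON_FILE`, so the sketch runs without `jq`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decide which of SC1, SC0, SC2 fires, in the documented order.
# Arguments: findings_count real_bugs iter max_iters
first_stop_condition() {
  local findings_count=$1 real_bugs=$2 iter=$3 max_iters=$4
  if [ "$findings_count" -eq 0 ]; then
    echo "SC1"        # zero findings: success
  elif [ "$real_bugs" -eq 0 ]; then
    echo "SC0"        # only hygiene-tier findings remain: converged enough
  elif [ "$iter" -gt "$max_iters" ]; then
    echo "SC2"        # cap reached with real bugs still open: escalate
  else
    echo "continue"   # fall through to SC3 and the later checks
  fi
}

first_stop_condition 0 0 1 3   # prints: SC1
first_stop_condition 4 0 2 3   # prints: SC0
first_stop_condition 2 1 4 3   # prints: SC2
first_stop_condition 2 1 2 3   # prints: continue
```

Because SC0 is tested before SC2, a hygiene-only iteration past the cap still reports as converged rather than as an escalation.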
SC3 — Same finding 3x (race-of-race signal):

```bash
# Build current iter's fingerprints
CURR_FPS=$(jq -r '.findings[] | "\(.file):\(.line):\(.slug)"' "$JSON_FILE")

# For each fingerprint, count occurrences in HISTORY + this iter
for FP in $CURR_FPS; do
  COUNT=$(echo "$FINDING_HISTORY" | grep -cF "$FP" || true)
  COUNT=$((COUNT + 1))  # current iter
  if [ "$COUNT" -ge 3 ]; then
    REPEATED="$FP"
    break
  fi
done
```

If `REPEATED` non-empty → STOP, escalate. Same finding has fired 3+ times across iterations; auto-fix is not converging. Reference `pr-babysit` § 4.5 Gate B for the equivalent convergence-failure pattern.
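A worked example of the SC3 count, using made-up fingerprints. `count_with_current` is a hypothetical helper, and the sketch assumes the history holds exactly one fingerprint per line, which is what makes whole-line matching (`-x`) safe here.

```bash
#!/usr/bin/env bash
# Sketch of SC3 counting. Assumes one fingerprint per line in the history.
count_with_current() {
  local history=$1 fp=$2 prior
  # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
  prior=$(printf '%s\n' "$history" | grep -cFx -- "$fp" || true)
  echo $((prior + 1))   # +1 for the current iteration
}

# Invented history: the db.ts finding already appeared in two earlier iterations.
history=$'src/db.ts:42:missing-nil-check\nsrc/api.ts:10:unvalidated-input\nsrc/db.ts:42:missing-nil-check'

count_with_current "$history" "src/db.ts:42:missing-nil-check"    # prints: 3
count_with_current "$history" "src/api.ts:10:unvalidated-input"   # prints: 2
```

A result of 3 or more is the point at which SC3 would stop the loop and escalate.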

**SC3.5 — File-only fallback for slug drift**: codex may use a different slug each iter for the same logical issue (`missing-nil-check` → `null-dereference-guard` → `null-check-omitted`). Exact fingerprint match would miss this. Track per-file appearance count across iters; if the same file has produced findings in 3+ consecutive iters → STOP, surface as warning:

```bash
# Per-iter file set (just file names, dedup)
CURR_FILES=$(jq -r '.findings[].file' "$JSON_FILE" | sort -u)
echo "$CURR_FILES" > "/tmp/self-review-iter-$ITER.files"

# Count files that appear in current iter AND last two iters' file lists
if [ "$ITER" -ge 3 ]; then
  PREV1="/tmp/self-review-iter-$((ITER - 1)).files"
  PREV2="/tmp/self-review-iter-$((ITER - 2)).files"
  if [ -s "$PREV1" ] && [ -s "$PREV2" ]; then
    PERSISTENT=$(grep -Fxf "$PREV1" "/tmp/self-review-iter-$ITER.files" | grep -Fxf "$PREV2" | head -3)
    if [ -n "$PERSISTENT" ]; then
      echo "STOP: file(s) producing findings 3+ consecutive iters — possible slug drift hiding stuck finding:"
      echo "$PERSISTENT"
      break
    fi
  fi
fi
```

This is an ADDITIVE signal — doesn't replace SC3's exact fingerprint check. SC3 catches lexical convergence failure; SC3.5 catches semantic convergence failure that slug drift hides.
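The three-way file intersection behind SC3.5 can be tried on throwaway data. `persistent_files` is a hypothetical helper and the file lists are invented; the two chained `grep -Fxf` calls are the same technique the check uses.

```bash
#!/usr/bin/env bash
# Sketch: which paths appear in all three per-iteration file lists?
# Each argument is a file containing one path per line.
persistent_files() {
  # First grep keeps lines of $1 that also occur in $2; second keeps those also in $3.
  grep -Fxf "$2" "$1" | grep -Fxf "$3"
}

dir=$(mktemp -d)
printf '%s\n' src/db.ts src/api.ts > "$dir/iter1.files"
printf '%s\n' src/db.ts src/ui.ts  > "$dir/iter2.files"
printf '%s\n' src/db.ts src/cli.ts > "$dir/iter3.files"

persistent_files "$dir/iter3.files" "$dir/iter2.files" "$dir/iter1.files"   # prints: src/db.ts
rm -rf "$dir"
```

Only `src/db.ts` survives both filters, so it is the file that would be reported as possibly hiding a stuck finding.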

**SC4 — Findings diverging**: fires on EITHER of two signals (race-of-race detection, cf `pr-babysit` § 4.5 Gate B):

(a) **Count growing**: `FINDINGS_COUNT` strictly larger than previous iter's count → STOP, escalate. The fix step is introducing new issues faster than it resolves them.

(b) **Set replacement**: `FINDINGS_COUNT >= 3` AND zero fingerprint overlap between current iter and previous iter (`|current ∩ prev_iter| == 0`) → STOP, escalate. Count unchanged but the WHOLE finding set turned over — fixes are opening completely new surfaces. Count-only check misses this.

```bash
PREV_FPS_FILE="/tmp/self-review-iter-$((ITER - 1)).fps"
CURR_FPS_FILE="/tmp/self-review-iter-$ITER.fps"
echo "$CURR_FPS" > "$CURR_FPS_FILE"

if [ "$ITER" -gt 1 ] && [ "$FINDINGS_COUNT" -ge 3 ] && [ -s "$PREV_FPS_FILE" ]; then
  OVERLAP=$(grep -Fxf "$PREV_FPS_FILE" "$CURR_FPS_FILE" | wc -l | tr -d ' ')
  if [ "$OVERLAP" = "0" ]; then
    echo "STOP: set-replacement divergence (iter $((ITER-1)) and iter $ITER share zero findings)"
    break
  fi
fi
```
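The snippet above covers signal (b) only. Signal (a), the count-growing check, could look roughly like this; `count_growing` is a hypothetical helper, and how the previous iteration's count is stored is an assumption that the text does not specify.

```bash
#!/usr/bin/env bash
# Hypothetical check for SC4(a): did the finding count grow since the last iteration?
# Arguments: previous_count current_count. Exit status 0 means "diverging".
count_growing() {
  [ "$2" -gt "$1" ]
}

if count_growing 3 5; then
  echo "STOP: findings grew from 3 to 5"
fi
count_growing 3 3 || echo "count unchanged: check set replacement next"
```

An unchanged count does not trip (a), which is why the set-replacement check exists as a second signal.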

Step 3: Apply fixes per finding

If no stop condition fired, iterate through findings and apply each as an atomic commit:

```bash
# Feed the loop via process substitution rather than a pipe, so the
# SKIPPED_FINDINGS array is updated in the current shell, not a subshell.
while read -r FINDING; do
  ID=$(echo "$FINDING" | jq -r '.id')
  PERSONA=$(echo "$FINDING" | jq -r '.persona')
  CATEGORY=$(echo "$FINDING" | jq -r '.category')
  SLUG=$(echo "$FINDING" | jq -r '.slug')
  FILE=$(echo "$FINDING" | jq -r '.file')
  LINE=$(echo "$FINDING" | jq -r '.line')
  FAILURE_MODE=$(echo "$FINDING" | jq -r '.failure_mode')
  MITIGATION=$(echo "$FINDING" | jq -r '.mitigation')

  # Main session reads the full finding from OUTPUT_FILE for context
  # then implements MITIGATION literally
  apply_minimal_fix_from_mitigation "$FILE" "$LINE" "$MITIGATION"

  # Verify file changed
  if git diff --quiet "$FILE"; then
    SKIPPED_FINDINGS+=("$ID: no diff produced — mitigation may need design judgment")
    continue
  fi

  # Atomic commit
  git add "$FILE"
  git commit -m "fix($PERSONA): $SLUG — self-review iter $ITER #$ID

Failure mode: $FAILURE_MODE
Mitigation: $MITIGATION

Source: codex review following pr-review methodology
Category: $CATEGORY
"
done < <(jq -c '.findings[]' "$JSON_FILE")
```
Per-finding fix rules (HARD-GATE reinforcement):
  • Read the codex `Mitigation:` field for this finding. Implement it MINIMALLY.
  • Do NOT expand scope (don't refactor adjacent code, don't add tests not requested, don't add comments)
  • Do NOT add Claude reasoning ("I also noticed X, fixed that too" — NO)
  • If `Mitigation:` is ambiguous or requires design judgment to implement → SKIP, add to `SKIPPED_FINDINGS`, surface to user at end
  • One finding = one commit. If a single mitigation actually touches 3 files, one commit covering all 3 is fine. But two different findings = two commits.
Pattern generalization (mandatory — not scope expansion):
When a finding's `Failure mode` describes a class of bug rather than a single-site defect (e.g. "Slack post failure leaves the row non-terminal", "unvalidated input reaches X", "missing `await` on Y-shaped call"), the fix is NOT complete until every sibling site of that exact pattern is fixed in the SAME iteration.
  • Before committing, grep the codebase for the same pattern (the failure shape, not the literal line). Fix all matching sites under that finding's commit.
  • This is explicitly NOT the "expand scope" violation above. Scope expansion is fixing unrelated things. Fixing the same flagged pattern at sibling sites is finishing the finding — leaving siblings for the next codex pass wastes an iteration AND ships the identical bug at an unflagged site until then.
  • How to tell them apart: would codex, on the next pass, file a finding with the same `Failure mode` wording pointing at a different `file:line`? If yes, that site belongs in THIS commit.
  • If grep surfaces sibling sites whose fix needs design judgment (not a mechanical copy of the same mitigation) → fix the mechanical ones, SKIP the judgment ones, surface them. Don't force a uniform fix across sites that aren't actually uniform.
Rationale: the loop otherwise amplifies one conceptual bug into N findings across N iterations — codex finds one site per pass, the controller fixes one site per pass, and the round count inflates to do what a single pattern-sweep does in one. Generalize on first sighting.
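The sibling-site sweep might start from something like the sketch below. `find_sibling_sites`, the regex, and the `postToSlack` call are all invented for illustration; a real sweep needs a pattern that captures the flagged failure shape, and every hit still has to be checked against the mitigation.

```bash
#!/usr/bin/env bash
# Sketch: list candidate sibling sites for a flagged pattern.
find_sibling_sites() {
  local pattern=$1 root=$2
  # -r recurse, -n show line numbers, -E extended regex; sorted for stable output
  grep -rnE -- "$pattern" "$root" | sort
}

demo=$(mktemp -d)
printf '%s\n' 'await postToSlack(msg);' > "$demo/a.ts"
printf '%s\n' 'const x = 1;' 'postToSlack(other);' > "$demo/b.ts"

# Hypothetical failure shape: postToSlack called at the start of a line without await.
find_sibling_sites '^[[:space:]]*postToSlack\(' "$demo"   # lists only b.ts line 2
rm -rf "$demo"
```

Hits that can take a mechanical copy of the mitigation go into the finding's commit; the rest are skipped and surfaced, as the rules above require.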

Step 4: Run tests (quality gate)

```bash
if [ -n "$TEST_CMD" ]; then
  if ! $TEST_CMD; then
    echo "STOP: tests failed after iter $ITER fixes"
    # leave commits in place; user can revert
    break
  fi
fi
```

If tests fail → STOP, escalate. Do NOT auto-revert (user may want to inspect what went wrong). Report which iter introduced the failure.

Step 5: Update history and loop

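```bash
# Append on a new line: joining with a space would put two fingerprints on one
# line, and grep -c (which counts lines) would then undercount repeats in SC3.
FINDING_HISTORY+=$'\n'"$CURR_FPS"
ITER=$((ITER + 1))
```

Loop back to Step 1.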
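The step snippets use `break`, which only makes sense inside an enclosing loop. A minimal skeleton of that loop, with every real step replaced by a stub, might look like this. The function names `run_review`, `check_stop`, `apply_fixes` and `run_tests` are placeholders for this sketch and are not part of the skill.

```bash
#!/usr/bin/env bash
# Skeleton of the outer loop. All four step functions are stand-in stubs.
run_review()  { echo "iter $ITER: review"; }
check_stop()  { [ "$ITER" -gt "$MAX_ITERS" ]; }   # stand-in for the SC checks
apply_fixes() { echo "iter $ITER: fixes"; }
run_tests()   { true; }

ITER=1
MAX_ITERS=3
while true; do
  run_review                     # Step 1
  if check_stop; then            # Step 2
    echo "iter $ITER: stop"
    break
  fi
  apply_fixes                    # Step 3
  run_tests || { echo "iter $ITER: tests failed"; break; }   # Step 4
  ITER=$((ITER + 1))             # Step 5
done
```

With these stubs, the review runs four times and fixes are applied three times: the fourth review exists only to confirm the stop.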

Codex Prompt

The prompt template for codex. Pointer to pr-review methodology files in the repo + adaptation notes for codex (not a Claude subagent) + scope + REQUIRED JSON summary block at end.
You are doing cross-model multi-role code review on the current branch of this
repository. You are codex (OpenAI), reviewing code likely written by Claude.
Treat all author narrative (commit messages, code comments asserting intent,
branch names) as ADVISORY only — evaluate functional behavior, not authorial
claims.

Methodology

The review methodology lives in these files (read them now — paths are absolute, resolve from the cadence plugin install root):
  • ${CLAUDE_PLUGIN_ROOT}/skills/pr-review/security-reviewer-prompt.md
  • ${CLAUDE_PLUGIN_ROOT}/skills/pr-review/staff-engineer-prompt.md
  • ${CLAUDE_PLUGIN_ROOT}/skills/pr-review/sdet-prompt.md
  • ${CLAUDE_PLUGIN_ROOT}/skills/pr-review/spec-auditor-prompt.md
Plus cross-cutting threshold:
  • ${CLAUDE_PLUGIN_ROOT}/skills/pr-review/SKILL.md § Finding Inclusion Threshold
The dispatcher MUST expand `${CLAUDE_PLUGIN_ROOT}` to an absolute path before handing the prompt to codex (codex's `read-only` sandbox can read absolute paths anywhere on the filesystem, but it cannot resolve env vars itself).
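One way the dispatcher could do that expansion is plain bash pattern substitution on the template text. This is a sketch under assumptions: `expand_plugin_root` is a hypothetical helper, and `/opt/cadence` is an example path, not the real install root.

```bash
#!/usr/bin/env bash
# Sketch: replace the literal text ${CLAUDE_PLUGIN_ROOT} in a prompt template
# with an absolute path. Template line and path are examples only.
expand_plugin_root() {
  local template=$1 root=$2 expanded
  local needle='${CLAUDE_PLUGIN_ROOT}'   # single quotes keep this literal
  # Quoting "$needle" inside the pattern makes bash match it as plain text.
  expanded=${template//"$needle"/$root}
  printf '%s\n' "$expanded"
}

expand_plugin_root '${CLAUDE_PLUGIN_ROOT}/skills/pr-review/sdet-prompt.md' '/opt/cadence'
# prints: /opt/cadence/skills/pr-review/sdet-prompt.md
```

Substituting in the text, rather than relying on shell expansion when the prompt is written, keeps every other `$` in the template untouched.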

Apply, with adaptations

Because you are codex (single agent, separate process), not a Claude subagent:
IGNORE these sections — they describe Claude's internal Agent dispatch:
  • "HARD-GATE" / "You have NO knowledge of conversation history" — you are isolated by being a different process and model family
  • "Incremental Mode Addendum" / "prior_fix_range" / drop signal (B) — those depend on babysit-side state tracking. Skip the (B) check entirely. Signals (A), (C), (D) still apply.
  • "dispatched from a dev session" / "subagent" framing — you are codex executing this prompt directly
APPLY in full:
  • Per-persona category tables: Security (S1-S5), Staff Engineer (E1-E9), SDET (T1-T4), Spec Auditor (C1-C4)
  • Finding Inclusion Threshold: Justification class (Reachable / Precedent / Asymmetric / Historical)
  • Drop signals (A), (C), (D)
  • Hygiene batch rule (cluster hygiene drops into one Q-class finding per file)
  • Race-class Finding Metadata: Mitigation MUST end with `[window=<ms|s|min|hr>, damage=<data-loss|deadlock|inconsistency|latency|marginal>, recovery=<has|no>]`
  • Per-prompt Output Schema (Severity / Confidence / Blast / Justification / Evidence / Failure mode / Mitigation)

Execution

Execute all 4 personas sequentially. Output a combined finding list grouped by persona.

Scope

Review `git diff origin/<BASE>..HEAD` where BASE is below. Use `git diff` and `git log --oneline` to understand the change. Read source files as needed.

Output format (REQUIRED)

First emit per-persona findings in the per-prompt format from the prompt files. Group by persona.
Then at the END emit a structured JSON summary block — this is REQUIRED for the calling skill to drive its loop. The JSON block MUST be valid and parseable. Wrap it in unique sentinel markers (NOT a generic markdown fenced block — those collide with code examples in Evidence fields):
<!-- SELF-REVIEW-JSON-START -->
{
  "findings": [
    {
      "id": "1",
      "persona": "Security|Staff|SDET|Spec",
      "category": "S2|E5|T1|C4|...",
      "slug": "kebab-case-slug-from-finding",
      "file": "path/to/file (relative to repo root)",
      "line": 42,
      "line_source": "source",
      "severity": "Blocker|Factual|Suggestion|Question",
      "justification": "Reachable|Precedent|Asymmetric|Historical",
      "confidence": "high|medium|low",
      "blast": "Local|Module|Cross-service|Data layer",
      "failure_mode": "one-line",
      "mitigation": "one-line, ending with race-meta tag if applicable"
    }
  ]
}
<!-- SELF-REVIEW-JSON-END -->
Rules:
  • `findings: []` (empty array) is VALID output meaning no findings. Still emit the block with `{"findings": []}` between the sentinels; do NOT omit the JSON block.
  • `line` MUST be the source file line number (the line in the file as written on disk after your reading), NOT the diff hunk line number. If the diff shifted lines, use the post-shift source file line.
  • `line_source` MUST be the literal `"source"`. This confirms you used source file lines, not diff lines. Any other value → caller treats as malformed and escalates.
  • All listed fields are REQUIRED. Do not emit findings with missing fields.

Important

  • Do NOT modify any files
  • Race-class findings without meta tag → drop the finding
  • You are codex, not Claude — your prose can be your own
  • Stay focused on the diff
BASE branch: origin/<substitute BASE from skill caller>
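
On the caller side, the output contract in the prompt above could be enforced roughly as follows. This is a minimal sketch, not the skill's actual parser: the names `extract_findings` and `MalformedSummary` are hypothetical, and taking the last start sentinel (in case an Evidence field quotes the markers) is an assumption.

```python
import json

START = "<!-- SELF-REVIEW-JSON-START -->"
END = "<!-- SELF-REVIEW-JSON-END -->"

# Field names copied from the JSON template in the prompt.
REQUIRED_FIELDS = (
    "id", "persona", "category", "slug", "file", "line", "line_source",
    "severity", "justification", "confidence", "blast",
    "failure_mode", "mitigation",
)


class MalformedSummary(ValueError):
    """Hypothetical error type: the summary block is missing or breaks a rule."""


def extract_findings(codex_output: str) -> list:
    """Return the findings list from the sentinel-wrapped JSON block."""
    # Assumption: the summary is the last block in the output, so search
    # from the last start sentinel rather than the first.
    start = codex_output.rfind(START)
    if start == -1:
        raise MalformedSummary("start sentinel not found")
    end = codex_output.find(END, start)
    if end == -1:
        raise MalformedSummary("end sentinel not found")
    payload = codex_output[start + len(START):end]
    try:
        data = json.loads(payload)
    except json.JSONDecodeError as exc:
        raise MalformedSummary(f"invalid JSON: {exc}") from exc
    findings = data.get("findings") if isinstance(data, dict) else None
    if not isinstance(findings, list):
        raise MalformedSummary('top-level "findings" array is required')
    for finding in findings:
        if not isinstance(finding, dict):
            raise MalformedSummary("each finding must be a JSON object")
        missing = [name for name in REQUIRED_FIELDS if name not in finding]
        if missing:
            raise MalformedSummary(f"missing fields: {missing}")
        if finding["line_source"] != "source":
            raise MalformedSummary('line_source must be the literal "source"')
    return findings
```

An empty `findings` array passes through as a valid result, while any `MalformedSummary` would map to the "treat as malformed and escalate" path.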

Stop conditions summary

| Condition | When | Action |
|---|---|---|
| SC1 Success | `findings == 0` | Report iter count + total fixes applied |
| SC0 Severity floor | iteration has findings but ZERO real bugs (no `Blocker`/`Factual` × `Reachable`/`Asymmetric`) | STOP, converged-enough; surface hygiene findings, don't auto-fix |
| SC2 Cap reached | `iter > 3` (`MAX_ITERS`) | Escalate; surface remaining findings to user |
| SC3 Repeat 3x | Same finding fingerprint (`file:line:slug`) 3 iterations | Escalate; race-of-race signal (cf `pr-babysit` § 4.5 Gate B) |
| SC3.5 Slug drift | Same file produces findings in 3+ consecutive iters (file-only fallback) | Escalate; possible slug drift hiding stuck finding |
| SC4 Findings diverging | (a) count growing iter-over-iter, OR (b) zero fingerprint overlap between consecutive iters with ≥3 findings | Escalate; auto-fix opening new surfaces |
| SC5 Test failure | `$TEST_CMD` exits non-zero after iter's fixes | Escalate; commits left in place for user inspection |
| SC6 Skip backlog | ≥3 findings skipped this iter (can't fix from Mitigation alone) | Continue loop, but surface skipped list at end |
| User ctrl-c | User interrupts | Report partial state, last committed iter, what was in progress |
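
The stateless rows of the table (SC1, SC0, SC2, SC4) can be sketched as one pure function over the per-iteration findings lists. This is an illustration under stated assumptions, not the skill's implementation: `check_stop`, `is_real_bug` and the evaluation order are hypothetical, "count growing" is read as "more findings than the previous iteration", and SC4(b) is read as requiring at least 3 findings in both iterations. SC3, SC3.5, SC5 and SC6 need extra state and are left out.

```python
from typing import Optional

MAX_ITERS = 3
REAL_BUG_SEVERITIES = {"Blocker", "Factual"}
REAL_BUG_JUSTIFICATIONS = {"Reachable", "Asymmetric"}


def is_real_bug(finding: dict) -> bool:
    """SC0's notion of a real bug: Blocker/Factual x Reachable/Asymmetric."""
    return (finding["severity"] in REAL_BUG_SEVERITIES
            and finding["justification"] in REAL_BUG_JUSTIFICATIONS)


def fingerprint(finding: dict) -> str:
    """The file:line:slug fingerprint from the SC3 row."""
    return f'{finding["file"]}:{finding["line"]}:{finding["slug"]}'


def check_stop(next_iter: int, history: list) -> Optional[str]:
    """Return the stop condition that fires, or None to keep looping.

    history[-1] holds the findings of the iteration that just finished;
    next_iter is the number of the iteration that would run next.
    """
    current = history[-1]
    if not current:
        return "SC1"  # no findings at all: success
    if not any(is_real_bug(f) for f in current):
        return "SC0"  # only hygiene-tier findings remain
    if next_iter > MAX_ITERS:
        return "SC2"  # cap reached with real bugs still open
    if len(history) >= 2:
        previous = history[-2]
        if len(current) > len(previous):
            return "SC4"  # (a) finding count is growing
        overlap = ({fingerprint(f) for f in current}
                   & {fingerprint(f) for f in previous})
        if len(current) >= 3 and len(previous) >= 3 and not overlap:
            return "SC4"  # (b) entirely new findings each round
    return None
```

Checking SC0 before SC2 reflects the note that the severity floor is the intended stop and the cap is only a backstop; a different ordering would be equally consistent with the table.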

Mode=review-only

When the user explicitly asks for findings without auto-fix:
  1. Run codex (same prompt as loop mode Step 1)
  2. Present the output verbatim from `$OUTPUT_FILE` (full per-persona findings text)
  3. Stop. Do NOT commit. Do NOT touch files.
This is the L0+ "advisory findings" path: the verdict stays with the user.

Report at end

After loop exit (any stop condition), generate report:
SELF-REVIEW LOOP REPORT
═════════════════════════════════════════════════════════════
Iterations: <N>
Stop reason: <SC code + brief>
Commits made: <count> (atomic, one per finding)
  - <sha> fix(<persona>): <slug>
  - ...
Tests: <pass | fail | skipped (no TEST_CMD)>

Findings still open (if escalation):
  - <persona> / <category> @ <file>:<line>: <slug>
    Mitigation: <one-line>
    Why surfaced: <which SC fired>

Skipped findings (Mitigation needed design judgment):
  - <persona> @ <file>:<line>: <slug>
    Mitigation: <verbatim>
    Reason: <why main session couldn't fix mechanically>

Suggested next steps:
- Review the atomic commits — revert any you disagree with
- For surfaced findings: read /tmp/self-review-iter-<N>.md for full context, decide modify/wontfix/defer manually
- Consider /cadence:pr-review mode=local for Claude-side multi-role view + comparison
- Push when satisfied
═════════════════════════════════════════════════════════════

Notes

  • Cross-model isolation rationale: codex (OpenAI GPT) reviews Claude-generated code → avoids same-model self-preference bias (Wataoka et al., perplexity-driven). Each codex invocation is a fresh process — no conversation context inheritance.
  • Context-mix prevention design: codex output goes to `/tmp/self-review-iter-$ITER.md` (file, not chat). Main session only parses the JSON summary block into conversation memory. The full finding text is read from the file by the main session when implementing each fix; it is not auto-injected. This keeps the main session's growing conversation lean across iterations.
  • Methodology single-sourced: codex reads `${CLAUDE_PLUGIN_ROOT}/skills/pr-review/*-prompt.md` directly (dispatcher expands the env var to an absolute path before handing the prompt to codex). When pr-review prompts update, this skill picks up the new methodology automatically. No keep-in-sync burden between pr-review and self-review.
  • Adaptation layer keep-in-sync: the "IGNORE these sections" list in the codex prompt mirrors Claude-specific sections in pr-review prompts. If pr-review adds new Claude-only machinery, update the IGNORE list. Annotated as a maintenance concern, not auto-detected.
  • Per-finding atomic commits: `git revert <sha>` undoes one finding's fix cleanly. History preserves the audit trail of what codex flagged + how it was fixed.
  • Test gate as natural safety net: the cheapest signal that "auto-fix broke things" is a failing test. Catches regressions without needing complex semantic verification.
  • Loop count cap (3) reasoning: empirically (5-iter run on a real feature branch) codex's high-value findings (real reachable bugs, the same-model-blind-spot class) all landed in iters 1–3. Iters 4–5 produced hygiene-tier nits, the tail of an already-identified pattern, and one outright false positive the controller had to reject. The cap was 5; it is now 3. The principled stop is SC0 (severity floor); `MAX_ITERS` is the blunt backstop, and a run that still yields real-bug findings at iter 3 is itself signalling that the change is too big and should be split.
  • SC0 vs MAX_ITERS, why both: SC0 (severity floor) is the intended stop. It fires when an iteration produces no real reachable bug, i.e. further rounds would only surface nits. `MAX_ITERS=3` is the backstop for the pathological case where codex keeps finding real bugs that far out. A healthy run stops on SC0 at iter 2–3; only an unhealthy (oversized-diff) run reaches the cap.
  • Pattern generalization beats round count: the loop's structural weakness is amplifying one conceptual bug into N findings across N iterations (codex finds one site per pass; controller fixes one site per pass). The Step 3 "Pattern generalization" rule counters this — on first sighting of a pattern-class finding, grep + fix all sibling sites in the same iteration. Done well, the loop self-converges inside the cap without relying on it.
  • Author bias still applies to FIX step, not VERDICT step: codex finding generation is cross-model isolated. Main session writing the fix is NOT, but the HARD-GATE constrains main session to mechanical execution of the `Mitigation:` field, removing the verdict-reasoning attack surface. If you notice main session "reasoning whether codex is right" → that's a HARD-GATE violation; surface the finding instead of arguing.
  • Worktree assumption: skill expects to run on a feature branch (user already in worktree or non-main branch). Doesn't self-create worktrees. If user is on `main`, warn but don't block; they may know what they're doing.
  • No state persistence across invocations: each invocation starts fresh. `FINDING_HISTORY` is per-invocation. If you re-invoke after manual edits, the loop has no memory of previous runs. This is by design: it keeps the skill stateless, with no `.claude/state/` files to maintain.
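
Because `FINDING_HISTORY` lives only for one invocation, an in-memory structure is enough to drive SC3 and SC3.5. The sketch below is one possible shape, not the skill's actual data structure: the class name `FindingHistory`, its `record` method and the `REPEAT_LIMIT` constant are hypothetical, and SC3 is assumed to count any 3 iterations, not only consecutive ones.

```python
from collections import Counter

REPEAT_LIMIT = 3  # assumed from "3 iterations" / "3+ consecutive iters"


class FindingHistory:
    """In-memory, per-invocation record of what each iteration reported."""

    def __init__(self):
        self.fingerprint_counts = Counter()  # file:line:slug -> iterations seen
        self.file_streaks = {}               # file -> consecutive iterations seen

    def record(self, findings):
        """Record one iteration and return the stop conditions it triggers."""
        fingerprints = {f'{f["file"]}:{f["line"]}:{f["slug"]}' for f in findings}
        files = {f["file"] for f in findings}
        self.fingerprint_counts.update(fingerprints)
        # A file keeps its streak only while it appears in every iteration;
        # files absent from this iteration are dropped and restart at 1 later.
        self.file_streaks = {p: self.file_streaks.get(p, 0) + 1 for p in files}
        fired = []
        if any(self.fingerprint_counts[fp] >= REPEAT_LIMIT for fp in fingerprints):
            fired.append("SC3")    # same fingerprint keeps coming back
        if any(n >= REPEAT_LIMIT for n in self.file_streaks.values()):
            fired.append("SC3.5")  # file-only fallback for slug or line drift
        return fired
```

Nothing is written to disk, so a second invocation starts with empty counters, which matches the stateless design described in the last note.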