self-review
Auto-loop dev-stage self-review on the current branch. Runs `codex` (OpenAI GPT) cross-model review following pr-review's multi-role methodology, applies fixes per-finding, re-reviews, and loops until findings converge or a stop condition fires.
Single-shot mode (`review-only`) is available as opt-in: it produces findings without auto-fix.
<HARD-GATE>
This skill auto-modifies code on the current branch. Constraints:
Loop semantics:
- Per-finding atomic commits: every fix gets its own conventional commit, easy to revert
- Stop conditions are NON-NEGOTIABLE (see Stop Conditions below): when one fires, the loop STOPS and the remaining work surfaces to the user, period
- User can interrupt anytime with ctrl-c; main session catches and reports partial state
Context-mix prevention:
- Codex runs in a fresh process every iteration, with no conversation context shared with main session
- Codex output (verbose finding text) goes to a temp file, NOT into chat; only a structured JSON summary enters main session memory, per-iteration
- Main session implementing fixes MUST execute the codex finding's `Mitigation:` field literally, mechanically: NO additional reasoning about whether the fix is right, NO substitution of "a better approach", NO scope expansion
- If you cannot fix the finding from the `Mitigation:` field alone (because it's ambiguous, requires design judgment, or touches code outside the scope of this finding) → SKIP that finding for this iter and surface it to user at end
Verdict-reasoning forbidden:
- This skill does not produce wontfix decisions. Findings are either fixed (auto) or surfaced (escalation).
- Do NOT decide "this finding doesn't matter" → all findings are fixed unless they hit the "cannot fix from Mitigation alone" gate above
- Do NOT fill out Wontfix Template fields from main-session memory (see `pr-babysit` § 4.6)
Quality gate:
- Tests run after each iter's fixes: `pnpm test` (or detected per-repo command). Failure → STOP and escalate
- This is the natural safety net for "fixes that break things"
Author bias on fix step:
- Codex (cross-model) generates findings → bias-isolated at finding-generation layer
- Main session writes fix → not bias-isolated, but follows codex Mitigation literally → bias attack surface is "mechanical execution accuracy", not "verdict judgment"
- If you catch yourself reasoning "I think codex is wrong about this" → that's verdict reasoning, NOT allowed in this skill. Surface to user at end.
</HARD-GATE>
Modes
- `loop` (default): codex review → per-finding fix + commit → tests → re-review → loop until stop
- `review-only`: single codex pass, present findings, stop. No fix, no commit, no test run. Use when you want a manual review without auto-modification.
Mode selection:
- User says "self review" or "review my branch" → loop (default)
- User says "just show me findings" or "self review without fixing" → review-only
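The selection rules above can be sketched as a small dispatcher. This is only an illustration, not the skill's actual parser: `select_mode` is a hypothetical helper that recognizes just the four example phrases. The review-only phrases are tested first because "self review without fixing" also contains "self review".

```bash
# Hypothetical helper: map a user request to a mode name.
# Only the example phrases from the list above are recognized.
select_mode() {
  local request
  # Lowercase the request so matching is case-insensitive.
  request=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$request" in
    *"just show me findings"*|*"self review without fixing"*)
      echo "review-only" ;;
    *"self review"*|*"review my branch"*)
      echo "loop" ;;
    *)
      # Assumption: unrecognized requests are reported, not silently run.
      echo "unknown"
      return 1 ;;
  esac
}

select_mode "Self review without fixing"   # prints: review-only
```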
When to use
- Just wrote code on a feature branch, working tree is clean and committed, want auto cross-model review + fix before push
- Iterating on a feature, want to converge on a clean state quickly
- Pre-PR sanity pass — catch the obvious stuff codex spots before sending to humans
When NOT to use
- Live PR review with sticky comment + inline threads → `/cadence:pr-review` (default mode)
- PR babysit work after PR is open → `/cadence:pr-babysit` (handles thread reply, dedup, CI gates)
- Working tree has uncommitted changes: STOP and ask user to commit or stash first. Loop assumes clean working tree at start so per-finding commits are atomic
- You want manual verdict control → use `mode=review-only` and decide yourself
- First-time use on a new branch → consider `mode=review-only` first to see what codex produces before letting it auto-fix
Setup check
Run these checks at start. STOP on any failure with the listed message — do NOT attempt auto-install.
```bash
# Codex CLI
codex --version 2>/dev/null || { echo "STOP: Install codex: npm install -g @openai/codex"; exit 1; }
codex login --status 2>/dev/null || { echo "STOP: Run 'codex login' to authenticate"; exit 1; }

# Plugin install root, needed so codex can locate the pr-review methodology prompts
[ -n "${CLAUDE_PLUGIN_ROOT:-}" ] && [ -d "${CLAUDE_PLUGIN_ROOT}/skills/pr-review" ] || {
  echo "STOP: CLAUDE_PLUGIN_ROOT must point at cadence's install root (so codex can read ./skills/pr-review/*-prompt.md). Set it explicitly if your harness doesn't export it (e.g. export CLAUDE_PLUGIN_ROOT=~/.claude/plugins/cache/cadence/cadence/<version>)."; exit 1;
}

# Clean working tree
git diff --quiet && git diff --cached --quiet || { echo "STOP: Working tree has uncommitted changes. Commit or stash, then re-invoke."; exit 1; }

# Detect base branch
BASE=$(git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's|refs/remotes/origin/||')
[ -z "$BASE" ] && BASE=main
echo "BASE: origin/$BASE"

# Detect test command (best-effort, repo-specific)
if [ -f package.json ] && jq -e '.scripts.test' package.json >/dev/null 2>&1; then
  TEST_CMD="pnpm test"
elif [ -f Makefile ] && grep -q '^test:' Makefile; then
  TEST_CMD="make test"
else
  TEST_CMD=""
  echo "WARN: No test command detected; quality gate disabled"
fi
echo "TEST_CMD: ${TEST_CMD:-<none>}"
```
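The base-branch fallback can be checked in isolation. The sketch below assumes the `origin/HEAD` ref string has already been read; `base_from_ref` is an illustrative helper, not part of the skill.

```bash
# Illustrative helper: turn the output of
#   git symbolic-ref refs/remotes/origin/HEAD
# into a branch name, falling back to "main" when the ref is unset.
base_from_ref() {
  local ref=$1
  local base=${ref#refs/remotes/origin/}   # strip the remote prefix
  echo "${base:-main}"
}

base_from_ref "refs/remotes/origin/develop"   # prints: develop
base_from_ref ""                              # prints: main
```

If `origin/HEAD` is unset in a clone, `git remote set-head origin --auto` can usually populate it, which avoids relying on the `main` fallback.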
Loop algorithm (mode=loop)
Maintain in main session memory:
- `ITER`: current iteration number (starts at 1)
- `MAX_ITERS = 3`: hard cap. Empirically, codex's high-value findings (real reachable bugs, the kind a same-model reviewer misses) land in iters 1–3; iters 4–5 trend to hygiene-tier nits plus the occasional false positive that the controller then has to spend judgment rejecting. The real stop is SC0 (severity floor) below; `MAX_ITERS` is the blunt backstop.
- `FINDING_HISTORY`: list of `file:line:slug` fingerprints from prior iterations (for repetition detection)
Step 1: Run codex review
Same prompt construction as Step 2 of `mode=review-only` (see Codex Prompt section below). Output goes to `/tmp/self-review-iter-$ITER.md`. Do NOT inline the verbose codex output into main session; only parse the JSON summary block.
Codex prompt MUST end with a summary JSON block (described in Codex Prompt section below) so main session can drive the loop without parsing prose.
```bash
PROMPT_FILE=$(mktemp /tmp/self-review-prompt-XXXXXX.md)
OUTPUT_FILE="/tmp/self-review-iter-$ITER.md"
JSON_FILE="/tmp/self-review-iter-$ITER.json"

# Write prompt (see Codex Prompt section)
write_codex_prompt > "$PROMPT_FILE"

# Run codex (5-min timeout)
_REPO_ROOT=$(git rev-parse --show-toplevel)
codex exec "$(cat "$PROMPT_FILE")" \
  -C "$_REPO_ROOT" \
  -s read-only \
  -c 'model_reasoning_effort="high"' \
  --enable web_search_cached \
  > "$OUTPUT_FILE" 2>/tmp/self-review-err

# Extract JSON summary block between unique sentinels
sed -n '/<!-- SELF-REVIEW-JSON-START -->/,/<!-- SELF-REVIEW-JSON-END -->/{
  /<!-- SELF-REVIEW-JSON-/d
  p
}' "$OUTPUT_FILE" > "$JSON_FILE"

# Guard: codex must emit a parseable JSON block; empty or malformed is NOT zero findings
if [ ! -s "$JSON_FILE" ]; then
  echo "STOP: codex did not emit a JSON summary block. Output saved to $OUTPUT_FILE for manual review."
  break
fi
if ! jq -e '.findings | type == "array"' "$JSON_FILE" >/dev/null 2>&1; then
  echo "STOP: codex JSON summary malformed. Output: $OUTPUT_FILE, JSON: $JSON_FILE"
  break
fi

# Sanity: confirm line_source field is "source" on every finding (catches diff-line confusion)
BAD_LINE_SOURCE=$(jq -r '[.findings[] | select(.line_source != "source")] | length' "$JSON_FILE")
if [ "$BAD_LINE_SOURCE" != "0" ]; then
  echo "STOP: codex emitted findings with line_source != 'source' (likely diff-line numbers, not source-file lines). Review $OUTPUT_FILE manually."
  break
fi
```
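To see why a missing block and an empty findings array are different outcomes, the sentinel extraction can be run against a small sample. This is a sketch under assumptions: `extract_summary` is an illustrative wrapper around the same `sed` program, and the sample text is invented.

```bash
# Illustrative helper: print only the lines between the two sentinels.
extract_summary() {
  sed -n '/<!-- SELF-REVIEW-JSON-START -->/,/<!-- SELF-REVIEW-JSON-END -->/{
    /<!-- SELF-REVIEW-JSON-/d
    p
  }' "$1"
}

# Invented sample output: prose followed by a summary block with no findings.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Security findings: none.
<!-- SELF-REVIEW-JSON-START -->
{"findings": []}
<!-- SELF-REVIEW-JSON-END -->
EOF

summary=$(extract_summary "$sample")
if [ -z "$summary" ]; then
  echo "no summary block: treat as a failed run, not as zero findings"
else
  echo "summary: $summary"
fi
rm -f "$sample"
```

An output file with no sentinels yields an empty string, which is why the guard tests for a non-empty file before asking `jq` whether `.findings` is an array.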
Step 2: Convergence check
Read findings from `$JSON_FILE`. Apply stop conditions IN ORDER; first match wins:
```bash
FINDINGS_COUNT=$(jq '.findings | length' "$JSON_FILE")
```
**SC1 - Success**: `FINDINGS_COUNT == 0` → STOP, report success.
**SC0 - Severity floor (converged-enough)**: an iteration whose findings are ALL hygiene-tier → STOP, report as converged. "Hygiene-tier" means no finding is both (`severity` in `Blocker|Factual`) AND (`justification` in `Reachable|Asymmetric`). That is, nothing left is a real, reachable bug: only `Suggestion`/`Question` findings remain, or `Factual` findings resting on `Precedent`/`Historical` speculation.
```bash
REAL_BUGS=$(jq -r '[.findings[]
  | select((.severity == "Blocker" or .severity == "Factual")
      and (.justification == "Reachable" or .justification == "Asymmetric"))]
  | length' "$JSON_FILE")
```
SC0 fires when `REAL_BUGS == 0` and `FINDINGS_COUNT > 0`, before the `MAX_ITERS` cap is consulted.
Why SC0 is checked before SC2 but after SC1: zero findings is unambiguous success; a hygiene-only iteration is converged-enough success; the `MAX_ITERS` cap is the unhappy backstop. Naming it SC0 keeps it visually adjacent to SC1 (both success-class) without renumbering SC2–SC6.
**SC2 - Cap reached**: `ITER > MAX_ITERS` → STOP, escalate with current findings. Do NOT apply more fixes.
**SC3 - Same finding 3x (race-of-race signal)**:
```bash
# Build current iter's fingerprints (jq string interpolation needs the backslashes)
CURR_FPS=$(jq -r '.findings[] | "\(.file):\(.line):\(.slug)"' "$JSON_FILE")

# For each fingerprint, count occurrences in HISTORY + this iter
REPEATED=""
for FP in $CURR_FPS; do
  COUNT=$(echo "$FINDING_HISTORY" | grep -cF "$FP" || true)
  COUNT=$((COUNT + 1))  # current iter
  if [ "$COUNT" -ge 3 ]; then
    REPEATED="$FP"
    break
  fi
done
```
If `REPEATED` non-empty → STOP, escalate. Same finding has fired 3+ times across iterations; auto-fix is not converging. Reference `pr-babysit` § 4.5 Gate B for the equivalent convergence-failure pattern.
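The repetition count behind SC3 can be sketched as a standalone function. This variant assumes the history holds one fingerprint per line and uses whole-line matching (`grep -x`), so a fingerprint is not counted against a longer path that merely contains it. `seen_count` and the sample fingerprints are illustrative.

```bash
# Illustrative helper: how many times has this fingerprint been seen,
# counting the current iteration? Assumes one fingerprint per line in $2.
seen_count() {
  local fp=$1 history=$2 prior
  # -F: literal match, -x: whole line, -c: count; grep exits 1 on zero matches.
  prior=$(printf '%s\n' "$history" | grep -cFx -- "$fp" || true)
  echo $((prior + 1))
}

history='src/api.ts:42:missing-nil-check
src/db.ts:10:unbounded-retry
src/api.ts:42:missing-nil-check'

seen_count "src/api.ts:42:missing-nil-check" "$history"   # prints: 3 (2 prior + current)
seen_count "api.ts:42:missing-nil-check" "$history"       # prints: 1 (substring only)
```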
**SC3.5 — File-only fallback for slug drift**: codex may use a different slug each iter for the same logical issue (`missing-nil-check` → `null-dereference-guard` → `null-check-omitted`). Exact fingerprint match would miss this. Track per-file appearance count across iters; if the same file has produced findings in 3+ consecutive iters → STOP, surface as warning:
```bash
# Per-iter file set (just file names, dedup)
CURR_FILES=$(jq -r '.findings[].file' "$JSON_FILE" | sort -u)
echo "$CURR_FILES" > "/tmp/self-review-iter-$ITER.files"

# Count files that appear in current iter AND last two iters' file lists
if [ "$ITER" -ge 3 ]; then
  PREV1="/tmp/self-review-iter-$((ITER - 1)).files"
  PREV2="/tmp/self-review-iter-$((ITER - 2)).files"
  if [ -s "$PREV1" ] && [ -s "$PREV2" ]; then
    PERSISTENT=$(grep -Fxf "$PREV1" "/tmp/self-review-iter-$ITER.files" | grep -Fxf "$PREV2" | head -3)
    if [ -n "$PERSISTENT" ]; then
      echo "STOP: file(s) producing findings 3+ consecutive iters - possible slug drift hiding stuck finding:"
      echo "$PERSISTENT"
      break
    fi
  fi
fi
```
This is an ADDITIVE signal — doesn't replace SC3's exact fingerprint check. SC3 catches lexical convergence failure; SC3.5 catches semantic convergence failure that slug drift hides.
**SC4 — Findings diverging**: fires on EITHER of two signals (race-of-race detection, cf `pr-babysit` § 4.5 Gate B):
(a) **Count growing**: `FINDINGS_COUNT` strictly larger than previous iter's count → STOP, escalate. The fix step is introducing new issues faster than it resolves them.
(b) **Set replacement**: `FINDINGS_COUNT >= 3` AND zero fingerprint overlap between current iter and previous iter (`|current ∩ prev_iter| == 0`) → STOP, escalate. Count unchanged but the WHOLE finding set turned over — fixes are opening completely new surfaces. Count-only check misses this.
```bash
PREV_FPS_FILE="/tmp/self-review-iter-$((ITER - 1)).fps"
CURR_FPS_FILE="/tmp/self-review-iter-$ITER.fps"
echo "$CURR_FPS" > "$CURR_FPS_FILE"
if [ "$ITER" -gt 1 ] && [ "$FINDINGS_COUNT" -ge 3 ] && [ -s "$PREV_FPS_FILE" ]; then
OVERLAP=$(grep -Fxf "$PREV_FPS_FILE" "$CURR_FPS_FILE" | wc -l | tr -d ' ')
if [ "$OVERLAP" = "0" ]; then
echo "STOP: set-replacement divergence (iter $((ITER-1)) and iter $ITER share zero findings)"
break
fi
fi
```
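The first-match-wins ordering of the counter-based conditions can be sketched as one function. SC4(a) has no snippet of its own above, so it is included here. The helper name `next_action` and its argument order are illustrative, and SC3, SC3.5 and SC4(b) are left out because they need fingerprint data.

```bash
# Illustrative helper: decide the next action from per-iteration counters.
# Arguments: findings_count real_bugs iter max_iters prev_count
# SC3, SC3.5 and SC4(b) need fingerprint data and are left out of this sketch.
next_action() {
  local count=$1 real_bugs=$2 iter=$3 max_iters=$4 prev_count=$5
  if [ "$count" -eq 0 ]; then
    echo "SC1: success"
  elif [ "$real_bugs" -eq 0 ]; then
    echo "SC0: converged (hygiene-tier only)"
  elif [ "$iter" -gt "$max_iters" ]; then
    echo "SC2: cap reached, escalate"
  elif [ "$iter" -gt 1 ] && [ "$count" -gt "$prev_count" ]; then
    echo "SC4(a): findings growing, escalate"
  else
    echo "continue: apply fixes"
  fi
}

next_action 4 0 2 3 5   # prints: SC0: converged (hygiene-tier only)
next_action 5 2 2 3 3   # prints: SC4(a): findings growing, escalate
```

Because SC0 is tested before SC2, a hygiene-only iteration is reported as converged even when the cap has also been exceeded.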
Step 3: Apply fixes per finding
If no stop condition fired, iterate through findings and apply each as an atomic commit:
```bash
SKIPPED_FINDINGS=()
# Process substitution (not a pipe) keeps the loop in the current shell,
# so SKIPPED_FINDINGS is still populated after `done`.
while read -r FINDING; do
  ID=$(echo "$FINDING" | jq -r '.id')
  PERSONA=$(echo "$FINDING" | jq -r '.persona')
  CATEGORY=$(echo "$FINDING" | jq -r '.category')
  SLUG=$(echo "$FINDING" | jq -r '.slug')
  FILE=$(echo "$FINDING" | jq -r '.file')
  LINE=$(echo "$FINDING" | jq -r '.line')
  FAILURE_MODE=$(echo "$FINDING" | jq -r '.failure_mode')
  MITIGATION=$(echo "$FINDING" | jq -r '.mitigation')
  # Main session reads the full finding from OUTPUT_FILE for context,
  # then implements MITIGATION literally
  apply_minimal_fix_from_mitigation "$FILE" "$LINE" "$MITIGATION"
  # Verify something changed (tracked files were clean before this fix)
  if git diff --quiet; then
    SKIPPED_FINDINGS+=("$ID: no diff produced - mitigation may need design judgment")
    continue
  fi
  # Atomic commit: stage every tracked file this fix touched (may be several)
  git add -u
  git commit -m "fix($PERSONA): $SLUG - self-review iter $ITER #$ID

Failure mode: $FAILURE_MODE
Mitigation: $MITIGATION
Source: codex review following pr-review methodology
Category: $CATEGORY
"
done < <(jq -c '.findings[]' "$JSON_FILE")
```
Per-finding fix rules (HARD-GATE reinforcement):
- Read the codex `Mitigation:` field for this finding. Implement it MINIMALLY.
- Do NOT expand scope (don't refactor adjacent code, don't add tests not requested, don't add comments)
- Do NOT add Claude reasoning ("I also noticed X, fixed that too" is NOT allowed)
- If `Mitigation:` is ambiguous or requires design judgment to implement → SKIP, add to `SKIPPED_FINDINGS`, surface to user at end
- One finding = one commit. If a single mitigation actually touches 3 files, one commit covering all 3 is fine. But two different findings = two commits.
Pattern generalization (mandatory, not scope expansion):
When a finding's `Failure mode` describes a class of bug rather than a single-site defect (e.g. "Slack post failure leaves the row non-terminal", "unvalidated input reaches X", "missing `await` on Y-shaped call"), the fix is NOT complete until every sibling site of that exact pattern is fixed in the SAME iteration.
- Before committing, grep the codebase for the same pattern (the failure shape, not the literal line). Fix all matching sites under that finding's commit.
- This is explicitly NOT the "expand scope" violation above. Scope expansion is fixing unrelated things. Fixing the same flagged pattern at sibling sites is finishing the finding: leaving siblings for the next codex pass wastes an iteration AND ships the identical bug at an unflagged site until then.
- How to tell them apart: would codex, on the next pass, file a finding with the same `Failure mode` wording pointing at a different `file:line`? If yes, that site belongs in THIS commit.
- If grep surfaces sibling sites whose fix needs design judgment (not a mechanical copy of the same mitigation) → fix the mechanical ones, SKIP the judgment ones, surface them. Don't force a uniform fix across sites that aren't actually uniform.
Rationale: the loop otherwise amplifies one conceptual bug into N findings across N iterations. Codex finds one site per pass, the controller fixes one site per pass, and the round count inflates to do what a single pattern-sweep does in one. Generalize on first sighting.
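A sibling-site sweep can be as simple as a recursive grep for the failure shape. The sketch below is illustrative only: `find_sibling_sites`, the `postToSlack` call, and the demo files are hypothetical, and a real sweep would need a pattern tuned to the finding's `Failure mode`.

```bash
# Illustrative helper: list file:line for every site matching a pattern.
find_sibling_sites() {
  local pattern=$1 dir=$2
  grep -rnE -- "$pattern" "$dir" | cut -d: -f1,2 | sort
}

# Hypothetical demo tree: two un-awaited calls and one awaited call.
demo=$(mktemp -d)
printf 'await postToSlack(msg);\n' > "$demo/ok.ts"
printf 'setup();\npostToSlack(msg);\n' > "$demo/a.ts"
printf 'postToSlack(other);\n' > "$demo/b.ts"

# Pattern: a postToSlack( call that starts a statement, with no await before it.
find_sibling_sites '^[[:space:]]*postToSlack\(' "$demo"
rm -rf "$demo"
```

Here the sweep reports `a.ts:2` and `b.ts:1` (prefixed by the temp directory) and skips the awaited call in `ok.ts`. Sites that match the shape but need a different fix still fall under the SKIP-and-surface rule.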
Step 4: Run tests (quality gate)
```bash
if [ -n "$TEST_CMD" ]; then
  if ! $TEST_CMD; then
    echo "STOP: tests failed after iter $ITER fixes"
    # leave commits in place; user can revert
    break
  fi
fi
```
If tests fail → STOP, escalate. Do NOT auto-revert (user may want to inspect what went wrong). Report which iter introduced the failure.
Step 5: Update history and loop
```bash
# Append this iter's fingerprints on their own lines, so the line-based
# grep count in SC3 sees each occurrence separately
FINDING_HISTORY+=$'\n'"$CURR_FPS"
ITER=$((ITER + 1))
```
Loop back to Step 1.
Codex Prompt
The prompt template for codex. Pointer to pr-review methodology files in the repo + adaptation notes for codex (not a Claude subagent) + scope + REQUIRED JSON summary block at end.
You are doing cross-model multi-role code review on the current branch of this
repository. You are codex (OpenAI), reviewing code likely written by Claude.
Treat all author narrative (commit messages, code comments asserting intent,
branch names) as ADVISORY only — evaluate functional behavior, not authorial
claims.
Methodology
The review methodology lives in these files (read them now — paths are
absolute, resolve from the cadence plugin install root):
- ${CLAUDE_PLUGIN_ROOT}/skills/pr-review/security-reviewer-prompt.md
- ${CLAUDE_PLUGIN_ROOT}/skills/pr-review/staff-engineer-prompt.md
- ${CLAUDE_PLUGIN_ROOT}/skills/pr-review/sdet-prompt.md
- ${CLAUDE_PLUGIN_ROOT}/skills/pr-review/spec-auditor-prompt.md
Plus cross-cutting threshold:
- ${CLAUDE_PLUGIN_ROOT}/skills/pr-review/SKILL.md § Finding Inclusion Threshold
The dispatcher MUST expand `${CLAUDE_PLUGIN_ROOT}` to an absolute path before handing the prompt to codex (codex's `read-only` sandbox can read absolute paths anywhere on the filesystem, but it cannot resolve env vars itself).
Apply, with adaptations
Because you are codex (single agent, separate process), not a Claude subagent:
IGNORE these sections — they describe Claude's internal Agent dispatch:
- "HARD-GATE" / "You have NO knowledge of conversation history" — you are isolated by being a different process and model family
- "Incremental Mode Addendum" / "prior_fix_range" / drop signal (B) — those depend on babysit-side state tracking. Skip the (B) check entirely. Signals (A), (C), (D) still apply.
- "dispatched from a dev session" / "subagent" framing — you are codex executing this prompt directly
APPLY in full:
- Per-persona category tables: Security (S1-S5), Staff Engineer (E1-E9), SDET (T1-T4), Spec Auditor (C1-C4)
- Finding Inclusion Threshold: Justification class (Reachable / Precedent / Asymmetric / Historical)
- Drop signals (A), (C), (D)
- Hygiene batch rule (cluster hygiene drops into one Q-class finding per file)
- Race-class Finding Metadata: Mitigation MUST end with `[window=<ms|s|min|hr>, damage=<data-loss|deadlock|inconsistency|latency|marginal>, recovery=<has|no>]`
- Per-prompt Output Schema (Severity / Confidence / Blast / Justification / Evidence / Failure mode / Mitigation)
Execution
Execute all 4 personas sequentially. Output a combined finding list grouped
by persona.
Scope
Review `git diff origin/<BASE>..HEAD` where BASE is below. Use `git diff` and `git log --oneline` to understand the change. Read source files as needed.
Output format (REQUIRED)
First emit per-persona findings in the per-prompt format from the prompt files.
Group by persona.
Then at the END emit a structured JSON summary block — this is REQUIRED for the
calling skill to drive its loop. The JSON block MUST be valid and parseable. Wrap
it in unique sentinel markers (NOT a generic markdown fenced block — those collide
with code examples in Evidence fields):
<!-- SELF-REVIEW-JSON-START -->
{
"findings": [
{
"id": "1",
"persona": "Security|Staff|SDET|Spec",
"category": "S2|E5|T1|C4|...",
"slug": "kebab-case-slug-from-finding",
"file": "path/to/file (relative to repo root)",
"line": 42,
"line_source": "source",
"severity": "Blocker|Factual|Suggestion|Question",
"justification": "Reachable|Precedent|Asymmetric|Historical",
"confidence": "high|medium|low",
"blast": "Local|Module|Cross-service|Data layer",
"failure_mode": "one-line",
"mitigation": "one-line, ending with race-meta tag if applicable"
}
]
}
<!-- SELF-REVIEW-JSON-END -->
Rules:
- `findings: []` (empty array) is VALID output meaning no findings. Still emit the block with `{"findings": []}` between the sentinels; do NOT omit the JSON block.
- `line` MUST be the source file line number (the line in the file as written on disk after your reading), NOT the diff hunk line number. If the diff shifted lines, use the post-shift source file line.
- `line_source` MUST be the literal `"source"`. This confirms you used source file lines, not diff lines. Any other value → caller treats as malformed and escalates.
- All listed fields are REQUIRED. Do not emit findings with missing fields.
Important
- Do NOT modify any files
- Race-class findings without meta tag → drop the finding
- You are codex, not Claude — your prose can be your own
- Stay focused on the diff
BASE branch: origin/<substitute BASE from skill caller>
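On the calling side, the sentinel-wrapped summary requested by the prompt above could be extracted and validated roughly as follows. This is a minimal sketch, not the skill's actual parser: `parse_summary` and `MalformedSummary` are hypothetical names, and taking the last start marker in the output is an assumed design choice (the prompt only says the block comes at the END).

```python
import json

START = "<!-- SELF-REVIEW-JSON-START -->"
END = "<!-- SELF-REVIEW-JSON-END -->"

# Field names copied from the JSON template in the prompt.
REQUIRED_FIELDS = (
    "id", "persona", "category", "slug", "file", "line", "line_source",
    "severity", "justification", "confidence", "blast",
    "failure_mode", "mitigation",
)


class MalformedSummary(ValueError):
    """Hypothetical error type: the summary block is missing or breaks a rule."""


def parse_summary(text):
    """Return the findings list from codex output, or raise MalformedSummary."""
    # Assumption: use the LAST start marker, since the block is emitted at the end
    # and earlier prose might quote the sentinel text.
    start = text.rfind(START)
    end = text.find(END, start + len(START)) if start != -1 else -1
    if start == -1 or end == -1:
        raise MalformedSummary("sentinel markers not found")
    try:
        payload = json.loads(text[start + len(START):end])
    except json.JSONDecodeError as exc:
        raise MalformedSummary(f"invalid JSON: {exc}") from exc
    findings = payload.get("findings") if isinstance(payload, dict) else None
    if not isinstance(findings, list):
        raise MalformedSummary('"findings" must be a list')
    for finding in findings:
        if not isinstance(finding, dict):
            raise MalformedSummary("each finding must be an object")
        missing = [key for key in REQUIRED_FIELDS if key not in finding]
        if missing:
            raise MalformedSummary(f"missing fields: {missing}")
        # Any value other than the literal "source" counts as malformed.
        if finding["line_source"] != "source":
            raise MalformedSummary("line_source must be the literal 'source'")
        line = finding["line"]
        if isinstance(line, bool) or not isinstance(line, int):
            raise MalformedSummary("line must be an integer")
    return findings
```

An empty list is a valid result and means the iteration produced no findings; a raised `MalformedSummary` corresponds to the "caller treats as malformed and escalates" rule.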
Stop conditions summary
| Condition | When | Action |
|---|---|---|
| SC1 Success | codex returns `findings: []` | Report iter count + total fixes applied |
| SC0 Severity floor | Iteration has findings but zero real reachable bugs | STOP, converged enough; surface hygiene findings, don't auto-fix |
| SC2 Cap reached | Iteration count reaches `MAX_ITERS` (3) | Escalate; surface remaining findings to user |
| SC3 Repeat 3x | Same finding fingerprint seen 3 times | Escalate; race-of-race signal |
| SC3.5 Slug drift | Same file produces findings in 3+ consecutive iters (file-only fallback) | Escalate; possible slug drift hiding stuck finding |
| SC4 Findings diverging | (a) count growing iter-over-iter, OR (b) zero fingerprint overlap between consecutive iters with ≥3 findings | Escalate; auto-fix opening new surfaces |
| SC5 Test failure | Tests fail after an iteration's fixes | Escalate; commits left in place for user inspection |
| SC6 Skip backlog | ≥3 findings skipped this iter (can't fix from Mitigation alone) | Continue loop, but surface skipped list at end |
| User ctrl-c | User interrupts | Report partial state, last committed iter, what was in progress |
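The table above can be read as an ordered set of checks run after each iteration. The sketch below is one possible encoding under stated assumptions, not the skill's implementation: the fingerprint is taken to be `(file, slug)`, "real bug" is taken to mean severity `Blocker` or `Factual`, SC4(a) compares only the last two iterations, and the check order is a guess. None of these are spelled out in the table. `check_stop` and `fingerprint` are hypothetical names.

```python
MAX_ITERS = 3  # loop cap stated in the notes

# ASSUMPTION: which severities count as "real bugs" for the SC0 severity floor.
REAL_BUG_SEVERITIES = {"Blocker", "Factual"}


def fingerprint(finding):
    # ASSUMPTION: the real fingerprint definition is not given in the table.
    return (finding["file"], finding["slug"])


def check_stop(history):
    """history[i] holds the findings of iteration i+1; return an SC code or None."""
    current = history[-1]
    if not current:
        return "SC1"  # success: no findings at all
    if not any(f["severity"] in REAL_BUG_SEVERITIES for f in current):
        return "SC0"  # only hygiene-tier findings remain
    if len(history) >= 3:
        last3 = history[-3:]
        # SC3: the same fingerprint shows up in each of the last three iterations.
        if set.intersection(*[{fingerprint(f) for f in it} for it in last3]):
            return "SC3"
        # SC3.5: file-only fallback in case the slug drifted between iterations.
        if set.intersection(*[{f["file"] for f in it} for it in last3]):
            return "SC3.5"
    if len(history) >= 2:
        prev = history[-2]
        growing = len(current) > len(prev)
        prev_fps = {fingerprint(f) for f in prev}
        cur_fps = {fingerprint(f) for f in current}
        disjoint = len(prev) >= 3 and len(current) >= 3 and not (prev_fps & cur_fps)
        if growing or disjoint:
            return "SC4"
    if len(history) >= MAX_ITERS:
        return "SC2"  # cap reached with real bugs still open
    return None  # keep looping (SC5, SC6 and ctrl-c are handled elsewhere)
```

SC5 depends on the test command's exit status and SC6 does not stop the loop, so neither appears in this function.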
Mode=review-only
When user explicitly asks for findings without auto-fix:
- Run codex (same prompt as loop mode Step 1)
- Present output verbatim from `$OUTPUT_FILE` (full per-persona findings text)
- Stop. Do NOT commit. Do NOT touch files.
This is the L0+ "advisory findings" path — verdict stays with user.
Report at end
After loop exit (any stop condition), generate report:
SELF-REVIEW LOOP REPORT
═════════════════════════════════════════════════════════════
Iterations: <N>
Stop reason: <SC code + brief>
Commits made: <count> (atomic, one per finding)
  - <sha> fix(<persona>): <slug>
  - ...
Tests: <pass | fail | skipped (no TEST_CMD)>
Findings still open (if escalation):
  - <persona> / <category> @ <file>:<line>: <slug>
    Mitigation: <one-line>
    Why surfaced: <which SC fired>
Skipped findings (Mitigation needed design judgment):
  - <persona> @ <file>:<line>: <slug>
    Mitigation: <verbatim>
    Reason: <why main session couldn't fix mechanically>
Suggested next steps:
  - Review the atomic commits; revert any you disagree with
  - For surfaced findings: read /tmp/self-review-iter-<N>.md for full context, decide modify/wontfix/defer manually
  - Consider /cadence:pr-review mode=local for Claude-side multi-role view + comparison
  - Push when satisfied
═════════════════════════════════════════════════════════════
Notes
- Cross-model isolation rationale: codex (OpenAI GPT) reviews Claude-generated code → avoids same-model self-preference bias (Wataoka et al., perplexity-driven). Each codex invocation is a fresh process with no conversation context inheritance.
- Context-mix prevention design: codex output goes to `/tmp/self-review-iter-$ITER.md` (file, not chat). Main session only parses the JSON summary block into conversation memory. Main session reads the full finding text from the file when implementing each fix; it is not auto-injected. This keeps main session's growing conversation lean across iterations.
- Methodology single-sourced: codex reads `${CLAUDE_PLUGIN_ROOT}/skills/pr-review/*-prompt.md` directly (dispatcher expands the env var to an absolute path before handing the prompt to codex). When pr-review prompts update, this skill picks up the new methodology automatically. No keep-in-sync burden between pr-review and self-review.
- Adaptation layer keep-in-sync: the "IGNORE these sections" list in the codex prompt mirrors Claude-specific sections in pr-review prompts. If pr-review adds new Claude-only machinery, update the IGNORE list. Annotated as a maintenance concern, not auto-detected.
- Per-finding atomic commits: `git revert <sha>` undoes one finding's fix cleanly. History preserves the audit trail of what codex flagged + how it was fixed.
- Test gate as natural safety net: the cheapest signal that "auto-fix broke things" is a failing test. Catches regressions without needing complex semantic verification.
- Loop count cap (3) reasoning: empirically (5-iter run on a real feature branch), codex's high-value findings (real reachable bugs, the same-model-blind-spot class) all landed in iters 1–3. Iters 4–5 produced hygiene-tier nits, the tail of an already-identified pattern, and one outright false positive the controller had to reject. The cap was 5; it is now 3. The principled stop is SC0 (severity floor); `MAX_ITERS` is the blunt backstop, and a run that still yields real-bug findings at iter 3 is itself signalling the change is too big and should be split.
- SC0 vs MAX_ITERS, why both: SC0 (severity floor) is the intended stop. It fires when an iteration produces no real reachable bug, i.e. further rounds would only surface nits. `MAX_ITERS=3` is the backstop for the pathological case where codex keeps finding real bugs that far out. A healthy run stops on SC0 at iter 2–3; only an unhealthy (oversized-diff) run reaches the cap.
- Pattern generalization beats round count: the loop's structural weakness is amplifying one conceptual bug into N findings across N iterations (codex finds one site per pass; controller fixes one site per pass). The Step 3 "Pattern generalization" rule counters this: on first sighting of a pattern-class finding, grep + fix all sibling sites in the same iteration. Done well, the loop self-converges inside the cap without relying on it.
- Author bias still applies to FIX step, not VERDICT step: codex finding generation is cross-model isolated. Main session writing the fix is NOT, but the HARD-GATE constrains main session to mechanical execution of the `Mitigation:` field, removing the verdict-reasoning attack surface. If you notice main session "reasoning whether codex is right" → that's a HARD-GATE violation; surface the finding instead of arguing.
- Worktree assumption: skill expects to run on a feature branch (user already in worktree or non-main branch). Doesn't self-create worktrees. If user is on `main`, warn but don't block; they may know what they're doing.
- No state persistence across invocations: each invocation starts fresh. `FINDING_HISTORY` is per-invocation. If you re-invoke after manual edits, the loop has no memory of previous runs. This is by design (keeps the skill stateless, no `.claude/state/` files to maintain).