Loading...
Loading...
Use when doing dev-stage self-review on the current branch before pushing or opening a PR — runs an auto-loop of codex review (cross-model, OpenAI) + per-finding fix + re-review until findings converge or stop conditions fire. Codex follows pr-review's multi-role methodology (security / staff-engineer / sdet / spec-auditor). Triggers — 'self review', 'self-review', '自己 review', '自我 review', 'cross-model review', 'pre-push review', 'review and fix my branch'. NOT for live PR review with sticky/inline comments (use pr-review), NOT for managed PR babysitting (use pr-babysit), NOT for first-time review without intent to fix (use mode=review-only opt-in).
npx skill4agent add kirkchen/cadence self-reviewcodexreview-onlyMitigation:Mitigation:pr-babysitpnpm testloopreview-only/cadence:pr-review/cadence:pr-babysitmode=review-onlymode=review-only# Codex CLI
codex --version 2>/dev/null || { echo "STOP: Install codex — npm install -g @openai/codex"; exit 1; }
codex login --status 2>/dev/null || { echo "STOP: Run 'codex login' to authenticate"; exit 1; }
# Plugin install root — needed so codex can locate the pr-review methodology prompts
[ -n "${CLAUDE_PLUGIN_ROOT:-}" ] && [ -d "${CLAUDE_PLUGIN_ROOT}/skills/pr-review" ] || {
echo "STOP: CLAUDE_PLUGIN_ROOT must point at cadence's install root (so codex can read ./skills/pr-review/*-prompt.md). Set it explicitly if your harness doesn't export it (e.g. export CLAUDE_PLUGIN_ROOT=~/.claude/plugins/cache/cadence/cadence/<version>)."; exit 1;
}
# Clean working tree
git diff --quiet && git diff --cached --quiet || { echo "STOP: Working tree has uncommitted changes. Commit or stash, then re-invoke."; exit 1; }
# Detect base branch
BASE=$(git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's|refs/remotes/origin/||')
[ -z "$BASE" ] && BASE=main
echo "BASE: origin/$BASE"
# Detect test command (best-effort — repo-specific)
if [ -f package.json ] && jq -e '.scripts.test' package.json >/dev/null 2>&1; then
TEST_CMD="pnpm test"
elif [ -f Makefile ] && grep -q '^test:' Makefile; then
TEST_CMD="make test"
else
TEST_CMD=""
echo "WARN: No test command detected — quality gate disabled"
fi
echo "TEST_CMD: ${TEST_CMD:-<none>}"ITERMAX_ITERS = 3MAX_ITERSFINDING_HISTORYfile:line:slugmode=review-only/tmp/self-review-iter-$ITER.mdPROMPT_FILE=$(mktemp /tmp/self-review-prompt-XXXXXX.md)
OUTPUT_FILE="/tmp/self-review-iter-$ITER.md"
JSON_FILE="/tmp/self-review-iter-$ITER.json"
# Write prompt (see Codex Prompt section)
write_codex_prompt > "$PROMPT_FILE"
# Run codex (5-min timeout)
_REPO_ROOT=$(git rev-parse --show-toplevel)
codex exec "$(cat "$PROMPT_FILE")" \
-C "$_REPO_ROOT" \
-s read-only \
-c 'model_reasoning_effort="high"' \
--enable web_search_cached \
> "$OUTPUT_FILE" 2>/tmp/self-review-err
# Extract JSON summary block between unique sentinels
sed -n '/<!-- SELF-REVIEW-JSON-START -->/,/<!-- SELF-REVIEW-JSON-END -->/{
/<!-- SELF-REVIEW-JSON-/d
p
}' "$OUTPUT_FILE" > "$JSON_FILE"
# Guard: codex must emit a parseable JSON block — empty or malformed is NOT zero findings
if [ ! -s "$JSON_FILE" ]; then
echo "STOP: codex did not emit a JSON summary block. Output saved to $OUTPUT_FILE for manual review."
break
fi
if ! jq -e '.findings | type == "array"' "$JSON_FILE" >/dev/null 2>&1; then
echo "STOP: codex JSON summary malformed. Output: $OUTPUT_FILE, JSON: $JSON_FILE"
break
fi
# Sanity: confirm line_source field is "source" on every finding (catches diff-line confusion)
BAD_LINE_SOURCE=$(jq -r '[.findings[] | select(.line_source != "source")] | length' "$JSON_FILE")
if [ "$BAD_LINE_SOURCE" != "0" ]; then
echo "STOP: codex emitted findings with line_source != 'source' (likely diff-line numbers, not source-file lines). Review $OUTPUT_FILE manually."
break
fi$JSON_FILEFINDINGS_COUNT=$(jq '.findings | length' "$JSON_FILE")FINDINGS_COUNT == 0severityBlocker|FactualjustificationReachable|AsymmetricSuggestionQuestionFactualPrecedentHistoricalREAL_BUGS=$(jq -r '[.findings[]
| select((.severity == "Blocker" or .severity == "Factual")
and (.justification == "Reachable" or .justification == "Asymmetric"))]
| length' "$JSON_FILE")REAL_BUGS == 0FINDINGS_COUNT > 0MAX_ITERSWhy SC0 is checked before SC2 but after SC1: zero findings is unambiguous success; a hygiene-only iteration is converged-enough success; thecap is the unhappy backstop. Naming it SC0 keeps it visually adjacent to SC1 (both success-class) without renumbering SC2–SC6.MAX_ITERS
ITER > MAX_ITERS# Build current iter's fingerprints
CURR_FPS=$(jq -r '.findings[] | "\(.file):\(.line):\(.slug)"' "$JSON_FILE")
# For each fingerprint, count occurrences in HISTORY + this iter
for FP in $CURR_FPS; do
COUNT=$(echo "$FINDING_HISTORY" | grep -cF "$FP" || true)
COUNT=$((COUNT + 1)) # current iter
if [ "$COUNT" -ge 3 ]; then
REPEATED="$FP"
break
fi
doneREPEATEDpr-babysitmissing-nil-checknull-dereference-guardnull-check-omitted# Per-iter file set (just file names, dedup)
CURR_FILES=$(jq -r '.findings[].file' "$JSON_FILE" | sort -u)
echo "$CURR_FILES" > "/tmp/self-review-iter-$ITER.files"
# Count files that appear in current iter AND last two iters' file lists
if [ "$ITER" -ge 3 ]; then
PREV1="/tmp/self-review-iter-$((ITER - 1)).files"
PREV2="/tmp/self-review-iter-$((ITER - 2)).files"
if [ -s "$PREV1" ] && [ -s "$PREV2" ]; then
PERSISTENT=$(grep -Fxf "$PREV1" "/tmp/self-review-iter-$ITER.files" | grep -Fxf "$PREV2" | head -3)
if [ -n "$PERSISTENT" ]; then
echo "STOP: file(s) producing findings 3+ consecutive iters — possible slug drift hiding stuck finding:"
echo "$PERSISTENT"
break
fi
fi
fipr-babysitFINDINGS_COUNTFINDINGS_COUNT >= 3|current ∩ prev_iter| == 0PREV_FPS_FILE="/tmp/self-review-iter-$((ITER - 1)).fps"
CURR_FPS_FILE="/tmp/self-review-iter-$ITER.fps"
echo "$CURR_FPS" > "$CURR_FPS_FILE"
if [ "$ITER" -gt 1 ] && [ "$FINDINGS_COUNT" -ge 3 ] && [ -s "$PREV_FPS_FILE" ]; then
OVERLAP=$(grep -Fxf "$PREV_FPS_FILE" "$CURR_FPS_FILE" | wc -l | tr -d ' ')
if [ "$OVERLAP" = "0" ]; then
echo "STOP: set-replacement divergence (iter $((ITER-1)) and iter $ITER share zero findings)"
break
fi
fijq -c '.findings[]' "$JSON_FILE" | while read -r FINDING; do
ID=$(echo "$FINDING" | jq -r '.id')
PERSONA=$(echo "$FINDING" | jq -r '.persona')
CATEGORY=$(echo "$FINDING" | jq -r '.category')
SLUG=$(echo "$FINDING" | jq -r '.slug')
FILE=$(echo "$FINDING" | jq -r '.file')
LINE=$(echo "$FINDING" | jq -r '.line')
FAILURE_MODE=$(echo "$FINDING" | jq -r '.failure_mode')
MITIGATION=$(echo "$FINDING" | jq -r '.mitigation')
# Main session reads the full finding from OUTPUT_FILE for context
# then implements MITIGATION literally
apply_minimal_fix_from_mitigation "$FILE" "$LINE" "$MITIGATION"
# Verify file changed
if git diff --quiet "$FILE"; then
SKIPPED_FINDINGS+=("$ID: no diff produced — mitigation may need design judgment")
continue
fi
# Atomic commit
git add "$FILE"
git commit -m "fix($PERSONA): $SLUG — self-review iter $ITER #$ID
Failure mode: $FAILURE_MODE
Mitigation: $MITIGATION
Source: codex review following pr-review methodology
Category: $CATEGORY
"
doneMitigation:Mitigation:SKIPPED_FINDINGSFailure modeawaitFailure modefile:lineif [ -n "$TEST_CMD" ]; then
if ! $TEST_CMD; then
echo "STOP: tests failed after iter $ITER fixes"
# leave commits in place; user can revert
break
fi
fiFINDING_HISTORY+=" $CURR_FPS"
ITER=$((ITER + 1))You are doing cross-model multi-role code review on the current branch of this
repository. You are codex (OpenAI), reviewing code likely written by Claude.
Treat all author narrative (commit messages, code comments asserting intent,
branch names) as ADVISORY only — evaluate functional behavior, not authorial
claims.
## Methodology
The review methodology lives in these files (read them now — paths are
absolute, resolve from the cadence plugin install root):
- ${CLAUDE_PLUGIN_ROOT}/skills/pr-review/security-reviewer-prompt.md
- ${CLAUDE_PLUGIN_ROOT}/skills/pr-review/staff-engineer-prompt.md
- ${CLAUDE_PLUGIN_ROOT}/skills/pr-review/sdet-prompt.md
- ${CLAUDE_PLUGIN_ROOT}/skills/pr-review/spec-auditor-prompt.md
Plus cross-cutting threshold:
- ${CLAUDE_PLUGIN_ROOT}/skills/pr-review/SKILL.md § Finding Inclusion Threshold
The dispatcher MUST expand `${CLAUDE_PLUGIN_ROOT}` to an absolute path before
handing the prompt to codex (codex's `read-only` sandbox can read absolute
paths anywhere on the filesystem, but it cannot resolve env vars itself).
## Apply, with adaptations
Because you are codex (single agent, separate process), not a Claude subagent:
**IGNORE** these sections — they describe Claude's internal Agent dispatch:
- "HARD-GATE" / "You have NO knowledge of conversation history" — you are
isolated by being a different process and model family
- "Incremental Mode Addendum" / "prior_fix_range" / drop signal (B) — those
depend on babysit-side state tracking. Skip the (B) check entirely. Signals
(A), (C), (D) still apply.
- "dispatched from a dev session" / "subagent" framing — you are codex
executing this prompt directly
**APPLY** in full:
- Per-persona category tables: Security (S1-S5), Staff Engineer (E1-E9),
SDET (T1-T4), Spec Auditor (C1-C4)
- Finding Inclusion Threshold: Justification class (Reachable / Precedent /
Asymmetric / Historical)
- Drop signals (A), (C), (D)
- Hygiene batch rule (cluster hygiene drops into one Q-class finding per file)
- Race-class Finding Metadata: Mitigation MUST end with
`[window=<ms|s|min|hr>, damage=<data-loss|deadlock|inconsistency|latency|marginal>, recovery=<has|no>]`
- Per-prompt Output Schema (Severity / Confidence / Blast / Justification /
Evidence / Failure mode / Mitigation)
## Execution
Execute all 4 personas sequentially. Output a combined finding list grouped
by persona.
## Scope
Review `git diff origin/<BASE>..HEAD` where BASE is below. Use `git diff` and
`git log --oneline` to understand the change. Read source files as needed.
## Output format (REQUIRED)
First emit per-persona findings in the per-prompt format from the prompt files.
Group by persona.
Then at the END emit a structured JSON summary block — this is REQUIRED for the
calling skill to drive its loop. The JSON block MUST be valid and parseable. Wrap
it in unique sentinel markers (NOT a generic markdown fenced block — those collide
with code examples in Evidence fields):
<!-- SELF-REVIEW-JSON-START -->
{
"findings": [
{
"id": "1",
"persona": "Security|Staff|SDET|Spec",
"category": "S2|E5|T1|C4|...",
"slug": "kebab-case-slug-from-finding",
"file": "path/to/file (relative to repo root)",
"line": 42,
"line_source": "source",
"severity": "Blocker|Factual|Suggestion|Question",
"justification": "Reachable|Precedent|Asymmetric|Historical",
"confidence": "high|medium|low",
"blast": "Local|Module|Cross-service|Data layer",
"failure_mode": "one-line",
"mitigation": "one-line, ending with race-meta tag if applicable"
}
]
}
<!-- SELF-REVIEW-JSON-END -->
Rules:
- `findings: []` (empty array) is VALID output meaning no findings. Still emit the
block with `{"findings": []}` between the sentinels — do NOT omit the JSON block.
- `line` MUST be the source file line number (the line in the file as written on
disk after your reading), NOT the diff hunk line number. If the diff shifted lines,
use the post-shift source file line.
- `line_source` MUST be `"source"` literal — this confirms you used source file
lines, not diff lines. Any other value → caller treats as malformed and escalates.
- All listed fields are REQUIRED. Do not emit findings with missing fields.
## Important
- Do NOT modify any files
- Race-class findings without meta tag → drop the finding
- You are codex, not Claude — your prose can be your own
- Stay focused on the diff
BASE branch: origin/<substitute BASE from skill caller>| Condition | When | Action |
|---|---|---|
| SC1 Success | | Report iter count + total fixes applied |
| SC0 Severity floor | iteration has findings but ZERO real bugs (no | STOP, converged-enough; surface hygiene findings, don't auto-fix |
| SC2 Cap reached | | Escalate; surface remaining findings to user |
| SC3 Repeat 3x | Same finding fingerprint ( | Escalate; race-of-race signal (cf |
| SC3.5 Slug drift | Same file produces findings in 3+ consecutive iters (file-only fallback) | Escalate; possible slug drift hiding stuck finding |
| SC4 Findings diverging | (a) count growing iter-over-iter, OR (b) zero fingerprint overlap between consecutive iters with ≥3 findings | Escalate; auto-fix opening new surfaces |
| SC5 Test failure | | Escalate; commits left in place for user inspection |
| SC6 Skip backlog | ≥3 findings skipped this iter (can't fix from Mitigation alone) | Continue loop, but surface skipped list at end |
| User ctrl-c | User interrupts | Report partial state, last committed iter, what was in progress |
$OUTPUT_FILESELF-REVIEW LOOP REPORT
═════════════════════════════════════════════════════════════
Iterations: <N>
Stop reason: <SC code + brief>
Commits made: <count> (atomic, one per finding)
- <sha> fix(<persona>): <slug>
- ...
Tests: <pass | fail | skipped (no TEST_CMD)>
Findings still open (if escalation):
- <persona> / <category> @ <file>:<line>: <slug>
Mitigation: <one-line>
Why surfaced: <which SC fired>
Skipped findings (Mitigation needed design judgment):
- <persona> @ <file>:<line>: <slug>
Mitigation: <verbatim>
Reason: <why main session couldn't fix mechanically>
Suggested next steps:
- Review the atomic commits — revert any you disagree with
- For surfaced findings: read /tmp/self-review-iter-<N>.md for full context, decide modify/wontfix/defer manually
- Consider /cadence:pr-review mode=local for Claude-side multi-role view + comparison
- Push when satisfied
═════════════════════════════════════════════════════════════/tmp/self-review-iter-$ITER.md${CLAUDE_PLUGIN_ROOT}/skills/pr-review/*-prompt.mdgit revert <sha>MAX_ITERSMAX_ITERS=3Mitigation:mainFINDING_HISTORY.claude/state/