pr-babysit

Babysit a PR/MR until CI is green AND every valid piece of reviewer feedback is addressed. Supports GitHub PRs (via `gh`) and GitLab MRs (via `glab`). The platform is auto-detected from `git remote get-url origin` (github.com → gh; gitlab.com / self-hosted GitLab → glab).
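The auto-detection rule can be sketched as a small shell function. `detect_platform` is a hypothetical helper name, and matching self-hosted GitLab by the substring "gitlab" is an assumption; a custom domain would need an explicit mapping.

```shell
# Map a git remote URL to the CLI that should drive the run.
# detect_platform is a hypothetical helper, not part of gh or glab.
detect_platform() {
  case "$1" in
    *github.com[:/]*) echo "gh" ;;
    *gitlab*)         echo "glab" ;;     # assumption: GitLab hosts contain "gitlab"
    *)                echo "unknown" ;;  # e.g. self-hosted GitLab on a custom domain
  esac
}

# Typical call (needs a git checkout, so it is left commented out):
# detect_platform "$(git remote get-url origin)"
```

An "unknown" result is a reasonable point to stop and ask the user which CLI applies rather than guess.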

Arguments

`$ARGUMENTS` accepts:
  • empty → current branch's open PR/MR
  • a number → PR/MR by IID on the current repo
  • a URL → parse owner/repo + IID from it
If multiple PRs/MRs match the current branch, stop and ask which one.
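A minimal sketch of the three accepted argument shapes. `parse_target` and its output format are hypothetical. The URL patterns assume GitHub's `/pull/<n>` and GitLab's `/-/merge_requests/<n>` layouts, so other URL layouts would fall through to "invalid".

```shell
# parse_target ARG -> prints one of:
#   current-branch | iid <n> | github <owner/repo> <n> | gitlab <group/project> <n>
# Hypothetical helper; the URL layouts are assumptions about the two platforms.
parse_target() (
  arg="${1:-}"
  case "$arg" in
    "")       echo "current-branch"; exit 0 ;;
    *[!0-9]*) ;;                           # has a non-digit: try it as a URL below
    *)        echo "iid $arg"; exit 0 ;;   # digits only
  esac
  parsed=$(printf '%s\n' "$arg" | sed -n -E \
    -e 's#^https?://[^/]+/(.+)/pull/([0-9]+)([^0-9].*)?$#github \1 \2#p' \
    -e 's#^https?://[^/]+/(.+)/-/merge_requests/([0-9]+)([^0-9].*)?$#gitlab \1 \2#p')
  [ -n "$parsed" ] || { echo "invalid" >&2; exit 1; }
  echo "$parsed"
)
```

The "multiple PRs/MRs match" case is not modelled here, because it depends on what the platform CLI returns for the branch.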

Reply Language

Reply prose posted to PR/MR threads is written in the primary language of the PR/MR description. This covers the `<what changed>` / `<reason>` / `<evidence>` content following each reply-template anchor, plus the prose inside Wontfix Template fields. Everything else stays English: the anchor phrases themselves, Wontfix Template field labels, conventional commit prefixes, the race meta tag, and P-codes / severity / justification tokens (same canonical set as `pr-review`'s Output Language).
Fallback when the PR description lacks substantive prose: use the language of the linked issue body, then English.
Terminal output (step 6 run report, Gate A / Gate B audit messages, invisible-findings prompt) stays English, because it goes to the dispatcher session, not the PR.

Loop

1. Snapshot

Fetch: PR/MR metadata + head SHA, all checks / pipeline jobs, all review comments, all general comments, all review threads / discussions (with resolved state), the current user login.
For each thread you've previously replied to in this PR, cache `{file path, rule code or primary keyword, your reply summary}` for use by step 2 dedup.
Filter on content, not author:
  • Drop comments whose body is only CI status lines (build green/red, deploy event, "pipeline succeeded"). That is noise.
  • Keep any comment containing actionable signals (`Suggestion` / `Warning` / `Critical` / `Issue` / `quality gate` / `failed` / line-level review notes), even from bot accounts. AI review bots, SonarQube, and Snyk are content bots, not noise bots.
  • Drop your own past replies and already-resolved threads.
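The content-based filter could look roughly like the sketch below. `classify_comment` and both regular expressions are illustrative assumptions. The real set of status-line shapes depends on which CI bots post to the repo.

```shell
# classify_comment BODY -> prints "keep" or "drop".
# Hypothetical helper; both patterns are assumptions to tune per repository.
classify_comment() (
  body="$1"
  actionable='suggestion|warning|critical|issue|quality gate|failed'
  ci_status='^[[:space:]]*(build (green|red)|pipeline (succeeded|passed)|deploy(ed|ment)?( .*)?)[[:space:]]*$'
  if printf '%s\n' "$body" | grep -Eiq "$actionable"; then
    echo "keep"     # an actionable signal wins, even from a bot account
  elif ! printf '%s\n' "$body" | grep -Eivq "$ci_status"; then
    echo "drop"     # every line of the body is a CI status line
  else
    echo "keep"     # anything else is treated as reviewer content
  fi
)
```

The author is never consulted, which mirrors the "filter on content, not author" rule. Dropping your own replies and resolved threads needs thread metadata and is left out of the sketch.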

2. Triage

Hard gate — invisible findings: if a check is failing but the actual finding list lives in an external dashboard your CLI cannot reach (SonarQube, Snyk, DataDog test reports, etc. — no token, no API endpoint accessible), STOP immediately and ask the user to paste the findings. Do not reproduce locally and process "guessed" findings as a complete cycle. Do not process unrelated feedback first while the invisible finding sits unaddressed. Root-cause diagnosis assumes you can see the finding; when you can't, this gate fires first.
Cross-round dedup: for each new comment, check the cache from step 1:
  • Same file + same rule code (e.g. `CA1031`) OR same primary keyword as a thread you already replied to → treat as duplicate. Reply with one line linking back to the earlier thread; do not re-implement or re-explain.
  • Same issue surviving 3 rounds despite fix attempts → escalate to `needs-user-input` (the bot is stuck; the user has to break the tie).
Feedback — bucket each remaining unresolved comment:
  • Valid — bug, security, logic error, clear actionable suggestion
  • Discuss — ambiguous, possible source misread, design tradeoff, scope unclear → do NOT reply autonomously, do NOT implement — collect for user
  • Out-of-scope — clearly outside this PR's stated goal → collect for user
Checks — for each failing check: pull the failure log via CLI, diagnose root cause before attempting a fix (no patch without a named cause). Distinguish real failure vs flaky; only retry on evidence of flake. If the failure log doesn't contain the actual findings → invisible-findings gate above.
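The cross-round dedup lookup and the 3-round escalation can be sketched as follows. The tab-separated cache layout (`path`, `rule code or keyword`, `thread link`) and both function names are assumptions; the skill itself does not prescribe a cache format.

```shell
# find_duplicate CACHE_FILE PATH KEY
# Prints the link of the earlier thread with the same file and the same
# rule code / primary keyword, and exits non-zero when there is none.
# Assumed cache layout: path<TAB>key<TAB>thread-link, one thread per line.
find_duplicate() {
  awk -F '\t' -v p="$2" -v k="$3" \
    '$1 == p && $2 == k { print $3; found = 1; exit } END { exit !found }' "$1"
}

# next_action ROUNDS -> what to do with an issue that has survived ROUNDS fix attempts.
next_action() {
  if [ "$1" -ge 3 ]; then echo "needs-user-input"; else echo "retry"; fi
}
```

A hit from `find_duplicate` supplies the `<link>` for the one-line "same as the earlier thread" reply.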

3. Address (Valid + real failures only)

For each item:
  1. Implement the fix.
  2. Small commit, conventional commits format, one logical change per commit. Type cheat sheet: behaviour change → `fix`; behaviour-preserving structure / readability (incl. lint suppressions) → `refactor`; non-source (CI, husky, tooling) → `chore`; pure docs → `docs`.
  3. Reply on the originating comment / discussion thread (template table below).
  4. Verify the reply landed inside the thread, not as a top-level note (see "Reply endpoints" below).
Reply endpoints by platform (mismatching these creates orphan top-level notes):
| Action | GitHub | GitLab |
| --- | --- | --- |
| Reply to a review thread | `POST /repos/{O}/{R}/pulls/{id}/comments` with `in_reply_to` set to the parent comment's id | `POST /projects/:id/merge_requests/{iid}/discussions/{disc_id}/notes` |
| New top-level comment | `POST /repos/{O}/{R}/issues/{id}/comments` | `POST /projects/:id/merge_requests/{iid}/notes` |
After posting a reply, `GET` the discussion / review thread back and confirm your note is in the thread (note count ≥ 2, your username present). If it landed top-level → delete it and retry on the right endpoint.
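As a sketch, the in-thread reply path and the post-reply check can be kept as two small helpers. `reply_path` and `landed_in_thread` are hypothetical names, and the check assumes you have already extracted one author login per note from the fetched thread.

```shell
# reply_path github OWNER REPO PR_NUMBER
# reply_path gitlab PROJECT_ID MR_IID DISCUSSION_ID
# Prints the REST path for replying inside an existing thread.
reply_path() {
  case "$1" in
    github) echo "repos/$2/$3/pulls/$4/comments" ;;
    gitlab) echo "projects/$2/merge_requests/$3/discussions/$4/notes" ;;
    *)      return 1 ;;
  esac
}

# landed_in_thread MY_LOGIN
# Reads one note-author login per line on stdin (the thread fetched back after
# posting). Succeeds only if the thread has at least 2 notes and one is yours.
landed_in_thread() {
  awk -v me="$1" '{ n++ } $0 == me { mine = 1 } END { exit !(n >= 2 && mine) }'
}
```

A thread whose only note is your own fails the check, which is the signature of a reply that was posted as a new top-level note instead of inside the reviewer's thread.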
Reply templates — pick by situation:
| Situation | Template |
| --- | --- |
| Adopted and fixed | `Addressed in <SHA> — <what changed>.` |
| Deliberate design, won't change | `Deliberate design — <reason>. <spec or codebase ref>.` |
| Same issue already replied earlier in this PR | `Same as the earlier <topic> thread — <link>.` |
| Bot premise wrong, won't fix | `Won't fix — premise doesn't hold. <evidence: file:line / spec section>.` |
The Deliberate / Won't-fix templates exist to keep tone neutral and evidence-led — without a template these tend to drift into defensive or implementation-dump replies.
Anchor phrases stay English; only the prose after each anchor adapts to the PR description's language. See Reply Language.
Lint / warning suppression: any `#pragma`, `// eslint-disable`, `# noqa`, `@SuppressWarnings`, etc. must include:
  • (a) inline rationale comment on the same line, AND
  • (b) reference to the spec section OR an existing codebase precedent (`file:line`) using the same suppression for the same reason.
If neither (a) nor (b) is available → do not suppress, refactor instead. When (b) applies, cite the precedent `file:line` in the commit message.
Hard rules:
  • No `--amend` on already-pushed commits
  • No force-push (`git push --force` or `--force-with-lease`)
  • Don't mark GitLab discussions resolved unless the reviewer explicitly asked for that
  • Don't close any reviewer thread without a reply
  • 3 failed attempts on the same fix → STOP, document what failed + assumptions to question, hand back to user (per global CLAUDE.md)

4. Push & wait

Run `git push`, then poll CI to a terminal state (GitHub: `gh pr checks --watch`; GitLab: poll `head_pipeline.status` until success/failed/canceled).
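For the GitLab side, the polling loop can be sketched with the status fetch injected as a command, so the loop itself stays testable. `is_terminal` and `wait_for_pipeline` are hypothetical names, and the 30-second default interval is an assumption.

```shell
# is_terminal STATUS -> succeeds for the terminal states named above.
is_terminal() {
  case "$1" in
    success|failed|canceled) return 0 ;;
    *)                       return 1 ;;
  esac
}

# wait_for_pipeline FETCH_CMD [INTERVAL_SECONDS]
# FETCH_CMD is any command or function name that prints the current
# head_pipeline.status. Prints the terminal status once it is reached.
wait_for_pipeline() {
  fetch="$1"; interval="${2:-30}"
  while :; do
    status=$($fetch) || return 2          # fetch failed: surface it, do not spin
    if is_terminal "$status"; then echo "$status"; return 0; fi
    sleep "$interval"
  done
}
```

In a real run, FETCH_CMD would wrap whatever call returns the merge request's `head_pipeline.status`. A production loop would probably also want an overall timeout, which this sketch omits.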

4.1 Record prior_fix_range

After step 3's fix commits land and step 4 has pushed them, capture the SHA range covering this iter's fixes. This range is the canonical source-of-truth for two downstream consumers:
  1. Next iter's pr-review invocation: pass it as the `prior_fix_range` input so pr-review's incremental mode can apply drop signal (B) (self-introduced surface)
  2. Gate B in step 4.5 below — same range, same line-level attribution mechanism
bash
# After step 4 push, before invoking the next pr-review iter:
FIRST_FIX_SHA=$(git log --format='%H' "$PREV_HEAD..HEAD" | tail -1)  # oldest fix in this iter
LAST_FIX_SHA=$(git rev-parse HEAD)                                   # newest fix in this iter
PRIOR_FIX_RANGE="${FIRST_FIX_SHA}^..${LAST_FIX_SHA}"

Persist `PRIOR_FIX_RANGE` (and `$LAST_FIX_SHA` as the next iter's `$PREV_HEAD`) into the babysit state file or session env. If the iter pushed a single commit, `FIRST_FIX_SHA == LAST_FIX_SHA` and the range collapses to `<sha>^..<sha>`.

If this iter pushed zero commits (CI re-run only) → no fix range to record; skip the Gate B self-introduced check for the next iter, but still run Gate A as normal.

**Why not compute lazily at Gate B**: computing at push time anchors the range to the exact commits that addressed iter (N-1) findings. Lazy computation at Gate B time could pick up unrelated commits if the user manually edits the branch between iters.
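The single-commit and zero-commit cases described above can be made explicit by deriving the range from the SHA list itself. This is a sketch: `fix_range` is a hypothetical helper that reads SHAs newest-first on stdin, the order `git log --format='%H'` prints them.

```shell
# fix_range: reads commit SHAs on stdin, newest first, one per line.
# Prints "<oldest>^..<newest>", or nothing at all when the iter pushed no commits.
fix_range() {
  awk 'NR == 1 { newest = $1 }
       { oldest = $1 }
       END { if (NR > 0) printf "%s^..%s\n", oldest, newest }'
}

# Typical call (needs a git checkout, so it is left commented out):
# PRIOR_FIX_RANGE=$(git log --format='%H' "$PREV_HEAD..HEAD" | fix_range)
```

An empty result doubles as the "no fix range to record" signal, so the caller can skip the Gate B self-introduced check without a separate commit count.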

4.5 Self-feedback loop gates

After pushing this iter's fixes and waiting for CI green, before looping back to step 1, run TWO sub-gates that catch different self-feedback failure modes. Without these, an automated reviewer paired with an automated babysitter can spend N iterations either chasing test-hygiene nits (Gate A) or chasing race-of-race surfaces (Gate B).
Both gates parse pr-review's inline comments on this PR:
bash
gh api repos/$OWNER/$REPO/pulls/$N/comments \
  --jq '[.[] | select(.body | contains("<!-- pr-review:finding-id=")) |
         {id, created_at, path, line, body,
          justification: (.body | capture("<!-- pr-review:justification=(?<j>[^ ]+) -->").j // null),
          race_meta: (.body | capture("\\[window=(?<w>[^,]+), damage=(?<d>[^,]+), recovery=(?<r>[^\\]]+)\\]") // null)}]'
Take only findings created since the previous iter's HEAD sha (the new ones this iter introduced).

Gate A: Diminishing Returns (only-hygiene iter)

Fires when ALL of:
  • ≥1 new pr-review finding this iter
  • ZERO new findings have `justification ∈ {Reachable, Precedent, Asymmetric, Historical}`
  • ALL new findings are `justification=Hygiene` (or missing; treat missing as Hygiene)
Action: STOP automatic loop, skip step 5's normal decision, jump to step 6 with:
Status: needs-user-input (diminishing returns)

This iter's pr-review surfaced only hygiene findings — no Reachable / Precedent /
Asymmetric / Historical justification on any new finding.

Hygiene followups (N):
  <list — id, slug, file:line, one-line failure mode>

Continuing the loop will likely surface more hygiene from the same code paths.

Your call:
  (s) ship — open a single follow-up issue collecting the hygiene items, mark PR ready-to-merge
  (p) polish — keep looping (override the gate for this round)
  (r) re-review-full — challenge whether the self-loop missed anything (force `mode=full` on next pr-review)
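The Gate A firing condition reduces to a check over the justification tokens of this iter's new findings. A minimal sketch, assuming the tokens have already been extracted one per line, with an empty line standing for a missing justification; `gate_a_fires` is a hypothetical name.

```shell
# gate_a_fires: reads one justification token per new finding on stdin.
# An empty line means the justification was missing and counts as Hygiene.
# Succeeds (exit 0) only when there is at least one finding and all are Hygiene.
gate_a_fires() {
  awk '{ n++ }
       $0 == "Hygiene" || $0 == "" { hygiene++ }
       END { exit !(n >= 1 && hygiene == n) }'
}
```

Any single Reachable / Precedent / Asymmetric / Historical token makes the gate stay silent, and so does an iter with no new findings at all.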

Gate B: Convergence Audit (race-of-race iter)

Catches the failure mode where iter (N-1)'s fix introduces a new race / state-transition surface, the reviewer flags it as a Reachable finding, the next fix introduces yet another race surface, ad infinitum. Gate A does NOT catch this: those findings carry `justification=Reachable` and are individually valid; the divergence is only visible at cluster level.
`prior_fix_range`: use the range recorded in step 4.1. This is the same range fed to pr-review's incremental-mode invocation, so Gate B's self-introduced check and pr-review's drop signal (B) operate on identical evidence. If step 4.1 recorded nothing (iter N-1 pushed no commits), Gate B does not fire, because there is no iter (N-1) fix surface to converge against.
Fires when ALL of:
  • `iter ≥ 3` (first two iters are normal review cadence, not divergence)
  • ≥ 2 new findings this iter cite `file:line` inside `prior_fix_range`, i.e. they critique iter (N-1)'s freshly-added surface
  • ≥ 2 of those findings are race-class; detection is OR of:
    • (i) carries `[window=..., damage=..., recovery=...]` meta from pr-review's race-class metadata requirement, OR
    • (ii) slug/category keyword-matches one of: `race | TOCTOU | concurren | sweep | lifecycle | state-transition | debounce | claim | lease | fence | stale | orphan | race-window`, OR
    • (iii) `\bwindow=` (matches the meta-tag prefix even when full meta is malformed) OR `atomic.*race | race.*atomic` (require co-occurrence to avoid catching DB-transaction `atomic` and frontend-viewport `window` noise)
Keyword design notes: bare `window` and bare `atomic` are deliberately excluded because they false-positive on rate-limiter / viewport / DB-transaction-correctness comments. `TOCTOU` is the canonical security-race term and matches Codex findings that bypass the meta-tag path. `debounce / claim / lease / fence` cover distributed-locking vocabulary; `stale / orphan` cover sweep-race descriptions.
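The three race-class detection branches can be sketched as one predicate over a finding's slug and body. `is_race_class` is a hypothetical name. Anchoring keywords at a word start is an assumption added here so that `lease` does not match inside "release"; the list above does not say whether matching is anchored.

```shell
# is_race_class SLUG BODY -> succeeds when the finding looks race-class.
# Hypothetical helper. Keywords are anchored at a word start (an assumption),
# and \b is spelled portably as (^|[^[:alnum:]_]).
is_race_class() (
  slug="$1"; body="$2"
  start='(^|[^[:alnum:]_])'
  keywords='(race|toctou|concurren|sweep|lifecycle|state-transition|debounce|claim|lease|fence|stale|orphan)'
  # (i) and first half of (iii): the meta tag, or at least its "window=" prefix
  printf '%s\n' "$body" | grep -Eq "${start}window=" && exit 0
  # (ii): slug / category keyword match
  printf '%s\n' "$slug" | grep -Eiq "${start}${keywords}" && exit 0
  # second half of (iii): "atomic" and "race" must co-occur in the body
  printf '%s\n' "$body" | grep -Eiq 'atomic.*race|race.*atomic' && exit 0
  exit 1
)
```

Bare "window" and bare "atomic" never match on their own, in line with the keyword design notes.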
How to verify file:line inside prior_fix_range:
bash
git diff --name-only $prior_fix_range                  # files touched
git diff -U0 $prior_fix_range -- <file>                # line-level attribution
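To turn the `-U0` diff into a yes/no answer for a specific line, one option is to read the `+start,count` part of each hunk header. A sketch under that assumption; `line_in_range` is a hypothetical helper that expects the diff of a single file on stdin.

```shell
# line_in_range LINE
# Reads `git diff -U0 <range> -- <file>` output on stdin and succeeds when LINE
# falls inside a hunk's new-side range ("+start,count" in the "@@" header).
line_in_range() {
  awk -v target="$1" '
    /^@@ / {
      split($3, plus, ",")                       # $3 looks like "+12,3" or "+12"
      start = substr(plus[1], 2) + 0
      count = (plus[2] == "") ? 1 : plus[2] + 0  # omitted count means 1 line
      if (count > 0 && target + 0 >= start && target + 0 < start + count) hit = 1
    }
    END { exit !hit }
  '
}

# Typical call (needs a git checkout, so it is left commented out):
# git diff -U0 "$prior_fix_range" -- src/worker.ts | line_in_range 57
```

Pure-deletion hunks have a count of 0 and never match, which is intended: a finding cannot cite a line that the fix only removed.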
Action: STOP automatic loop, run Convergence Audit for the cluster. For each race-class finding, apply the Wontfix Template five-step decision:
  1. Window: estimate ms / s / min / hr between the race operations (use the meta tag if present)
  2. Damage: classify as `data-loss | deadlock | inconsistency | latency | marginal`
  3. Asymmetric check: is the failure mode security / data-integrity / billing?
  4. Mitigation cost: does the proposed fix introduce a new race surface?
  5. Recovery path: does fault tolerance / next webhook / sweeper cover the race?
Audit verdict per finding:
| Verdict | When |
| --- | --- |
| modify (Asymmetric) | Justification is Asymmetric (security / data-loss / data-integrity / billing) → ALWAYS modify, regardless of mitigation cost |
| modify (damage gate) | `damage` value is `data-loss` / `deadlock` / `inconsistency` → modify even if Justification is not formally Asymmetric. These damage classes have no acceptable "fault tolerance" answer |
| modify (safe fix) | non-Asymmetric, `damage ∈ {latency, marginal}`, BUT mitigation does NOT introduce new race surface → modify (no race-of-race risk) |
| wontfix-with-template | non-Asymmetric + `damage ∈ {latency, marginal}` + `recovery=has` + mitigation introduces new race surface → reply using Wontfix Template. ALL five conditions required; missing any → fall through to modify |
| defer-followup | valid concern but resolution requires infrastructure (e.g. real DB test, schema migration, new background job) that belongs to a follow-up issue |
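The mechanical part of the verdict table can be sketched as a decision function. `audit_verdict` and its argument order are hypothetical, and `defer-followup` is deliberately not modelled, since deciding that a fix needs follow-up infrastructure is a judgment call rather than a field lookup.

```shell
# audit_verdict JUSTIFICATION DAMAGE RECOVERY NEW_SURFACE
#   NEW_SURFACE is "yes" when the proposed fix would add a new race surface.
# Hypothetical helper; prints the verdict for one race-class finding.
audit_verdict() {
  justification="$1"; damage="$2"; recovery="$3"; new_surface="$4"
  if [ "$justification" = "Asymmetric" ]; then
    echo "modify (Asymmetric)"; return 0
  fi
  case "$damage" in
    data-loss|deadlock|inconsistency) echo "modify (damage gate)"; return 0 ;;
    latency|marginal) ;;                        # eligible for the remaining rows
    *) echo "modify"; return 0 ;;               # unknown or missing damage value
  esac
  if [ "$new_surface" != "yes" ]; then
    echo "modify (safe fix)"; return 0
  fi
  if [ "$recovery" = "has" ]; then
    echo "wontfix-with-template"
  else
    echo "modify"                               # a condition is missing: fall through
  fi
}
```

The checks run in the same order as the table rows, so an Asymmetric finding can never reach the wontfix branch, whatever its other fields say.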
Report to user:
Status: convergence-audit (race-of-race detected)

iter (N-1) fix surface attracted N race-class findings this iter (cluster):
  <id> <slug> @ <file:line>  window=<w> damage=<d> recovery=<r>
  ...

Audit verdict per finding:
  <id>: modify    — <reason: Asymmetric / mitigation safe / etc>
  <id>: wontfix   — <five-field summary from Wontfix Template>
  <id>: defer     — <followup issue suggestion>

Your call:
  (a) accept all verdicts (post wontfix replies via template, address modify items, open defer issues)
  (m) modify a specific verdict — say which finding-id and target verdict
  (s) ship — accept all wontfix + defer as-is, mark PR ready-to-merge
  (p) override audit — treat as normal iter, loop back to step 1
Gate B does NOT fire when:
  • Cluster contains any Asymmetric finding — Asymmetric (security / data-loss / data-integrity / billing) bypasses the convergence escape just as it does in pr-review's drop signal (B). Surface them and modify
  • `iter < 3`: early iters are normal review cadence
  • Race-class meta is missing AND no slug/category keyword match — keeps gate narrow to actual race domain; non-race convergence (e.g. naming-bikeshed) falls back to Gate A or normal flow
Rationale: Gate A catches iters where everything is hygiene; Gate B catches iters where individually-valid race findings cluster on freshly-introduced surfaces. Together they cover the two main self-feedback failure modes without suppressing genuine Asymmetric findings or third-party signal (Codex / SonarQube / Snyk findings without pr-review's metadata bypass both gates and route through normal step 2 dedup + 3-round escalation).

4.6 Wontfix Template

Used by step 4.5 Gate B (Convergence Audit) and as a manual reply template for race / state / sweep / atomic class findings where modification would introduce new race surfaces.
Five fields are minimum-required. Missing any one → finding deserves modification, not wontfix.
Wontfix — deliberate trade-off.

Race window: <ms / s / min / hr> between <op A> and <op B>.
Precondition: <only fires when X is in Y state for N+ time>
Damage if race fires: <not data-loss / not deadlock / only X happens N seconds earlier than ideal>
Recovery path: <new event / cron sweeper / next webhook covers it; user-visible behavior unchanged>

Asymmetric check: <not security / not data-loss / not data-integrity / not billing>
Mitigation cost: <atomic re-check / two-step merge into transaction is doable, but introduces new race-of-race surface at X>

Acknowledged as known trade-off; fault tolerance covers genuinely <abandoned / stranded / dropped> class.
Tracking: <if needed, opened followup issue X>
Field semantics:
  • Race window: concrete time estimate, not "small". Use `ms` for tight CAS, `min` for sweep cycle gap, `hr` for cron lifecycle. The reviewer needs the magnitude to judge.
  • Precondition — what state the system must already be in for the race to even matter. If precondition is rare or already-degraded, race is acceptable.
  • Damage — concrete user / data observation, not "could be a problem". If you cannot describe damage in one line, the finding may not actually be Reachable.
  • Recovery path — must name a concrete mechanism (next webhook / sweeper run / cron / fault-tolerant retry). "It'll probably be fine" is not a recovery path.
  • Asymmetric check — explicit declaration that finding is not security / data-integrity / billing. Wontfix is INVALID for Asymmetric findings — modify them.
  • Mitigation cost — name the new race surface the proposed fix would introduce. "race-of-race" is the load-bearing reasoning.
Reference example: PR #148 `sweepAbandonedTasklessThreads` two-UPDATE race. Codex flagged "re-check thread state before abandoning queued events"; the race window was milliseconds between two sweep UPDATEs, the precondition was a thread already stranded 1+ hour, the damage was `marginal` (already-stranded events terminalize seconds earlier than ideal), and the recovery path was a new webhook hitting the reactivation gate. Wontfix posted; PR shipped.
When NOT to use:
  • Any of the five fields cannot be filled honestly → finding is real, modify it. Wontfix Template is for the specific case where modification introduces equivalent or worse race surface; it is NOT a generic decline template.
  • Dev-stage self-review context (no separate session between code author and verdict reasoner): do NOT fill these fields from main-session memory. Babysit normally runs in a session separate from the code author, which is what makes the Wontfix Template safe to apply: the babysit session has no prior commitment to the design and can honestly reason about damage / recovery / mitigation cost. In a dev-stage self-review loop (same session wrote the code AND is reasoning about findings), author-narrative bias compounds; bug-free framing produces the strongest detection drop among framing conditions tested across 6 LLMs (Mitropoulos et al., Measuring and Exploiting Contextual Bias in LLM-Assisted Security Code Review, arXiv:2603.18740). Pause and either (a) hand off to a separate session for the verdict, or (b) use a fresh-spawn verdict subagent that independently derives `damage` / `recovery` / `mitigation cost` from code, not from the finding object's fields. The Deriver-pattern verdict subagent is not built as a skill yet; until it is, treat dev-stage wontfix decisions as advisory and surface them to the user.

5. Decide

  • ✅ All checks green AND all Valid feedback resolved → Report (step 6)
  • 🟡 New comment / check status changed mid-cycle → back to step 1
  • 🔴 Hit 3-failure stop, invisible-findings gate, dedup 3-round escalation, OR something genuinely needs human judgment → Report with `blocked` / `needs-user-input`

6. Report (end of run, not auto-merge)

PR/MR: <link>
Status: ready-to-merge | needs-user-input | blocked
Checks: <green>/<total>
Addressed (this run): <list of SHA → comment ref + one-liner>

Awaiting your decision:
  Discuss (I did NOT reply): <list with comment text + my read of the ambiguity>
  Out-of-scope: <list>  → open follow-up issues for any of these? (y/N per item)

Blockers (if any): <description + what I tried>

Next command: gh pr merge --squash <id>   # or: glab mr merge <id>
After the report, if there are out-of-scope items, ask once: open follow-up issues for which ones? Open only the ones the user picks (`gh issue create` / `glab issue create`), and edit your reply on each corresponding PR/MR comment to link the new issue.

What I never do without asking

  • Reply, dismiss, or implement based on Discuss items — list them, stop.
  • Open follow-up issues for Out-of-scope items without confirming the list with the user first.
  • Merge the PR/MR. Even when fully green, report ready-to-merge and let the user run the merge.
  • Force-push, amend pushed commits, skip hooks (`--no-verify`), or bypass signing.
  • Loop forever — if a cycle produces no new work and nothing is resolved, stop and report.