pr-babysit
Babysit a PR/MR until CI is green AND every valid piece of reviewer feedback is addressed. Supports GitHub PRs (via `gh`) and GitLab MRs (via `glab`); the platform is auto-detected from `git remote get-url origin` (github.com → gh; gitlab.com / self-hosted GitLab → glab).
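A minimal sketch of the auto-detection, assuming that any remote whose host is not github.com should be treated as GitLab. `detect_cli` is a hypothetical helper name, not part of the skill itself.

```bash
# Sketch: choose the CLI from the origin remote URL.
# Assumption: every host other than github.com is GitLab (gitlab.com or self-hosted).
detect_cli() {
  case "$1" in
    *github.com[:/]*) echo "gh" ;;    # matches both SSH (git@github.com:) and HTTPS forms
    *)                echo "glab" ;;
  esac
}

# Usage inside a git checkout:
#   cli=$(detect_cli "$(git remote get-url origin)")
```

A GitHub Enterprise host would be misrouted to `glab` under this assumption, so a real implementation would probably need an explicit override.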
Arguments
`$ARGUMENTS`
- empty → current branch's open PR/MR
- a number → PR/MR by IID on the current repo
- a URL → parse owner/repo + IID from it
If multiple PRs/MRs match the current branch, stop and ask which one.
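The three argument shapes could be told apart roughly as follows. This is a sketch only: `parse_target` is a hypothetical helper, and the URL shapes (`.../pull/<n>` for GitHub, `.../-/merge_requests/<n>` for GitLab) are assumptions about what users paste.

```bash
# Sketch: classify $ARGUMENTS as empty, a bare IID, or a PR/MR URL.
parse_target() {
  arg="$1"
  case "$arg" in
    "")
      echo "current-branch" ;;
    *[!0-9]*)
      # Not purely numeric: try to pull "<owner/repo> <iid>" out of a URL.
      parsed=$(printf '%s\n' "$arg" |
        sed -nE 's#^https?://[^/]+/(.+)/(pull|-/merge_requests)/([0-9]+).*#\1 \3#p')
      if [ -n "$parsed" ]; then
        echo "url $parsed"
      else
        echo "unknown"
        return 1
      fi ;;
    *)
      echo "iid $arg" ;;
  esac
}
```

The "multiple PRs/MRs match the current branch" case is not handled here; it still needs the stop-and-ask step described above.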
Reply Language
Reply prose posted to PR/MR threads (the `<what changed>` / `<reason>` / `<evidence>` content following each reply-template anchor, plus the prose inside Wontfix Template fields) renders in the PR/MR description's primary language. Everything else stays English: the anchor phrases themselves, Wontfix Template field labels, conventional commit prefixes, the race meta tag, P-codes / severity / justification tokens (same canonical set as `pr-review`'s Output Language).
Fallback when the PR description lacks substantive prose: linked issue body, then English.
Terminal output (step 6 run report, Gate A / Gate B audit messages, invisible-findings prompt) stays English — those go to the dispatcher session, not the PR.
Loop
1. Snapshot
Fetch: PR/MR metadata + head SHA, all checks / pipeline jobs, all review comments, all general comments, all review threads / discussions (with resolved state), the current user login.
For each thread you've previously replied to in this PR, cache `{file path, rule code or primary keyword, your reply summary}` for use by step 2 dedup.
Filter on content, not author:
- Drop comments whose body is only CI status lines (build green/red, deploy event, "pipeline succeeded"). That is noise.
- Keep any comment containing actionable signals (`Suggestion` / `Warning` / `Critical` / `Issue` / `quality gate` / `failed` / line-level review notes), even from bot accounts. AI review bots, SonarQube, and Snyk are content bots, not noise bots.
- Drop your own past replies and already-resolved threads.
2. Triage
Hard gate — invisible findings: if a check is failing but the actual finding list lives in an external dashboard your CLI cannot reach (SonarQube, Snyk, DataDog test reports, etc. — no token, no API endpoint accessible), STOP immediately and ask the user to paste the findings. Do not reproduce locally and process "guessed" findings as a complete cycle. Do not process unrelated feedback first while the invisible finding sits unaddressed. Root-cause diagnosis assumes you can see the finding; when you can't, this gate fires first.
Cross-round dedup — for each new comment, check the cache from step 1:
- Same file + same rule code (e.g. `CA1031`) OR same primary keyword as a thread you already replied to → treat as duplicate. Reply with one line linking back to the earlier thread; do not re-implement or re-explain.
- Same issue surviving 3 rounds despite fix attempts → escalate to `needs-user-input` (the bot is stuck; the user has to break the tie).
Feedback — bucket each remaining unresolved comment:
- Valid — bug, security, logic error, clear actionable suggestion
- Discuss — ambiguous, possible source misread, design tradeoff, scope unclear → do NOT reply autonomously, do NOT implement — collect for user
- Out-of-scope — clearly outside this PR's stated goal → collect for user
Checks — for each failing check: pull the failure log via CLI, diagnose root cause before attempting a fix (no patch without a named cause). Distinguish real failure vs flaky; only retry on evidence of flake. If the failure log doesn't contain the actual findings → invisible-findings gate above.
3. Address (Valid + real failures only)
For each item:
- Implement the fix.
- Small commit, conventional commits format, one logical change per commit. Type cheat: behaviour change → `fix`; behaviour-preserving structure / readability (incl. lint suppressions) → `refactor`; non-source (CI, husky, tooling) → `chore`; pure docs → `docs`.
- Reply on the originating comment / discussion thread (template table below).
- Verify the reply landed inside the thread, not as a top-level note (see "Reply endpoints" below).
Reply endpoints by platform — mismatching these creates orphan top-level notes:
| Action | GitHub | GitLab |
|---|---|---|
| Reply to a review thread | | |
| New top-level comment | | |
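As an illustration only, and not taken from the table above, the public GitHub and GitLab REST APIs expose in-thread reply and top-level comment endpoints along these lines. `reply_path` and `toplevel_path` are hypothetical helpers; the project, IID, and thread values in the usage lines are placeholders.

```bash
# Sketch: build the REST path for an in-thread reply vs. a top-level comment.
# Assumption: <project> is "owner/repo" for GitHub and a numeric ID or
# URL-encoded path (e.g. acme%2Fapp) for GitLab.
reply_path() {      # reply_path <gh|glab> <project> <iid> <comment-or-discussion-id>
  case "$1" in
    gh)   echo "repos/$2/pulls/$3/comments/$4/replies" ;;
    glab) echo "projects/$2/merge_requests/$3/discussions/$4/notes" ;;
    *)    return 1 ;;
  esac
}

toplevel_path() {   # toplevel_path <gh|glab> <project> <iid>
  case "$1" in
    gh)   echo "repos/$2/issues/$3/comments" ;;
    glab) echo "projects/$2/merge_requests/$3/notes" ;;
    *)    return 1 ;;
  esac
}

# Usage:
#   gh api -X POST "$(reply_path gh acme/app 42 1234567)" -f body="Fixed in abc1234."
#   glab api -X POST "$(reply_path glab acme%2Fapp 7 <discussion-id>)" -f body="Fixed in abc1234."
```

Keeping the two path builders separate makes it harder to post a thread reply through the top-level endpoint by accident, which is the orphan-note failure this step guards against.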
After posting a reply, `GET` the discussion / review thread back and confirm your note is in the thread (note count ≥ 2, your username present). If it landed top-level → delete it and retry on the right endpoint.
Reply templates, picked by situation:
| Situation | Template |
|---|---|
| Adopted and fixed | |
| Deliberate design, won't change | |
| Same issue already replied earlier in this PR | |
| Bot premise wrong, won't fix | |
The Deliberate / Won't-fix templates exist to keep tone neutral and evidence-led — without a template these tend to drift into defensive or implementation-dump replies.
Anchor phrases stay English; only the prose after each anchor adapts to the PR description's language. See Reply Language.
Lint / warning suppression: any `#pragma`, `// eslint-disable`, `# noqa`, `@SuppressWarnings`, etc. must include:
- (a) inline rationale comment on the same line, AND
- (b) reference to the spec section OR an existing codebase precedent (`file:line`) using the same suppression for the same reason.
If neither (a) nor (b) is available → do not suppress, refactor instead. When (b) applies, cite the precedent `file:line` in the commit message.
Hard rules:
- No `--amend` on already-pushed commits
- No `--force-push`
- Don't mark GitLab discussions resolved unless the reviewer explicitly asked for that
- Don't close any reviewer thread without a reply
- 3 failed attempts on the same fix → STOP, document what failed + assumptions to question, hand back to user (per global CLAUDE.md)
4. Push & wait
`git push`. Poll CI until it reaches a terminal state (GitHub: `gh pr checks --watch`; GitLab: poll `head_pipeline.status` until success / failed / canceled).
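For the GitLab side, the polling loop might look like the sketch below. The helper names and the `POLL_INTERVAL` variable are hypothetical, and the terminal set is limited to the three states named above; other GitLab statuses such as `skipped` are not covered.

```bash
# Sketch: poll a status command until the pipeline reaches a terminal state.
is_terminal_status() {
  case "$1" in
    success|failed|canceled) return 0 ;;
    *) return 1 ;;
  esac
}

wait_for_pipeline() {   # wait_for_pipeline <command that prints the current status>
  while :; do
    status=$("$@") || return 2           # give up if the status command itself fails
    if is_terminal_status "$status"; then
      echo "$status"
      return 0
    fi
    sleep "${POLL_INTERVAL:-30}"         # seconds between polls (assumed default)
  done
}

# Usage (assumes glab and jq are installed; project path and IID are placeholders):
#   get_status() { glab api "projects/acme%2Fapp/merge_requests/7" | jq -r '.head_pipeline.status'; }
#   wait_for_pipeline get_status
```

Passing the status command as an argument keeps the loop independent of how the status is fetched.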
4.1 Record prior_fix_range
After step 3's fix commits land and step 4 has pushed them, capture the SHA range covering this iter's fixes. This range is the canonical source of truth for two downstream consumers:
- Next iter's pr-review invocation: pass `prior_fix_range` as input so pr-review's incremental mode can apply drop signal (B), self-introduced surface
- Gate B in step 4.5 below: same range, same line-level attribution mechanism
```bash
# After step 4 push, before invoking the next pr-review iter:
FIRST_FIX_SHA=$(git log --format='%H' "$PREV_HEAD..HEAD" | tail -1)   # oldest fix in this iter
LAST_FIX_SHA=$(git rev-parse HEAD)                                     # newest fix in this iter
PRIOR_FIX_RANGE="${FIRST_FIX_SHA}^..${LAST_FIX_SHA}"
```
Persist `PRIOR_FIX_RANGE` (and `$LAST_FIX_SHA` as the next iter's `$PREV_HEAD`) into the babysit state file or session env. If the iter pushed a single commit, `FIRST_FIX_SHA == LAST_FIX_SHA` and the range collapses to `<sha>^..<sha>`.
If this iter pushed zero commits (CI re-run only) → no fix range to record; skip the Gate B self-introduced check for the next iter, but still run Gate A as normal.
**Why not compute lazily at Gate B**: computing at push time anchors the range to the exact commits that addressed iter (N-1) findings. Lazy computation at Gate B time could pick up unrelated commits if the user manually edits the branch between iters.
4.5 Self-feedback loop gates
After pushing this iter's fixes and waiting for CI green, before looping back to step 1, run TWO sub-gates that catch different self-feedback failure modes. Without these, an automated reviewer paired with an automated babysitter can spend N iterations either chasing test-hygiene nits (Gate A) or chasing race-of-race surfaces (Gate B).
Both gates parse pr-review's inline comments on this PR:
```bash
# List pr-review findings on this PR. A finding with no justification tag
# yields null instead of being dropped, so Gate A can treat it as Hygiene.
gh api "repos/$OWNER/$REPO/pulls/$N/comments" \
  --jq '[.[] | select(.body | contains("<!-- pr-review:finding-id=")) |
         {id, created_at, path, line, body,
          justification: ((.body | capture("<!-- pr-review:justification=(?<j>[^ ]+) -->").j) // null),
          race_meta: (.body | capture("\\[window=(?<w>[^,]+), damage=(?<d>[^,]+), recovery=(?<r>[^\\]]+)\\]") // null)}]'
```
Take only findings created since the previous iter's HEAD sha (the new ones this iter introduced).
Gate A: Diminishing Returns (only-hygiene iter)
Fires when ALL of:
- ≥1 new pr-review finding this iter
- ZERO new findings have `justification ∈ {Reachable, Precedent, Asymmetric, Historical}`
- ALL new findings are `justification=Hygiene` (or missing; treat missing as Hygiene)
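The firing conditions above reduce to a small predicate over the justification tokens of this iter's new findings. A sketch, with `gate_a_fires` as a hypothetical helper that receives one argument per new finding (an empty string or `null` when the tag is missing):

```bash
# Sketch: Gate A fires only when there is at least one new finding
# and every new finding is Hygiene (missing justification counts as Hygiene).
gate_a_fires() {
  [ "$#" -ge 1 ] || return 1          # no new findings: gate does not fire
  for j in "$@"; do
    case "$j" in
      Hygiene|""|null) ;;             # hygiene, or missing tag treated as hygiene
      *) return 1 ;;                  # Reachable / Precedent / Asymmetric / Historical
    esac
  done
  return 0
}

# Usage:
#   gate_a_fires Hygiene "" Hygiene && echo "Status: needs-user-input (diminishing returns)"
```

Any unrecognised token is treated here as non-hygiene, so the gate stays closed rather than stopping the loop on malformed metadata; that choice is an assumption, not something the gate definition states.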
Action: STOP automatic loop, skip step 5's normal decision, jump to step 6 with:
Status: needs-user-input (diminishing returns)
This iter's pr-review surfaced only hygiene findings — no Reachable / Precedent /
Asymmetric / Historical justification on any new finding.
Hygiene followups (N):
<list — id, slug, file:line, one-line failure mode>
Continuing the loop will likely surface more hygiene from the same code paths.
Your call:
(s) ship: open a single follow-up issue collecting the hygiene items, mark PR ready-to-merge
(p) polish: keep looping (override the gate for this round)
(r) re-review-full: challenge whether the self-loop missed anything (force `mode=full` on next pr-review)
Gate B: Convergence Audit (race-of-race iter)
Catches the failure mode where iter (N-1)'s fix introduces a new race / state-transition surface, the reviewer flags it as a Reachable finding, the next fix introduces yet another race surface, ad infinitum. Gate A does NOT catch this: those findings carry `justification=Reachable` and are individually valid; the divergence is only visible at cluster level.
Fires when ALL of:
- `iter ≥ 3` (first two iters are normal review cadence, not divergence)
- ≥ 2 new findings this iter cite `file:line` inside `prior_fix_range`, i.e. critiquing iter (N-1)'s freshly-added surface
- ≥ 2 of those findings are race-class; detection is OR of:
  - (i) carries `[window=..., damage=..., recovery=...]` meta from pr-review's race-class metadata requirement, OR
  - (ii) slug/category keyword-matches one of: `race | TOCTOU | concurren | sweep | lifecycle | state-transition | debounce | claim | lease | fence | stale | orphan | race-window`, OR
  - (iii) `\bwindow=` (matches the meta-tag prefix even when full meta is malformed) OR `atomic.*race | race.*atomic` (require co-occurrence to avoid catching DB-transaction `atomic` and frontend-viewport `window` noise)
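Two of the firing conditions are mechanical and could be sketched as shell helpers. Both function names are hypothetical. The race-class check assumes case-insensitive substring matching, which the keyword list does not specify, and the hunk check assumes it is fed `git diff -U0` output for a single file.

```bash
# Sketch (ii)/(iii): does a finding's slug or body look race-class?
is_race_class() {
  text="$1"
  kw='race|TOCTOU|concurren|sweep|lifecycle|state-transition|debounce|claim|lease|fence|stale|orphan'
  printf '%s\n' "$text" | grep -Eiq "$kw" && return 0
  # Meta-tag prefix "window=" not preceded by a word character (bare "window" is excluded).
  printf '%s\n' "$text" | grep -Eq '(^|[^[:alnum:]_])window=' && return 0
  return 1
}

# Sketch: does LINE fall inside a hunk added by the diff read from stdin?
# Hunk headers look like "@@ -a[,b] +c[,d] @@"; a missing ",d" means one line.
line_in_added_hunks() {
  target="$1"
  sed -nE 's/^@@ -[0-9]+(,[0-9]+)? \+([0-9]+)(,([0-9]+))? @@.*/\2 \4/p' |
  while read -r start count; do
    [ -n "$count" ] || count=1
    end=$((start + count - 1))          # count=0 (pure deletion) gives an empty range
    if [ "$target" -ge "$start" ] && [ "$target" -le "$end" ]; then
      echo yes
    fi
  done | grep -q yes
}

# Usage:
#   git diff -U0 "$PRIOR_FIX_RANGE" -- src/worker.ts | line_in_added_hunks 120
```

Because `race` already matches as a substring, the `race-window` keyword and the `atomic.*race` co-occurrence pattern are subsumed by the first check in this sketch.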
Keyword design notes: bare `window` and bare `atomic` are deliberately excluded because they false-positive on rate-limiter / viewport / DB-transaction-correctness comments. `TOCTOU` is the canonical security-race term and matches Codex findings that bypass the meta-tag path. `debounce / claim / lease / fence` cover distributed-locking vocabulary; `stale / orphan` cover sweep-race descriptions.
How to verify file:line inside prior_fix_range:
```bash
git diff --name-only "$PRIOR_FIX_RANGE"          # files touched
git diff -U0 "$PRIOR_FIX_RANGE" -- <file>        # line-level attribution
```
Action: STOP automatic loop, run Convergence Audit for the cluster. For each race-class finding, apply the Wontfix Template five-step decision:
- Window: estimate ms / s / min / hr between the race operations (use the meta tag if present)
- Damage: classify as `data-loss | deadlock | inconsistency | latency | marginal`
- Asymmetric check: is the failure mode security / data-integrity / billing?
- Mitigation cost: does the proposed fix introduce a new race surface?
- Recovery path: does fault tolerance / next webhook / sweeper cover the race?
Audit verdict per finding:
| Verdict | When |
|---|---|
| modify (Asymmetric) | Justification is Asymmetric (security / data-loss / data-integrity / billing) → ALWAYS modify, regardless of mitigation cost |
| modify (damage gate) | |
| modify (safe fix) | non-Asymmetric, |
| wontfix-with-template | non-Asymmetric + |
| defer-followup | valid concern but resolution requires infrastructure (e.g. real DB test, schema migration, new background job) that belongs to a follow-up issue |
Report to user:
Status: convergence-audit (race-of-race detected)
iter (N-1) fix surface attracted N race-class findings this iter (cluster):
<id> <slug> @ <file:line> window=<w> damage=<d> recovery=<r>
...
Audit verdict per finding:
<id>: modify — <reason: Asymmetric / mitigation safe / etc>
<id>: wontfix — <five-field summary from Wontfix Template>
<id>: defer — <followup issue suggestion>
Your call:
(a) accept all verdicts (post wontfix replies via template, address modify items, open defer issues)
(m) modify a specific verdict: say which finding-id and target verdict
(s) ship: accept all wontfix + defer as-is, mark PR ready-to-merge
(p) override audit: treat as normal iter, loop back to step 1
Gate B does NOT fire when:
- Cluster contains any Asymmetric finding: Asymmetric (security / data-loss / data-integrity / billing) bypasses the convergence escape just as it does in pr-review's drop signal (B). Surface them and modify.
- `iter < 3`: early iters are normal review cadence
- Race-class meta is missing AND no slug/category keyword match: keeps the gate narrow to the actual race domain; non-race convergence (e.g. naming-bikeshed) falls back to Gate A or normal flow
Rationale: Gate A catches iters where everything is hygiene; Gate B catches iters where individually-valid race findings cluster on freshly-introduced surfaces. Together they cover the two main self-feedback failure modes without suppressing genuine Asymmetric findings or third-party signal (Codex / SonarQube / Snyk findings without pr-review's metadata bypass both gates and route through normal step 2 dedup + 3-round escalation).
4.6 Wontfix Template
Used by step 4.5 Gate B (Convergence Audit) and as a manual reply template for race / state / sweep / atomic class findings where modification would introduce new race surfaces.
Five fields are minimum-required. Missing any one → finding deserves modification, not wontfix.
Wontfix — deliberate trade-off.
Race window: <ms / s / min / hr> between <op A> and <op B>.
Precondition: <only fires when X is in Y state for N+ time>
Damage if race fires: <not data-loss / not deadlock / only X happens N seconds earlier than ideal>
Recovery path: <new event / cron sweeper / next webhook covers it; user-visible behavior unchanged>
Asymmetric check: <not security / not data-loss / not data-integrity / not billing>
Mitigation cost: <atomic re-check / two-step merge into transaction is doable, but introduces new race-of-race surface at X>
Acknowledged as known trade-off; fault tolerance covers genuinely <abandoned / stranded / dropped> class.
Tracking: <if needed, opened followup issue X>
Field semantics:
- Race window: concrete time estimate, not "small". `ms` for tight CAS, `min` for sweep cycle gap, `hr` for cron lifecycle. Reviewer needs the magnitude to judge.
- Precondition: what state the system must already be in for the race to even matter. If the precondition is rare or already-degraded, the race is acceptable.
- Damage: concrete user / data observation, not "could be a problem". If you cannot describe damage in one line, the finding may not actually be Reachable.
- Recovery path: must name a concrete mechanism (next webhook / sweeper run / cron / fault-tolerant retry). "It'll probably be fine" is not a recovery path.
- Asymmetric check: explicit declaration that the finding is not security / data-integrity / billing. Wontfix is INVALID for Asymmetric findings; modify them.
- Mitigation cost: name the new race surface the proposed fix would introduce. "race-of-race" is the load-bearing reasoning.
Reference example: PR #148 `sweepAbandonedTasklessThreads` two-UPDATE race. Codex flagged "re-check thread state before abandoning queued events"; race window was milliseconds between two sweep UPDATEs, precondition was thread already stranded 1+ hour, damage was `marginal` (already-stranded events terminalize seconds earlier than ideal), recovery path was new webhook hits reactivation gate. Wontfix posted; PR shipped.
When NOT to use:
- Any of the five fields cannot be filled honestly → finding is real, modify it. Wontfix Template is for the specific case where modification introduces equivalent or worse race surface; it is NOT a generic decline template.
- Dev-stage self-review context (no separate session between code author and verdict reasoner): do NOT fill these fields from main-session memory. Babysit normally runs in a session separate from the code author, which is what makes Wontfix Template safe to apply: the babysit session has no prior commitment to the design and can honestly reason about damage / recovery / mitigation cost. In a dev-stage self-review loop (same session wrote the code AND is reasoning about findings), author-narrative bias compounds: bug-free framing produces the strongest detection drop among framing conditions tested across 6 LLMs (Mitropoulos et al., Measuring and Exploiting Contextual Bias in LLM-Assisted Security Code Review, arXiv:2603.18740). Pause and either (a) hand off to a separate session for the verdict, or (b) use a fresh-spawn verdict subagent that independently derives `damage` / `recovery` / `mitigation cost` from code, not from the finding object's fields. The Deriver-pattern verdict subagent is not built as a skill yet; until it is, treat dev-stage wontfix decisions as advisory and surface them to the user.
5. Decide
- ✅ All checks green AND all Valid feedback resolved → Report (step 6)
- 🟡 New comment / check status changed mid-cycle → back to step 1
- 🔴 Hit 3-failure stop, invisible-findings gate, dedup 3-round escalation, OR something genuinely needs human judgment → Report with `blocked` / `needs-user-input`
6. Report (end of run, not auto-merge)
PR/MR: <link>
Status: ready-to-merge | needs-user-input | blocked
Checks: <green>/<total>
Addressed (this run): <list of SHA → comment ref + one-liner>
Awaiting your decision:
Discuss (I did NOT reply): <list with comment text + my read of the ambiguity>
Out-of-scope: <list> → open follow-up issues for any of these? (y/N per item)
Blockers (if any): <description + what I tried>
Next command: gh pr merge --squash <id>   # or: glab mr merge <id>
After the report, if there are out-of-scope items, ask once: open follow-up issues for which ones? Open only the ones the user picks (`gh issue create` / `glab issue create`), and edit the reply on each PR/MR comment to link the new issue.
What I never do without asking
- Reply, dismiss, or implement based on Discuss items — list them, stop.
- Open follow-up issues for Out-of-scope items without confirming the list with the user first.
- Merge the PR/MR. Even when fully green, report ready-to-merge and let the user run the merge.
- Force-push, amend pushed commits, skip hooks (`--no-verify`), or bypass signing.
- Loop forever: if a cycle produces no new work and nothing is resolved, stop and report.