Diagnose

A discipline for hard bugs. Skip phases only when explicitly justified.

Phase 1 — Build a feedback loop

This is the skill. Everything else is mechanical. If you have a fast, deterministic, agent-runnable pass/fail signal for the bug, you will find the cause — bisection, hypothesis-testing, and instrumentation all just consume that signal. If you don't have one, no amount of staring at code will save you.

Spend disproportionate effort here. Be aggressive. Be creative. Refuse to give up.

Ways to construct one — try them in roughly this order

1. Failing test at whatever seam reaches the bug: unit, integration, or e2e.
2. Curl / HTTP script against a running dev server.
3. CLI invocation with a fixture input, diffing stdout against a known-good snapshot.
4. Headless browser script (Playwright / Puppeteer) that drives the UI and asserts on DOM/console/network.
5. Replay a captured trace. Save a real network request / payload / event log to disk; replay it through the code path in isolation.
6. Throwaway harness. Spin up a minimal subset of the system (one service, mocked deps) that exercises the bug code path with a single function call.
7. Property / fuzz loop. If the bug is "sometimes wrong output", run 1000 random inputs and look for the failure mode.
8. Bisection harness. If the bug appeared between two known states (commit, dataset, version), automate "boot at state X, check, repeat" so you can `git bisect run` it.
9. Differential loop. Run the same input through old-version vs new-version (or two configs) and diff outputs.
10. HITL bash script. Last resort. If a human must click, drive them with `scripts/hitl-loop.template.sh` so the loop is still structured. Captured output feeds back to you.

Build the right feedback loop, and the bug is 90% fixed.
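
As a rough illustration of the snapshot-diff idea from the list above, here is a minimal sketch in Python. The function names and the command-line layout are assumptions made for this example, not part of any existing tool. It returns exit code 0 on a match and 1 on a mismatch, which are the values `git bisect run` interprets as "good" and "bad".

```python
import subprocess
import sys
from pathlib import Path


def check_against_snapshot(command: list[str], snapshot: Path) -> bool:
    """Run `command` and report whether its stdout equals the saved snapshot."""
    result = subprocess.run(command, capture_output=True, text=True, timeout=30)
    return result.stdout == snapshot.read_text()


def main(argv: list[str]) -> int:
    """Hypothetical entry point: argv is [snapshot_path, command, *command_args]."""
    snapshot_path, *command = argv
    passed = check_against_snapshot(command, Path(snapshot_path))
    # 0 means pass ("good" for git bisect run), 1 means fail ("bad").
    return 0 if passed else 1

# Possible wiring when saved as a script (file name is hypothetical):
#   sys.exit(main(sys.argv[1:]))
```

Because the signal is a plain exit code, the same script can serve as a local pass/fail check and as the predicate handed to `git bisect run`.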

Iterate on the loop itself

Treat the loop as a product. Once you have a loop, ask:

- Can I make it faster? (Cache setup, skip unrelated init, narrow the test scope.)
- Can I make the signal sharper? (Assert on the specific symptom, not "didn't crash".)
- Can I make it more deterministic? (Pin time, seed RNG, isolate filesystem, freeze network.)

A 30-second flaky loop is barely better than no loop. A 2-second deterministic loop is a debugging superpower.
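
One way to pin time and seed the RNG is to pass both into the code under test instead of letting it reach for globals. The sketch below assumes a hypothetical `make_token` function standing in for whatever code the loop exercises; only the injection pattern is the point.

```python
import random
from datetime import datetime, timezone
from typing import Callable


def make_token(rng: random.Random, now: Callable[[], datetime]) -> str:
    """Hypothetical code under test: depends on both randomness and the clock."""
    return f"{now():%Y%m%d}-{rng.randrange(10_000):04d}"


def run_deterministically() -> str:
    rng = random.Random(1234)  # seeded RNG: same sequence on every run

    def frozen_now() -> datetime:
        # Pinned time: the loop never sees the real wall clock.
        return datetime(2024, 1, 1, tzinfo=timezone.utc)

    return make_token(rng, frozen_now)
```

Two calls to `run_deterministically()` return the same string, so a change in output between loop runs points at the code rather than at the environment.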

Non-deterministic bugs

The goal is not a clean repro but a higher reproduction rate. Loop the trigger 100×, parallelise, add stress, narrow timing windows, inject sleeps. A 50%-flake bug is debuggable; 1% is not — keep raising the rate until it's debuggable.
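
To know whether the rate is actually rising, it helps to measure it. The following is a minimal sketch, assuming a hypothetical `trigger` callable that returns `True` when the bug reproduced on that attempt; the thread pool is one simple way to parallelise and add some contention.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def reproduction_rate(trigger: Callable[[], bool], runs: int = 100, workers: int = 8) -> float:
    """Fire `trigger` `runs` times across a thread pool and return the failure fraction."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outcomes = list(pool.map(lambda _: trigger(), range(runs)))
    # Each outcome is True when the bug reproduced on that attempt.
    return sum(outcomes) / runs
```

Re-run it after each change to the trigger (more stress, injected sleeps, narrower timing windows) and keep the changes that push the number up.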

When you genuinely cannot build a loop

Stop and say so explicitly. List what you tried. Ask the user for: (a) access to whatever environment reproduces it, (b) a captured artifact (HAR file, log dump, core dump, screen recording with timestamps), or (c) permission to add temporary production instrumentation. Do not proceed to hypothesise without a loop.

Do not proceed to Phase 2 until you have a loop you believe in.

Phase 2 — Reproduce

Run the loop. Watch the bug appear.

Confirm:

- The loop produces the failure mode the user described, not a different failure that happens to be nearby. Wrong bug = wrong fix.
- The failure is reproducible across multiple runs (or, for non-deterministic bugs, reproducible at a high enough rate to debug against).
- You have captured the exact symptom (error message, wrong output, slow timing) so later phases can verify the fix actually addresses it.

Do not proceed until you reproduce the bug.
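
The confirmation checks above can be approximated in code. This sketch assumes a hypothetical `run_loop` callable that returns a `(passed, output)` pair from the Phase 1 loop, and it treats a substring match on the captured output as "the same symptom", which is a simplification.

```python
from typing import Callable, Tuple


def confirm_repro(
    run_loop: Callable[[], Tuple[bool, str]],
    expected_symptom: str,
    runs: int = 5,
) -> bool:
    """True only if every run fails AND shows the symptom the user reported."""
    for _ in range(runs):
        passed, output = run_loop()
        if passed:
            return False  # the bug did not reproduce on this run
        if expected_symptom not in output:
            return False  # something failed, but not the reported bug
    return True
```

For a non-deterministic bug, a rate threshold would replace the strict "every run" requirement used here.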

Phase 3 — Hypothesise

Generate 3–5 ranked hypotheses before testing any of them. Single-hypothesis generation anchors on the first plausible idea.

Each hypothesis must be falsifiable: state the prediction it makes.

Format: "If <X> is the cause, then <changing Y> will make the bug disappear / <changing Z> will make it worse."

If you cannot state the prediction, the hypothesis is a vibe — discard or sharpen it.

Show the ranked list to the user before testing. They often have domain knowledge that re-ranks instantly ("we just deployed a change to #3"), or know hypotheses they've already ruled out. Cheap checkpoint, big time saver. Don't block on it — proceed with your ranking if the user is AFK.
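
One possible way to keep hypotheses honest is to store each one with its prediction and refuse to rank any that lacks one. The `Hypothesis` class and the `likelihood` field below are assumptions made for illustration; the numeric score is just a subjective estimate used for ordering.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hypothesis:
    cause: str         # the <X> in "If <X> is the cause..."
    prediction: str    # what changing Y or Z should do to the bug
    likelihood: float  # subjective estimate between 0 and 1


def ranked(hypotheses: List[Hypothesis]) -> List[Hypothesis]:
    """Return hypotheses most-likely first; reject any without a prediction."""
    vague = [h.cause for h in hypotheses if not h.prediction.strip()]
    if vague:
        raise ValueError(f"no falsifiable prediction for: {vague}")
    return sorted(hypotheses, key=lambda h: h.likelihood, reverse=True)
```

The sorted list is what gets shown to the user before any testing starts.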

Phase 4 — Instrument

Each probe must map to a specific prediction from Phase 3. Change one variable at a time.

Tool preference:

1. Debugger / REPL inspection if the env supports it. One breakpoint beats ten logs.
2. Targeted logs at the boundaries that distinguish hypotheses.
3. Never "log everything and grep".

Tag every debug log with a unique prefix, e.g. `[DEBUG-a4f2]`. Cleanup at the end becomes a single grep. Untagged logs survive; tagged logs die.
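
A small helper can make the tagging automatic. This is a sketch using Python's standard `logging` module; the `probe` function and its argument layout are assumptions for this example, and the tag value is the example prefix from the text.

```python
import logging

DEBUG_TAG = "[DEBUG-a4f2]"  # example prefix from the text; pick a fresh one per session


def probe(logger: logging.Logger, hypothesis: str, **values: object) -> str:
    """Emit one tagged log line tied to a single hypothesis and return it."""
    details = " ".join(f"{name}={value!r}" for name, value in values.items())
    line = f"{DEBUG_TAG} {hypothesis}: {details}"
    logger.warning(line)
    return line
```

A call such as `probe(log, "H1 cache stale", key="a")` produces a line that names the hypothesis it tests, and every such line carries the same prefix for later cleanup.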

Perf branch. For performance regressions, logs are usually the wrong tool. Instead, establish a baseline measurement (timing harness, `performance.now()`, profiler, query plan), then bisect. Measure first, fix second.
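
A timing harness for the baseline can be very small. The sketch below uses Python's `time.perf_counter` in place of the JavaScript `performance.now()` named in the text; the function names and the 20% tolerance are assumptions, not recommended values.

```python
import statistics
import time
from typing import Callable


def measure(fn: Callable[[], object], repeats: int = 20) -> float:
    """Median wall-clock seconds for one call to fn(), over `repeats` calls."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def is_regression(baseline: float, candidate: float, tolerance: float = 0.2) -> bool:
    """True when the candidate is slower than baseline by more than `tolerance`."""
    return candidate > baseline * (1 + tolerance)
```

Recording the baseline number before touching anything gives the bisection a concrete threshold to test each state against.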

Phase 5 — Fix + regression test

Write the regression test before the fix — but only if there is a correct seam for it.

A correct seam is one where the test exercises the real bug pattern as it occurs at the call site. If the only available seam is too shallow (single-caller test when the bug needs multiple callers, unit test that can't replicate the chain that triggered the bug), a regression test there gives false confidence.

If no correct seam exists, that itself is the finding. Note it. The codebase architecture is preventing the bug from being locked down. Flag this for the next phase.

If a correct seam exists:

1. Turn the minimised repro into a failing test at that seam.
2. Watch it fail.
3. Apply the fix.
4. Watch it pass.
5. Re-run the Phase 1 feedback loop against the original (un-minimised) scenario.
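
The fail-then-pass sequence above can be illustrated with a toy case. Everything here is hypothetical: the price-parsing bug, both function names, and the input string are invented for the example, and a real regression test would live in the project's own test framework at the seam identified above.

```python
from typing import Callable


def parse_price_buggy(text: str) -> float:
    # Hypothetical bug: float() rejects thousands separators like "1,299.50".
    return float(text)


def parse_price_fixed(text: str) -> float:
    # Hypothetical fix: strip the separators before converting.
    return float(text.replace(",", ""))


def regression_test(parse_price: Callable[[str], float]) -> bool:
    """Minimised repro as a test: True when the behaviour is correct."""
    try:
        return parse_price("1,299.50") == 1299.50
    except ValueError:
        return False
```

Running `regression_test` against the buggy version first shows that the test can fail for the right reason; running it against the fixed version shows that the fix is what makes it pass.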

Phase 6 — Cleanup + post-mortem

Required before declaring done:

- Original repro no longer reproduces (re-run the Phase 1 loop).
- Regression test passes (or absence of seam is documented).
- All `[DEBUG-...]` instrumentation removed (`grep` the prefix).
- Throwaway prototypes deleted (or moved to a clearly-marked debug location).
- The hypothesis that turned out correct is stated in the commit / PR message, so the next debugger learns from it.
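
The instrumentation check can be automated so it cannot be forgotten. The sketch below is a Python stand-in for grepping the prefix, not a replacement for it; the function name and the choice to scan every file under a root directory are assumptions for this example.

```python
from pathlib import Path
from typing import List


def leftover_debug_lines(root: Path, tag: str) -> List[str]:
    """List 'path:line: text' for every line under `root` that still contains `tag`."""
    hits = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        # errors="ignore" lets the scan pass over files that are not valid text.
        lines = path.read_text(errors="ignore").splitlines()
        for number, line in enumerate(lines, start=1):
            if tag in line:
                hits.append(f"{path}:{number}: {line.strip()}")
    return hits
```

An empty result for the session's tag is the condition for ticking off that item.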

Then ask: what would have prevented this bug? If the answer involves architectural change (no good test seam, tangled callers, hidden coupling), hand off to the `/improve-codebase-architecture` skill with the specifics. Make the recommendation after the fix is in, not before: you have more information now than when you started.
