Hunt: Diagnose Before You Fix
A patch applied to a symptom creates a new bug somewhere else. Find the origin first.
Do not touch code until you can state the root cause in one sentence.
Orientation
Start by building a complete picture of what happened:
- Get the exact error, stack trace, and steps to reproduce. If anything is missing, ask one specific question.
- Run on the files named in the error or stack trace. If no specific files are mentioned, run it on the whole repo. Regressions almost always live in recent changes.
- Trace the execution path from the symptom backward: follow the data, not intuition.
- Reproduce it yourself. If you cannot reproduce it reliably, you do not understand it yet.
Before going further, commit to a testable claim:
"I believe the root cause is [X] because [evidence]."
The claim must name a specific file, function, line, or condition. "A state management issue" is not testable. "Stale cache in
at
because the dependency array is missing
" is testable. If you cannot be that specific, you do not have a hypothesis yet.
Known Failure Shapes
When a hypothesis is hard to form, match the symptom to a known shape:
| Shape | Clues | Where to look |
|---|
| Timing problem | Intermittent, load-dependent | Concurrent access to shared state |
| Missing guard | Crash on field or index access | Optional values, empty collections |
| Ordering bug | Works in isolation, fails in sequence | Event callbacks, transaction scope |
| Boundary failure | Timeout, wrong response shape | External APIs, service edges |
| Environment mismatch | Local pass, CI fail | Env vars, feature flags, seeded data |
| Stale value | Old data shown, refreshes on restart | In-memory cache, memoized result |
Also worth checking: existing TODOs near the failure site, and whether this area has been patched before. Recurring fixes in the same place mean the abstraction is wrong.
Pay attention to deflection. When a developer or user says "that part doesn't matter" or "don't worry about that area," treat it as a signal rather than a clearance. The area someone is reluctant to examine is often where the actual problem lives.
Confirm or Discard the Hypothesis
Add one targeted instrument: a log line, a failing assertion, or the smallest possible test that would fail if the hypothesis is correct. Run it.
If the evidence contradicts the hypothesis, discard it completely and re-orient with what was just learned. Do not preserve a hypothesis that the evidence disproves.
After three failed hypotheses, stop. Do not guess a fourth time. Instead, surface the situation to the user: what was checked, what was ruled out, what is still unknown. Ask whether to add more instrumentation, escalate, or approach the problem differently.
Same symptom after a fix is a hard stop. If the user reports the same symptom after a patch was applied, do not patch again. Treat it as a new investigation: the previous hypothesis was wrong. Re-read the execution path from scratch. Three rounds of "fixed but still broken" in the same area means the abstraction is wrong, not the specific line.
Never state environment details from memory. Before diagnosing OS, compiler, SDK, or tool version issues, run the detection command first:
,
,
,
, etc. State the actual output. A diagnosis built on an assumed version is not a diagnosis.
External tool or MCP failure: diagnose before switching. When an MCP tool, CLI dependency, or external API is unavailable or returning errors, do not immediately try an alternative method. First determine why it failed: is the server running? Is the API key valid or expired? Is the config pointing to the right endpoint? Is a proxy needed? Switching to a workaround without diagnosing the root cause leaves the original problem intact and wastes the next session too.
Stop and reassess if you catch yourself:
- Writing a fix before you have finished tracing the flow
- Thinking "let me just try this"
- Finding that each fix surfaces a new problem in a different module
Apply the Fix
Once the root cause is confirmed:
- Fix the cause, not the symptom it produces
- Keep the diff small: fewest files, fewest lines
- Write one regression test that fails on the unfixed code and passes after the fix. If the bug is non-testable (timing, environment-specific, UI rendering), document why and add the best available guard instead.
- For large projects, run the targeted subset first (tests in the affected module). Run the full suite only after the targeted tests pass. Paste the full output, no summaries.
- If the change touches more than 5 files, pause and confirm the scope with the user
Self-regulation: track how the fix is going. If you have reverted the same area twice, or if the current fix touches more than 3 files for what started as a single bug, stop. Do not keep patching. Describe what is known and unknown, and ask the user how to proceed. Continued patching past this point means the abstraction is wrong, not the code.
After the fix lands, consider whether a second layer of defense makes sense: validate the same condition at the call site, the service boundary, or in a test. A bug that cannot be introduced again is better than a bug that was fixed once.
Outcome
End with a short summary:
Root cause: [what was wrong, file:line]
Fix: [what changed, file:line]
Confirmed: [evidence or test that proves the fix]
Tests: [pass/fail count, regression test location]
Status is one of: resolved, resolved with caveats (state them), or blocked (state what is unknown and why).