debugging-patterns

Systematic Debugging

Overview

Random fixes waste time and create new bugs. Quick patches mask underlying issues.
Core principle: ALWAYS find root cause before attempting fixes. Symptom fixes are failure.
Violating the letter of this process is violating the spirit of debugging.

The Iron Law

NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST
If you haven't completed Phase 1, you cannot propose fixes.

Quick Five-Step Process (Reference Pattern)

For rapid debugging, use this concise flow:
1. Capture error message and stack trace
2. Identify reproduction steps
3. Isolate the failure location
4. Implement minimal fix
5. Verify solution works
Debugging techniques:
  • Analyze error messages and logs
  • Check recent code changes
  • Form and test hypotheses
  • Add strategic debug logging
  • Inspect variable states
Root Cause Tracing Technique:
1. Observe symptom - Where does error manifest?
2. Find immediate cause - Which code produces the error?
3. Ask "What called this?" - Map call chain upward
4. Keep tracing up - Follow invalid data backward
5. Find original trigger - Where did problem actually start?
Never fix solely where errors appear—trace to the original trigger.
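Here is a minimal TypeScript sketch of a symptom that shows up far from its trigger. The functions, the CSV row shape and the renamed column are all hypothetical. The point is where the fix lands: at the source, not at the line that displays the error.

```typescript
// Hypothetical chain: readCsvRow (not shown) -> parseItem -> formatPrice.
// Symptom: the UI showed "$NaN". Immediate cause: formatPrice received NaN.
// Tracing up: parseItem read row.cost, but the CSV column had been renamed to "price".

interface Item {
  name: string;
  price: number;
}

// Symptom site. A symptom fix here, e.g. returning "$0.00" when the price is NaN,
// would hide the bad data instead of removing its cause.
function formatPrice(price: number): string {
  return `$${price.toFixed(2)}`;
}

// Original trigger, fixed at the source: read the column that actually exists.
// Before (bug): price: Number(row.cost), and Number(undefined) is NaN.
function parseItem(row: Record<string, string>): Item {
  return { name: row.name, price: Number(row.price) };
}

// Example: a row shaped like the renamed CSV.
const item = parseItem({ name: "pen", price: "1.5" });
console.log(formatPrice(item.price)); // prints $1.50
```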

LSP-Powered Root Cause Tracing

Use LSP to trace execution flow systematically:
| Debugging Need | LSP Tool | Usage |
|---|---|---|
| "Where is this function defined?" | `lspGotoDefinition` | Jump to source |
| "What calls this function?" | `lspCallHierarchy(incoming)` | Trace callers up |
| "What does this function call?" | `lspCallHierarchy(outgoing)` | Trace callees down |
| "All usages of this variable?" | `lspFindReferences` | Find all access points |
Systematic Call Chain Tracing:
1. localSearchCode("errorFunction") → get file + lineHint
2. lspGotoDefinition(lineHint=N) → see implementation
3. lspCallHierarchy(incoming, lineHint=N) → who calls this?
4. For each caller: lspCallHierarchy(incoming) → trace up
5. Continue until you find the root cause
CRITICAL: Always get lineHint from localSearchCode first. Never guess line numbers.
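The upward trace in steps 3 to 5 is a walk over a caller graph. The TypeScript sketch below makes two assumptions. First, `CallerMap` is a hypothetical in-memory map standing in for repeated `lspCallHierarchy(incoming)` results; the sketch calls no LSP tool. Second, a function with no unvisited callers is treated as a candidate original trigger.

```typescript
// Hypothetical caller graph: callee name -> names of its direct callers.
// In practice each lookup would be one lspCallHierarchy(incoming) request.
type CallerMap = Record<string, string[]>;

// Breadth-first walk up the callers of `start`. Each returned chain runs from the
// failing function to a function with no unvisited callers (an entry point or the
// end of a cycle): the candidates to inspect for the original trigger.
function traceToEntryPoints(start: string, incoming: CallerMap): string[][] {
  const chains: string[][] = [];
  const queue: string[][] = [[start]];
  while (queue.length > 0) {
    const chain = queue.shift()!;
    const current = chain[chain.length - 1];
    // Skip callers already on this chain so recursive code cannot loop forever.
    const callers = (incoming[current] ?? []).filter((c) => !chain.includes(c));
    if (callers.length === 0) {
      chains.push(chain);
      continue;
    }
    for (const caller of callers) {
      queue.push([...chain, caller]);
    }
  }
  return chains;
}

// Example: formatDate fails; who ultimately feeds it?
const exampleGraph: CallerMap = {
  formatDate: ["renderRow"],
  renderRow: ["renderTable", "exportCsv"],
  renderTable: ["main"],
};
console.log(traceToEntryPoints("formatDate", exampleGraph));
```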
For each issue provide:
  • Root cause explanation
  • Evidence supporting diagnosis
  • Specific code fix
  • Testing approach
  • Prevention recommendations

Common Debugging Scenarios

Build & Type Errors (Quick Reference)

Commands:
```bash
npx tsc --noEmit --pretty          # TypeScript check
npm run build                       # Full build
npx eslint . --ext .ts,.tsx        # Lint check
```
Common Error → Fix Patterns:
| Error Pattern | Cause | Fix |
|---|---|---|
| `Parameter 'x' implicitly has an 'any' type` | Missing type annotation | Add a `: Type` annotation |
| `Object is possibly 'undefined'` | Null-safety violation | Add `?.` optional chaining or a null check |
| `Property 'x' does not exist on type` | Missing property | Add it to the interface or fix the typo |
| `Cannot find module 'x'` | Wrong import path or missing package | Fix the path or run `npm install` |
| `Type 'string' is not assignable to type 'number'` | Type mismatch | Parse the string or fix the type |
| `'await' expressions are only allowed within async functions` | Missing `async` keyword | Add `async` to the function |
| `JSX element 'X' has no corresponding closing tag` | Malformed JSX | Fix the tag structure |
| `Module not found: Can't resolve` | Path alias misconfigured | Check tsconfig `paths` |
| `Export 'X' was not found in 'Y'` | Named export missing | Check the export name (named vs. default) |
Minimal Diff Strategy:
  • Add type annotation where missing
  • Add null check where needed
  • Fix import path
  • DO NOT: Refactor, rename, or "improve" unrelated code
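The TypeScript sketch below shows what a minimal-diff fix looks like for three rows of the error table. The names (`User`, `greet`, `findUserName`, `parsePort`) are hypothetical. Each "Before" comment shows code that would raise the listed error under `strict` mode. Each fix changes only that spot.

```typescript
interface User {
  id: number;
  name: string;
}

const users: User[] = [{ id: 1, name: "Ada" }];

// Before: function greet(user) { ... }
//   -> "Parameter 'user' implicitly has an 'any' type"
// Fix: add the missing annotation, nothing else.
function greet(user: User): string {
  return `Hello, ${user.name}`;
}

// Before: return users.find((u) => u.id === id).name;
//   -> "Object is possibly 'undefined'"
// Fix: optional chaining; callers now receive undefined instead of a crash.
function findUserName(id: number): string | undefined {
  return users.find((u) => u.id === id)?.name;
}

// Before: const port: number = raw;
//   -> "Type 'string' is not assignable to type 'number'"
// Fix: parse the string explicitly.
function parsePort(raw: string): number {
  return Number.parseInt(raw, 10);
}
```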
Build Error Priority:
| Level | Symptom | Action |
|---|---|---|
| 🔴 CRITICAL | Build completely broken | Fix immediately |
| 🟡 HIGH | Type errors in new code | Fix before commit |
| 🟢 MEDIUM | Lint warnings | Fix when possible |

Test Failures

1. Read FULL error message and stack trace
2. Identify which assertion failed and why
3. Check test setup - is environment correct?
4. Check test data - are mocks/fixtures correct?
5. Trace to source of unexpected value

Runtime Errors

1. Capture full stack trace
2. Identify line that throws
3. Check what values are undefined/null
4. Trace backward to where bad value originated
5. Add validation at the source
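Steps 3 and 5 together amount to naming exactly which values are null/undefined, and failing where the data first enters the program. The TypeScript sketch below does this for an external payload. `Order`, `parseOrder` and `missingFields` are hypothetical names.

```typescript
interface Order {
  id: string;
  total: number;
}

// Report every required key that is null or undefined, not just the first one.
function missingFields(obj: Record<string, unknown>, required: readonly string[]): string[] {
  return required.filter((key) => obj[key] === undefined || obj[key] === null);
}

// Validate at the boundary (API response, file, form) so a missing field fails here,
// with context, instead of as "Cannot read properties of undefined" deep in the stack.
function parseOrder(raw: Record<string, unknown>): Order {
  const missing = missingFields(raw, ["id", "total"]);
  if (missing.length > 0) {
    throw new Error(`parseOrder: missing ${missing.join(", ")}`);
  }
  const { id, total } = raw;
  if (typeof id !== "string" || typeof total !== "number") {
    throw new Error(`parseOrder: wrong types in ${JSON.stringify(raw)}`);
  }
  return { id, total };
}
```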

"It worked before"

1. Use `git bisect` to find breaking commit
2. Compare change with previous working version
3. Identify what assumption changed
4. Fix at source of assumption violation
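For step 1, `git bisect run` can automate the search. It judges each commit only by the exit code of the command it runs:

- 0 means good.
- 125 means skip (the commit cannot be tested).
- Any other code from 1 to 127 means bad.
- Higher codes abort the bisect.

A minimal TypeScript sketch of the decision a repro script has to make follows. `CheckResult` is a hypothetical shape for whatever your repro measures.

```typescript
interface CheckResult {
  builds: boolean;     // could this commit be built/set up at all?
  bugPresent: boolean; // did the repro trigger the bug?
}

// Map a repro result to the exit code git bisect run expects.
function bisectExitCode(result: CheckResult): number {
  if (!result.builds) return 125; // skip: this commit cannot be tested
  return result.bugPresent ? 1 : 0; // 1 = bad, 0 = good
}
```

A real script would end with something like `process.exit(bisectExitCode(checkCommit()))`, where `checkCommit` is hypothetical. It could then be run as `git bisect run npx tsx repro.ts`, assuming the `tsx` runner is installed. Returning 125 for unbuildable commits keeps them from being misreported as the breaking change.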

Intermittent Failures

1. Look for race conditions
2. Check for shared mutable state
3. Examine async operation ordering
4. Look for timing dependencies
5. Add deterministic waits or proper synchronization
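In single-threaded JavaScript, a race usually means two async operations interleaving at an `await`. The TypeScript sketch below reproduces a lost update deterministically, then fixes it by serializing the read-modify-write, one form of the "proper synchronization" in step 5. The `balance` state and the `Mutex` class are hypothetical illustrations, not a library API.

```typescript
const tick = () => new Promise<void>((resolve) => setTimeout(resolve, 0));

let balance = 0; // shared mutable state

// Racy: two calls can both read the same old value before either one writes.
async function depositRacy(amount: number): Promise<void> {
  const current = balance;    // read
  await tick();               // the other call runs here
  balance = current + amount; // write based on a stale read
}

// Minimal promise-chain lock: each task starts only after the previous one settles.
class Mutex {
  private last: Promise<void> = Promise.resolve();
  run<T>(task: () => Promise<T>): Promise<T> {
    const result = this.last.then(task);
    this.last = result.then(() => undefined, () => undefined); // keep chain alive on errors
    return result;
  }
}

const lock = new Mutex();

// Same read-await-write body, but the lock makes it exclusive.
function depositSafe(amount: number): Promise<void> {
  return lock.run(() => depositRacy(amount));
}

async function demo(): Promise<[number, number]> {
  balance = 0;
  await Promise.all([depositRacy(10), depositRacy(10)]);
  const racy = balance; // 10: one deposit was lost
  balance = 0;
  await Promise.all([depositSafe(10), depositSafe(10)]);
  return [racy, balance]; // [10, 20]
}
```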

Frontend Browser Errors

1. Request clean console: AskUserQuestion → "F12 → Console → Clear → reproduce → Copy all"
2. Analyze grouped messages for repetition patterns
3. Check for hidden CORS errors (enable "Show CORS errors in console")
4. If insufficient: request user add console.log at suspected locations
5. Trace to source of unexpected value
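For step 2, the TypeScript sketch below groups pasted console text by message "shape", so that one error repeated hundreds of times stands out from one-off noise. The normalization rules (collapsing URLs, hex ids and numbers) are assumptions to tune for your own logs.

```typescript
// Replace the variable parts of a console line so repeats share one shape.
function normalize(line: string): string {
  return line
    .replace(/https?:\/\/\S+/g, "<url>")   // collapse URLs first
    .replace(/\b0x[0-9a-f]+\b/gi, "<hex>") // collapse hex ids
    .replace(/\d+/g, "<n>")                // collapse numbers (line:col, ids, counts)
    .trim();
}

// Count each shape and return the most frequent first.
function groupByShape(consoleText: string): Array<{ shape: string; count: number }> {
  const counts = new Map<string, number>();
  for (const line of consoleText.split("\n")) {
    if (line.trim() === "") continue;
    const shape = normalize(line);
    counts.set(shape, (counts.get(shape) ?? 0) + 1);
  }
  return Array.from(counts.entries(), ([shape, count]) => ({ shape, count }))
    .sort((a, b) => b.count - a.count);
}
```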

Git Bisect (Finding Breaking Commit)

When to use: "It worked before" scenarios.
```bash
# Start bisect
git bisect start

# Mark current (broken) commit as bad
git bisect bad

# Mark a known good commit (e.g., last release)
git bisect good v1.2.0

# Git checks out a commit in the middle; test it
npm test   # or whatever reproduces the bug

# Mark the result (run one of these)
git bisect good   # if tests pass
git bisect bad    # if tests fail

# Repeat until Git identifies the breaking commit.
# Git will output: "abc123 is the first bad commit"

# End bisect
git bisect reset
```

**Automate if you have a test:**
```bash
git bisect start
git bisect bad HEAD
git bisect good v1.2.0
git bisect run npm test -- --grep "failing test"   # --grep is Mocha's filter; Jest uses -t
git bisect reset
```

When to Use

Use for ANY technical issue:
  • Test failures
  • Bugs in production
  • Unexpected behavior
  • Performance problems
  • Build failures
  • Integration issues
Use this ESPECIALLY when:
  • Under time pressure (emergencies make guessing tempting)
  • "Just one quick fix" seems obvious
  • You've already tried multiple fixes
  • Previous fix didn't work
  • You don't fully understand the issue
Don't skip when:
  • Issue seems simple (simple bugs have root causes too)
  • You're in a hurry (rushing guarantees rework)
  • Manager wants it fixed NOW (systematic is faster than thrashing)

The Four Phases

You MUST complete each phase before proceeding to the next.

Phase 1: Root Cause Investigation

BEFORE attempting ANY fix:
  1. Read Error Messages Carefully
    • Don't skip past errors or warnings
    • They often contain the exact solution
    • Read stack traces completely
    • Note line numbers, file paths, error codes
  2. Reproduce Consistently
    • Can you trigger it reliably?
    • What are the exact steps?
    • Does it happen every time?
    • If not reproducible → gather more data, don't guess
  3. Check Recent Changes
    • What changed that could cause this?
    • Git diff, recent commits
    • New dependencies, config changes
    • Environmental differences
  4. Gather Evidence in Multi-Component Systems
    WHEN system has multiple components (CI → build → signing, API → service → database):
    BEFORE proposing fixes, add diagnostic instrumentation:
    For EACH component boundary:
      - Log what data enters component
      - Log what data exits component
      - Verify environment/config propagation
      - Check state at each layer
    
    Run once to gather evidence showing WHERE it breaks
    THEN analyze evidence to identify failing component
    THEN investigate that specific component
    Example (multi-layer system):
    ```bash
    # Layer 1: Entry point
    echo "=== Input data: ==="
    echo "Request: ${REQUEST}"

    # Layer 2: Processing layer
    echo "=== After processing: ==="
    echo "Transformed: ${TRANSFORMED}"

    # Layer 3: Output layer
    echo "=== Final state: ==="
    echo "Result: ${RESULT}"
    ```
    This reveals: Which layer fails (input → processing ✓, processing → output ✗)
  5. Trace Data Flow
    WHEN error is deep in call stack:
    • Where does bad value originate?
    • What called this with bad value?
    • Keep tracing up until you find the source
    • Fix at source, not at symptom
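The per-boundary instrumentation from step 4 can also live in application code. The TypeScript sketch below assumes a hypothetical pipeline of stages. Each stage declares what a good output looks like. Every boundary is recorded in one run, and the first stage whose output fails its check is reported.

```typescript
// Hypothetical stage: a name, the work it does, and what a good output looks like.
interface Stage {
  name: string;
  run: (input: unknown) => unknown;
  check: (output: unknown) => boolean;
}

interface BoundaryRecord {
  stage: string;
  input: unknown;  // data entering the stage
  output: unknown; // data leaving the stage
  ok: boolean;
}

// Run every stage once, recording each boundary. Later stages still run after a
// failure so the evidence is complete; the first failing stage is reported.
function runInstrumented(
  stages: Stage[],
  input: unknown,
): { records: BoundaryRecord[]; firstBad: string | null } {
  const records: BoundaryRecord[] = [];
  let value = input;
  for (const stage of stages) {
    const output = stage.run(value);
    records.push({ stage: stage.name, input: value, output, ok: stage.check(output) });
    value = output;
  }
  const bad = records.find((r) => !r.ok);
  return { records, firstBad: bad ? bad.stage : null };
}

// Example like the layered bash example: input and processing pass, output fails.
const stages: Stage[] = [
  { name: "input", run: (x) => x, check: (v) => typeof v === "string" },
  { name: "processing", run: (x) => String(x).split(","), check: (v) => Array.isArray(v) },
  {
    name: "output",
    run: (x) => (x as string[]).map(Number),
    check: (v) => (v as number[]).every((n) => !Number.isNaN(n)),
  },
];
const { records, firstBad } = runInstrumented(stages, "1,2,x");
records.forEach((r) =>
  console.log(`${r.stage}: ${JSON.stringify(r.input)} -> ${JSON.stringify(r.output)} ${r.ok ? "ok" : "FAIL"}`),
);
console.log("first failing stage:", firstBad);
```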

Phase 2: Pattern Analysis

Find the pattern before fixing:
  1. Find Working Examples
    • Locate similar working code in same codebase
    • What works that's similar to what's broken?
  2. Compare Against References
    • If implementing pattern, read reference implementation COMPLETELY
    • Don't skim - read every line
    • Understand the pattern fully before applying
  3. Identify Differences
    • What's different between working and broken?
    • List every difference, however small
    • Don't assume "that can't matter"
  4. Understand Dependencies
    • What other components does this need?
    • What settings, config, environment?
    • What assumptions does it make?
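Step 3 ("list every difference, however small") is easy to do mechanically for configuration or environment snapshots. The TypeScript sketch below compares flat objects only. The config keys in the example are hypothetical.

```typescript
type Flat = Record<string, unknown>;

// List every key whose value differs between the working and broken snapshots,
// including keys present on only one side. Values are compared via JSON encoding.
function differences(working: Flat, broken: Flat): string[] {
  const keys = new Set([...Object.keys(working), ...Object.keys(broken)]);
  const diffs: string[] = [];
  keys.forEach((key) => {
    const a = JSON.stringify(working[key]); // undefined when the key is absent
    const b = JSON.stringify(broken[key]);
    if (a !== b) diffs.push(`${key}: ${a ?? "(missing)"} -> ${b ?? "(missing)"}`);
  });
  return diffs.sort();
}

console.log(
  differences(
    { node: "20.11", strict: true, baseUrl: "/api" },
    { node: "22.1", strict: true, apiBase: "/api" },
  ),
);
```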

Phase 3: Hypothesis and Testing

Scientific method:
  1. Form Single Hypothesis
    • State clearly: "I think X is the root cause because Y"
    • Write it down
    • Be specific, not vague
  2. Test Minimally
    • Make the SMALLEST possible change to test hypothesis
    • One variable at a time
    • Don't fix multiple things at once
  3. Verify Before Continuing
    • Did it work? Yes → Phase 4
    • Didn't work? Form NEW hypothesis
    • DON'T add more fixes on top
  4. When You Don't Know
    • Say "I don't understand X"
    • Don't pretend to know
    • Ask for help
    • Research more

Hypothesis Quality Criteria

Falsifiability Requirement: A good hypothesis can be proven wrong. If you can't design an experiment to disprove it, it's not useful.
Bad (unfalsifiable):
  • "Something is wrong with the state"
  • "The timing is off"
  • "There's a race condition somewhere"
Good (falsifiable):
  • "User state resets because component remounts when route changes"
  • "API call completes after unmount, causing state update on unmounted component"
  • "Two async operations modify same array without locking, causing data loss"
The difference: Specificity. Good hypotheses make specific, testable claims.

Hypothesis Confidence Scoring

Track multiple hypotheses with confidence levels:
H1: [hypothesis] — Confidence: [0-100]
    Evidence for: [what supports this]
    Evidence against: [what contradicts this]
    Next test: [what would raise or lower confidence]

H2: [hypothesis] — Confidence: [0-100]
    Evidence for: [...]
    Evidence against: [...]
    Next test: [...]

H3: [hypothesis] — Confidence: [0-100]
    Evidence for: [...]
    Evidence against: [...]
    Next test: [...]
Scoring guidance:
| Range | Meaning | Action |
|---|---|---|
| 80-100 | Strong evidence, high certainty | Proceed to fix |
| 50-79 | Circumstantial, needs more data | Run "Next test" |
| 0-49 | Speculation, weak evidence | Deprioritize or discard |
Rules:
  • Always maintain 2-3 hypotheses until one reaches 80+
  • Update confidence after EVERY piece of new evidence
  • Never proceed to fix with highest hypothesis below 50
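The H1/H2/H3 template, the scoring table and the rules can be encoded directly. The TypeScript sketch below mirrors the template's fields. The exact wording of each decision is an assumption, and how much a piece of evidence moves a score is left to judgment.

```typescript
interface Hypothesis {
  claim: string;
  confidence: number; // 0-100
  evidenceFor: string[];
  evidenceAgainst: string[];
  nextTest: string;
}

type Action = "proceed to fix" | "run next test" | "deprioritize or discard";

// The scoring table: 80-100 fix, 50-79 test more, 0-49 deprioritize.
function actionFor(h: Hypothesis): Action {
  if (h.confidence >= 80) return "proceed to fix";
  if (h.confidence >= 50) return "run next test";
  return "deprioritize or discard";
}

// The rules: keep 2-3 hypotheses until one reaches 80+, and never fix while the
// best one is below 50.
function nextStep(hypotheses: Hypothesis[]): string {
  const ranked = [...hypotheses].sort((a, b) => b.confidence - a.confidence);
  const best = ranked[0];
  if (!best) return "form hypotheses first";
  if (best.confidence >= 80) return `fix: ${best.claim}`;
  if (ranked.length < 2) return "generate more hypotheses (keep 2-3 in play)";
  if (best.confidence < 50) return "gather more evidence; no hypothesis is strong enough to fix";
  return `run next test: ${best.nextTest}`;
}
```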

Cognitive Biases in Debugging

| Bias | Trap | Antidote |
|---|---|---|
| Confirmation | Only looking for evidence that supports your hypothesis | Ask "What would prove me wrong?" |
| Anchoring | The first explanation becomes your anchor | Generate 3+ hypotheses before investigating any |
| Availability | Recent bugs lead you to assume a similar cause | Treat each bug as novel until evidence suggests otherwise |
| Sunk Cost | After 2 hours on one path, you keep going despite contrary evidence | Every 30 min, ask: "If I were starting fresh, would I take this path?" |

Meta-Debugging: Your Own Code

When debugging code you wrote, you're fighting your own mental model.
Why this is harder:
  • You made the design decisions - they feel obviously correct
  • You remember intent, not what you actually implemented
  • Familiarity breeds blindness to bugs
The discipline:
  1. Treat your code as foreign - Read it as if someone else wrote it
  2. Question your design decisions - Your implementation choices are hypotheses, not facts
  3. Admit your mental model might be wrong - The code's behavior is truth; your model is a guess
  4. Prioritize code you touched - If you modified 100 lines and something breaks, those are prime suspects
The hardest admission: "I implemented this wrong." Not "requirements were unclear" - YOU made an error.

When to Restart Investigation

Consider starting over when:
  1. 2+ hours with no progress - You're likely tunnel-visioned
  2. 3+ "fixes" that didn't work - Your mental model is wrong
  3. You can't explain the current behavior - Don't add changes on top of confusion
  4. You're debugging the debugger - Something fundamental is wrong
  5. The fix works but you don't know why - This isn't fixed, this is luck
Restart protocol:
  1. Close all files and terminals
  2. Write down what you know for certain
  3. Write down what you've ruled out
  4. List new hypotheses (different from before)
  5. Begin again from Phase 1

Phase 4: Implementation

Fix the root cause, not the symptom:
  1. Create Failing Test Case
    • Simplest possible reproduction
    • Automated test if possible
    • One-off test script if no framework
    • MUST have before fixing
  2. Implement Single Fix
    • Address the root cause identified
    • ONE change at a time
    • No "while I'm here" improvements
    • No bundled refactoring
  3. Verify Fix
    • Test passes now?
    • No other tests broken?
    • Issue actually resolved?
  4. If Fix Doesn't Work
    • STOP
    • Count: How many fixes have you tried?
    • If < 3: Return to Phase 1, re-analyze with new information
    • If >= 3: STOP and question the architecture (step 5 below)
    • DON'T attempt Fix #4 without architectural discussion
  5. If 3+ Fixes Failed: Question Architecture
    Pattern indicating architectural problem:
    • Each fix reveals new shared state/coupling/problem in different place
    • Fixes require "massive refactoring" to implement
    • Each fix creates new symptoms elsewhere
    STOP and question fundamentals:
    • Is this pattern fundamentally sound?
    • Are we "sticking with it through sheer inertia"?
    • Should we refactor architecture vs. continue fixing symptoms?
    Discuss with your human partner before attempting more fixes
    This is NOT a failed hypothesis - this is a wrong architecture.
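Step 1 ("Create Failing Test Case") can be a one-off script when no framework is set up. In the TypeScript sketch below, the bug report and `paginate` are hypothetical. The first check was written before the fix and failed with the buggy version. With a test framework, each check would be an ordinary test case.

```typescript
// Fixed implementation. The original bug was `items.slice(page * size, ...)` with
// 1-based page numbers, which skipped the first page entirely.
function paginate<T>(items: T[], page: number, size: number): T[] {
  const start = (page - 1) * size; // pages are 1-based
  return items.slice(start, start + size);
}

// Minimal assertion helper for a framework-free regression script.
function check(name: string, actual: unknown, expected: unknown): void {
  const a = JSON.stringify(actual);
  const e = JSON.stringify(expected);
  if (a !== e) throw new Error(`${name}: expected ${e}, got ${a}`);
}

// Regression tests. The first one reproduced the bug ([5] instead of [3, 4]).
check("page 2 of 2-item pages", paginate([1, 2, 3, 4, 5], 2, 2), [3, 4]);
check("first page", paginate([1, 2, 3, 4, 5], 1, 2), [1, 2]);
check("last partial page", paginate([1, 2, 3, 4, 5], 3, 2), [5]);
```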

Red Flags - STOP and Follow Process

If you catch yourself thinking:
  • "Quick fix for now, investigate later"
  • "Just try changing X and see if it works"
  • "Add multiple changes, run tests"
  • "Skip the test, I'll manually verify"
  • "It's probably X, let me fix that"
  • "I don't fully understand but this might work"
  • "Pattern says X but I'll adapt it differently"
  • "Here are the main problems: [lists fixes without investigation]"
  • Proposing solutions before tracing data flow
  • "One more fix attempt" (when already tried 2+)
  • Each fix reveals new problem in different place
ALL of these mean: STOP. Return to Phase 1.
If 3+ fixes failed: Question the architecture (see Phase 4, step 5)

User Signals That You're Doing It Wrong

Watch for these redirections:
  • "Is that not happening?" - You assumed without verifying
  • "Will it show us...?" - You should have added evidence gathering
  • "Stop guessing" - You're proposing fixes without understanding
  • "Ultrathink this" - Question fundamentals, not just symptoms
  • "We're stuck?" (frustrated) - Your approach isn't working
When you see these: STOP. Return to Phase 1.

Rationalization Prevention

| Excuse | Reality |
|---|---|
| "Issue is simple, don't need process" | Simple issues have root causes too. Process is fast for simple bugs. |
| "Emergency, no time for process" | Systematic debugging is FASTER than guess-and-check thrashing. |
| "Just try this first, then investigate" | First fix sets the pattern. Do it right from the start. |
| "I'll write test after confirming fix works" | Untested fixes don't stick. Test first proves it. |
| "Multiple fixes at once saves time" | Can't isolate what worked. Causes new bugs. |
| "Reference too long, I'll adapt the pattern" | Partial understanding guarantees bugs. Read it completely. |
| "I see the problem, let me fix it" | Seeing symptoms ≠ understanding root cause. |
| "One more fix attempt" (after 2+ failures) | 3+ failures = architectural problem. Question pattern, don't fix again. |

Quick Reference

| Phase | Key Activities | Success Criteria |
|---|---|---|
| 1. Root Cause | Read errors, reproduce, check changes, gather evidence | Understand WHAT and WHY |
| 2. Pattern | Find working examples, compare | Identify differences |
| 3. Hypothesis | Form theory, test minimally | Confirmed or new hypothesis |
| 4. Implementation | Create test, fix, verify | Bug resolved, tests pass |

When Process Reveals "No Root Cause"

If systematic investigation reveals issue is truly environmental, timing-dependent, or external:
  1. You've completed the process
  2. Document what you investigated
  3. Implement appropriate handling (retry, timeout, error message)
  4. Add monitoring/logging for future investigation
But: 95% of "no root cause" cases are incomplete investigation.
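For the genuinely external or timing-dependent cases, the TypeScript sketch below combines steps 3 and 4. It retries with exponential backoff and a per-attempt timeout, and logs every failed attempt as evidence for future investigation. The function names, option names and defaults are assumptions, not a prescribed API. Note that the timeout only stops waiting; it does not cancel the underlying task.

```typescript
// Reject if `promise` has not settled within `ms`; clear the timer either way.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

async function retry<T>(
  task: () => Promise<T>,
  { attempts = 3, timeoutMs = 1000, baseDelayMs = 100 } = {},
  log: (msg: string) => void = console.warn,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await withTimeout(task(), timeoutMs);
    } catch (error) {
      lastError = error;
      log(`attempt ${attempt}/${attempts} failed: ${String(error)}`); // keep the evidence
      if (attempt < attempts) {
        // Exponential backoff: baseDelayMs, 2x, 4x, ...
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)));
      }
    }
  }
  throw new Error(`failed after ${attempts} attempts: ${String(lastError)}`);
}
```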

Output Format

```markdown
## Bug Investigation

### Phase 1: Evidence Gathered
- Error: [exact error message]
- Stack trace: [relevant lines]
- Reproduction: [steps to reproduce]
- Recent changes: [commits/changes]

### Phase 2: Pattern Analysis
- Working example: [similar working code]
- Key differences: [what's different]

### Phase 3: Hypothesis
- Theory: [I think X because Y]
- Test: [minimal change made]
- Result: [confirmed/refuted]

### Phase 4: Fix
- Root cause: [actual cause with evidence]
- Change: [summary of fix]
- File: [path:line]
- Regression test: [test added]

### Verification
- Test command: [command] → exit 0
- All tests: PASS
- Functionality: Restored
```