shared-bug-investigation
Bug Investigation - Scientific Method
Apply scientific methodology to investigate and resolve software bugs systematically.
Scientific Method Process
- Observe - Gather data about the problem
- Hypothesize - Form testable explanations (ranked by likelihood)
- Experiment - Test hypotheses with controlled changes
- Analyze - Interpret results objectively
- Conclude - Identify root cause and validate fix
Core Principles
- Context first - Understand the project before investigating
- Hypothesis-driven - Never jump to solutions without forming testable hypotheses
- Isolate variables - Change one thing at a time
- Reproduce reliably - Can't fix what you can't reproduce
- Root causes over symptoms - Dig deeper than surface fixes
- Validate rigorously - Confirm fix resolves issue without regressions
Investigation Workflow
Phase 1: Project Context (2-5 min)
Discover before investigating:
- Language & version (Python 3.11, Java 17, Go 1.21, etc.)
- Build system (Gradle, npm, Cargo, Make, etc.)
- Key dependencies & frameworks
- Architecture pattern (MVC, microservices, etc.)
- Testing setup
Quick discovery:
    # Find package managers
    view package.json / requirements.txt / Cargo.toml / pom.xml
    # Check config files
    view .env / config.yml / settings.py
    # Identify entry points
    view main.* / index.* / app.*
**Output: One-line context**
Python 3.11, Flask API, PostgreSQL, pytest, Docker
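The manifest check in the discovery step could also be scripted. The sketch below is a minimal illustration, assuming the manifest sits directly in the project root; the `MANIFESTS` table and the `detect_ecosystems` name are hypothetical and only cover the four files listed above.

```python
from pathlib import Path

# Hypothetical mapping from manifest file name to ecosystem; extend as needed.
MANIFESTS = {
    "package.json": "Node.js (npm)",
    "requirements.txt": "Python (pip)",
    "Cargo.toml": "Rust (Cargo)",
    "pom.xml": "Java (Maven)",
}


def detect_ecosystems(root: str) -> list[str]:
    """Return the ecosystems whose manifest file exists directly under root."""
    base = Path(root)
    return [label for name, label in MANIFESTS.items() if (base / name).is_file()]
```

Calling `detect_ecosystems(".")` from a project root returns a list that can be condensed into the one-line context.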
Phase 2: Problem Definition
Gather:
- Error messages (full text, codes)
- Stack traces / logs
- Steps to reproduce
- Expected vs actual behavior
- Environment (OS, version, config)
- Reproducibility (always/sometimes/rare)
Document:
Bug: [Short description]
Reproduces: [Always/Sometimes/Unable]
Error: [Key error message]
Steps:
1. [Action 1]
2. [Action 2]
3. [Failure occurs]
Expected: [What should happen]
Actual: [What happens]
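If the report is produced by tooling rather than by hand, the template above might be modeled as a small record. This is only a sketch; the `BugReport` class and its field names are assumptions chosen to mirror the template, not part of any existing tool.

```python
from dataclasses import dataclass, field


@dataclass
class BugReport:
    """Hypothetical record mirroring the problem-definition template."""
    description: str
    reproduces: str  # "Always", "Sometimes", or "Unable"
    error: str
    steps: list[str] = field(default_factory=list)
    expected: str = ""
    actual: str = ""

    def render(self) -> str:
        """Format the report in the same order as the template."""
        lines = [
            f"Bug: {self.description}",
            f"Reproduces: {self.reproduces}",
            f"Error: {self.error}",
            "Steps:",
        ]
        lines += [f"{i}. {step}" for i, step in enumerate(self.steps, start=1)]
        lines += [f"Expected: {self.expected}", f"Actual: {self.actual}"]
        return "\n".join(lines)
```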
Phase 3: Hypotheses (Ranked)
Form 2-4 testable hypotheses, ranked by likelihood.
H1: [Most likely cause]
- Evidence for: [Why this is likely]
- Test: [How to prove/disprove]
- If true: [Expected result]
- If false: [Expected result]
H2: [Alternative cause]
- Evidence for: [Supporting observations]
- Test: [Falsifiable experiment]
Common categories:
- Logic errors (off-by-one, wrong operator, incorrect condition)
- State issues (race condition, uninitialized, stale data)
- Type/data (null/nil, type mismatch, parsing error)
- Concurrency (data race, deadlock, thread safety)
- Integration (API mismatch, version incompatibility)
- Environment (config, platform-specific, resource limits)
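Ranking by likelihood can be made explicit when hypotheses are tracked in code. The following is a minimal sketch; the `Hypothesis` type and the numeric `likelihood` field are assumptions (a subjective estimate between 0 and 1), not something the workflow prescribes.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    cause: str
    likelihood: float  # subjective estimate in [0, 1]; an assumption of this sketch
    test: str          # how to prove or disprove the hypothesis


def rank(hypotheses: list[Hypothesis]) -> list[Hypothesis]:
    """Return hypotheses ordered from most to least likely."""
    return sorted(hypotheses, key=lambda h: h.likelihood, reverse=True)
```

Testing in ranked order means the most probable cause is examined first.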
Phase 4: Experiment
For each hypothesis:
Test H1: [Hypothesis name]
- Change: [One variable to modify]
- Measure: [What to observe]
- Method: [Specific steps]
- Result: [Actual outcome]
- Conclusion: [Validated/Invalidated]
Techniques:
- Add logging at key points
- Use debugger breakpoints
- Binary search (remove half the code)
- Minimal reproduction (strip to essentials)
- Diff working vs broken states
- Isolate components
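One way to combine "change one variable" with "diff working vs broken states" is to apply each differing setting to the known-good configuration on its own. The sketch below assumes configurations are flat dictionaries and that `fails` is a caller-supplied check; both the function name and that interface are hypothetical, and keys missing from the broken configuration are not handled.

```python
from typing import Any, Callable


def find_breaking_keys(
    working: dict[str, Any],
    broken: dict[str, Any],
    fails: Callable[[dict[str, Any]], bool],
) -> list[str]:
    """Report which single settings from `broken` trigger the failure.

    Each candidate differs from the working config in exactly one key.
    """
    culprits = []
    for key in sorted(broken):
        if key in working and working[key] == broken[key]:
            continue  # same value in both configs, nothing to test
        candidate = {**working, key: broken[key]}  # one variable changed
        if fails(candidate):
            culprits.append(key)
    return culprits
```

If no single key reproduces the failure, the cause may be an interaction between settings, which calls for a new hypothesis.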
Phase 5: Root Cause
Identified:
[Clear statement of actual cause]
Evidence:
[Chain from observation → hypothesis → validation]
Why it occurred:
- Immediate: [Technical reason]
- Contributing: [What enabled this]
- Systemic: [Deeper issue if any]
Phase 6: Solution & Validation
Fix:
[Specific changes to make]
Why this works:
[Explain causal connection]
Validation:
- Reproduce bug (confirm failure)
- Apply fix
- Retest (confirm success)
- Test edge cases
- Run test suite (no regressions)
- Add test for this bug
Prevention:
- [Test to add]
- [Assertion to include]
- [Pattern to avoid]
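The "add test for this bug" step might look like the following. The example is invented for illustration: `paginate` is a hypothetical function whose earlier, buggy version is assumed to have dropped the final partial page.

```python
import unittest


def paginate(items: list, page_size: int) -> list[list]:
    """Split items into pages; the final page may be shorter."""
    # The assumed bug computed len(items) // page_size pages and lost the remainder.
    return [items[i:i + page_size] for i in range(0, len(items), page_size)]


class PaginateRegressionTest(unittest.TestCase):
    def test_keeps_partial_last_page(self):
        # Fails against the buggy version, passes after the fix.
        self.assertEqual(paginate([1, 2, 3, 4, 5], 2), [[1, 2], [3, 4], [5]])

    def test_empty_input(self):
        # Edge case checked as part of validation.
        self.assertEqual(paginate([], 2), [])
```

Running the test before applying the fix confirms the failure; running it afterward, together with the full suite, confirms the fix without regressions.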
Investigation Techniques
Binary Search: Remove half the code, test if bug persists. Repeat on failing half until isolated.
Minimal Reproduction: Strip the code to fewer than 50 lines that still reproduce the issue. This removes noise.
Differential Testing: Compare working vs broken (commits, versions, configs).
Strategic Logging: Add prints at key decision points to trace execution flow.
Rubber Duck: Explain code line-by-line aloud. Often reveals logic errors.
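The binary search technique applies equally to an ordered history of changes, which is the idea behind `git bisect`. The sketch below is a simplified stand-in, not that tool: `first_bad` and `is_bad` are hypothetical names, and it assumes every change before the culprit is good, every change from the culprit onward is bad, and the last change is known to be bad.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(changes: Sequence[T], is_bad: Callable[[T], bool]) -> int:
    """Return the index of the first bad change using binary search.

    Assumes a good-then-bad ordering and that changes[-1] is bad.
    """
    lo, hi = 0, len(changes) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(changes[mid]):
            hi = mid        # culprit is at mid or earlier
        else:
            lo = mid + 1    # culprit is after mid
    return lo
```

Each check halves the remaining range, so eight candidate changes need at most three checks.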
Common Bug Patterns (Language-Agnostic)
Logic Errors:
- Off-by-one: `i < n` vs `i <= n`
- Wrong operator: `&&` vs `||`, `==` vs `=`
- Negation errors: `!condition` logic flipped
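The off-by-one pattern is easiest to see side by side. Both functions below are invented for illustration; the buggy one iterates as if the condition were `i <= n`.

```python
def sum_first_n_buggy(values: list[int], n: int) -> int:
    total = 0
    for i in range(n + 1):  # bug: equivalent to i <= n, visits n + 1 elements
        total += values[i]
    return total


def sum_first_n(values: list[int], n: int) -> int:
    total = 0
    for i in range(n):  # correct: equivalent to i < n
        total += values[i]
    return total
```

The buggy version either includes one extra element or raises `IndexError` when `n` equals the list length, so a boundary-value test exposes it.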
State Issues:
- Race conditions: concurrent access without synchronization
- Uninitialized: using variable before setting value
- Stale state: using outdated cached data
Type/Data:
- Null/nil dereference
- Type coercion errors
- Integer overflow
- Floating-point precision
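As an illustration of the floating-point item, exact equality on binary floats can fail for values that look equal in decimal. The sketch uses Python's standard `math.isclose`; the `totals_match` helper and its default tolerance are assumptions for the example.

```python
import math


def totals_match(a: float, b: float, rel_tol: float = 1e-9) -> bool:
    """Compare two floats with a relative tolerance instead of ==."""
    return math.isclose(a, b, rel_tol=rel_tol)


# 0.1 + 0.2 is not exactly 0.3 in binary floating point.
exact_equal = (0.1 + 0.2 == 0.3)
tolerant_equal = totals_match(0.1 + 0.2, 0.3)
```

Here `exact_equal` is `False` while `tolerant_equal` is `True`. For money or other exact decimal quantities, a decimal type is usually a better fit than a tolerance.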
Concurrency:
- Deadlock: mutual waiting for locks
- Data race: unsynchronized shared access
- Thread safety: non-thread-safe code on multiple threads
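A common remedy for the data race pattern is to guard the shared read-modify-write with a lock. This is a minimal sketch using the standard `threading` module; `SafeCounter` and `hammer` are hypothetical names for the example.

```python
import threading


class SafeCounter:
    """Counter whose increments are serialized by a lock."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # The read-modify-write happens under the lock, so updates cannot interleave.
        with self._lock:
            self._value += 1

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


def hammer(counter: SafeCounter, workers: int, increments: int) -> int:
    """Increment the counter from several threads and return the final value."""
    def work() -> None:
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

Without the lock, the final count may come out lower than `workers * increments`, and only intermittently, which is what makes this category hard to reproduce.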
Integration:
- API contract mismatch
- Version incompatibility
- Missing dependencies
- Incorrect configuration
Output Format
Bug Investigation: [Name]
Context
[One-line: Language, framework, architecture]
Problem
Error: [Message]
Reproduces: [Always/Sometimes]
Steps: [1,2,3]
Hypotheses
H1: [Most likely] - Test by [method]
H2: [Alternative] - Test by [method]
Investigation
Tested H1: [Result - validated/invalidated]
[If needed] Tested H2: [Result]
Root Cause
[One sentence explanation]
Evidence: [What confirmed it]
Solution
Fix: [Specific change]
Why it works: [Explanation]
Validated: [Tested successfully]
Prevention
- Test added: [Description]
- Warning signs: [What to watch for]
Quick Decision Tree
Symptom → Likely Category:
- Intermittent failure → Concurrency/state
- Always fails same way → Logic error
- Null/nil crash → Type/data
- Specific environment only → Configuration
- Performance degradation → Resource/algorithm
- After dependency update → Integration
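The mapping above is small enough to encode as a lookup table, for example in a triage script. The sketch below copies the six entries as written; the `likely_category` function and its fallback message are assumptions added for the example.

```python
# Symptom -> likely category, copied from the decision tree above.
LIKELY_CATEGORY = {
    "intermittent failure": "Concurrency/state",
    "always fails same way": "Logic error",
    "null/nil crash": "Type/data",
    "specific environment only": "Configuration",
    "performance degradation": "Resource/algorithm",
    "after dependency update": "Integration",
}


def likely_category(symptom: str) -> str:
    """Return the likely bug category for a symptom, ignoring case and padding."""
    # The fallback text is an assumption of this sketch, not part of the tree.
    return LIKELY_CATEGORY.get(symptom.strip().lower(), "Unknown: gather more data")
```

The result is only a starting point for hypothesis ranking, not a diagnosis.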
Critical Reminders
- Start with context discovery
- Form hypotheses before coding
- Change ONE variable at a time
- Reproduce before fixing
- Validate fix rigorously
- Add regression test
- Document root cause