shared-bug-investigation

Bug Investigation - Scientific Method

Apply scientific methodology to investigate and resolve software bugs systematically.

Scientific Method Process

Observe - Gather data about the problem
Hypothesize - Form testable explanations (ranked by likelihood)
Experiment - Test hypotheses with controlled changes
Analyze - Interpret results objectively
Conclude - Identify root cause and validate fix

Core Principles

Context first - Understand the project before investigating
Hypothesis-driven - Never jump to solutions without forming testable hypotheses
Isolate variables - Change one thing at a time
Reproduce reliably - Can't fix what you can't reproduce
Root causes over symptoms - Dig deeper than surface fixes
Validate rigorously - Confirm fix resolves issue without regressions

Investigation Workflow

Phase 1: Project Context (2-5 min)

Discover before investigating:

Language & version (Python 3.11, Java 17, Go 1.21, etc.)
Build system (Gradle, npm, Cargo, Make, etc.)
Key dependencies & frameworks
Architecture pattern (MVC, microservices, etc.)
Testing setup

Quick discovery:

```bash
# Find package managers (lists only the files that exist)
ls package.json requirements.txt Cargo.toml pom.xml 2>/dev/null

# Check config files
ls .env config.yml settings.py 2>/dev/null

# Identify entry points
ls main.* index.* app.* 2>/dev/null

# Then read whichever files turned up, e.g.:
# cat package.json
```

**Output: One-line context**

Python 3.11, Flask API, PostgreSQL, pytest, Docker
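The same discovery can be turned into a repeatable script. Below is a minimal Python sketch that takes the repository root as an argument, or uses the current directory if none is given. The manifest-to-ecosystem table is a hypothetical and deliberately incomplete mapping. The script only checks which files exist and does not parse them, so frameworks such as Flask or PostgreSQL still have to be read from the manifests themselves.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical, incomplete mapping from manifest file to ecosystem/build tool.
MANIFESTS = {
    "package.json": "Node.js (npm)",
    "requirements.txt": "Python (pip)",
    "Cargo.toml": "Rust (Cargo)",
    "pom.xml": "Java (Maven)",
}
CONFIG_FILES = [".env", "config.yml", "settings.py"]
ENTRY_GLOBS = ["main.*", "index.*", "app.*"]


def discover(root: str = ".") -> Dict[str, List[str]]:
    """Report which known manifests, config files and entry points exist under root."""
    base = Path(root)
    return {
        "ecosystems": [eco for name, eco in MANIFESTS.items() if (base / name).is_file()],
        "configs": [name for name in CONFIG_FILES if (base / name).is_file()],
        "entry_points": sorted(
            p.name for pattern in ENTRY_GLOBS for p in base.glob(pattern) if p.is_file()
        ),
    }


def one_line_context(root: str = ".") -> str:
    """Collapse the findings into a single line of context."""
    found = discover(root)
    parts = found["ecosystems"] + [f"entry: {e}" for e in found["entry_points"]]
    return ", ".join(parts) if parts else "unknown project layout"
```

For a directory containing `requirements.txt`, `app.py` and `.env`, `one_line_context()` returns `Python (pip), entry: app.py`. The remaining details then come from reading those files.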

Phase 2: Problem Definition

Gather:

Error messages (full text, codes)
Stack traces / logs
Steps to reproduce
Expected vs actual behavior
Environment (OS, version, config)
Reproducibility (always/sometimes/rare)

Document:

Bug: [Short description]
Reproduces: [Always/Sometimes/Unable]
Error: [Key error message]
Steps:
1. [Action 1]
2. [Action 2]
3. [Failure occurs]
Expected: [What should happen]
Actual: [What happens]
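The Document template can be stored as a structured record, which guarantees that every report has the same fields. This is a minimal sketch. The class name `BugReport` and the rule limiting `reproduces` to the template's three values are illustrative assumptions and do not come from any tool.

```python
from dataclasses import dataclass, field
from typing import List

REPRO_VALUES = ("Always", "Sometimes", "Unable")


@dataclass
class BugReport:
    """Structured version of the Phase 2 'Document' template."""
    summary: str
    reproduces: str
    error: str
    steps: List[str] = field(default_factory=list)
    expected: str = ""
    actual: str = ""

    def __post_init__(self) -> None:
        # Restrict reproducibility to the template's three values.
        if self.reproduces not in REPRO_VALUES:
            raise ValueError(f"reproduces must be one of {REPRO_VALUES}")

    def render(self) -> str:
        """Produce the report in the same line layout as the template."""
        lines = [
            f"Bug: {self.summary}",
            f"Reproduces: {self.reproduces}",
            f"Error: {self.error}",
            "Steps:",
        ]
        lines += [f"{i}. {step}" for i, step in enumerate(self.steps, start=1)]
        lines += [f"Expected: {self.expected}", f"Actual: {self.actual}"]
        return "\n".join(lines)
```

Example values such as `BugReport("Login returns 500", "Always", "KeyError: 'user_id'", [...])` are hypothetical. Passing `"Often"` as the reproducibility value raises `ValueError`.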

Phase 3: Hypotheses (Ranked)

Form 2-4 testable hypotheses, ranked by likelihood.

H1: [Most likely cause]

Evidence for: [Why this is likely]
Test: [How to prove/disprove]
If true: [Expected result]
If false: [Expected result]

H2: [Alternative cause]

Evidence for: [Supporting observations]
Test: [Falsifiable experiment]

Common categories:

Logic errors (off-by-one, wrong operator, incorrect condition)
State issues (race condition, uninitialized, stale data)
Type/data (null/nil, type mismatch, parsing error)
Concurrency (data race, deadlock, thread safety)
Integration (API mismatch, version incompatibility)
Environment (config, platform-specific, resource limits)
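You can make the ranking explicit by attaching a rough likelihood to each hypothesis and always testing the most likely untested one next. This sketch treats likelihoods as subjective estimates, not measured probabilities. `Hypothesis`, `rank` and `next_to_test` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Hypothesis:
    cause: str
    likelihood: float                 # subjective estimate in [0, 1]
    test: str                         # how to prove or disprove it
    validated: Optional[bool] = None  # None until the experiment has run


def rank(hypotheses: List[Hypothesis]) -> Dict[str, Hypothesis]:
    """Label hypotheses H1, H2, ... from most to least likely."""
    ordered = sorted(hypotheses, key=lambda h: h.likelihood, reverse=True)
    return {f"H{i}": h for i, h in enumerate(ordered, start=1)}


def next_to_test(ranked: Dict[str, Hypothesis]) -> Optional[str]:
    """Return the label of the most likely hypothesis not yet tested, or None."""
    for label, h in ranked.items():
        if h.validated is None:
            return label
    return None
```

After H1 is invalidated (its `validated` is set to `False`), `next_to_test` moves on to H2. The ranking order stays fixed, which keeps the experiment log easy to follow.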

Phase 4: Experiment

For each hypothesis:

Test H1: [Hypothesis name]

Change: [One variable to modify]
Measure: [What to observe]
Method: [Specific steps]
Result: [Actual outcome]
Conclusion: [Validated/Invalidated]
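When the suspect variables are configuration values, the "Change: one variable" rule can be enforced in code. This minimal sketch assumes a deterministic `reproduces(config)` check (hypothetical) that returns True while the bug is present.

```python
from typing import Any, Callable, Dict

Config = Dict[str, Any]


def one_variable_experiments(
    baseline: Config,
    changes: Config,
    reproduces: Callable[[Config], bool],
) -> Dict[str, bool]:
    """Apply each candidate change on its own to the failing baseline and record
    whether the bug still reproduces (True) or disappears (False)."""
    if not reproduces(baseline):
        raise ValueError("the baseline must reproduce the bug")
    results = {}
    for key, value in changes.items():
        trial = dict(baseline)  # fresh copy: earlier changes never leak into later trials
        trial[key] = value      # exactly one variable differs from the baseline
        results[key] = reproduces(trial)
    return results
```

Consider a baseline of `{"workers": 4, "cache": True, "timeout_s": 5}` and a bug that needs both caching and more than one worker. Changing `workers` or `cache` on its own makes the failure disappear, which points at an interaction between those two. Changing `timeout_s` does not, so it is ruled out.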

Techniques:

Add logging at key points
Use debugger breakpoints
Binary search (remove half the code)
Minimal reproduction (strip to essentials)
Diff working vs broken states
Isolate components

Phase 5: Root Cause

Identified: [Clear statement of actual cause]

Evidence: [Chain from observation → hypothesis → validation]

Why it occurred:

Immediate: [Technical reason]
Contributing: [What enabled this]
Systemic: [Deeper issue if any]

Phase 6: Solution & Validation

Fix: [Specific changes to make]

Why this works: [Explain causal connection]

Validation:

Reproduce bug (confirm failure)
Apply fix
Retest (confirm success)
Test edge cases
Run test suite (no regressions)
Add test for this bug

Prevention:

[Test to add]
[Assertion to include]
[Pattern to avoid]
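A regression test pins the fix in place. The sketch below uses the standard-library `unittest` module and a hypothetical bug: in a `limit` parameter check, `if not limit` also treated an explicit 0 as missing. The function name and its default are illustrative only.

```python
import unittest
from typing import Optional

DEFAULT_LIMIT = 20  # hypothetical default page size


def effective_limit(limit: Optional[int]) -> int:
    # The hypothetical buggy version was `if not limit: return DEFAULT_LIMIT`,
    # which also treated an explicit 0 as "not provided".
    if limit is None:
        return DEFAULT_LIMIT
    if limit < 0:
        raise ValueError("limit must be non-negative")
    return limit


class EffectiveLimitRegressionTest(unittest.TestCase):
    def test_explicit_zero_is_respected(self):
        # Regression test for the original bug: 0 must not fall back to the default.
        self.assertEqual(effective_limit(0), 0)

    def test_missing_limit_uses_default(self):
        self.assertEqual(effective_limit(None), DEFAULT_LIMIT)

    def test_negative_limit_is_rejected(self):
        # Edge case checked during validation.
        with self.assertRaises(ValueError):
            effective_limit(-1)
```

Run the tests with `python -m unittest` from the directory containing the file. The first test fails against the buggy `if not limit` version and passes after the fix, which is the "reproduce, fix, retest" loop captured permanently.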

Investigation Techniques

Binary Search: Remove half the code, test if bug persists. Repeat on failing half until isolated.
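The halving loop looks like this in code. This minimal sketch makes three assumptions: the input can be split into independent items (lines, records, config entries), exactly one item triggers the bug, and `fails` (a hypothetical callback you supply) deterministically reports whether a subset still reproduces it.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def bisect_culprit(items: Sequence[T], fails: Callable[[List[T]], bool]) -> T:
    """Halve the input repeatedly, keeping the half that still fails.

    Assumes exactly one item triggers the bug and that `fails` is deterministic;
    with interacting culprits, neither half may fail on its own."""
    candidates = list(items)
    if not fails(candidates):
        raise ValueError("the full input does not reproduce the bug")
    while len(candidates) > 1:
        half = len(candidates) // 2
        first, second = candidates[:half], candidates[half:]
        # Single-culprit assumption: if the first half passes, the culprit is in the second.
        candidates = first if fails(first) else second
    return candidates[0]
```

For 100 items, this takes at most 8 calls to `fails`. To search commit history instead of code, `git bisect` applies the same halving to a range of commits.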

Minimal Reproduction: Strip to <50 lines that reproduce issue. Removes noise.
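A simple reducer can do the stripping automatically. This is a greedy sketch that removes one line at a time. It is not a full delta-debugging implementation: it assumes a deterministic `fails` callback (hypothetical) and can take quadratically many runs on large inputs.

```python
from typing import Callable, List, Sequence


def minimize(lines: Sequence[str], fails: Callable[[List[str]], bool]) -> List[str]:
    """Greedily delete one line at a time, keeping each deletion that still fails."""
    current = list(lines)
    if not fails(current):
        raise ValueError("the input does not reproduce the bug")
    i = 0
    while i < len(current):
        trial = current[:i] + current[i + 1:]
        if fails(trial):
            current = trial  # line i was noise; retry the same index on the shorter list
        else:
            i += 1           # line i is needed to reproduce; keep it and move on
    return current
```

For a five-line script where only `user = None` and `print(user.name)` matter, the reducer returns just those two lines.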

Differential Testing: Compare working vs broken (commits, versions, configs).
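For configurations, the comparison can be a plain dictionary diff. This minimal sketch assumes both configs are already loaded into flat dictionaries. The `<missing>` marker is an arbitrary choice.

```python
from typing import Any, Dict, Tuple

MISSING = "<missing>"  # placeholder shown when a key exists on only one side


def diff_configs(working: Dict[str, Any], broken: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (working_value, broken_value)} for every key that differs."""
    diffs = {}
    for key in sorted(set(working) | set(broken)):
        a = working.get(key, MISSING)
        b = broken.get(key, MISSING)
        if a != b:
            diffs[key] = (a, b)
    return diffs
```

Only differing keys are reported, so a config with hundreds of identical entries shrinks to the handful worth forming hypotheses about.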

Strategic Logging: Add prints at key decision points to trace execution flow.
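In Python, the standard `logging` module is a better fit than bare prints because the trace can be switched off by level. This sketch uses a hypothetical `apply_discount` function and logger name. The messages record the inputs and which branch ran.

```python
import logging
from typing import Optional

logger = logging.getLogger("checkout")  # hypothetical module name


def apply_discount(total: float, code: Optional[str]) -> float:
    # Record the inputs at the decision point so the trace shows what the branch saw.
    logger.debug("apply_discount total=%r code=%r", total, code)
    if code == "SAVE10":
        logger.debug("branch taken: SAVE10 discount")
        return round(total * 0.9, 2)
    logger.debug("branch taken: no discount")
    return total


if __name__ == "__main__":
    # DEBUG level is needed for the trace to appear; remove once the bug is found.
    logging.basicConfig(level=logging.DEBUG, format="%(name)s %(message)s")
    apply_discount(100.0, "save10")  # lower-case code silently skips the discount
```

Running the script shows `code='save10'` followed by `branch taken: no discount`, which points straight at the case-sensitive comparison.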

Rubber Duck: Explain code line-by-line aloud. Often reveals logic errors.

Common Bug Patterns (Language-Agnostic)

Logic Errors:

Off-by-one: `i < n` vs `i <= n`
Wrong operator: `&&` vs `||`, `==` vs `=`
Negation errors: `!condition` with the logic flipped
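The first two patterns look like this in Python, where `&&`/`||` are spelled `and`/`or`. Python rejects `=` inside an `if` condition with a SyntaxError, so the `==` vs `=` mix-up is caught before the code runs. This sketch uses hypothetical helper names and shows each buggy version next to its fix.

```python
from typing import List


def sum_first_n_buggy(xs: List[int], n: int) -> int:
    total, i = 0, 0
    while i <= n:  # off-by-one: visits n + 1 elements
        total += xs[i]
        i += 1
    return total


def sum_first_n(xs: List[int], n: int) -> int:
    total, i = 0, 0
    while i < n:  # visits indices 0 .. n-1
        total += xs[i]
        i += 1
    return total


def in_range_buggy(x: int, lo: int, hi: int) -> bool:
    return lo <= x or x <= hi  # wrong operator: true for every x whenever lo <= hi


def in_range(x: int, lo: int, hi: int) -> bool:
    return lo <= x and x <= hi
```

For `xs = [1, 2, 3, 4]`, the buggy sum of the first 3 elements is 10 instead of 6, and with `n = 4` it raises `IndexError`. `in_range_buggy(5, 0, 3)` wrongly returns `True`.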

State Issues:

Race conditions: concurrent access without synchronization
Uninitialized: using variable before setting value
Stale state: using outdated cached data

Type/Data:

Null/nil dereference
Type coercion errors
Integer overflow
Floating-point precision
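Below is a Python example for each Type/Data item, using only the standard library. The helper names are hypothetical. Python integers never overflow, so `wrap_int32` emulates the wraparound a 32-bit Java `int` performs.

```python
import math
from decimal import Decimal
from typing import List, Optional


def email_domain(email: Optional[str]) -> str:
    # Null/nil: check for None before calling methods on the value.
    if email is None or "@" not in email:
        return ""
    return email.split("@", 1)[1]


def parse_quantity(raw: str) -> int:
    # Type coercion: form input arrives as str; "2" + 1 would raise TypeError.
    return int(raw.strip())


def wrap_int32(value: int) -> int:
    # Integer overflow: emulate two's-complement 32-bit wraparound.
    return (value + 2**31) % 2**32 - 2**31


def approx_equal(a: float, b: float) -> bool:
    # Floating point: compare computed floats with a tolerance, not ==.
    return math.isclose(a, b, rel_tol=1e-9)


def exact_total(amounts: List[str]) -> Decimal:
    # For money, parse decimal strings so 0.1 is represented exactly.
    return sum((Decimal(a) for a in amounts), Decimal("0"))
```

`0.1 + 0.2 == 0.3` is `False`, while `approx_equal(0.1 + 0.2, 0.3)` and `exact_total(["0.1", "0.2"]) == Decimal("0.3")` both hold. `wrap_int32(2**31)` gives `-2147483648`.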

Concurrency:

Deadlock: mutual waiting for locks
Data race: unsynchronized shared access
Thread safety: non-thread-safe code on multiple threads
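A lock-protected counter shows the usual fix for a data race. In CPython, an unsynchronized counter may lose updates only occasionally, which is exactly why such bugs look intermittent. The sketch therefore demonstrates the fix rather than trying to trigger the race. Class and function names are hypothetical. For deadlocks, the corresponding discipline is to acquire multiple locks in one fixed global order.

```python
import threading


class SafeCounter:
    """Shared counter whose read-modify-write is guarded by a lock."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # `self._value += 1` is a read-modify-write; without the lock, two
        # threads can read the same old value and one update is lost.
        with self._lock:
            self._value += 1

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


def run_workers(counter: SafeCounter, threads: int = 8, per_thread: int = 10_000) -> int:
    """Increment the counter from several threads and return the final value."""
    def work() -> None:
        for _ in range(per_thread):
            counter.increment()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.value
```

With the lock in place, `run_workers(SafeCounter())` always returns 80,000.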

Integration:

API contract mismatch
Version incompatibility
Missing dependencies
Incorrect configuration
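An API contract mismatch can be caught at the boundary by checking each payload against the fields and types the caller expects. In this minimal sketch, the `USER_CONTRACT` fields are hypothetical. A real project might use a schema-validation library instead.

```python
from typing import Any, Dict, List

# Hypothetical contract for a user payload: field name -> expected Python type.
USER_CONTRACT: Dict[str, type] = {"id": int, "email": str, "active": bool}


def contract_violations(payload: Dict[str, Any], contract: Dict[str, type]) -> List[str]:
    """List every missing field and every field whose value has the wrong type."""
    problems = []
    for name, expected in contract.items():
        if name not in payload:
            problems.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected):
            actual = type(payload[name]).__name__
            problems.append(f"{name}: expected {expected.__name__}, got {actual}")
    return problems
```

A payload of `{"id": "7", "email": "a@example.com"}` yields `["id: expected int, got str", "missing field: active"]`. Note that `isinstance(True, int)` is `True` in Python, so this check does not distinguish booleans from integers.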

Output Format

Bug Investigation: [Name]

Context

[One-line: Language, framework, architecture]

Problem

Error: [Message]
Reproduces: [Always/Sometimes]
Steps: [1, 2, 3]

Hypotheses

H1: [Most likely] - Test by [method]
H2: [Alternative] - Test by [method]

Investigation

Tested H1: [Result - validated/invalidated]
[If needed] Tested H2: [Result]

Root Cause

[One sentence explanation]
Evidence: [What confirmed it]

Solution

Fix: [Specific change]
Why it works: [Explanation]
Validated: [Tested successfully]

Prevention

Test added: [Description]
Warning signs: [What to watch for]

Quick Decision Tree

Symptom → Likely Category:

Intermittent failure → Concurrency/state
Always fails same way → Logic error
Null/nil crash → Type/data
Specific environment only → Configuration
Performance degradation → Resource/algorithm
After dependency update → Integration
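The decision tree is effectively a lookup table, so it can be encoded as one inside triage tooling. In this sketch, the normalized symptom keys and the fallback message are assumptions. The result is only a starting category for Phase 3, not a diagnosis.

```python
# Keys mirror the decision tree above; matching is exact after normalizing case and whitespace.
SYMPTOM_TO_CATEGORY = {
    "intermittent failure": "Concurrency/state",
    "always fails same way": "Logic error",
    "null/nil crash": "Type/data",
    "specific environment only": "Configuration",
    "performance degradation": "Resource/algorithm",
    "after dependency update": "Integration",
}


def likely_category(symptom: str) -> str:
    """Return the first category to hypothesize about for a known symptom."""
    return SYMPTOM_TO_CATEGORY.get(symptom.strip().lower(), "Unclassified: gather more data")
```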

Critical Reminders

Start with context discovery
Form hypotheses before coding
Change ONE variable at a time
Reproduce before fixing
Validate fix rigorously
Add regression test
Document root cause
