
Systematic Debugging


Evidence-based investigation -> root cause -> verified fix.

Steps


  1. Load the outfitter:maintain-tasks skill for stage tracking
  2. Collect evidence (reproduce, gather symptoms)
  3. Isolate variables (narrow scope)
  4. Formulate and test hypotheses
  5. Implement fix with failing test first
  6. Verify fix resolves the issue
For formal incident investigation requiring RCA documentation, use the find-root-causes skill instead (it loads this skill and adds formal RCA methodology).
<when_to_use>
  • Bugs, errors, exceptions, crashes
  • Unexpected behavior or wrong results
  • Failing tests (unit, integration, e2e)
  • Intermittent or timing-dependent failures
  • Performance issues (slow, memory leaks, high CPU)
  • Integration failures (API, database, external services)
NOT for: obvious fixes, feature requests, architecture planning
</when_to_use>
<iron_law>
NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST
Never propose solutions or "try this" without understanding root cause through systematic investigation.
</iron_law>
<stages>
See Steps section for skill dependencies. Stages advance forward only.
Stage                | Trigger           | activeForm
---------------------|-------------------|--------------------------
Collect Evidence     | Session start     | "Collecting evidence"
Isolate Variables    | Evidence gathered | "Isolating variables"
Formulate Hypotheses | Problem isolated  | "Formulating hypotheses"
Test Hypothesis      | Hypothesis formed | "Testing hypothesis"
Verify Fix           | Fix identified    | "Verifying fix"
Situational (insert when triggered):
  • Iterate -> Hypothesis disproven, loops back with new hypothesis
Workflow:
  • Start: "Collect Evidence" as in_progress
  • Transition: Mark current completed, add next in_progress
  • Failed hypothesis: Add "Iterate" task
  • Quick fixes: If root cause obvious from error, skip to "Verify Fix" (still create failing test)
  • Need more evidence: Add new evidence task (don't regress stages)
  • Circuit breaker: After 3 failed hypotheses -> escalate
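The forward-only workflow above can be sketched as a small state machine. This is an illustrative sketch only; `StageTracker` and its method names are hypothetical, not part of the skill's API:

```python
# Illustrative sketch of the forward-only stage workflow.
# StageTracker is a hypothetical helper, not a real API.
STAGES = [
    "Collect Evidence",
    "Isolate Variables",
    "Formulate Hypotheses",
    "Test Hypothesis",
    "Verify Fix",
]

class StageTracker:
    def __init__(self):
        self.index = 0
        self.failed_hypotheses = 0
        # Start: "Collect Evidence" as in_progress
        self.tasks = [(STAGES[0], "in_progress")]

    def advance(self):
        """Mark the current stage completed and open the next one."""
        self.tasks[-1] = (self.tasks[-1][0], "completed")
        self.index += 1
        self.tasks.append((STAGES[self.index], "in_progress"))

    def hypothesis_failed(self):
        """Add an Iterate task; trip the circuit breaker after 3 failures."""
        self.failed_hypotheses += 1
        if self.failed_hypotheses >= 3:
            raise RuntimeError("Circuit breaker: escalate after 3 failed hypotheses")
        self.tasks.append(("Iterate", "in_progress"))
```

Note that stages never regress: new evidence or a disproven hypothesis adds a task rather than moving `index` backward.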
</stages>
<quick_start>
  1. Create "Collect Evidence" todo as in_progress
  2. Reproduce - exact steps to trigger consistently
  3. Investigate - gather evidence about what's happening
  4. Analyze - compare working vs broken, find differences
  5. Test hypothesis - single specific hypothesis, minimal test
  6. Implement - failing test first, then fix
  7. Update todos on stage transitions
</quick_start>
<stage_1_root_cause>
Goal: Understand what's actually happening.
Transition: Mark complete when you have reproduction steps and initial evidence.
Read error messages completely
  • Stack traces top to bottom
  • Note file paths, line numbers, variable names
  • Look for "caused by" chains
Reproduce consistently
  • Document exact trigger steps
  • Note inputs that cause vs don't cause
  • Check if intermittent (timing, race conditions)
  • Verify in clean environment
Check recent changes
  • git diff - what changed?
  • git log --since="yesterday" - recent commits
  • Dependency updates
  • Config/environment changes
Gather evidence
  • Add logging at key points
  • Print variable values at transformations
  • Log function entry/exit with parameters
  • Capture timestamps for timing issues
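One way to get the entry/exit logging and timestamps described above is a tracing decorator. A minimal sketch; `traced` and `transform` are hypothetical names, not part of the skill:

```python
import functools
import logging
import time

# Timestamps in the log format help with timing-dependent failures.
logging.basicConfig(level=logging.DEBUG, format="%(asctime)s %(message)s")

def traced(fn):
    """Log function entry/exit with parameters and elapsed time."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        logging.debug("enter %s args=%r kwargs=%r", fn.__name__, args, kwargs)
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        elapsed = time.perf_counter() - start
        logging.debug("exit %s -> %r (%.3fs)", fn.__name__, result, elapsed)
        return result
    return wrapper

@traced
def transform(value):
    # Example transformation point where a variable value is worth printing.
    return value.strip().lower()
```

Decorating only the suspect functions keeps the noise down while still capturing values at each transformation.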
Trace data flow backward
  • Where does bad value come from?
  • Track through transformations
  • Find first place it becomes wrong
Red flags (return to evidence gathering):
  • "I think maybe X is the problem"
  • "Let's try changing Y"
  • "It might be related to Z"
  • Starting to write code before understanding
</stage_1_root_cause>
<stage_2_pattern_analysis>
Goal: Learn from working code to understand broken code.
Transition: Mark complete when key differences identified.
Find working examples
  • Search for similar functionality that works
  • rg "pattern" for similar patterns
  • Look for passing vs failing tests
  • Check git history for when it worked
Read references completely
  • Every line, not skimming
  • Full context
  • All dependencies/imports
  • Configuration and setup
Identify every difference
  • Line by line working vs broken
  • Different imports?
  • Different function signatures?
  • Different error handling?
  • Different data flow?
  • Different configuration?
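The line-by-line comparison above can be mechanized with Python's difflib. A sketch, with stand-in snippets for the working and broken versions:

```python
import difflib

# Stand-ins for real working/broken versions of the same function.
working = """def total(items):
    return sum(i.price for i in items)
""".splitlines(keepends=True)

broken = """def total(items):
    return sum(i.price for i in items if i.price)
""".splitlines(keepends=True)

# Unified diff surfaces every changed line between the two versions.
diff = list(difflib.unified_diff(working, broken, "working.py", "broken.py"))
print("".join(diff))
```

Each `-`/`+` pair in the output is a candidate difference to explain before forming a hypothesis.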
Understand dependencies
  • Libraries/packages involved
  • Versions in use
  • External services
  • Shared state
  • Assumptions made
Questions to answer:
  • Why does working version work?
  • What's fundamentally different?
  • Edge cases working version handles?
  • Invariants working version maintains?
</stage_2_pattern_analysis>
<stage_3_hypothesis_testing>
Goal: Test one specific idea with minimal change.
Transition: Mark complete when specific, evidence-based hypothesis formed.
Form single hypothesis
  • Template: "X is root cause because Y"
  • Must explain all symptoms
  • Must be testable with small change
  • Must be based on evidence from stages 1-2
Design minimal test
  • Smallest change to test hypothesis
  • Change ONE variable
  • Preserve everything else
  • Make reversible
Execute and verify
  • Apply change
  • Run reproduction steps
  • Observe carefully
  • Document results
Outcomes:
  • Fixed: Confirm across all cases, proceed to Verify Fix
  • Not fixed: Mark complete, add "Iterate", form NEW hypothesis
  • Partially fixed: Add "Iterate" for remaining issues
  • Never: Random variations hoping one works
Bad hypotheses (too vague):
  • "Maybe it's a race condition"
  • "Could be caching or permissions"
  • "Probably something with the database"
Good hypotheses (specific, testable):
  • "Fails because expects number but receives string when API returns empty"
  • "Race condition: fetchData() called before initializeClient() completes"
  • "Memory leak: event listeners in useEffect never removed in cleanup"
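A minimal test of the first good hypothesis above might look like this. The `parse_total` function and the empty-response shape are hypothetical illustrations, not code from any real system:

```python
# Hypothesis: totals fail because the API returns "" (a string) for an
# empty response, while downstream code expects a number.
# parse_total is a hypothetical stand-in for the code under suspicion.
def parse_total(api_value):
    # Candidate fix under test: coerce the empty payload to a number.
    if api_value in ("", None):
        return 0.0
    return float(api_value)

# One variable changed, both cases reproduced minimally:
assert parse_total("") == 0.0       # the empty-response case that failed
assert parse_total("19.99") == 19.99
```

If the empty-response assertion captures the real failure and the fix makes both pass, the hypothesis is confirmed; otherwise mark it disproven and iterate.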
</stage_3_hypothesis_testing>
<stage_4_implementation>
Goal: Fix root cause permanently with verification.
Transition: Root cause confirmed, ready for permanent fix.
Create failing test
  • Write test reproducing bug
  • Verify fails before fix
  • Should pass after fix
  • Captures exact broken scenario
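A failing-test-first regression test can look like this with pytest-style assertions. `slugify` and its None-handling bug are hypothetical examples, not the skill's code:

```python
# Regression test written BEFORE the fix: it reproduces the exact broken
# scenario, fails against the unfixed code, and passes once the fix lands.
def slugify(title):
    # The fix: earlier code crashed on None instead of returning "".
    if title is None:
        return ""
    return title.strip().lower().replace(" ", "-")

def test_slugify_handles_none_title():
    # Captures the exact broken scenario from the bug report.
    assert slugify(None) == ""

def test_slugify_still_handles_normal_titles():
    # Guards against the fix regressing existing behavior.
    assert slugify("Hello World") == "hello-world"

test_slugify_handles_none_title()
test_slugify_still_handles_normal_titles()
```

Running the test before applying the fix and watching it fail is what proves it actually captures the bug.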
Implement single fix
  • Address identified root cause
  • No additional "improvements"
  • No refactoring "while you're there"
  • Just fix the problem
Verify fix
  • Failing test now passes
  • Existing tests still pass
  • Manual reproduction no longer triggers bug
  • No new errors/warnings
Circuit breaker: If 3+ fixes tried without success, STOP.
  • Problem isn't hypothesis - problem is architecture
  • May be using wrong pattern entirely
  • Escalate or redesign
After fixing:
  • Mark "Verify Fix" completed
  • Add defensive validation
  • Document root cause
  • Consider similar bugs elsewhere
</stage_4_implementation>
<red_flags>
STOP and return to Stage 1 if you catch yourself:
  • "Quick fix for now, investigate later"
  • "Just try changing X and see"
  • "I don't fully understand but this might work"
  • "One more fix attempt" (already tried 2+)
  • "Let me try a few different things"
  • Proposing solutions before gathering evidence
  • Skipping failing test case
  • Fixing symptoms instead of root cause
ALL mean: STOP. Add new "Collect Evidence" task.
</red_flags>
<escalation>
When to escalate:
  1. After 3 failed fix attempts - architecture may be wrong
  2. No clear reproduction - need more context/access
  3. External system issues - need vendor/team involvement
  4. Security implications - need security expertise
  5. Data corruption risks - need backup/recovery planning
</escalation> <completion>
Before claiming "fixed":
  • Root cause identified with evidence
  • Failing test case created
  • Fix addresses root cause only
  • Test now passes
  • All existing tests pass
  • Manual reproduction no longer triggers bug
  • No new warnings/errors
  • Root cause documented
  • Prevention measures considered
  • "Verify Fix" marked completed
Understanding the bug is more valuable than fixing it quickly.
</completion> <rules>
ALWAYS:
  • Create "Collect Evidence" todo at session start
  • Follow four-stage framework
  • Update todos on stage transitions
  • Create failing test before fix
  • Test single hypothesis at a time
  • Document root cause after fix
  • Mark "Verify Fix" complete only after tests pass
NEVER:
  • Propose fixes without understanding root cause
  • Skip evidence gathering
  • Test multiple hypotheses simultaneously
  • Skip failing test case
  • Fix symptoms instead of root cause
  • Continue after 3 failed fixes without escalation
  • Regress stages - add new tasks if needed
</rules> <references>
  • playbooks.md - bug-type specific investigations
  • evidence-patterns.md - diagnostic techniques
  • reproduction.md - reproduction techniques
  • integration.md - workflow integration, anti-patterns
</references>