debugging-with-tools
<skill_overview>
Random fixes waste time and create new bugs. Always use tools to understand root cause BEFORE attempting fixes. Symptom fixes are failure.
</skill_overview>
<rigidity_level>
MEDIUM FREEDOM - Must complete investigation phases (tools → hypothesis → test) before fixing.
Can adapt tool choice to language/context. Never skip investigation or guess at fixes.
</rigidity_level>
<quick_reference>
| Phase | Tools to Use | Output |
|---|---|---|
| 1. Investigate | Error messages, internet-researcher agent, debugger, codebase-investigator | Root cause understanding |
| 2. Hypothesize | Form theory based on evidence (not guesses) | Testable hypothesis |
| 3. Test | Validate hypothesis with minimal change | Confirms or rejects theory |
| 4. Fix | Implement proper fix for root cause | Problem solved permanently |
FORBIDDEN: Skip investigation → guess at fix → hope it works
REQUIRED: Tools → evidence → hypothesis → test → fix
Key agents:
- internet-researcher - Search error messages, known bugs, solutions
- codebase-investigator - Understand code structure, find related code
- test-runner - Run tests without output pollution
</quick_reference>
<when_to_use>
Use for ANY technical issue:
- Test failures
- Bugs in production or development
- Unexpected behavior
- Build failures
- Integration issues
- Performance problems
ESPECIALLY when:
- "Just one quick fix" seems obvious
- Under time pressure (emergencies make guessing tempting)
- Error message is unclear
- Previous fix didn't work
</when_to_use>
<the_process>
Phase 1: Tool-Assisted Investigation
BEFORE attempting ANY fix, gather evidence with tools:
1. Read Complete Error Messages
- Entire error message (not just first line)
- Complete stack trace (all frames)
- Line numbers, file paths, error codes
- Stack traces show exact execution path
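As a rough illustration of reading the whole error rather than its first line, the sketch below captures every frame of a Python traceback. The functions `parse_port`, `load_config`, and `summarize_frames` are hypothetical names invented for this example; only the `traceback` module calls come from the standard library.

```python
import traceback


def parse_port(raw: str) -> int:
    # Hypothetical helper: raises ValueError on non-numeric input.
    return int(raw)


def load_config(settings: dict) -> int:
    # Hypothetical caller, so the trace has more than one frame.
    return parse_port(settings["port"])


def summarize_frames(exc: BaseException) -> list:
    """Return one 'file:line in function' entry per frame, outermost call first."""
    frames = traceback.extract_tb(exc.__traceback__)
    return [f"{frame.filename}:{frame.lineno} in {frame.name}" for frame in frames]


try:
    load_config({"port": "80a"})
except ValueError as exc:
    # Full report: every frame plus the final error line, not just the message.
    full_report = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    frame_summary = summarize_frames(exc)
    print(full_report)
    print("\n".join(frame_summary))
```

The last entry of the frame summary is where the error was raised; the entries before it show how execution got there, which is often where the real cause sits.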
2. Search Internet FIRST (Use internet-researcher Agent)
Dispatch internet-researcher with:
"Search for error: [exact error message]
- Check Stack Overflow solutions
- Look for GitHub issues in [library] version [X]
- Find official documentation explaining this error
- Check if this is a known bug"
What agent should find:
- Exact matches to your error
- Similar symptoms and solutions
- Known bugs in your dependency versions
- Workarounds that worked for others
3. Use Debugger to Inspect State
Claude cannot run debuggers directly. Instead:
Option A - Recommend debugger to user:
"Let's use lldb/gdb/DevTools to inspect state at error location.
Please run: [specific commands]
When breakpoint hits: [what to inspect]
Share output with me."
Option B - Add instrumentation (Claude can do this directly):
```rust
// Add logging
println!("DEBUG: var = {:?}, state = {:?}", var, state);
// Add assertions
assert!(condition, "Expected X but got {:?}", actual);
```
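Since tool choice can be adapted to the language, the same instrumentation idea might look like this in Python. This is only a sketch: `apply_discount` and the logger name `debug_session` are made up for illustration, and the standard `logging` module stands in for whatever logging the project already uses.

```python
import logging

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(name)s: %(message)s")
log = logging.getLogger("debug_session")


def apply_discount(price: float, percent: float) -> float:
    # Log inputs so the state at the suspected error location is visible.
    log.debug("apply_discount called with price=%r percent=%r", price, percent)
    # Assert the assumption being investigated; fail loudly with the actual value.
    assert 0 <= percent <= 100, f"Expected percent in [0, 100] but got {percent!r}"
    result = price * (1 - percent / 100)
    log.debug("apply_discount returning %r", result)
    return result


print(apply_discount(200.0, 25))
```

Temporary logging and assertions like these are evidence-gathering aids; they should be removed or downgraded once the root cause is understood.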
4. Investigate Codebase (Use codebase-investigator Agent)
Dispatch codebase-investigator with:
"Error occurs in function X at line Y.
Find:
- How is X called? What are the callers?
- What does variable Z contain at this point?
- Are there similar functions that work correctly?
- What changed recently in this area?"
Phase 2: Form Hypothesis
Based on evidence (not guesses):
- State what you know (from investigation)
- Propose theory explaining the evidence
- Make prediction that tests the theory
Example:
Known: Error "null pointer" at auth.rs:45 when email is empty
Theory: Empty email bypasses validation, passes null to login()
Prediction: Adding validation before login() will prevent error
Test: Add validation, verify error doesn't occur with empty email
NEVER:
- Guess without evidence
- Propose fix without hypothesis
- Skip to "try this and see"
Phase 3: Test Hypothesis
Minimal change to validate theory:
- Make smallest change that tests hypothesis
- Run test/reproduction case
- Observe result
If confirmed: Proceed to Phase 4
If rejected: Return to Phase 1 with new information
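A minimal sketch of what "smallest change that tests the hypothesis" could look like, loosely following the empty-email example from Phase 2. Everything here (`login`, `login_with_guard`, `reproduce`) is a hypothetical stand-in, not code from a real project.

```python
def login(email):
    # Hypothetical stand-in for the failing code path: it assumes email is a string.
    return email.strip().lower()


def reproduce(login_fn, email):
    """Run the reproduction case and report what happened."""
    try:
        login_fn(email)
    except AttributeError:
        return "crash"      # the original symptom
    except ValueError:
        return "rejected"   # input refused before reaching the crash site
    return "ok"


def login_with_guard(email):
    # Smallest change that tests the theory: refuse a missing email up front.
    if not email:
        raise ValueError("email is required")
    return login(email)


before = reproduce(login, None)
after = reproduce(login_with_guard, None)
hypothesis_confirmed = before == "crash" and after == "rejected"
print(f"before={before} after={after} confirmed={hypothesis_confirmed}")
```

If the crash still occurred with the guard in place, the theory would be rejected and the new observation would feed back into Phase 1.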
Phase 4: Implement Fix
After understanding root cause:
- Write test reproducing bug (RED phase - use test-driven-development skill)
- Implement proper fix addressing root cause
- Verify test passes (GREEN phase)
- Run full test suite (regression check)
- Commit fix
The fix should:
- Address root cause (not symptom)
- Be minimal and focused
- Include test preventing regression
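The RED/GREEN steps above could be sketched as follows, assuming a pytest-style test layout. `ValidationError` and this version of `login` are hypothetical; they illustrate a fix placed at the input boundary rather than at the crash site.

```python
from typing import Optional


class ValidationError(Exception):
    """Raised when login input is rejected before authentication."""


def login(email: Optional[str]) -> str:
    # Root-cause fix (hypothetical): validate at the boundary so an empty
    # email never reaches the code that used to crash.
    if email is None or not email.strip():
        raise ValidationError("email must not be empty")
    return email.strip().lower()


def test_empty_email_is_rejected():
    # Written first and observed failing (RED) before the validation existed.
    for bad in (None, "", "   "):
        try:
            login(bad)
        except ValidationError:
            continue
        raise AssertionError(f"login accepted {bad!r}")


def test_valid_email_still_works():
    # Guards against the fix breaking the normal path (regression check).
    assert login("  User@Example.com ") == "user@example.com"
```

Running the first test before adding the validation should fail; seeing it fail is what shows the test actually reproduces the bug.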
</the_process>
<examples>
<example>
<scenario>Developer encounters test failure, immediately tries "obvious" fix without investigation</scenario>
<code>
Test error:
```
FAIL: test_login_expired_token
AssertionError: Expected Err(TokenExpired), got Ok(User)
```
Developer thinks: "Obviously the token expiration check is wrong"
Makes change without investigation:
```rust
// "Fix" - just check if token is expired
if token.expires_at < now() {
    return Err(AuthError::TokenExpired);
}
```
Commits without testing other cases.
</code>
<why_it_fails>
No investigation:
- Didn't read error completely
- Didn't check what `expires_at` contains
- Didn't debug to see token state
- Didn't search for similar issues
What actually happened: Token `expires_at` was being parsed incorrectly, always showing future date. The "fix" adds dead code that never runs.
Result: Bug not fixed, new dead code added, time wasted.
</why_it_fails>
<correction>
**Phase 1 - Investigate with tools:**
```bash
# 1. Read complete error
FAIL: test_login_expired_token at line 45
Expected: Err(TokenExpired)
Got: Ok(User { id: 123 })
Token: { expires_at: "2099-01-01", ... }
```
**Dispatch internet-researcher:**
"Search for: token expiration always showing future date
- Check date parsing bugs
- Look for timezone issues
- Find JWT expiration handling"
**Add instrumentation:**
```rust
println!("DEBUG: expires_at = {:?}, now = {:?}, expired = {:?}",
         token.expires_at, now(), token.expires_at < now());
```
Run test again:
```
DEBUG: expires_at = 2099-01-01T00:00:00Z, now = 2024-01-15T10:30:00Z, expired = false
```
**Phase 2 - Hypothesis:**
"Token `expires_at` is being set to 2099, not actual expiration. Problem is in token creation, not validation."
**Phase 3 - Test:**
Check token creation code:
```rust
// Found the bug!
fn create_token() -> Token {
    Token {
        expires_at: "2099-01-01".parse()?, // HARDCODED!
        ...
    }
}
```
**Phase 4 - Fix root cause:**
```rust
fn create_token(duration: Duration) -> Token {
    Token {
        expires_at: now() + duration, // Correct
        ...
    }
}
```
**Result:** Root cause fixed, test passes, no dead code.
</correction>
</example>
<example>
<scenario>Developer skips internet search, reinvents solution to known problem</scenario>
<code>
Error:
```
error: linking with `cc` failed: exit status: 1
ld: symbol(s) not found for architecture arm64
```
Developer thinks: "Must be a linking issue, I'll add flags"
Spends 2 hours trying different linker flags:
```toml
[target.aarch64-apple-darwin]
rustflags = ["-C", "link-arg=-undefined dynamic_lookup"]
```
Doesn't work, tries more flags...
</code>
<why_it_fails>
**Skipped internet search:**
- This is a common error with known solutions
- Stack Overflow has exact fix
- Official docs explain the issue
- Wasted 2 hours reinventing solution
**Why it happens:** Impatience, thinking "I can figure this out faster"
</why_it_fails>
<correction>
**Dispatch internet-researcher FIRST:**
"Search for: 'symbol not found for architecture arm64' Rust linking
- Check Stack Overflow solutions
- Look for Xcode/macOS specific fixes
- Find Cargo configuration for Apple Silicon"
**Agent returns (30 seconds):**
Found on Stack Overflow (2.4k upvotes):
This occurs when Xcode Command Line Tools aren't installed or outdated.
Solution:
xcode-select --install
Or update existing:
softwareupdate --all --install --force
**Apply solution:**
```bash
xcode-select --install
# Wait for installation
cargo build
# Success!
```
**Result:** Fixed in 5 minutes, not 2 hours.
</correction>
</example>
<example>
<scenario>Developer fixes symptom without understanding root cause, bug returns different way</scenario>
<code>
Bug: Users can delete other users' posts
Developer notices: DELETE /posts/:id doesn't check ownership
"Fix":
```python
@app.delete("/posts/{post_id}")
def delete_post(post_id: int, user: User):
    post = get_post(post_id)
    if post.author_id != user.id:
        raise Forbidden()
    delete(post)
```
Bug "fixed"... but bug report comes back: "Users can edit other users' posts"
</code>
<why_it_fails>
Symptom fix:
- Fixed DELETE endpoint
- Didn't investigate root cause
- Didn't check other endpoints
Root cause: No authorization middleware checking ownership across ALL endpoints.
Result: Whack-a-mole fixing symptoms, not the underlying issue.
</why_it_fails>
<correction>
**Phase 1 - Investigate with codebase-investigator:**
"User can delete other users' posts. Investigate:
- What authorization exists across all post endpoints?
- Is there a pattern for ownership checks?
- Which other endpoints access posts?
- Is there middleware handling this?"
Agent reports:
Found:
- DELETE /posts/:id - NO ownership check
- PUT /posts/:id - NO ownership check
- GET /posts/:id - NO ownership check (allows viewing private posts!)
- No authorization middleware found
- Similar pattern in comments, likes endpoints
**Phase 2 - Hypothesis:**
"Missing authorization layer. Need middleware checking resource ownership across ALL endpoints."
**Phase 4 - Fix root cause:**
```python
# Add authorization middleware
class OwnershipMiddleware:
    def check_ownership(self, resource, user):
        if resource.author_id != user.id:
            raise Forbidden()

# Apply to all endpoints
@app.delete("/posts/{post_id}")
@require_ownership(Post)
def delete_post(...):
    ...

@app.put("/posts/{post_id}")
@require_ownership(Post)
def update_post(...):
    ...
```
**Result:** Root cause fixed, ALL endpoints secured, not just one symptom.
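The fix above applies a `require_ownership` decorator without showing its body. One way it could be written, as a framework-free sketch: the in-memory `POSTS` store, the `User` and `Post` dataclasses, and the choice to pass a loader function (rather than the `Post` model used above) are all assumptions made so the example runs on its own.

```python
from dataclasses import dataclass
from functools import wraps


class Forbidden(Exception):
    """Raised when the current user does not own the resource."""


@dataclass
class User:
    id: int


@dataclass
class Post:
    id: int
    author_id: int


# Hypothetical in-memory store standing in for the real database.
POSTS = {1: Post(id=1, author_id=10)}


def require_ownership(loader):
    """Decorator: load the resource and reject callers who do not own it."""
    def decorator(handler):
        @wraps(handler)
        def wrapper(resource_id, user, *args, **kwargs):
            resource = loader(resource_id)
            if resource.author_id != user.id:
                raise Forbidden()
            # The handler receives the already-loaded, already-authorized resource.
            return handler(resource, user, *args, **kwargs)
        return wrapper
    return decorator


@require_ownership(POSTS.__getitem__)
def delete_post(post, user):
    del POSTS[post.id]
    return "deleted"
```

Because the check lives in one decorator, every endpoint that uses it gets the same ownership rule, which is the point of fixing the root cause instead of patching endpoints one at a time.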
</correction>
</example>
</examples>
<critical_rules>
Rules That Have No Exceptions
- Tools before fixes → Never guess without investigation
  - Use internet-researcher for errors
  - Use debugger or instrumentation for state
  - Use codebase-investigator for context
- Evidence-based hypotheses → Not guesses or hunches
  - State what tools revealed
  - Propose theory explaining evidence
  - Make testable prediction
- Test hypothesis before fixing → Minimal change to validate
  - Smallest change that tests theory
  - Observe result
  - If wrong, return to investigation
- Fix root cause, not symptom → One fix, many symptoms prevented
  - Understand why problem occurred
  - Fix the underlying issue
  - Don't play whack-a-mole
Common Excuses
All of these mean: Stop, use tools to investigate:
- "The fix is obvious"
- "I know what this is"
- "Just a quick try"
- "No time for debugging"
- "Error message is clear enough"
- "Internet search will take too long"
</critical_rules>
<verification_checklist>
Before proposing any fix:
- Read complete error message (not just first line)
- Dispatched internet-researcher for unclear errors
- Used debugger or added instrumentation to inspect state
- Dispatched codebase-investigator to understand context
- Formed hypothesis based on evidence (not guesses)
- Tested hypothesis with minimal change
- Verified hypothesis confirmed before fixing
Before committing fix:
- Written test reproducing bug (RED phase)
- Verified test fails before fix
- Implemented fix addressing root cause
- Verified test passes after fix (GREEN phase)
- Ran full test suite (regression check)
</verification_checklist>
<integration>
This skill calls:
- internet-researcher (search errors, known bugs, solutions)
- codebase-investigator (understand code structure, find related code)
- test-driven-development (write test for bug, implement fix)
- test-runner (run tests without output pollution)
This skill is called by:
- fixing-bugs (complete bug fix workflow)
- root-cause-tracing (deep debugging for complex issues)
- Any skill when encountering unexpected behavior
Agents used:
- hyperpowers:internet-researcher (search for error solutions)
- hyperpowers:codebase-investigator (understand codebase context)
- hyperpowers:test-runner (run tests, return summary only)
Detailed guides:
- Debugger reference - LLDB, GDB, DevTools commands
- Debugging session example - Complete walkthrough
When stuck:
- Error unclear → Dispatch internet-researcher with exact error text
- Don't understand code flow → Dispatch codebase-investigator
- Need to inspect runtime state → Recommend debugger to user or add instrumentation
- Tempted to guess → Stop, use tools to gather evidence first