<skill_overview> Random fixes waste time and create new bugs. Always use tools to understand the root cause BEFORE attempting a fix. Fixing only the symptom counts as failure. </skill_overview>

<rigidity_level> MEDIUM FREEDOM - Must complete investigation phases (tools → hypothesis → test) before fixing.

Can adapt tool choice to language/context. Never skip investigation or guess at fixes. </rigidity_level>

<quick_reference>

| Phase | Tools to Use | Output |
|-------|--------------|--------|
| 1. Investigate | Error messages, internet-researcher agent, debugger, codebase-investigator | Root cause understanding |
| 2. Hypothesize | Form theory based on evidence (not guesses) | Testable hypothesis |
| 3. Test | Validate hypothesis with minimal change | Confirms or rejects theory |
| 4. Fix | Implement proper fix for root cause | Problem solved permanently |

**FORBIDDEN:** Skip investigation → guess at fix → hope it works

**REQUIRED:** Tools → evidence → hypothesis → test → fix

Key agents:

- `internet-researcher` - Search error messages, known bugs, solutions
- `codebase-investigator` - Understand code structure, find related code
- `test-runner` - Run tests without output pollution

</quick_reference>

<when_to_use> Use for ANY technical issue:

- Test failures
- Bugs in production or development
- Unexpected behavior
- Build failures
- Integration issues
- Performance problems

ESPECIALLY when:

- "Just one quick fix" seems obvious
- You are under time pressure (emergencies make guessing tempting)
- The error message is unclear
- A previous fix didn't work
</when_to_use>

<the_process>


Phase 1: Tool-Assisted Investigation

BEFORE attempting ANY fix, gather evidence with tools:

1. Read Complete Error Messages

- The entire error message (not just the first line)
- The complete stack trace (all frames)
- Line numbers, file paths, error codes

Stack traces show the exact execution path.
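In Rust, the message printed first is often only the outermost error; the underlying cause sits further down the `source()` chain of `std::error::Error`. A minimal sketch of reading the whole chain, assuming a hypothetical `ConfigError` that wraps a `ParseIntError` and a hypothetical `load_port` helper (neither comes from this document):

```rust
use std::error::Error;
use std::fmt;

// Hypothetical outer error that wraps a lower-level cause.
#[derive(Debug)]
struct ConfigError {
    source: std::num::ParseIntError,
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "failed to load config")
    }
}

impl Error for ConfigError {
    // Expose the wrapped cause so callers can walk the chain.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

/// Collects every message in the `source()` chain, outermost first,
/// so no layer of the error is skipped.
fn full_error_chain(err: &dyn Error) -> Vec<String> {
    let mut messages = vec![err.to_string()];
    let mut current = err.source();
    while let Some(cause) = current {
        messages.push(cause.to_string());
        current = cause.source();
    }
    messages
}

// Hypothetical helper that produces a two-level error on bad input.
fn load_port(raw: &str) -> Result<u16, ConfigError> {
    raw.parse::<u16>().map_err(|source| ConfigError { source })
}
```

For panics, running with `RUST_BACKTRACE=1` set makes the standard library print the full stack trace rather than only the panic message.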

2. Search Internet FIRST (Use internet-researcher Agent)

Dispatch internet-researcher with:

"Search for error: [exact error message]
- Check Stack Overflow solutions
- Look for GitHub issues in [library] version [X]
- Find official documentation explaining this error
- Check if this is a known bug"

What the agent should find:

- Exact matches to your error
- Similar symptoms and solutions
- Known bugs in your dependency versions
- Workarounds that worked for others
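The dispatch prompt is a fixed template; what matters is that the error text goes in exactly as printed and that the library version is pinned. A minimal sketch, assuming the prompt is handed to internet-researcher as a plain string (how the agent is actually dispatched is not specified here) and using a hypothetical `researcher_prompt` helper:

```rust
/// Hypothetical helper: fills the internet-researcher template.
/// The error text is only trimmed (e.g. a trailing newline copied from a
/// terminal), never paraphrased, so the search matches what others posted.
fn researcher_prompt(exact_error: &str, library: &str, version: &str) -> String {
    format!(
        "Search for error: {err}\n\
         - Check Stack Overflow solutions\n\
         - Look for GitHub issues in {library} version {version}\n\
         - Find official documentation explaining this error\n\
         - Check if this is a known bug",
        err = exact_error.trim(),
    )
}
```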

3. Use Debugger to Inspect State

Claude cannot run debuggers directly. Instead:

Option A - Recommend a debugger to the user:

"Let's use lldb/gdb/DevTools to inspect state at the error location.
Please run: [specific commands]
When the breakpoint hits: [what to inspect]
Share the output with me."

Option B - Add instrumentation that Claude can insert itself:

```rust
// Add logging
println!("DEBUG: var = {:?}, state = {:?}", var, state);

// Add assertions
assert!(condition, "Expected X but got {:?}", actual);
```
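Rust's standard `dbg!` macro is a shorter alternative to the hand-written `println!` in Option B: it prints the source location, the expression text and its value to stderr, then returns the value so it can wrap an expression in place. `debug_assert!` works like `assert!` but is only checked in debug builds. A minimal sketch, assuming a hypothetical `apply_discount` function as the code under investigation:

```rust
/// Hypothetical function under investigation.
fn apply_discount(price_cents: u64, percent: u64) -> u64 {
    // Assertion: states the assumption and fails loudly if it is violated
    // (checked in debug builds only).
    debug_assert!(percent <= 100, "Expected percent <= 100 but got {:?}", percent);

    // dbg! prints the file/line, the expression text and its value to stderr,
    // then returns the value unchanged, so the surrounding logic is unaffected.
    let discount = dbg!(price_cents * percent / 100);
    price_cents - discount
}
```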

4. Investigate Codebase (Use codebase-investigator Agent)

Dispatch codebase-investigator with:

"Error occurs in function X at line Y.
Find:
- How is X called? What are the callers?
- What does variable Z contain at this point?
- Are there similar functions that work correctly?
- What changed recently in this area?"


Phase 2: Form Hypothesis

Based on evidence (not guesses):

- State what you know (from investigation)
- Propose a theory explaining the evidence
- Make a prediction that tests the theory

Example:

- Known: Error "null pointer" at auth.rs:45 when email is empty
- Theory: Empty email bypasses validation and passes null to `login()`
- Prediction: Adding validation before `login()` will prevent the error
- Test: Add validation, then verify the error doesn't occur with an empty email
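The null-pointer example can be expressed as a check that fails before the change and passes after it. Safe Rust has no null references, so this sketch models the "null" as an empty email string reaching `login()`; `login`, `validate_email` and `AuthError` here are hypothetical names, not code from a real `auth.rs`:

```rust
#[derive(Debug, PartialEq)]
enum AuthError {
    EmptyEmail,
    UnknownUser,
}

/// The validation the hypothesis says is missing.
fn validate_email(email: &str) -> Result<&str, AuthError> {
    let trimmed = email.trim();
    if trimmed.is_empty() {
        return Err(AuthError::EmptyEmail);
    }
    Ok(trimmed)
}

/// Hypothetical login: looks the user up by email and returns its index.
fn login(email: &str, known_users: &[&str]) -> Result<usize, AuthError> {
    // Minimal change under test: reject empty input before the lookup.
    let email = validate_email(email)?;
    known_users
        .iter()
        .position(|u| *u == email)
        .ok_or(AuthError::UnknownUser)
}
```

The only change under test is the `validate_email` call at the top of `login`; if empty input still fails somewhere else, the hypothesis is rejected and the investigation restarts at Phase 1.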

NEVER:

- Guess without evidence
- Propose a fix without a hypothesis
- Skip to "try this and see"

Phase 3: Test Hypothesis

Minimal change to validate the theory:

- Make the smallest change that tests the hypothesis
- Run the test/reproduction case
- Observe the result

Then:

- If confirmed: proceed to Phase 4
- If rejected: return to Phase 1 with the new information

Phase 4: Implement Fix

After understanding the root cause:

1. Write a test reproducing the bug (RED phase - use the test-driven-development skill)
2. Implement a proper fix addressing the root cause
3. Verify the test passes (GREEN phase)
4. Run the full test suite (regression check)
5. Commit the fix

The fix should:

- Address the root cause (not the symptom)
- Be minimal and focused
- Include a test preventing regression

</the_process>

<examples>
<example>
<scenario>Developer encounters a test failure and immediately tries an "obvious" fix without investigation</scenario>

<code>
Test error:
```
FAIL: test_login_expired_token
AssertionError: Expected Err(TokenExpired), got Ok(User)
```

Developer thinks: "Obviously the token expiration check is wrong"

Makes change without investigation:

```rust
// "Fix" - just check if token is expired
if token.expires_at < now() {
    return Err(AuthError::TokenExpired);
}
```

Commits without testing other cases.
</code>

<why_it_fails>
No investigation:

- Didn't read the error completely
- Didn't check what `expires_at` contains
- Didn't debug to see the token state
- Didn't search for similar issues

What actually happened: The token's `expires_at` was being set incorrectly and always held a future date, so the added check can never trigger. The "fix" is dead code.

Result: Bug not fixed, new dead code added, time wasted.
</why_it_fails>

<correction>
**Phase 1 - Investigate with tools:**

```
# 1. Read complete error
FAIL: test_login_expired_token at line 45
Expected: Err(TokenExpired)
Got: Ok(User { id: 123 })
Token: { expires_at: "2099-01-01", ... }
```


**Dispatch internet-researcher:**

"Search for: token expiration always showing future date
- Check date parsing bugs
- Look for timezone issues
- Find JWT expiration handling"

**Add instrumentation:**
```rust
println!("DEBUG: expires_at = {:?}, now = {:?}, expired = {:?}",
         token.expires_at, now(), token.expires_at < now());
```

**Run test again:**

`DEBUG: expires_at = 2099-01-01T00:00:00Z, now = 2024-01-15T10:30:00Z, expired = false`

**Phase 2 - Hypothesis:** "Token `expires_at` is being set to 2099, not the actual expiration. The problem is in token creation, not validation."

**Phase 3 - Test:** Check the token creation code:

```rust
// Found the bug!
fn create_token() -> Token {
    Token {
        expires_at: "2099-01-01".parse()?, // HARDCODED!
        ...
    }
}
```

**Phase 4 - Fix root cause:**

```rust
fn create_token(duration: Duration) -> Token {
    Token {
        expires_at: now() + duration,  // Correct
        ...
    }
}
```
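The fixed snippet relies on the document's `now()` and elides the remaining fields. A self-contained sketch of the same fix, assuming a hypothetical `Token` with only `expires_at`, `std::time::SystemTime` in place of `now()`, and a hypothetical `validate` function standing in for the existing expiration check, together with the kind of regression test Phase 4 calls for:

```rust
use std::time::{Duration, SystemTime};

#[derive(Debug, PartialEq)]
enum AuthError {
    TokenExpired,
}

/// Hypothetical token reduced to the one field involved in the bug.
struct Token {
    expires_at: SystemTime,
}

/// Root-cause fix: expiry is computed from the issue time and lifetime,
/// never hardcoded.
fn create_token(issued_at: SystemTime, lifetime: Duration) -> Token {
    Token {
        expires_at: issued_at + lifetime,
    }
}

/// Stand-in for the existing expiration check, which the fix leaves unchanged.
fn validate(token: &Token, now: SystemTime) -> Result<(), AuthError> {
    if token.expires_at < now {
        return Err(AuthError::TokenExpired);
    }
    Ok(())
}

#[cfg(test)]
mod tests {
    use super::*;

    // Regression test: fails with a hardcoded 2099 expiry, passes after the fix.
    #[test]
    fn expired_token_is_rejected() {
        let issued = SystemTime::UNIX_EPOCH;
        let token = create_token(issued, Duration::from_secs(3600));
        let two_hours_later = issued + Duration::from_secs(7200);
        assert_eq!(validate(&token, two_hours_later), Err(AuthError::TokenExpired));
    }
}
```

Unlike the document's snippet, `create_token` here takes the issue time as a parameter so the test does not depend on the wall clock: against a hardcoded-2099 version the test fails (RED), and after the fix it passes (GREEN).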

**Result:** Root cause fixed, test passes, no dead code.
</correction>
</example>

<example>
<scenario>Developer skips the internet search and reinvents a solution to a known problem</scenario>

<code>
Error:
```
error: linking with `cc` failed: exit status: 1
ld: symbol(s) not found for architecture arm64
```

Developer thinks: "Must be a linking issue, I'll add flags"

Spends 2 hours trying different linker flags:

```toml
[target.aarch64-apple-darwin]
rustflags = ["-C", "link-arg=-undefined dynamic_lookup"]
# Doesn't work, tries more flags...
```
</code>

<why_it_fails>
**Skipped internet search:**
- This is a common error with known solutions
- Stack Overflow has exact fix
- Official docs explain the issue
- Wasted 2 hours reinventing solution

**Why it happens:** Impatience, thinking "I can figure this out faster"
</why_it_fails>

<correction>
**Dispatch internet-researcher FIRST:**

"Search for: 'symbol not found for architecture arm64' Rust linking
- Check Stack Overflow solutions
- Look for Xcode/macOS specific fixes
- Find Cargo configuration for Apple Silicon"

**Agent returns (30 seconds):**

Found on Stack Overflow (2.4k upvotes): This occurs when the Xcode Command Line Tools aren't installed or are outdated.

Solution: `xcode-select --install`

Or update an existing installation: `softwareupdate --all --install --force`

**Apply solution:**
```bash
xcode-select --install
# Wait for installation
cargo build
# Success!
```


**Result:** Fixed in 5 minutes, not 2 hours.
</correction>
</example>

<example>
<scenario>Developer fixes a symptom without understanding the root cause, and the bug returns in a different form</scenario>

<code>
Bug: Users can delete other users' posts

Developer notices: DELETE /posts/:id doesn't check ownership

"Fix":
```python
@app.delete("/posts/{post_id}")
def delete_post(post_id: int, user: User):
    post = get_post(post_id)
    if post.author_id != user.id:
        raise Forbidden()
    delete(post)
```

Bug "fixed"... but a new bug report comes back: "Users can edit other users' posts"
</code>

<why_it_fails>
Symptom fix:

- Fixed the DELETE endpoint
- Didn't investigate the root cause
- Didn't check other endpoints

Root cause: No authorization middleware checking ownership across ALL endpoints.

Result: Whack-a-mole fixing of symptoms, not the underlying issue.
</why_it_fails>

<correction> **Phase 1 - Investigate with codebase-investigator:**

"User can delete other users' posts. Investigate:
- What authorization exists across all post endpoints?
- Is there a pattern for ownership checks?
- Which other endpoints access posts?
- Is there middleware handling this?"

Agent reports:

Found:
- DELETE /posts/:id - NO ownership check
- PUT /posts/:id - NO ownership check
- GET /posts/:id - NO ownership check (allows viewing private posts!)
- No authorization middleware found
- Similar pattern in comments, likes endpoints

**Phase 2 - Hypothesis:** "Missing authorization layer. Need middleware checking resource ownership across ALL endpoints."

**Phase 4 - Fix root cause:**

```python
# Add authorization middleware
class OwnershipMiddleware:
    def check_ownership(self, resource, user):
        if resource.author_id != user.id:
            raise Forbidden()

# Apply to all endpoints
@app.delete("/posts/{post_id}")
@require_ownership(Post)
def delete_post(...): ...

@app.put("/posts/{post_id}")
@require_ownership(Post)
def update_post(...): ...
```
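The Python fix leaves `require_ownership` undefined. Whatever the framework, the design point is that every handler goes through one shared ownership check instead of each endpoint re-implementing it. A minimal Rust sketch of that idea, assuming hypothetical `Owned`, `Post`, `Comment`, `Forbidden` and `authorize` names (not any web framework's API), including a comment endpoint since the agent reported the same gap there:

```rust
/// Any resource with an owner can be protected by the same check.
trait Owned {
    fn owner_id(&self) -> u64;
}

struct Post {
    author_id: u64,
}

struct Comment {
    author_id: u64,
}

impl Owned for Post {
    fn owner_id(&self) -> u64 {
        self.author_id
    }
}

impl Owned for Comment {
    fn owner_id(&self) -> u64 {
        self.author_id
    }
}

#[derive(Debug, PartialEq)]
struct Forbidden;

/// The single ownership check that every handler calls first.
fn authorize<R: Owned>(resource: &R, user_id: u64) -> Result<(), Forbidden> {
    if resource.owner_id() == user_id {
        Ok(())
    } else {
        Err(Forbidden)
    }
}

// Handlers: each routes through `authorize` before acting.
// The returned strings are placeholders for the real side effects.
fn delete_post(post: &Post, user_id: u64) -> Result<&'static str, Forbidden> {
    authorize(post, user_id)?;
    Ok("deleted")
}

fn update_post(post: &Post, user_id: u64) -> Result<&'static str, Forbidden> {
    authorize(post, user_id)?;
    Ok("updated")
}

fn delete_comment(comment: &Comment, user_id: u64) -> Result<&'static str, Forbidden> {
    authorize(comment, user_id)?;
    Ok("deleted")
}
```

Adding a new resource type means implementing `Owned` once; the check itself stays in one place, which is what stops the whack-a-mole pattern.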


**Result:** Root cause fixed, ALL endpoints secured, not just one symptom.
</correction>
</example>

</examples>

<critical_rules>


Rules That Have No Exceptions

1. Tools before fixes → Never guess without investigation
   - Use internet-researcher for errors
   - Use a debugger or instrumentation for state
   - Use codebase-investigator for context
2. Evidence-based hypotheses → Not guesses or hunches
   - State what the tools revealed
   - Propose a theory explaining the evidence
   - Make a testable prediction
3. Test the hypothesis before fixing → Minimal change to validate
   - Make the smallest change that tests the theory
   - Observe the result
   - If wrong, return to investigation
4. Fix the root cause, not the symptom → One fix, many symptoms prevented
   - Understand why the problem occurred
   - Fix the underlying issue
   - Don't play whack-a-mole
