debug
Debug
Systematic debugging for errors, test failures, unexpected behavior, and production issues.
Usage
/debug [issue] [--logs] [--correlate] [--trace] [--type bug|build|perf|deploy]
Options
| Flag | Purpose |
|---|---|
| --logs | Enable log pattern analysis (error spikes, frequency, types) |
| --correlate | Run SQL correlation queries on structured logs |
| --trace | Deep stack trace analysis with context |
| --type | Issue category: bug, build, perf(ormance), deploy(ment) |
When to Use
- Error messages or stack traces appear
- Tests are failing
- Code behaves unexpectedly
- User says "it's broken" or "not working"
- Production errors need investigation (--logs)
- Need to correlate errors across systems (--correlate)
- Deep stack analysis required (--trace)
Debugging Process
- Capture - Get error message, stack trace, and reproduction steps
- Isolate - Narrow down the failure location
- Hypothesize - Form theories about the cause
- Test - Validate hypotheses with evidence
- Fix - Implement minimal fix
- Verify - Confirm solution works
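The Isolate step is often a search over history: when a known-good and a known-bad commit bracket a regression, halving the range finds the first bad commit in roughly log2(n) checks. The following is a minimal sketch of that idea, not part of the command itself. The `first_bad` function, the `is_bad` predicate (for example, "the test suite fails at this commit"), and the commit IDs are all hypothetical names chosen for illustration.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, the last commit is bad,
    and once a commit is bad every later commit is bad too.
    """
    if not commits:
        raise ValueError("commits must not be empty")
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid      # the first bad commit is at mid or earlier
        else:
            lo = mid + 1  # the first bad commit is after mid
    return commits[lo]


# Hypothetical history in which the regression was introduced in "c4".
history = ["c1", "c2", "c3", "c4", "c5", "c6"]
culprit = first_bad(history, lambda commit: commit >= "c4")
print(culprit)
```

The sketch only holds if the failure is deterministic and stays present after it is introduced; a flaky test would break the "once bad, always bad" assumption.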
Investigation Steps
```bash
# Check recent changes that might have caused the issue
git log --oneline -10
git diff HEAD~3

# Find error patterns in logs (-E enables | alternation)
grep -rE "error|Error|ERROR" logs/ 2>/dev/null | tail -20

# Check test output
npm test 2>&1 | tail -50  # or pytest, cargo test, etc.
```
Log Analysis (--logs)
Find Errors
```bash
# Recent errors with context
grep -B 5 -A 10 "ERROR" /var/log/app.log

# Count by error type
grep -oE "Error: [^:]*" app.log | sort | uniq -c | sort -rn

# Errors in time range
awk '/2024-01-15 14:/ && /ERROR/' app.log

# Find repeated errors
grep "ERROR" app.log | cut -d']' -f2 | sort | uniq -c | sort -rn | head -20

# Find error spikes
grep "ERROR" app.log | cut -d' ' -f1-2 | uniq -c | sort -rn
```
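The spike command groups errors by the first two fields of each line, which for a `date time` prefix usually means one bucket per second. A per-minute view with a simple threshold can be easier to read. The sketch below assumes every log line starts with a `YYYY-MM-DD HH:MM:SS` timestamp; that format, the sample lines, and the "more than `factor` times the mean" rule are illustrative assumptions, not something the command prescribes.

```python
from collections import Counter


def errors_per_minute(lines):
    """Count ERROR lines per minute, keyed by 'YYYY-MM-DD HH:MM'."""
    counts = Counter()
    for line in lines:
        if "ERROR" in line:
            counts[line[:16]] += 1  # first 16 characters: date plus hour:minute
    return counts


def find_spikes(counts, factor=3.0):
    """Return the minutes whose error count exceeds factor times the mean."""
    if not counts:
        return []
    mean = sum(counts.values()) / len(counts)
    return sorted(minute for minute, n in counts.items() if n > factor * mean)


# Hypothetical log lines: one quiet error per minute, then a burst at 14:03.
sample = (
    ["2024-01-15 14:01:05 ERROR db timeout"]
    + ["2024-01-15 14:02:10 ERROR db timeout"]
    + ["2024-01-15 14:02:30 INFO request ok"]
    + [f"2024-01-15 14:03:{s:02d} ERROR db timeout" for s in range(6)]
    + ["2024-01-15 14:04:00 ERROR db timeout"]
)
counts = errors_per_minute(sample)
print(find_spikes(counts, factor=2.0))
```

Minutes with no errors do not appear in the counts, so the mean is taken over minutes that had at least one error; a real baseline would probably need to account for quiet periods as well.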
Common Patterns
| Pattern | Indicates | Action |
|---|---|---|
| NullPointer | Missing null check | Add validation |
| Timeout | Slow dependency | Add timeout, retry |
| Connection refused | Service down | Check health, retry |
| OOM | Memory leak | Profile, increase limits |
| Rate limit | Too many requests | Add backoff, queue |
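Several of the actions in the table (retry on timeout or refused connection, back off on rate limits) share one shape: retry a bounded number of times with growing, randomized delays. The following is a minimal sketch of that pattern. The `retry_with_backoff` helper, its parameters, and the `flaky` function are hypothetical; real services usually need their own choice of retryable exceptions and limits.

```python
import random
import time


def retry_with_backoff(func, attempts=4, base_delay=0.1, max_delay=2.0,
                       retry_on=(TimeoutError, ConnectionError),
                       sleep=time.sleep):
    """Call func(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(random.uniform(0, delay))


# Hypothetical dependency that fails twice and then recovers.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("connection refused")
    return "ok"


# The demo passes a no-op sleep so it does not actually wait.
result = retry_with_backoff(flaky, sleep=lambda seconds: None)
print(result, calls["n"])
```

Retries only help with transient failures. A NullPointer or OOM from the table would fail the same way on every attempt, which is why those rows call for a code or configuration fix instead.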
Correlation Queries (--correlate)
```sql
-- Errors by endpoint
SELECT endpoint, count(*) as errors
FROM logs
WHERE level = 'ERROR' AND time > NOW() - INTERVAL '1 hour'
GROUP BY endpoint ORDER BY errors DESC;

-- Error rate over time
SELECT
  date_trunc('minute', time) as minute,
  count(*) filter (where level = 'ERROR') as errors,
  count(*) as total
FROM logs
WHERE time > NOW() - INTERVAL '1 hour'
GROUP BY minute ORDER BY minute;

-- Correlate request IDs across services
SELECT service, message, time
FROM logs
WHERE request_id = 'req-12345'
ORDER BY time;
```
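The request-ID query is the one most likely to be run from a script, so here is a small sketch of it in Python. It uses an in-memory SQLite database as a stand-in for the real log store. The table layout, the sample rows, and the `request_timeline` and `first_error` helpers are assumptions made for illustration, and the `date_trunc` and `INTERVAL` queries would need a PostgreSQL-style database instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logs "
    "(service TEXT, level TEXT, message TEXT, time TEXT, request_id TEXT)"
)
# Hypothetical log rows for two requests.
rows = [
    ("gateway", "INFO", "request received", "2024-01-15 14:03:00", "req-12345"),
    ("orders", "INFO", "creating order", "2024-01-15 14:03:01", "req-12345"),
    ("payments", "ERROR", "card service timeout", "2024-01-15 14:03:04", "req-12345"),
    ("gateway", "ERROR", "upstream returned 500", "2024-01-15 14:03:05", "req-12345"),
    ("gateway", "INFO", "request received", "2024-01-15 14:03:02", "req-99999"),
]
conn.executemany("INSERT INTO logs VALUES (?, ?, ?, ?, ?)", rows)


def request_timeline(conn, request_id):
    """Return (service, level, message, time) rows for one request, oldest first."""
    cur = conn.execute(
        "SELECT service, level, message, time FROM logs "
        "WHERE request_id = ? ORDER BY time",
        (request_id,),
    )
    return cur.fetchall()


def first_error(timeline):
    """Return the earliest ERROR row, often the one closest to the root cause."""
    return next((row for row in timeline if row[1] == "ERROR"), None)


timeline = request_timeline(conn, "req-12345")
for service, level, message, ts in timeline:
    print(ts, service, level, message)
print("first error in:", first_error(timeline)[0])
```

Passing the request ID as a bound parameter rather than formatting it into the SQL string keeps the query safe when the ID comes from user input.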
Stack Trace Analysis (--trace)
Parse Stack Traces
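```python
import re

# Matches "<Name>Error: message" or "<Name>Exception: message" followed by
# one or more indented "at ..." frames (Java/JavaScript-style traces).
STACK_TRACE_PATTERN = re.compile(
    r'(?P<exception>\w+Error|\w+Exception): (?P<message>[^\n]*)\n'
    r'(?P<trace>(?:[ \t]+at [^\n]+(?:\n|$))+)'
)

def parse_stack_trace(log_content: str) -> list[dict]:
    """Return the exception type, message, and frames of each trace found."""
    traces = []
    for match in STACK_TRACE_PATTERN.finditer(log_content):
        traces.append({
            'type': match.group('exception'),
            'message': match.group('message'),
            'trace': [frame.strip() for frame in match.group('trace').splitlines()],
        })
    return traces
```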
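Once traces can be parsed, a common next step is to group them by exception type and innermost frame, since many log entries often point at the same line. The sketch below is self-contained and makes its own assumptions: frames are indented lines starting with `at `, as in Java or Node.js output, and the `top_frames` helper and sample log are hypothetical.

```python
import re
from collections import Counter

# Assumed frame format: indented lines starting with "at ".
TRACE_RE = re.compile(
    r"(?P<exception>\w+(?:Error|Exception)): (?P<message>[^\n]*)\n"
    r"(?P<trace>(?:[ \t]+at [^\n]+(?:\n|$))+)"
)


def top_frames(log_content):
    """Count (exception type, innermost frame) pairs found in log_content."""
    counts = Counter()
    for match in TRACE_RE.finditer(log_content):
        frames = match.group("trace").strip().split("\n")
        counts[(match.group("exception"), frames[0].strip())] += 1
    return counts


# Hypothetical log excerpt: the same failure reached through two call paths.
sample_log = """\
TypeError: Cannot read property 'map' of undefined
    at renderList (app.js:42:18)
    at main (app.js:10:3)
INFO request ok
TypeError: Cannot read property 'map' of undefined
    at renderList (app.js:42:18)
    at handler (routes.js:7:5)
"""

for (exc, frame), n in top_frames(sample_log).most_common():
    print(n, exc, frame)
```

Python tracebacks use a different layout (`File "...", line N, in name`), so this pattern would not match them without changes.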
Investigation Checklist
- Capture - Get full error message and stack trace
- Timestamp - When did it start?
- Frequency - How often? Increasing?
- Scope - All users or specific?
- Changes - Recent deployments?
- Dependencies - External services affected?
Output Format
```markdown
Debug Report

Issue: [Brief description]
Root Cause: [What's actually wrong]

Evidence
- [Finding 1]
- [Finding 2]

Fix
[Code or configuration change]

Verification
[How to confirm the fix works]

Prevention
[How to prevent this in the future]
```
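If reports are generated by a script rather than written by hand, the template maps naturally onto a small data structure. This is only a sketch: the `DebugReport` class and the field values below are hypothetical, and the section names are the ones from the template.

```python
from dataclasses import dataclass, field


@dataclass
class DebugReport:
    issue: str
    root_cause: str
    evidence: list[str] = field(default_factory=list)
    fix: str = ""
    verification: str = ""
    prevention: str = ""

    def render(self) -> str:
        """Render the report with the section names used in the template."""
        lines = [
            "Debug Report",
            "",
            f"Issue: {self.issue}",
            f"Root Cause: {self.root_cause}",
            "",
            "Evidence",
            *[f"- {item}" for item in self.evidence],
            "",
            "Fix",
            self.fix,
            "",
            "Verification",
            self.verification,
            "",
            "Prevention",
            self.prevention,
        ]
        return "\n".join(lines)


# Hypothetical findings, for illustration only.
report = DebugReport(
    issue="List page crashes on load",
    root_cause="items is undefined until the first fetch completes",
    evidence=["TypeError raised in renderList", "Only happens on a cold cache"],
    fix="Initialize items to an empty list",
    verification="Reload with an empty cache and confirm the page renders",
    prevention="Add a test that renders the list before data arrives",
)
print(report.render())
```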
Examples
Input: "TypeError: Cannot read property 'map' of undefined"
Action: Trace the undefined value, find where data should be initialized, fix the source
Input: "Tests are failing"
Action: Run tests, capture failures, analyze each failure, fix underlying issues
Input: /debug --logs "API returning 500 errors"
Action: Search logs for 500 status, find stack traces, identify root cause, check error frequency
Input: /debug --correlate "intermittent failures"
Action: Run correlation queries to find patterns, identify affected endpoints, correlate with events