Debug

Systematic debugging for errors, test failures, unexpected behavior, and production issues.

Usage

/debug [issue] [--logs] [--correlate] [--trace] [--type bug|build|perf|deploy]

Options

| Flag | Purpose |
|------|---------|
| `--logs` | Enable log pattern analysis (error spikes, frequency, types) |
| `--correlate` | Run SQL correlation queries on structured logs |
| `--trace` | Deep stack trace analysis with context |
| `--type` | Issue category: `bug`, `build`, `perf` (performance), or `deploy` (deployment) |

When to Use

  • Error messages or stack traces appear
  • Tests are failing
  • Code behaves unexpectedly
  • User says "it's broken" or "not working"
  • Production errors need investigation (--logs)
  • Errors need to be correlated across systems (--correlate)
  • Deep stack analysis is required (--trace)

Debugging Process

  1. Capture - Get error message, stack trace, and reproduction steps
  2. Isolate - Narrow down the failure location
  3. Hypothesize - Form theories about the cause
  4. Test - Validate hypotheses with evidence
  5. Fix - Implement minimal fix
  6. Verify - Confirm solution works
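The Isolate step often reduces to a search over an ordered set of candidates (commits, config versions, input sizes). A minimal Python sketch, assuming the candidates are ordered so that every passing candidate comes before the first failing one (the same assumption `git bisect` makes); `find_first_failure` and `is_broken` are hypothetical names used only for illustration:

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

def find_first_failure(candidates: Sequence[T], is_broken: Callable[[T], bool]) -> int:
    """Return the index of the first candidate for which is_broken() is True.

    Assumes a monotonic sequence: all passing candidates precede all failing
    ones. Returns len(candidates) if nothing fails.
    """
    lo, hi = 0, len(candidates)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_broken(candidates[mid]):
            hi = mid      # the first failure is at mid or earlier
        else:
            lo = mid + 1  # the first failure is strictly after mid
    return lo


# Hypothetical data: the bug appeared in version 7.
versions = list(range(1, 11))
first_bad = find_first_failure(versions, lambda v: v >= 7)
print(versions[first_bad])  # 7
```

Each probe halves the remaining range, so 1,000 candidates need at most 10 checks.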

Investigation Steps

```bash
# Check recent changes that might have caused the issue
git log --oneline -10
git diff HEAD~3

# Find error patterns in logs (-E is needed for the | alternation)
grep -rE "error|Error|ERROR" logs/ 2>/dev/null | tail -20

# Check test output
npm test 2>&1 | tail -50  # or pytest, cargo test, etc.
```
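To post-process matches in a script rather than reading raw `grep` output, the log search above can be approximated in Python. A minimal sketch: the `logs` default directory mirrors the shell example, `error_lines` and `tail_errors` are hypothetical helper names, and files are visited in sorted order, which may differ from the traversal order of `grep -r`:

```python
import re
from collections import deque
from pathlib import Path
from typing import Iterable, Iterator

# Same alternation as the grep above: "error", "Error", or "ERROR"
ERROR_RE = re.compile(r"error|Error|ERROR")

def error_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield lines containing an error marker, without trailing newlines."""
    for line in lines:
        if ERROR_RE.search(line):
            yield line.rstrip("\n")

def tail_errors(root: str = "logs", limit: int = 20) -> list[str]:
    """Approximate `grep -rE ... root/ 2>/dev/null | tail -<limit>`."""
    tail: deque[str] = deque(maxlen=limit)  # keeps only the last `limit` matches
    if not Path(root).is_dir():
        return []  # grep would report an error; here we return nothing
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            with path.open(encoding="utf-8", errors="replace") as fh:
                # Prefix each match with its file name, as grep -r does
                tail.extend(f"{path}:{line}" for line in error_lines(fh))
        except OSError:
            continue  # unreadable file: skip it, like 2>/dev/null
    return list(tail)
```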

Log Analysis (--logs)

Find Errors

```bash
# Recent errors with context
grep -B 5 -A 10 "ERROR" /var/log/app.log

# Count by error type
grep -oE "Error: [^:]*" app.log | sort | uniq -c | sort -rn

# Errors in time range
awk '/2024-01-15 14:/ && /ERROR/' app.log

# Find repeated errors (assumes the message follows the first "]" on each line)
grep "ERROR" app.log | cut -d']' -f2 | sort | uniq -c | sort -rn | head -20

# Find error spikes (counts adjacent ERROR lines sharing the same date and time fields)
grep "ERROR" app.log | cut -d' ' -f1-2 | uniq -c | sort -rn
```
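The `uniq -c` pipeline lists busy timestamps but leaves it to the reader to decide what counts as a spike. A minimal Python sketch that buckets ERROR lines per minute and flags minutes well above the median, assuming each line starts with a `YYYY-MM-DD HH:MM:SS` timestamp (the layout the `awk` and `cut` examples above also assume); `error_spikes` is a hypothetical name, and the factor of 3 is an arbitrary illustration rather than a recommended threshold:

```python
import re
from collections import Counter
from statistics import median

# Assumed layout: "2024-01-15 14:03:27 ERROR ..." (timestamp first)
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}\b.*\bERROR\b")

def error_spikes(lines, factor: float = 3.0) -> list[tuple[str, int]]:
    """Return (minute, count) pairs whose ERROR count exceeds factor * median."""
    per_minute: Counter = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            per_minute[m.group(1)] += 1  # bucket by "YYYY-MM-DD HH:MM"
    if not per_minute:
        return []
    # Only minutes with at least one ERROR contribute to the baseline.
    baseline = median(per_minute.values())
    return sorted(
        (minute, count) for minute, count in per_minute.items()
        if count > factor * baseline
    )
```

Using the median rather than the mean keeps a single large burst from inflating the threshold it is compared against.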

Common Patterns

| Pattern | Indicates | Action |
|---------|-----------|--------|
| NullPointer | Missing null check | Add validation |
| Timeout | Slow dependency | Add a timeout, retry |
| Connection refused | Service down | Check health, retry |
| OOM | Memory leak or too-low memory limit | Profile, increase limits |
| Rate limit | Too many requests | Add backoff, queue |
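The Timeout and Rate limit rows both call for retrying with backoff. A minimal Python sketch of retry with exponential backoff and full jitter; `retry_with_backoff` is a hypothetical helper, and the exception types treated as retryable, the attempt count, and the delays are illustrative assumptions to tune per dependency:

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def retry_with_backoff(
    fn: Callable[[], T],
    retryable: tuple[type[BaseException], ...] = (TimeoutError, ConnectionRefusedError),
    attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying retryable errors with exponentially growing, jittered waits."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retryable:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            # Full jitter: wait a random time in [0, min(max_delay, base * 2^attempt)]
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")  # the loop always returns or raises
```

Injecting `sleep` keeps the helper testable, and the jitter spreads out retries from many clients so they do not hit a recovering service in lockstep. Errors outside `retryable` propagate immediately.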

Correlation Queries (--correlate)

```sql
-- PostgreSQL dialect (date_trunc, INTERVAL)

-- Errors by endpoint
SELECT endpoint, count(*) as errors
FROM logs
WHERE level = 'ERROR' AND time > NOW() - INTERVAL '1 hour'
GROUP BY endpoint ORDER BY errors DESC;

-- Error rate over time
SELECT
  date_trunc('minute', time) as minute,
  count(*) filter (where level = 'ERROR') as errors,
  count(*) as total
FROM logs
WHERE time > NOW() - INTERVAL '1 hour'
GROUP BY minute ORDER BY minute;

-- Correlate request IDs across services
SELECT service, message, time
FROM logs
WHERE request_id = 'req-12345'
ORDER BY time;
```
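To try the endpoint, error-rate, and request-ID correlations without a database server, here is a minimal sketch using Python's built-in `sqlite3` with an in-memory table. The `logs` schema and the sample rows are invented for illustration; SQLite has no `date_trunc`, so `strftime` truncates to the minute instead, and the one-hour window is omitted because the sample timestamps are fixed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logs (time TEXT, level TEXT, service TEXT, "
    "endpoint TEXT, request_id TEXT, message TEXT)"
)
# Hypothetical sample rows; timestamps are ISO-8601 strings
conn.executemany(
    "INSERT INTO logs VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("2024-01-15 14:00:01", "INFO",  "gateway", "/orders", "req-12345", "received"),
        ("2024-01-15 14:00:02", "ERROR", "orders",  "/orders", "req-12345", "db timeout"),
        ("2024-01-15 14:00:03", "ERROR", "gateway", "/orders", "req-12345", "upstream 500"),
        ("2024-01-15 14:01:10", "ERROR", "users",   "/users",  "req-67890", "not found"),
    ],
)

# Errors by endpoint
errors_by_endpoint = conn.execute(
    "SELECT endpoint, count(*) AS errors FROM logs WHERE level = 'ERROR' "
    "GROUP BY endpoint ORDER BY errors DESC"
).fetchall()

# Error rate per minute; SUM(CASE ...) avoids relying on FILTER support in older SQLite
error_rate = conn.execute(
    "SELECT strftime('%Y-%m-%d %H:%M', time) AS minute, "
    "SUM(CASE WHEN level = 'ERROR' THEN 1 ELSE 0 END) AS errors, count(*) AS total "
    "FROM logs GROUP BY minute ORDER BY minute"
).fetchall()

# Follow one request across services
timeline = conn.execute(
    "SELECT service, message, time FROM logs WHERE request_id = ? ORDER BY time",
    ("req-12345",),
).fetchall()
```

In this sample timeline, the `orders` timeout precedes the gateway's 500, which is the kind of ordering the request-ID query is meant to surface.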

Stack Trace Analysis (--trace)

Parse Stack Traces

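```python
import re

# Matches JavaScript/Java-style traces: an "XxxError: msg" or "XxxException: msg"
# header followed by one or more indented "at ..." frame lines.
# The last frame may end at end-of-input without a trailing newline.
TRACE_PATTERN = re.compile(
    r'(?P<exception>\w+Error|\w+Exception): (?P<message>.*?)\n'
    r'(?P<trace>(?:[ \t]+at .+(?:\n|$))+)'
)

def parse_stack_trace(log_content: str) -> list[dict]:
    traces = []
    for match in TRACE_PATTERN.finditer(log_content):
        traces.append({
            'type': match.group('exception'),
            'message': match.group('message'),
            # One entry per frame, with indentation stripped
            'trace': [line.strip() for line in match.group('trace').splitlines() if line.strip()],
        })
    return traces
```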

Investigation Checklist

  1. Capture - Get full error message and stack trace
  2. Timestamp - When did it start?
  3. Frequency - How often? Increasing?
  4. Scope - All users, or only specific ones?
  5. Changes - Recent deployments?
  6. Dependencies - External services affected?
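The Timestamp, Frequency, and Scope items can often be answered directly from structured error records. A minimal sketch, assuming each record is a dict with an ISO-8601 `time` string and an optional `user_id`; both field names and the `summarize_errors` helper are assumptions about your log schema, not part of this command:

```python
from collections import Counter
from datetime import datetime

def summarize_errors(records: list[dict]) -> dict:
    """Answer: when did it start, how often per hour, and how many users are affected?"""
    if not records:
        return {"first_seen": None, "per_hour": {}, "affected_users": 0}
    times = [datetime.fromisoformat(r["time"]) for r in records]
    per_hour = Counter(t.strftime("%Y-%m-%d %H:00") for t in times)
    users = {r["user_id"] for r in records if r.get("user_id") is not None}
    return {
        "first_seen": min(times).isoformat(sep=" "),
        "per_hour": dict(sorted(per_hour.items())),  # rising counts suggest a worsening issue
        "affected_users": len(users),
    }
```

Comparing `affected_users` against the total number of active users indicates whether the issue hits everyone or a specific subset.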

Output Format

```markdown
## Debug Report

Issue: [Brief description]
Root Cause: [What's actually wrong]

### Evidence

- [Finding 1]
- [Finding 2]

### Fix

[Code or configuration change]

### Verification

[How to confirm the fix works]

### Prevention

[How to prevent this in the future]
```
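If reports are produced by a script rather than written by hand, the template can be filled from a small data structure. A minimal Python sketch; the `DebugReport` class and its field names are hypothetical and simply mirror the template's sections:

```python
from dataclasses import dataclass, field

@dataclass
class DebugReport:
    issue: str
    root_cause: str
    evidence: list[str] = field(default_factory=list)
    fix: str = ""
    verification: str = ""
    prevention: str = ""

    def to_markdown(self) -> str:
        """Render the report with the same sections as the template."""
        evidence = "\n".join(f"- {item}" for item in self.evidence) or "- (none recorded)"
        return "\n".join([
            "## Debug Report",
            "",
            f"Issue: {self.issue}",
            f"Root Cause: {self.root_cause}",
            "",
            "### Evidence",
            "",
            evidence,
            "",
            "### Fix",
            "",
            self.fix,
            "",
            "### Verification",
            "",
            self.verification,
            "",
            "### Prevention",
            "",
            self.prevention,
        ])
```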

Examples

Input: "TypeError: Cannot read property 'map' of undefined"
Action: Trace the undefined value back to where the data should have been initialized, and fix it at the source.

Input: "Tests are failing"
Action: Run the tests, capture the failures, analyze each one, and fix the underlying issues.

Input: /debug --logs "API returning 500 errors"
Action: Search the logs for 500 responses, find the related stack traces, identify the root cause, and check how often the error occurs.

Input: /debug --correlate "intermittent failures"
Action: Run correlation queries to find patterns, identify the affected endpoints, and correlate the failures with other events.
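For intermittent failures, measuring how often the failure actually occurs gives a baseline to compare against after a fix. A minimal sketch that reruns a check many times and reports the failure rate; `failure_rate` is a hypothetical helper, and `check` stands in for whatever reproduces the problem (a test function, an HTTP call):

```python
from collections import Counter
from typing import Callable

def failure_rate(check: Callable[[], None], runs: int = 100) -> tuple[float, Counter]:
    """Run check() `runs` times; return the failure fraction and counts per exception type."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    failures: Counter = Counter()
    for _ in range(runs):
        try:
            check()
        except Exception as exc:  # record any failure and keep going
            failures[type(exc).__name__] += 1
    return sum(failures.values()) / runs, failures
```

Comparing the rate before and after a change provides evidence for the Verify step, rather than relying on a single passing run.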