performance-regression-debugging

Performance Regression Debugging

Overview

Performance regressions occur when code changes degrade application performance. Detecting and resolving them quickly is critical.

When to Use

  • Performance degrades after a deployment
  • Metrics show a negative trend
  • Users complain about slowness
  • A/B testing shows a performance difference between variants
  • Routine performance monitoring

Instructions

1. Detection & Measurement

javascript
// Before: 500ms response time
// After: 1000ms response time (2x slower = regression)

// Capture baseline metrics
const baseline = {
  responseTime: 500,  // ms
  timeToInteractive: 2000,  // ms
  largestContentfulPaint: 1500,  // ms
  memoryUsage: 50,  // MB
  bundleSize: 150  // KB gzipped
};

// Monitor after change
const current = {
  responseTime: 1000,
  timeToInteractive: 4000,
  largestContentfulPaint: 3000,
  memoryUsage: 150,
  bundleSize: 200
};

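// Calculate regression (every metric here is "lower is better")
const regressions = {};
for (const metric of Object.keys(baseline)) {
  // Relative change versus the baseline, e.g. 1.0 means 100% worse
  const change = (current[metric] - baseline[metric]) / baseline[metric];
  if (change > 0.1) {  // >10% degradation
    regressions[metric] = {
      baseline: baseline[metric],
      current: current[metric],
      percentChange: (change * 100).toFixed(1) + '%',
      severity: change > 0.5 ? 'Critical' : 'High'
    };
  }
}

// Results:
// responseTime: 500ms → 1000ms (100.0% slower = Critical)
// timeToInteractive: 2000ms → 4000ms (100.0% slower = Critical)
// largestContentfulPaint: 1500ms → 3000ms (100.0% slower = Critical)
// memoryUsage: 50MB → 150MB (200.0% higher = Critical)
// bundleSize: 150KB → 200KB (33.3% larger = High)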
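
The baseline and current numbers above have to come from somewhere, and a single timing is usually too noisy to compare. A minimal sketch of one way to collect them, assuming a runtime with a global `performance.now()` (modern browsers, Node.js 16+): time the operation several times and keep the median. `measureMedianMs` and `median` are hypothetical helper names, not part of any library.

```javascript
// Hypothetical helper: median of a list of numbers.
// The median is less sensitive to one-off outliers (GC pauses, cold caches)
// than the mean.
function median(values) {
  if (values.length === 0) {
    throw new Error('median() needs at least one sample');
  }
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Hypothetical helper: run `fn` (sync or async) `runs` times and
// return the median wall-clock duration in milliseconds.
async function measureMedianMs(fn, runs = 5) {
  const samples = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    await fn();
    samples.push(performance.now() - start);
  }
  return median(samples);
}
```

Running the same helper against the old and the new build, on the same machine and with the same input, gives a pair of numbers that can be fed into the comparison loop.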

2. Root Cause Identification

yaml
Systematic Search:

Step 1: Identify Changed Code
  - Check git commits between versions
  - Review code review comments
  - Identify risky changes
  - Prioritize by likelihood

Step 2: Binary Search (Bisect)
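  - Start with the suspected change
  - Disable (or revert) the change
  - Re-measure performance
  - If performance improves → this change is the issue
  - If not → disable other changes, or bisect the commit range:

  git bisect start
  git bisect bad HEAD
  git bisect good v1.0.0
  # Git checks out a midpoint commit; measure it, then mark it
  # with `git bisect good` or `git bisect bad` and repeat
  # Once the first bad commit is reported: git bisect reset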

Step 3: Profile the Change
  - Run profiler on old vs new code
  - Compare flame graphs
  - Identify expensive functions
  - Check allocation patterns

Step 4: Analyze Impact
  - Code review the change
  - Understand what changed
  - Check for O(n²) algorithms
  - Look for new database queries
  - Check for missing indexes

---

Common Regressions:

N+1 Query:
  Before: 1 query (10ms)
  After: 1000 queries (1000ms)
  Cause: Removed JOIN, now querying inside a loop
  Fix: Restore JOIN or use eager loading

Missing Index:
  Before: Index Scan (10ms)
  After: Seq Scan (500ms)
  Cause: New filter column, no index
  Fix: Add an index on the new filter column

Memory Leak:
  Before: 50MB memory
  After: 500MB after 1 hour
  Cause: Listener not removed, cache grows without bound
  Fix: Remove listeners on teardown and bound the cache size

Bundle Size:
  Before: 150KB gzipped
  After: 250KB gzipped
  Cause: Added a library without tree-shaking
  Fix: Use a lighter alternative or split the bundle

Algorithm Efficiency:
  Before: O(n) = 1ms for 1000 items
  After: O(n²) = 1000ms for 1000 items
  Cause: Nested loops added
  Fix: Use better algorithm
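
To make the algorithm-efficiency case concrete, here is a small sketch of how a nested scan can slip in and how a hash-based lookup removes it. The function names and the `orders` / `blockedCustomerIds` data shapes are made up for illustration.

```javascript
// Regressed version: Array.prototype.includes scans the whole ID list
// for every order, so the work is O(orders × blockedIds).
function findBlockedOrdersSlow(orders, blockedCustomerIds) {
  return orders.filter((order) =>
    blockedCustomerIds.includes(order.customerId)
  );
}

// Fixed version: build a Set once, then each membership check is O(1)
// on average, so the work is O(orders + blockedIds).
function findBlockedOrdersFast(orders, blockedCustomerIds) {
  const blocked = new Set(blockedCustomerIds);
  return orders.filter((order) => blocked.has(order.customerId));
}
```

Both functions return the same orders in the same sequence, so the fix can be verified by comparing outputs on a sample before measuring the speedup.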

3. Fixing & Verification

yaml
Fix Process:

1. Understand the Problem
  - Profile and identify exactly what's slow
  - Measure impact quantitatively
  - Understand root cause

2. Implement Fix
  - Make minimal changes
  - Don't introduce new issues
  - Test locally first
  - Measure improvement

3. Verify Fix
  - Run the same measurement
  - Confirm the regression is gone
  - Ensure no new issues were introduced
  - Compare metrics against the baseline

  Before regression: 500ms
  After regression: 1000ms
  After fix: 550ms (acceptable, minor overhead)

4. Prevent Recurrence
  - Add performance test
  - Set performance budget
  - Alert on regressions
  - Code review for perf
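
The "Verify Fix" step can be reduced to a small check: compare the post-fix measurement with the pre-regression baseline and accept it only if it stays within a tolerance. A minimal sketch, assuming the same 10% threshold used in the detection step; `verifyFix` is a hypothetical name.

```javascript
// Hypothetical helper: decide whether a fix brings a "lower is better"
// metric back within `tolerance` (a fraction, 0.1 = 10%) of the baseline.
function verifyFix(baselineValue, fixedValue, tolerance = 0.1) {
  const change = (fixedValue - baselineValue) / baselineValue;
  return {
    change,
    percentChange: (change * 100).toFixed(1) + '%',
    passed: change <= tolerance,
  };
}

// Figures from the example above: 500ms baseline, 1000ms regressed, 550ms fixed.
const stillRegressed = verifyFix(500, 1000); // passed: false
const afterFix = verifyFix(500, 550);        // passed: true (within 10%)
```

The tolerance is a judgment call: a fix that lands slightly above the baseline may be acceptable if the remaining overhead is understood and documented.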

4. Prevention Measures

yaml
Performance Testing:

Baseline Testing:
  - Establish baseline metrics
  - Record for each release
  - Track trends over time
  - Alert on degradation

Load Testing:
  - Test with realistic load
  - Measure under stress
  - Identify bottlenecks
  - Catch regressions

Performance Budgets:
  - Set max bundle size
  - Set max response time
  - Set max LCP/FCP
  - Enforce in CI/CD

Monitoring:
  - Track real user metrics
  - Alert on degradation
  - Compare releases
  - Analyze trends

---

Checklist:

[ ] Baseline metrics established
[ ] Regression detected and measured
[ ] Changed code identified
[ ] Root cause found (code, data, infra)
[ ] Fix implemented
[ ] Fix verified
[ ] No new issues introduced
[ ] Performance test added
[ ] Budget set
[ ] Monitoring updated
[ ] Team notified
[ ] Prevention measures in place
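
One way to enforce the performance budgets listed above in CI/CD is a script that compares measured metrics with fixed limits and fails the build on any violation. The sketch below is illustrative only: `checkBudgets` is a hypothetical function, and the limits in `exampleBudgets` are placeholder values, not recommendations.

```javascript
// Placeholder budgets (assumed values): KB gzipped for bundle size,
// milliseconds for the timing metrics.
const exampleBudgets = {
  bundleSize: 170,
  responseTime: 600,
  largestContentfulPaint: 2500,
};

// Hypothetical helper: return one entry per budget that is exceeded
// or that has no measurement at all.
function checkBudgets(metrics, budgets) {
  const violations = [];
  for (const [name, limit] of Object.entries(budgets)) {
    const value = metrics[name];
    if (value === undefined) {
      // A missing measurement is treated as a failure so gaps are noticed.
      violations.push({ name, limit, value: null, reason: 'missing metric' });
    } else if (value > limit) {
      violations.push({ name, limit, value, reason: 'over budget' });
    }
  }
  return violations;
}
```

In a Node.js CI step, the caller could print the violations and set `process.exitCode = 1` when the returned array is non-empty, so the pipeline fails without aborting mid-output.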

Key Points

  • Establish baseline metrics for comparison
  • Use binary search to find culprit commits
  • Profile to identify exact bottleneck
  • Measure before/after fix
  • Add performance regression tests
  • Set and enforce performance budgets
  • Monitor production metrics
  • Alert on significant degradation
  • Document root cause
  • Prevent through code review