Performance Regression Debugging

Overview

Performance regressions occur when code changes degrade application performance. Detecting them early and resolving them quickly limits the impact on users.

When to Use

Performance degrades after a deployment
Metrics show a negative trend
Users complain about slowness
A/B testing shows a performance difference between variants
Regular performance monitoring

Instructions

1. Detection & Measurement

javascript

// Before: 500ms response time
// After: 1000ms response time (2x slower = regression)

// Capture baseline metrics
const baseline = {
  responseTime: 500,  // ms
  timeToInteractive: 2000,  // ms
  largestContentfulPaint: 1500,  // ms
  memoryUsage: 50,  // MB
  bundleSize: 150  // KB gzipped
};

// Monitor after change
const current = {
  responseTime: 1000,
  timeToInteractive: 4000,
  largestContentfulPaint: 3000,
  memoryUsage: 150,
  bundleSize: 200
};

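// Calculate regression: flag any metric that grew by more than 10%
const regressions = {};
for (const metric of Object.keys(baseline)) {
  const change = (current[metric] - baseline[metric]) / baseline[metric];
  if (change > 0.1) {  // >10% degradation
    regressions[metric] = {
      baseline: baseline[metric],
      current: current[metric],
      percentChange: (change * 100).toFixed(1) + '%',
      severity: change > 0.5 ? 'Critical' : 'High'  // >50% is Critical
    };
  }
}

// Results (every metric exceeds the 10% threshold):
// responseTime: 500ms → 1000ms (+100.0%, Critical)
// timeToInteractive: 2000ms → 4000ms (+100.0%, Critical)
// largestContentfulPaint: 1500ms → 3000ms (+100.0%, Critical)
// memoryUsage: 50MB → 150MB (+200.0%, Critical)
// bundleSize: 150KB → 200KB (+33.3%, High)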

2. Root Cause Identification

yaml

Systematic Search:

Step 1: Identify Changed Code
  - Check git commits between versions
  - Review code review comments
  - Identify risky changes
  - Prioritize by likelihood

Step 2: Binary Search (Bisect)
  - Start with suspected change
  - Disable the change
  - Re-measure performance
  - If improves → this is the issue
  - If not → disable other changes

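  git bisect start
  git bisect bad HEAD
  git bisect good v1.0.0
  # Test each commit git checks out, then mark it:
  #   git bisect good   (performance acceptable)
  #   git bisect bad    (regression present)
  # When finished, return to the original branch:
  #   git bisect reset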
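
The bisect step can be automated with `git bisect run`, which treats exit code 0 as good, 125 as "skip this commit", and any other code from 1 to 127 as bad. The following is a minimal sketch of a check script, not a definitive implementation: the file name `check-perf.js`, the placeholder `workload` function, and the 750 ms budget (chosen between the 500 ms baseline and the 1000 ms regression) are all assumptions for illustration.

```javascript
// check-perf.js (hypothetical name): reports 0 if fast enough, 1 if regressed.

// Placeholder workload (assumption); replace with the code path under suspicion.
function workload() {
  let sum = 0;
  for (let i = 0; i < 1e5; i++) sum += i % 7;
  return sum;
}

// Median of several timed runs, to dampen noise from any single measurement.
function medianRuntimeMs(fn, runs = 5) {
  const samples = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    fn();
    samples.push(performance.now() - start);
  }
  samples.sort((a, b) => a - b);
  return samples[Math.floor(samples.length / 2)];
}

// Map a measurement to the exit code that git bisect run expects.
function bisectExitCode(measuredMs, budgetMs) {
  return measuredMs <= budgetMs ? 0 : 1;
}

const BUDGET_MS = 750; // assumed threshold, not from the original measurements
process.exitCode = bisectExitCode(medianRuntimeMs(workload), BUDGET_MS);
```

After `git bisect start`, `git bisect bad HEAD`, and `git bisect good v1.0.0`, running `git bisect run node check-perf.js` lets git test each candidate commit without manual marking.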

Step 3: Profile the Change
  - Run profiler on old vs new code
  - Compare flame graphs
  - Identify expensive functions
  - Check allocation patterns

Step 4: Analyze Impact
  - Code review the change
  - Understand what changed
  - Check for O(n²) algorithms
  - Look for new database queries
  - Check for missing indexes

---

Common Regressions:

N+1 Query:
  Before: 1 query (10ms)
  After: 1000 queries (1000ms)
  Cause: Removed JOIN, now querying inside a loop
  Fix: Restore JOIN or use eager loading
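
To make the N+1 pattern concrete, here is a small sketch that counts round trips against an in-memory stand-in for a database. The `createFakeDb` helper, its `getAuthor` / `getAuthors` methods, and the sample `posts` data are hypothetical and exist only for illustration. The initial query that loads the posts (the "1" in N+1) is omitted.

```javascript
// In-memory stand-in for a database that counts round trips (hypothetical helper).
function createFakeDb() {
  const authors = new Map([[1, 'Ada'], [2, 'Grace'], [3, 'Linus']]);
  const db = {
    queryCount: 0,
    // One round trip per author id.
    getAuthor(id) {
      db.queryCount++;
      return authors.get(id);
    },
    // One round trip for any number of ids (similar to WHERE id IN (...) or a JOIN).
    getAuthors(ids) {
      db.queryCount++;
      return new Map(ids.map((id) => [id, authors.get(id)]));
    },
  };
  return db;
}

const posts = [
  { title: 'A', authorId: 1 },
  { title: 'B', authorId: 2 },
  { title: 'C', authorId: 1 },
  { title: 'D', authorId: 3 },
];

// Regressed version: one query per post.
function authorsPerPostLooping(db, posts) {
  return posts.map((p) => db.getAuthor(p.authorId));
}

// Fixed version: collect distinct ids, fetch once, then join in memory.
function authorsPerPostBatched(db, posts) {
  const ids = [...new Set(posts.map((p) => p.authorId))];
  const byId = db.getAuthors(ids);
  return posts.map((p) => byId.get(p.authorId));
}

const loopDb = createFakeDb();
const batchDb = createFakeDb();
const looped = authorsPerPostLooping(loopDb, posts);
const batched = authorsPerPostBatched(batchDb, posts);
console.log(loopDb.queryCount, batchDb.queryCount); // 4 1
```

Both versions return the same authors; only the number of round trips differs, which is why this kind of regression often goes unnoticed in functional tests.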

Missing Index:
  Before: Index Scan (10ms)
  After: Seq Scan (500ms)
  Cause: New filter column, no index
  Fix: Add an index on the filtered column

Memory Leak:
  Before: 50MB memory
  After: 500MB after 1 hour
  Cause: Listener not removed, cache grows without bound
  Fix: Remove listeners on teardown; bound or evict the cache
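
The listener case can be sketched with Node's `EventEmitter`. The `bus` object, the `config-changed` event name, and both handler functions are assumptions made for this example; the point is only that a listener registered on a long-lived object stays reachable until it is explicitly removed.

```javascript
const { EventEmitter } = require('node:events');

const bus = new EventEmitter(); // long-lived object shared across requests

// Leaky: every call adds a listener that is never removed.
function handleRequestLeaky(id) {
  bus.on('config-changed', () => `request ${id} saw a change`);
}

// Fixed: return a cleanup function and call it when the work is finished.
function handleRequestClean(id) {
  const onChange = () => `request ${id} saw a change`;
  bus.on('config-changed', onChange);
  return () => bus.off('config-changed', onChange);
}

for (let i = 0; i < 5; i++) handleRequestLeaky(i);
const leakedCount = bus.listenerCount('config-changed'); // 5 listeners still attached

bus.removeAllListeners('config-changed'); // reset before the second experiment
for (let i = 0; i < 5; i++) {
  const cleanup = handleRequestClean(i);
  cleanup(); // simulate the end of the request
}
const cleanCount = bus.listenerCount('config-changed'); // 0 listeners remain

console.log({ leakedCount, cleanCount });
```

Tracking `listenerCount` over time in a test is one inexpensive way to catch this class of leak before it shows up as memory growth.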

Bundle Size:
  Before: 150KB gzipped
  After: 250KB gzipped
  Cause: Added a library that is not tree-shaken
  Fix: Use a lighter alternative or split the bundle

Algorithm Efficiency:
  Before: O(n) = 1ms for 1000 items
  After: O(n²) = 1000ms for 1000 items
  Cause: Nested loops added
  Fix: Replace the inner loop with a hash-based lookup or another lower-complexity algorithm
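
As an illustration of the algorithmic case, the sketch below solves the same task (finding duplicate values) with nested loops and with a single pass over a `Set`. The function names and the sample data are chosen for this example and do not come from the original.

```javascript
// O(n²): for each item, scan all later items for a match.
function findDuplicatesQuadratic(items) {
  const dupes = new Set();
  for (let i = 0; i < items.length; i++) {
    for (let j = i + 1; j < items.length; j++) {
      if (items[i] === items[j]) dupes.add(items[i]);
    }
  }
  return [...dupes].sort((a, b) => a - b);
}

// O(n) on average: one pass, remembering values already seen.
// (The final sort only touches the duplicates and is there for stable output.)
function findDuplicatesLinear(items) {
  const seen = new Set();
  const dupes = new Set();
  for (const item of items) {
    if (seen.has(item)) dupes.add(item);
    else seen.add(item);
  }
  return [...dupes].sort((a, b) => a - b);
}

const sample = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3];
console.log(findDuplicatesQuadratic(sample)); // [ 1, 3, 5 ]
console.log(findDuplicatesLinear(sample));    // [ 1, 3, 5 ]
```

Keeping the slow version around in a test as a reference implementation is a simple way to confirm that the faster replacement returns identical results.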

3. Fixing & Verification

yaml

Fix Process:

1. Understand the Problem
  - Profile and identify exactly what's slow
  - Measure impact quantitatively
  - Understand root cause

2. Implement Fix
  - Make minimal changes
  - Don't introduce new issues
  - Test locally first
  - Measure improvement

3. Verify Fix
  - Run the same measurement used to detect the regression
  - Confirm the regression is gone
  - Ensure no new issues
  - Compare metrics against the baseline

  Before regression: 500ms
  After regression: 1000ms
  After fix: 550ms (acceptable, minor overhead)

4. Prevent Recurrence
  - Add performance test
  - Set performance budget
  - Alert on regressions
  - Code review for perf
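
The verification step can be expressed as a small calculation. This is a sketch under stated assumptions: `verifyFix` is a hypothetical helper, the 15% tolerance is an arbitrary example value, and the function assumes the regressed measurement is higher than the baseline.

```javascript
// Classify a post-fix measurement relative to the original baseline.
// tolerance: fraction of overhead above baseline that is still accepted (assumed 15%).
function verifyFix(baselineMs, regressedMs, fixedMs, tolerance = 0.15) {
  const overhead = (fixedMs - baselineMs) / baselineMs;
  // Share of the regression that the fix removed (requires regressedMs > baselineMs).
  const recovered = (regressedMs - fixedMs) / (regressedMs - baselineMs);
  return {
    overheadPercent: Math.round(overhead * 100),
    recoveredPercent: Math.round(recovered * 100),
    accepted: overhead <= tolerance,
  };
}

console.log(verifyFix(500, 1000, 550));
// { overheadPercent: 10, recoveredPercent: 90, accepted: true }
```

With the numbers from the fix process (500 ms before, 1000 ms regressed, 550 ms after the fix), the fix recovers 90% of the lost time and leaves 10% overhead, which falls within the assumed tolerance.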

4. Prevention Measures

yaml

Performance Testing:

Baseline Testing:
  - Establish baseline metrics
  - Record for each release
  - Track trends over time
  - Alert on degradation

Load Testing:
  - Test with realistic load
  - Measure under stress
  - Identify bottlenecks
  - Catch regressions

Performance Budgets:
  - Set max bundle size
  - Set max response time
  - Set max LCP/FCP
  - Enforce in CI/CD
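
One way to enforce budgets in CI is a script that compares measured values against fixed limits and fails the job on any violation. The sketch below assumes the measurements are already available as a plain object; the metric names, the limits, and the `findBudgetViolations` helper are hypothetical and not recommendations.

```javascript
// Hypothetical budgets; the limits are illustrative only.
const budgets = {
  bundleSizeKb: 170,
  responseTimeMs: 600,
  largestContentfulPaintMs: 2500,
};

// Return one entry per metric that exceeds its budget.
// Note: a metric missing from `measured` is not reported as a violation.
function findBudgetViolations(measured, limits) {
  return Object.entries(limits)
    .filter(([metric, limit]) => measured[metric] > limit)
    .map(([metric, limit]) => ({ metric, limit, actual: measured[metric] }));
}

const measured = {
  bundleSizeKb: 200,
  responseTimeMs: 550,
  largestContentfulPaintMs: 1500,
};

const violations = findBudgetViolations(measured, budgets);
for (const v of violations) {
  console.log(`${v.metric}: ${v.actual} exceeds budget ${v.limit}`);
}
// prints "bundleSizeKb: 200 exceeds budget 170"

// In a CI job, a non-zero exit code would fail the build, for example:
//   process.exitCode = violations.length > 0 ? 1 : 0;
```

Keeping the budgets in one object makes each limit change visible in code review, which ties the budget to the review practice listed under prevention.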

Monitoring:
  - Track real user metrics
  - Alert on degradation
  - Compare releases
  - Analyze trends

---

Checklist:

[ ] Baseline metrics established
[ ] Regression detected and measured
[ ] Changed code identified
[ ] Root cause found (code, data, infra)
[ ] Fix implemented
[ ] Fix verified
[ ] No new issues introduced
[ ] Performance test added
[ ] Budget set
[ ] Monitoring updated
[ ] Team notified
[ ] Prevention measures in place

Key Points

Establish baseline metrics for comparison
Use binary search to find culprit commits
Profile to identify exact bottleneck
Measure before/after fix
Add performance regression tests
Set and enforce performance budgets
Monitor production metrics
Alert on significant degradation
Document root cause
Prevent through code review
