performance-profiling
Performance Profiling
Find where your application actually spends time before touching a line of code. Covers the full stack: Node.js CPU and memory profiling, browser flame graphs, React render profiling, and database query analysis. The discipline here is profile first, optimize second — premature optimization is not a workflow, it is a guess.
When to Use
Use for:
- Diagnosing slow Node.js applications (CPU-bound, I/O-bound, memory pressure)
- Generating and reading flame graphs to find hot code paths
- Detecting memory leaks via heap snapshots and growth trends
- Profiling React component render performance with React Profiler
- Measuring browser rendering performance (Core Web Vitals, layout thrashing, long tasks)
- Database query profiling with EXPLAIN ANALYZE
- Measuring event loop utilization and latency
NOT for:
- Infrastructure monitoring, distributed tracing, or log aggregation (use logging-observability)
- Load testing and capacity planning (a separate domain)
- Network latency analysis between services (use distributed tracing tools)
- Database schema design optimization (separate from query profiling)
Core Decision: Where Is My App Slow?
```mermaid
flowchart TD
    Start[App is slow. Where?] --> Layer{Which layer?}
    Layer -->|Backend| Backend{What kind?}
    Layer -->|Frontend/browser| Browser{What symptom?}
    Layer -->|Unknown| Measure[Instrument first — add timing logs]
    Backend -->|CPU pegged, slow responses| CPU[CPU Profiling]
    Backend -->|Memory growing, crashes| Mem[Memory / Heap Profiling]
    Backend -->|Fast CPU, slow I/O| IO{I/O type?}
    IO -->|Database queries| DB[EXPLAIN ANALYZE + query profiler]
    IO -->|Network calls| Network[Trace external calls, add timeouts]
    IO -->|File system| FS[Check event loop utilization]
    Browser -->|Slow initial load| Lighthouse[Lighthouse + bundle analysis]
    Browser -->|Janky scrolling, animations| Rendering[Chrome Performance tab — layout thrashing]
    Browser -->|Slow after interaction| React{React app?}
    React -->|Yes| ReactProfiler[React Profiler + why-did-you-render]
    React -->|No| JS[Chrome Performance — long tasks, main thread blocking]
    CPU --> FlameGraph[Generate flame graph with 0x or clinic flame]
    Mem --> HeapSnap[Take heap snapshots before/after suspected leak]
    FS --> ELU[clinic bubbles — event loop utilization]
```
Node.js: CPU Profiling
V8 Inspector (Built-in)
```bash
# Attach inspector and capture a CPU profile
node --inspect src/index.js

# Or start paused and wait for DevTools
node --inspect-brk src/index.js
```
Then open `chrome://inspect` in Chrome, click the target, go to the **Profiler** tab, and record while sending load to the server.
0x: Flame Graphs from the Terminal
```bash
npm install -g 0x

# Profile a script (runs it, generates flame graph)
0x -- node src/index.js

# Profile with a load generator running simultaneously
0x -- node src/server.js &
npx autocannon -d 30 http://localhost:3000/api/heavy
```
0x generates an interactive HTML flame graph. The **widest stacks** are where time is spent. Look for:
- Functions that appear wide near the bottom (called frequently by everything)
- Unexpected width in library code (serialization, template engines, parsers)
- Idle / `[idle]` blocks — I/O wait, not CPU (look elsewhere for those)
Clinic.js Suite
```bash
npm install -g clinic

# Doctor: overview of what is wrong
clinic doctor -- node src/server.js

# Flame: CPU flame graph (wraps 0x)
clinic flame -- node src/server.js

# Bubbles: event loop utilization
clinic bubbles -- node src/server.js
```
Clinic Doctor gives you a triage view: CPU, memory, event loop, and handles. Start here when you do not know what kind of bottleneck you have.
Event Loop Utilization (ELU)
```js
const { performance } = require('perf_hooks');

// Sample ELU every 5 seconds
let last = performance.eventLoopUtilization();
setInterval(() => {
  const current = performance.eventLoopUtilization();
  const diff = performance.eventLoopUtilization(current, last);
  console.log(`ELU: ${(diff.utilization * 100).toFixed(1)}%`);
  last = current;
}, 5000);
```
ELU above 80% means the event loop is saturated — CPU-bound work or sync blocking. ELU near 0% with slow responses means I/O wait (network, disk, database).
Node.js: Memory Profiling
Heap Snapshots
```bash
# Take heap snapshot via CLI
node --inspect src/index.js
# In chrome://inspect → Memory tab → Take Heap Snapshot
```
**Three-snapshot technique for leak detection**:
1. Snapshot after startup (baseline)
2. Snapshot after N requests (warm)
3. Snapshot after 2N requests (growth)
Compare Snapshot 3 to Snapshot 2 — objects that grew proportionally to request count are leaking.
Common Leak Patterns
Closure captures — variables captured in long-lived closures that should have been released:
```js
// LEAK: handler is registered but never removed
emitter.on('data', (chunk) => {
  processedData.push(chunk); // processedData grows unbounded
});

// FIX: remove listener when done, or use once()
emitter.once('data', handler);
// or
const handler = (chunk) => { /* ... */ };
emitter.on('data', handler);
// later:
emitter.off('data', handler);
```
Growing caches without eviction:
```js
// LEAK: cache grows forever
const cache = new Map();
app.get('/user/:id', async (req, res) => {
  if (!cache.has(req.params.id)) {
    cache.set(req.params.id, await db.getUser(req.params.id));
  }
  res.json(cache.get(req.params.id));
});

// FIX: use LRU cache with max size
const LRU = require('lru-cache');
const cache = new LRU({ max: 1000, ttl: 1000 * 60 * 5 });
```
WeakRef and FinalizationRegistry (for intentional weak references):
```js
const cache = new Map();
// One registry for the whole cache; creating a new registry per call
// would leak registries and each could be collected before it fires
const registry = new FinalizationRegistry((key) => cache.delete(key));

function cacheValue(key, obj) {
  cache.set(key, new WeakRef(obj));
  registry.register(obj, key);
}
```
Anti-Pattern: Optimizing Without Profiling
Novice: "This function looks expensive, I'll rewrite it in a more efficient algorithm."
Expert: Rewrote the wrong function. Profiling would have shown that this function is called once per startup and contributes 0.1% of runtime. The actual bottleneck was JSON serialization in the response handler, called 10,000 times per second. Optimization effort must follow measurement, never intuition.
Detection: The "optimized" code is measurably faster in microbenchmark isolation but production p99 latency is unchanged.
Anti-Pattern: Micro-Benchmarking in Isolation
Novice: Writes a benchmark comparing two sorting algorithms on an array of 1000 items, concludes Algorithm B is 2x faster, rewrites production code.
Expert: Micro-benchmarks measure JIT-compiled hot paths under artificial conditions. Real workloads have different data shapes, mixed call patterns, GC pressure, and I/O interspersed. The JIT may optimize the benchmark differently than the real call site. Profile the actual application under real load — or at minimum, profile with realistic data shapes and call patterns embedded in the actual application code path.
The test: Does your benchmark run in a tight loop 10,000 times before measuring? If yes, V8 has JIT-compiled it differently than it will compile the real code, which runs cold at startup and is called with varied inputs.
React Rendering Performance
React Profiler (DevTools)
- Open React DevTools → Profiler tab
- Click "Record"
- Perform the slow interaction
- Stop recording
- Examine the flame chart — bars represent components, width represents render time
Key columns: "Why did this render?" shows which prop or state change triggered each render.
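The same data is available programmatically via React's `<Profiler>` component; the JSX is shown as a comment (it assumes a React project), and the `onRender` callback parameters below follow the documented signature:

```js
// In a React tree (JSX, assumes react is installed):
//   <Profiler id="UserList" onRender={onRender}>
//     <UserList />
//   </Profiler>

// Called on every commit of the wrapped subtree
function onRender(id, phase, actualDuration, baseDuration, startTime, commitTime) {
  // Flag commits that blow the ~16ms frame budget
  if (actualDuration > 16) {
    console.warn(`${id} (${phase}) rendered in ${actualDuration.toFixed(1)}ms`);
  }
}
```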
why-did-you-render
```bash
npm install @welldone-software/why-did-you-render
```
```js
// src/wdyr.js (import before React)
import React from 'react';

if (process.env.NODE_ENV === 'development') {
  const whyDidYouRender = require('@welldone-software/why-did-you-render');
  whyDidYouRender(React, { trackAllPureComponents: true });
}
```
```js
// Mark a specific component for tracking
MyExpensiveComponent.whyDidYouRender = true;
```
This logs to the console every time a component re-renders with the same props — exposing unnecessary renders caused by reference equality failures.
Common React Performance Patterns
```js
// Memoize expensive components
const ExpensiveList = React.memo(({ items, onSelect }) => {
  return items.map(item => <Item key={item.id} item={item} onSelect={onSelect} />);
});

// Stable callback references — prevent re-renders downstream
const handleSelect = useCallback((id) => {
  setSelected(id);
}, []); // no deps: stable forever

// Memoize expensive computations
const sortedItems = useMemo(() => {
  return [...items].sort((a, b) => a.name.localeCompare(b.name));
}, [items]);

// Virtualize long lists
import { FixedSizeList } from 'react-window';

<FixedSizeList height={600} itemCount={items.length} itemSize={50} width="100%">
  {({ index, style }) => <Row item={items[index]} style={style} />}
</FixedSizeList>
```
Database Query Profiling
PostgreSQL EXPLAIN ANALYZE
```sql
-- Wrap any query in EXPLAIN (ANALYZE, BUFFERS) to see execution plan
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT u.*, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > NOW() - INTERVAL '30 days'
GROUP BY u.id;
```
Read the output bottom-up. Each node shows:
- `actual time=X..Y` — startup time to first row, total time for all rows
- `rows=N` — actual rows returned
- `loops=N` — how many times this node executed
Red flags:
- `Seq Scan` on large tables — missing index
- Estimated `rows=1000` vs actual `rows=1` — stale statistics, run `ANALYZE`
- `Hash Join` with large hash batches — memory pressure, tune `work_mem`
- `Nested Loop` on large outer result — cartesian product risk
Finding Slow Queries in Production
```sql
-- Enable pg_stat_statements extension
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 slowest queries by total time
SELECT
  query,
  calls,
  total_exec_time / 1000 AS total_seconds,
  mean_exec_time AS mean_ms,
  rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```
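pg_stat_statements aggregates on the server; it can also help to log slow queries from the application side. A sketch of a generic wrapper, where the promise-returning `queryFn(sql, params)` shape is an assumption rather than any specific client's API:

```js
// Wrap a promise-returning query function to log anything over a threshold
function withSlowQueryLog(queryFn, thresholdMs = 100) {
  return async (sql, params) => {
    const start = process.hrtime.bigint();
    try {
      return await queryFn(sql, params);
    } finally {
      const ms = Number(process.hrtime.bigint() - start) / 1e6;
      if (ms > thresholdMs) {
        console.warn(`slow query ${ms.toFixed(0)}ms: ${sql}`);
      }
    }
  };
}

// Hypothetical usage with a fake client standing in for a real db.query
const query = withSlowQueryLog(async (sql) => [{ sql }], 100);
```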
Browser Profiling
See references/browser-profiling.md for the full Chrome Performance tab workflow, Core Web Vitals measurement, and layout thrashing diagnosis.
Bottleneck Classification Rules
When the user provides profiling data, classify and rank bottlenecks using these rules. Process signals in priority order — higher-priority signals override lower ones.
Priority 1: Database (check first — it's the bottleneck 70% of the time)
| Signal | Classification |
|---|---|
| Any query >500ms | Critical |
| Multiple queries >100ms per request | High |
| Query count >20 per page load | High |
Priority 2: Event Loop (Node.js-specific — the most underdiagnosed bottleneck)
| Signal | Classification |
|---|---|
| ELU >0.8 | Critical |
| ELU >0.5 with slow p99 latency | High |
| ELU <0.2 with slow responses | This is NOT a CPU problem. |
Priority 3: Memory
| Signal | Classification |
|---|---|
| Heap growth rate >10MB/min sustained | Critical |
| Heap growth proportional to request rate (resets on GC) | Medium |
| Large retained objects in heap analysis | List each suspect with its retained size; High if any single object retains >50MB. Next step: trace the retainer tree to find why it's not being collected. |
Priority 4: React Rendering (frontend)
| Signal | Classification |
|---|---|
| Component render time >16ms | High |
| >5 re-renders per user interaction | Medium |
| Large component tree (>500 components mounted) | Medium |
Priority 5: CPU (non-event-loop)
| Signal | Classification |
|---|---|
| Single function >30% of CPU profile | High |
| Flame graph shows wide, flat profile (no single hot function) | Medium |
Output Ranking
After classifying all signals, rank the bottleneck list by:
- Severity (critical first)
- Actionability (clear next step ranks higher than vague "investigate further")
- Estimated impact — "Adding an index will reduce this query from 800ms to 5ms" is more useful than "This might help"
Always include estimatedImpact as a concrete prediction: "Eliminating N+1 queries should reduce request count from 47 to 3, cutting endpoint latency by ~200ms" — not "should improve performance."
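The ranking rules can be sketched as a small comparator; the signal shape (`severity`, `actionable`, `estimatedImpact`) is an assumed structure for illustration, not a defined schema:

```js
const severityRank = { critical: 0, high: 1, medium: 2 };

// Sort classified signals: severity first, then actionability
function rankBottlenecks(signals) {
  return [...signals].sort((a, b) =>
    severityRank[a.severity] - severityRank[b.severity] ||
    Number(b.actionable) - Number(a.actionable)
  );
}

const ranked = rankBottlenecks([
  { name: 'wide flat flame graph', severity: 'medium', actionable: false },
  { name: 'query >500ms', severity: 'critical', actionable: true,
    estimatedImpact: 'index cuts this query from 800ms to 5ms' },
  { name: 'ELU 0.6 with slow p99', severity: 'high', actionable: true },
]);
```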
References
- references/node-profiling.md — Consult for detailed Node.js profiling: --inspect flags, clinic.js commands, heap snapshot analysis, event loop monitoring, stream backpressure diagnosis
- references/browser-profiling.md — Consult for browser performance: Chrome Performance tab workflow, Lighthouse CI integration, React Profiler deep-dive, Core Web Vitals measurement, layout thrashing patterns