
Extreme Software Optimization


The One Rule: Profile first. Prove behavior unchanged. One change at a time.

The Loop (Mandatory)


1. BASELINE    → hyperfine --warmup 3 --runs 10 'command'
2. PROFILE     → cargo flamegraph / py-spy / clinic flame
3. PROVE       → Golden outputs + isomorphism proof per change
4. IMPLEMENT   → Score ≥ 2.0 only, one lever per commit
5. VERIFY      → sha256sum -c golden_checksums.txt
6. REPEAT      → Re-profile (bottlenecks shift)
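When hyperfine is unavailable, the BASELINE step can be approximated in plain Python; a minimal sketch mirroring `--warmup 3 --runs 10` (function and field names are my own, not part of any tool):

```python
import statistics
import time

def baseline(func, warmup=3, runs=10):
    """Time `func` like `hyperfine --warmup 3 --runs 10`: discard
    warmup runs, then report the median and spread of the timed runs."""
    for _ in range(warmup):            # warm caches; timings discarded
        func()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(samples),
        "stdev_s": statistics.stdev(samples),
        "runs": runs,
    }

result = baseline(lambda: sum(range(10_000)))
```

Median, not mean, is the headline number: it is robust to one slow run caused by an unrelated process.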

Opportunity Matrix


| Hotspot | Impact (1-5) | Confidence (1-5) | Effort (1-5) | Score |
|---|---|---|---|---|
| func:line | × | × | ÷ | Impact × Conf / Effort |

Rule: Only implement Score ≥ 2.0
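The Score column is plain arithmetic, but the threshold rule is easy to misapply; a minimal sketch (helper names hypothetical):

```python
def opportunity_score(impact: int, confidence: int, effort: int) -> float:
    """Score = Impact × Confidence ÷ Effort, each rated on a 1-5 scale."""
    for rating in (impact, confidence, effort):
        if not 1 <= rating <= 5:
            raise ValueError("ratings must be on a 1-5 scale")
    return impact * confidence / effort

def worth_implementing(impact, confidence, effort, threshold=2.0):
    """Apply the rule: only implement hotspots scoring >= 2.0."""
    return opportunity_score(impact, confidence, effort) >= threshold

# High impact, decent confidence, moderate effort: 4 * 3 / 2 = 6.0 → implement.
# Low impact, low confidence, high effort:        2 * 2 / 5 = 0.8 → skip.
```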

Isomorphism Proof Template


For EVERY change, document:

Change: [description]


  • Ordering preserved: [yes/no + why]
  • Tie-breaking unchanged: [yes/no + why]
  • Floating-point: [identical/N/A]
  • RNG seeds: [unchanged/N/A]
  • Golden outputs: sha256sum -c golden_checksums.txt ✓
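One way to keep these proofs machine-checkable is a small record type per change; a sketch under my own field names, not a prescribed format:

```python
from dataclasses import dataclass, asdict

@dataclass
class IsomorphismProof:
    """One record per change, mirroring the template fields above."""
    change: str
    ordering_preserved: str        # "yes/no + why"
    tie_breaking_unchanged: str    # "yes/no + why"
    floating_point: str            # "identical" or "N/A"
    rng_seeds: str                 # "unchanged" or "N/A"
    golden_outputs_verified: bool  # sha256sum -c passed

    def complete(self) -> bool:
        # A proof counts only if every field is filled in and goldens pass.
        return all(asdict(self).values()) and self.golden_outputs_verified

proof = IsomorphismProof(
    change="Replace linear scan with HashMap lookup",
    ordering_preserved="yes - results re-sorted by key before output",
    tie_breaking_unchanged="yes - comparator untouched",
    floating_point="N/A",
    rng_seeds="N/A",
    golden_outputs_verified=True,
)
```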

---

Pattern Tiers (Quick Reference)


Tier 1: Low-Hanging Fruit


| Pattern | When | Isomorphism |
|---|---|---|
| N+1 → Batch | Sequential fetches | Same results, fewer round-trips |
| Linear → HashMap | Keyed lookups | O(n)→O(1), order may change |
| Lazy eval | Maybe-unused values | Same final values |
| Memoization | Repeated pure calls | Cached = recomputed |
| Buffer reuse | Alloc per iteration | Zero-copy in loop |
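Of these, memoization has the most mechanical isomorphism check: cached results must equal recomputed ones. A minimal sketch using `functools.lru_cache` on a pure function:

```python
from functools import lru_cache

def fib(n: int) -> int:
    """Pure reference implementation (exponential time)."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n: int) -> int:
    """Same recurrence, memoized: each n computed at most once."""
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

# Isomorphism check: cached = recomputed, over every input we care about.
assert all(fib(n) == fib_memo(n) for n in range(20))
```

The precondition is purity: memoizing a function that reads mutable state or performs I/O breaks the "cached = recomputed" guarantee.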

Tier 2: Algorithmic


| Pattern | Change | Check |
|---|---|---|
| Binary search | O(n)→O(log n) | Sorted input |
| Two-pointer | O(n²)→O(n) | Structured input |
| Prefix sums | O(n)→O(1) query | Static data |
| Priority queue | O(n)→O(log n) | Top-k/scheduling |
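As a worked example of the prefix-sums row: one O(n) precomputation turns every range-sum query into O(1), valid only while the data stays static.

```python
from itertools import accumulate

data = [3, 1, 4, 1, 5, 9, 2, 6]

# One O(n) pass; prefix[i] = sum of data[:i], with prefix[0] = 0.
prefix = [0] + list(accumulate(data))

def range_sum(lo: int, hi: int) -> int:
    """Sum of data[lo:hi] in O(1), valid while `data` stays static."""
    return prefix[hi] - prefix[lo]

# Isomorphism check against the naive O(n) query:
assert all(range_sum(i, j) == sum(data[i:j])
           for i in range(len(data)) for j in range(i, len(data) + 1))
```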

Tier 3: Data Structures


| Structure | Use Case |
|---|---|
| HashMap | Point lookups |
| BTreeMap | Range queries |
| SmallVec | Usually-small collections |
| Arena | Many allocations, bulk free |
| Bloom filter | Membership pre-filter |
Full catalog: TECHNIQUES.md

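A Bloom-filter pre-filter can be sketched with nothing but hashlib; the size and hash count below are illustrative, not tuned:

```python
import hashlib

class BloomFilter:
    """Membership pre-filter: may answer "maybe" for absent keys
    (false positives), never "no" for present keys (no false negatives)."""

    def __init__(self, size_bits: int = 1024, hashes: int = 3):
        self.size = size_bits
        self.hashes = hashes
        self.bits = 0  # a Python int used as an arbitrary-width bitset

    def _positions(self, item: str):
        # Derive `hashes` independent bit positions from salted SHA-256.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        return all(self.bits >> pos & 1 for pos in self._positions(item))

bf = BloomFilter()
for key in ["alice", "bob", "carol"]:
    bf.add(key)
```

The pre-filter only pays off when a "no" answer lets you skip the expensive lookup; every "maybe" still falls through to the real store.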

Language Cheatsheet


| Lang | CPU Profile | Trouble Spot Grep |
|---|---|---|
| Rust | `cargo flamegraph` | `rg '\.clone\(\)' --type rust` |
| Go | `go tool pprof /debug/pprof/profile` | `rg 'interface\{\}' --type go` |
| TS | `clinic flame -- node app.js` | `rg 'JSON\.(parse\|stringify)' --type ts` |
| Python | `py-spy record -o flame.svg -- python script.py` | `rg '\.iterrows\(\)' --type py` |
Full language guides: LANGUAGE-SPECIFIC.md


Anti-Patterns (Never Do)


| ✗ Action | Why |
|---|---|
| Optimize without profiling | Wastes effort on non-hotspots |
| Multiple changes per commit | Can't isolate regressions |
| Assume improvement | Must measure before/after |
| Change behavior "while we're here" | Breaks isomorphism guarantee |
| Skip golden output capture | No regression detection |


Checklist (Before Any Optimization)


  • Baseline captured (p50/p95/p99, throughput, memory)
  • Profiled: hotspot in top 5 by % time
  • Opportunity score ≥ 2.0
  • Golden outputs saved
  • Isomorphism proof written
  • Single lever only
  • Rollback plan:
    git revert <sha>

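Capturing p50/p95/p99 requires fixing one percentile definition up front so before/after numbers are comparable; a minimal nearest-rank sketch (sample data invented):

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: smallest value with at least
    pct% of the samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Invented latency samples (ms); note how a single outlier owns the tail.
latencies_ms = [12, 15, 11, 13, 200, 14, 12, 16, 13, 12]
baseline = {p: percentile(latencies_ms, p) for p in (50, 95, 99)}
```

Whatever definition you pick (nearest-rank, interpolated, your metrics library's default), use the same one for baseline and post-change measurements.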

Tool Commands



Benchmark


hyperfine --warmup 3 --runs 10 'command'

Profile


cargo flamegraph      # Rust CPU
heaptrack ./binary    # Allocation
strace -c ./binary    # Syscalls

Verify


sha256sum golden_outputs/* > golden_checksums.txt
sha256sum -c golden_checksums.txt   # After changes
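Where `sha256sum` is unavailable (e.g. on Windows), the same capture/verify pair can be done with hashlib; a portable sketch with function names of my own:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def capture(golden_dir: Path, manifest: Path) -> None:
    """Rough equivalent of `sha256sum golden_outputs/* > golden_checksums.txt`."""
    lines = [f"{sha256_of(p)}  {p}" for p in sorted(golden_dir.iterdir())]
    manifest.write_text("\n".join(lines) + "\n")

def verify(manifest: Path) -> bool:
    """Rough equivalent of `sha256sum -c`: every file must still match."""
    for line in manifest.read_text().splitlines():
        digest, _, name = line.partition("  ")
        if sha256_of(Path(name)) != digest:
            return False
    return True
```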

---

References


| Need | Reference |
|---|---|
| Complete technique catalog | TECHNIQUES.md |
| Step-by-step methodology | METHODOLOGY.md |
| Language-specific guides | LANGUAGE-SPECIFIC.md |
| Advanced (Round 2+) | ADVANCED.md |

Iteration Rounds


  • Round 1: Standard (N+1, indexes, batching, memoization)
  • Round 2: Algorithmic (DP, convex, semirings) → ADVANCED.md
  • Round 3: Exotic (suffix automata, link-cut trees)
Each round: fresh profile → new hotspots → new matrix.