
Extreme Software Optimization


The One Rule: Profile first. Prove behavior unchanged. One change at a time.

The Loop (Mandatory)


1. BASELINE    → hyperfine --warmup 3 --runs 10 'command'
2. PROFILE     → cargo flamegraph / py-spy / clinic flame
3. PROVE       → Golden outputs + isomorphism proof per change
4. IMPLEMENT   → Score ≥ 2.0 only, one lever per commit
5. VERIFY      → sha256sum -c golden_checksums.txt
6. REPEAT      → Re-profile (bottlenecks shift)
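When hyperfine is unavailable, the BASELINE step can be approximated in plain Python; a minimal sketch mirroring `--warmup 3 --runs 10` (function and field names are my own, not part of any tool):

```python
import statistics
import time

def baseline(func, warmup=3, runs=10):
    """Time `func` like `hyperfine --warmup 3 --runs 10`: discard
    warmup runs, then report the median and spread of the timed runs."""
    for _ in range(warmup):            # warm caches; timings discarded
        func()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(samples),
        "stdev_s": statistics.stdev(samples),
        "runs": runs,
    }

result = baseline(lambda: sum(range(10_000)))
```

Median, not mean, is the headline number: it is robust to one slow run caused by an unrelated process.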

Opportunity Matrix


| Hotspot | Impact (1-5) | Confidence (1-5) | Effort (1-5) | Score |
|---|---|---|---|---|
| func:line | × | × | ÷ | Impact × Conf / Effort |

Rule: Only implement Score ≥ 2.0
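The Score column is plain arithmetic, but the threshold rule is easy to misapply; a minimal sketch (helper names hypothetical):

```python
def opportunity_score(impact: int, confidence: int, effort: int) -> float:
    """Score = Impact × Confidence ÷ Effort, each rated on a 1-5 scale."""
    for rating in (impact, confidence, effort):
        if not 1 <= rating <= 5:
            raise ValueError("ratings must be on a 1-5 scale")
    return impact * confidence / effort

def worth_implementing(impact, confidence, effort, threshold=2.0):
    """Apply the rule: only implement hotspots scoring >= 2.0."""
    return opportunity_score(impact, confidence, effort) >= threshold

# High impact, decent confidence, moderate effort: 4 * 3 / 2 = 6.0 → implement.
# Low impact, low confidence, high effort:        2 * 2 / 5 = 0.8 → skip.
```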

Isomorphism Proof Template


For EVERY change, document:

Change: [description]


  • Ordering preserved: [yes/no + why]
  • Tie-breaking unchanged: [yes/no + why]
  • Floating-point: [identical/N/A]
  • RNG seeds: [unchanged/N/A]
  • Golden outputs: sha256sum -c golden_checksums.txt ✓
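One way to keep these proofs machine-checkable is a small record type per change; a sketch under my own field names, not a prescribed format:

```python
from dataclasses import dataclass, asdict

@dataclass
class IsomorphismProof:
    """One record per change, mirroring the template fields above."""
    change: str
    ordering_preserved: str        # "yes/no + why"
    tie_breaking_unchanged: str    # "yes/no + why"
    floating_point: str            # "identical" or "N/A"
    rng_seeds: str                 # "unchanged" or "N/A"
    golden_outputs_verified: bool  # sha256sum -c passed

    def complete(self) -> bool:
        # A proof counts only if every field is filled in and goldens pass.
        return all(asdict(self).values()) and self.golden_outputs_verified

proof = IsomorphismProof(
    change="Replace linear scan with HashMap lookup",
    ordering_preserved="yes - results re-sorted by key before output",
    tie_breaking_unchanged="yes - comparator untouched",
    floating_point="N/A",
    rng_seeds="N/A",
    golden_outputs_verified=True,
)
```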

---

Pattern Tiers (Quick Reference)


Tier 1: Low-Hanging Fruit


| Pattern | When | Isomorphism |
|---|---|---|
| N+1 → Batch | Sequential fetches | Same results, fewer round-trips |
| Linear → HashMap | Keyed lookups | O(n)→O(1), order may change |
| Lazy eval | Maybe-unused values | Same final values |
| Memoization | Repeated pure calls | Cached = recomputed |
| Buffer reuse | Alloc per iteration | Zero-copy in loop |
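Of these, memoization has the most mechanical isomorphism check: cached results must equal recomputed ones. A minimal sketch using `functools.lru_cache` on a pure function:

```python
from functools import lru_cache

def fib(n: int) -> int:
    """Pure reference implementation (exponential time)."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)

@lru_cache(maxsize=None)
def fib_memo(n: int) -> int:
    """Same recurrence, memoized: each n computed at most once."""
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

# Isomorphism check: cached = recomputed, over every input we care about.
assert all(fib(n) == fib_memo(n) for n in range(20))
```

The precondition is purity: memoizing a function that reads mutable state or performs I/O breaks the "cached = recomputed" guarantee.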

Tier 2: Algorithmic


| Pattern | Change | Check |
|---|---|---|
| Binary search | O(n)→O(log n) | Sorted input |
| Two-pointer | O(n²)→O(n) | Structured input |
| Prefix sums | O(n)→O(1) query | Static data |
| Priority queue | O(n)→O(log n) | Top-k/scheduling |
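As a worked example of the prefix-sums row: one O(n) precomputation turns every range-sum query into O(1), valid only while the data stays static.

```python
from itertools import accumulate

data = [3, 1, 4, 1, 5, 9, 2, 6]

# One O(n) pass; prefix[i] = sum of data[:i], with prefix[0] = 0.
prefix = [0] + list(accumulate(data))

def range_sum(lo: int, hi: int) -> int:
    """Sum of data[lo:hi] in O(1), valid while `data` stays static."""
    return prefix[hi] - prefix[lo]

# Isomorphism check against the naive O(n) query:
assert all(range_sum(i, j) == sum(data[i:j])
           for i in range(len(data)) for j in range(i, len(data) + 1))
```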

Tier 3: Data Structures


| Structure | Use Case |
|---|---|
| HashMap | Point lookups |
| BTreeMap | Range queries |
| SmallVec | Usually-small collections |
| Arena | Many allocations, bulk free |
| Bloom filter | Membership pre-filter |
Full catalog: TECHNIQUES.md

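A Bloom-filter pre-filter can be sketched with nothing but hashlib; the size and hash count below are illustrative, not tuned:

```python
import hashlib

class BloomFilter:
    """Membership pre-filter: may answer "maybe" for absent keys
    (false positives), never "no" for present keys (no false negatives)."""

    def __init__(self, size_bits: int = 1024, hashes: int = 3):
        self.size = size_bits
        self.hashes = hashes
        self.bits = 0  # a Python int used as an arbitrary-width bitset

    def _positions(self, item: str):
        # Derive `hashes` independent bit positions from salted SHA-256.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        return all(self.bits >> pos & 1 for pos in self._positions(item))

bf = BloomFilter()
for key in ["alice", "bob", "carol"]:
    bf.add(key)
```

The pre-filter only pays off when a "no" answer lets you skip the expensive lookup; every "maybe" still falls through to the real store.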

Language Cheatsheet


| Lang | CPU Profile | Trouble Spot Grep |
|---|---|---|
| Rust | `cargo flamegraph` | `rg '\.clone\(\)' --type rust` |
| Go | `go tool pprof /debug/pprof/profile` | `rg 'interface\{\}' --type go` |
| TS | `clinic flame -- node app.js` | `rg 'JSON\.(parse\|stringify)' --type ts` |
| Python | `py-spy record -o flame.svg -- python script.py` | `rg '\.iterrows\(\)' --type py` |
Full language guides: LANGUAGE-SPECIFIC.md


Anti-Patterns (Never Do)


| ✗ Action | Why |
|---|---|
| Optimize without profiling | Wastes effort on non-hotspots |
| Multiple changes per commit | Can't isolate regressions |
| Assume improvement | Must measure before/after |
| Change behavior "while we're here" | Breaks isomorphism guarantee |
| Skip golden output capture | No regression detection |


Checklist (Before Any Optimization)


  • Baseline captured (p50/p95/p99, throughput, memory)
  • Profiled: hotspot in top 5 by % time
  • Opportunity score ≥ 2.0
  • Golden outputs saved
  • Isomorphism proof written
  • Single lever only
  • Rollback plan:
    git revert <sha>

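Capturing p50/p95/p99 requires fixing one percentile definition up front so before/after numbers are comparable; a minimal nearest-rank sketch (sample data invented):

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: smallest value with at least
    pct% of the samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Invented latency samples (ms); note how a single outlier owns the tail.
latencies_ms = [12, 15, 11, 13, 200, 14, 12, 16, 13, 12]
baseline = {p: percentile(latencies_ms, p) for p in (50, 95, 99)}
```

Whatever definition you pick (nearest-rank, interpolated, your metrics library's default), use the same one for baseline and post-change measurements.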

Tool Commands



Benchmark


hyperfine --warmup 3 --runs 10 'command'

Profile


cargo flamegraph      # Rust CPU
heaptrack ./binary    # Allocation
strace -c ./binary    # Syscalls

Verify


sha256sum golden_outputs/* > golden_checksums.txt
sha256sum -c golden_checksums.txt   # After changes
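Where `sha256sum` is unavailable (e.g. on Windows), the same capture/verify pair can be done with hashlib; a portable sketch with function names of my own:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def capture(golden_dir: Path, manifest: Path) -> None:
    """Rough equivalent of `sha256sum golden_outputs/* > golden_checksums.txt`."""
    lines = [f"{sha256_of(p)}  {p}" for p in sorted(golden_dir.iterdir())]
    manifest.write_text("\n".join(lines) + "\n")

def verify(manifest: Path) -> bool:
    """Rough equivalent of `sha256sum -c`: every file must still match."""
    for line in manifest.read_text().splitlines():
        digest, _, name = line.partition("  ")
        if sha256_of(Path(name)) != digest:
            return False
    return True
```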

---

References


| Need | Reference |
|---|---|
| Complete technique catalog | TECHNIQUES.md |
| Step-by-step methodology | METHODOLOGY.md |
| Language-specific guides | LANGUAGE-SPECIFIC.md |
| Advanced (Round 2+) | ADVANCED.md |

Iteration Rounds


  • Round 1: Standard (N+1, indexes, batching, memoization)
  • Round 2: Algorithmic (DP, convex, semirings) → ADVANCED.md
  • Round 3: Exotic (suffix automata, link-cut trees)
Each round: fresh profile → new hotspots → new matrix.