extreme-software-optimization
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseExtreme Software Optimization
极致软件性能优化
The One Rule: Profile first. Prove behavior unchanged. One change at a time.
核心准则: 先做性能剖析(Profile),证明代码行为无变更,每次仅提交一项改动。
The Loop (Mandatory)
优化循环(必须遵守)
1. BASELINE → hyperfine --warmup 3 --runs 10 'command'
2. PROFILE → cargo flamegraph / py-spy / clinic flame
3. PROVE → Golden outputs + isomorphism proof per change
4. IMPLEMENT → Score ≥ 2.0 only, one lever per commit
5. VERIFY → sha256sum -c golden_checksums.txt
6. REPEAT → Re-profile (bottlenecks shift)1. BASELINE → hyperfine --warmup 3 --runs 10 'command'
2. PROFILE → cargo flamegraph / py-spy / clinic flame
3. PROVE → Golden outputs + isomorphism proof per change
4. IMPLEMENT → Score ≥ 2.0 only, one lever per commit
5. VERIFY → sha256sum -c golden_checksums.txt
6. REPEAT → Re-profile (bottlenecks shift)Opportunity Matrix
机会评估矩阵
| Hotspot | Impact (1-5) | Confidence (1-5) | Effort (1-5) | Score |
|---|---|---|---|---|
| func:line | × | × | ÷ | Impact×Conf/Effort |
Rule: Only implement Score ≥ 2.0
| 热点代码位置 | 影响度(1-5) | 置信度(1-5) | 投入成本(1-5) | 得分 |
|---|---|---|---|---|
| 函数名:行号 | × | × | ÷ | 影响度×置信度/投入成本 |
规则: 仅实现得分≥2.0的优化项
Isomorphism Proof Template
同构性证明模板
For EVERY change, document:
undefined每次改动都需要记录:
undefinedChange: [description]
改动:[改动描述]
- Ordering preserved: [yes/no + why]
- Tie-breaking unchanged: [yes/no + why]
- Floating-point: [identical/N/A]
- RNG seeds: [unchanged/N/A]
- Golden outputs: sha256sum -c golden_checksums.txt ✓
---- 排序逻辑保留: [是/否 + 原因]
- 平局判定规则无变更: [是/否 + 原因]
- 浮点数计算: [完全一致/不涉及]
- 随机数种子: [无变更/不涉及]
- 基准输出校验: sha256sum -c golden_checksums.txt ✓
---Pattern Tiers (Quick Reference)
优化模式分层(快速参考)
Tier 1: Low-Hanging Fruit
第一层:低门槛易落地优化项
| Pattern | When | Isomorphism |
|---|---|---|
| N+1 → Batch | Sequential fetches | Same results, fewer round-trips |
| Linear → HashMap | Keyed lookups | O(n)→O(1), order may change |
| Lazy eval | Maybe-unused values | Same final values |
| Memoization | Repeated pure calls | Cached = recomputed |
| Buffer reuse | Alloc per iteration | Zero-copy in loop |
| 优化模式 | 适用场景 | 同构性说明 |
|---|---|---|
| N+1请求 → 批量请求 | 顺序拉取场景 | 结果一致,减少往返次数 |
| 线性查找 → HashMap查找 | 键值查询场景 | O(n)→O(1),顺序可能变更 |
| 惰性求值 | 可能不会用到的数值计算场景 | 最终结果一致 |
| 结果缓存 | 重复纯函数调用场景 | 缓存结果=重新计算结果 |
| 缓冲区复用 | 每次迭代都需要分配内存的场景 | 循环内零拷贝 |
Tier 2: Algorithmic
第二层:算法层面优化
| Pattern | Change | Check |
|---|---|---|
| Binary search | O(n)→O(log n) | Sorted input |
| Two-pointer | O(n²)→O(n) | Structured input |
| Prefix sums | O(n)→O(1) query | Static data |
| Priority queue | O(n)→O(log n) | Top-k/scheduling |
| 优化模式 | 改动效果 | 校验项 |
|---|---|---|
| 二分查找 | O(n)→O(log n) | 输入已排序 |
| 双指针 | O(n²)→O(n) | 输入有结构化特征 |
| 前缀和 | O(n)查询→O(1)查询 | 数据为静态 |
| 优先队列 | O(n)→O(log n) | Top k/调度场景 |
Tier 3: Data Structures
第三层:数据结构层面优化
| Structure | Use Case |
|---|---|
| HashMap | Point lookups |
| BTreeMap | Range queries |
| SmallVec | Usually-small collections |
| Arena | Many allocations, bulk free |
| Bloom filter | Membership pre-filter |
Full catalog: TECHNIQUES.md
| 数据结构 | 适用场景 |
|---|---|
| HashMap | 单点查询 |
| BTreeMap | 范围查询 |
| SmallVec | 通常数据量很小的集合 |
| Arena | 大量内存分配、批量释放场景 |
| Bloom filter | 成员资格预过滤 |
完整目录: TECHNIQUES.md
Language Cheatsheet
各语言优化速查表
| Lang | CPU Profile | Trouble Spot Grep |
|---|---|---|
| Rust | | |
| Go | | |
| TS | | |
| Python | | |
Full language guides: LANGUAGE-SPECIFIC.md
| 语言 | CPU性能剖析工具 | 常见问题排查命令 |
|---|---|---|
| Rust | | |
| Go | | |
| TS | | |
| Python | | |
完整的语言专属指南: LANGUAGE-SPECIFIC.md
Anti-Patterns (Never Do)
反模式(禁止操作)
| ✗ | Why |
|---|---|
| Optimize without profiling | Wastes effort on non-hotspots |
| Multiple changes per commit | Can't isolate regressions |
| Assume improvement | Must measure before/after |
| Change behavior "while we're here" | Breaks isomorphism guarantee |
| Skip golden output capture | No regression detection |
| ✗ 操作 | 原因 |
|---|---|
| 不做Profile就优化 | 浪费精力在非热点代码上 |
| 单次提交多个改动 | 无法定位性能回归原因 |
| 主观假设性能会提升 | 必须通过前后测量验证效果 |
| 优化时顺便修改代码行为 | 破坏同构性保证 |
| 不保存基准输出结果 | 无法检测功能回归 |
Checklist (Before Any Optimization)
优化前检查清单
- Baseline captured (p50/p95/p99, throughput, memory)
- Profiled: hotspot in top 5 by % time
- Opportunity score ≥ 2.0
- Golden outputs saved
- Isomorphism proof written
- Single lever only
- Rollback plan:
git revert <sha>
- 已获取基准性能数据(p50/p95/p99延迟、吞吐量、内存占用)
- 已完成性能剖析:热点代码位于耗时占比Top5
- 机会评估得分≥2.0
- 已保存基准输出结果
- 已完成同构性证明编写
- 本次仅做单一优化项改动
- 已准备回滚方案:
git revert <sha>
Tool Commands
工具常用命令
bash
undefinedbash
undefinedBenchmark
基准测试
hyperfine --warmup 3 --runs 10 'command'
hyperfine --warmup 3 --runs 10 'command'
Profile
性能剖析
cargo flamegraph # Rust CPU
heaptrack ./binary # Allocation
strace -c ./binary # Syscalls
cargo flamegraph # Rust CPU性能剖析
heaptrack ./binary # 内存分配剖析
strace -c ./binary # 系统调用剖析
Verify
结果校验
sha256sum golden_outputs/* > golden_checksums.txt
sha256sum -c golden_checksums.txt # After changes
---sha256sum golden_outputs/* > golden_checksums.txt
sha256sum -c golden_checksums.txt # 改动后校验
---References
参考文档
| Need | Reference |
|---|---|
| Complete technique catalog | TECHNIQUES.md |
| Step-by-step methodology | METHODOLOGY.md |
| Language-specific guides | LANGUAGE-SPECIFIC.md |
| Advanced (Round 2+) | ADVANCED.md |
| 需求 | 参考文档 |
|---|---|
| 完整优化技巧目录 | TECHNIQUES.md |
| 分步操作方法论 | METHODOLOGY.md |
| 语言专属优化指南 | LANGUAGE-SPECIFIC.md |
| 进阶优化(第二轮及以后) | ADVANCED.md |
Iteration Rounds
迭代优化阶段
- Round 1: Standard (N+1, indexes, batching, memoization)
- Round 2: Algorithmic (DP, convex, semirings) → ADVANCED.md
- Round 3: Exotic (suffix automata, link-cut trees)
Each round: fresh profile → new hotspots → new matrix.
- 第一阶段: 常规优化(N+1查询优化、索引优化、批量处理、缓存)
- 第二阶段: 算法优化(动态规划、凸优化、半环)→ ADVANCED.md
- 第三阶段: 前沿优化(后缀自动机、Link-Cut树)
每个阶段操作:重新做性能剖析 → 识别新的热点 → 生成新的机会评估矩阵。