python-performance-optimization
Python Performance Optimization
Expert guidance for profiling, optimizing, and accelerating Python applications through systematic analysis, algorithmic improvements, efficient data structures, and acceleration techniques.
When to Use This Skill
- Code runs too slowly for production requirements
- High CPU usage or memory consumption issues
- Need to reduce API response times or batch processing duration
- Application fails to scale under load
- Optimizing data processing pipelines or scientific computing
- Reducing cloud infrastructure costs through efficiency gains
- Profile-guided optimization after measuring performance bottlenecks
Core Concepts
The Golden Rule: Never optimize without profiling first. 80% of execution time is spent in 20% of code.
Optimization Hierarchy (in priority order):
- Algorithm complexity - O(n²) → O(n log n) gives the largest wins, and the gap grows with input size
- Data structure choice - List → Set for lookups (O(n) → O(1); often thousands of times faster on large collections)
- Language features - Comprehensions, built-ins, generators
- Caching - Memoization for repeated calculations
- Compiled extensions - NumPy, Numba, Cython for hot paths
- Parallelism - Multiprocessing for CPU-bound work
Key Principle: Algorithmic improvements beat micro-optimizations every time.
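The top two levers of the hierarchy fit in one small example: finding duplicates with a nested scan versus a single pass over a set. This is an illustrative sketch; both function names are hypothetical.

```python
def has_duplicates_quadratic(items):
    # O(n^2): compares every pair of elements
    return any(items[i] == items[j]
               for i in range(len(items))
               for j in range(i + 1, len(items)))

def has_duplicates_linear(items):
    # O(n): one pass with O(1) membership checks against a set
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False

data = list(range(1_000)) + [0]  # contains exactly one duplicate
assert has_duplicates_quadratic(data) == has_duplicates_linear(data) == True
```

Both versions agree on the answer; only the growth rate differs, which is exactly why the hierarchy puts complexity first.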
Quick Reference
Load detailed guides for specific optimization areas:
| Task | Load reference |
|---|---|
| Profile code and find bottlenecks | |
| Algorithm and data structure optimization | |
| Memory optimization and generators | |
| String concatenation and file I/O | |
| NumPy, Numba, Cython, multiprocessing | |
Optimization Workflow
Phase 1: Measure
- Profile with cProfile - Identify slow functions
- Line profile hot paths - Find exact slow lines
- Memory profile - Check for memory bottlenecks
- Benchmark baseline - Record current performance
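A minimal measurement pass with cProfile might look like this sketch, where profile_me is a placeholder for your own slow function:

```python
import cProfile
import io
import pstats

def profile_me():
    # Stand-in for the code under investigation
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
profile_me()
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())  # top 5 entries by cumulative time
```

Reading the `cumtime` column top-down is usually enough to pick the function worth line-profiling next.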
Phase 2: Analyze
- Check algorithm complexity - Is it O(n²) or worse?
- Evaluate data structures - Are you using lists for lookups?
- Identify repeated work - Can results be cached?
- Find I/O bottlenecks - Database queries, file operations
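To check whether a data structure is the problem, a quick timeit comparison is often enough. A sketch (the 1,000-iteration count is arbitrary):

```python
import timeit

data_list = list(range(100_000))
data_set = set(data_list)

# Worst case for the list: the item we look up is at the end
t_list = timeit.timeit(lambda: 99_999 in data_list, number=1_000)
t_set = timeit.timeit(lambda: 99_999 in data_set, number=1_000)

print(f"list: {t_list:.4f}s  set: {t_set:.4f}s  ratio: {t_list / t_set:.0f}x")
```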
Phase 3: Optimize
- Improve algorithms first - Biggest impact
- Use appropriate data structures - Set/dict for O(1) lookups
- Apply caching - @lru_cache for expensive functions
- Use generators - For large datasets
- Leverage NumPy/Numba - For numerical code
- Parallelize - Multiprocessing for CPU-bound tasks
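As a concrete sketch of the caching step, @lru_cache turns an exponential-time recursive function into a linear-time one:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Without the cache this recursion is O(2^n); with it, O(n)
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(60))           # returns instantly; uncached recursion would take hours
print(fib.cache_info())  # hits/misses show the cache at work
```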
Phase 4: Validate
- Re-profile - Verify improvements
- Benchmark - Measure speedup quantitatively
- Test correctness - Ensure optimizations didn't break functionality
- Document - Explain why optimization was needed
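Validation can be sketched as a correctness check plus a before/after benchmark. Here baseline and optimized are toy stand-ins for the real functions:

```python
import timeit

def baseline(n):
    # O(n^2): sums every pair (i + j)
    return sum(i + j for i in range(n) for j in range(n))

def optimized(n):
    # Same quantity in closed form: each index appears in 2n pair positions
    return 2 * n * sum(range(n))

# Correctness first: the optimization must not change the result
assert baseline(200) == optimized(200)

slow = timeit.timeit(lambda: baseline(200), number=10)
fast = timeit.timeit(lambda: optimized(200), number=10)
print(f"baseline: {slow:.4f}s  optimized: {fast:.4f}s  speedup: {slow / fast:.0f}x")
```

Recording these numbers alongside the change is what makes the "Document" step cheap later.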
Common Optimization Patterns
Pattern 1: Replace List with Set for Lookups
```python
large_list = list(range(100_000))
large_set = set(large_list)
item = 99_999

# Slow: O(n) lookup
found = item in large_list  # Bad

# Fast: O(1) lookup
found = item in large_set  # Good
```
Pattern 2: Use Comprehensions
```python
n = 1_000_000

# Slower
result = []
for i in range(n):
    result.append(i * 2)

# Faster (~35% speedup)
result = [i * 2 for i in range(n)]
```
Pattern 3: Cache Expensive Calculations
```python
from functools import lru_cache

@lru_cache(maxsize=None)
def expensive_function(n):
    # Result cached automatically after the first call with a given n
    return complex_calculation(n)
```
Pattern 4: Use Generators for Large Data
```python
# Memory inefficient: loads the entire file into a list
def read_file(path):
    with open(path) as f:
        return [line for line in f]

# Memory efficient: streams one line at a time
def read_file(path):
    with open(path) as f:
        for line in f:
            yield line.strip()
```
Pattern 5: Vectorize with NumPy
```python
# Pure Python: ~500ms
result = sum(i**2 for i in range(1_000_000))

# NumPy: ~5ms (~100x faster)
import numpy as np
result = np.sum(np.arange(1_000_000) ** 2)
```
Common Mistakes to Avoid
- Optimizing before profiling - You'll optimize the wrong code
- Using lists for membership tests - Use sets/dicts instead
- String concatenation in loops - Use "".join() or io.StringIO
- Loading entire files into memory - Use generators
- N+1 database queries - Use JOINs or batch queries
- Ignoring built-in functions - They're C-optimized and fast
- Premature optimization - Focus on algorithmic improvements first
- Not benchmarking - Always measure improvements quantitatively
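The string-concatenation mistake and its fix, side by side (a small sketch; the word list is arbitrary):

```python
words = ["profile", "before", "you", "optimize"]

# Slow in a loop: each += copies the whole string built so far, O(n^2) overall
slow = ""
for w in words:
    slow += w + " "

# Fast: a single O(n) pass
fast = " ".join(words) + " "

assert slow == fast
```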
Decision Tree
Start here: Profile with cProfile to find bottlenecks
Hot path is algorithm?
- Yes → Check complexity, improve algorithm, use better data structures
- No → Continue
Hot path is computation?
- Numerical loops → NumPy or Numba
- CPU-bound → Multiprocessing
- Already fast enough → Done
Hot path is memory?
- Large data → Generators, streaming
- Many objects → __slots__, object pooling
- Caching needed → @lru_cache or custom cache
Hot path is I/O?
- Database → Batch queries, indexes, connection pooling
- Files → Buffering, streaming
- Network → Async I/O, request batching
Best Practices
- Profile before optimizing - Measure to find real bottlenecks
- Optimize algorithms first - O(n²) → O(n) beats micro-optimizations
- Use appropriate data structures - Set/dict for lookups, not lists
- Leverage built-ins - C-implemented built-ins are faster than pure Python
- Avoid premature optimization - Optimize hot paths identified by profiling
- Use generators for large data - Reduce memory usage with lazy evaluation
- Batch operations - Minimize overhead from syscalls and network requests
- Cache expensive computations - Use @lru_cache or custom caching
- Consider NumPy/Numba - Vectorization and JIT for numerical code
- Parallelize CPU-bound work - Use multiprocessing to utilize all cores
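"Batch operations" can be as simple as chunking an iterable before hitting the database or network. This helper is a hedged sketch (Python 3.12 ships itertools.batched with similar behavior, yielding tuples):

```python
from itertools import islice

def batched(iterable, size):
    # Yield successive lists of up to `size` items
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

# e.g. issue one bulk INSERT per 3 rows instead of 7 single-row queries
print(list(batched(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```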
Resources
- Python Performance: https://wiki.python.org/moin/PythonSpeed
- cProfile: https://docs.python.org/3/library/profile.html
- NumPy: https://numpy.org/doc/stable/user/absolute_beginners.html
- Numba: https://numba.pydata.org/
- Cython: https://cython.readthedocs.io/
- High Performance Python (Book by Gorelick & Ozsvald)