data-sql-optimization
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseSQL Query Optimization
SQL查询优化
Framework
框架
IRON LAW: Measure Before Optimizing
NEVER guess which query is slow or why. Use EXPLAIN (EXPLAIN ANALYZE in
PostgreSQL) to see the actual execution plan. The database's plan often
differs from what you expect — a query you think is efficient may do
a full table scan, and a complex-looking query may use an index perfectly.
Measure → identify bottleneck → fix → measure again.铁律:优化前先测量
永远不要猜测哪个查询慢或者为什么慢。使用EXPLAIN(PostgreSQL中使用EXPLAIN ANALYZE)查看实际执行计划。数据库的计划往往和你预期的不同——你认为高效的查询可能在做全表扫描,而看起来复杂的查询可能完美地使用了索引。
测量→识别瓶颈→修复→再次测量。EXPLAIN Output Reading
EXPLAIN输出解读
Key metrics in EXPLAIN ANALYZE (PostgreSQL):
| Metric | What It Means | Red Flag |
|---|---|---|
| Seq Scan | Full table scan | On large tables (>100K rows) |
| Index Scan | Using an index | Expected for filtered queries |
| Nested Loop | Join method (row-by-row) | On large tables without index |
| Hash Join | Join method (hash table) | Normal for larger tables |
| Sort | Sorting results | Without index support on large sets |
| Actual Time | Milliseconds for this step | Compare to identify bottleneck |
| Rows | Actual rows processed vs estimated | Large mismatch = stale statistics |
PostgreSQL中EXPLAIN ANALYZE的关键指标:
| 指标 | 含义 | 危险信号 |
|---|---|---|
| Seq Scan | 全表扫描 | 在大表(>10万行)上出现 |
| Index Scan | 索引扫描 | 过滤查询的预期行为 |
| Nested Loop | 嵌套循环(逐行连接) | 在无索引的大表上使用 |
| Hash Join | 哈希连接 | 大表的正常连接方式 |
| Sort | 排序操作 | 在大数据集上无索引支持时出现 |
| Actual Time | 实际耗时(毫秒) | 用于对比识别瓶颈 |
| Rows | 实际扫描行数与预估行数 | 差异过大=统计信息过时 |
Indexing Strategy
索引策略
| When to Index | Index Type | Example |
|---|---|---|
| WHERE clause column | B-Tree (default) | |
| JOIN column | B-Tree | |
| Composite filter | Composite index | |
| Text search | GIN / Full-text | |
| Range queries | B-Tree | Columns used with |
Composite index column order matters: Put the most selective (highest cardinality) column first. is good if you always filter by status. is better if you always filter by date range first.
INDEX(status, date)INDEX(date, status)| 何时创建索引 | 索引类型 | 示例 |
|---|---|---|
| WHERE子句列 | B-Tree(默认) | |
| JOIN关联列 | B-Tree | |
| 复合过滤场景 | 复合索引 | |
| 文本搜索场景 | GIN/全文索引 | |
| 范围查询场景 | B-Tree | 用于使用 |
复合索引的列顺序很重要:将选择性最高(基数最大)的列放在前面。如果总是按status过滤,很合适;如果总是先按日期范围过滤,更优。
INDEX(status, date)INDEX(date, status)Common Anti-Patterns
常见反模式
| Anti-Pattern | Problem | Fix |
|---|---|---|
| Reads all columns, prevents index-only scans | Select only needed columns |
| Subquery in WHERE | Re-executes for each row | Rewrite as JOIN or CTE |
| Prevents index use | Rewrite as UNION or separate queries |
| Function on indexed column | | |
| N+1 queries | 1 query for list + N queries for details | JOIN or batch query with |
| Missing pagination | Fetching all rows when only showing 20 | |
| Implicit type conversion | | Use correct type: |
| 反模式 | 问题 | 修复方案 |
|---|---|---|
| 读取所有列,无法使用仅索引扫描 | 仅选择需要的列 |
| WHERE子句中的子查询 | 为每一行重新执行 | 重写为JOIN或CTE |
WHERE中的 | 无法使用索引 | 重写为UNION或拆分查询 |
| 索引列上使用函数 | | |
| N+1查询 | 1次列表查询+N次详情查询 | 使用JOIN或 |
| 缺少分页 | 仅展示20行却获取所有行 | 使用 |
| 隐式类型转换 | | 使用正确类型: |
Optimization Workflow
优化工作流
- Identify slow queries: Database slow query log (pg_stat_statements, MySQL slow log)
- Run EXPLAIN ANALYZE on the slowest
- Find the bottleneck: Seq Scan on large table? Missing index? Expensive sort?
- Apply fix: Add index, rewrite query, or restructure schema
- Verify: Run EXPLAIN ANALYZE again — confirm improvement
- Monitor: Check that fix didn't degrade other queries
- 识别慢查询:数据库慢查询日志(pg_stat_statements、MySQL慢查询日志)
- 对最慢查询运行EXPLAIN ANALYZE
- 定位瓶颈:大表全表扫描?缺少索引?昂贵的排序操作?
- 应用修复:添加索引、重写查询或重构Schema
- 验证效果:再次运行EXPLAIN ANALYZE——确认性能提升
- 持续监控:检查修复是否影响其他查询的性能
Partitioning (Large Tables)
分区(大表场景)
When tables exceed millions of rows:
| Strategy | How It Works | Best For |
|---|---|---|
| Range partition | Split by date range (monthly, yearly) | Time-series data, logs |
| Hash partition | Distribute by hash of a column | Even distribution, high-throughput |
| List partition | Split by specific values | Multi-tenant, status-based |
当表行数超过百万时:
| 策略 | 工作原理 | 适用场景 |
|---|---|---|
| 范围分区 | 按日期范围拆分(月度、年度) | 时序数据、日志 |
| 哈希分区 | 按列的哈希值分布数据 | 均匀分布、高吞吐量场景 |
| 列表分区 | 按特定值拆分 | 多租户、基于状态的场景 |
Output Format
输出格式
markdown
undefinedmarkdown
undefinedQuery Optimization: {Context}
查询优化:{上下文}
Slow Query
慢查询
sql
{the original slow query}- Execution time: {current ms}
- Rows scanned: {N}
- Problem: {what EXPLAIN revealed}
sql
{原始慢查询}- 执行耗时:{当前毫秒数}
- 扫描行数:{N}
- 问题:{EXPLAIN分析结果}
Fix Applied
修复措施
{What was changed — new index, query rewrite, etc.}
{修改内容——新增索引、重写查询等}
Result
优化结果
- Execution time: {original ms} → {optimized ms} ({X% improvement})
- Rows scanned: {original N} → {optimized N}
undefined- 执行耗时:{原毫秒数} → {优化后毫秒数}(提升{X%})
- 扫描行数:{原行数} → {优化后行数}
undefinedGotchas
注意事项
- Indexes have write cost: Every INSERT/UPDATE must update all indexes. Over-indexing slows writes. Index what you query, not everything.
- Statistics can be stale: If EXPLAIN estimates are way off from actuals, run (PostgreSQL) or
ANALYZE(MySQL) to update statistics.ANALYZE TABLE - Query cache hides problems: A query may appear fast because it's cached. Test with cache cleared or cold start.
- ORM-generated queries: ORMs (Django, SQLAlchemy, ActiveRecord) generate SQL that may not be optimal. Always inspect the actual SQL for performance-critical paths.
- Connection pooling: Sometimes the bottleneck isn't the query but connection overhead. Use connection pooling (PgBouncer, ProxySQL) for high-concurrency applications.
- 索引存在写入开销:每次INSERT/UPDATE都必须更新所有索引。过度索引会减慢写入速度。只为查询用到的列创建索引,而非所有列。
- 统计信息可能过时:如果EXPLAIN的预估行数与实际差异过大,运行(PostgreSQL)或
ANALYZE(MySQL)更新统计信息。ANALYZE TABLE - 查询缓存会掩盖问题:查询看起来很快可能是因为命中了缓存。请在清空缓存或冷启动状态下测试。
- ORM生成的查询:ORM(Django、SQLAlchemy、ActiveRecord)生成的SQL可能并非最优。对于性能关键路径,务必检查实际生成的SQL。
- 连接池优化:有时瓶颈并非查询本身,而是连接开销。对于高并发应用,请使用连接池(PgBouncer、ProxySQL)。
References
参考资料
- For PostgreSQL-specific optimization, see
references/pg-optimization.md - For CTE vs temp table performance comparison, see
references/cte-vs-temp.md
- 针对PostgreSQL的特定优化,请参考
references/pg-optimization.md - 关于CTE与临时表的性能对比,请参考
references/cte-vs-temp.md