sql-optimization
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseSQL Performance Optimization Assistant
SQL性能优化助手
Expert SQL performance optimization for ${selection} (or entire project if no selection). Focus on universal SQL optimization techniques that work across MySQL, PostgreSQL, SQL Server, Oracle, and other SQL databases.
为${selection}(若未选择则针对整个项目)提供专业的SQL性能优化服务。专注于适用于MySQL、PostgreSQL、SQL Server、Oracle及其他SQL数据库的通用SQL优化技术。
🎯 Core Optimization Areas
🎯 核心优化领域
Query Performance Analysis
查询性能分析
sql
-- ❌ BAD: Inefficient query patterns
SELECT * FROM orders o
WHERE YEAR(o.created_at) = 2024
AND o.customer_id IN (
SELECT c.id FROM customers c WHERE c.status = 'active'
);
-- ✅ GOOD: Optimized query with proper indexing hints
SELECT o.id, o.customer_id, o.total_amount, o.created_at
FROM orders o
INNER JOIN customers c ON o.customer_id = c.id
WHERE o.created_at >= '2024-01-01'
AND o.created_at < '2025-01-01'
AND c.status = 'active';
-- Required indexes:
-- CREATE INDEX idx_orders_created_at ON orders(created_at);
-- CREATE INDEX idx_customers_status ON customers(status);
-- CREATE INDEX idx_orders_customer_id ON orders(customer_id);sql
-- ❌ BAD: Inefficient query patterns
SELECT * FROM orders o
WHERE YEAR(o.created_at) = 2024
AND o.customer_id IN (
SELECT c.id FROM customers c WHERE c.status = 'active'
);
-- ✅ GOOD: Optimized query with proper indexing hints
SELECT o.id, o.customer_id, o.total_amount, o.created_at
FROM orders o
INNER JOIN customers c ON o.customer_id = c.id
WHERE o.created_at >= '2024-01-01'
AND o.created_at < '2025-01-01'
AND c.status = 'active';
-- Required indexes:
-- CREATE INDEX idx_orders_created_at ON orders(created_at);
-- CREATE INDEX idx_customers_status ON customers(status);
-- CREATE INDEX idx_orders_customer_id ON orders(customer_id);Index Strategy Optimization
索引策略优化
sql
-- ❌ BAD: Poor indexing strategy
CREATE INDEX idx_user_data ON users(email, first_name, last_name, created_at);
-- ✅ GOOD: Optimized composite indexing
-- For queries filtering by email first, then sorting by created_at
CREATE INDEX idx_users_email_created ON users(email, created_at);
-- For full-text name searches
CREATE INDEX idx_users_name ON users(last_name, first_name);
-- For user status queries
CREATE INDEX idx_users_status_created ON users(status, created_at)
WHERE status IS NOT NULL;sql
-- ❌ BAD: Poor indexing strategy
CREATE INDEX idx_user_data ON users(email, first_name, last_name, created_at);
-- ✅ GOOD: Optimized composite indexing
-- For queries filtering by email first, then sorting by created_at
CREATE INDEX idx_users_email_created ON users(email, created_at);
-- For full-text name searches
CREATE INDEX idx_users_name ON users(last_name, first_name);
-- For user status queries
CREATE INDEX idx_users_status_created ON users(status, created_at)
WHERE status IS NOT NULL;Subquery Optimization
子查询优化
sql
-- ❌ BAD: Correlated subquery
SELECT p.product_name, p.price
FROM products p
WHERE p.price > (
SELECT AVG(price)
FROM products p2
WHERE p2.category_id = p.category_id
);
-- ✅ GOOD: Window function approach
SELECT product_name, price
FROM (
SELECT product_name, price,
AVG(price) OVER (PARTITION BY category_id) as avg_category_price
FROM products
) ranked
WHERE price > avg_category_price;sql
-- ❌ BAD: Correlated subquery
SELECT p.product_name, p.price
FROM products p
WHERE p.price > (
SELECT AVG(price)
FROM products p2
WHERE p2.category_id = p.category_id
);
-- ✅ GOOD: Window function approach
SELECT product_name, price
FROM (
SELECT product_name, price,
AVG(price) OVER (PARTITION BY category_id) as avg_category_price
FROM products
) ranked
WHERE price > avg_category_price;📊 Performance Tuning Techniques
📊 性能调优技巧
JOIN Optimization
JOIN优化
sql
-- ❌ BAD: Inefficient JOIN order and conditions
SELECT o.*, c.name, p.product_name
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.id
LEFT JOIN order_items oi ON o.id = oi.order_id
LEFT JOIN products p ON oi.product_id = p.id
WHERE o.created_at > '2024-01-01'
AND c.status = 'active';
-- ✅ GOOD: Optimized JOIN with filtering
SELECT o.id, o.total_amount, c.name, p.product_name
FROM orders o
INNER JOIN customers c ON o.customer_id = c.id AND c.status = 'active'
INNER JOIN order_items oi ON o.id = oi.order_id
INNER JOIN products p ON oi.product_id = p.id
WHERE o.created_at > '2024-01-01';sql
-- ❌ BAD: Inefficient JOIN order and conditions
SELECT o.*, c.name, p.product_name
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.id
LEFT JOIN order_items oi ON o.id = oi.order_id
LEFT JOIN products p ON oi.product_id = p.id
WHERE o.created_at > '2024-01-01'
AND c.status = 'active';
-- ✅ GOOD: Optimized JOIN with filtering
SELECT o.id, o.total_amount, c.name, p.product_name
FROM orders o
INNER JOIN customers c ON o.customer_id = c.id AND c.status = 'active'
INNER JOIN order_items oi ON o.id = oi.order_id
INNER JOIN products p ON oi.product_id = p.id
WHERE o.created_at > '2024-01-01';Pagination Optimization
分页优化
sql
-- ❌ BAD: OFFSET-based pagination (slow for large offsets)
SELECT * FROM products
ORDER BY created_at DESC
LIMIT 20 OFFSET 10000;
-- ✅ GOOD: Cursor-based pagination
SELECT * FROM products
WHERE created_at < '2024-06-15 10:30:00'
ORDER BY created_at DESC
LIMIT 20;
-- Or using ID-based cursor
SELECT * FROM products
WHERE id > 1000
ORDER BY id
LIMIT 20;sql
-- ❌ BAD: OFFSET-based pagination (slow for large offsets)
SELECT * FROM products
ORDER BY created_at DESC
LIMIT 20 OFFSET 10000;
-- ✅ GOOD: Cursor-based pagination
SELECT * FROM products
WHERE created_at < '2024-06-15 10:30:00'
ORDER BY created_at DESC
LIMIT 20;
-- Or using ID-based cursor
SELECT * FROM products
WHERE id > 1000
ORDER BY id
LIMIT 20;Aggregation Optimization
聚合优化
sql
-- ❌ BAD: Multiple separate aggregation queries
SELECT COUNT(*) FROM orders WHERE status = 'pending';
SELECT COUNT(*) FROM orders WHERE status = 'shipped';
SELECT COUNT(*) FROM orders WHERE status = 'delivered';
-- ✅ GOOD: Single query with conditional aggregation
SELECT
COUNT(CASE WHEN status = 'pending' THEN 1 END) as pending_count,
COUNT(CASE WHEN status = 'shipped' THEN 1 END) as shipped_count,
COUNT(CASE WHEN status = 'delivered' THEN 1 END) as delivered_count
FROM orders;sql
-- ❌ BAD: Multiple separate aggregation queries
SELECT COUNT(*) FROM orders WHERE status = 'pending';
SELECT COUNT(*) FROM orders WHERE status = 'shipped';
SELECT COUNT(*) FROM orders WHERE status = 'delivered';
-- ✅ GOOD: Single query with conditional aggregation
SELECT
COUNT(CASE WHEN status = 'pending' THEN 1 END) as pending_count,
COUNT(CASE WHEN status = 'shipped' THEN 1 END) as shipped_count,
COUNT(CASE WHEN status = 'delivered' THEN 1 END) as delivered_count
FROM orders;🔍 Query Anti-Patterns
🔍 查询反模式
SELECT Performance Issues
SELECT性能问题
sql
-- ❌ BAD: SELECT * anti-pattern
SELECT * FROM large_table lt
JOIN another_table at ON lt.id = at.ref_id;
-- ✅ GOOD: Explicit column selection
SELECT lt.id, lt.name, at.value
FROM large_table lt
JOIN another_table at ON lt.id = at.ref_id;sql
-- ❌ BAD: SELECT * anti-pattern
SELECT * FROM large_table lt
JOIN another_table at ON lt.id = at.ref_id;
-- ✅ GOOD: Explicit column selection
SELECT lt.id, lt.name, at.value
FROM large_table lt
JOIN another_table at ON lt.id = at.ref_id;WHERE Clause Optimization
WHERE子句优化
sql
-- ❌ BAD: Function calls in WHERE clause
SELECT * FROM orders
WHERE UPPER(customer_email) = 'JOHN@EXAMPLE.COM';
-- ✅ GOOD: Index-friendly WHERE clause
SELECT * FROM orders
WHERE customer_email = 'john@example.com';
-- Consider: CREATE INDEX idx_orders_email ON orders(LOWER(customer_email));sql
-- ❌ BAD: Function calls in WHERE clause
SELECT * FROM orders
WHERE UPPER(customer_email) = 'JOHN@EXAMPLE.COM';
-- ✅ GOOD: Index-friendly WHERE clause
SELECT * FROM orders
WHERE customer_email = 'john@example.com';
-- Consider: CREATE INDEX idx_orders_email ON orders(LOWER(customer_email));OR vs UNION Optimization
OR与UNION优化
sql
-- ❌ BAD: Complex OR conditions
SELECT * FROM products
WHERE (category = 'electronics' AND price < 1000)
OR (category = 'books' AND price < 50);
-- ✅ GOOD: UNION approach for better optimization
SELECT * FROM products WHERE category = 'electronics' AND price < 1000
UNION ALL
SELECT * FROM products WHERE category = 'books' AND price < 50;sql
-- ❌ BAD: Complex OR conditions
SELECT * FROM products
WHERE (category = 'electronics' AND price < 1000)
OR (category = 'books' AND price < 50);
-- ✅ GOOD: UNION approach for better optimization
SELECT * FROM products WHERE category = 'electronics' AND price < 1000
UNION ALL
SELECT * FROM products WHERE category = 'books' AND price < 50;📈 Database-Agnostic Optimization
📈 数据库无关优化
Batch Operations
批量操作
sql
-- ❌ BAD: Row-by-row operations
INSERT INTO products (name, price) VALUES ('Product 1', 10.00);
INSERT INTO products (name, price) VALUES ('Product 2', 15.00);
INSERT INTO products (name, price) VALUES ('Product 3', 20.00);
-- ✅ GOOD: Batch insert
INSERT INTO products (name, price) VALUES
('Product 1', 10.00),
('Product 2', 15.00),
('Product 3', 20.00);sql
-- ❌ BAD: Row-by-row operations
INSERT INTO products (name, price) VALUES ('Product 1', 10.00);
INSERT INTO products (name, price) VALUES ('Product 2', 15.00);
INSERT INTO products (name, price) VALUES ('Product 3', 20.00);
-- ✅ GOOD: Batch insert
INSERT INTO products (name, price) VALUES
('Product 1', 10.00),
('Product 2', 15.00),
('Product 3', 20.00);Temporary Table Usage
临时表使用
sql
-- ✅ GOOD: Using temporary tables for complex operations
CREATE TEMPORARY TABLE temp_calculations AS
SELECT customer_id,
SUM(total_amount) as total_spent,
COUNT(*) as order_count
FROM orders
WHERE created_at >= '2024-01-01'
GROUP BY customer_id;
-- Use the temp table for further calculations
SELECT c.name, tc.total_spent, tc.order_count
FROM temp_calculations tc
JOIN customers c ON tc.customer_id = c.id
WHERE tc.total_spent > 1000;sql
-- ✅ GOOD: Using temporary tables for complex operations
CREATE TEMPORARY TABLE temp_calculations AS
SELECT customer_id,
SUM(total_amount) as total_spent,
COUNT(*) as order_count
FROM orders
WHERE created_at >= '2024-01-01'
GROUP BY customer_id;
-- Use the temp table for further calculations
SELECT c.name, tc.total_spent, tc.order_count
FROM temp_calculations tc
JOIN customers c ON tc.customer_id = c.id
WHERE tc.total_spent > 1000;🛠️ Index Management
🛠️ 索引管理
Index Design Principles
索引设计原则
sql
-- ✅ GOOD: Covering index design
CREATE INDEX idx_orders_covering
ON orders(customer_id, created_at)
INCLUDE (total_amount, status); -- SQL Server syntax
-- Or: CREATE INDEX idx_orders_covering ON orders(customer_id, created_at, total_amount, status); -- Other databasessql
-- ✅ GOOD: Covering index design
CREATE INDEX idx_orders_covering
ON orders(customer_id, created_at)
INCLUDE (total_amount, status); -- SQL Server syntax
-- Or: CREATE INDEX idx_orders_covering ON orders(customer_id, created_at, total_amount, status); -- Other databasesPartial Index Strategy
部分索引策略
sql
-- ✅ GOOD: Partial indexes for specific conditions
CREATE INDEX idx_orders_active
ON orders(created_at)
WHERE status IN ('pending', 'processing');sql
-- ✅ GOOD: Partial indexes for specific conditions
CREATE INDEX idx_orders_active
ON orders(created_at)
WHERE status IN ('pending', 'processing');📊 Performance Monitoring Queries
📊 性能监控查询
Query Performance Analysis
查询性能分析
sql
-- Generic approach to identify slow queries
-- (Specific syntax varies by database)
-- For MySQL:
SELECT query_time, lock_time, rows_sent, rows_examined, sql_text
FROM mysql.slow_log
ORDER BY query_time DESC;
-- For PostgreSQL:
SELECT query, calls, total_time, mean_time
FROM pg_stat_statements
ORDER BY total_time DESC;
-- For SQL Server:
SELECT
qs.total_elapsed_time/qs.execution_count as avg_elapsed_time,
qs.execution_count,
SUBSTRING(qt.text, (qs.statement_start_offset/2)+1,
((CASE qs.statement_end_offset WHEN -1 THEN DATALENGTH(qt.text)
ELSE qs.statement_end_offset END - qs.statement_start_offset)/2)+1) as query_text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
ORDER BY avg_elapsed_time DESC;sql
-- Generic approach to identify slow queries
-- (Specific syntax varies by database)
-- For MySQL:
SELECT query_time, lock_time, rows_sent, rows_examined, sql_text
FROM mysql.slow_log
ORDER BY query_time DESC;
-- For PostgreSQL:
SELECT query, calls, total_time, mean_time
FROM pg_stat_statements
ORDER BY total_time DESC;
-- For SQL Server:
SELECT
qs.total_elapsed_time/qs.execution_count as avg_elapsed_time,
qs.execution_count,
SUBSTRING(qt.text, (qs.statement_start_offset/2)+1,
((CASE qs.statement_end_offset WHEN -1 THEN DATALENGTH(qt.text)
ELSE qs.statement_end_offset END - qs.statement_start_offset)/2)+1) as query_text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
ORDER BY avg_elapsed_time DESC;🎯 Universal Optimization Checklist
🎯 通用优化检查清单
Query Structure
查询结构
- Avoiding SELECT * in production queries
- Using appropriate JOIN types (INNER vs LEFT/RIGHT)
- Filtering early in WHERE clauses
- Using EXISTS instead of IN for subqueries when appropriate
- Avoiding functions in WHERE clauses that prevent index usage
- 生产环境查询避免使用SELECT *
- 使用合适的JOIN类型(INNER vs LEFT/RIGHT)
- 在WHERE子句中尽早过滤数据
- 合适时使用EXISTS替代IN进行子查询
- 避免在WHERE子句中使用会阻止索引生效的函数
Index Strategy
索引策略
- Creating indexes on frequently queried columns
- Using composite indexes in the right column order
- Avoiding over-indexing (impacts INSERT/UPDATE performance)
- Using covering indexes where beneficial
- Creating partial indexes for specific query patterns
- 为频繁查询的列创建索引
- 按正确的列顺序使用复合索引
- 避免过度索引(会影响INSERT/UPDATE性能)
- 合理使用覆盖索引
- 针对特定查询模式创建部分索引
Data Types and Schema
数据类型与Schema
- Using appropriate data types for storage efficiency
- Normalizing appropriately (3NF for OLTP, denormalized for OLAP)
- Using constraints to help query optimizer
- Partitioning large tables when appropriate
- 使用合适的数据类型以提高存储效率
- 合理规范化(OLTP采用3NF,OLAP采用反规范化)
- 使用约束辅助查询优化器
- 必要时对大表进行分区
Query Patterns
查询模式
- Using LIMIT/TOP for result set control
- Implementing efficient pagination strategies
- Using batch operations for bulk data changes
- Avoiding N+1 query problems
- Using prepared statements for repeated queries
- 使用LIMIT/TOP控制结果集
- 实现高效的分页策略
- 对批量数据变更使用批量操作
- 避免N+1查询问题
- 对重复查询使用预编译语句
Performance Testing
性能测试
- Testing queries with realistic data volumes
- Analyzing query execution plans
- Monitoring query performance over time
- Setting up alerts for slow queries
- Regular index usage analysis
- 使用真实数据量测试查询
- 分析查询执行计划
- 持续监控查询性能
- 为慢查询设置告警
- 定期分析索引使用情况
📝 Optimization Methodology
📝 优化方法论
- Identify: Use database-specific tools to find slow queries
- Analyze: Examine execution plans and identify bottlenecks
- Optimize: Apply appropriate optimization techniques
- Test: Verify performance improvements
- Monitor: Continuously track performance metrics
- Iterate: Regular performance review and optimization
Focus on measurable performance improvements and always test optimizations with realistic data volumes and query patterns.
- 识别:使用数据库专属工具定位慢查询
- 分析:检查执行计划并识别瓶颈
- 优化:应用合适的优化技术
- 测试:验证性能提升效果
- 监控:持续跟踪性能指标
- 迭代:定期进行性能回顾与优化
专注于可衡量的性能提升,始终使用真实数据量和查询模式测试优化效果。