finding-expensive-queries
Finding Expensive Queries
Query history → Rank by metric → Identify patterns → Recommend optimizations
Workflow
1. Ask What to Optimize For
Before querying, clarify:
- Time period? (last day, week, month)
- Metric? (execution time, bytes scanned, cost, spillage)
- Warehouse? (specific or all)
- User? (specific or all)
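Once these are pinned down, the answers slot straight into the queries below. A minimal parameterized sketch using Snowflake session variables (the variable names and thresholds are illustrative defaults, not part of the original workflow):

```sql
-- Illustrative defaults; set these from the answers above
SET lookback_days = 7;
SET min_credits = 0.01;

SELECT query_id, warehouse_name, user_name, credits_attributed_compute
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_ATTRIBUTION_HISTORY
WHERE start_time >= DATEADD('days', -$lookback_days, CURRENT_TIMESTAMP())
  AND credits_attributed_compute > $min_credits
ORDER BY credits_attributed_compute DESC;
```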
2. Find Expensive Queries by Cost
Use QUERY_ATTRIBUTION_HISTORY for credit/cost analysis:
```sql
SELECT
    query_id,
    warehouse_name,
    user_name,
    credits_attributed_compute,
    start_time,
    end_time,
    query_tag
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_ATTRIBUTION_HISTORY
WHERE start_time >= DATEADD('days', -7, CURRENT_TIMESTAMP())
ORDER BY credits_attributed_compute DESC
LIMIT 20;
```
3. Get Performance Stats for Specific Queries
Use QUERY_HISTORY for detailed performance metrics (run separately, not joined):
```sql
SELECT
    query_id,
    query_text,
    total_elapsed_time/1000 AS seconds,
    bytes_scanned/1e9 AS gb_scanned,
    bytes_spilled_to_local_storage/1e9 AS gb_spilled_local,
    bytes_spilled_to_remote_storage/1e9 AS gb_spilled_remote,
    partitions_scanned,
    partitions_total
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE query_id IN ('<query_id_1>', '<query_id_2>', ...)
  AND start_time >= DATEADD('days', -7, CURRENT_TIMESTAMP());
```
4. Identify Patterns
Look for:
- High `credits_attributed_compute` queries
- Same `query_hash` repeated (caching opportunity)
- `partitions_scanned = partitions_total` (no pruning)
- High `gb_spilled` (memory pressure)
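The repeated-query pattern can be surfaced directly by grouping QUERY_HISTORY on `query_hash` — a sketch, assuming `query_hash` is populated for your account (the repetition threshold is illustrative):

```sql
-- Queries executed many times with identical structure: caching candidates
SELECT
    query_hash,
    COUNT(*) AS executions,
    SUM(total_elapsed_time)/1000 AS total_seconds,
    ANY_VALUE(query_text) AS sample_query
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('days', -7, CURRENT_TIMESTAMP())
GROUP BY query_hash
HAVING COUNT(*) >= 10  -- illustrative repetition threshold
ORDER BY total_seconds DESC
LIMIT 20;
```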
5. Return Results
Provide:
- Ranked list of expensive queries with key metrics
- Common patterns identified
- Top 3-5 optimization recommendations
- Specific queries to investigate further
Common Filters
```sql
-- Time range (required)
WHERE start_time >= DATEADD('days', -7, CURRENT_TIMESTAMP())

-- By warehouse
AND warehouse_name = 'ANALYTICS_WH'

-- By user
AND user_name = 'ETL_USER'

-- Only queries over a cost threshold
AND credits_attributed_compute > 0.01

-- Only queries over a time threshold
AND total_elapsed_time > 60000  -- over 1 minute
```
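The filters compose into a single query; a sketch with placeholder warehouse and threshold values. Note that `credits_attributed_compute` lives in QUERY_ATTRIBUTION_HISTORY while `total_elapsed_time` lives in QUERY_HISTORY, so each filter applies only to its own view:

```sql
SELECT query_id, user_name, credits_attributed_compute, start_time
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_ATTRIBUTION_HISTORY
WHERE start_time >= DATEADD('days', -7, CURRENT_TIMESTAMP())
  AND warehouse_name = 'ANALYTICS_WH'        -- placeholder warehouse
  AND credits_attributed_compute > 0.01      -- placeholder cost threshold
ORDER BY credits_attributed_compute DESC
LIMIT 20;
```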