

Finding Expensive Queries

Query history → Rank by metric → Identify patterns → Recommend optimizations

Workflow


1. Ask What to Optimize For


Before querying, clarify:
  • Time period? (last day, week, month)
  • Metric? (execution time, bytes scanned, cost, spillage)
  • Warehouse? (specific or all)
  • User? (specific or all)

2. Find Expensive Queries by Cost


Use QUERY_ATTRIBUTION_HISTORY for credit/cost analysis:
```sql
SELECT
    query_id,
    warehouse_name,
    user_name,
    credits_attributed_compute,
    start_time,
    end_time,
    query_tag
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_ATTRIBUTION_HISTORY
WHERE start_time >= DATEADD('days', -7, CURRENT_TIMESTAMP())
ORDER BY credits_attributed_compute DESC
LIMIT 20;
```
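Before drilling into individual statements, it can help to see where spend concentrates. A sketch rolling up the same view by warehouse and user (same table and columns as above; the aggregation is illustrative):

```sql
-- Roll up attributed credits by warehouse and user over the same 7-day window
SELECT
    warehouse_name,
    user_name,
    COUNT(*) AS query_count,
    SUM(credits_attributed_compute) AS total_credits
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_ATTRIBUTION_HISTORY
WHERE start_time >= DATEADD('days', -7, CURRENT_TIMESTAMP())
GROUP BY warehouse_name, user_name
ORDER BY total_credits DESC
LIMIT 20;
```

A hot warehouse/user pair often points to a single pipeline worth investigating before optimizing queries one by one.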

3. Get Performance Stats for Specific Queries


Use QUERY_HISTORY for detailed performance metrics (run separately, not joined):
```sql
SELECT
    query_id,
    query_text,
    total_elapsed_time/1000 as seconds,
    bytes_scanned/1e9 as gb_scanned,
    bytes_spilled_to_local_storage/1e9 as gb_spilled_local,
    bytes_spilled_to_remote_storage/1e9 as gb_spilled_remote,
    partitions_scanned,
    partitions_total
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE query_id IN ('<query_id_1>', '<query_id_2>', ...)
  AND start_time >= DATEADD('days', -7, CURRENT_TIMESTAMP());
```

4. Identify Patterns


Look for:
  • High credits_attributed_compute queries
  • Same query_hash repeated (caching opportunity)
  • partitions_scanned = partitions_total (no pruning)
  • High gb_spilled (memory pressure)
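The repeated-query pattern can be surfaced directly: QUERY_HISTORY exposes a query_hash column, so a sketch like the following groups on it (the 10-execution threshold is illustrative):

```sql
-- Identical queries executed many times are caching/materialization candidates
SELECT
    query_hash,
    COUNT(*) AS executions,
    SUM(total_elapsed_time)/1000 AS total_seconds,
    ANY_VALUE(query_text) AS sample_text
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('days', -7, CURRENT_TIMESTAMP())
GROUP BY query_hash
HAVING COUNT(*) > 10
ORDER BY total_seconds DESC
LIMIT 20;
```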

5. Return Results


Provide:
  1. Ranked list of expensive queries with key metrics
  2. Common patterns identified
  3. Top 3-5 optimization recommendations
  4. Specific queries to investigate further
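To keep the reported patterns consistent across runs, the step-4 checks can be encoded as simple flags. A minimal Python sketch, assuming each row is a dict keyed by the column aliases from step 3 (the 1-minute threshold is an illustrative choice, not a Snowflake default):

```python
def pattern_flags(row):
    """Return a list of optimization-pattern flags for one query's stats."""
    flags = []
    # Full scan: every micro-partition read, so no pruning occurred
    if row["partitions_total"] and row["partitions_scanned"] == row["partitions_total"]:
        flags.append("no_pruning")
    # Any spill to local or remote storage indicates memory pressure
    if row["gb_spilled_local"] > 0 or row["gb_spilled_remote"] > 0:
        flags.append("memory_pressure")
    # Illustrative long-running threshold of one minute
    if row["seconds"] > 60:
        flags.append("long_running")
    return flags

row = {"partitions_scanned": 500, "partitions_total": 500,
       "gb_spilled_local": 2.3, "gb_spilled_remote": 0.0, "seconds": 95.0}
print(pattern_flags(row))  # ['no_pruning', 'memory_pressure', 'long_running']
```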

Common Filters


```sql
-- Time range (required)
WHERE start_time >= DATEADD('days', -7, CURRENT_TIMESTAMP())

-- By warehouse
AND warehouse_name = 'ANALYTICS_WH'

-- By user
AND user_name = 'ETL_USER'

-- Only queries over cost threshold
AND credits_attributed_compute > 0.01

-- Only queries over time threshold
AND total_elapsed_time > 60000  -- over 1 minute
```
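These filters compose into a single statement. A sketch against QUERY_HISTORY (the credit filter applies only to QUERY_ATTRIBUTION_HISTORY and is omitted here; warehouse name and thresholds are illustrative):

```sql
SELECT
    query_id,
    user_name,
    total_elapsed_time/1000 AS seconds,
    bytes_scanned/1e9 AS gb_scanned
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('days', -7, CURRENT_TIMESTAMP())
  AND warehouse_name = 'ANALYTICS_WH'
  AND total_elapsed_time > 60000  -- over 1 minute
ORDER BY total_elapsed_time DESC
LIMIT 20;
```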