

Finding Expensive Queries

Query history → Rank by metric → Identify patterns → Recommend optimizations

Workflow


1. Ask What to Optimize For


Before querying, clarify:
  • Time period? (last day, week, month)
  • Metric? (execution time, bytes scanned, cost, spillage)
  • Warehouse? (specific or all)
  • User? (specific or all)

2. Find Expensive Queries by Cost


Use QUERY_ATTRIBUTION_HISTORY for credit/cost analysis:
```sql
SELECT
    query_id,
    warehouse_name,
    user_name,
    credits_attributed_compute,
    start_time,
    end_time,
    query_tag
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_ATTRIBUTION_HISTORY
WHERE start_time >= DATEADD('days', -7, CURRENT_TIMESTAMP())
ORDER BY credits_attributed_compute DESC
LIMIT 20;
```
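Before drilling into individual statements, it can help to see where spend concentrates. A sketch rolling up the same view by warehouse and user (same table and columns as above; the aggregation is illustrative):

```sql
-- Roll up attributed credits by warehouse and user over the same 7-day window
SELECT
    warehouse_name,
    user_name,
    COUNT(*) AS query_count,
    SUM(credits_attributed_compute) AS total_credits
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_ATTRIBUTION_HISTORY
WHERE start_time >= DATEADD('days', -7, CURRENT_TIMESTAMP())
GROUP BY warehouse_name, user_name
ORDER BY total_credits DESC
LIMIT 20;
```

A hot warehouse/user pair often points to a single pipeline worth investigating before optimizing queries one by one.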

3. Get Performance Stats for Specific Queries


Use QUERY_HISTORY for detailed performance metrics (run separately, not joined):
```sql
SELECT
    query_id,
    query_text,
    total_elapsed_time/1000 as seconds,
    bytes_scanned/1e9 as gb_scanned,
    bytes_spilled_to_local_storage/1e9 as gb_spilled_local,
    bytes_spilled_to_remote_storage/1e9 as gb_spilled_remote,
    partitions_scanned,
    partitions_total
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE query_id IN ('<query_id_1>', '<query_id_2>', ...)
  AND start_time >= DATEADD('days', -7, CURRENT_TIMESTAMP());
```

4. Identify Patterns


Look for:
  • High credits_attributed_compute queries
  • Same query_hash repeated (caching opportunity)
  • partitions_scanned = partitions_total (no pruning)
  • High gb_spilled (memory pressure)
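The repeated-query pattern can be surfaced directly: QUERY_HISTORY exposes a query_hash column, so a sketch like the following groups on it (the 10-execution threshold is illustrative):

```sql
-- Identical queries executed many times are caching/materialization candidates
SELECT
    query_hash,
    COUNT(*) AS executions,
    SUM(total_elapsed_time)/1000 AS total_seconds,
    ANY_VALUE(query_text) AS sample_text
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('days', -7, CURRENT_TIMESTAMP())
GROUP BY query_hash
HAVING COUNT(*) > 10
ORDER BY total_seconds DESC
LIMIT 20;
```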

5. Return Results


Provide:
  1. Ranked list of expensive queries with key metrics
  2. Common patterns identified
  3. Top 3-5 optimization recommendations
  4. Specific queries to investigate further
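To keep the reported patterns consistent across runs, the step-4 checks can be encoded as simple flags. A minimal Python sketch, assuming each row is a dict keyed by the column aliases from step 3 (the 1-minute threshold is an illustrative choice, not a Snowflake default):

```python
def pattern_flags(row):
    """Return a list of optimization-pattern flags for one query's stats."""
    flags = []
    # Full scan: every micro-partition read, so no pruning occurred
    if row["partitions_total"] and row["partitions_scanned"] == row["partitions_total"]:
        flags.append("no_pruning")
    # Any spill to local or remote storage indicates memory pressure
    if row["gb_spilled_local"] > 0 or row["gb_spilled_remote"] > 0:
        flags.append("memory_pressure")
    # Illustrative long-running threshold of one minute
    if row["seconds"] > 60:
        flags.append("long_running")
    return flags

row = {"partitions_scanned": 500, "partitions_total": 500,
       "gb_spilled_local": 2.3, "gb_spilled_remote": 0.0, "seconds": 95.0}
print(pattern_flags(row))  # ['no_pruning', 'memory_pressure', 'long_running']
```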

Common Filters


```sql
-- Time range (required)
WHERE start_time >= DATEADD('days', -7, CURRENT_TIMESTAMP())

-- By warehouse
AND warehouse_name = 'ANALYTICS_WH'

-- By user
AND user_name = 'ETL_USER'

-- Only queries over cost threshold
AND credits_attributed_compute > 0.01

-- Only queries over time threshold
AND total_elapsed_time > 60000  -- over 1 minute
```
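These filters compose into a single statement. A sketch against QUERY_HISTORY (the credit filter applies only to QUERY_ATTRIBUTION_HISTORY and is omitted here; warehouse name and thresholds are illustrative):

```sql
SELECT
    query_id,
    user_name,
    total_elapsed_time/1000 AS seconds,
    bytes_scanned/1e9 AS gb_scanned
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('days', -7, CURRENT_TIMESTAMP())
  AND warehouse_name = 'ANALYTICS_WH'
  AND total_elapsed_time > 60000  -- over 1 minute
ORDER BY total_elapsed_time DESC
LIMIT 20;
```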