monte-carlo-validation-notebook

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese
Tip: This skill works well with Sonnet. Run
/model sonnet
before invoking for faster generation.
Generate a SQL Notebook with validation queries for dbt changes.
Arguments: $ARGUMENTS
提示: 此技能与Sonnet搭配使用效果最佳。调用前运行
/model sonnet
可加快生成速度。
为dbt变更生成包含验证查询的SQL笔记本。
参数: $ARGUMENTS

When to Use

使用场景

Use this skill when the user wants to validate dbt model or snapshot changes with Monte Carlo SQL Notebook queries, either from a GitHub PR or a local dbt repository.
Parse the arguments:
  • Target (required): first argument — a GitHub PR URL or local dbt repo path
  • MC Base URL (optional):
    --mc-base-url <URL>
    — defaults to
    https://getmontecarlo.com
  • Models (optional):
    --models <model1,model2,...>
    — comma-separated list of model filenames (without
    .sql
    extension) to generate queries for. Only these models will be included. By default, all changed models are included up to a maximum of 10.

当用户希望通过Monte Carlo SQL Notebook查询验证dbt模型或快照变更时(无论是来自GitHub PR还是本地dbt仓库),均可使用此技能。
解析参数:
  • 目标(必填):第一个参数 — GitHub PR URL或本地dbt仓库路径
  • MC基础URL(可选):
    --mc-base-url <URL>
    — 默认值为
    https://getmontecarlo.com
  • 模型(可选):
    --models <model1,model2,...>
    — 逗号分隔的模型文件名列表(不含
    .sql
    扩展名),仅为这些模型生成查询。默认情况下,最多包含10个变更模型。

Setup

安装配置

Prerequisites:
  • gh
    (GitHub CLI) — required for PR mode. Must be authenticated (
    gh auth status
    ).
  • python3
    — required for helper scripts.
  • pyyaml
    — install with
    pip3 install pyyaml
    (or
    pip install pyyaml
    ,
    uv pip install pyyaml
    , etc.)
Note: Generated SQL uses ANSI-compatible syntax that works across Snowflake, BigQuery, Redshift, and Athena. Minor adjustments may be needed for specific warehouse quirks.
This skill includes two helper scripts in
${CLAUDE_PLUGIN_ROOT}/skills/monte-carlo-validation-notebook/scripts/
:
  • resolve_dbt_schema.py
    - Resolves dbt model output schemas from
    dbt_project.yml
    routing rules and model config overrides.
  • generate_notebook_url.py
    - Encodes notebook YAML into a base64 import URL and opens it in the browser.
前置条件:
  • gh
    (GitHub CLI)—— PR模式必填。必须已完成认证(运行
    gh auth status
    检查)。
  • python3
    —— 辅助脚本必填。
  • pyyaml
    —— 通过
    pip3 install pyyaml
    (或
    pip install pyyaml
    uv pip install pyyaml
    等)安装。
注意: 生成的SQL采用ANSI兼容语法,可在Snowflake、BigQuery、Redshift和Athena中运行。针对特定数据仓库的特性可能需要进行细微调整。
此技能在
${CLAUDE_PLUGIN_ROOT}/skills/monte-carlo-validation-notebook/scripts/
目录下包含两个辅助脚本:
  • resolve_dbt_schema.py
    - 从
    dbt_project.yml
    路由规则和模型配置覆盖中解析dbt模型输出Schema。
  • generate_notebook_url.py
    - 将笔记本YAML编码为base64导入URL并在浏览器中打开。

Mode Detection

模式检测

Auto-detect mode from the target argument:
  • If target looks like a URL (contains
    ://
    or
    github.com
    ) -> PR mode
  • If target is a path (
    .
    ,
    /path/to/repo
    , relative path) -> Local mode

根据目标参数自动检测模式:
  • 如果目标看起来是URL(包含
    ://
    github.com
    )→ PR模式
  • 如果目标是路径(
    .
    /path/to/repo
    、相对路径)→ 本地模式

Context

背景信息

This command generates a SQL Notebook containing validation queries for dbt changes. The notebook can be opened in the MC Bridge SQL Notebook interface for interactive validation.
The output is an import URL that opens directly in the notebook interface:
<MC_BASE_URL>/notebooks/import#<base64-encoded-yaml>
Key Features:
  • Database Parameters: Two
    text
    parameters (
    prod_db
    and
    dev_db
    ) for selecting databases
  • Schema Inference: Automatically infers schema per model from
    dbt_project.yml
    and model configs
  • Single-table queries: Basic validation queries using
    {{prod_db}}.<SCHEMA>.<TABLE>
  • Comparison queries: Before/after queries comparing
    {{prod_db}}
    vs
    {{dev_db}}
  • Flexible usage: Users can set both parameters to the same database for single-database analysis
此命令为dbt变更生成包含验证查询的SQL笔记本。该笔记本可在MC Bridge SQL Notebook界面中打开,进行交互式验证。
输出为可直接在笔记本界面打开的导入URL:
<MC_BASE_URL>/notebooks/import#<base64-encoded-yaml>
核心特性:
  • 数据库参数:两个
    text
    参数(
    prod_db
    dev_db
    )用于选择数据库
  • Schema推断:自动从
    dbt_project.yml
    和模型配置中为每个模型推断Schema
  • 单表查询:使用
    {{prod_db}}.<SCHEMA>.<TABLE>
    的基础验证查询
  • 对比查询:对比
    {{prod_db}}
    {{dev_db}}
    的前后查询
  • 灵活使用:用户可将两个参数设置为同一数据库,进行单数据库分析

Notebook YAML Spec Reference

Notebook YAML规范参考

Key structure:
yaml
version: 1
metadata:
  id: string           # kebab-case + random suffix
  name: string         # display name
  created_at: string   # ISO 8601
  updated_at: string   # ISO 8601
default_context:       # optional database/schema context
  database: string
  schema: string
cells:
  - id: string
    type: sql | markdown | parameter
    content: string    # SQL, markdown, or parameter config (JSON)
    display_type: table | bar | timeseries
核心结构:
yaml
version: 1
metadata:
  id: string           # kebab-case + 随机后缀
  name: string         # 显示名称
  created_at: string   # ISO 8601格式
  updated_at: string   # ISO 8601格式
default_context:       # 可选的数据库/Schema上下文
  database: string
  schema: string
cells:
  - id: string
    type: sql | markdown | parameter
    content: string    # SQL、markdown或参数配置(JSON)
    display_type: table | bar | timeseries

Parameter Cell Spec

参数单元格规范

Parameter cells allow defining variables referenced in SQL via
{{param_name}}
syntax:
yaml
- id: param-prod-db
  type: parameter
  content:
    name: prod_db              # variable name
    config:
      type: text                   # free-form text input
      default_value: "ANALYTICS"
      placeholder: "Prod database"
  display_type: table
Parameter types:
  • text
    : Free-form text input (used for database names)
  • schema_selector
    : Two dropdowns (database -> schema), value stored as
    DATABASE.SCHEMA
  • dropdown
    : Select from predefined options
参数单元格允许定义通过
{{param_name}}
语法在SQL中引用的变量:
yaml
- id: param-prod-db
  type: parameter
  content:
    name: prod_db              # 变量名
    config:
      type: text                   # 自由文本输入
      default_value: "ANALYTICS"
      placeholder: "Prod database"
  display_type: table
参数类型:
  • text
    :自由文本输入(用于数据库名称)
  • schema_selector
    :两个下拉菜单(数据库→Schema),值存储为
    DATABASE.SCHEMA
  • dropdown
    :从预定义选项中选择

Task

任务

Generate a SQL Notebook with validation queries based on the mode and target.
根据模式和目标生成包含验证查询的SQL笔记本。

Phase 1: Get Changed Files

阶段1:获取变更文件

The approach differs based on mode:
方法因模式而异:

If PR mode (GitHub PR):

如果是PR模式(GitHub PR):

  1. Extract the PR number and repo from the target URL.
    • Example:
      https://github.com/monte-carlo-data/dbt/pull/3386
      -> owner=
      monte-carlo-data
      , repo=
      dbt
      , PR=
      3386
  2. Fetch PR metadata using
    gh
    :
bash
gh pr view <PR#> --repo <owner>/<repo> --json number,title,author,mergedAt,headRefOid
  1. Fetch the list of changed files:
bash
gh pr view <PR#> --repo <owner>/<repo> --json files --jq '.files[].path'
  1. Fetch the diff:
bash
gh pr diff <PR#> --repo <owner>/<repo>
  1. Filter the changed files list to only
    .sql
    files under
    models/
    or
    snapshots/
    directories (at any depth — e.g.,
    models/
    ,
    analytics/models/
    ,
    dbt/models/
    ). These are the dbt models to analyze. If no model SQL files were changed, report that and stop.
  2. For each changed model file, fetch the full file content at the head SHA:
bash
gh api repos/<owner>/<repo>/contents/<file_path>?ref=<head_sha> --jq '.content' | python3 -c "import sys,base64; sys.stdout.write(base64.b64decode(sys.stdin.read()).decode())"
  1. Fetch dbt_project.yml for schema resolution. Detect the dbt project root by looking at the changed file paths — find the common parent directory that contains
    dbt_project.yml
    . Try these paths in order until one succeeds:
bash
gh api repos/<owner>/<repo>/contents/<dbt_root>/dbt_project.yml?ref=<head_sha> --jq '.content' | python3 -c "import sys,base64; sys.stdout.write(base64.b64decode(sys.stdin.read()).decode())"
Common
<dbt_root>
locations:
analytics
,
.
(repo root),
dbt
,
transform
. Try each until found.
Save
dbt_project.yml
to
/tmp/validation_notebook_working/<PR#>/dbt_project.yml
.
  1. 从目标URL中提取PR编号和仓库信息。
    • 示例:
      https://github.com/monte-carlo-data/dbt/pull/3386
      → owner=
      monte-carlo-data
      ,repo=
      dbt
      ,PR=
      3386
  2. 使用
    gh
    获取PR元数据:
bash
gh pr view <PR#> --repo <owner>/<repo> --json number,title,author,mergedAt,headRefOid
  1. 获取变更文件列表:
bash
gh pr view <PR#> --repo <owner>/<repo> --json files --jq '.files[].path'
  1. 获取差异内容:
bash
gh pr diff <PR#> --repo <owner>/<repo>
  1. 筛选变更文件列表,仅保留
    models/
    snapshots/
    目录下的
    .sql
    文件(任意深度——例如
    models/
    analytics/models/
    dbt/models/
    )。这些是需要分析的dbt模型。如果没有变更的模型SQL文件,报告此情况并停止。
  2. 对于每个变更的模型文件,获取head SHA对应的完整文件内容:
bash
gh api repos/<owner>/<repo>/contents/<file_path>?ref=<head_sha> --jq '.content' | python3 -c "import sys,base64; sys.stdout.write(base64.b64decode(sys.stdin.read()).decode())"
  1. 获取dbt_project.yml用于Schema解析。通过变更文件路径检测dbt项目根目录——找到包含
    dbt_project.yml
    的公共父目录。按以下顺序尝试路径,直到找到为止:
bash
gh api repos/<owner>/<repo>/contents/<dbt_root>/dbt_project.yml?ref=<head_sha> --jq '.content' | python3 -c "import sys,base64; sys.stdout.write(base64.b64decode(sys.stdin.read()).decode())"
常见的
<dbt_root>
位置:
analytics
.
(仓库根目录)、
dbt
transform
。逐个尝试直到找到。
dbt_project.yml
保存到
/tmp/validation_notebook_working/<PR#>/dbt_project.yml

If Local mode (Local Directory):

如果是本地模式(本地目录):

  1. Change to the target directory.
  2. Get current branch info:
bash
git rev-parse --abbrev-ref HEAD
  1. Detect base branch - try
    main
    ,
    master
    ,
    develop
    in order, or use upstream tracking branch.
  2. Get the list of changed SQL files compared to base branch:
bash
git diff --name-only <base_branch>...HEAD -- '*.sql'
  1. Filter to only
    .sql
    files under
    models/
    or
    snapshots/
    directories (at any depth — e.g.,
    models/
    ,
    analytics/models/
    ,
    dbt/models/
    ). If no model SQL files were changed, report that and stop.
  2. Get the diff for each changed file:
bash
git diff <base_branch>...HEAD -- <file_path>
  1. Read model files directly from the filesystem.
  2. Find dbt_project.yml:
bash
find . -name "dbt_project.yml" -type f | head -1
  1. For notebook metadata in local mode, use:
    • ID:
      local-<branch-name>-<timestamp>
    • Title:
      Local: <branch-name>
    • Author: Output of
      git config user.name
    • Merged: "N/A (local)"
  1. 切换到目标目录。
  2. 获取当前分支信息:
bash
git rev-parse --abbrev-ref HEAD
  1. 检测基础分支——依次尝试
    main
    master
    develop
    ,或使用上游跟踪分支。
  2. 获取与基础分支相比变更的SQL文件列表:
bash
git diff --name-only <base_branch>...HEAD -- '*.sql'
  1. 筛选仅保留
    models/
    snapshots/
    目录下的
    .sql
    文件(任意深度——例如
    models/
    analytics/models/
    dbt/models/
    )。如果没有变更的模型SQL文件,报告此情况并停止。
  2. 获取每个变更文件的差异:
bash
git diff <base_branch>...HEAD -- <file_path>
  1. 直接从文件系统读取模型文件。
  2. 查找dbt_project.yml
bash
find . -name "dbt_project.yml" -type f | head -1
  1. 本地模式下的笔记本元数据使用:
    • ID
      local-<branch-name>-<timestamp>
    • 标题
      Local: <branch-name>
    • 作者
      git config user.name
      的输出
    • 合并状态:"N/A (local)"

Model Selection (applies to both modes)

模型选择(适用于两种模式)

After filtering to
.sql
files under
models/
or
snapshots/
:
  1. If
    --models
    was specified:
    Filter the changed files list to only include models whose filename (without
    .sql
    extension, case-insensitive) matches one of the specified model names. If any specified model is not found in the changed files, warn the user but continue with the models that were found. If none match, report that and stop.
  2. Model cap: If more than 10 models remain after filtering, select the first 10 (by file path order) and warn the user:
    ⚠️ <total_count> models changed — generating validation queries for the first 10 only.
    To generate for specific models, re-run with: --models <model1,model2,...>
    Skipped models: <list of skipped model filenames>
筛选出
models/
snapshots/
下的
.sql
文件后:
  1. 如果指定了
    --models
    筛选变更文件列表,仅包含文件名(不含
    .sql
    扩展名,不区分大小写)与指定模型名称匹配的模型。如果任何指定模型未在变更文件中找到,向用户发出警告但继续处理找到的模型。如果没有匹配的模型,报告此情况并停止。
  2. 模型数量上限: 如果筛选后剩余模型超过10个,选择前10个(按文件路径顺序)并向用户发出警告:
    ⚠️ 共变更了<total_count>个模型——仅为前10个生成验证查询。
    如需为特定模型生成,请重新运行:--models <model1,model2,...>
    跳过的模型:<跳过的模型文件名列表>

Phase 2: Parse Changed Models

阶段2:解析变更模型

For EACH changed dbt model
.sql
file, parse and extract:
对于每个变更的dbt模型
.sql
文件,解析并提取:

2a. Model Metadata

2a. 模型元数据

Output table name -- Derive from file name:
  • <any_path>/models/<subdir>/<model_name>.sql
    -> table is
    <MODEL_NAME>
    (uppercase, taken from the filename)
Output schema -- Use the schema resolution script:
  1. Setup: Save
    dbt_project.yml
    and model files to
    /tmp/validation_notebook_working/<id>/
    preserving paths:
    /tmp/validation_notebook_working/<id>/
    +-- dbt_project.yml
    +-- models/
        +-- <path>/<model>.sql
  2. Run the script for each model:
    bash
    python3 ${CLAUDE_PLUGIN_ROOT}/skills/monte-carlo-validation-notebook/scripts/resolve_dbt_schema.py /tmp/validation_notebook_working/<id>/dbt_project.yml /tmp/validation_notebook_working/<id>/models/<path>/<model>.sql
  3. Error handling: If the script fails, STOP immediately and report the error. Do NOT proceed with notebook generation if schema resolution fails.
  4. Output: The script prints the resolved schema (e.g.,
    PROD
    ,
    PROD_STAGE
    ,
    PROD_LINEAGE
    )
Note: Do NOT manually parse dbt_project.yml or model configs for schema -- always use the script. It handles model config overrides, dbt_project.yml routing rules, PROD_ prefix for custom schemas, and defaults to
PROD
.
Config block -- Look for
{{ config(...) }}
and extract:
  • materialized
    -- 'table', 'view', 'incremental', 'ephemeral'
  • unique_key
    -- the dedup key (may be a string or list)
  • cluster_by
    -- clustering fields (may contain the time axis)
Core segmentation fields -- Scan the entire model SQL for fields likely to be business keys:
  • Fields named
    *_id
    (e.g.,
    account_id
    ,
    resource_id
    ,
    monitor_id
    ) that appear in JOIN ON, GROUP BY, PARTITION BY, or
    unique_key
  • Deduplicate and rank by frequency. Take the top 3.
Time axis field -- Detect the model's time dimension (in priority order):
  1. is_incremental()
    block: field used in the WHERE comparison
  2. cluster_by
    config: timestamp/date fields
  3. Field name conventions:
    ingest_ts
    ,
    created_time
    ,
    date_part
    ,
    timestamp
    ,
    run_start_time
    ,
    export_ts
    ,
    event_created_time
  4. ORDER BY DESC in QUALIFY/ROW_NUMBER
If no time axis is found, skip time-axis queries for this model.
输出表名 —— 从文件名推导:
  • <any_path>/models/<subdir>/<model_name>.sql
    → 表名为
    <MODEL_NAME>
    (大写,取自文件名)
输出Schema —— 使用Schema解析脚本:
  1. 设置:将
    dbt_project.yml
    和模型文件保存到
    /tmp/validation_notebook_working/<id>/
    并保留路径:
    /tmp/validation_notebook_working/<id>/
    +-- dbt_project.yml
    +-- models/
        +-- <path>/<model>.sql
  2. 为每个模型运行脚本
    bash
    python3 ${CLAUDE_PLUGIN_ROOT}/skills/monte-carlo-validation-notebook/scripts/resolve_dbt_schema.py /tmp/validation_notebook_working/<id>/dbt_project.yml /tmp/validation_notebook_working/<id>/models/<path>/<model>.sql
  3. 错误处理:如果脚本失败,立即停止并报告错误。如果Schema解析失败,请勿继续生成笔记本。
  4. 输出:脚本打印解析后的Schema(例如
    PROD
    PROD_STAGE
    PROD_LINEAGE
注意:请勿手动解析dbt_project.yml或模型配置来获取Schema——始终使用脚本。它会处理模型配置覆盖、dbt_project.yml路由规则、自定义Schema的PROD前缀,并默认使用
PROD
配置块 —— 查找
{{ config(...) }}
并提取:
  • materialized
    —— 'table'、'view'、'incremental'、'ephemeral'
  • unique_key
    —— 去重键(可能是字符串或列表)
  • cluster_by
    —— 聚类字段(可能包含时间轴)
核心分段字段 —— 扫描整个模型SQL,查找可能是业务键的字段:
  • 命名为
    *_id
    的字段(例如
    account_id
    resource_id
    monitor_id
    ),出现在JOIN ON、GROUP BY、PARTITION BY或
    unique_key
  • 去重并按频率排序。取前3个。
时间轴字段 —— 检测模型的时间维度(按优先级排序):
  1. is_incremental()
    块:WHERE比较中使用的字段
  2. cluster_by
    配置:时间戳/日期字段
  3. 字段命名惯例:
    ingest_ts
    created_time
    date_part
    timestamp
    run_start_time
    export_ts
    event_created_time
  4. QUALIFY/ROW_NUMBER中的ORDER BY DESC
如果未找到时间轴,则跳过此模型的时间轴查询。

2b. Diff Analysis

2b. 差异分析

Parse the diff hunks for this file. Classify each changed line:
  • Changed fields -- Lines added/modified in SELECT clauses or CTE definitions. Extract the output column name.
  • Changed filters -- Lines added/modified in WHERE clauses.
  • Changed joins -- Lines added/modified in JOIN ON conditions.
  • Changed unique_key -- If
    unique_key
    in config was modified, note both old and new values.
  • New columns -- Columns in "after" SELECT that don't appear in "before" (pure additions).
解析此文件的差异块。对每个变更行进行分类:
  • 变更字段 —— SELECT子句或CTE定义中添加/修改的行。提取输出列名。
  • 变更过滤器 —— WHERE子句中添加/修改的行。
  • 变更连接 —— JOIN ON条件中添加/修改的行。
  • 变更unique_key —— 如果配置中的
    unique_key
    被修改,记录新旧值。
  • 新列 —— "after" SELECT中出现但"before"中没有的列(纯新增)。

2c. Model Classification

2c. 模型分类

Classify each model as new or modified based on the diff:
  • If the diff for this file contains
    new file mode
    → classify as new
  • Otherwise → classify as modified
This classification determines which query patterns are generated in Phase 3.
Note: For new models, Phase 2b diff analysis is skipped (there is no "before" to compare against). Phase 2a metadata extraction still applies.
根据差异将每个模型分类为新建修改
  • 如果文件差异包含
    new file mode
    → 分类为新建
  • 否则 → 分类为修改
此分类决定了阶段3中生成的查询模式。
注意: 对于新建模型,跳过阶段2b的差异分析(没有"before"可对比)。阶段2a的元数据提取仍然适用。

Phase 3: Generate Validation Queries

阶段3:生成验证查询

For each changed model, generate the applicable queries based on its classification (new vs modified).
CRITICAL: Parameter Placeholder Syntax
Use double curly braces
{{...}}
for parameter placeholders. Do NOT use
${...}
or any other syntax.
Correct:
{{prod_db}}.PROD.AGENT_RUNS
Wrong:
${prod_db}.PROD.AGENT_RUNS
Table Reference Format:
  • Use
    {{prod_db}}.<SCHEMA>.<TABLE_NAME>
    for prod queries
  • Use
    {{dev_db}}.<SCHEMA>.<TABLE_NAME>
    for dev queries
  • <SCHEMA>
    is hardcoded per-model using the output from the schema resolution script

对于每个变更模型,根据其分类(新建/修改)生成适用的查询。
关键:参数占位符语法
使用双大括号
{{...}}
作为参数占位符。请勿使用
${...}
或其他语法。
正确:
{{prod_db}}.PROD.AGENT_RUNS
错误:
${prod_db}.PROD.AGENT_RUNS
表引用格式:
  • 生产环境查询使用
    {{prod_db}}.<SCHEMA>.<TABLE_NAME>
  • 开发环境查询使用
    {{dev_db}}.<SCHEMA>.<TABLE_NAME>
  • <SCHEMA>
    每个模型硬编码的,使用Schema解析脚本的输出

Query Patterns for NEW Models

新建模型的查询模式

For new models, all queries target
{{dev_db}}
only. No comparison queries are generated since no prod table exists.
对于新建模型,所有查询仅针对
{{dev_db}}
。由于不存在生产环境表,不生成对比查询。

Pattern 7-new: Total Row Count

模式7-new:总行数

Trigger: Always.
sql
SELECT COUNT(*) AS total_rows
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
触发条件: 始终触发。
sql
SELECT COUNT(*) AS total_rows
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>

Pattern 9: Sample Data Preview

模式9:样本数据预览

Trigger: Always.
sql
SELECT *
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
LIMIT 20
触发条件: 始终触发。
sql
SELECT *
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
LIMIT 20

Pattern 2-new: Core Segmentation Counts

模式2-new:核心分段计数

Trigger: Always.
sql
SELECT
    <segmentation_field>,
    COUNT(*) AS row_count
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
GROUP BY <segmentation_field>
ORDER BY row_count DESC
LIMIT 100
触发条件: 始终触发。
sql
SELECT
    <segmentation_field>,
    COUNT(*) AS row_count
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
GROUP BY <segmentation_field>
ORDER BY row_count DESC
LIMIT 100

Pattern 5: Uniqueness Check

模式5:唯一性检查

Trigger: Always for new models (verify unique_key constraint from the start).
sql
SELECT
    COUNT(*) AS total_rows,
    COUNT(DISTINCT <key_fields>) AS distinct_keys,
    COUNT(*) - COUNT(DISTINCT <key_fields>) AS duplicate_count
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
sql
SELECT <key_fields>, COUNT(*) AS n
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
GROUP BY <key_fields>
HAVING COUNT(*) > 1
ORDER BY n DESC
LIMIT 100
触发条件: 新建模型始终触发(从一开始验证unique_key约束)。
sql
SELECT
    COUNT(*) AS total_rows,
    COUNT(DISTINCT <key_fields>) AS distinct_keys,
    COUNT(*) - COUNT(DISTINCT <key_fields>) AS duplicate_count
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
sql
SELECT <key_fields>, COUNT(*) AS n
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
GROUP BY <key_fields>
HAVING COUNT(*) > 1
ORDER BY n DESC
LIMIT 100

Pattern 6-new: NULL Rate Check (all columns)

模式6-new:NULL率检查(所有列)

Trigger: Always. Checks all output columns since everything is new.
sql
SELECT
    COUNT(*) AS total_rows,
    SUM(CASE WHEN <col1> IS NULL THEN 1 ELSE 0 END) AS <col1>_null_count,
    ROUND(100.0 * SUM(CASE WHEN <col1> IS NULL THEN 1 ELSE 0 END) / NULLIF(COUNT(*), 0), 2) AS <col1>_null_pct,
    SUM(CASE WHEN <col2> IS NULL THEN 1 ELSE 0 END) AS <col2>_null_count,
    ROUND(100.0 * SUM(CASE WHEN <col2> IS NULL THEN 1 ELSE 0 END) / NULLIF(COUNT(*), 0), 2) AS <col2>_null_pct
    -- repeat for each output column
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
触发条件: 始终触发。检查所有输出列,因为一切都是新的。
sql
SELECT
    COUNT(*) AS total_rows,
    SUM(CASE WHEN <col1> IS NULL THEN 1 ELSE 0 END) AS <col1>_null_count,
    ROUND(100.0 * SUM(CASE WHEN <col1> IS NULL THEN 1 ELSE 0 END) / NULLIF(COUNT(*), 0), 2) AS <col1>_null_pct,
    SUM(CASE WHEN <col2> IS NULL THEN 1 ELSE 0 END) AS <col2>_null_count,
    ROUND(100.0 * SUM(CASE WHEN <col2> IS NULL THEN 1 ELSE 0 END) / NULLIF(COUNT(*), 0), 2) AS <col2>_null_pct
    -- 为每个输出列重复上述内容
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>

Pattern 8: Time-Axis Continuity

模式8:时间轴连续性

Trigger: Model is
materialized='incremental'
OR a time axis field was identified.
sql
SELECT
    CAST(<time_axis> AS DATE) AS day,
    COUNT(*) AS row_count
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
WHERE <time_axis> >= CURRENT_TIMESTAMP - INTERVAL '14' DAY
GROUP BY day
ORDER BY day DESC
LIMIT 30

触发条件: 模型的
materialized='incremental'
或识别到时间轴字段。
sql
SELECT
    CAST(<time_axis> AS DATE) AS day,
    COUNT(*) AS row_count
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
WHERE <time_axis> >= CURRENT_TIMESTAMP - INTERVAL '14' DAY
GROUP BY day
ORDER BY day DESC
LIMIT 30

Query Patterns for MODIFIED Models

修改模型的查询模式

For modified models, single-table queries use
{{prod_db}}
and comparison queries use both.
对于修改模型,单表查询使用
{{prod_db}}
,对比查询同时使用两者。

Pattern 7: Total Row Count

模式7:总行数

Trigger: Always.
sql
SELECT COUNT(*) AS total_rows
FROM {{prod_db}}.<SCHEMA>.<TABLE_NAME>
触发条件: 始终触发。
sql
SELECT COUNT(*) AS total_rows
FROM {{prod_db}}.<SCHEMA>.<TABLE_NAME>

Pattern 9: Sample Data Preview

模式9:样本数据预览

Trigger: Always.
sql
SELECT *
FROM {{prod_db}}.<SCHEMA>.<TABLE_NAME>
LIMIT 20
触发条件: 始终触发。
sql
SELECT *
FROM {{prod_db}}.<SCHEMA>.<TABLE_NAME>
LIMIT 20

Pattern 2: Core Segmentation Counts

模式2:核心分段计数

Trigger: Always.
sql
SELECT
    <segmentation_field>,
    COUNT(*) AS row_count
FROM {{prod_db}}.<SCHEMA>.<TABLE_NAME>
GROUP BY <segmentation_field>
ORDER BY row_count DESC
LIMIT 100
触发条件: 始终触发。
sql
SELECT
    <segmentation_field>,
    COUNT(*) AS row_count
FROM {{prod_db}}.<SCHEMA>.<TABLE_NAME>
GROUP BY <segmentation_field>
ORDER BY row_count DESC
LIMIT 100

Pattern 1: Changed Field Distribution

模式1:变更字段分布

Trigger: Changed fields found in Phase 2b. Exclude added columns (from "New columns" in Phase 2b) — only include fields that exist in prod.
sql
SELECT
    <changed_field>,
    COUNT(*) AS row_count,
    ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER(), 2) AS pct
FROM {{prod_db}}.<SCHEMA>.<TABLE_NAME>
GROUP BY <changed_field>
ORDER BY row_count DESC
LIMIT 100
触发条件: 阶段2b中发现变更字段。排除新增列(阶段2b中的"新列")——仅包含生产环境中已存在的字段。
sql
SELECT
    <changed_field>,
    COUNT(*) AS row_count,
    ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER(), 2) AS pct
FROM {{prod_db}}.<SCHEMA>.<TABLE_NAME>
GROUP BY <changed_field>
ORDER BY row_count DESC
LIMIT 100

Pattern 5: Uniqueness Check

模式5:唯一性检查

Trigger: JOIN condition changed,
unique_key
changed, or model is incremental.
sql
SELECT
    COUNT(*) AS total_rows,
    COUNT(DISTINCT <key_fields>) AS distinct_keys,
    COUNT(*) - COUNT(DISTINCT <key_fields>) AS duplicate_count
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
sql
SELECT <key_fields>, COUNT(*) AS n
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
GROUP BY <key_fields>
HAVING COUNT(*) > 1
ORDER BY n DESC
LIMIT 100
触发条件: JOIN条件变更、
unique_key
变更,或模型为增量模型。
sql
SELECT
    COUNT(*) AS total_rows,
    COUNT(DISTINCT <key_fields>) AS distinct_keys,
    COUNT(*) - COUNT(DISTINCT <key_fields>) AS duplicate_count
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
sql
SELECT <key_fields>, COUNT(*) AS n
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
GROUP BY <key_fields>
HAVING COUNT(*) > 1
ORDER BY n DESC
LIMIT 100

Pattern 6: NULL Rate Check

模式6:NULL率检查

Trigger: New column added, or column wrapped in COALESCE/NULLIF.
Important: Added columns (from "New columns" in Phase 2b) do NOT exist in prod yet. For added columns, query
{{dev_db}}
only. For modified columns (COALESCE/NULLIF changes), compare both databases.
For added columns (dev only):
sql
SELECT
    COUNT(*) AS total_rows,
    SUM(CASE WHEN <column> IS NULL THEN 1 ELSE 0 END) AS null_count,
    ROUND(100.0 * SUM(CASE WHEN <column> IS NULL THEN 1 ELSE 0 END) / NULLIF(COUNT(*), 0), 2) AS null_pct
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
For modified columns (prod vs dev):
sql
SELECT
    'prod' AS source,
    COUNT(*) AS total_rows,
    SUM(CASE WHEN <column> IS NULL THEN 1 ELSE 0 END) AS null_count,
    ROUND(100.0 * SUM(CASE WHEN <column> IS NULL THEN 1 ELSE 0 END) / NULLIF(COUNT(*), 0), 2) AS null_pct
FROM {{prod_db}}.<SCHEMA>.<TABLE_NAME>
UNION ALL
SELECT
    'dev' AS source,
    COUNT(*) AS total_rows,
    SUM(CASE WHEN <column> IS NULL THEN 1 ELSE 0 END) AS null_count,
    ROUND(100.0 * SUM(CASE WHEN <column> IS NULL THEN 1 ELSE 0 END) / NULLIF(COUNT(*), 0), 2) AS null_pct
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
触发条件: 添加新列,或列被COALESCE/NULLIF包裹。
重要提示: 新增列(阶段2b中的"新列")在生产环境中尚不存在。对于新增列,仅查询
{{dev_db}}
。对于修改的列(COALESCE/NULLIF变更),对比两个数据库。
新增列(仅开发环境):
sql
SELECT
    COUNT(*) AS total_rows,
    SUM(CASE WHEN <column> IS NULL THEN 1 ELSE 0 END) AS null_count,
    ROUND(100.0 * SUM(CASE WHEN <column> IS NULL THEN 1 ELSE 0 END) / NULLIF(COUNT(*), 0), 2) AS null_pct
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
修改列(生产环境vs开发环境):
sql
SELECT
    'prod' AS source,
    COUNT(*) AS total_rows,
    SUM(CASE WHEN <column> IS NULL THEN 1 ELSE 0 END) AS null_count,
    ROUND(100.0 * SUM(CASE WHEN <column> IS NULL THEN 1 ELSE 0 END) / NULLIF(COUNT(*), 0), 2) AS null_pct
FROM {{prod_db}}.<SCHEMA>.<TABLE_NAME>
UNION ALL
SELECT
    'dev' AS source,
    COUNT(*) AS total_rows,
    SUM(CASE WHEN <column> IS NULL THEN 1 ELSE 0 END) AS null_count,
    ROUND(100.0 * SUM(CASE WHEN <column> IS NULL THEN 1 ELSE 0 END) / NULLIF(COUNT(*), 0), 2) AS null_pct
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>

Pattern 8: Time-Axis Continuity

模式8:时间轴连续性

Trigger: Model is
materialized='incremental'
OR a time axis field was identified.
sql
SELECT
    CAST(<time_axis> AS DATE) AS day,
    COUNT(*) AS row_count
FROM {{prod_db}}.<SCHEMA>.<TABLE_NAME>
WHERE <time_axis> >= CURRENT_TIMESTAMP - INTERVAL '14' DAY
GROUP BY day
ORDER BY day DESC
LIMIT 30
触发条件: 模型的
materialized='incremental'
或识别到时间轴字段。
sql
SELECT
    CAST(<time_axis> AS DATE) AS day,
    COUNT(*) AS row_count
FROM {{prod_db}}.<SCHEMA>.<TABLE_NAME>
WHERE <time_axis> >= CURRENT_TIMESTAMP - INTERVAL '14' DAY
GROUP BY day
ORDER BY day DESC
LIMIT 30

Pattern 3: Before/After Comparison

模式3:前后对比

Trigger: Always (for changed fields + top segmentation field). Modified models only.
Important: Exclude added columns (from "New columns" in Phase 2b) from
<group_fields>
. Only use fields that exist in BOTH prod and dev. Added columns don't exist in prod and will cause query errors.
sql
WITH prod AS (
    SELECT <group_fields>, COUNT(*) AS cnt
    FROM {{prod_db}}.<SCHEMA>.<TABLE_NAME>
    GROUP BY <group_fields>
),
dev AS (
    SELECT <group_fields>, COUNT(*) AS cnt
    FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
    GROUP BY <group_fields>
)
SELECT
    COALESCE(b.<field>, d.<field>) AS <field>,
    COALESCE(b.cnt, 0) AS cnt_prod,
    COALESCE(d.cnt, 0) AS cnt_dev,
    COALESCE(d.cnt, 0) - COALESCE(b.cnt, 0) AS diff
FROM prod b
FULL OUTER JOIN dev d ON b.<field> = d.<field>
ORDER BY ABS(diff) DESC
LIMIT 100
触发条件: 始终触发(针对变更字段+顶级分段字段)。仅适用于修改模型
重要提示:
<group_fields>
中排除新增列(阶段2b中的"新列")。仅使用在生产环境和开发环境中均存在的字段。新增列在生产环境中不存在,会导致查询错误。
sql
WITH prod AS (
    SELECT <group_fields>, COUNT(*) AS cnt
    FROM {{prod_db}}.<SCHEMA>.<TABLE_NAME>
    GROUP BY <group_fields>
),
dev AS (
    SELECT <group_fields>, COUNT(*) AS cnt
    FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
    GROUP BY <group_fields>
)
SELECT
    COALESCE(b.<field>, d.<field>) AS <field>,
    COALESCE(b.cnt, 0) AS cnt_prod,
    COALESCE(d.cnt, 0) AS cnt_dev,
    COALESCE(d.cnt, 0) - COALESCE(b.cnt, 0) AS diff
FROM prod b
FULL OUTER JOIN dev d ON b.<field> = d.<field>
ORDER BY ABS(diff) DESC
LIMIT 100

Pattern 7b: Row Count Comparison

模式7b:行数对比

Trigger: Always. Modified models only.
sql
SELECT 'prod' AS source, COUNT(*) AS row_count FROM {{prod_db}}.<SCHEMA>.<TABLE_NAME>
UNION ALL
SELECT 'dev' AS source, COUNT(*) AS row_count FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
触发条件: 始终触发。仅适用于修改模型
sql
SELECT 'prod' AS source, COUNT(*) AS row_count FROM {{prod_db}}.<SCHEMA>.<TABLE_NAME>
UNION ALL
SELECT 'dev' AS source, COUNT(*) AS row_count FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>

Phase 4: Build Notebook YAML

阶段4:构建Notebook YAML

4a. Metadata

4a. 元数据

yaml
version: 1
metadata:
  id: validation-pr-<PR_NUMBER>-<random_suffix>
  name: "Validation: PR #<PR_NUMBER> - <PR_TITLE_TRUNCATED>"
  created_at: "<current_iso_timestamp>"
  updated_at: "<current_iso_timestamp>"
yaml
version: 1
metadata:
  id: validation-pr-<PR_NUMBER>-<random_suffix>
  name: "Validation: PR #<PR_NUMBER> - <PR_TITLE_TRUNCATED>"
  created_at: "<current_iso_timestamp>"
  updated_at: "<current_iso_timestamp>"

4b. Parameter Cells

4b. 参数单元格

Only include
prod_db
if there are modified models.
If all models are new, only include
dev_db
.
yaml
undefined
仅当存在修改模型时才包含
prod_db
。如果所有模型都是新建的,仅包含
dev_db
yaml
undefined

Include ONLY if there are modified models:

仅当存在修改模型时包含:

  • id: param-prod-db type: parameter content: name: prod_db config: type: text default_value: "ANALYTICS" placeholder: "Prod database (e.g., ANALYTICS)" display_type: table
  • id: param-prod-db type: parameter content: name: prod_db config: type: text default_value: "ANALYTICS" placeholder: "Prod database (e.g., ANALYTICS)" display_type: table

Always include:

始终包含:

  • id: param-dev-db type: parameter content: name: dev_db config: type: text default_value: "PERSONAL_<USER>" placeholder: "Dev database (e.g., PERSONAL_JSMITH)" display_type: table
undefined
  • id: param-dev-db type: parameter content: name: dev_db config: type: text default_value: "PERSONAL_<USER>" placeholder: "Dev database (e.g., PERSONAL_JSMITH)" display_type: table
undefined

4c. Markdown Summary Cell

4c. Markdown摘要单元格

yaml
- id: cell-summary
  type: markdown
  content: |
    # Validation Queries for <PR or Local Branch>
    ## Summary
    - **Title:** <title>
    - **Author:** <author>
    - **Source:** <PR URL or "Local branch: <branch>">
    - **Status:** <merge_timestamp or "Not yet merged" or "N/A (local)">
    ## Changes
    <brief description based on diff analysis>
    ## Changed Models
    - `<SCHEMA>.<TABLE_NAME>` (from `<file_path>`)
    ## How to Use
    1. Select your Snowflake connector above
    2. Set **dev_db** to your dev database (e.g., `PERSONAL_JSMITH`)
    3. If modified models are present, set **prod_db** to your prod database (e.g., `ANALYTICS`)
    4. Run single-table queries first, then comparison queries
  display_type: table
yaml
- id: cell-summary
  type: markdown
  content: |
    # <PR或本地分支>的验证查询
    ## 摘要
    - **标题:** <title>
    - **作者:** <author>
    - **来源:** <PR URL或"Local branch: <branch>">
    - **状态:** <merge_timestamp或"Not yet merged"或"N/A (local)">
    ## 变更内容
    <基于差异分析的简要描述>
    ## 变更模型
    - `<SCHEMA>.<TABLE_NAME>`(来自`<file_path>`)
    ## 使用方法
    1. 在上方选择您的Snowflake连接器
    2. 将**dev_db**设置为您的开发数据库(例如`PERSONAL_JSMITH`)
    3. 如果存在修改模型,将**prod_db**设置为您的生产数据库(例如`ANALYTICS`)
    4. 先运行单表查询,再运行对比查询
  display_type: table

4d. SQL Cell Format

4d. SQL单元格格式

yaml
- id: cell-<pattern>-<model>-<index>
  type: sql
  content: |
    /*
    ========================================
    <Pattern Name (human-readable, e.g. "Total Row Count" — do NOT include pattern numbers like "Pattern 7:")>
    ========================================
    Model: <SCHEMA>.<TABLE_NAME>
    Triggered by: <why this pattern was generated>
    What to look for: <interpretation guidance>
    ----------------------------------------
    */
    <actual_sql_query>
  display_type: table
yaml
- id: cell-<pattern>-<model>-<index>
  type: sql
  content: |
    /*
    ========================================
    <模式名称(易读格式,例如"总行数"——请勿包含模式编号如"Pattern 7:")>
    ========================================
    模型:<SCHEMA>.<TABLE_NAME>
    触发原因:<生成此模式的原因>
    关注要点:<解释指导>
    ----------------------------------------
    */
    <实际SQL查询>
  display_type: table

4e. Cell Organization

4e. 单元格组织

Cells are ordered consistently for both model types, following this sequence:
New models:
  1. Summary markdown cell (note that model is new)
  2. Parameter cells (dev_db only — no prod_db if all models are new)
  3. Total row count (Pattern 7-new)
  4. Sample data preview (Pattern 9)
  5. Core segmentation counts (Pattern 2-new)
  6. Uniqueness check (Pattern 5), NULL rate check (Pattern 6-new), Time-axis continuity (Pattern 8)
Modified models:
  1. Summary markdown cell
  2. Parameter cells (prod_db, dev_db)
  3. Total row count (Pattern 7)
  4. Sample data preview (Pattern 9)
  5. Core segmentation counts (Pattern 2)
  6. Changed field distribution (Pattern 1)
  7. Uniqueness check (Pattern 5), NULL rate check (Pattern 6), Time-axis continuity (Pattern 8)
  8. Before/after comparisons (Pattern 3), Row count comparison (Pattern 7b)
两种模型类型的单元格顺序一致,遵循以下序列:
新建模型:
  1. 摘要markdown单元格(注明模型为新建)
  2. 参数单元格(仅dev_db——如果所有模型都是新建的,则不包含prod_db)
  3. 总行数(模式7-new)
  4. 样本数据预览(模式9)
  5. 核心分段计数(模式2-new)
  6. 唯一性检查(模式5)、NULL率检查(模式6-new)、时间轴连续性(模式8)
修改模型:
  1. 摘要markdown单元格
  2. 参数单元格(prod_db、dev_db)
  3. 总行数(模式7)
  4. 样本数据预览(模式9)
  5. 核心分段计数(模式2)
  6. 变更字段分布(模式1)
  7. 唯一性检查(模式5)、NULL率检查(模式6)、时间轴连续性(模式8)
  8. 前后对比(模式3)、行数对比(模式7b)

Phase 5: Generate Import URL

阶段5:生成导入URL

  1. Write notebook YAML to
    /tmp/validation_notebook_working/<id>/notebook.yaml
  2. Run the URL generation script:
bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/monte-carlo-validation-notebook/scripts/generate_notebook_url.py /tmp/validation_notebook_working/<id>/notebook.yaml --mc-base-url <MC_BASE_URL>
  1. The script validates both YAML syntax and notebook schema (required fields on metadata and cells). If validation fails, read the error messages carefully, fix the YAML to match the spec in Phase 4, and re-run.
  1. 将Notebook YAML写入
    /tmp/validation_notebook_working/<id>/notebook.yaml
  2. 运行URL生成脚本:
bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/monte-carlo-validation-notebook/scripts/generate_notebook_url.py /tmp/validation_notebook_working/<id>/notebook.yaml --mc-base-url <MC_BASE_URL>
  1. 脚本验证YAML语法和笔记本Schema(元数据和单元格上的必填字段)。如果验证失败,仔细阅读错误消息,修复YAML使其符合阶段4中的规范,然后重新运行。

Phase 6: Output

阶段6:输出

Present:
markdown
undefined
呈现:
markdown
undefined

Validation Notebook Generated

验证笔记本已生成

Summary

摘要

  • Source: PR #<number> - <title> OR Local: <branch>
  • Author: <author>
  • Changed Models: <count> models (of <total_count> changed)
  • Generated Queries: <count> queries
⚠️ If models were capped: "Only the first 10 of <total_count> changed models were included. Re-run with
--models
to select specific models."
  • 来源: PR #<number> - <title> 或 Local: <branch>
  • 作者: <author>
  • 变更模型: <count>个模型(共<total_count>个变更)
  • 生成查询: <count>个查询
⚠️ 如果模型数量受限:"仅包含了<total_count>个变更模型中的前10个。重新运行时使用
--models
选择特定模型。"

Notebook Opened

笔记本已打开

The notebook has been opened directly in your browser. Select your Snowflake connector in the notebook interface to begin running queries. Make sure MC Bridge is running. Let me know if you want tips on how to install this locally
undefined
笔记本已直接在您的浏览器中打开。 在笔记本界面中选择您的Snowflake连接器开始运行查询。 确保MC Bridge正在运行。如果您需要本地安装的提示,请告知我
undefined

Important Guidelines

重要指南

  1. Do NOT execute queries -- only generate the notebook
  2. Keep SQL readable -- proper formatting and meaningful aliases
  3. Include LIMIT 100 on queries that could return many rows
  4. Use double curly braces --
    {{prod_db}}
    NOT
    ${prod_db}
  5. Use correct table format --
    {{prod_db}}.<SCHEMA>.<TABLE>
    and
    {{dev_db}}.<SCHEMA>.<TABLE>
  6. Always use the schema resolution script -- do NOT manually parse dbt_project.yml
  7. Schema is NOT a parameter -- only
    prod_db
    and
    dev_db
    are parameters
  8. Skip ephemeral models -- they have no physical table
  9. Truncate notebook name -- keep under 50 chars
  10. Generate unique cell IDs -- use pattern like
    cell-p3-model-1
  11. YAML multiline content -- use
    |
    block scalar for SQL with comments
  12. ASCII-only YAML -- the script sanitizes and validates before encoding
  1. 请勿执行查询——仅生成笔记本
  2. 保持SQL可读性——适当的格式和有意义的别名
  3. 在可能返回大量行的查询中包含LIMIT 100
  4. 使用双大括号——
    {{prod_db}}
    而非
    ${prod_db}
  5. 使用正确的表格式——
    {{prod_db}}.<SCHEMA>.<TABLE>
    {{dev_db}}.<SCHEMA>.<TABLE>
  6. 始终使用Schema解析脚本——请勿手动解析dbt_project.yml
  7. Schema不是参数——仅
    prod_db
    dev_db
    是参数
  8. 跳过临时模型——它们没有物理表
  9. 截断笔记本名称——保持在50字符以内
  10. 生成唯一的单元格ID——使用类似
    cell-p3-model-1
    的格式
  11. YAML多行内容——对带注释的SQL使用
    |
    块标量
  12. 仅使用ASCII的YAML——脚本在编码前会进行清理和验证

Query Pattern Reference

查询模式参考

PatternNameTriggerModel TypeDatabaseOrder
7 / 7-newTotal Row CountAlwaysBoth
{{prod_db}}
(modified) /
{{dev_db}}
(new)
1
9Sample Data PreviewAlwaysBoth
{{prod_db}}
(modified) /
{{dev_db}}
(new)
2
2 / 2-newCore Segmentation CountsAlwaysBoth
{{prod_db}}
(modified) /
{{dev_db}}
(new)
3
1Changed Field DistributionColumn modified in diff (not added)Modified only
{{prod_db}}
4
5Uniqueness CheckJOIN/unique_key changed (modified) / Always (new)Both
{{dev_db}}
5
6 / 6-newNULL Rate CheckNew column or COALESCE (modified) / Always (new)BothAdded col:
{{dev_db}}
only; COALESCE: Both (modified) /
{{dev_db}}
(new)
5
8Time-Axis ContinuityIncremental or time fieldBoth
{{prod_db}}
(modified) /
{{dev_db}}
(new)
5
3Before/After ComparisonChanged fields (not added)Modified onlyBoth6
7bRow Count ComparisonAlwaysModified onlyBoth6
模式名称触发条件模型类型数据库顺序
7 / 7-new总行数始终两者
{{prod_db}}
(修改)/
{{dev_db}}
(新建)
1
9样本数据预览始终两者
{{prod_db}}
(修改)/
{{dev_db}}
(新建)
2
2 / 2-new核心分段计数始终两者
{{prod_db}}
(修改)/
{{dev_db}}
(新建)
3
1变更字段分布差异中修改的列(非新增)仅修改
{{prod_db}}
4
5唯一性检查JOIN/unique_key变更(修改)/ 始终(新建)两者
{{dev_db}}
5
6 / 6-newNULL率检查新增列或COALESCE(修改)/ 始终(新建)两者新增列:仅
{{dev_db}}
;COALESCE:两者(修改)/
{{dev_db}}
(新建)
5
8时间轴连续性增量模型或时间字段两者
{{prod_db}}
(修改)/
{{dev_db}}
(新建)
5
3前后对比变更字段(非新增)仅修改两者6
7b行数对比始终仅修改两者6

MC Bridge Setup Help

MC Bridge安装配置帮助

If the user asks how to install or set up MC Bridge, fetch the README from the mc-bridge repo and show the relevant quick start / setup instructions:
bash
gh api repos/monte-carlo-data/mc-bridge/readme --jq '.content' | base64 --decode
Focus on: how to install, configure connections, and run MC Bridge. Don't dump the entire README — extract just the setup-relevant sections.
如果用户询问如何安装或设置MC Bridge,从mc-bridge仓库获取README并显示相关的快速开始/安装配置说明:
bash
gh api repos/monte-carlo-data/mc-bridge/readme --jq '.content' | base64 --decode
重点关注:如何安装、配置连接和运行MC Bridge。不要输出整个README——仅提取与安装配置相关的部分。

Limitations

局限性

  • Use this skill only when the task clearly matches the scope described above.
  • Do not treat the output as a substitute for environment-specific validation, testing, or expert review.
  • Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.
  • 仅当任务明确符合上述描述的范围时使用此技能。
  • 不要将输出视为环境特定验证、测试或专家评审的替代品。
  • 如果缺少所需输入、权限、安全边界或成功标准,请停止并请求澄清。