Loading...
Loading...
Generates SQL validation notebooks for dbt PR changes with before/after comparison queries.
npx skill4agent add sickn33/antigravity-awesome-skills monte-carlo-validation-notebookTip: This skill works well with Sonnet. Runbefore invoking for faster generation./model sonnet
--mc-base-url <URL>https://getmontecarlo.com--models <model1,model2,...>.sqlghgh auth statuspython3pyyamlpip3 install pyyamlpip install pyyamluv pip install pyyaml${CLAUDE_PLUGIN_ROOT}/skills/monte-carlo-validation-notebook/scripts/resolve_dbt_schema.pydbt_project.ymlgenerate_notebook_url.py://github.com./path/to/repo<MC_BASE_URL>/notebooks/import#<base64-encoded-yaml>textprod_dbdev_dbdbt_project.yml{{prod_db}}.<SCHEMA>.<TABLE>{{prod_db}}{{dev_db}}version: 1
metadata:
id: string # kebab-case + random suffix
name: string # display name
created_at: string # ISO 8601
updated_at: string # ISO 8601
default_context: # optional database/schema context
database: string
schema: string
cells:
- id: string
type: sql | markdown | parameter
content: string # SQL, markdown, or parameter config (JSON)
display_type: table | bar | timeseries{{param_name}}- id: param-prod-db
type: parameter
content:
name: prod_db # variable name
config:
type: text # free-form text input
default_value: "ANALYTICS"
placeholder: "Prod database"
display_type: tabletextschema_selectorDATABASE.SCHEMAdropdownhttps://github.com/monte-carlo-data/dbt/pull/3386monte-carlo-datadbt3386ghgh pr view <PR#> --repo <owner>/<repo> --json number,title,author,mergedAt,headRefOidgh pr view <PR#> --repo <owner>/<repo> --json files --jq '.files[].path'gh pr diff <PR#> --repo <owner>/<repo>.sqlmodels/snapshots/models/analytics/models/dbt/models/gh api repos/<owner>/<repo>/contents/<file_path>?ref=<head_sha> --jq '.content' | python3 -c "import sys,base64; sys.stdout.write(base64.b64decode(sys.stdin.read()).decode())"dbt_project.ymlgh api repos/<owner>/<repo>/contents/<dbt_root>/dbt_project.yml?ref=<head_sha> --jq '.content' | python3 -c "import sys,base64; sys.stdout.write(base64.b64decode(sys.stdin.read()).decode())"<dbt_root>analytics.dbttransformdbt_project.yml/tmp/validation_notebook_working/<PR#>/dbt_project.ymlgit rev-parse --abbrev-ref HEADmainmasterdevelopgit diff --name-only <base_branch>...HEAD -- '*.sql'.sqlmodels/snapshots/models/analytics/models/dbt/models/git diff <base_branch>...HEAD -- <file_path>find . -name "dbt_project.yml" -type f | head -1local-<branch-name>-<timestamp>Local: <branch-name>git config user.name.sqlmodels/snapshots/--models.sql⚠️ <total_count> models changed — generating validation queries for the first 10 only.
To generate for specific models, re-run with: --models <model1,model2,...>
Skipped models: <list of skipped model filenames>.sql<any_path>/models/<subdir>/<model_name>.sql<MODEL_NAME>dbt_project.yml/tmp/validation_notebook_working/<id>//tmp/validation_notebook_working/<id>/
+-- dbt_project.yml
+-- models/
+-- <path>/<model>.sqlpython3 ${CLAUDE_PLUGIN_ROOT}/skills/monte-carlo-validation-notebook/scripts/resolve_dbt_schema.py /tmp/validation_notebook_working/<id>/dbt_project.yml /tmp/validation_notebook_working/<id>/models/<path>/<model>.sqlPRODPROD_STAGEPROD_LINEAGEPROD{{ config(...) }}materializedunique_keycluster_by*_idaccount_idresource_idmonitor_idunique_keyis_incremental()cluster_byingest_tscreated_timedate_parttimestamprun_start_timeexport_tsevent_created_timeunique_keynew file mode{{...}}${...}{{prod_db}}.PROD.AGENT_RUNS${prod_db}.PROD.AGENT_RUNS{{prod_db}}.<SCHEMA>.<TABLE_NAME>{{dev_db}}.<SCHEMA>.<TABLE_NAME><SCHEMA>{{dev_db}}SELECT COUNT(*) AS total_rows
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>SELECT *
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
LIMIT 20SELECT
<segmentation_field>,
COUNT(*) AS row_count
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
GROUP BY <segmentation_field>
ORDER BY row_count DESC
LIMIT 100SELECT
COUNT(*) AS total_rows,
COUNT(DISTINCT <key_fields>) AS distinct_keys,
COUNT(*) - COUNT(DISTINCT <key_fields>) AS duplicate_count
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>SELECT <key_fields>, COUNT(*) AS n
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
GROUP BY <key_fields>
HAVING COUNT(*) > 1
ORDER BY n DESC
LIMIT 100SELECT
COUNT(*) AS total_rows,
SUM(CASE WHEN <col1> IS NULL THEN 1 ELSE 0 END) AS <col1>_null_count,
ROUND(100.0 * SUM(CASE WHEN <col1> IS NULL THEN 1 ELSE 0 END) / NULLIF(COUNT(*), 0), 2) AS <col1>_null_pct,
SUM(CASE WHEN <col2> IS NULL THEN 1 ELSE 0 END) AS <col2>_null_count,
ROUND(100.0 * SUM(CASE WHEN <col2> IS NULL THEN 1 ELSE 0 END) / NULLIF(COUNT(*), 0), 2) AS <col2>_null_pct
-- repeat for each output column
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>materialized='incremental'SELECT
CAST(<time_axis> AS DATE) AS day,
COUNT(*) AS row_count
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
WHERE <time_axis> >= CURRENT_TIMESTAMP - INTERVAL '14' DAY
GROUP BY day
ORDER BY day DESC
LIMIT 30{{prod_db}}SELECT COUNT(*) AS total_rows
FROM {{prod_db}}.<SCHEMA>.<TABLE_NAME>SELECT *
FROM {{prod_db}}.<SCHEMA>.<TABLE_NAME>
LIMIT 20SELECT
<segmentation_field>,
COUNT(*) AS row_count
FROM {{prod_db}}.<SCHEMA>.<TABLE_NAME>
GROUP BY <segmentation_field>
ORDER BY row_count DESC
LIMIT 100SELECT
<changed_field>,
COUNT(*) AS row_count,
ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER(), 2) AS pct
FROM {{prod_db}}.<SCHEMA>.<TABLE_NAME>
GROUP BY <changed_field>
ORDER BY row_count DESC
LIMIT 100unique_keySELECT
COUNT(*) AS total_rows,
COUNT(DISTINCT <key_fields>) AS distinct_keys,
COUNT(*) - COUNT(DISTINCT <key_fields>) AS duplicate_count
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>SELECT <key_fields>, COUNT(*) AS n
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
GROUP BY <key_fields>
HAVING COUNT(*) > 1
ORDER BY n DESC
LIMIT 100{{dev_db}}SELECT
COUNT(*) AS total_rows,
SUM(CASE WHEN <column> IS NULL THEN 1 ELSE 0 END) AS null_count,
ROUND(100.0 * SUM(CASE WHEN <column> IS NULL THEN 1 ELSE 0 END) / NULLIF(COUNT(*), 0), 2) AS null_pct
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>SELECT
'prod' AS source,
COUNT(*) AS total_rows,
SUM(CASE WHEN <column> IS NULL THEN 1 ELSE 0 END) AS null_count,
ROUND(100.0 * SUM(CASE WHEN <column> IS NULL THEN 1 ELSE 0 END) / NULLIF(COUNT(*), 0), 2) AS null_pct
FROM {{prod_db}}.<SCHEMA>.<TABLE_NAME>
UNION ALL
SELECT
'dev' AS source,
COUNT(*) AS total_rows,
SUM(CASE WHEN <column> IS NULL THEN 1 ELSE 0 END) AS null_count,
ROUND(100.0 * SUM(CASE WHEN <column> IS NULL THEN 1 ELSE 0 END) / NULLIF(COUNT(*), 0), 2) AS null_pct
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>materialized='incremental'SELECT
CAST(<time_axis> AS DATE) AS day,
COUNT(*) AS row_count
FROM {{prod_db}}.<SCHEMA>.<TABLE_NAME>
WHERE <time_axis> >= CURRENT_TIMESTAMP - INTERVAL '14' DAY
GROUP BY day
ORDER BY day DESC
LIMIT 30<group_fields>WITH prod AS (
SELECT <group_fields>, COUNT(*) AS cnt
FROM {{prod_db}}.<SCHEMA>.<TABLE_NAME>
GROUP BY <group_fields>
),
dev AS (
SELECT <group_fields>, COUNT(*) AS cnt
FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>
GROUP BY <group_fields>
)
SELECT
COALESCE(b.<field>, d.<field>) AS <field>,
COALESCE(b.cnt, 0) AS cnt_prod,
COALESCE(d.cnt, 0) AS cnt_dev,
COALESCE(d.cnt, 0) - COALESCE(b.cnt, 0) AS diff
FROM prod b
FULL OUTER JOIN dev d ON b.<field> = d.<field>
ORDER BY ABS(diff) DESC
LIMIT 100SELECT 'prod' AS source, COUNT(*) AS row_count FROM {{prod_db}}.<SCHEMA>.<TABLE_NAME>
UNION ALL
SELECT 'dev' AS source, COUNT(*) AS row_count FROM {{dev_db}}.<SCHEMA>.<TABLE_NAME>version: 1
metadata:
id: validation-pr-<PR_NUMBER>-<random_suffix>
name: "Validation: PR #<PR_NUMBER> - <PR_TITLE_TRUNCATED>"
created_at: "<current_iso_timestamp>"
updated_at: "<current_iso_timestamp>"prod_dbdev_db# Include ONLY if there are modified models:
- id: param-prod-db
type: parameter
content:
name: prod_db
config:
type: text
default_value: "ANALYTICS"
placeholder: "Prod database (e.g., ANALYTICS)"
display_type: table
# Always include:
- id: param-dev-db
type: parameter
content:
name: dev_db
config:
type: text
default_value: "PERSONAL_<USER>"
placeholder: "Dev database (e.g., PERSONAL_JSMITH)"
display_type: table- id: cell-summary
type: markdown
content: |
# Validation Queries for <PR or Local Branch>
## Summary
- **Title:** <title>
- **Author:** <author>
- **Source:** <PR URL or "Local branch: <branch>">
- **Status:** <merge_timestamp or "Not yet merged" or "N/A (local)">
## Changes
<brief description based on diff analysis>
## Changed Models
- `<SCHEMA>.<TABLE_NAME>` (from `<file_path>`)
## How to Use
1. Select your Snowflake connector above
2. Set **dev_db** to your dev database (e.g., `PERSONAL_JSMITH`)
3. If modified models are present, set **prod_db** to your prod database (e.g., `ANALYTICS`)
4. Run single-table queries first, then comparison queries
display_type: table- id: cell-<pattern>-<model>-<index>
type: sql
content: |
/*
========================================
<Pattern Name (human-readable, e.g. "Total Row Count" — do NOT include pattern numbers like "Pattern 7:")>
========================================
Model: <SCHEMA>.<TABLE_NAME>
Triggered by: <why this pattern was generated>
What to look for: <interpretation guidance>
----------------------------------------
*/
<actual_sql_query>
display_type: table/tmp/validation_notebook_working/<id>/notebook.yamlpython3 ${CLAUDE_PLUGIN_ROOT}/skills/monte-carlo-validation-notebook/scripts/generate_notebook_url.py /tmp/validation_notebook_working/<id>/notebook.yaml --mc-base-url <MC_BASE_URL># Validation Notebook Generated
## Summary
- **Source:** PR #<number> - <title> OR Local: <branch>
- **Author:** <author>
- **Changed Models:** <count> models (of <total_count> changed)
- **Generated Queries:** <count> queries
> ⚠️ If models were capped: "Only the first 10 of <total_count> changed models were included. Re-run with `--models` to select specific models."
## Notebook Opened
The notebook has been opened directly in your browser.
Select your Snowflake connector in the notebook interface to begin running queries.
*Make sure MC Bridge is running. Let me know if you want tips on how to install this locally*{{prod_db}}${prod_db}{{prod_db}}.<SCHEMA>.<TABLE>{{dev_db}}.<SCHEMA>.<TABLE>prod_dbdev_dbcell-p3-model-1|| Pattern | Name | Trigger | Model Type | Database | Order |
|---|---|---|---|---|---|
| 7 / 7-new | Total Row Count | Always | Both | | 1 |
| 9 | Sample Data Preview | Always | Both | | 2 |
| 2 / 2-new | Core Segmentation Counts | Always | Both | | 3 |
| 1 | Changed Field Distribution | Column modified in diff (not added) | Modified only | | 4 |
| 5 | Uniqueness Check | JOIN/unique_key changed (modified) / Always (new) | Both | | 5 |
| 6 / 6-new | NULL Rate Check | New column or COALESCE (modified) / Always (new) | Both | Added col: | 5 |
| 8 | Time-Axis Continuity | Incremental or time field | Both | | 5 |
| 3 | Before/After Comparison | Changed fields (not added) | Modified only | Both | 6 |
| 7b | Row Count Comparison | Always | Modified only | Both | 6 |
gh api repos/monte-carlo-data/mc-bridge/readme --jq '.content' | base64 --decode