testing-dbt-models

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

dbt Testing

dbt 测试

Every model deserves at least one test. Primary keys need unique + not_null.
每个模型都至少需要一个测试。主键需要同时配置unique和not_null测试。

Workflow

工作流程

1. Study Existing Test Patterns

1. 研究现有测试模式

CRITICAL: Match the project's existing testing style before adding new tests.
bash
undefined
重要提示:添加新测试前必须匹配项目现有的测试风格。
bash
undefined

Find all schema.yml files with tests

Find all schema.yml files with tests

find . -name "schema.yml" -exec grep -l "tests:" {} ;
find . -name "schema.yml" -exec grep -l "tests:" {} ;

Read existing tests to learn patterns

Read existing tests to learn patterns

cat models/staging/schema.yml | head -100 cat models/marts/schema.yml | head -100
cat models/staging/schema.yml | head -100 cat models/marts/schema.yml | head -100

Check for custom tests or dbt packages

Check for custom tests or dbt packages

ls tests/ cat packages.yml 2>/dev/null

**Extract from existing tests:**
- YAML formatting style (indentation, spacing)
- Test coverage depth (all columns vs key columns only)
- Use of custom tests (dbt_utils, dbt_expectations, custom macros)
- Description style (brief vs detailed)
- Severity levels used (warn vs error)
ls tests/ cat packages.yml 2>/dev/null

**从现有测试中提取以下信息:**
- YAML格式风格(缩进、间距)
- 测试覆盖深度(所有列 vs 仅关键列)
- 自定义测试的使用情况(dbt_utils、dbt_expectations、自定义宏)
- 描述风格(简洁 vs 详细)
- 使用的严重级别(warn vs error)

2. Read Model SQL

2. 阅读模型SQL文件

bash
cat models/<path>/<model_name>.sql
Identify: primary keys, foreign keys, categorical columns, date columns, business-critical fields.
bash
cat models/<path>/<model_name>.sql
识别:主键、外键、分类列、日期列、业务关键字段。

3. Check Existing Tests for This Model

3. 检查该模型的现有测试

bash
cat models/<path>/schema.yml | grep -A 50 "<model_name>"
bash
cat models/<path>/schema.yml | grep -A 50 "<model_name>"

or

or

find . -name "schema.yml" -exec grep -l "<model_name>" {} ;
undefined
find . -name "schema.yml" -exec grep -l "<model_name>" {} ;
undefined

4. Identify Testable Columns

4. 确定可测试列

Column TypeRecommended Tests
Primary key
unique
,
not_null
Foreign key
not_null
,
relationships
Categorical
accepted_values
(ask user for valid values)
Required field
not_null
Date/timestamp
not_null
Boolean
accepted_values: [true, false]
列类型推荐测试
主键
unique
,
not_null
外键
not_null
,
relationships
分类列
accepted_values
(需向用户确认有效值)
必填字段
not_null
日期/时间戳
not_null
布尔值
accepted_values: [true, false]

5. Write Tests in schema.yml

5. 在schema.yml中编写测试

Match the existing style from step 1. Example format (adapt to project):
yaml
version: 2

models:
  - name: model_name
    description: "Brief description of what this model contains"
    columns:
      - name: primary_key_column
        description: "Unique identifier for this record"
        tests:
          - unique
          - not_null

      - name: foreign_key_column
        description: "Reference to related_model"
        tests:
          - not_null
          - relationships:
              to: ref('related_model')
              field: related_key_column

      - name: status
        description: "Current status of the record"
        tests:
          - not_null
          - accepted_values:
              values: ['pending', 'active', 'completed', 'cancelled']

      - name: created_at
        description: "Timestamp when record was created"
        tests:
          - not_null
匹配步骤1中提取的现有风格。示例格式(需根据项目调整):
yaml
version: 2

models:
  - name: model_name
    description: "Brief description of what this model contains"
    columns:
      - name: primary_key_column
        description: "Unique identifier for this record"
        tests:
          - unique
          - not_null

      - name: foreign_key_column
        description: "Reference to related_model"
        tests:
          - not_null
          - relationships:
              to: ref('related_model')
              field: related_key_column

      - name: status
        description: "Current status of the record"
        tests:
          - not_null
          - accepted_values:
              values: ['pending', 'active', 'completed', 'cancelled']

      - name: created_at
        description: "Timestamp when record was created"
        tests:
          - not_null

6. Run Tests

6. 运行测试

bash
undefined
bash
undefined

Test specific model

Test specific model

dbt test --select <model_name>
dbt test --select <model_name>

Test with upstream

Test with upstream

dbt test --select +<model_name>
undefined
dbt test --select +<model_name>
undefined

7. Fix Failing Tests

7. 修复测试失败问题

Common failures and fixes:
FailureLikely CauseFix
unique
fails
Duplicate recordsAdd deduplication in model
not_null
fails
NULL values in sourceAdd COALESCE or filter
relationships
fails
Orphan recordsAdd WHERE clause or fix upstream
accepted_values
fails
New/unexpected valuesUpdate accepted values list
常见失败场景及修复方案:
失败类型可能原因修复方案
unique
测试失败
存在重复记录在模型中添加去重逻辑
not_null
测试失败
源数据中存在NULL值添加COALESCE函数或过滤条件
relationships
测试失败
存在孤立记录添加WHERE子句或修复上游数据
accepted_values
测试失败
出现新的/未预期的值更新有效值列表

Test Types Reference

测试类型参考

Generic Tests (built-in)

通用测试(内置)

yaml
tests:
  - unique
  - not_null
  - accepted_values:
      values: ['a', 'b', 'c']
  - relationships:
      to: ref('other_model')
      field: id
yaml
tests:
  - unique
  - not_null
  - accepted_values:
      values: ['a', 'b', 'c']
  - relationships:
      to: ref('other_model')
      field: id

Custom Generic Tests

自定义通用测试

yaml
tests:
  - dbt_utils.expression_is_true:
      expression: "amount >= 0"
  - dbt_utils.recency:
      datepart: day
      field: created_at
      interval: 1
yaml
tests:
  - dbt_utils.expression_is_true:
      expression: "amount >= 0"
  - dbt_utils.recency:
      datepart: day
      field: created_at
      interval: 1

Singular Tests

单个测试

Create
tests/<test_name>.sql
:
sql
-- tests/assert_positive_revenue.sql
select *
from {{ ref('orders') }}
where revenue < 0
创建
tests/<test_name>.sql
文件:
sql
-- tests/assert_positive_revenue.sql
select *
from {{ ref('orders') }}
where revenue < 0

Anti-Patterns

反模式

  • Adding tests without checking existing project patterns first
  • Using different YAML formatting style than existing tests
  • Models without any tests
  • Primary keys without both unique AND not_null
  • Testing only obvious columns, ignoring business-critical ones
  • Hardcoding accepted_values without confirming with stakeholders
  • Adding dbt_utils tests when project doesn't use that package
  • 未先检查现有项目模式就添加测试
  • 使用与现有测试不同的YAML格式风格
  • 模型未配置任何测试
  • 主键未同时配置unique和not_null测试
  • 仅测试明显列,忽略业务关键列
  • 未与相关人员确认就硬编码accepted_values的值
  • 项目未使用dbt_utils包却添加该包的测试