creating-dbt-models

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

dbt Model Development

dbt 模型开发

Read before you write. Build after you write. Verify your output.
先阅读,再编写。编写后构建,验证输出。

Critical Rules

关键规则

  1. ALWAYS run
    dbt build
    after creating/modifying models - compile is NOT enough
  2. ALWAYS verify output after build using
    dbt show
    - don't assume success
  3. If build fails 3+ times, stop and reassess your entire approach
  1. 创建/修改模型后必须运行
    dbt build
    - 仅compile是不够的
  2. 构建完成后必须验证输出 - 不要默认成功
  3. 如果构建失败3次以上,停止操作并重新评估整体方案

Workflow

工作流程

1. Understand the Task Requirements

1. 理解任务要求

  • What columns are needed? List them explicitly.
  • What is the grain of the table (one row per what)?
  • What calculations or aggregations are required?
  • 需要哪些列?明确列出。
  • 表的粒度是什么(每行代表什么)?
  • 需要哪些计算或聚合操作?

2. Discover Project Conventions

2. 识别项目规范

bash
cat dbt_project.yml
find models/ -name "*.sql" | head -20
Read 2-3 existing models to learn naming, config, and SQL patterns.
bash
cat dbt_project.yml
find models/ -name "*.sql" | head -20
阅读2-3个现有模型,学习命名、配置和SQL模式。

3. Find Similar Models

3. 查找相似模型

bash
undefined
bash
undefined

Find models with similar purpose

查找用途相似的模型

find models/ -name "agg.sql" -o -name "fct_.sql" | head -5

Learn from existing models: join types, aggregation patterns, NULL handling.
find models/ -name "agg.sql" -o -name "fct_.sql" | head -5

从现有模型中学习:关联类型、聚合模式、NULL值处理方式。

4. Check Upstream Data

4. 检查上游数据

bash
undefined
bash
undefined

Preview upstream data if needed

如有需要,预览上游数据

dbt show --select <upstream_model> --limit 10
undefined
dbt show --select <upstream_model> --limit 10
undefined

5. Write the Model

5. 编写模型

Follow discovered conventions. Match the required columns exactly.
遵循识别到的规范,严格匹配所需的列。

6. Compile (Syntax Check)

6. 编译(语法检查)

bash
dbt compile --select <model_name>
bash
dbt compile --select <model_name>

7. BUILD - MANDATORY

7. 构建 - 强制性步骤

This step is REQUIRED. Do NOT skip it.
bash
dbt build --select <model_name>
If build fails:
  1. Read the error carefully
  2. Fix the specific issue
  3. Run build again
  4. If fails 3+ times, step back and reassess approach
此步骤为必填项,请勿跳过。
bash
dbt build --select <model_name>
如果构建失败:
  1. 仔细阅读错误信息
  2. 修复具体问题
  3. 重新运行构建
  4. 如果失败3次以上,请暂停并重新评估方案

8. Verify Output (CRITICAL)

8. 验证输出(关键步骤)

Build success does NOT mean correct output.
bash
undefined
构建成功并不代表输出正确。
bash
undefined

Check the table was created and preview data

检查表是否已创建并预览数据

dbt show --select <model_name> --limit 10

Verify:
- Column names match requirements exactly
- Row count is reasonable
- Data values look correct
- No unexpected NULLs
dbt show --select <model_name> --limit 10

验证内容:
- 列名是否与要求完全匹配
- 行数是否合理
- 数据值是否看起来正确
- 有无意外的NULL值

9. Verify Calculations Against Sample Data

9. 对照样本数据验证计算结果

For models with calculations, verify correctness manually:
bash
undefined
对于包含计算逻辑的模型,手动验证正确性:
bash
undefined

Pick a specific row and verify calculation by hand

选择特定行并手动验证计算结果

dbt show --inline " select * from {{ ref('model_name') }} where <primary_key> = '<known_value>' " --limit 1
dbt show --inline " select * from {{ ref('model_name') }} where <primary_key> = '<known_value>' " --limit 1

Cross-check aggregations

交叉检查聚合结果

dbt show --inline " select count(*), sum(<column>) from {{ ref('model_name') }} "

For example, if calculating `total_revenue = quantity * price`:
1. Pick one row from output
2. Look up the source quantity and price
3. Manually calculate: does it match?
dbt show --inline " select count(*), sum(<column>) from {{ ref('model_name') }} "

例如,若计算`total_revenue = quantity * price`:
1. 从输出中选取一行
2. 查找源数据中的quantity和price值
3. 手动计算:结果是否匹配?

10. Re-review Against Requirements

10. 对照要求重新检查

Before declaring done, re-read the original request:
  • Did you implement what was asked, not what you assumed?
  • Are column names exactly as specified?
  • Is the calculation logic correct per the requirements?
  • Does the grain (one row per what?) match what was requested?
在完成前,重新阅读原始需求:
  • 你实现的是需求要求的内容,而非自己假设的内容吗?
  • 列名是否与指定的完全一致?
  • 计算逻辑是否符合要求?
  • 粒度(每行代表什么)是否与要求匹配?

Anti-Patterns

反模式

  • Declaring done after compile without running build
  • Not verifying output data after build
  • Getting stuck in compile/build error loops
  • Assuming table exists just because model file exists
  • Writing SQL without checking existing model patterns first
  • 仅compile后就宣布完成,未运行build
  • 构建完成后未验证输出数据
  • 陷入编译/构建错误循环
  • 仅因为模型文件存在就假设表已创建
  • 未先检查现有模型模式就编写SQL