creating-dbt-models

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

dbt Model Development

dbt 模型开发

Read before you write. Build after you write. Verify your output.

先阅读，再编写。编写后构建，验证输出。

Critical Rules

关键规则

ALWAYS run
dbt build
after creating/modifying models - compile is NOT enough
ALWAYS verify output after build using
```
dbt show
```
- don't assume success
If build fails 3+ times, stop and reassess your entire approach

创建/修改模型后必须运行
dbt build
- 仅compile是不够的
构建完成后必须验证输出 - 不要默认成功
如果构建失败3次以上，停止操作并重新评估整体方案

Workflow

工作流程

1. Understand the Task Requirements

1. 理解任务要求

What columns are needed? List them explicitly.
What is the grain of the table (one row per what)?
What calculations or aggregations are required?

需要哪些列？明确列出。
表的粒度是什么（每行代表什么）？
需要哪些计算或聚合操作？

2. Discover Project Conventions

2. 识别项目规范

bash

cat dbt_project.yml
find models/ -name "*.sql" | head -20

Read 2-3 existing models to learn naming, config, and SQL patterns.

bash

cat dbt_project.yml
find models/ -name "*.sql" | head -20

阅读2-3个现有模型，学习命名、配置和SQL模式。

3. Find Similar Models

3. 查找相似模型

bash

undefined

bash

undefined

Find models with similar purpose

查找用途相似的模型

find models/ -name "agg.sql" -o -name "fct_.sql" | head -5


Learn from existing models: join types, aggregation patterns, NULL handling.

find models/ -name "agg.sql" -o -name "fct_.sql" | head -5


从现有模型中学习：关联类型、聚合模式、NULL值处理方式。

4. Check Upstream Data

4. 检查上游数据

bash

undefined

bash

undefined

Preview upstream data if needed

如有需要，预览上游数据

dbt show --select <upstream_model> --limit 10

undefined

dbt show --select <upstream_model> --limit 10

undefined

5. Write the Model

5. 编写模型

Follow discovered conventions. Match the required columns exactly.

遵循识别到的规范，严格匹配所需的列。

6. Compile (Syntax Check)

6. 编译（语法检查）

bash

dbt compile --select <model_name>

bash

dbt compile --select <model_name>

7. BUILD - MANDATORY

7. 构建 - 强制性步骤

This step is REQUIRED. Do NOT skip it.

bash

dbt build --select <model_name>

If build fails:

Read the error carefully
Fix the specific issue
Run build again
If fails 3+ times, step back and reassess approach

此步骤为必填项，请勿跳过。

bash

dbt build --select <model_name>

如果构建失败：

仔细阅读错误信息
修复具体问题
重新运行构建
如果失败3次以上，请暂停并重新评估方案

8. Verify Output (CRITICAL)

8. 验证输出（关键步骤）

Build success does NOT mean correct output.

bash

undefined

构建成功并不代表输出正确。

bash

undefined

Check the table was created and preview data

检查表是否已创建并预览数据

dbt show --select <model_name> --limit 10


Verify:
- Column names match requirements exactly
- Row count is reasonable
- Data values look correct
- No unexpected NULLs

dbt show --select <model_name> --limit 10


验证内容：
- 列名是否与要求完全匹配
- 行数是否合理
- 数据值是否看起来正确
- 有无意外的NULL值

9. Verify Calculations Against Sample Data

9. 对照样本数据验证计算结果

For models with calculations, verify correctness manually:

bash

undefined

对于包含计算逻辑的模型，手动验证正确性：

bash

undefined

Pick a specific row and verify calculation by hand

选择特定行并手动验证计算结果

dbt show --inline " select * from {{ ref('model_name') }} where <primary_key> = '<known_value>' " --limit 1

Cross-check aggregations

交叉检查聚合结果

dbt show --inline " select count(*), sum(<column>) from {{ ref('model_name') }} "


For example, if calculating `total_revenue = quantity * price`:
1. Pick one row from output
2. Look up the source quantity and price
3. Manually calculate: does it match?

dbt show --inline " select count(*), sum(<column>) from {{ ref('model_name') }} "


例如，若计算`total_revenue = quantity * price`：
1. 从输出中选取一行
2. 查找源数据中的quantity和price值
3. 手动计算：结果是否匹配？

10. Re-review Against Requirements

10. 对照要求重新检查

Before declaring done, re-read the original request:

Did you implement what was asked, not what you assumed?
Are column names exactly as specified?
Is the calculation logic correct per the requirements?
Does the grain (one row per what?) match what was requested?

在完成前，重新阅读原始需求：

你实现的是需求要求的内容，而非自己假设的内容吗？
列名是否与指定的完全一致？
计算逻辑是否符合要求？
粒度（每行代表什么）是否与要求匹配？

Anti-Patterns

反模式

Declaring done after compile without running build
Not verifying output data after build
Getting stuck in compile/build error loops
Assuming table exists just because model file exists
Writing SQL without checking existing model patterns first

仅compile后就宣布完成，未运行build
构建完成后未验证输出数据
陷入编译/构建错误循环
仅因为模型文件存在就假设表已创建
未先检查现有模型模式就编写SQL