omni-to-databricks-metric-view

Omni → Databricks Metric View


Converts an Omni topic into a Databricks Metric View by exploring the Omni model via API, translating its field definitions into the Databricks Metric View embedded YAML format, and executing via the Databricks CLI.
See FIELD-MAPPING.md for full before/after translation examples and YAML-REFERENCE.md for the complete YAML structure, aggregate type, and format mapping tables.


Prerequisites


Verify the Omni CLI is installed — if not, ask the user to install it:

```bash
command -v omni >/dev/null || echo "ERROR: Omni CLI is not installed."
```

Show available profiles and select the appropriate one:

```bash
omni config show
```

If multiple profiles exist, ask the user which to use, then switch:

```bash
omni config use <profile-name>
```

Databricks CLI — verify installed and check profiles:

```bash
databricks --version
cat ~/.databrickscfg
```

---

> **Tip**: Use `-o json` to force structured output for programmatic parsing, or `-o human` for readable tables. The default is `auto` (human in a TTY, JSON when piped).
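If more than one profile is configured, it can help to confirm the chosen one actually authenticates before continuing. A minimal sketch, assuming the current Databricks CLI and a placeholder profile name of `DEFAULT`:

```bash
# Sketch: read-only call to confirm the profile can reach the workspace.
# "DEFAULT" is a placeholder; substitute the profile chosen by the user.
databricks current-user me --profile DEFAULT
```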

Workflow


Step 1 — Gather Requirements


Ask the user:
  1. Which Omni topic do they want to convert? (e.g., `orders`)
  2. What is the Unity Catalog destination (`catalog.schema`)? (e.g., `main.sales`)
  3. What is the Databricks SQL Warehouse ID? (run `databricks sql warehouses list` to find it)
  4. Is this a new metric view, or does one already exist at `catalog.schema.[topic_name]_mv`?
  5. Which Databricks CLI profile to use? (optional — only if the user has multiple profiles)

⚠️ STOP — Confirm all answers before proceeding. The metric view will be named `[topic_name]_mv` by default.


Step 2 — Explore the Omni Model


2a. Find the model ID


```bash
omni models list --modelkind SHARED
```

Identify the Shared Model and note its `id`. Always prefer the Shared Model over Schema or Workbook models.

2b. Fetch the topic file


```bash
omni models yaml-get <modelId> --filename <topic_name>.topic
```

From the topic file extract: `base_view`, `joins`, `fields`, `always_filter`, `ai_context`, `sample_queries`.

2c. Fetch the relationships file


```bash
omni models yaml-get <modelId> --filename relationships
```

2d. Fetch each view file referenced in the topic


For every view in `base_view` and `joins`:

```bash
omni models yaml-get <modelId> --filename <view_name>.view
```

If a view is prefixed with `omni_dbt_`, fetch the file starting with `omni_dbt_`. Skip any view backed by `derived_table.sql` — it has no physical table.
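When several views are referenced, the fetches can be scripted. A minimal sketch, assuming `yaml-get` writes the file contents to stdout; the model ID and view names are placeholders to be replaced with the values gathered in Steps 2a and 2b:

```bash
# Sketch: fetch each referenced view file into /tmp for inspection.
MODEL_ID="<modelId>"   # placeholder from Step 2a
for view in ecomm__order_items ecomm__users ecomm__inventory_items; do
  omni models yaml-get "$MODEL_ID" --filename "${view}.view" > "/tmp/${view}.view.yaml"
done
```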


Step 3 — Identify Tables and Joins


Map view names to fully-qualified Databricks table references (`catalog.schema.table`):

| Omni view name | Databricks table |
| --- | --- |
| `ecomm__order_items` | `catalog.ecomm.order_items` |
| `omni_dbt_ecomm__order_items` | `catalog.ecomm.order_items` (strip `omni_dbt_`) |

The `__` separator maps to schema (left) and table (right). Confirm the catalog prefix with the user.

The `joins` indentation defines the join chain — a view indented beneath another joins into its parent:

```yaml
joins:
  user_order_facts: {}          # skip — derived CTE
  ecomm__users: {}              # joins to base_view
  ecomm__inventory_items:       # joins to base_view
    ecomm__products:            # joins to inventory_items
```

Find the dimension with `primary_key: true` in each view — list it first among that table's dimensions.

STOP — Confirm the full table list and join hierarchy with the user before continuing.
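Because the name mapping is mechanical, it can be sanity-checked with a one-off shell snippet. A minimal sketch, assuming the catalog prefix is literally `catalog`:

```bash
# Sketch: derive catalog.schema.table from an Omni view name.
view="omni_dbt_ecomm__order_items"
view="${view#omni_dbt_}"           # strip the dbt prefix  -> ecomm__order_items
schema="${view%%__*}"              # text before "__"      -> ecomm
table="${view#*__}"                # text after "__"       -> order_items
echo "catalog.${schema}.${table}"  # -> catalog.ecomm.order_items
```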


Step 4 — Resolve the Field List


| Syntax | Meaning |
| --- | --- |
| (no `fields` parameter) | Include all fields from all views |
| `all_views.*` / `view.*` | Include all fields from all views / the named view |
| `tag:<value>` | Include all fields with this tag |
| `view.field` | Include this specific field |
| `-view.field` | Exclude this field (always wins over wildcard inclusions) |

Process inclusions first, then apply exclusions. Also remove any field with `hidden: true` unless it is explicitly included by name.
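To make the ordering concrete, here is a rough sketch of the final filtering pass, assuming wildcard and `tag:` entries have already been expanded against the view files (the field names below are hypothetical):

```bash
# Sketch: keep every included field that is not explicitly excluded.
includes=("ecomm__order_items.sale_price" "ecomm__order_items.cost" "ecomm__users.email")
excludes=("ecomm__order_items.cost")
for f in "${includes[@]}"; do
  # exclusions always win over inclusions
  printf '%s\n' "${excludes[@]}" | grep -qxF -- "$f" || echo "keep: $f"
done
```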


Step 5 — Build Join Definitions


Using the hierarchy from Step 3 and `relationships.yaml`, extract the join columns from `on_sql` and build the `on:` clause. Use the view name as the join `name`.

Star schema (single-level):

```yaml
joins:
  - name: ecomm__users
    source: catalog.ecomm.users
    'on': source.user_id = ecomm__users.id
```

Snowflake schema (multi-hop):

```yaml
joins:
  - name: ecomm__inventory_items
    source: catalog.ecomm.inventory_items
    'on': source.inventory_item_id = ecomm__inventory_items.id
    joins:
      - name: ecomm__products
        source: catalog.ecomm.products
        'on': ecomm__inventory_items.product_id = ecomm__products.id
```

⚠️ `on` is a YAML 1.1 reserved word — always single-quote the key as `'on':`. Columns from nested (2+ level) joins cannot be used in `expr` — flatten them through a denormalized direct join instead.


Step 6 — Map Dimensions and Measures


For each field that survived Step 4, translate it using the rules below. See FIELD-MAPPING.md for full examples.

Dimension quick reference:

| Omni field type | Databricks translation |
| --- | --- |
| Standard string/number | `expr: COLUMN` |
| `type: time` (no timeframes) | Single timestamp dimension |
| `type: time` + `timeframes` | One `DATE_TRUNC(...)` dimension per timeframe |
| `groups:` | `CASE WHEN ... END` expression |
| `bin_boundaries:` | `CASE WHEN` range expression |
| `duration:` | `DATEDIFF(unit, start, end)` expression |
| `type: yesno` | BOOLEAN dimension (not a filter; omit `data_type`) |

Measure quick reference:

| Omni measure type | Databricks translation |
| --- | --- |
| `aggregate_type: sum/avg/max/min` | `SUM(col)` / `AVG(col)` / etc. |
| `aggregate_type: count` | `COUNT(*)` |
| `aggregate_type: count_distinct` | `COUNT(DISTINCT col)` |
| Derived (refs other measures) | `MEASURE(measure_a) op MEASURE(measure_b)` — define atomics first |
| `filters:` on a measure | `AGG(col) FILTER (WHERE condition)` |

Strip Omni's `${view.column}` refs to bare column names (or `join_name.column` for joined fields). Use `display_name` for the Omni `label`, `comment` for `description`, and carry `synonyms` directly. See YAML-REFERENCE.md for format and aggregate type mapping tables.
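As an illustration only, the reference stripping can be drafted with a quick `sed` pass; the expression below is a hypothetical helper, and joined fields still need their join-name prefix restored by hand:

```bash
# Sketch: rewrite ${view.column} references to bare column names.
echo 'SUM(${ecomm__order_items.sale_price}) / COUNT(*)' \
  | sed 's/\${[^.}]*\.\([^}]*\)}/\1/g'
# -> SUM(sale_price) / COUNT(*)
```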
If the topic has `ai_context`, carry it into the metric view's top-level `comment`.

STOP — Review all dimensions, measures, and join definitions with the user before generating the final output.


Step 7 — Check for Existing Metric View


```bash
databricks api post /api/2.0/sql/statements \
  --json "{\"warehouse_id\": \"<WAREHOUSE_ID>\", \"statement\": \"SHOW VIEWS IN <catalog>.<schema> LIKE '%_mv'\", \"wait_timeout\": \"30s\", \"catalog\": \"<CATALOG>\", \"schema\": \"<SCHEMA>\"}"
```

  • View does not exist → use `CREATE OR REPLACE VIEW ... WITH METRICS`
  • View already exists → use `ALTER VIEW ... AS $$ ... $$`
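To read the matching view names out of the JSON response, a small sketch, assuming `jq` is installed, the statement finishes within the wait timeout, and the result comes back inline in the API's `result.data_array` rows:

```bash
# Sketch: print each row of the SHOW VIEWS result as tab-separated values.
databricks api post /api/2.0/sql/statements \
  --json "{\"warehouse_id\": \"<WAREHOUSE_ID>\", \"statement\": \"SHOW VIEWS IN <catalog>.<schema> LIKE '%_mv'\", \"wait_timeout\": \"30s\"}" \
  | jq -r '.result.data_array[]? | @tsv'
```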


Step 8 — Generate and Execute the SQL


Write the SQL to a temp file:

```sql
-- CREATE (new view)
CREATE OR REPLACE VIEW catalog.schema.orders_mv
WITH METRICS
LANGUAGE YAML
AS $$
version: 1.1
comment: "..."
source: catalog.ecomm.order_items

joins:
  - name: ecomm__users
    source: catalog.ecomm.users
    'on': source.user_id = ecomm__users.id

dimensions:
  - name: id
    expr: id
    display_name: "Order ID"

  - name: status
    expr: status
    display_name: "Order Status"

measures:
  - name: order_count
    expr: COUNT(*)
    display_name: "Order Count"

  - name: total_sale_price
    expr: SUM(sale_price)
    display_name: "Total Sale Price"
    format:
      type: currency
      currency_code: USD
$$
```

```sql
-- ALTER (existing view)
ALTER VIEW catalog.schema.orders_mv AS $$
version: 1.1
...
$$
```
Execute via the SQL Statements API (`databricks sql execute` does not exist in CLI v0.295.0+):

```bash
databricks api post /api/2.0/sql/statements \
  --json "{
    \"warehouse_id\": \"<WAREHOUSE_ID>\",
    \"statement\": $(cat /tmp/orders_mv.sql | python3 -c 'import json,sys; print(json.dumps(sys.stdin.read()))'),
    \"wait_timeout\": \"50s\",
    \"catalog\": \"<CATALOG>\",
    \"schema\": \"<SCHEMA>\"
  }"
```
Check the response for `"state": "SUCCEEDED"`. If `"state": "FAILED"`, read `status.error.message` and see the Troubleshooting section below.
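A minimal way to surface the state and any error message, assuming `jq` is available (the temp-file name is the same placeholder used above):

```bash
# Sketch: capture the response, then print the state and any error message.
resp=$(databricks api post /api/2.0/sql/statements \
  --json "{\"warehouse_id\": \"<WAREHOUSE_ID>\", \"statement\": $(python3 -c 'import json,sys; print(json.dumps(sys.stdin.read()))' < /tmp/orders_mv.sql), \"wait_timeout\": \"50s\"}")
echo "$resp" | jq -r '.status.state, (.status.error.message // empty)'
```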
STOP — Confirm which group or user should receive access before running the GRANT. This is a permission change visible to others.

Grant access:

```bash
databricks api post /api/2.0/sql/statements \
  --json "{\"warehouse_id\": \"<WAREHOUSE_ID>\", \"statement\": \"GRANT SELECT ON VIEW catalog.schema.orders_mv TO \`group_name\`\", \"wait_timeout\": \"30s\", \"catalog\": \"<CATALOG>\", \"schema\": \"<SCHEMA>\"}"
```


Troubleshooting


When the SQL Statements API returns `"state": "FAILED"`, read `status.error.message`:

| Error message contains | Likely cause | Fix |
| --- | --- | --- |
| `METRIC_VIEW_INVALID_VIEW_DEFINITION` | Invalid YAML field or value | Check the field name against the valid keys (`name`, `expr`, `display_name`, `comment`, `synonyms`, `format`). Common mistakes: using `description` instead of `comment`, unsupported `decimal_places`. |
| `warehouse not running` / `RESOURCE_DOES_NOT_EXIST` | Warehouse is stopped or wrong ID | Start the warehouse in the Databricks UI or verify the ID with `databricks api get /api/2.0/sql/warehouses`. |
| `PERMISSION_DENIED` | The CLI profile lacks privileges | Check the profile's permissions on the catalog/schema with `databricks api get /api/2.0/unity-catalog/permissions/...`. |
| `TABLE_OR_VIEW_NOT_FOUND` | A source or join table doesn't exist in Unity Catalog | Verify each table reference with `SHOW TABLES IN <catalog>.<schema>`. |
| `on` parse error / unexpected key | `on:` not quoted | Always write `'on':` (single-quoted) — it is a YAML 1.1 reserved word. |
| `wait_timeout` value error | Timeout out of range | `wait_timeout` must be between `5s` and `50s`. |

If the error message is truncated, run the same statement with `"wait_timeout": "5s"` to get the full synchronous error response.


Critical Rules


  1. Naming: Name the metric view `[topic_name]_mv` (snake_case, lowercase)
  2. CREATE vs ALTER: Check for existence first — `CREATE OR REPLACE` for new, `ALTER VIEW` for existing
  3. Version: Always use `version: 1.1` (requires Databricks Runtime 17.2+)
  4. Skip derived CTEs: Views with `derived_table.sql` have no physical table — skip and warn the user
  5. Confirm before executing: Show the full generated SQL to the user before running
  6. Boolean fields: Map `type: yesno` as BOOLEAN dimensions — not filters. `data_type` is not a valid field — omit it
  7. Composed measures: Use `MEASURE()` syntax; define atomic measures before composed ones
  8. YAML quoting: `on` is a YAML 1.1 reserved word — always write `'on':` (single-quoted)
  9. No SELECT *: All fields must be explicitly defined
  10. MAP columns: Skip joins to tables containing `MAP` type columns — not supported
  11. Nested join refs: Only direct star join columns (1 level) can be used in `expr`. Flatten snowflake schema joins through a denormalized direct join
  12. Warehouse ID required: Always confirm before execution — cannot be inferred
  13. Exclusions win: `-view.field` always overrides any wildcard inclusion
  14. Format type values are lowercase: `number`, `currency`, `date`, `date_time`, `percentage`, `byte`
  15. Date format required: `type: date` and `type: date_time` both require `date_format`
  16. Currency format: Use `currency_code: USD` not `iso_code: USD`
  17. `decimal_places` unsupported: Omit it entirely — causes a parse error
  18. CLI execution: Use `databricks api post /api/2.0/sql/statements`; `wait_timeout` must be between `5s` and `50s`
  19. Omni CLI flag: Use `--filename` (not `--file-name`)
  20. Field description key: Use `comment:` not `description:`. `description` is not a recognized field and causes a parse error


Reference
