| If the user wants to... | Use this instead |
|---|---|
| Search for entities by keyword or metadata | |
| Answer "who owns X?" or "what is X?" | |
| Add or update metadata (descriptions, tags, owners) | |
| Create assertions, run quality checks, manage incidents | |

`datahub search "<name>" --where "entity_type = dataset" --limit 5`

| Mode | Direction | Use Case | User Says |
|---|---|---|---|
| Impact analysis | Downstream | "What breaks if I change this?" | "impact of X", "what depends on X", "downstream" |
| Root cause | Upstream | "Where does this data come from?" | "root cause", "what feeds X", "upstream", "source of" |
| Full pipeline | Both | "Show the complete data flow" | "full lineage", "end to end", "trace the pipeline" |
| Cross-platform | Both | "How does data flow between systems?" | "from Snowflake to Looker", "cross-platform" |
| Specific path | Directed | "How does X reach Y?" | "path from X to Y", "how does X connect to Y" |
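
As a rough illustration of how the "User Says" column could drive mode selection, the sketch below matches trigger phrases from the table against a request. The function name, the rule ordering (specific modes before broad ones), and the choice to return `None` when nothing matches are all assumptions made for this example, not part of the skill.

```python
from typing import NamedTuple, Optional


class LineageMode(NamedTuple):
    name: str
    direction: str  # "downstream", "upstream", "both", or "directed"


# Trigger phrases copied from the "User Says" column of the table above.
# Ordering is an assumption: narrower modes are checked before broad ones.
_RULES = [
    (("path from", "connect to"), LineageMode("specific path", "directed")),
    (("cross-platform",), LineageMode("cross-platform", "both")),
    (("full lineage", "end to end", "trace the pipeline"),
     LineageMode("full pipeline", "both")),
    (("root cause", "what feeds", "upstream", "source of"),
     LineageMode("root cause", "upstream")),
    (("impact of", "what depends on", "downstream"),
     LineageMode("impact analysis", "downstream")),
]


def classify_request(text: str) -> Optional[LineageMode]:
    """Return the first mode whose trigger phrase occurs in the request."""
    lowered = text.lower()
    for phrases, mode in _RULES:
        if any(phrase in lowered for phrase in phrases):
            return mode
    # No phrase matched: hypothetical behaviour is to ask the user instead.
    return None
```

Simple substring matching will misfire on unusual phrasing, so a result of `None` (or an ambiguous match) is best treated as a cue to ask the user which direction they mean.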
| Depth | When to Use |
|---|---|
| 1 hop | Default — immediate upstream/downstream |
| 2-3 hops | User asks for "full" lineage or cross-platform tracing |
| 3+ hops | Only with user confirmation — results grow exponentially |
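
The depth policy can be sketched as a small guard. This is a minimal illustration, assuming that "3+ hops" means anything deeper than 3 and that "full" requests use the upper end of the 2-3 range; the names `choose_hops`, `ConfirmationRequired`, and `worst_case_entities` are hypothetical.

```python
from typing import Optional

DEFAULT_HOPS = 1   # immediate upstream/downstream
FULL_HOPS = 3      # assumed: upper end of the 2-3 hop range for "full" lineage
CONFIRM_ABOVE = 3  # assumed: deeper traversals need explicit user confirmation


class ConfirmationRequired(Exception):
    """Raised when a deep traversal is requested without user confirmation."""


def choose_hops(requested: Optional[int] = None,
                wants_full: bool = False,
                confirmed: bool = False) -> int:
    """Pick a hop depth following the table above."""
    if requested is None:
        return FULL_HOPS if wants_full else DEFAULT_HOPS
    if requested < 1:
        raise ValueError("hops must be at least 1")
    if requested > CONFIRM_ABOVE and not confirmed:
        raise ConfirmationRequired(
            f"{requested} hops may return a very large result; confirm first"
        )
    return requested


def worst_case_entities(fan_out: int, hops: int) -> int:
    """Upper bound on entities reached if every node has `fan_out` neighbours."""
    return sum(fan_out ** h for h in range(1, hops + 1))
```

With a uniform fan-out of 5, `worst_case_entities` gives 5 entities at 1 hop, 155 at 3 hops, and 3905 at 5 hops, which is why deeper traversals are gated on confirmation.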
| MCP tools | DataHub CLI | |
|---|---|---|
| When available | Preferred for simple traversals | Use for |
| Lineage | | |
| Enrich results | | |
The `datahub lineage` command prints a summary line stating how many entities were found, the maximum hop depth, and whether results were capped.

**Defaults:** `--hops 3` (full transitive lineage), `--count 100`. Increase `--count` if the summary indicates results were capped.

**Output formats:** Use `--format json` for structured processing; the output includes a `metadata` object with capped/truncated hints that the agent can inspect. The default table output is best for quick display to the user.
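
One way an agent might act on the capped hint is sketched below. Only the existence of a `metadata` object comes from the text above; the key names `capped` and `truncated`, the doubling strategy, and the ceiling of 1000 are assumptions for illustration, so check the real JSON output before relying on them.

```python
import json
from typing import Tuple


def needs_larger_count(raw_json: str,
                       hint_keys: Tuple[str, ...] = ("capped", "truncated")) -> bool:
    """Return True if the (assumed) metadata hints say results were cut off."""
    payload = json.loads(raw_json)
    metadata = payload.get("metadata", {}) if isinstance(payload, dict) else {}
    # Key names are hypothetical; adjust to match the actual CLI output.
    return any(bool(metadata.get(key)) for key in hint_keys)


def next_count(current: int, ceiling: int = 1000) -> int:
    """Double the result limit, bounded by an arbitrary illustrative ceiling."""
    return min(current * 2, ceiling)
```

A caller would rerun the lineage query with `next_count(...)` as the new `--count` value for as long as `needs_larger_count` keeps returning `True` and the ceiling has not been reached.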
Resolving every URN in a single search avoids N+1 calls: collect the URNs from the lineage output and look them all up at once. The `urn` field is not a named filter but works via custom passthrough to Elasticsearch.
**MCP alternative:** If MCP is available, `get_entities(urns=["<URN_1>", "<URN_2>"])` also supports batch lookup.
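
The N+1 point can be illustrated independently of any particular client. In this sketch, `batch_lookup` is a hypothetical stand-in for whichever batch call is available (a single search or `get_entities`), and the row shape `{"urn": ...}` is an assumption about the lineage output.

```python
from typing import Callable, Dict, Iterable, List


def unique_urns(rows: Iterable[dict]) -> List[str]:
    """Collect URNs from lineage rows, dropping duplicates but keeping order."""
    seen: Dict[str, None] = {}
    for row in rows:
        urn = row.get("urn")  # assumed field name in each lineage row
        if urn and urn not in seen:
            seen[urn] = None
    return list(seen)


def resolve_in_one_call(
    rows: Iterable[dict],
    batch_lookup: Callable[[List[str]], Dict[str, dict]],
) -> Dict[str, dict]:
    """Resolve all lineage entities with one batch call instead of one per URN."""
    urns = unique_urns(rows)
    if not urns:
        return {}  # nothing to resolve, so make no call at all
    return batch_lookup(urns)
```

Passing the lookup in as a callable keeps the sketch testable and avoids committing to a specific filter syntax.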

`datahub lineage path --from "<URN_A>" --to "<URN_B>"`

    [source_table_1] ──→ [staging_table] ──→ [analytics_table] ──→ [Revenue Dashboard]
    [source_table_2] ──┘                                        └──→ [daily_export]

| Hop | Entity | Type | Platform | Relationship |
|---|---|---|---|---|
| 1 | staging_table | dataset | Snowflake | TRANSFORMED |
| 2 | source_table_1 | dataset | PostgreSQL | TRANSFORMED |
| 2 | source_table_2 | dataset | PostgreSQL | TRANSFORMED |
| Hop | Entity | Type | Platform | Relationship |
|---|---|---|---|---|
| 1 | Revenue Dashboard | dashboard | Looker | — |
| 1 | daily_export | dataset | S3 | TRANSFORMED |
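
The hop numbers in the tables above correspond to breadth-first distance from the starting entity. The sketch below reproduces them for the example pipeline, using a hand-written edge list in place of real lineage data; the function name and the edge representation are illustrative assumptions.

```python
from collections import deque
from typing import Dict, List, Tuple

# (upstream, downstream) edges transcribed from the example pipeline diagram.
EDGES = [
    ("source_table_1", "staging_table"),
    ("source_table_2", "staging_table"),
    ("staging_table", "analytics_table"),
    ("analytics_table", "Revenue Dashboard"),
    ("analytics_table", "daily_export"),
]


def hops_from(start: str, direction: str, max_hops: int = 3) -> List[Tuple[int, str]]:
    """Breadth-first traversal returning (hop, entity) pairs in visit order."""
    if direction not in ("upstream", "downstream"):
        raise ValueError("direction must be 'upstream' or 'downstream'")
    neighbours: Dict[str, List[str]] = {}
    for up, down in EDGES:
        src, dst = (up, down) if direction == "downstream" else (down, up)
        neighbours.setdefault(src, []).append(dst)

    seen = {start}
    queue = deque([(start, 0)])
    found: List[Tuple[int, str]] = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the requested depth
        for nxt in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                found.append((depth + 1, nxt))
                queue.append((nxt, depth + 1))
    return found
```

Starting from `analytics_table`, the upstream traversal yields `staging_table` at hop 1 and both source tables at hop 2, and the downstream traversal yields `Revenue Dashboard` and `daily_export` at hop 1, matching the two tables.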

    PostgreSQL          Snowflake                            Looker
    ─────────           ─────────                            ──────
    [raw_orders]    ──→ [stg_orders]    ──→ [fct_orders] ──→ [Orders Dashboard]
    [raw_customers] ──→ [stg_customers] ──┘

| Document | Path | Purpose |
|---|---|---|
| Lineage patterns reference | | Traversal strategies and patterns |
| Impact analysis template | | Impact analysis report template |
| Lineage map template | | Lineage visualization template |
| CLI reference (shared) | | CLI commands |

Dataset URN format: `urn:li:dataset:(urn:li:dataPlatform:<platform>,<qualified_name>,<env>)`
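
Because a missing `urn:li:dataPlatform:` prefix is an easy mistake when writing dataset URNs by hand, a small helper can build and validate them. This is a sketch: the helper names are hypothetical, and the `PROD` default for the environment is an assumption.

```python
import re

# Matches urn:li:dataset:(urn:li:dataPlatform:<platform>,<qualified_name>,<env>)
_DATASET_URN = re.compile(
    r"^urn:li:dataset:\(urn:li:dataPlatform:(?P<platform>[^,]+),"
    r"(?P<name>.+),(?P<env>[^,)]+)\)$"
)


def make_dataset_urn(platform: str, qualified_name: str, env: str = "PROD") -> str:
    """Build a dataset URN; the PROD default is an assumption for illustration."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{qualified_name},{env})"


def parse_dataset_urn(urn: str) -> dict:
    """Split a dataset URN into platform, name, and env, or raise ValueError."""
    match = _DATASET_URN.match(urn)
    if match is None:
        raise ValueError(f"not a well-formed dataset URN: {urn!r}")
    return match.groupdict()
```

For example, `make_dataset_urn("snowflake", "analytics.stg_orders")` produces a URN that `parse_dataset_urn` accepts, while a URN written without the `dataPlatform:` prefix is rejected.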