discovering-gcp-data-assets

Instructions

Step 1: Handle Public Datasets or Proceed to Search

Dataplex Entries Lookup provides the richest metadata for data assets. You MUST prioritize using it for all Google Cloud assets, even if you already know their IDs.
  • Public Datasets (Direct Inspection): If the requested asset belongs to the `bigquery-public-data` project, Dataplex Entries Lookup will fail. You MUST skip Steps 2 and 3 and inspect the table directly using the `bq` CLI or BigQuery MCP tools instead.
  • All Other Assets (Proceed to Step 2): For all other BigQuery, Cloud Storage, Spanner, BigLake Iceberg or general GCP data assets (whether their IDs are known or missing), you MUST proceed to Step 2 to search the Dataplex catalog and obtain their full Entry Name.
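
The routing decision in Step 1 can be sketched as a small shell helper. This is a minimal illustration, not part of any tool: the function names `is_public_dataset` and `route_asset` are hypothetical, and the sketch assumes asset IDs are written as `project.dataset.table` or `project:dataset.table`.

```bash
# Hypothetical helper: succeeds when the asset ID lives in bigquery-public-data.
is_public_dataset() {
  case "$1" in
    bigquery-public-data.*|bigquery-public-data:*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: prints which Step 1 path applies to the asset.
route_asset() {
  if is_public_dataset "$1"; then
    echo "direct-inspection"   # skip Steps 2 and 3; use bq or BigQuery MCP tools
  else
    echo "dataplex-search"     # proceed to Step 2
  fi
}
```

For example, `route_asset "bigquery-public-data.samples.shakespeare"` prints `direct-inspection`, after which one way to inspect the table is `bq show --schema --format=prettyjson bigquery-public-data:samples.shakespeare`.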

Step 2: Execute Discovery Search

You MUST use the Dataplex search command to discover assets and retrieve their full `projects/...` entry names. This step is required even if you already know the asset's short ID (e.g., `my_dataset.my_table`), because Step 3 strictly requires the full entry name.
[!IMPORTANT] The `--project` parameter MUST ALWAYS be provided. This project ID is used only to attribute the search and does NOT restrict the search scope. The project must have the Dataplex API enabled, and the user must have the `dataplex.entries.get` permission.

A. Semantic Search (Natural Language Intent)

Use this when the user describes the meaning or intent of the data (e.g., "Find Q4 product sales data").
Use the `search_entries` MCP tool OR:
```bash
gcloud dataplex entries search "<NATURAL_LANGUAGE_QUERY>" \
  --project="<PROJECT_ID>" \
  --semantic-search \
  --limit=50
```

B. Keyword Search (Technical Strings)

Use this for exact keyword matches or technical strings (e.g., `name:order_v2`).

Search Query Rules (MANDATORY)

  • Mode-Specific Syntax:
    • Semantic Search: Logical operators (`AND`, `OR`) MUST be UPPERCASE. Use plural `labels.` for label filters (e.g., `labels.env=prod`).
    • Keyword Search: Operators are case-insensitive. Use singular `label.` for label filters (e.g., `label.env=prod`).
  • Abbreviated Logic: Use `|` for OR and `,` for AND within parentheses to shorten queries (e.g., `projectid:(prod|staging)` or `column:(id,name)`).
  • Exact vs. Token Match:
    • Use `:` for token/substring matches (e.g., `name:sales`).
    • Use `=` for exact matches. REQUIRED for `system`, `type`, and `location`.
  • Singular Keywords: ALWAYS convert plurals to singular (e.g., "product" NOT "products").
  • Scope Restriction: You SHOULD restrict the search scope using a `parent` filter if the project or dataset is known (e.g., `parent:projects/<PROJECT_ID>`).
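
The rules above can be applied mechanically when assembling a query string. The sketch below is illustrative only: `build_query` and `label_filter` are hypothetical helper names, and joining clauses with an uppercase `AND` is a choice made here because it satisfies the semantic-mode rule while remaining acceptable in keyword mode, where operators are case-insensitive.

```bash
# Hypothetical helper: joins filter clauses with an uppercase AND.
build_query() {
  local query="" clause
  for clause in "$@"; do
    if [ -z "$query" ]; then
      query="$clause"
    else
      query="$query AND $clause"
    fi
  done
  printf '%s\n' "$query"
}

# Hypothetical helper: picks the label prefix required by the search mode
# (plural "labels." for semantic search, singular "label." for keyword search).
label_filter() {
  local mode="$1" key="$2" value="$3"
  case "$mode" in
    semantic) printf 'labels.%s=%s\n' "$key" "$value" ;;
    keyword)  printf 'label.%s=%s\n' "$key" "$value" ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}
```

For example, `build_query "name:order" "system=bigquery" "$(label_filter keyword env prod)" "parent:projects/my-proj"` prints `name:order AND system=bigquery AND label.env=prod AND parent:projects/my-proj`, which can then be passed as the query argument of `gcloud dataplex entries search`.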

Dataplex Search Syntax Reference

  • `name:x`: Substring/token match on resource ID.
  • `displayname:x`: Substring/token match on display name.
  • `projectid:x`: Substring/token match on GCP project ID.
  • `parent:x`: Substring match on hierarchical path (e.g., `projects/my-proj`).
  • `location=x`: Exact match on location (e.g., `us-central1`, `us`).
  • `column:x`: Substring/token match on column names in the schema.
  • `system=x`: Exact match on source system. Common values: `bigquery`, `storage`, `biglake`, `cloud_sql`, `cloud_spanner`, `cloud_bigtable`, `pubsub`.
  • `type=x`: Exact match on entry type (e.g., `bigquery-table`, `storage-bucket`, `storage-folder`).
  • `labels.key=value`: (Semantic Mode ONLY) Exact match on a label.
  • `label.key=value`: (Keyword Mode ONLY) Exact match on a label.
  • `createtime[>|<|=]x`: Match assets created after/before date `YYYY-MM-DD`.
  • `fully_qualified_name=x`: Exact match on the FQN (e.g., `bigquery:project.dataset.table`).
[!TIP] Dataplex search results rely on metadata being ingested into the Universal Catalog (often via Discovery Scans). If an asset is missing from search, it may not be indexed.
  • Fallback 1: Try searching by the `fully_qualified_name` qualifier.
  • Fallback 2: Use native tools (e.g., `bq show`, `gcloud storage`) or specific skills for that asset type if you already know the ID.
```bash
gcloud dataplex entries search "<KEYWORD_SEARCH_QUERY>" \
  --project="<PROJECT_ID>" \
  --limit=50
```
Criteria: Once candidate assets are returned, proceed to Step 3 using the full entry names from the search results.
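
When a known BigQuery table does not show up in search, the `fully_qualified_name` fallback from the tip above needs a query in the `bigquery:project.dataset.table` form. A minimal sketch of building that query follows; `bigquery_fqn_query` is a hypothetical helper name, and the project, dataset, and table values in the usage comment are placeholders.

```bash
# Hypothetical helper: builds a fully_qualified_name filter for a BigQuery table.
bigquery_fqn_query() {
  local project="$1" dataset="$2" table="$3"
  printf 'fully_qualified_name=bigquery:%s.%s.%s\n' "$project" "$dataset" "$table"
}

# Usage (placeholders, not run here):
#   gcloud dataplex entries search "$(bigquery_fqn_query my-proj my_dataset my_table)" \
#     --project="<PROJECT_ID>" \
#     --limit=50
```

The helper only formats the query string; the search itself still returns the full entry name that Step 3 requires.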

Step 3: Lookup Entry

You MUST use the Entries Lookup command to fetch schema and deep metadata for the relevant results obtained from Step 2.
[!IMPORTANT] The argument MUST be the name (starting with `projects/`) returned by the search result. Passing short table IDs, GCS URIs, or fully qualified names with a `bigquery:` prefix is PROHIBITED and will fail.
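
A cheap guard before calling lookup can catch the prohibited argument forms early. The sketch below is an assumption-laden illustration: `is_full_entry_name` and `lookup_guard` are hypothetical helper names, and the only check performed is the `projects/` prefix stated above, so it does not prove that the entry actually exists.

```bash
# Hypothetical helper: accepts only names that start with "projects/".
is_full_entry_name() {
  case "$1" in
    projects/*) return 0 ;;
    *) return 1 ;;   # short table IDs, gs:// URIs, bigquery: names, etc.
  esac
}

# Hypothetical helper: prints "ok" for a plausible entry name, otherwise
# reports why the argument is rejected and returns non-zero.
lookup_guard() {
  if is_full_entry_name "$1"; then
    echo "ok"
  else
    echo "rejected: run the Step 2 search to get the full entry name" >&2
    return 1
  fi
}
```

A caller might chain the guard with the lookup, for example `lookup_guard "$ENTRY_NAME" && gcloud dataplex entries lookup "$ENTRY_NAME"`, where `ENTRY_NAME` is a placeholder variable holding a name copied from the search results.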

Command Execution

Use the `lookup_entry` MCP tool OR:
```bash
gcloud dataplex entries lookup "<FULL_ENTRY_NAME>"
```
Completion Criteria: The command returns the detailed schema and business context.

Troubleshooting

Lookup Fails or "Resource not found"

  • Cause: A short table name was passed instead of the full entry name.
  • Fix: Ensure you use the correct entry name format from the search results (starting with `projects/`).

Search Returns No Results

  • Cause: Plural terms in keyword search or lack of scoping.
  • Fix: Switch to singular keywords. For semantic search, try more descriptive natural language.

Lookup Fails with "NOT_FOUND" (despite correct format)

  • Cause: The table belongs to a project (e.g., `bigquery-public-data`) that has not fully synchronized its metadata with the Dataplex Universal Catalog. While the entry appears in search, `entries lookup` is unavailable.
  • Fix: Fall back to direct inspection using native tools (e.g., `bq` CLI).

Search Fails with "--project: Must be specified."

  • Cause: The `--project <PROJECT_ID>` argument was not provided.
  • Fix: Provide a project, which will be used to authorize and attribute the search request.

Search Fails with "PERMISSION_DENIED"

  • Cause: The project provided in the `--project <PROJECT_ID>` argument does not have the Dataplex API enabled, or the user is missing the necessary IAM permissions.
  • Fix: Ask the user whether they have a project with the Dataplex API enabled in which they hold the `dataplex.entries.get` permission.