bigquery-data-transfer-service

BigQuery Data Transfer Service (DTS)

Mandatory Guidelines
[!IMPORTANT] All new BigQuery Data Transfer Service (DTS) configurations MUST be provisioned through the gcp pipeline resource provisioning framework, which includes generating a `deployment.yaml`.

- Do NOT use imperative CLI commands (e.g., `gcloud` or `bq mk`) to create or update configurations.
- CLI commands are permitted only for discovery (listing/showing) and triggering manual runs.

This guide enables the discovery of existing ingestion resources and provides
metadata related to ingestion when needed.
Workflow
Step 0: Discover Environment Parameters
Before generating configurations, discover the actual values for the target
project and region.

[!TIP] If `deployment.yaml` already exists in the repository root, prioritize extracting `project` and `region` from the target environment configuration (e.g., `dev`).

- Project: `gcloud config get project`
- Region: `gcloud config get compute/region`

[!TIP] Use these commands to replace placeholders like `<PROJECT_ID>` with actual values. Always remove associated comments that start with TODO once replaced.
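The placeholder substitution can be sketched as a small shell snippet. The values and file path below are illustrative; in practice `PROJECT_ID` and `REGION` come from the `gcloud config get` commands above, and the file is the generated `deployment.yaml` in the repository root.

```shell
# Illustrative values; in practice: PROJECT_ID=$(gcloud config get project)
PROJECT_ID="my-project"
REGION="us-central1"

# Hypothetical generated file containing TODO placeholders.
printf 'project: <PROJECT_ID>  # TODO: replace\nregion: <REGION>  # TODO: replace\n' > /tmp/deployment.yaml

# Replace placeholders and drop the associated TODO comments.
sed -i "s/<PROJECT_ID>/${PROJECT_ID}/; s/<REGION>/${REGION}/; s/ *# TODO: replace//" /tmp/deployment.yaml
cat /tmp/deployment.yaml
```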
Step 1: Check for Existing Transfers
Before assuming a new transfer is needed, check for existing ones in the target
region.

- List Transfers:

  ```bash
  bq ls --transfer_config \
    --transfer_location=<REGION> \
    --project_id=<PROJECT_ID>
  ```

- Analyze Existing Transfers:

  - Single Transfer Found:
    - Check if the transfer has at least one successful run:

      ```bash
      bq ls --transfer_run --transfer_config=<RESOURCE_NAME>
      ```

    - If found: Use the existing transfer config.
    - If not found: Confirm with the user that it is OK to trigger a transfer run.
  - Multiple Transfers Found:
    - Attempt to guess the correct one based on context.
    - Ask the user to confirm.
  - Disabled Transfers Found:
    - Ask the user whether to enable the transfer or create a new one.
    - To Enable: Instruct the user to update the transfer configuration in their `deployment.yaml` by setting the `disabled` field to `false` for the specific transfer resource.
  - No Transfers Found: Proceed to create a new transfer if needed.
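The branching above can be sketched as a small shell helper. `decide_next_step` is a hypothetical name, and the count would in practice come from parsing the `bq ls --transfer_config` output.

```shell
# Hypothetical helper: choose the next action based on how many
# transfer configs `bq ls --transfer_config ...` returned.
decide_next_step() {
  local count="$1"
  if [ "$count" -eq 0 ]; then
    echo "create-new-transfer"
  elif [ "$count" -eq 1 ]; then
    echo "check-successful-runs"
  else
    echo "ask-user-to-pick"
  fi
}

decide_next_step 1
```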
Step 2: Discover & Validate Parameters (New Transfers)
If creating a new transfer, discover the required parameters using the REST API
and validate them with the user.

[!TIP] If `<DATA_SOURCE_ID>` is unknown, run the discovery script without the `<DATA_SOURCE_ID>` argument to list available source IDs (e.g., `google_cloud_storage`). It uses the project and location derived in Step 0.

```bash
python3 scripts/bigquery_dts.py --project_id=<PROJECT_ID>
```

- Run Discovery Script: Use the `bigquery_dts.py` script to inspect Data Source parameters via the REST API.

  ```bash
  # Passes the derived project and region to the script.
  python3 scripts/bigquery_dts.py --project_id=<PROJECT_ID> <DATA_SOURCE_ID> <REGION>
  ```

  [!IMPORTANT] Run this command every time a new transfer is being planned.

- [!CAUTION] Mandatory User Questionnaire (CRITICAL):
  - Explicitly identify ALL specific parameters returned by the discovery script. You MUST NOT generalize or vaguely summarize them.
  - OAuth Authorization (Google Data Sources): For Google ecosystem data sources (Google Ads, YouTube, etc.), if the user is not using a service account to configure the DTS transfer config (meaning the user is using End User Credentials, or EUC), generate an OAuth URI and ask the user to visit the URL to authorize. Once the user provides the versionInfo code, use it as `versionInfo` in the `deployment.yaml` definition, and then proceed.
  - If any parameters are related to authentication, explicitly ask the user to provide the Secret Manager Resource ID (e.g., `projects/my-project/secrets/my-secret`) for these parameters.
  - Present every required parameter to the user BEFORE generating config files.
  - Ask for verification of the assets/tables to be ingested.
- Wait for User Response: You MUST NOT proceed until parameters are confirmed.
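Once parameters are confirmed, they feed into the generated `deployment.yaml`. The fragment below is purely illustrative: the actual schema is defined by the gcp pipeline resource provisioning framework, and every field name shown here is an assumption, not the framework's authoritative layout.

```yaml
# Hypothetical deployment.yaml fragment; field names are illustrative,
# not the framework's authoritative schema.
transfer_configs:
  - display_name: gcs-daily-ingest
    data_source_id: google_cloud_storage
    destination_dataset_id: raw_events
    disabled: false
    params:
      data_path_template: gs://my-bucket/events/*.csv   # illustrative
      destination_table_name_template: events
    # For EUC-based Google data sources, include the confirmed code:
    # version_info: <VERSION_INFO_CODE>
```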
Step 3: Extract Transfer Config Data
Retrieve the configuration details for the selected transfer.

```bash
bq show --format=prettyjson --transfer_config <RESOURCE_NAME>
```

Step 4: Trigger and Verify Transfer
After the transfer is deployed via the resource provisioning framework, you MUST
ensure there is at least one successful run before proceeding with the rest of
the tasks.

- Trigger a Manual Run: If no successful or ongoing runs are found, or the transfer was just created, trigger a manual run for the current time.

  ```bash
  bq mk --transfer_run \
    --transfer_config=<RESOURCE_NAME> \
    --run_time=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
  ```

- Poll for Completion (5-Minute Rule): Check the status of the run every 30-60 seconds for up to 5 minutes.

  ```bash
  bq ls --format=prettyjson --transfer_run --transfer_config=<RESOURCE_NAME>
  ```

  - Success: If the run completes successfully, proceed with the rest of the pipeline.
  - Failure: If the run fails, analyze the logs and ask the user for help.
  - Timeout (5 mins): If the run is still in progress after 5 minutes, STOP and ask the user: "The Data Transfer Service ingestion is still in progress. Please provide 'proceed guidance' once the ingestion has finished so that I can continue building the rest of the data pipeline using the ingested schema and samples."

- Wait for User Guidance: Do NOT proceed until the user confirms ingestion is complete or provides guidance.
- Once the user confirms, start work on the rest of the tasks.
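The polling rule above can be sketched as follows. `get_latest_run_state` is a hypothetical helper that would wrap the `bq ls --transfer_run` call and extract the latest run's state (`SUCCEEDED`, `FAILED`, `RUNNING`, etc.); it is not defined here.

```shell
# Sketch of the 5-minute polling rule; get_latest_run_state is an
# assumed helper wrapping `bq ls --transfer_run ...`.
poll_transfer() {
  local deadline=$(( $(date +%s) + 300 ))   # 5-minute budget
  while [ "$(date +%s)" -lt "$deadline" ]; do
    case "$(get_latest_run_state)" in
      SUCCEEDED) echo "proceed";      return 0 ;;
      FAILED)    echo "analyze-logs"; return 0 ;;
    esac
    sleep 45   # poll every 30-60 seconds
  done
  echo "ask-user-for-proceed-guidance"
}
```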
Definition of Done
- A BigQuery DTS transfer configuration has been discovered or provisioned declaratively (via gcp pipeline resource provisioning with a generated `deployment.yaml`).
- Mandatory data source parameters have been identified and confirmed with the user.
- A manual transfer run has been triggered and monitored.
- The transfer run has completed successfully OR the user has provided "proceed guidance" for a long-running transfer.