gcp-pipeline-orchestration


Mandatory Reference Routing


If relevant, call the associated reference file(s) before taking action. Refer to the table below to determine which reference file to retrieve in different scenarios involving specific functions. [!IMPORTANT]: DO NOT GUESS filenames. You MUST only use the exact paths provided below.

| Function/Use Case | Required Reference File | Capabilities & Intent Keywords |
| --- | --- | --- |
| orchestration-pipelines schema | `references/orchestration-pipelines-schema.md` | orchestrate, generate, create, update |

How to use this skill


Orchestration pipelines require creating two files to ensure a complete and deployable pipeline (a minimal sketch of how they fit together follows this list):
1.  `Orchestration File` (e.g., `orchestration-pipeline.yaml`,
    `test-pipeline.yaml`): Defines the pipeline's logic, tasks, and
    schedule. **IMPORTANT:** Check whether a `deployment.yaml` file exists and
    references an existing orchestration file. If it does, you **must update
    the existing orchestration file** (e.g., `test_pipeline.yaml`) instead of
    creating a new one. The filename can be customized but must be
    referenced in the `deployment.yaml` file.
2.  `deployment.yaml`: Defines the environment-specific
    configurations (e.g., `dev`, `prod`). `deployment.yaml` should only
    exist in the repository root and must be named `deployment.yaml`.
  • All files should always be maintained together, and all files should be placed at the root of the workspace folder.
  • This skill helps create or update configuration files that orchestrate data pipelines.
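To make the pairing concrete, here is a minimal sketch (all values are hypothetical placeholders; the full template appears under Declarative Pipeline Templates below):

```yaml
# deployment.yaml — always at the workspace root, always with exactly this name
environments:
  dev:
    # ...project/region/composer fields as described in Step 2...
    pipelines:
      - source: 'orchestration-pipeline.yaml' # must match the orchestration filename
```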


Step 1: Assess Orchestration Pipeline Status and Initialize if Necessary


Examine the repository's root directory for a `deployment.yaml` file.
  1. Check for existing setup: The absence of `deployment.yaml` indicates that orchestration has not been set up.
  2. Determine if initialization is required: Initialization is required if `deployment.yaml` is missing. You MUST run the `init` command (see item 5 below) to scaffold the project if `deployment.yaml` is missing. Do NOT create the files manually.
  3. Pipeline Name: If initialization is needed, ask the user for the pipeline name. If the user hasn't provided an orchestration pipeline name, use "orchestration_pipeline".
  4. Environment Name: If initialization is needed, you MUST ask the user for the environment name. If the user does not provide it, use `dev` as the default.
  5. Execute Initialization: Once you have the pipeline and environment names, run the following command:

```bash
# Replace <ORCHESTRATION_PIPELINE_NAME> with the actual name
# Replace <ENV_NAME> with the actual environment name
gcloud beta orchestration-pipelines init <ORCHESTRATION_PIPELINE_NAME> --environment=<ENV_NAME>
```

Step 2: Review the orchestration pipeline code structure and syntax instructions


*** Pipeline Models (mapping to YAML)
[!IMPORTANT] While the internal pipeline models are defined using protobuf (which typically uses snake_case), the YAML configuration expects camelCase for almost all field names.
Mapping Rule: Always convert snake_case proto fields (e.g., `pipeline_id`) to camelCase in YAML (e.g., `pipelineId`).
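For instance, using the one field named in this document:

```yaml
# pipeline_id: my_pipeline   <- proto-style snake_case; wrong in YAML
pipelineId: my_pipeline      # camelCase, as the YAML configuration expects
```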

Orchestration-Pipelines YAML structure and syntax instructions


Refer to the file `references/orchestration-pipelines-schema.md`.
Required Tags (Top-Level): You MUST add a `tags` field to the top level of the orchestration pipeline YAML definition. The value of this field depends on the IDE environment:
  • For Antigravity, use `["job:datacloud:antigravity"]`.
  • For VS Code, use `["job:datacloud:vscode"]`.
  • For any other environment, use `["job:datacloud:other"]`.
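For example, when running in VS Code, the top of the orchestration YAML might begin like this sketch (only `tags` and `pipelineId` come from this document; all other fields live in the schema file and are omitted):

```yaml
pipelineId: my_pipeline           # hypothetical name
tags: ["job:datacloud:vscode"]    # pick the value matching the IDE environment
```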

Deployment YAML structure and syntax instructions


Top-Level Structure: The root of the YAML should be an object with the following fields:
  • environments (dictionary): A map where keys are environment names (e.g., 'dev', 'prod') and values are Environment objects.
Environment: Each environment object contains the following fields:
  • project (string): The Google Cloud Project ID.
  • region (string): The Google Cloud region (e.g., 'us-central1').
  • composer_environment (string): The Cloud Composer environment name.
  • artifact_storage:
    • bucket (string): The GCS bucket name.
    • path_prefix (string): The path prefix under which artifacts are placed in the bucket.
  • pipelines:
    • - source (string): An orchestration pipeline YAML filename. Multiple entries are allowed.
  • variables (dictionary, optional): Key-value pairs representing environment variables. Values can be strings, numbers, or booleans.
[!TIP] If the user doesn't provide specific paths for scripts, dbt projects, or GCP details (Project ID, Region), use tools like find_by_name to search the repository and gcloud commands (e.g., `gcloud config get-value project`) to retrieve the necessary information.
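The template later in this document omits the optional `variables` field; a hedged sketch of what it can hold (all keys and values here are hypothetical):

```yaml
environments:
  dev:
    # ...required fields as described above...
    variables:
      dbt_target: dev   # string
      retry_count: 3    # number
      dry_run: false    # boolean
```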

Step 3: Generate the pipeline files


  • Before generating, check if an orchestration pipeline definition file and `deployment.yaml` already exist in the current directory. If they do, inform the user and ask if they want to update the existing files or create new ones with different names. Do not overwrite without confirmation.
  • First, before creating the orchestration pipeline definition file, you must run the following command to get the list of available Dataproc clusters for the user's project. This avoids using placeholder values to run the jobs.
    # Replace <PROJECT_ID> with the actual project_id
    # Replace <REGION> with the actual region
    gcloud dataproc clusters list \
    --project <PROJECT_ID> \
    --region <REGION>
    [!TIP] Running the command without `--format=yaml` provides a clear, tabular output that is easier to read.
  • Then use the returned Dataproc list with details to create the orchestration pipeline definition file based on the user's requirements for the pipeline's logic and schedule. IMPORTANT: Every schedule must include an `endTime`. Every schedule must use the current date as `startTime` if the user hasn't specified one.
    [!IMPORTANT] A Composer environment is not a Dataproc cluster. If no Dataproc clusters are available, do not use a Composer environment for the `sparkHistoryServerConfig`. It is better to omit this configuration if a dedicated Spark History Server is not available.
  • If you want to schedule a Python job, check the content of the Python file to determine whether it is a Spark job. If it is, use `pyspark` as the type instead of `script`.
  • Before creating or updating the `deployment.yaml` file, you must first run the following command to get the list of available Composer environments for the user's project.
    # Replace <PROJECT_ID> with the actual project_id
    # Replace <REGION> with the actual region
    gcloud composer environments list \
    --project <PROJECT_ID> \
    --locations <REGION>
    After listing available Composer environments, you must check each environment to ensure it uses the right image version or has the right PyPI packages installed. Run the following command for each environment:
    # Replace <ENVIRONMENT_NAME> with the Composer environment name
    # Replace <REGION> with the region
    gcloud composer environments describe <ENVIRONMENT_NAME> \
    --location <REGION> \
    --format="json(config.softwareConfig.imageVersion, config.softwareConfig.pypiPackages)"
    From the output, select an environment where the imageVersion value is one of "composer-3-airflow-3.1.7-build.x, composer-3-airflow-2.11.1-build.x, composer-3-airflow-2.10.5-build.x, composer-3-airflow-2.9.3-build.x, composer-2.16.11-airflow-2.11.1, composer-2.16.11-airflow-2.10.5, composer-2.16.11-airflow-2.9.3", or select an environment where the `orchestration-pipelines` package is listed in the PyPI packages. This ensures the selected environment is compatible with orchestration pipelines.
  • Third, before generating the `deployment.yaml` file, you must ask the user to provide the `artifact_storage` bucket name. Note that the `artifact_storage` bucket is typically initialized as a placeholder (e.g., `YOUR_BUCKET`) by the `init` command in Step 1. You must identify any such placeholders, ask the user for the actual bucket name, and then update the `deployment.yaml` file with the provided value.
    Use the returned Composer list with details, along with the project ID, region, and the bucket name provided by the user, to generate or update the `deployment.yaml` file. When generating or updating the `deployment.yaml` file, you must replace placeholders (e.g., "<YOUR_PROJECT_ID>", "<YOUR_REGION>", "<YOUR_COMPOSER>", "<YOUR_BUCKET>") with the actual retrieved and provided values. Additionally, you must remove any associated `# TODO:` comments once the placeholders are replaced.
  • Ensure both files adhere to the code structures and syntax specified in this document.
  • Renaming Pipelines: If requested to change the orchestration pipeline name, you must rename the orchestration YAML file accordingly (e.g., from `dbt_clean_pipeline.yaml` to `new_name.yaml`) and update the `source` field within the `pipelines` list in `deployment.yaml` to match the new filename.
[!IMPORTANT]
Time Format: Do NOT include the `Z` suffix in `startTime` and `endTime`. Use the format `"YYYY-MM-DDTHH:MM:SS"` (e.g., `"2025-10-01T00:00:00"`); a sketch follows.
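As an illustration of the time format (only `startTime` and `endTime` are prescribed here; the surrounding field names are hypothetical — consult the schema file for the real structure):

```yaml
schedule:
  cron: "0 6 * * *"                  # hypothetical scheduling field
  startTime: "2025-10-01T00:00:00"   # current date if the user didn't specify; no Z suffix
  endTime: "2026-10-01T00:00:00"     # mandatory for every schedule
```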

Step 4: Validate the content (REQUIRED)


After creating or editing pipeline files, you MUST validate them using the `gcloud beta orchestration-pipelines validate` command. You must: a. Read the `deployment.yaml` file to identify all defined environments. b. Run the `validate` command below for each environment found in `deployment.yaml`.

```bash
# Replace <ENV_NAME> with the identified environment name
gcloud beta orchestration-pipelines validate --environment=<ENV_NAME>
```
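A minimal sketch of the per-environment loop, assuming the `yq` YAML processor is available to read `deployment.yaml` (any equivalent parsing works):

```bash
# Sketch: run validation once per environment defined in deployment.yaml.
for env in $(yq '.environments | keys | .[]' deployment.yaml); do
  gcloud beta orchestration-pipelines validate --environment="$env"
done
```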

Step 5: Handle Validation Errors


  1. Check the output of the validation command.
  2. If the command returns an error or failure message:
    • Read the error message carefully.
    • Edit the orchestration and deployment files to fix the specific issue mentioned.
  3. Re-run the validation command to confirm the fix. Do not mark the task as complete until the validation passes (exit code 0), and do not fall back to creating an Airflow DAG in Python if validation fails.

Declarative Pipeline Templates


When asked to generate or verify declarative pipeline files, ensure they follow these compliant structures. Do not use the exact values below; adapt them to the user's specific project, region, and environment details.

`deployment.yaml` Template - IMPORTANT: FORMAT MUST MATCH


```yaml
environments:
  <environment_name>: # e.g., dev, prod
    project: <PROJECT_ID>
    region: <REGION>
    composer_environment: <COMPOSER_ENVIRONMENT_NAME>
    gcs_bucket: "" # Optional
    artifact_storage:
      bucket: <ARTIFACT_BUCKET_NAME>
      path_prefix: "<prefix>-" # e.g., namespace or username prefix
    pipelines:
      - source: '<orchestration-pipeline.yaml>' # e.g., list of pipeline yaml names
```
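For instance, a filled-in `dev` entry might look like the following (every value is hypothetical and must be replaced with the user's actual details):

```yaml
environments:
  dev:
    project: my-gcp-project
    region: us-central1
    composer_environment: my-composer-env
    gcs_bucket: ""
    artifact_storage:
      bucket: my-artifact-bucket
      path_prefix: "alice-"
    pipelines:
      - source: 'orchestration_pipeline.yaml'
```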

Step 6: Deploy the Orchestration Pipeline (Optional)


If requested to deploy the orchestration pipeline:
  1. You MUST ask the user which environment to deploy to. If no environment name is provided, list the available environments from `deployment.yaml` and ask the user to choose one, defaulting to `dev` if it exists.
  2. Read the orchestration YAML to extract the `pipelineId`.
  3. Deploy with `--local`. This uploads the DAG without running it:
    # Replace <ENV_NAME> with the target environment
    # Replace <PIPELINE_SOURCE> with the orchestration YAML filename
    gcloud beta orchestration-pipelines deploy \
      --environment=<ENV_NAME> --local
  4. Parse the deploy output to extract the bundle ID (version); a sketch follows below. The output includes a line like:
    Pipeline deployment successful for version local-b32d15e307b5
    The version string (e.g., `local-b32d15e307b5`) is the bundle ID.
[!IMPORTANT]
`--local` deployments now default to `--paused=true`. The deployed DAG will be visible in Airflow as a paused DAG without a schedule. It will not auto-run. Use Step 7 to trigger it.
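A hedged sketch of extracting the bundle ID from the deploy output (the success-line wording is taken from item 4 above; the sed pattern is an assumption):

```bash
# Sketch: capture deploy output and pull out the version/bundle ID.
output=$(gcloud beta orchestration-pipelines deploy --environment=<ENV_NAME> --local 2>&1)
bundle_id=$(echo "$output" | sed -n 's/.*successful for version \([^ ]*\).*/\1/p')
echo "Bundle ID: $bundle_id"
```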

Step 7: Trigger the Orchestration Pipeline Run (Optional)


If requested to trigger/run the orchestration pipeline, you MUST follow the Deploy → Poll → Trigger flow.
  1. Ask for environment: You MUST ask the user which environment to use. Default to `dev` if it exists in `deployment.yaml`.
  2. Deploy first (Step 6): Always deploy before triggering to ensure the run uses the latest code. Extract the bundle ID from the deploy output and the `pipelineId` from the orchestration YAML.
  3. Poll for DAG readiness: Wait for the DAG to be registered in Composer (a loop sketch follows this list).
    # Initial delay: wait 30 seconds after deploy
    sleep 30
    # Poll every 15 seconds, up to 2 minutes total
    # Replace <ENV_NAME>, <BUNDLE_ID> with actual values
    gcloud beta orchestration-pipelines list \
    --environment=<ENV_NAME> \
    --bundle=<BUNDLE_ID>
    The pipeline is ready when it appears in the list output. If it does not appear after 2 minutes, report failure and advise the user to check YAML validity.
  4. Trigger the pipeline:
    # Replace <ENV_NAME>, <BUNDLE_ID>, <PIPELINE_ID> with actual values
    gcloud beta orchestration-pipelines trigger \
    --environment=<ENV_NAME> \
    --bundle=<BUNDLE_ID> \
    --pipeline=<PIPELINE_ID>
  5. Verify the run started:
    gcloud beta orchestration-pipelines runs list \
    --environment=<ENV_NAME> \
    --bundle=<BUNDLE_ID> \
    --pipeline=<PIPELINE_ID>
[!TIP] Trigger-only (no deploy): If the user wants to trigger an already-deployed pipeline, skip Step 6. Use `gcloud beta orchestration-pipelines list --environment=<ENV_NAME>` to find the bundle ID, then trigger directly with Step 7.4.
[!IMPORTANT] Fallback: If the gcloud trigger fails, use the bundled script. Run the script with `--help` to discover and learn its interface.
    python scripts/trigger/airflow_trigger.py \
    --project <PROJECT_ID> \
    --location <REGION> \
    --environment <COMPOSER_ENV> \
    --dag_id <PIPELINE_ID>
Get `project`, `region`, and `composer_environment` from `deployment.yaml`.
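A minimal sketch of the poll loop from item 3 (checking readiness by grepping the list output for the pipeline ID is an assumption; adapt to the actual output format):

```bash
# Sketch: wait up to ~2 minutes for the DAG to register in Composer.
sleep 30
for attempt in 1 2 3 4 5 6 7 8; do   # 8 polls x 15 s
  if gcloud beta orchestration-pipelines list \
       --environment=<ENV_NAME> --bundle=<BUNDLE_ID> | grep -q "<PIPELINE_ID>"; then
    echo "Pipeline is ready."
    break
  fi
  sleep 15
done
```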

Definition of done


  • `deployment.yaml` file is created successfully.
  • The orchestration pipeline file (e.g., `orchestration_pipeline.yaml`) is created successfully, includes a mandatory `endTime` for every schedule, and passes the validation command: `gcloud beta orchestration-pipelines validate --environment=<ENV_NAME>`
  • If the user requested to deploy the orchestration pipeline, the `gcloud beta orchestration-pipelines deploy --environment=<ENV_NAME> --local` command should return a success message with a version/bundle ID.
  • If the user requested to trigger/run the orchestration pipeline:
    1. Deploy succeeded (bundle ID extracted from output)
    2. DAG appeared in `gcloud beta orchestration-pipelines list` within 2 minutes
    3. `gcloud beta orchestration-pipelines trigger` returned success
    4. Run is visible in `gcloud beta orchestration-pipelines runs list`

Other actions


If requested to pause/stop the orchestration pipeline, use:

```bash
# Replace <ENV_NAME>, <BUNDLE_ID>, <PIPELINE_ID> with actual values
gcloud beta orchestration-pipelines pause \
  --environment=<ENV_NAME> \
  --bundle=<BUNDLE_ID> \
  --pipeline=<PIPELINE_ID>
```

If requested to unpause/resume the orchestration pipeline, use:

```bash
# Replace <ENV_NAME>, <BUNDLE_ID>, <PIPELINE_ID> with actual values
gcloud beta orchestration-pipelines unpause \
  --environment=<ENV_NAME> \
  --bundle=<BUNDLE_ID> \
  --pipeline=<PIPELINE_ID>
```