alibabacloud-dataworks-datastudio-develop

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

DataWorks Data Development

DataWorks 数据开发

⚡ MANDATORY: Read Before Any API Call

⚡ 强制要求：调用任何API前必读

These absolute rules are NOT optional — violating ANY ONE means the task WILL FAIL:

FIRST THING: Switch CLI profile. Before ANY
```
aliyun
```
command, run
```
aliyun configure list
```
. If multiple profiles exist, run
```
aliyun configure switch --profile <name>
```
to select the correct one. Priority: prefer a profile whose name contains
```
dataworks
```
(case-insensitive); otherwise use
```
default
```
. Do NOT skip this step. Do NOT run any
aliyun dataworks-public
command before switching. NEVER read/echo/print AK/SK values.
NEVER install plugins. If
```
aliyun help
```
shows "Plugin available but not installed" for dataworks-public → IGNORE IT. Do NOT run
```
aliyun plugin install
```
. PascalCase RPC works without plugins (requires CLI >= 3.3.1).

ONLY use PascalCase RPC. Every DataWorks API call must look like:

aliyun dataworks-public CreateNode --ProjectId ... --Spec '...'

. Never use kebab-case (

create-file

create-node

create-business

ONLY use these APIs for create:
```
CreateWorkflowDefinition
```
→
```
CreateNode
```
(per node, with
```
--ContainerId
```
) →
```
CreatePipelineRun
```
(to deploy).
ONLY use these APIs for update:
```
UpdateNode
```
(incremental,
```
kind:Node
```
) →
```
CreatePipelineRun
```
(to deploy). Never use
```
ImportWorkflowDefinition
```
,
```
DeployFile
```
, or
```
SubmitFile
```
for updates or publishing. 4a. ONLY use these APIs for deploy/publish:
```
CreatePipelineRun
```
(Type=Online, ObjectIds=[ID]) →
```
GetPipelineRun
```
(poll) →
```
ExecPipelineRunStage
```
(advance). NEVER use
```
DeployFile
```
,
```
SubmitFile
```
,
```
ListDeploymentPackages
```
, or
```
GetDeploymentPackage
```
— these are all legacy APIs that will fail.
If
CreateWorkflowDefinition
or
CreateNode
returns an error, FIX THE SPEC — do NOT fall back to legacy APIs. Error 58014884415 means your FlowSpec JSON format is wrong (e.g., used
```
"kind":"Workflow"
```
instead of
```
"kind":"CycleWorkflow"
```
, or
```
"apiVersion"
```
instead of
```
"version"
```
). Copy the exact Spec from the Quick Start below.
Run CLI commands directly — do NOT create wrapper scripts. Never create
```
.sh
```
scripts to batch API calls. Run each
```
aliyun
```
command directly in the shell. Wrapper scripts add complexity and obscure errors.
Saving files locally is NOT completion. The task is only done when the API returns a success response (e.g.,
```
{"Id": "..."}
```
from
```
CreateWorkflowDefinition
```
/
```
CreateNode
```
). Writing JSON files to disk without calling the API means the workflow/node was NOT created. Never claim success without a real API response.
NEVER simulate, mock, or fabricate API responses. If credentials are missing, the CLI is misconfigured, or an API call returns an error — report the exact error message to the user and STOP. Do NOT generate fake JSON responses, write simulation documents, echo hardcoded output, or claim success in any form. A simulated success is worse than an explicit failure.
Credential failure = hard stop. If
```
aliyun configure list
```
shows empty or invalid credentials, or any CLI call returns
```
InvalidAccessKeyId
```
,
```
access_key_id must be assigned
```
, or similar auth errors — STOP immediately. Tell the user to configure valid credentials outside this session. Do NOT attempt workarounds (writing config.json manually, using placeholder credentials, proceeding without auth). No subsequent API calls may be attempted until credentials are verified working.
ONLY use APIs listed in this document. Every API you call must appear in the API Quick Reference table below. If you need an operation that is not listed, check the table again — the operation likely exists under a different name. NEVER invent API names (e.g.,
```
CreateDeployment
```
,
```
ApproveDeployment
```
,
```
DeployNode
```
do NOT exist). If you cannot find the right API, ask the user.

If you catch yourself typing ANY of these, STOP IMMEDIATELY and re-read the Quick Start below:

create-file

create-business

create-folder

CreateFolder

CreateFile

UpdateFile

plugin install

--file-type

/bizroot

/workflowroot

DeployFile

SubmitFile

ListFiles

GetFile

ListDeploymentPackages

GetDeploymentPackage

CreateDeployment

ApproveDeployment

DeployNode

CreateFlow

CreateFileDepends

CreateSchedule

以下规则为强制性要求，没有例外——违反任意一条都会导致任务失败：

首要操作：切换CLI配置文件。 在运行任何
```
aliyun
```
命令前，先执行
```
aliyun configure list
```
。如果存在多个配置文件，执行
```
aliyun configure switch --profile <name>
```
选择正确的配置文件。优先级：优先选择名称包含
```
dataworks
```
的配置文件（不区分大小写），否则使用
```
default
```
。请勿跳过此步骤，切换配置前不要运行任何
aliyun dataworks-public
命令。严禁读取/回显/打印AK/SK值。
严禁安装插件。 如果
```
aliyun help
```
提示dataworks-public「有可用插件但未安装」→ 直接忽略，不要运行
```
aliyun plugin install
```
。无需插件即可使用PascalCase RPC调用（要求CLI版本 >= 3.3.1）。
仅允许使用PascalCase格式的RPC调用。 所有DataWorks API调用格式必须为：
```
aliyun dataworks-public CreateNode --ProjectId ... --Spec '...'
```
。严禁使用短横线命名法（kebab-case）的命令（如
```
create-file
```
、
```
create-node
```
、
```
create-business
```
）。
创建操作仅允许使用以下API：
```
CreateWorkflowDefinition
```
→
```
CreateNode
```
（每个节点需指定
```
--ContainerId
```
） →
```
CreatePipelineRun
```
（用于部署）。
更新操作仅允许使用以下API：
```
UpdateNode
```
（增量更新，
```
kind:Node
```
） →
```
CreatePipelineRun
```
（用于部署）。严禁使用
```
ImportWorkflowDefinition
```
、
```
DeployFile
```
或
```
SubmitFile
```
进行更新或发布操作。 4a. 部署/发布操作仅允许使用以下API：
```
CreatePipelineRun
```
（Type=Online, ObjectIds=[ID]） →
```
GetPipelineRun
```
（轮询状态） →
```
ExecPipelineRunStage
```
（推进执行阶段）。严禁使用
```
DeployFile
```
、
```
SubmitFile
```
、
```
ListDeploymentPackages
```
或
```
GetDeploymentPackage
```
——这些都是旧版API，会调用失败。
如果
CreateWorkflowDefinition
或
CreateNode
返回错误，请修复Spec——不要回退使用旧版API。错误码58014884415表示你的FlowSpec JSON格式错误（例如使用了
```
"kind":"Workflow"
```
而非
```
"kind":"CycleWorkflow"
```
，或用了
```
"apiVersion"
```
而非
```
"version"
```
），请从下方快速入门部分复制正确的Spec格式。
直接运行CLI命令——不要创建封装脚本。 严禁创建
```
.sh
```
脚本批量调用API，请直接在Shell中执行每条
```
aliyun
```
命令。封装脚本会增加复杂度，隐藏错误信息。
本地保存文件不等于任务完成。 只有当API返回成功响应（例如
```
CreateWorkflowDefinition
```
/
```
CreateNode
```
返回
```
{"Id": "..."}
```
）时，任务才算完成。仅将JSON文件写入磁盘而不调用API，意味着工作流/节点并未实际创建。没有真实API响应的情况下，严禁宣称任务成功。
严禁模拟、伪造或虚构API响应。 如果缺少凭证、CLI配置错误或API调用返回错误，请向用户上报准确的错误信息并停止操作。不要生成伪造的JSON响应、编写模拟文档、回显硬编码输出，或以任何形式宣称任务成功。模拟成功比明确的失败更严重。
凭证失败=立即停止。 如果
```
aliyun configure list
```
显示凭证为空或无效，或任何CLI调用返回
```
InvalidAccessKeyId
```
、
```
access_key_id must be assigned
```
等鉴权错误，请立即停止操作，告知用户在当前会话外配置有效的凭证。不要尝试变通方案（手动编写config.json、使用占位凭证、无鉴权继续操作）。在验证凭证有效前，不得尝试后续任何API调用。
仅允许使用本文档列出的API。 你调用的所有API都必须出现在下方的API快速参考表中。如果你需要的操作未在列表中，请再次核对表格——该操作可能使用了其他名称。严禁虚构API名称（例如
```
CreateDeployment
```
、
```
ApproveDeployment
```
、
```
DeployNode
```
均不存在）。如果找不到合适的API，请咨询用户。

如果你发现自己正在输入以下任意内容，请立即停止并重新阅读下方的快速入门：

create-file

create-business

create-folder

CreateFolder

CreateFile

UpdateFile

plugin install

--file-type

/bizroot

/workflowroot

DeployFile

SubmitFile

ListDeploymentPackages

GetDeploymentPackage

CreateDeployment

ApproveDeployment

DeployNode

CreateFlow

CreateFileDepends

CreateSchedule

⛔ Prohibited Legacy APIs

⛔ 禁止使用的旧版API

This skill uses DataWorks OpenAPI version 2024-05-18. The following legacy APIs and patterns are strictly prohibited:

Prohibited Legacy Operation	Correct Replacement
`create-file` / `CreateFile` (with `--file-type` numeric type code)	`CreateNode` + FlowSpec JSON
`create-folder` / `CreateFolder`	No folder needed, use `CreateNode` directly
`create-business` / `CreateBusiness` / `CreateFlowProject`	`CreateWorkflowDefinition` + FlowSpec
`list-folders` / `ListFolders`	`ListNodes` / `ListWorkflowDefinitions`
`import-workflow-definition` / `ImportWorkflowDefinition` (for create or update)	`CreateWorkflowDefinition` + individual `CreateNode` calls (for create); `UpdateNode` per node (for update)
Any operation based on folder paths ( `/bizroot` , `/workflowroot` , `/Business Flow` )	Specify path via `script.path` in FlowSpec
`SubmitFile` / `DeployFile` / `GetDeploymentPackage` / `ListDeploymentPackages`	`CreatePipelineRun` + `ExecPipelineRunStage`
`UpdateFile` (legacy file update)	`UpdateNode` + FlowSpec JSON ( `kind:Node` , incremental)
`ListFiles` / `GetFile` (legacy file model)	`ListNodes` / `GetNode`
`aliyun plugin install --names dataworks-public` (legacy plugin)	No plugin installation needed, use PascalCase RPC direct invocation

How to tell — STOP if any of these are true:

You are typing
```
create-file
```
,
```
create-business
```
,
```
create-folder
```
, or any kebab-case DataWorks command → WRONG. Use PascalCase RPC:
```
CreateNode
```
,
```
CreateWorkflowDefinition
```
You are running
```
aliyun plugin install
```
→ WRONG. No plugin needed; PascalCase RPC direct invocation works out of the box (requires CLI >= 3.3.1)
You are constructing folder paths (
```
/bizroot
```
,
```
/workflowroot
```
) → WRONG. Use
```
script.path
```
in FlowSpec
Your FlowSpec contains
```
apiVersion
```
,
```
type
```
(at node level), or
```
schedule
```
→ WRONG. See the correct format below

CLI Format: ALL DataWorks 2024-05-18 API calls use PascalCase RPC direct invocation:
aliyun dataworks-public CreateNode --ProjectId ... --Spec '...' --user-agent AlibabaCloud-Agent-Skills
This requires
aliyun
CLI >= 3.3.1. No plugin installation is needed.

本Skill使用2024-05-18版本的DataWorks OpenAPI，严格禁止使用以下旧版API和模式：

禁止使用的旧版操作	正确替换方案
`create-file` / `CreateFile` （配合 `--file-type` 数值类型码）	`CreateNode` + FlowSpec JSON
`create-folder` / `CreateFolder`	无需创建文件夹，直接使用 `CreateNode`
`create-business` / `CreateBusiness` / `CreateFlowProject`	`CreateWorkflowDefinition` + FlowSpec
`list-folders` / `ListFolders`	`ListNodes` / `ListWorkflowDefinitions`
`import-workflow-definition` / `ImportWorkflowDefinition` （用于创建或更新）	创建用 `CreateWorkflowDefinition` + 单独的 `CreateNode` 调用；更新用逐个调用 `UpdateNode`
任何基于文件夹路径的操作（ `/bizroot` 、 `/workflowroot` 、 `/Business Flow` ）	通过FlowSpec中的 `script.path` 指定路径
`SubmitFile` / `DeployFile` / `GetDeploymentPackage` / `ListDeploymentPackages`	`CreatePipelineRun` + `ExecPipelineRunStage`
`UpdateFile` （旧版文件更新）	`UpdateNode` + FlowSpec JSON（ `kind:Node` ，增量更新）
`ListFiles` / `GetFile` （旧版文件模型）	`ListNodes` / `GetNode`
`aliyun plugin install --names dataworks-public` （旧版插件）	无需安装插件，直接使用PascalCase RPC调用

判断标准：出现以下任意情况请立即停止操作：

你正在输入
```
create-file
```
、
```
create-business
```
、
```
create-folder
```
或任何短横线命名的DataWorks命令 → 错误，请使用PascalCase RPC：
```
CreateNode
```
、
```
CreateWorkflowDefinition
```
你正在运行
```
aliyun plugin install
```
→ 错误，无需插件，PascalCase RPC可直接调用（要求CLI版本 >= 3.3.1）
你正在拼接文件夹路径（
```
/bizroot
```
、
```
/workflowroot
```
） → 错误，请使用FlowSpec中的
```
script.path
```
你的FlowSpec包含
```
apiVersion
```
、节点层级的
```
type
```
或
```
schedule
```
→ 错误，参考下方的正确格式

CLI格式要求：所有2024-05-18版本的DataWorks API调用都使用PascalCase RPC直接调用：
aliyun dataworks-public CreateNode --ProjectId ... --Spec '...' --user-agent AlibabaCloud-Agent-Skills
要求
aliyun
CLI版本 >= 3.3.1，无需安装任何插件。

⚠️ FlowSpec Anti-Patterns

⚠️ FlowSpec反模式

Agents commonly invent wrong FlowSpec fields. The correct format is shown in the Quick Start below.

❌ WRONG	✅ CORRECT	Notes
`"apiVersion": "v1"` or `"apiVersion": "dataworks.aliyun.com/v1"`	`"version": "2.0.0"`	FlowSpec uses `version` , not `apiVersion`
`"kind": "Flow"` or `"kind": "Workflow"`	`"kind": "CycleWorkflow"` (for workflows) or `"kind": "Node"` (for nodes)	Only `Node` , `CycleWorkflow` , `ManualWorkflow` are valid. `"Workflow"` alone is NOT valid
`"metadata": {"name": "..."}`	`"spec": {"workflows": [{"name": "..."}]}`	FlowSpec has no `metadata` field; name goes inside `spec.workflows[0]` or `spec.nodes[0]`
`"type": "SHELL"` (at node level)	`"script": {"runtime": {"command": "DIDE_SHELL"}}`	Node type goes in `script.runtime.command`
`"schedule": {"cron": "..."}`	`"trigger": {"cron": "...", "type": "Scheduler"}`	Scheduling uses `trigger` , not `schedule`
`"script": {"content": "..."}` without `path`	`"script": {"path": "node_name", ...}`	`script.path` is always required

Agent常错误使用不存在的FlowSpec字段，下方快速入门展示了正确格式。

❌ 错误写法	✅ 正确写法	说明
`"apiVersion": "v1"` 或 `"apiVersion": "dataworks.aliyun.com/v1"`	`"version": "2.0.0"`	FlowSpec使用 `version` 字段，而非 `apiVersion`
`"kind": "Flow"` 或 `"kind": "Workflow"`	工作流用 `"kind": "CycleWorkflow"` ，节点用 `"kind": "Node"`	仅 `Node` 、 `CycleWorkflow` 、 `ManualWorkflow` 为有效值，单独使用 `"Workflow"` 无效
`"metadata": {"name": "..."}`	`"spec": {"workflows": [{"name": "..."}]}`	FlowSpec没有 `metadata` 字段，名称放在 `spec.workflows[0]` 或 `spec.nodes[0]` 中
节点层级的 `"type": "SHELL"`	`"script": {"runtime": {"command": "DIDE_SHELL"}}`	节点类型放在 `script.runtime.command` 中
`"schedule": {"cron": "..."}`	`"trigger": {"cron": "...", "type": "Scheduler"}`	调度配置使用 `trigger` 字段，而非 `schedule`
未带 `path` 的 `"script": {"content": "..."}`	`"script": {"path": "node_name", ...}`	`script.path` 为必填字段

🚀 Quick Start: End-to-End Workflow Creation

🚀 快速入门：端到端工作流创建

Complete working example — create a scheduled workflow with 2 dependent nodes:

bash

undefined

完整可用示例——创建包含2个依赖节点的定时工作流：

bash

undefined

Step 1: Create the workflow container

aliyun dataworks-public CreateWorkflowDefinition
--ProjectId 585549
--Spec '{"version":"2.0.0","kind":"CycleWorkflow","spec":{"workflows":[{"name":"my_etl_workflow","script":{"path":"my_etl_workflow","runtime":{"command":"WORKFLOW"}}}]}}'
--user-agent AlibabaCloud-Agent-Skills

→ Returns {"Id": "WORKFLOW_ID", ...}

Step 2: Create upstream node (Shell) inside the workflow

IMPORTANT: Before creating, verify output name "my_project.check_data" is not already used by another node (ListNodes)

aliyun dataworks-public CreateNode
--ProjectId 585549
--Scene DATAWORKS_PROJECT
--ContainerId WORKFLOW_ID
--Spec '{"version":"2.0.0","kind":"Node","spec":{"nodes":[{"name":"check_data","id":"check_data","script":{"path":"check_data","runtime":{"command":"DIDE_SHELL"},"content":"#!/bin/bash\necho done"},"outputs":{"nodeOutputs":[{"data":"my_project.check_data","artifactType":"NodeOutput"}]}}]}}'
--user-agent AlibabaCloud-Agent-Skills

→ Returns {"Id": "NODE_A_ID", ...}

Step 3: Create downstream node (SQL) with dependency on upstream

NOTE on dependencies: "nodeId" is the CURRENT node's name (self-reference), "output" is the UPSTREAM node's output

aliyun dataworks-public CreateNode
--ProjectId 585549
--Scene DATAWORKS_PROJECT
--ContainerId WORKFLOW_ID
--Spec '{"version":"2.0.0","kind":"Node","spec":{"nodes":[{"name":"transform_data","id":"transform_data","script":{"path":"transform_data","runtime":{"command":"ODPS_SQL"},"content":"SELECT 1;"},"outputs":{"nodeOutputs":[{"data":"my_project.transform_data","artifactType":"NodeOutput"}]}}],"dependencies":[{"nodeId":"transform_data","depends":[{"type":"Normal","output":"my_project.check_data"}]}]}}'
--user-agent AlibabaCloud-Agent-Skills

Step 4: Set workflow schedule (daily at 00:30)

aliyun dataworks-public UpdateWorkflowDefinition
--ProjectId 585549
--Id WORKFLOW_ID
--Spec '{"version":"2.0.0","kind":"CycleWorkflow","spec":{"workflows":[{"name":"my_etl_workflow","script":{"path":"my_etl_workflow","runtime":{"command":"WORKFLOW"}},"trigger":{"cron":"00 30 00 * * ?","timezone":"Asia/Shanghai","type":"Scheduler"}}]}}'
--user-agent AlibabaCloud-Agent-Skills

Step 5: Deploy the workflow online (REQUIRED — workflow is not active until deployed)

aliyun dataworks-public CreatePipelineRun
--ProjectId 585549
--Type Online --ObjectIds '["WORKFLOW_ID"]'
--user-agent AlibabaCloud-Agent-Skills

→ Returns {"Id": "PIPELINE_RUN_ID", ...}

Then poll GetPipelineRun and advance stages with ExecPipelineRunStage

(see "Publishing and Deploying" section below for full polling flow)


> **Key pattern**: CreateWorkflowDefinition → CreateNode (with ContainerId + outputs.nodeOutputs) → UpdateWorkflowDefinition (add trigger) → **CreatePipelineRun (deploy)**. Each node within a workflow MUST have `outputs.nodeOutputs`. **The workflow is NOT active until deployed via CreatePipelineRun.**
>
> **Dependency wiring summary**: In `spec.dependencies`, `nodeId` is the **current node's own name** (self-reference, NOT the upstream node), and `depends[].output` is the **upstream node's output** (`projectIdentifier.upstream_node_name`). The `outputs.nodeOutputs[].data` value of the upstream node and the `depends[].output` value of the downstream node must be **character-for-character identical**, otherwise the dependency silently fails.


> **核心模式**：CreateWorkflowDefinition → CreateNode（指定ContainerId + outputs.nodeOutputs） → UpdateWorkflowDefinition（添加触发规则） → **CreatePipelineRun（部署）**。工作流中的每个节点必须配置`outputs.nodeOutputs`。**只有通过CreatePipelineRun部署后，工作流才会生效。**
>
> **依赖配置说明**：在`spec.dependencies`中，`nodeId`是**当前节点自身的名称**（自引用，不是上游节点的名称），`depends[].output`是**上游节点的输出标识**（`projectIdentifier.upstream_node_name`）。上游节点的`outputs.nodeOutputs[].data`值与下游节点的`depends[].output`值必须**完全一致**，否则依赖会静默失败。

Core Workflow

核心工作流

Environment Discovery (Required Before Creating)

环境探查（创建前必填）

Step 0 — CLI Profile Switch (MUST be the very first action): Run

aliyun configure list

. If multiple profiles exist, run

aliyun configure switch --profile <name>

(prefer

dataworks

-named profile, otherwise

default

). No
aliyun dataworks-public
command may run before this.

If credentials are empty or invalid, STOP HERE. Do not proceed with any API calls. Report the error to the user and instruct them to configure valid credentials outside this session (via
aliyun configure
or environment variables). Do not attempt workarounds such as writing config files manually or using placeholder values.

Before creating nodes or workflows, understand the project's existing environment. It is recommended to use a subagent to execute queries, returning only a summary to the main Agent to avoid raw data consuming too much context.

Subagent tasks:

Call
```
ListWorkflowDefinitions
```
to get the workflow list
Call
```
ListNodes
```
to get the existing node list
Call
```
ListDataSources
```
AND
```
ListComputeResources
```
to get all available data sources and compute engine bindings (EMR, Hologres, StarRocks, etc.).
```
ListComputeResources
```
supplements
```
ListDataSources
```
which may not return compute-engine-type resources
Return a summary (do not return raw data):
- Workflow inventory: name + number of contained nodes + type (scheduled/manual)
- Existing nodes relevant to the current task: name + type + parent workflow
- Available data sources + compute resources (name, type) — combine both lists
- Suggested target workflow (if inferable from the task description)

Based on the summary, the main Agent decides: target workflow (existing or new, user decides), node naming (follow existing conventions), and dependencies (infer from SQL references and existing nodes).

Pre-creation conflict check (required, applies to all object types):

Name duplication check: Before creating any object, use the corresponding List API to check if an object with the same name already exists:
- Workflow →
```
ListWorkflowDefinitions
```
- Node →
```
ListNodes
```
  (node names are globally unique within a project)
- Resource →
```
ListResources
```
- Function →
```
ListFunctions
```
- Component →
```
ListComponents
```
Handling existing objects: Inform the user and ask how to proceed (use existing / rename / update existing). Direct deletion of existing objects is prohibited
Output name conflict check (CRITICAL): A node's
```
outputs.nodeOutputs[].data
```
(format
```
${projectIdentifier}.NodeName
```
) must be globally unique within the project, even across different workflows. Use
```
ListNodes --Name NodeName
```
and inspect
```
Outputs.NodeOutputs[].Data
```
in the response to verify. If the output name conflicts with an existing node, the conflict must be resolved before creation — otherwise deployment will fail with
```
"can not exported multiple nodes into the same output"
```
(see troubleshooting.md #11b)

Certainty level determines interaction approach:

Certain information → Use directly, do not ask the user
Confident inference → Proceed, explain the reasoning in the output
Uncertain information → Must ask the user

步骤0 — CLI配置文件切换（必须是第一个操作）： 运行

aliyun configure list

，如果存在多个配置文件，执行

aliyun configure switch --profile <name>

（优先选择名称含

dataworks

的配置文件，否则用

default

）。切换完成前不得运行任何
aliyun dataworks-public
命令。

如果凭证为空或无效，请立即停止操作。 不要继续调用任何API，向用户上报错误并指导用户在当前会话外配置有效凭证（通过
aliyun configure
或环境变量）。不要尝试变通方案，例如手动编写配置文件或使用占位值。

在创建节点或工作流前，需要了解项目的现有环境。建议使用子Agent执行查询，仅向主Agent返回摘要信息，避免原始数据占用过多上下文。

子Agent任务：

调用
```
ListWorkflowDefinitions
```
获取工作流列表
调用
```
ListNodes
```
获取现有节点列表
调用
```
ListDataSources
```
和
```
ListComputeResources
```
获取所有可用数据源和计算引擎绑定（EMR、Hologres、StarRocks等）。
```
ListComputeResources
```
可补充
```
ListDataSources
```
可能不返回的计算引擎类资源
返回摘要（不要返回原始数据）：
- 工作流清单：名称 + 包含节点数 + 类型（定时/手动）
- 与当前任务相关的现有节点：名称 + 类型 + 所属工作流
- 可用数据源 + 计算资源（名称、类型）——合并两个接口的返回结果
- 建议的目标工作流（如果可从任务描述推断）

主Agent根据摘要决定：目标工作流（使用现有或新建，由用户决定）、节点命名（遵循现有规范）、依赖关系（从SQL引用和现有节点推断）。

创建前冲突检查（必填，适用于所有对象类型）：

名称重复检查：创建任何对象前，使用对应的List API检查是否已存在同名对象：
- 工作流 →
```
ListWorkflowDefinitions
```
- 节点 →
```
ListNodes
```
  （节点名称在项目内全局唯一）
- 资源 →
```
ListResources
```
- 函数 →
```
ListFunctions
```
- 组件 →
```
ListComponents
```
已有对象处理：告知用户并询问处理方式（使用现有/重命名/更新现有）。严禁直接删除已有对象
输出名称冲突检查（关键）：节点的
```
outputs.nodeOutputs[].data
```
（格式
```
${projectIdentifier}.NodeName
```
）必须在项目内全局唯一，即使是跨工作流的节点。使用
```
ListNodes --Name NodeName
```
并检查返回结果中的
```
Outputs.NodeOutputs[].Data
```
进行验证。如果输出名称与现有节点冲突，必须在创建前解决，否则部署会失败并返回
```
"can not exported multiple nodes into the same output"
```
（参考troubleshooting.md #11b）

信息确定程度决定交互方式：

确定信息 → 直接使用，无需询问用户
高置信度推断 → 继续执行，在输出中说明推断理由
不确定信息 → 必须询问用户

Creating Nodes

创建节点

Unified workflow: Whether in OpenAPI Mode or Git Mode, generate the same local file structure.

统一工作流：无论是OpenAPI模式还是Git模式，都生成相同的本地文件结构。

Step 1: Create the Node Directory and Three Files

步骤1：创建节点目录和三个文件

One folder = one node, containing three files:

my_node/
├── my_node.spec.json          # FlowSpec node definition
├── my_node.sql                # Code file (extension based on contentFormat)
└── dataworks.properties       # Runtime configuration (actual values)

spec.json — Copy the minimal Spec from

references/nodetypes/{category}/{TYPE}.md

, modify name and path, and use

${spec.xxx}

placeholders to reference values from properties. If the user specifies trigger, dependencies, rerunTimes, etc., add them to the spec as well.

Code file — Determine the format (sql/shell/python/json/empty) based on the

contentFormat

in the node type documentation; determine the extension based on the

extension

field.

dataworks.properties — Fill in actual values:

properties

projectIdentifier=<actual project identifier>
spec.datasource.name=<actual datasource name>
spec.runtimeResource.resourceGroup=<actual resource group identifier>

Do not fill in uncertain values — if omitted, the server automatically uses project defaults.

Reference examples:

assets/templates/

一个文件夹对应一个节点，包含三个文件：

my_node/
├── my_node.spec.json          # FlowSpec节点定义
├── my_node.sql                # 代码文件（后缀根据contentFormat决定）
└── dataworks.properties       # 运行时配置（实际值）

spec.json — 从

references/nodetypes/{category}/{TYPE}.md

复制最小Spec模板，修改名称和路径，使用

${spec.xxx}

占位符引用properties中的值。如果用户指定了触发规则、依赖、重跑次数等，也添加到spec中。

代码文件 — 根据节点类型文档中的

contentFormat

决定格式（sql/shell/python/json/空），根据

extension

字段决定文件后缀。

dataworks.properties — 填写实际值：

properties

projectIdentifier=<实际项目标识>
spec.datasource.name=<实际数据源名称>
spec.runtimeResource.resourceGroup=<实际资源组标识>

不要填写不确定的值——如果省略，服务端会自动使用项目默认值。

参考示例：

assets/templates/

Step 2: Submit

步骤2：提交

Default is OpenAPI (unless the user explicitly says "commit to Git"):

Use
```
build.py
```
to merge the three files into API input:
bash
```
python $SKILL/scripts/build.py ./my_node > /tmp/spec.json
```
build.py does three things (no third-party dependencies; if errors occur, refer to the source code to execute manually):
- Read
```
dataworks.properties
```
  → replace
```
${spec.xxx}
```
  and
```
${projectIdentifier}
```
  placeholders in spec.json
- Read the code file → embed into
```
script.content
```
- Output the merged complete JSON

Validate the spec before submission:

bash

python $SKILL/scripts/validate.py ./my_node

Pre-submission spec review (MANDATORY) — Before calling CreateNode, review the merged JSON against this checklist:
- ```
script.runtime.command
```
  matches the intended node type (check
```
references/nodetypes/{category}/{TYPE}.md
```
  )
- ```
datasource
```
  — Required if the node type needs a data source (see the node type doc's
```
datasourceType
```
  field). Check that
```
name
```
  matches an existing data source (
```
ListDataSources
```
  ) or compute resource (
```
ListComputeResources
```
  ), and
```
type
```
  matches the expected engine type (e.g.,
```
odps
```
  ,
```
hologres
```
  ,
```
emr
```
  ,
```
starrocks
```
  ). If unsure, omit and let the server use project defaults
- ```
runtimeResource.resourceGroup
```
  — Check that the value matches an existing resource group (
```
ListResourceGroups
```
  ). If unsure, omit and let the server use project defaults
- ```
trigger
```
  — For workflow nodes: omit to inherit the workflow schedule; only set when the user explicitly specifies a per-node schedule. For standalone nodes: set if the user specified a schedule
- ```
outputs.nodeOutputs
```
  — Required for workflow nodes. Format:
```
{"data":"${projectIdentifier}.NodeName","artifactType":"NodeOutput"}
```
  . Verify the output name is globally unique in the project (
```
ListNodes --Name
```
  )
- ```
dependencies
```
  —
```
nodeId
```
  must be the current node's own name (self-reference).
```
depends[].output
```
  must exactly match the upstream node's
```
outputs.nodeOutputs[].data
```
  . Every workflow node MUST have dependencies: root nodes (no upstream) MUST depend on
```
${projectIdentifier}_root
```
  (underscore, not dot); downstream nodes depend on upstream outputs. A workflow node with NO dependencies entry will become an orphan
- No invented fields — Compare against the FlowSpec Anti-Patterns table above; remove any field not documented in
```
references/flowspec-guide.md
```

Call the API to submit (refer to references/api/CreateNode.md):

bash

# DataWorks 2024-05-18 API does not yet have plugin mode (kebab-case), use RPC direct invocation format (PascalCase)
aliyun dataworks-public CreateNode \
  --ProjectId $PROJECT_ID \
  --Scene DATAWORKS_PROJECT \
  --Spec "$(cat /tmp/spec.json)" \
  --user-agent AlibabaCloud-Agent-Skills

Note:
aliyun dataworks-public CreateNode
is in RPC direct invocation format and does not require any plugin installation. If the command is not found, check the aliyun CLI version (requires >= 3.3.1). Never downgrade to legacy kebab-case commands (
create-file
/
create-folder
).

Sandbox fallback: If

$(cat ...)

is blocked, use Python

subprocess.run(['aliyun', 'dataworks-public', 'CreateNode', '--ProjectId', str(PID), '--Scene', 'DATAWORKS_PROJECT', '--Spec', spec_str, '--user-agent', 'AlibabaCloud-Agent-Skills'])

To place within a workflow, add
```
--ContainerId $WorkflowId
```

Git Mode (when the user explicitly requests):

git add ./my_node && git commit

, DataWorks automatically syncs and replaces placeholders

Minimum required fields (verified in practice, universal across all 130+ types):

```
name
```
— Node name
```
id
```
— Must be set equal to
name
. Ensures
```
spec.dependencies[*].nodeId
```
can match. Without explicit
```
id
```
, the API may silently drop dependencies
```
script.path
```
— Script path, must end with the node name; the server automatically prepends the workflow prefix
```
script.runtime.command
```
— Node type (e.g., ODPS_SQL, DIDE_SHELL)

Copyable minimal node Spec (Shell node example):

json

{"version":"2.0.0","kind":"Node","spec":{"nodes":[{
  "name":"my_shell_node","id":"my_shell_node",
  "script":{"path":"my_shell_node","runtime":{"command":"DIDE_SHELL"},"content":"#!/bin/bash\necho hello"}
}]}}

Other fields are not required; the server will automatically fill in project defaults:

datasource, runtimeResource — If unsure, do not pass them; the server automatically binds project defaults
trigger — If not passed, inherits the workflow schedule. Only pass when specified by the user
dependencies, rerunTimes, etc. — Only pass when specified by the user
outputs.nodeOutputs — Optional for standalone nodes; required for nodes within a workflow (
```
{"data":"${projectIdentifier}.NodeName","artifactType":"NodeOutput"}
```
), otherwise downstream dependencies silently fail. ⚠️ The output name (
```
${projectIdentifier}.NodeName
```
) must be globally unique within the project — if another node (even in a different workflow) already uses the same output name, deployment will fail with "can not exported multiple nodes into the same output". Always check with
```
ListNodes
```
before creating

默认使用OpenAPI模式（除非用户明确要求「提交到Git」）：

使用
```
build.py
```
将三个文件合并为API输入：
bash
```
python $SKILL/scripts/build.py ./my_node > /tmp/spec.json
```
build.py会完成三件事（无第三方依赖；如果报错，可参考源码手动执行）：
- 读取
```
dataworks.properties
```
  → 替换spec.json中的
```
${spec.xxx}
```
  和
```
${projectIdentifier}
```
  占位符
- 读取代码文件 → 嵌入到
```
script.content
```
  中
- 输出合并后的完整JSON

提交前校验spec：

bash

python $SKILL/scripts/validate.py ./my_node

提交前spec审查（必填） — 调用CreateNode前，对照以下检查清单审核合并后的JSON：
- ```
script.runtime.command
```
  与预期节点类型匹配（参考
```
references/nodetypes/{category}/{TYPE}.md
```
  ）
- ```
datasource
```
  — 如果节点类型需要数据源（参考节点类型文档的
```
datasourceType
```
  字段）则必填。检查
```
name
```
  与现有数据源（
```
ListDataSources
```
  ）或计算资源（
```
ListComputeResources
```
  ）匹配，
```
type
```
  与预期引擎类型匹配（例如
```
odps
```
  、
```
hologres
```
  、
```
emr
```
  、
```
starrocks
```
  ）。如果不确定，省略该字段，让服务端使用项目默认值
- ```
runtimeResource.resourceGroup
```
  — 检查值与现有资源组（
```
ListResourceGroups
```
  ）匹配。如果不确定，省略该字段，让服务端使用项目默认值
- ```
trigger
```
  — 工作流内节点：省略以继承工作流调度规则，仅当用户明确指定节点级调度时才设置。独立节点：如果用户指定了调度规则则设置
- ```
outputs.nodeOutputs
```
  — 工作流内节点必填。格式：
```
{"data":"${projectIdentifier}.NodeName","artifactType":"NodeOutput"}
```
  。验证输出名称在项目内全局唯一（
```
ListNodes --Name
```
  ）
- ```
dependencies
```
  —
```
nodeId
```
  必须是当前节点自身的名称（自引用）。
```
depends[].output
```
  必须完全匹配上游节点的
```
outputs.nodeOutputs[].data
```
  。每个工作流节点必须配置依赖：根节点（无上游）必须依赖
```
${projectIdentifier}_root
```
  （下划线，不是点）；下游节点依赖上游输出。没有依赖配置的工作流节点会成为孤立节点
- 无虚构字段 — 对照上方的FlowSpec反模式表，删除所有未在
```
references/flowspec-guide.md
```
  中记录的字段

调用API提交（参考references/api/CreateNode.md）：

bash

# DataWorks 2024-05-18 API 暂不支持插件模式（短横线命名），使用PascalCase RPC直接调用格式
aliyun dataworks-public CreateNode \
  --ProjectId $PROJECT_ID \
  --Scene DATAWORKS_PROJECT \
  --Spec "$(cat /tmp/spec.json)" \
  --user-agent AlibabaCloud-Agent-Skills

注意：
aliyun dataworks-public CreateNode
为RPC直接调用格式，无需安装任何插件。如果提示命令不存在，检查阿里云CLI版本（要求 >= 3.3.1）。严禁回退使用旧版短横线命名命令（
create-file
/
create-folder
）。

沙箱兼容方案：如果

$(cat ...)

被禁用，使用Python

subprocess.run(['aliyun', 'dataworks-public', 'CreateNode', '--ProjectId', str(PID), '--Scene', 'DATAWORKS_PROJECT', '--Spec', spec_str, '--user-agent', 'AlibabaCloud-Agent-Skills'])

。

如果要放到工作流中，添加
```
--ContainerId $WorkflowId
```
参数

Git模式（用户明确要求时使用）：执行

git add ./my_node && git commit

，DataWorks会自动同步并替换占位符。

必填最小字段（经实际验证，适用于所有130+种节点类型）：

```
name
```
— 节点名称
```
id
```
— 必须与
name
值相同，确保
```
spec.dependencies[*].nodeId
```
可以匹配。如果未显式指定
```
id
```
，API可能会静默丢弃依赖配置
```
script.path
```
— 脚本路径，必须以节点名称结尾；服务端会自动添加工作流前缀
```
script.runtime.command
```
— 节点类型（例如ODPS_SQL、DIDE_SHELL）

可复制的最小节点Spec（Shell节点示例）：

json

{"version":"2.0.0","kind":"Node","spec":{"nodes":[{
  "name":"my_shell_node","id":"my_shell_node",
  "script":{"path":"my_shell_node","runtime":{"command":"DIDE_SHELL"},"content":"#!/bin/bash\necho hello"}
}]}}

其他字段非必填，服务端会自动填充项目默认值：

datasource、runtimeResource — 如果不确定，不要传；服务端会自动绑定项目默认值
trigger — 如果不传，继承工作流调度规则，仅当用户指定时才传
dependencies、rerunTimes等 — 仅当用户指定时才传
outputs.nodeOutputs — 独立节点可选；工作流内节点必填（
```
{"data":"${projectIdentifier}.NodeName","artifactType":"NodeOutput"}
```
），否则下游依赖会静默失败。⚠️ 输出名称（
```
${projectIdentifier}.NodeName
```
）必须在项目内全局唯一——如果其他节点（即使在不同工作流中）已使用相同的输出名称，部署会失败并提示「can not exported multiple nodes into the same output」。创建前务必使用
```
ListNodes
```
检查

Workflow and Node Relationship

工作流与节点关系

Project
└── Workflow              ← Container, unified scheduling management
    ├── Node A            ← Minimum execution unit
    ├── Node B (depends A)
    └── Node C (depends B)

A workflow is the container and scheduling unit for nodes, with its own trigger and strategy
Nodes can exist independently at the root level or belong to a workflow (user decides)
The workflow's
```
script.runtime.command
```
is always
```
"WORKFLOW"
```
Dependency configuration for nodes within a workflow: only maintain dependencies in the
```
spec.dependencies
```
array (do NOT dual-write
```
inputs.nodeOutputs
```
). ⚠️
```
spec.dependencies[*].nodeId
```
is a self-reference — it must match the current node's own
name
(the node that HAS the dependency), NOT the upstream node's name or ID.
```
depends[].output
```
is the upstream node's output identifier (
```
${projectIdentifier}.UpstreamNodeName
```
). Upstream nodes must declare
```
outputs.nodeOutputs
```

Project
└── Workflow              ← 容器，统一调度管理
    ├── Node A            ← 最小执行单元
    ├── Node B (依赖A)
    └── Node C (依赖B)

工作流是节点的容器和调度单元，有自身的触发规则和策略
节点可以独立存在于根层级，也可以属于某个工作流（由用户决定）
工作流的
```
script.runtime.command
```
固定为
```
"WORKFLOW"
```
工作流内节点的依赖配置：仅在
```
spec.dependencies
```
数组中维护依赖（不要同时写
```
inputs.nodeOutputs
```
）。⚠️
```
spec.dependencies[*].nodeId
```
是自引用——必须匹配当前节点自身的
name
（即配置依赖的节点本身），而不是上游节点的名称或ID。
```
depends[].output
```
是上游节点的输出标识（
```
${projectIdentifier}.UpstreamNodeName
```
）。上游节点必须声明
```
outputs.nodeOutputs
```

Creating Workflows

创建工作流

Create the workflow definition (minimal spec):

json

{"version":"2.0.0","kind":"CycleWorkflow","spec":{"workflows":[{
  "name":"workflow_name","script":{"path":"workflow_name","runtime":{"command":"WORKFLOW"}}
}]}}

Call

CreateWorkflowDefinition

→ returns WorkflowId

Create nodes in dependency order (each node passes
```
ContainerId=WorkflowId
```
)
- Before each node: Check that
```
${projectIdentifier}.NodeName
```
  is not already used as an output by any existing node in the project (use
```
ListNodes
```
  with
```
--Name
```
  and inspect
```
Outputs.NodeOutputs[].Data
```
  ). Duplicate output names cause deployment failure
- Each node's spec must include
```
outputs.nodeOutputs
```
  :
```
{"data":"${projectIdentifier}.NodeName","artifactType":"NodeOutput"}
```
- Downstream nodes declare dependencies in
```
spec.dependencies
```
  :
```
nodeId
```
  = current node's own name (self-reference),
```
depends[].output
```
  = upstream node's output (see workflow-guide.md)
Verify dependencies (MANDATORY after all nodes created) — For each downstream node, call
```
ListNodeDependencies --Id <NodeID>
```
. If
```
TotalCount
```
is
```
0
```
but the node should have upstream dependencies, the CreateNode API silently dropped them. Fix immediately with
```
UpdateNode
```
using
```
spec.dependencies
```
(see "Updating dependencies" below). Do NOT proceed to deploy until all dependencies are confirmed
Set the schedule —
```
UpdateWorkflowDefinition
```
with
```
trigger
```
(if the user specified a schedule)
Deploy online (REQUIRED) —
```
CreatePipelineRun(Type=Online, ObjectIds=[WorkflowId])
```
→ poll
```
GetPipelineRun
```
→ advance stages with
```
ExecPipelineRunStage
```
. A workflow is NOT active until deployed. Do not skip this step or tell the user to do it manually.

Detailed guide and copyable complete node Spec examples (including outputs and dependencies): references/workflow-guide.md

创建工作流定义（最小Spec）：

json

{"version":"2.0.0","kind":"CycleWorkflow","spec":{"workflows":[{
  "name":"workflow_name","script":{"path":"workflow_name","runtime":{"command":"WORKFLOW"}}
}]}}

调用

CreateWorkflowDefinition

→ 返回WorkflowId

按依赖顺序创建节点（每个节点传入
```
ContainerId=WorkflowId
```
）
- 创建每个节点前：检查
```
${projectIdentifier}.NodeName
```
  是否已被项目内其他节点用作输出（使用
```
ListNodes
```
  加
```
--Name
```
  参数，检查
```
Outputs.NodeOutputs[].Data
```
  ）。重复的输出名称会导致部署失败
- 每个节点的spec 必须包含
```
outputs.nodeOutputs
```
  ：
```
{"data":"${projectIdentifier}.NodeName","artifactType":"NodeOutput"}
```
- 下游节点在
```
spec.dependencies
```
  中声明依赖：
```
nodeId
```
  = 当前节点自身的名称（自引用），
```
depends[].output
```
  = 上游节点的输出（参考workflow-guide.md）
验证依赖（所有节点创建后必填） — 对每个下游节点，调用
```
ListNodeDependencies --Id <NodeID>
```
。如果
```
TotalCount
```
为
```
0
```
但节点应该有上游依赖，说明CreateNode API静默丢弃了依赖配置。立即使用
UpdateNode
配合
spec.dependencies
修复（参考下方「更新依赖」部分）。确认所有依赖正常前，不要继续部署
设置调度规则 — 如果用户指定了调度规则，调用
```
UpdateWorkflowDefinition
```
添加
```
trigger
```
配置
上线部署（必填） — 调用
```
CreatePipelineRun(Type=Online, ObjectIds=[WorkflowId])
```
→ 轮询
```
GetPipelineRun
```
→ 使用
```
ExecPipelineRunStage
```
推进执行阶段。只有部署后工作流才会生效，不要跳过此步骤或让用户手动执行。

详细指南和可复制的完整节点Spec示例（包含输出和依赖）：references/workflow-guide.md

Updating Existing Nodes

更新现有节点

Must use incremental updates — only pass the node id + fields to modify:

json

{"version":"2.0.0","kind":"Node","spec":{"nodes":[{
  "id":"NodeID",
  "script":{"content":"new code"}
}]}}

⚠️ Critical: UpdateNode always uses
"kind":"Node"
, even if the node belongs to a workflow. Do NOT use
"kind":"CycleWorkflow"
— that is only for workflow-level operations (
UpdateWorkflowDefinition
).

Do not pass unchanged fields like datasource or runtimeResource (the server may have corrected values; passing them back can cause errors).

⚠️ Updating dependencies: To fix or change a node's dependencies via UpdateNode, use
spec.dependencies
— NEVER use
inputs.nodeOutputs
. Example:
json
{"version":"2.0.0","kind":"Node","spec":{"nodes":[{"id":"NodeID"}],"dependencies":[{"nodeId":"current_node_name","depends":[{"type":"Normal","output":"project.upstream_node"}]}]}}

必须使用增量更新 — 仅传入节点id + 需要修改的字段：

json

{"version":"2.0.0","kind":"Node","spec":{"nodes":[{
  "id":"NodeID",
  "script":{"content":"new code"}
}]}}

⚠️ 关键提示：UpdateNode 始终使用
"kind":"Node"
，即使节点属于某个工作流。不要使用
"kind":"CycleWorkflow"
——该值仅用于工作流级操作（
UpdateWorkflowDefinition
）。

不要传入未修改的字段，例如datasource或runtimeResource（服务端可能已有修正后的值，回传会导致错误）。

⚠️ 更新依赖：要通过UpdateNode修复或修改节点的依赖，请使用
spec.dependencies
——严禁使用
inputs.nodeOutputs
。示例：
json
{"version":"2.0.0","kind":"Node","spec":{"nodes":[{"id":"NodeID"}],"dependencies":[{"nodeId":"current_node_name","depends":[{"type":"Normal","output":"project.upstream_node"}]}]}}

Update + Republish Workflow

更新 + 重新发布工作流

Complete end-to-end flow for modifying an existing node and deploying the change:

Find the node —
```
ListNodes(Name=xxx)
```
→ get Node ID
Update the node —
```
UpdateNode
```
with incremental spec (
```
kind:Node
```
, only
```
id
```
+ changed fields)

Publish —

CreatePipelineRun(type=Online, object_ids=[NodeID])

→ poll

GetPipelineRun

→ advance stages with

ExecPipelineRunStage

bash

undefined

修改现有节点并部署变更的完整端到端流程：

查找节点 — 执行
```
ListNodes(Name=xxx)
```
→ 获取节点ID
更新节点 — 使用增量Spec调用
```
UpdateNode
```
（
```
kind:Node
```
，仅包含
```
id
```
+ 变更字段）

发布 — 调用

CreatePipelineRun(type=Online, object_ids=[NodeID])

→ 轮询

GetPipelineRun

→ 使用

ExecPipelineRunStage

推进执行阶段

bash

undefined

Step 1: Find the node

aliyun dataworks-public ListNodes --ProjectId $PID --Name "my_node" --user-agent AlibabaCloud-Agent-Skills

→ Note the node Id from the response

Step 2: Update (incremental — only id + changed fields)

aliyun dataworks-public UpdateNode --ProjectId $PID --Id $NODE_ID
--Spec '{"version":"2.0.0","kind":"Node","spec":{"nodes":[{"id":"'$NODE_ID'","script":{"content":"SELECT 1;"}}]}}'
--user-agent AlibabaCloud-Agent-Skills

Step 3: Publish (see "Publishing and Deploying" below)

aliyun dataworks-public CreatePipelineRun --ProjectId $PID
--PipelineRunParam '{"type":"Online","objectIds":["'$NODE_ID'"]}'
--user-agent AlibabaCloud-Agent-Skills


> **Common wrong paths after UpdateNode** (all prohibited):
> - ❌ `DeployFile` / `SubmitFile` — legacy APIs, will fail or behave unexpectedly
> - ❌ `ImportWorkflowDefinition` — for initial bulk import only, not for updating or publishing
> - ❌ `ListFiles` / `GetFile` — legacy file model, use `ListNodes` / `GetNode` instead
> - ✅ `CreatePipelineRun` → `GetPipelineRun` → `ExecPipelineRunStage`

aliyun dataworks-public CreatePipelineRun --ProjectId $PID
--PipelineRunParam '{"type":"Online","objectIds":["'$NODE_ID'"]}'
--user-agent AlibabaCloud-Agent-Skills


> **UpdateNode后的常见错误路径（全部禁止）**：
> - ❌ `DeployFile` / `SubmitFile` — 旧版API，会失败或出现非预期行为
> - ❌ `ImportWorkflowDefinition` — 仅用于初始批量导入，不适用更新或发布
> - ❌ `ListFiles` / `GetFile` — 旧版文件模型，使用`ListNodes` / `GetNode`替代
> - ✅ 正确流程：`CreatePipelineRun` → `GetPipelineRun` → `ExecPipelineRunStage`

Publishing and Deploying

发布与部署

⚠️ NEVER use
DeployFile
,
SubmitFile
,
ListDeploymentPackages
,
GetDeploymentPackage
,
ListFiles
, or
GetFile
for deployment. These are all legacy APIs. Use ONLY:
CreatePipelineRun
→
GetPipelineRun
→
ExecPipelineRunStage
.

Publishing is an asynchronous multi-stage pipeline:

CreatePipelineRun(Type=Online, ObjectIds=[ID])

→ get PipelineRunId

Poll

GetPipelineRun

→ check

Pipeline.Status

and

Pipeline.Stages

When a Stage has
```
Init
```
status and all preceding Stages are
```
Success
```
→ call
```
ExecPipelineRunStage(Code=Stage.Code)
```
to advance
Until the Pipeline overall status becomes
```
Success
```
/
```
Fail
```

Key point: The Build stage runs automatically, but the Check and Deploy stages must be manually advanced. Detailed CLI examples and polling scripts are in references/deploy-guide.md.

CLI Note: The
aliyun
CLI returns JSON with the top-level key
Pipeline
(not SDK's
resp.body.pipeline
); Stages are in
Pipeline.Stages
.

⚠️ 严禁使用
DeployFile
、
SubmitFile
、
ListDeploymentPackages
、
GetDeploymentPackage
、
ListFiles
或
GetFile
进行部署，这些都是旧版API。仅允许使用：
CreatePipelineRun
→
GetPipelineRun
→
ExecPipelineRunStage
。

发布是异步多阶段流程：

调用

CreatePipelineRun(Type=Online, ObjectIds=[ID])

→ 获取PipelineRunId

轮询

GetPipelineRun

→ 检查

Pipeline.Status

和

Pipeline.Stages

当某个Stage状态为
```
Init
```
且所有前置Stage都为
```
Success
```
时 → 调用
```
ExecPipelineRunStage(Code=Stage.Code)
```
推进执行
直到Pipeline整体状态变为
```
Success
```
/
```
Fail
```

关键说明：Build阶段会自动运行，但Check和Deploy阶段需要手动推进。完整的CLI示例和轮询脚本参考references/deploy-guide.md。

CLI说明：
aliyun
CLI返回的JSON顶层key为
Pipeline
（而非SDK的
resp.body.pipeline
）；阶段信息在
Pipeline.Stages
中。

Common Node Types

常见节点类型

Use Case	command	contentFormat	Extension	datasource
Shell script	DIDE_SHELL	shell	.sh	—
MaxCompute SQL	ODPS_SQL	sql	.sql	odps
Python script	PYTHON	python	.py	—
Offline data sync	DI	json	.json	—
Hologres SQL	HOLOGRES_SQL	sql	.sql	hologres
Flink streaming SQL	FLINK_SQL_STREAM	sql	.json	flink
Flink batch SQL	FLINK_SQL_BATCH	sql	.json	flink
EMR Hive	EMR_HIVE	sql	.sql	emr
EMR Spark SQL	EMR_SPARK_SQL	sql	.sql	emr
Serverless Spark SQL	SERVERLESS_SPARK_SQL	sql	.sql	emr
StarRocks SQL	StarRocks	sql	.sql	starrocks
ClickHouse SQL	CLICK_SQL	sql	.sql	clickhouse
Virtual node	VIRTUAL	empty	.vi	—

Complete list (130+ types): references/nodetypes/index.md (searchable by command name, description, and category, with links to detailed documentation for each type)

When you cannot find a node type:

Check
```
references/nodetypes/index.md
```
and match by keyword

Glob("**/{keyword}*.md", path="references/nodetypes")

to locate the documentation directly

Use the
```
GetNode
```
API to get the spec of a similar node from the live environment as a reference
If none of the above works → fall back to
```
DIDE_SHELL
```
and use command-line tools within the Shell to accomplish the task

适用场景	command	contentFormat	后缀	datasource
Shell脚本	DIDE_SHELL	shell	.sh	—
MaxCompute SQL	ODPS_SQL	sql	.sql	odps
Python脚本	PYTHON	python	.py	—
离线数据同步	DI	json	.json	—
Hologres SQL	HOLOGRES_SQL	sql	.sql	hologres
Flink流处理SQL	FLINK_SQL_STREAM	sql	.json	flink
Flink批处理SQL	FLINK_SQL_BATCH	sql	.json	flink
EMR Hive	EMR_HIVE	sql	.sql	emr
EMR Spark SQL	EMR_SPARK_SQL	sql	.sql	emr
Serverless Spark SQL	SERVERLESS_SPARK_SQL	sql	.sql	emr
StarRocks SQL	StarRocks	sql	.sql	starrocks
ClickHouse SQL	CLICK_SQL	sql	.sql	clickhouse
虚拟节点	VIRTUAL	empty	.vi	—

完整列表（130+种类型）：references/nodetypes/index.md（支持按命令名称、描述、分类搜索，包含每种类型的详细文档链接）

找不到对应节点类型时：

检查
```
references/nodetypes/index.md
```
，按关键词匹配

使用

Glob("**/{keyword}*.md", path="references/nodetypes")

直接定位文档

使用
```
GetNode
```
API从现有环境获取相似节点的Spec作为参考
以上方式都无效时 → fallback到
```
DIDE_SHELL
```
，在Shell中使用命令行工具完成任务

Key Constraints

关键约束

script.path is required: Script path, must end with the node name. When creating, you can pass just the node name; the server automatically prepends the workflow prefix
Dependencies are configured via
spec.dependencies
(do NOT dual-write
```
inputs.nodeOutputs
```
): In
```
spec.dependencies
```
,
```
nodeId
```
is a self-reference — it must be the current node's own
name
(the node being created), NOT the upstream node.
```
depends[].output
```
is the upstream node's output (
```
${projectIdentifier}.UpstreamNodeName
```
). The upstream's
```
outputs.nodeOutputs[].data
```
and downstream's
```
depends[].output
```
must be character-for-character identical. Upstream nodes must declare
```
outputs.nodeOutputs
```
. ⚠️ Output names (
```
${projectIdentifier}.NodeName
```
) must be globally unique within the project — duplicates cause deployment failure
Immutable properties: A node's
```
command
```
(node type) cannot be changed after creation; if incorrect, inform the user and suggest creating a new node with the correct type
Updates must be incremental: Only pass id + fields to modify; do not pass unchanged fields like datasource/runtimeResource
datasource.type may be corrected by the server: e.g.,
```
flink
```
→
```
flink_serverless
```
; use the generic type when creating
Nodes can exist independently: Nodes can be created at the root level (without passing ContainerId) or belong to a workflow (pass ContainerId=WorkflowId). Whether to place in a workflow is the user's decision
Workflow command is always WORKFLOW:
```
script.runtime.command
```
must be
```
"WORKFLOW"
```
Deletion is not supported by this skill: This skill does not provide any delete operations. When creation or publishing fails, never attempt to "fix" the problem by deleting existing objects. Correct approach: diagnose the failure cause → inform the user of the specific conflict → let the user decide how to handle it (rename / update existing)
Name conflict check is required before creation: Before calling any Create API, use the corresponding List API to confirm the name is not duplicated (see "Environment Discovery"). Name conflicts will cause creation failure; duplicate node output names (
```
outputs.nodeOutputs[].data
```
) will cause dependency errors or publishing failure
Mutating operations require user confirmation: Except for Create and read-only queries (Get/List), all OpenAPI operations that modify existing objects (Update, Move, Rename, etc.) must be shown to the user with explicit confirmation obtained before execution. Confirmation information should include: operation type, target object name/ID, and key changes. These APIs must not be called before user confirmation. Delete and Abolish operations are not supported by this skill
Use only 2024-05-18 version APIs: All APIs in this skill are DataWorks 2024-05-18 version. Legacy APIs (
```
create-file
```
,
```
create-folder
```
,
```
CreateFlowProject
```
, etc.) are prohibited. If an API call returns an error, first check troubleshooting.md; do not fall back to legacy APIs
Stop on errors instead of brute-force retrying: If the same error code appears more than 2 consecutive times, the approach is wrong. Stop and analyze the error cause (check troubleshooting.md) instead of repeatedly retrying the same incorrect API with different parameters. Never fall back to legacy APIs (
```
create-file
```
,
```
create-business
```
, etc.) when a new API fails — review the FlowSpec Anti-Patterns table at the top of this document instead. Specific trap: If
```
aliyun help
```
output mentions "Plugin available but not installed" for dataworks-public, do NOT install the plugin — this leads to using deprecated kebab-case APIs. Instead, use PascalCase RPC directly (e.g.,
```
aliyun dataworks-public CreateNode
```
)
CLI parameter names must be checked in documentation, guessing is prohibited: Before calling an API, you must first check
```
references/api/{APIName}.md
```
to confirm parameter names. Common mistakes:
```
GetProject
```
's ID parameter is
```
--Id
```
(not
```
--ProjectId
```
);
```
UpdateNode
```
requires
```
--Id
```
. When unsure, verify with
```
aliyun dataworks-public {APIName} --help
```
PascalCase RPC only, no kebab-case: CLI commands must use
```
aliyun dataworks-public CreateNode
```
(PascalCase), never
```
aliyun dataworks-public create-node
```
(kebab-case). No plugin installation is needed. If the command is not found, upgrade
```
aliyun
```
CLI to >= 3.3.1
No wrapper scripts: Run each
```
aliyun
```
CLI command directly in the shell. Never create
```
.sh
```
/
```
.py
```
wrapper scripts to batch multiple API calls — this obscures errors and makes debugging impossible. Execute one API call at a time, check the response, then proceed
API response = success, not file output: Writing JSON spec files to disk is a preparation step, not completion. The task is complete only when the
```
aliyun
```
CLI returns a success response with a valid
```
Id
```
. If the API call fails, fix the spec and retry — do not declare the task done by saving local files
On error: re-read the Quick Start, do not invent new approaches: When an API call fails, compare your spec against the exact Quick Start example at the top of this document field by field. The most common cause is an invented FlowSpec field that does not exist. Copy the working example and modify only the values you need to change
Idempotency protection for write operations: DataWorks 2024-05-18 Create APIs (
```
CreateNode
```
,
```
CreateWorkflowDefinition
```
,
```
CreatePipelineRun
```
, etc.) do not support a
```
ClientToken
```
parameter. To prevent duplicate resource creation on network retries or timeouts:
- Before creating: Always run the pre-creation conflict check (List API) as described in "Environment Discovery" — this is the primary idempotency gate
- After a network error or timeout on Create: Do NOT blindly retry. First call the corresponding List/Get API to check whether the resource was actually created (the server may have processed the request despite the client-side error). Only retry if the resource does not exist
- Record RequestId: Every API response includes a
```
RequestId
```
  field. Log it so that duplicate-creation incidents can be traced and resolved via Alibaba Cloud support

script.path为必填字段：脚本路径必须以节点名称结尾。创建时可仅传入节点名称，服务端会自动添加工作流前缀
依赖通过
spec.dependencies
配置（不要同时写
```
inputs.nodeOutputs
```
）：在
```
spec.dependencies
```
中，
```
nodeId
```
是自引用——必须是当前创建节点自身的
name
，而非上游节点的名称。
```
depends[].output
```
是上游节点的输出（
```
${projectIdentifier}.UpstreamNodeName
```
）。上游的
```
outputs.nodeOutputs[].data
```
与下游的
```
depends[].output
```
必须完全一致。上游节点必须声明
```
outputs.nodeOutputs
```
。⚠️ 输出名称（
```
${projectIdentifier}.NodeName
```
）必须在项目内全局唯一——重复会导致部署失败
不可变属性：节点的
```
command
```
（节点类型）创建后不可修改，如果类型错误，告知用户并建议创建正确类型的新节点
更新必须为增量更新：仅传入id + 需要修改的字段；不要传入未修改的字段，例如datasource/runtimeResource
datasource.type可能被服务端修正：例如
```
flink
```
→
```
flink_serverless
```
；创建时使用通用类型即可
节点可独立存在：节点可以创建在根层级（不传入ContainerId），也可以属于某个工作流（传入ContainerId=WorkflowId）。是否放入工作流由用户决定
工作流的command固定为WORKFLOW：
```
script.runtime.command
```
必须为
```
"WORKFLOW"
```
本Skill不支持删除操作：本Skill不提供任何删除能力。当创建或发布失败时，严禁通过删除现有对象「修复」问题。正确做法：诊断失败原因 → 告知用户具体冲突 → 让用户决定处理方式（重命名/更新现有对象）
创建前必须检查名称冲突：调用任何Create API前，使用对应的List API确认名称未重复（参考「环境探查」部分）。名称冲突会导致创建失败；重复的节点输出名称（
```
outputs.nodeOutputs[].data
```
）会导致依赖错误或发布失败
变更操作需要用户确认：除了创建和只读查询（Get/List）外，所有修改现有对象的OpenAPI操作（Update、Move、Rename等）必须向用户展示操作信息并获得明确确认后再执行。确认信息需包含：操作类型、目标对象名称/ID、核心变更内容。未获得用户确认前不得调用这些API。本Skill不支持删除和废弃操作
仅使用2024-05-18版本API：本Skill中的所有API都是2024-05-18版本的DataWorks API，严禁使用旧版API（
```
create-file
```
、
```
create-folder
```
、
```
CreateFlowProject
```
等）。如果API调用返回错误，首先查看troubleshooting.md；不要回退使用旧版API
遇到错误停止操作，不要暴力重试：如果同一错误码连续出现2次以上，说明操作方式错误。停止操作分析错误原因（查看troubleshooting.md），不要使用不同参数反复重试同一个错误的API。新API调用失败时严禁回退使用旧版API（
```
create-file
```
、
```
create-business
```
等），请重新查看本文档顶部的FlowSpec反模式表。特殊陷阱：如果
```
aliyun help
```
输出提示dataworks-public「有可用插件但未安装」，不要安装插件——这会导致使用已弃用的短横线命名API。请直接使用PascalCase RPC调用（例如
```
aliyun dataworks-public CreateNode
```
）
CLI参数名称必须查阅文档确认，严禁猜测：调用API前，必须先查看
```
references/api/{APIName}.md
```
确认参数名称。常见错误：
```
GetProject
```
的ID参数是
```
--Id
```
（不是
```
--ProjectId
```
）；
```
UpdateNode
```
需要
```
--Id
```
参数。如果不确定，使用
```
aliyun dataworks-public {APIName} --help
```
验证
仅使用PascalCase RPC，禁止短横线命名：CLI命令必须使用
```
aliyun dataworks-public CreateNode
```
（PascalCase格式），严禁使用
```
aliyun dataworks-public create-node
```
（短横线格式）。无需安装插件。如果提示命令不存在，将
```
aliyun
```
CLI升级到 >= 3.3.1版本
禁止使用封装脚本：直接在Shell中执行每条
```
aliyun
```
CLI命令。严禁创建
```
.sh
```
/
```
.py
```
封装脚本批量调用多个API——这会隐藏错误，导致无法调试。每次执行一个API调用，检查响应后再继续
API响应成功才算完成，文件输出不算：将JSON Spec文件写入磁盘是准备步骤，不是任务完成。只有当
```
aliyun
```
CLI返回包含有效
```
Id
```
的成功响应时，任务才算完成。如果API调用失败，修复Spec并重试——不要以本地保存了文件为由宣称任务完成
遇到错误重新阅读快速入门，不要发明新方法：API调用失败时，逐字段对照本文档顶部的快速入门示例检查你的Spec。最常见的错误原因是使用了不存在的FlowSpec字段。复制可用的示例，仅修改你需要变更的值即可
写操作的幂等性保护：2024-05-18版本的DataWorks Create API（
```
CreateNode
```
、
```
CreateWorkflowDefinition
```
、
```
CreatePipelineRun
```
等）不支持
```
ClientToken
```
参数。为防止网络重试或超时导致重复创建资源：
- 创建前：始终执行「环境探查」部分描述的创建前冲突检查（List API）——这是主要的幂等性防护
- Create操作出现网络错误或超时后：不要盲目重试。首先调用对应的List/Get API检查资源是否已实际创建（即使客户端报错，服务端可能已经处理了请求）。仅当资源不存在时再重试
- 记录RequestId：每个API响应都包含
```
RequestId
```
  字段，记录该值，以便出现重复创建问题时可以通过阿里云支持排查解决

API Quick Reference

API快速参考

API Version: All APIs listed below are DataWorks 2024-05-18 version. CLI invocation format:
aliyun dataworks-public {APIName} --Parameter --user-agent AlibabaCloud-Agent-Skills
(PascalCase RPC direct invocation; DataWorks 2024-05-18 does not yet have plugin mode). Only use the APIs listed in the table below; do not search for or use other DataWorks APIs.

Detailed parameters and code templates for each API are in

references/api/{APIName}.md

. If a call returns an error, you can get the latest definition from

https://api.aliyun.com/meta/v1/products/dataworks-public/versions/2024-05-18/apis/{APIName}/api.json

API版本：下方列出的所有API都是2024-05-18版本的DataWorks API。CLI调用格式：
aliyun dataworks-public {APIName} --Parameter --user-agent AlibabaCloud-Agent-Skills
（PascalCase RPC直接调用；2024-05-18版本的DataWorks暂不支持插件模式）。仅允许使用下表列出的API，不要搜索或使用其他DataWorks API。

每个API的详细参数和代码模板参考

references/api/{APIName}.md

。如果调用返回错误，可以从

https://api.aliyun.com/meta/v1/products/dataworks-public/versions/2024-05-18/apis/{APIName}/api.json

获取最新定义。

Components

组件

API	Description
CreateComponent	Create a component
GetComponent	Get component details
UpdateComponent	Update a component
ListComponents	List components

API	描述
CreateComponent	创建组件
GetComponent	获取组件详情
UpdateComponent	更新组件
ListComponents	列出组件

Nodes

节点

API	Description
CreateNode	Create a data development node. project_id + scene + spec, optional container_id
UpdateNode	Update node information. Incremental update, only pass id + fields to change
MoveNode	Move a node to a specified path
RenameNode	Rename a node
GetNode	Get node details, returns the complete spec
ListNodes	List nodes, supports filtering by workflow
ListNodeDependencies	List a node's dependency nodes

API	描述
CreateNode	创建数据开发节点。参数：project_id + scene + spec，可选container_id
UpdateNode	更新节点信息。增量更新，仅传入id + 需要变更的字段
MoveNode	将节点移动到指定路径
RenameNode	重命名节点
GetNode	获取节点详情，返回完整Spec
ListNodes	列出节点，支持按工作流筛选
ListNodeDependencies	列出节点的依赖节点

Workflow Definitions

工作流定义

API	Description
CreateWorkflowDefinition	Create a workflow. project_id + spec
ImportWorkflowDefinition	Import a workflow (initial bulk import ONLY — do NOT use for updates or publishing; use `UpdateNode` + `CreatePipelineRun` instead)
UpdateWorkflowDefinition	Update workflow information, incremental update
MoveWorkflowDefinition	Move a workflow to a target path
RenameWorkflowDefinition	Rename a workflow
GetWorkflowDefinition	Get workflow details
ListWorkflowDefinitions	List workflows, filter by type

API	描述
CreateWorkflowDefinition	创建工作流。参数：project_id + spec
ImportWorkflowDefinition	导入工作流（仅用于初始批量导入——不要用于更新或发布；更新/发布请使用 `UpdateNode` + `CreatePipelineRun` ）
UpdateWorkflowDefinition	更新工作流信息，增量更新
MoveWorkflowDefinition	将工作流移动到目标路径
RenameWorkflowDefinition	重命名工作流
GetWorkflowDefinition	获取工作流详情
ListWorkflowDefinitions	列出工作流，支持按类型筛选

Resources

资源

API	Description
CreateResource	Create a file resource
UpdateResource	Update file resource information, incremental update
MoveResource	Move a file resource to a specified directory
RenameResource	Rename a file resource
GetResource	Get file resource details
ListResources	List file resources

API	描述
CreateResource	创建文件资源
UpdateResource	更新文件资源信息，增量更新
MoveResource	将文件资源移动到指定目录
RenameResource	重命名文件资源
GetResource	获取文件资源详情
ListResources	列出文件资源

Functions

函数

API	Description
CreateFunction	Create a UDF function
UpdateFunction	Update UDF function information, incremental update
MoveFunction	Move a function to a target path
RenameFunction	Rename a function
GetFunction	Get function details
ListFunctions	List functions

API	描述
CreateFunction	创建UDF函数
UpdateFunction	更新UDF函数信息，增量更新
MoveFunction	将函数移动到目标路径
RenameFunction	重命名函数
GetFunction	获取函数详情
ListFunctions	列出函数

Publishing Pipeline

发布流水线

API	Description
CreatePipelineRun	Create a publishing pipeline. type=Online/Offline
ExecPipelineRunStage	Execute a specified stage of the publishing pipeline, async requires polling
GetPipelineRun	Get publishing pipeline details, returns Stages status
ListPipelineRuns	List publishing pipelines
ListPipelineRunItems	Get publishing content

API	描述
CreatePipelineRun	创建发布流水线。type=Online/Offline
ExecPipelineRunStage	执行发布流水线的指定阶段，异步操作需要轮询
GetPipelineRun	获取发布流水线详情，返回阶段状态
ListPipelineRuns	列出发布流水线
ListPipelineRunItems	获取发布内容

Auxiliary Queries

辅助查询

API	Description
GetProject	Get projectIdentifier by id
ListDataSources	List data sources
ListComputeResources	List compute engine bindings (EMR, Hologres, StarRocks, etc.) — supplements ListDataSources
ListResourceGroups	List resource groups

API	描述
GetProject	通过ID获取projectIdentifier
ListDataSources	列出数据源
ListComputeResources	列出计算引擎绑定（EMR、Hologres、StarRocks等）——补充ListDataSources的返回结果
ListResourceGroups	列出资源组

Reference Documentation

参考文档

Scenario	Document
Complete list of APIs and CLI commands	references/related-apis.md
RAM permission policy configuration	references/ram-policies.md
Operation verification methods	references/verification-method.md
Acceptance criteria and test cases	references/acceptance-criteria.md
CLI installation and configuration guide	references/cli-installation-guide.md
Node type index (130+ types)	references/nodetypes/index.md
FlowSpec field reference	references/flowspec-guide.md
Workflow development	references/workflow-guide.md
Scheduling configuration	references/scheduling-guide.md
Publishing and unpublishing	references/deploy-guide.md
DI data integration	references/di-guide.md
Troubleshooting	references/troubleshooting.md
Complete examples	assets/templates/README.md

适用场景	文档路径
API和CLI命令完整列表	references/related-apis.md
RAM权限策略配置	references/ram-policies.md
操作验证方法	references/verification-method.md
验收标准和测试用例	references/acceptance-criteria.md
CLI安装和配置指南	references/cli-installation-guide.md
节点类型索引（130+种类型）	references/nodetypes/index.md
FlowSpec字段参考	references/flowspec-guide.md
工作流开发	references/workflow-guide.md
调度配置	references/scheduling-guide.md
发布与下线	references/deploy-guide.md
DI数据集成	references/di-guide.md
问题排查	references/troubleshooting.md
完整示例	assets/templates/README.md