alibabacloud-dataworks-datastudio-develop
DataWorks Data Development
⚡ MANDATORY: Read Before Any API Call
These absolute rules are NOT optional — violating ANY ONE means the task WILL FAIL:
- **FIRST THING: Switch CLI profile.** Before ANY `aliyun` command, run `aliyun configure list`. If multiple profiles exist, run `aliyun configure switch --profile <name>` to select the correct one. Priority: prefer a profile whose name contains `dataworks` (case-insensitive); otherwise use `default`. Do NOT skip this step. Do NOT run any `aliyun dataworks-public` command before switching. NEVER read/echo/print AK/SK values.
- **NEVER install plugins.** If `aliyun help` shows "Plugin available but not installed" for dataworks-public → IGNORE IT. Do NOT run `aliyun plugin install`. PascalCase RPC works without plugins (requires CLI >= 3.3.1).
- **ONLY use PascalCase RPC.** Every DataWorks API call must look like: `aliyun dataworks-public CreateNode --ProjectId ... --Spec '...'`. Never use kebab-case (`create-file`, `create-node`, `create-business`).
- **ONLY use these APIs for create:** `CreateWorkflowDefinition` → `CreateNode` (per node, with `--ContainerId`) → `CreatePipelineRun` (to deploy).
- **ONLY use these APIs for update:** `UpdateNode` (incremental, `kind:Node`) → `CreatePipelineRun` (to deploy). Never use `ImportWorkflowDefinition`, `DeployFile`, or `SubmitFile` for updates or publishing. 4a. **ONLY use these APIs for deploy/publish:** `CreatePipelineRun` (Type=Online, ObjectIds=[ID]) → `GetPipelineRun` (poll) → `ExecPipelineRunStage` (advance). NEVER use `DeployFile`, `SubmitFile`, `ListDeploymentPackages`, or `GetDeploymentPackage` — these are all legacy APIs that will fail.
- **If `CreateWorkflowDefinition` or `CreateNode` returns an error, FIX THE SPEC** — do NOT fall back to legacy APIs. Error 58014884415 means your FlowSpec JSON format is wrong (e.g., used `"kind":"Workflow"` instead of `"kind":"CycleWorkflow"`, or `"apiVersion"` instead of `"version"`). Copy the exact Spec from the Quick Start below.
- **Run CLI commands directly — do NOT create wrapper scripts.** Never create `.sh` scripts to batch API calls. Run each `aliyun` command directly in the shell. Wrapper scripts add complexity and obscure errors.
- **Saving files locally is NOT completion.** The task is only done when the API returns a success response (e.g., `{"Id": "..."}` from `CreateWorkflowDefinition`/`CreateNode`). Writing JSON files to disk without calling the API means the workflow/node was NOT created. Never claim success without a real API response.
- **NEVER simulate, mock, or fabricate API responses.** If credentials are missing, the CLI is misconfigured, or an API call returns an error — report the exact error message to the user and STOP. Do NOT generate fake JSON responses, write simulation documents, echo hardcoded output, or claim success in any form. A simulated success is worse than an explicit failure.
- **Credential failure = hard stop.** If `aliyun configure list` shows empty or invalid credentials, or any CLI call returns `access_key_id must be assigned`, `InvalidAccessKeyId`, or similar auth errors — STOP immediately. Tell the user to configure valid credentials outside this session. Do NOT attempt workarounds (writing config.json manually, using placeholder credentials, proceeding without auth). No subsequent API calls may be attempted until credentials are verified working.
- **ONLY use APIs listed in this document.** Every API you call must appear in the API Quick Reference table below. If you need an operation that is not listed, check the table again — the operation likely exists under a different name. NEVER invent API names (e.g., `CreateDeployment`, `ApproveDeployment`, `DeployNode` do NOT exist). If you cannot find the right API, ask the user.

If you catch yourself typing ANY of these, STOP IMMEDIATELY and re-read the Quick Start below:
`create-file`, `create-business`, `create-folder`, `CreateFolder`, `CreateFile`, `UpdateFile`, `plugin install`, `--file-type`, `/bizroot`, `/workflowroot`, `DeployFile`, `SubmitFile`, `ListFiles`, `GetFile`, `ListDeploymentPackages`, `GetDeploymentPackage`, `CreateDeployment`, `ApproveDeployment`, `DeployNode`, `CreateFlow`, `CreateFileDepends`, `CreateSchedule`
⛔ Prohibited Legacy APIs
This skill uses DataWorks OpenAPI version 2024-05-18. The following legacy APIs and patterns are strictly prohibited:
| Prohibited Legacy Operation | Correct Replacement |
|---|---|
| `CreateFile` / `UpdateFile` | `CreateNode` / `UpdateNode` |
| `CreateFolder` | No folder needed, use `script.path` in the FlowSpec |
| `DeployFile` / `SubmitFile` | `CreatePipelineRun` |
| `ListFiles` / `GetFile` | `ListNodes` |
| `ListDeploymentPackages` / `GetDeploymentPackage` | `GetPipelineRun` |
| Any operation based on folder paths (`/bizroot`, `/workflowroot`) | Specify path via `script.path` in the FlowSpec |
| Kebab-case commands (`create-file`, `create-business`, ...) | PascalCase RPC (`CreateNode`, `CreateWorkflowDefinition`, ...) |
| `aliyun plugin install` | No plugin installation needed, use PascalCase RPC direct invocation |
How to tell — STOP if any of these are true:
- You are typing `create-file`, `create-business`, `create-folder`, or any kebab-case DataWorks command → WRONG. Use PascalCase RPC: `CreateNode`, `CreateWorkflowDefinition`
- You are running `aliyun plugin install` → WRONG. No plugin needed; PascalCase RPC direct invocation works out of the box (requires CLI >= 3.3.1)
- You are constructing folder paths (`/bizroot`, `/workflowroot`) → WRONG. Use `script.path` in FlowSpec
- Your FlowSpec contains `apiVersion`, `type` (at node level), or `schedule` → WRONG. See the correct format below

CLI Format: ALL DataWorks 2024-05-18 API calls use PascalCase RPC direct invocation: `aliyun dataworks-public CreateNode --ProjectId ... --Spec '...' --user-agent AlibabaCloud-Agent-Skills`. This requires `aliyun` CLI >= 3.3.1. No plugin installation is needed.
⚠️ FlowSpec Anti-Patterns
Agents commonly invent wrong FlowSpec fields. The correct format is shown in the Quick Start below.
| ❌ WRONG | ✅ CORRECT | Notes |
|---|---|---|
| `"apiVersion":"..."` | `"version":"2.0.0"` | FlowSpec uses `version`, not `apiVersion` |
| `"kind":"Workflow"` | `"kind":"CycleWorkflow"` | Only `CycleWorkflow` (workflow) and `Node` (node) are valid kinds here |
| Folder fields | `script.path` | FlowSpec has no folder concept; the path lives in `script.path` |
| `"type":"ODPS_SQL"` at node level | `"script":{"runtime":{"command":"ODPS_SQL"}}` | Node type goes in `script.runtime.command` |
| `"schedule":{...}` | `"trigger":{...}` | Scheduling uses `trigger` |
| Output name without project prefix | `"data":"projectIdentifier.NodeName"` | Output names must carry the project identifier prefix |
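These anti-patterns can be caught mechanically before any API call. A minimal sketch (a hypothetical helper, not part of the skill's scripts) that scans a parsed FlowSpec dict for the invented fields:

```python
def find_flowspec_antipatterns(spec: dict) -> list[str]:
    """Return descriptions of known anti-pattern fields found in a FlowSpec dict."""
    problems = []
    if "apiVersion" in spec:
        problems.append('top-level "apiVersion" — FlowSpec uses "version"')
    if spec.get("kind") == "Workflow":
        problems.append('"kind":"Workflow" — use "CycleWorkflow"')
    for node in spec.get("spec", {}).get("nodes", []):
        name = node.get("name")
        if "type" in node:
            problems.append(f'node "{name}" has "type" — node type goes in script.runtime.command')
        if "schedule" in node:
            problems.append(f'node "{name}" has "schedule" — scheduling uses "trigger"')
    return problems

# A spec exhibiting three anti-patterns at once:
bad = {"apiVersion": "1.0", "kind": "Workflow",
       "spec": {"nodes": [{"name": "n1", "type": "ODPS_SQL"}]}}
assert len(find_flowspec_antipatterns(bad)) == 3

# A clean minimal node spec passes:
good = {"version": "2.0.0", "kind": "Node",
        "spec": {"nodes": [{"name": "n1",
                            "script": {"runtime": {"command": "ODPS_SQL"}}}]}}
assert find_flowspec_antipatterns(good) == []
```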
🚀 Quick Start: End-to-End Workflow Creation
Complete working example — create a scheduled workflow with 2 dependent nodes:
Step 1: Create the workflow container
```bash
aliyun dataworks-public CreateWorkflowDefinition \
  --ProjectId 585549 \
  --Spec '{"version":"2.0.0","kind":"CycleWorkflow","spec":{"workflows":[{"name":"my_etl_workflow","script":{"path":"my_etl_workflow","runtime":{"command":"WORKFLOW"}}}]}}' \
  --user-agent AlibabaCloud-Agent-Skills
```
→ Returns {"Id": "WORKFLOW_ID", ...}

Step 2: Create upstream node (Shell) inside the workflow

IMPORTANT: Before creating, verify output name "my_project.check_data" is not already used by another node (ListNodes)
```bash
aliyun dataworks-public CreateNode \
  --ProjectId 585549 \
  --Scene DATAWORKS_PROJECT \
  --ContainerId WORKFLOW_ID \
  --Spec '{"version":"2.0.0","kind":"Node","spec":{"nodes":[{"name":"check_data","id":"check_data","script":{"path":"check_data","runtime":{"command":"DIDE_SHELL"},"content":"#!/bin/bash\necho done"},"outputs":{"nodeOutputs":[{"data":"my_project.check_data","artifactType":"NodeOutput"}]}}]}}' \
  --user-agent AlibabaCloud-Agent-Skills
```
→ Returns {"Id": "NODE_A_ID", ...}

Step 3: Create downstream node (SQL) with dependency on upstream

NOTE on dependencies: "nodeId" is the CURRENT node's name (self-reference), "output" is the UPSTREAM node's output
```bash
aliyun dataworks-public CreateNode \
  --ProjectId 585549 \
  --Scene DATAWORKS_PROJECT \
  --ContainerId WORKFLOW_ID \
  --Spec '{"version":"2.0.0","kind":"Node","spec":{"nodes":[{"name":"transform_data","id":"transform_data","script":{"path":"transform_data","runtime":{"command":"ODPS_SQL"},"content":"SELECT 1;"},"outputs":{"nodeOutputs":[{"data":"my_project.transform_data","artifactType":"NodeOutput"}]}}],"dependencies":[{"nodeId":"transform_data","depends":[{"type":"Normal","output":"my_project.check_data"}]}]}}' \
  --user-agent AlibabaCloud-Agent-Skills
```
Step 4: Set workflow schedule (daily at 00:30)
```bash
aliyun dataworks-public UpdateWorkflowDefinition \
  --ProjectId 585549 \
  --Id WORKFLOW_ID \
  --Spec '{"version":"2.0.0","kind":"CycleWorkflow","spec":{"workflows":[{"name":"my_etl_workflow","script":{"path":"my_etl_workflow","runtime":{"command":"WORKFLOW"}},"trigger":{"cron":"00 30 00 * * ?","timezone":"Asia/Shanghai","type":"Scheduler"}}]}}' \
  --user-agent AlibabaCloud-Agent-Skills
```
Step 5: Deploy the workflow online (REQUIRED — workflow is not active until deployed)
```bash
aliyun dataworks-public CreatePipelineRun \
  --ProjectId 585549 \
  --Type Online --ObjectIds '["WORKFLOW_ID"]' \
  --user-agent AlibabaCloud-Agent-Skills
```
→ Returns {"Id": "PIPELINE_RUN_ID", ...}

Then poll GetPipelineRun and advance stages with ExecPipelineRunStage
(see "Publishing and Deploying" section below for full polling flow)
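The poll-and-advance loop can be sketched as follows. This is a hypothetical outline, not the skill's reference flow: the response fields used here (`Status`, `Stages[].Status`, `Stages[].Code`) are assumptions to confirm against the GetPipelineRun documentation, and the two callables are injected so the control flow can be shown without real CLI calls.

```python
import time

def run_deploy_pipeline(get_run, exec_stage, run_id,
                        interval=5.0, timeout=600.0):
    """Poll a pipeline run until it reaches a terminal status,
    advancing stages that are waiting to be executed.

    get_run(run_id) -> dict with "Status" and "Stages" (assumed shape)
    exec_stage(run_id, stage_code) -> None (advances one stage)
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        run = get_run(run_id)
        status = run.get("Status")
        if status in ("SUCCESS", "FAIL", "TERMINATION"):
            return status
        # Advance any stage still waiting to start (assumed field names)
        for stage in run.get("Stages", []):
            if stage.get("Status") == "INIT":
                exec_stage(run_id, stage.get("Code"))
        time.sleep(interval)
    raise TimeoutError(f"pipeline run {run_id} did not finish in {timeout}s")
```

With the real CLI, `get_run` would wrap `aliyun dataworks-public GetPipelineRun` and `exec_stage` would wrap `ExecPipelineRunStage`; a failed (`FAIL`) result should be reported to the user verbatim, per the hard rules above.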
> **Key pattern**: CreateWorkflowDefinition → CreateNode (with ContainerId + outputs.nodeOutputs) → UpdateWorkflowDefinition (add trigger) → **CreatePipelineRun (deploy)**. Each node within a workflow MUST have `outputs.nodeOutputs`. **The workflow is NOT active until deployed via CreatePipelineRun.**
>
> **Dependency wiring summary**: In `spec.dependencies`, `nodeId` is the **current node's own name** (self-reference, NOT the upstream node), and `depends[].output` is the **upstream node's output** (`projectIdentifier.upstream_node_name`). The `outputs.nodeOutputs[].data` value of the upstream node and the `depends[].output` value of the downstream node must be **character-for-character identical**, otherwise the dependency silently fails.
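The character-for-character matching rule can be linted before any API call. A minimal sketch (a hypothetical helper, not part of the skill) that collects every declared `outputs.nodeOutputs[].data` and every required `depends[].output` across CreateNode payloads and reports the outputs nothing declares:

```python
def unresolved_dependencies(node_specs: list[dict]) -> list[str]:
    """Given several CreateNode --Spec payloads, return depends[].output
    values that no node's outputs.nodeOutputs[].data declares."""
    declared = set()
    required = []
    for payload in node_specs:
        spec = payload.get("spec", {})
        for node in spec.get("nodes", []):
            for out in node.get("outputs", {}).get("nodeOutputs", []):
                declared.add(out["data"])
        for dep in spec.get("dependencies", []):
            for d in dep.get("depends", []):
                required.append(d["output"])
    # Matching is exact and case-sensitive, per the rule above
    return [o for o in required if o not in declared]
```

Note that a root node's dependency on the workflow root output (the `projectIdentifier_root` form) would also be reported by this sketch unless whitelisted, since no node declares it.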
Core Workflow
Environment Discovery (Required Before Creating)
Step 0 — CLI Profile Switch (MUST be the very first action):
Run `aliyun configure list`. If multiple profiles exist, run `aliyun configure switch --profile <name>` (prefer a `dataworks`-named profile, otherwise `default`). No `aliyun dataworks-public` command may run before this.

If credentials are empty or invalid, STOP HERE. Do not proceed with any API calls. Report the error to the user and instruct them to configure valid credentials outside this session (via `aliyun configure` or environment variables). Do not attempt workarounds such as writing config files manually or using placeholder values.
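The profile-priority rule can be expressed as a tiny helper, shown here as a sketch of the stated priority (first profile whose name contains `dataworks`, case-insensitive, else `default`); the actual switch still happens via `aliyun configure switch --profile <name>`:

```python
def choose_profile(profile_names: list[str]) -> str:
    """Pick the CLI profile per the stated priority: first name containing
    'dataworks' (case-insensitive), otherwise 'default'."""
    for name in profile_names:
        if "dataworks" in name.lower():
            return name
    return "default"

assert choose_profile(["default", "DataWorks-prod"]) == "DataWorks-prod"
assert choose_profile(["default", "emr"]) == "default"
```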
Before creating nodes or workflows, understand the project's existing environment. It is recommended to use a subagent to execute queries, returning only a summary to the main Agent to avoid raw data consuming too much context.
Subagent tasks:
- Call `ListWorkflowDefinitions` to get the workflow list
- Call `ListNodes` to get the existing node list
- Call `ListDataSources` AND `ListComputeResources` to get all available data sources and compute engine bindings (EMR, Hologres, StarRocks, etc.). `ListComputeResources` supplements `ListDataSources`, which may not return compute-engine-type resources
- Return a summary (do not return raw data):
  - Workflow inventory: name + number of contained nodes + type (scheduled/manual)
  - Existing nodes relevant to the current task: name + type + parent workflow
  - Available data sources + compute resources (name, type) — combine both lists
  - Suggested target workflow (if inferable from the task description)
Based on the summary, the main Agent decides: target workflow (existing or new, user decides), node naming (follow existing conventions), and dependencies (infer from SQL references and existing nodes).
Pre-creation conflict check (required, applies to all object types):
- Name duplication check: Before creating any object, use the corresponding List API to check if an object with the same name already exists:
  - Workflow → `ListWorkflowDefinitions`
  - Node → `ListNodes` (node names are globally unique within a project)
  - Resource → `ListResources`
  - Function → `ListFunctions`
  - Component → `ListComponents`
- Handling existing objects: Inform the user and ask how to proceed (use existing / rename / update existing). Direct deletion of existing objects is prohibited
- Output name conflict check (CRITICAL): A node's `outputs.nodeOutputs[].data` (format `${projectIdentifier}.NodeName`) must be globally unique within the project, even across different workflows. Use `ListNodes --Name NodeName` and inspect `Outputs.NodeOutputs[].Data` in the response to verify. If the output name conflicts with an existing node, the conflict must be resolved before creation — otherwise deployment will fail with "can not exported multiple nodes into the same output" (see troubleshooting.md #11b)
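Both halves of the output-name rule (prefix format and global uniqueness) are mechanical. A sketch of a hypothetical pre-flight check, assuming you have already gathered the existing `Outputs.NodeOutputs[].Data` values via `ListNodes`:

```python
def check_output_name(proposed: str, project_identifier: str,
                      existing_outputs: set[str]) -> list[str]:
    """Validate a proposed outputs.nodeOutputs[].data value before CreateNode."""
    errors = []
    # Format rule: must be ${projectIdentifier}.NodeName (dot-joined)
    if not proposed.startswith(project_identifier + "."):
        errors.append("output must use the projectIdentifier.NodeName format")
    # Uniqueness rule: must not be exported by any other node in the project
    if proposed in existing_outputs:
        errors.append("output already exported by another node in the project")
    return errors

existing = {"my_project.check_data"}
assert check_output_name("my_project.new_node", "my_project", existing) == []
assert check_output_name("my_project.check_data", "my_project", existing) == [
    "output already exported by another node in the project"]
```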
Certainty level determines interaction approach:
- Certain information → Use directly, do not ask the user
- Confident inference → Proceed, explain the reasoning in the output
- Uncertain information → Must ask the user
Creating Nodes
Unified workflow: Whether in OpenAPI Mode or Git Mode, generate the same local file structure.
Step 1: Create the Node Directory and Three Files
One folder = one node, containing three files:

```
my_node/
├── my_node.spec.json      # FlowSpec node definition
├── my_node.sql            # Code file (extension based on contentFormat)
└── dataworks.properties   # Runtime configuration (actual values)
```

spec.json — Copy the minimal Spec from `references/nodetypes/{category}/{TYPE}.md`, modify name and path, and use `${spec.xxx}` placeholders to reference values from properties. If the user specifies trigger, dependencies, rerunTimes, etc., add them to the spec as well.

Code file — Determine the format (sql/shell/python/json/empty) based on the `contentFormat` in the node type documentation; determine the extension based on the `extension` field.

dataworks.properties — Fill in actual values:

```properties
projectIdentifier=<actual project identifier>
spec.datasource.name=<actual datasource name>
spec.runtimeResource.resourceGroup=<actual resource group identifier>
```

Do not fill in uncertain values — if omitted, the server automatically uses project defaults.

Reference examples: `assets/templates/`
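The placeholder substitution that `build.py` performs can be illustrated in a few lines. This is a simplified sketch of the idea only; `scripts/build.py` is the authoritative implementation:

```python
import json
import re

def substitute_placeholders(spec_text: str, properties: dict) -> str:
    """Replace ${projectIdentifier} and ${spec.xxx} placeholders in a
    spec.json template with values read from dataworks.properties."""
    def repl(match: re.Match) -> str:
        key = match.group(1)
        if key not in properties:
            raise KeyError(f"no value for placeholder ${{{key}}}")
        return properties[key]
    return re.sub(r"\$\{([^}]+)\}", repl, spec_text)

props = {"projectIdentifier": "my_project",
         "spec.datasource.name": "odps_first"}
template = '{"output":"${projectIdentifier}.n1","datasource":"${spec.datasource.name}"}'
merged = substitute_placeholders(template, props)
assert json.loads(merged) == {"output": "my_project.n1",
                              "datasource": "odps_first"}
```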
assets/templates/一个文件夹对应一个节点,包含三个文件:
my_node/
├── my_node.spec.json # FlowSpec节点定义
├── my_node.sql # 代码文件(后缀根据contentFormat决定)
└── dataworks.properties # 运行时配置(实际值)spec.json — 从复制最小Spec模板,修改名称和路径,使用占位符引用properties中的值。如果用户指定了触发规则、依赖、重跑次数等,也添加到spec中。
references/nodetypes/{category}/{TYPE}.md${spec.xxx}代码文件 — 根据节点类型文档中的决定格式(sql/shell/python/json/空),根据字段决定文件后缀。
contentFormatextensiondataworks.properties — 填写实际值:
properties
projectIdentifier=<实际项目标识>
spec.datasource.name=<实际数据源名称>
spec.runtimeResource.resourceGroup=<实际资源组标识>不要填写不确定的值——如果省略,服务端会自动使用项目默认值。
参考示例:
assets/templates/Step 2: Submit
步骤2:提交
Default is OpenAPI (unless the user explicitly says "commit to Git"):
- Use `build.py` to merge the three files into API input:

  ```bash
  python $SKILL/scripts/build.py ./my_node > /tmp/spec.json
  ```

  build.py does three things (no third-party dependencies; if errors occur, refer to the source code to execute manually):
  - Read `dataworks.properties` → replace `${spec.xxx}` and `${projectIdentifier}` placeholders in spec.json
  - Read the code file → embed into `script.content`
  - Output the merged complete JSON

- Validate the spec before submission:

  ```bash
  python $SKILL/scripts/validate.py ./my_node
  ```

- Pre-submission spec review (MANDATORY) — Before calling CreateNode, review the merged JSON against this checklist:
  - `script.runtime.command` matches the intended node type (check `references/nodetypes/{category}/{TYPE}.md`)
  - `datasource` — Required if the node type needs a data source (see the node type doc's `datasourceType` field). Check that `name` matches an existing data source (`ListDataSources`) or compute resource (`ListComputeResources`), and `type` matches the expected engine type (e.g., `odps`, `hologres`, `emr`, `starrocks`). If unsure, omit and let the server use project defaults
  - `runtimeResource.resourceGroup` — Check that the value matches an existing resource group (`ListResourceGroups`). If unsure, omit and let the server use project defaults
  - `trigger` — For workflow nodes: omit to inherit the workflow schedule; only set when the user explicitly specifies a per-node schedule. For standalone nodes: set if the user specified a schedule
  - `outputs.nodeOutputs` — Required for workflow nodes. Format: `{"data":"${projectIdentifier}.NodeName","artifactType":"NodeOutput"}`. Verify the output name is globally unique in the project (`ListNodes --Name`)
  - `dependencies` — `nodeId` must be the current node's own name (self-reference). `depends[].output` must exactly match the upstream node's `outputs.nodeOutputs[].data`. Every workflow node MUST have dependencies: root nodes (no upstream) MUST depend on `${projectIdentifier}_root` (underscore, not dot); downstream nodes depend on upstream outputs. A workflow node with NO dependencies entry will become an orphan
  - No invented fields — Compare against the FlowSpec Anti-Patterns table above; remove any field not documented in `references/flowspec-guide.md`

- Call the API to submit (refer to references/api/CreateNode.md):

  ```bash
  # DataWorks 2024-05-18 API does not yet have plugin mode (kebab-case), use RPC direct invocation format (PascalCase)
  aliyun dataworks-public CreateNode \
    --ProjectId $PROJECT_ID \
    --Scene DATAWORKS_PROJECT \
    --Spec "$(cat /tmp/spec.json)" \
    --user-agent AlibabaCloud-Agent-Skills
  ```

  Note: `aliyun dataworks-public CreateNode` is in RPC direct invocation format and does not require any plugin installation. If the command is not found, check the aliyun CLI version (requires >= 3.3.1). Never downgrade to legacy kebab-case commands (`create-file`/`create-folder`).
  Sandbox fallback: If `$(cat ...)` is blocked, use Python `subprocess.run(['aliyun', 'dataworks-public', 'CreateNode', '--ProjectId', str(PID), '--Scene', 'DATAWORKS_PROJECT', '--Spec', spec_str, '--user-agent', 'AlibabaCloud-Agent-Skills'])`

- To place within a workflow, add `--ContainerId $WorkflowId`

Git Mode (when the user explicitly requests): `git add ./my_node && git commit`, and DataWorks automatically syncs and replaces placeholders

Minimum required fields (verified in practice, universal across all 130+ types):
- `name` — Node name
- `id` — Must be set equal to `name`. Ensures `spec.dependencies[*].nodeId` can match. Without explicit `id`, the API may silently drop dependencies
- `script.path` — Script path, must end with the node name; the server automatically prepends the workflow prefix
- `script.runtime.command` — Node type (e.g., ODPS_SQL, DIDE_SHELL)

Copyable minimal node Spec (Shell node example):

```json
{"version":"2.0.0","kind":"Node","spec":{"nodes":[{
"name":"my_shell_node","id":"my_shell_node",
"script":{"path":"my_shell_node","runtime":{"command":"DIDE_SHELL"},"content":"#!/bin/bash\necho hello"}
}]}}
```

Other fields are not required; the server will automatically fill in project defaults:
- datasource, runtimeResource — If unsure, do not pass them; the server automatically binds project defaults
- trigger — If not passed, inherits the workflow schedule. Only pass when specified by the user
- dependencies, rerunTimes, etc. — Only pass when specified by the user
- outputs.nodeOutputs — Optional for standalone nodes; required for nodes within a workflow (`{"data":"${projectIdentifier}.NodeName","artifactType":"NodeOutput"}`), otherwise downstream dependencies silently fail. ⚠️ The output name (`${projectIdentifier}.NodeName`) must be globally unique within the project — if another node (even in a different workflow) already uses the same output name, deployment will fail with "can not exported multiple nodes into the same output". Always check with `ListNodes` before creating
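The minimum-field rules can be verified mechanically before submission. A sketch in the spirit of `scripts/validate.py` (a hypothetical helper, not the actual script):

```python
def check_minimal_node_fields(node: dict) -> list[str]:
    """Check the four minimum required fields of a FlowSpec node entry."""
    errors = []
    name = node.get("name")
    if not name:
        errors.append("missing name")
    # id must equal name so spec.dependencies[*].nodeId can match
    if node.get("id") != name:
        errors.append("id must equal name, or dependencies may be dropped")
    path = node.get("script", {}).get("path", "")
    if name and not path.endswith(name):
        errors.append("script.path must end with the node name")
    if not node.get("script", {}).get("runtime", {}).get("command"):
        errors.append("missing script.runtime.command")
    return errors

node = {"name": "my_shell_node", "id": "my_shell_node",
        "script": {"path": "my_shell_node",
                   "runtime": {"command": "DIDE_SHELL"}}}
assert check_minimal_node_fields(node) == []
```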
默认使用OpenAPI模式(除非用户明确要求「提交到Git」):
-
使用将三个文件合并为API输入:
build.pybashpython $SKILL/scripts/build.py ./my_node > /tmp/spec.jsonbuild.py会完成三件事(无第三方依赖;如果报错,可参考源码手动执行):- 读取→ 替换spec.json中的
dataworks.properties和${spec.xxx}占位符${projectIdentifier} - 读取代码文件 → 嵌入到中
script.content - 输出合并后的完整JSON
- 读取
-
提交前校验spec:bash
python $SKILL/scripts/validate.py ./my_node -
提交前spec审查(必填) — 调用CreateNode前,对照以下检查清单审核合并后的JSON:
- 与预期节点类型匹配(参考
script.runtime.command)references/nodetypes/{category}/{TYPE}.md - — 如果节点类型需要数据源(参考节点类型文档的
datasource字段)则必填。检查datasourceType与现有数据源(name)或计算资源(ListDataSources)匹配,ListComputeResources与预期引擎类型匹配(例如type、odps、hologres、emr)。如果不确定,省略该字段,让服务端使用项目默认值starrocks - — 检查值与现有资源组(
runtimeResource.resourceGroup)匹配。如果不确定,省略该字段,让服务端使用项目默认值ListResourceGroups - — 工作流内节点:省略以继承工作流调度规则,仅当用户明确指定节点级调度时才设置。独立节点:如果用户指定了调度规则则设置
trigger - — 工作流内节点必填。格式:
outputs.nodeOutputs。验证输出名称在项目内全局唯一({"data":"${projectIdentifier}.NodeName","artifactType":"NodeOutput"})ListNodes --Name - —
dependencies必须是当前节点自身的名称(自引用)。nodeId必须完全匹配上游节点的depends[].output。每个工作流节点必须配置依赖:根节点(无上游)必须依赖outputs.nodeOutputs[].data(下划线,不是点);下游节点依赖上游输出。没有依赖配置的工作流节点会成为孤立节点${projectIdentifier}_root - 无虚构字段 — 对照上方的FlowSpec反模式表,删除所有未在中记录的字段
references/flowspec-guide.md
-
-
调用API提交(参考references/api/CreateNode.md):bash
# DataWorks 2024-05-18 API 暂不支持插件模式(短横线命名),使用PascalCase RPC直接调用格式 aliyun dataworks-public CreateNode \ --ProjectId $PROJECT_ID \ --Scene DATAWORKS_PROJECT \ --Spec "$(cat /tmp/spec.json)" \ --user-agent AlibabaCloud-Agent-Skills注意:为RPC直接调用格式,无需安装任何插件。如果提示命令不存在,检查阿里云CLI版本(要求 >= 3.3.1)。严禁回退使用旧版短横线命名命令(aliyun dataworks-public CreateNode/create-file)。create-folder沙箱兼容方案:如果被禁用,使用Python$(cat ...)。subprocess.run(['aliyun', 'dataworks-public', 'CreateNode', '--ProjectId', str(PID), '--Scene', 'DATAWORKS_PROJECT', '--Spec', spec_str, '--user-agent', 'AlibabaCloud-Agent-Skills']) -
如果要放到工作流中,添加参数
--ContainerId $WorkflowId
Git模式(用户明确要求时使用):执行,DataWorks会自动同步并替换占位符。
git add ./my_node && git commit必填最小字段(经实际验证,适用于所有130+种节点类型):
- — 节点名称
name - — 必须与
id值相同,确保name可以匹配。如果未显式指定spec.dependencies[*].nodeId,API可能会静默丢弃依赖配置id - — 脚本路径,必须以节点名称结尾;服务端会自动添加工作流前缀
script.path - — 节点类型(例如ODPS_SQL、DIDE_SHELL)
script.runtime.command
可复制的最小节点Spec(Shell节点示例):
```json
{"version":"2.0.0","kind":"Node","spec":{"nodes":[{
  "name":"my_shell_node","id":"my_shell_node",
  "script":{"path":"my_shell_node","runtime":{"command":"DIDE_SHELL"},"content":"#!/bin/bash\necho hello"}
}]}}
```

其他字段非必填,服务端会自动填充项目默认值:
- datasource、runtimeResource — 如果不确定,不要传;服务端会自动绑定项目默认值
- trigger — 如果不传,继承工作流调度规则,仅当用户指定时才传
- dependencies、rerunTimes等 — 仅当用户指定时才传
- outputs.nodeOutputs — 独立节点可选;工作流内节点必填({"data":"${projectIdentifier}.NodeName","artifactType":"NodeOutput"}),否则下游依赖会静默失败。⚠️ 输出名称(${projectIdentifier}.NodeName)必须在项目内全局唯一——如果其他节点(即使在不同工作流中)已使用相同的输出名称,部署会失败并提示「can not exported multiple nodes into the same output」。创建前务必使用ListNodes检查
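按上文「必填最小字段」规则,也可以用Python以编程方式构造最小节点Spec再交给CLI。以下为示意代码(my_shell_node等名称为示例值,传入project_identifier时才附带outputs):

```python
import json

def minimal_node_spec(name, command, content, project_identifier=None):
    """按上文必填最小字段构造节点Spec(示意)。"""
    node = {
        "name": name,
        "id": name,  # id 必须与 name 相同
        "script": {"path": name, "runtime": {"command": command}, "content": content},
    }
    if project_identifier:
        # 工作流内节点必填的输出声明;名称必须在项目内全局唯一
        node["outputs"] = {"nodeOutputs": [
            {"data": f"{project_identifier}.{name}", "artifactType": "NodeOutput"}
        ]}
    return {"version": "2.0.0", "kind": "Node", "spec": {"nodes": [node]}}

# 序列化后可作为 --Spec 参数值传给 CLI
spec_str = json.dumps(
    minimal_node_spec("my_shell_node", "DIDE_SHELL", "#!/bin/bash\necho hello"),
    ensure_ascii=False)
```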
Workflow and Node Relationship
工作流与节点关系
```
Project
└── Workflow            ← Container, unified scheduling management
    ├── Node A          ← Minimum execution unit
    ├── Node B (depends A)
    └── Node C (depends B)
```

- A workflow is the container and scheduling unit for nodes, with its own trigger and strategy
- Nodes can exist independently at the root level or belong to a workflow (user decides)
- The workflow's script.runtime.command is always "WORKFLOW"
- Dependency configuration for nodes within a workflow: only maintain dependencies in the spec.dependencies array (do NOT dual-write inputs.nodeOutputs). ⚠️ spec.dependencies[*].nodeId is a self-reference — it must match the current node's own name (the node that HAS the dependency), NOT the upstream node's name or ID. depends[].output is the upstream node's output identifier (${projectIdentifier}.UpstreamNodeName). Upstream nodes must declare outputs.nodeOutputs
```
Project
└── Workflow            ← 容器,统一调度管理
    ├── Node A          ← 最小执行单元
    ├── Node B (依赖A)
    └── Node C (依赖B)
```

- 工作流是节点的容器和调度单元,有自身的触发规则和策略
- 节点可以独立存在于根层级,也可以属于某个工作流(由用户决定)
- 工作流的script.runtime.command固定为"WORKFLOW"
- 工作流内节点的依赖配置:仅在spec.dependencies数组中维护依赖(不要同时写inputs.nodeOutputs)。⚠️ spec.dependencies[*].nodeId是自引用——必须匹配当前节点自身的name(即配置依赖的节点本身),而不是上游节点的名称或ID。depends[].output是上游节点的输出标识(${projectIdentifier}.UpstreamNodeName)。上游节点必须声明outputs.nodeOutputs
Creating Workflows
创建工作流
- Create the workflow definition (minimal spec): Call CreateWorkflowDefinition

```json
{"version":"2.0.0","kind":"CycleWorkflow","spec":{"workflows":[{
  "name":"workflow_name","script":{"path":"workflow_name","runtime":{"command":"WORKFLOW"}}
}]}}
```

→ returns WorkflowId
- Create nodes in dependency order (each node passes ContainerId=WorkflowId)
  - Before each node: Check that ${projectIdentifier}.NodeName is not already used as an output by any existing node in the project (use ListNodes with --Name and inspect Outputs.NodeOutputs[].Data). Duplicate output names cause deployment failure
  - Each node's spec must include outputs.nodeOutputs: {"data":"${projectIdentifier}.NodeName","artifactType":"NodeOutput"}
  - Downstream nodes declare dependencies in spec.dependencies: nodeId = current node's own name (self-reference), depends[].output = upstream node's output (see workflow-guide.md)
- Verify dependencies (MANDATORY after all nodes created) — For each downstream node, call ListNodeDependencies --Id <NodeID>. If TotalCount is 0 but the node should have upstream dependencies, the CreateNode API silently dropped them. Fix immediately with UpdateNode using spec.dependencies (see "Updating dependencies" below). Do NOT proceed to deploy until all dependencies are confirmed
- Set the schedule — UpdateWorkflowDefinition with trigger (if the user specified a schedule)
- Deploy online (REQUIRED) — CreatePipelineRun (Type=Online, ObjectIds=[WorkflowId]) → poll GetPipelineRun → advance stages with ExecPipelineRunStage. A workflow is NOT active until deployed. Do not skip this step or tell the user to do it manually.

Detailed guide and copyable complete node Spec examples (including outputs and dependencies): references/workflow-guide.md
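The dependency-wiring rules above (self-referencing nodeId, upstream output strings, and the ${projectIdentifier}_root convention for root nodes noted in this document's pre-flight checklist) can be sketched in Python for a simple linear chain. This is an illustrative helper, not part of the skill's API; verify the root-output convention against workflow-guide.md:

```python
def chain_dependencies(node_names, project_identifier):
    """Generate spec.dependencies entries for nodes ordered as a linear chain.

    The first node depends on the project root output (underscore, not dot);
    each later node depends on its predecessor's declared nodeOutput.
    """
    deps = []
    for i, name in enumerate(node_names):
        upstream = (f"{project_identifier}_root" if i == 0
                    else f"{project_identifier}.{node_names[i - 1]}")
        deps.append({
            "nodeId": name,  # self-reference: the current node's own name
            "depends": [{"type": "Normal", "output": upstream}],
        })
    return deps
```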
- 创建工作流定义(最小Spec):调用CreateWorkflowDefinition

```json
{"version":"2.0.0","kind":"CycleWorkflow","spec":{"workflows":[{
  "name":"workflow_name","script":{"path":"workflow_name","runtime":{"command":"WORKFLOW"}}
}]}}
```

→ 返回WorkflowId
- 按依赖顺序创建节点(每个节点传入ContainerId=WorkflowId)
  - 创建每个节点前:检查${projectIdentifier}.NodeName是否已被项目内其他节点用作输出(使用ListNodes加--Name参数,检查Outputs.NodeOutputs[].Data)。重复的输出名称会导致部署失败
  - 每个节点的spec必须包含outputs.nodeOutputs:{"data":"${projectIdentifier}.NodeName","artifactType":"NodeOutput"}
  - 下游节点在spec.dependencies中声明依赖:nodeId = 当前节点自身的名称(自引用),depends[].output = 上游节点的输出(参考workflow-guide.md)
- 验证依赖(所有节点创建后必做) — 对每个下游节点,调用ListNodeDependencies --Id <NodeID>。如果TotalCount为0但节点应该有上游依赖,说明CreateNode API静默丢弃了依赖配置。立即使用UpdateNode配合spec.dependencies修复(参考下方「更新依赖」部分)。确认所有依赖正常前,不要继续部署
- 设置调度规则 — 如果用户指定了调度规则,调用UpdateWorkflowDefinition添加trigger配置
- 上线部署(必做) — 调用CreatePipelineRun(Type=Online, ObjectIds=[WorkflowId])→ 轮询GetPipelineRun → 使用ExecPipelineRunStage推进执行阶段。只有部署后工作流才会生效,不要跳过此步骤或让用户手动执行。

详细指南和可复制的完整节点Spec示例(包含输出和依赖):references/workflow-guide.md
Updating Existing Nodes
更新现有节点
Must use incremental updates — only pass the node id + fields to modify:
```json
{"version":"2.0.0","kind":"Node","spec":{"nodes":[{
  "id":"NodeID",
  "script":{"content":"new code"}
}]}}
```

⚠️ Critical: UpdateNode always uses "kind":"Node", even if the node belongs to a workflow. Do NOT use "kind":"CycleWorkflow" — that is only for workflow-level operations (UpdateWorkflowDefinition).

Do not pass unchanged fields like datasource or runtimeResource (the server may have corrected values; passing them back can cause errors).

⚠️ Updating dependencies: To fix or change a node's dependencies via UpdateNode, use spec.dependencies — NEVER use inputs.nodeOutputs. Example:

```json
{"version":"2.0.0","kind":"Node","spec":{"nodes":[{"id":"NodeID"}],"dependencies":[{"nodeId":"current_node_name","depends":[{"type":"Normal","output":"project.upstream_node"}]}]}}
```
必须使用增量更新 — 仅传入节点id + 需要修改的字段:
```json
{"version":"2.0.0","kind":"Node","spec":{"nodes":[{
  "id":"NodeID",
  "script":{"content":"new code"}
}]}}
```

⚠️ 关键提示:UpdateNode始终使用"kind":"Node",即使节点属于某个工作流。不要使用"kind":"CycleWorkflow"——该值仅用于工作流级操作(UpdateWorkflowDefinition)。

不要传入未修改的字段,例如datasource或runtimeResource(服务端可能已有修正后的值,回传会导致错误)。

⚠️ 更新依赖:要通过UpdateNode修复或修改节点的依赖,请使用spec.dependencies——严禁使用inputs.nodeOutputs。示例:

```json
{"version":"2.0.0","kind":"Node","spec":{"nodes":[{"id":"NodeID"}],"dependencies":[{"nodeId":"current_node_name","depends":[{"type":"Normal","output":"project.upstream_node"}]}]}}
```
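上述增量依赖更新的Spec也可以用Python构造。以下为示意代码(node_id、节点名称、上游输出均为假设的示例值):

```python
import json

def dependency_update_spec(node_id, node_name, upstream_outputs):
    """按上文格式构造UpdateNode的增量依赖Spec(示意)。"""
    return json.dumps({
        "version": "2.0.0",
        "kind": "Node",  # UpdateNode 始终使用 kind:Node
        "spec": {
            "nodes": [{"id": node_id}],  # 增量更新:仅传入 id
            "dependencies": [{
                "nodeId": node_name,  # 自引用:当前节点自身的名称
                "depends": [{"type": "Normal", "output": o} for o in upstream_outputs],
            }],
        },
    }, ensure_ascii=False)
```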
Update + Republish Workflow
更新 + 重新发布工作流
Complete end-to-end flow for modifying an existing node and deploying the change:
- Find the node — ListNodes(Name=xxx) → get Node ID
- Update the node — UpdateNode with incremental spec (kind:Node, only id + changed fields)
- Publish — CreatePipelineRun(type=Online, object_ids=[NodeID]) → poll GetPipelineRun → advance stages with ExecPipelineRunStage

修改现有节点并部署变更的完整端到端流程:
- 查找节点 — 执行ListNodes(Name=xxx)→ 获取节点ID
- 更新节点 — 使用增量Spec调用UpdateNode(kind:Node,仅包含id + 变更字段)
- 发布 — 调用CreatePipelineRun(type=Online, object_ids=[NodeID])→ 轮询GetPipelineRun → 使用ExecPipelineRunStage推进执行阶段
```bash
# Step 1: Find the node
aliyun dataworks-public ListNodes --ProjectId $PID --Name "my_node" --user-agent AlibabaCloud-Agent-Skills
# → Note the node Id from the response

# Step 2: Update (incremental — only id + changed fields)
aliyun dataworks-public UpdateNode --ProjectId $PID --Id $NODE_ID \
  --Spec '{"version":"2.0.0","kind":"Node","spec":{"nodes":[{"id":"'$NODE_ID'","script":{"content":"SELECT 1;"}}]}}' \
  --user-agent AlibabaCloud-Agent-Skills

# Step 3: Publish (see "Publishing and Deploying" below)
aliyun dataworks-public CreatePipelineRun --ProjectId $PID \
  --PipelineRunParam '{"type":"Online","objectIds":["'$NODE_ID'"]}' \
  --user-agent AlibabaCloud-Agent-Skills
```

> **Common wrong paths after UpdateNode** (all prohibited):
> - ❌ `DeployFile` / `SubmitFile` — legacy APIs, will fail or behave unexpectedly
> - ❌ `ImportWorkflowDefinition` — for initial bulk import only, not for updating or publishing
> - ❌ `ListFiles` / `GetFile` — legacy file model, use `ListNodes` / `GetNode` instead
> - ✅ `CreatePipelineRun` → `GetPipelineRun` → `ExecPipelineRunStage`

> **UpdateNode后的常见错误路径(全部禁止)**:
> - ❌ `DeployFile` / `SubmitFile` — 旧版API,会失败或出现非预期行为
> - ❌ `ImportWorkflowDefinition` — 仅用于初始批量导入,不适用更新或发布
> - ❌ `ListFiles` / `GetFile` — 旧版文件模型,使用`ListNodes` / `GetNode`替代
> - ✅ 正确流程:`CreatePipelineRun` → `GetPipelineRun` → `ExecPipelineRunStage`

Publishing and Deploying
发布与部署
⚠️ NEVER use DeployFile, SubmitFile, ListDeploymentPackages, GetDeploymentPackage, ListFiles, or GetFile for deployment. These are all legacy APIs. Use ONLY: CreatePipelineRun → GetPipelineRun → ExecPipelineRunStage.
Publishing is an asynchronous multi-stage pipeline:
- CreatePipelineRun (Type=Online, ObjectIds=[ID]) → get PipelineRunId
- Poll GetPipelineRun → check Pipeline.Status and Pipeline.Stages
- When a Stage has status Init and all preceding Stages are Success → call ExecPipelineRunStage (Code=Stage.Code) to advance
- Until the Pipeline overall status becomes Success/Fail
Key point: The Build stage runs automatically, but the Check and Deploy stages must be manually advanced. Detailed CLI examples and polling scripts are in references/deploy-guide.md.
CLI Note: The aliyun CLI returns JSON with the top-level key Pipeline (not SDK's resp.body.pipeline); Stages are in Pipeline.Stages
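The stage-advance rule above (advance a Stage only when it is Init and every preceding Stage is Success) can be sketched as a small decision helper. The stage dicts below assume the Pipeline.Stages shape described in this section; this is an illustrative sketch, not part of the skill:

```python
def next_stage_to_exec(stages):
    """Return the Code of the first Init stage whose predecessors are all
    Success, or None if no stage is ready to advance yet."""
    for i, stage in enumerate(stages):
        if stage["Status"] == "Init":
            if all(s["Status"] == "Success" for s in stages[:i]):
                return stage["Code"]
            return None  # a predecessor is still running or failed
    return None  # nothing left in Init
```

A polling loop would call GetPipelineRun, feed the returned stages to this helper, and call ExecPipelineRunStage with the returned Code until the overall status is Success or Fail.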
⚠️ 严禁使用DeployFile、SubmitFile、ListDeploymentPackages、GetDeploymentPackage、ListFiles或GetFile进行部署,这些都是旧版API。仅允许使用:CreatePipelineRun → GetPipelineRun → ExecPipelineRunStage。
发布是异步多阶段流程:
- 调用CreatePipelineRun(Type=Online, ObjectIds=[ID])→ 获取PipelineRunId
- 轮询GetPipelineRun → 检查Pipeline.Status和Pipeline.Stages
- 当某个Stage状态为Init且所有前置Stage都为Success时 → 调用ExecPipelineRunStage(Code=Stage.Code)推进执行
- 直到Pipeline整体状态变为Success/Fail
关键说明:Build阶段会自动运行,但Check和Deploy阶段需要手动推进。完整的CLI示例和轮询脚本参考references/deploy-guide.md。
CLI说明:aliyun CLI返回的JSON顶层key为Pipeline(而非SDK的resp.body.pipeline);阶段信息在Pipeline.Stages中。
Common Node Types
常见节点类型
| Use Case | command | contentFormat | Extension | datasource |
|---|---|---|---|---|
| Shell script | DIDE_SHELL | shell | .sh | — |
| MaxCompute SQL | ODPS_SQL | sql | .sql | odps |
| Python script | PYTHON | python | .py | — |
| Offline data sync | DI | json | .json | — |
| Hologres SQL | HOLOGRES_SQL | sql | .sql | hologres |
| Flink streaming SQL | FLINK_SQL_STREAM | sql | .json | flink |
| Flink batch SQL | FLINK_SQL_BATCH | sql | .json | flink |
| EMR Hive | EMR_HIVE | sql | .sql | emr |
| EMR Spark SQL | EMR_SPARK_SQL | sql | .sql | emr |
| Serverless Spark SQL | SERVERLESS_SPARK_SQL | sql | .sql | emr |
| StarRocks SQL | StarRocks | sql | .sql | starrocks |
| ClickHouse SQL | CLICK_SQL | sql | .sql | clickhouse |
| Virtual node | VIRTUAL | empty | .vi | — |
Complete list (130+ types): references/nodetypes/index.md (searchable by command name, description, and category, with links to detailed documentation for each type)
When you cannot find a node type:
- Check references/nodetypes/index.md and match by keyword
- Glob("**/{keyword}*.md", path="references/nodetypes") to locate the documentation directly
- Use the GetNode API to get the spec of a similar node from the live environment as a reference
- If none of the above works → fall back to DIDE_SHELL and use command-line tools within the Shell to accomplish the task
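As an illustration of the table above, a small lookup helper that maps a use-case keyword to a script.runtime.command with the DIDE_SHELL fallback described in the list. The keyword keys here are invented for this sketch; consult references/nodetypes/index.md for the real searchable index:

```python
# Subset of the node-type table above; keys are hypothetical keywords.
NODE_COMMANDS = {
    "shell": "DIDE_SHELL",
    "maxcompute_sql": "ODPS_SQL",
    "python": "PYTHON",
    "hologres_sql": "HOLOGRES_SQL",
    "emr_hive": "EMR_HIVE",
    "virtual": "VIRTUAL",
}

def resolve_command(use_case):
    """Map a use-case keyword to script.runtime.command, falling back to
    DIDE_SHELL when no dedicated node type is known."""
    return NODE_COMMANDS.get(use_case.lower(), "DIDE_SHELL")
```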
| 适用场景 | command | contentFormat | 后缀 | datasource |
|---|---|---|---|---|
| Shell脚本 | DIDE_SHELL | shell | .sh | — |
| MaxCompute SQL | ODPS_SQL | sql | .sql | odps |
| Python脚本 | PYTHON | python | .py | — |
| 离线数据同步 | DI | json | .json | — |
| Hologres SQL | HOLOGRES_SQL | sql | .sql | hologres |
| Flink流处理SQL | FLINK_SQL_STREAM | sql | .json | flink |
| Flink批处理SQL | FLINK_SQL_BATCH | sql | .json | flink |
| EMR Hive | EMR_HIVE | sql | .sql | emr |
| EMR Spark SQL | EMR_SPARK_SQL | sql | .sql | emr |
| Serverless Spark SQL | SERVERLESS_SPARK_SQL | sql | .sql | emr |
| StarRocks SQL | StarRocks | sql | .sql | starrocks |
| ClickHouse SQL | CLICK_SQL | sql | .sql | clickhouse |
| 虚拟节点 | VIRTUAL | empty | .vi | — |
完整列表(130+种类型):references/nodetypes/index.md(支持按命令名称、描述、分类搜索,包含每种类型的详细文档链接)
找不到对应节点类型时:
- 检查references/nodetypes/index.md,按关键词匹配
- 使用Glob("**/{keyword}*.md", path="references/nodetypes")直接定位文档
- 使用GetNode API从现有环境获取相似节点的Spec作为参考
- 以上方式都无效时 → 回退到DIDE_SHELL,在Shell中使用命令行工具完成任务
Key Constraints
关键约束
- script.path is required: Script path, must end with the node name. When creating, you can pass just the node name; the server automatically prepends the workflow prefix
- Dependencies are configured via spec.dependencies (do NOT dual-write inputs.nodeOutputs): In spec.dependencies, nodeId is a self-reference — it must be the current node's own name (the node being created), NOT the upstream node. depends[].output is the upstream node's output (${projectIdentifier}.UpstreamNodeName). The upstream's outputs.nodeOutputs[].data and downstream's depends[].output must be character-for-character identical. Upstream nodes must declare outputs.nodeOutputs. ⚠️ Output names (${projectIdentifier}.NodeName) must be globally unique within the project — duplicates cause deployment failure
- Immutable properties: A node's command (node type) cannot be changed after creation; if incorrect, inform the user and suggest creating a new node with the correct type
- Updates must be incremental: Only pass id + fields to modify; do not pass unchanged fields like datasource/runtimeResource
- datasource.type may be corrected by the server: e.g., flink → flink_serverless; use the generic type when creating
- Nodes can exist independently: Nodes can be created at the root level (without passing ContainerId) or belong to a workflow (pass ContainerId=WorkflowId). Whether to place in a workflow is the user's decision
- Workflow command is always WORKFLOW: script.runtime.command must be "WORKFLOW"
- Deletion is not supported by this skill: This skill does not provide any delete operations. When creation or publishing fails, never attempt to "fix" the problem by deleting existing objects. Correct approach: diagnose the failure cause → inform the user of the specific conflict → let the user decide how to handle it (rename / update existing)
- Name conflict check is required before creation: Before calling any Create API, use the corresponding List API to confirm the name is not duplicated (see "Environment Discovery"). Name conflicts will cause creation failure; duplicate node output names (outputs.nodeOutputs[].data) will cause dependency errors or publishing failure
- Mutating operations require user confirmation: Except for Create and read-only queries (Get/List), all OpenAPI operations that modify existing objects (Update, Move, Rename, etc.) must be shown to the user with explicit confirmation obtained before execution. Confirmation information should include: operation type, target object name/ID, and key changes. These APIs must not be called before user confirmation. Delete and Abolish operations are not supported by this skill
- Use only 2024-05-18 version APIs: All APIs in this skill are DataWorks 2024-05-18 version. Legacy APIs (create-file, create-folder, CreateFlowProject, etc.) are prohibited. If an API call returns an error, first check troubleshooting.md; do not fall back to legacy APIs
- Stop on errors instead of brute-force retrying: If the same error code appears more than 2 consecutive times, the approach is wrong. Stop and analyze the error cause (check troubleshooting.md) instead of repeatedly retrying the same incorrect API with different parameters. Never fall back to legacy APIs (create-file, create-business, etc.) when a new API fails — review the FlowSpec Anti-Patterns table at the top of this document instead. Specific trap: If aliyun help output mentions "Plugin available but not installed" for dataworks-public, do NOT install the plugin — this leads to using deprecated kebab-case APIs. Instead, use PascalCase RPC directly (e.g., aliyun dataworks-public CreateNode)
- CLI parameter names must be checked in documentation, guessing is prohibited: Before calling an API, you must first check references/api/{APIName}.md to confirm parameter names. Common mistakes: GetProject's ID parameter is --Id (not --ProjectId); UpdateNode requires --Id. When unsure, verify with aliyun dataworks-public {APIName} --help
- PascalCase RPC only, no kebab-case: CLI commands must use aliyun dataworks-public CreateNode (PascalCase), never aliyun dataworks-public create-node (kebab-case). No plugin installation is needed. If the command is not found, upgrade the aliyun CLI to >= 3.3.1
- No wrapper scripts: Run each aliyun CLI command directly in the shell. Never create .sh/.py wrapper scripts to batch multiple API calls — this obscures errors and makes debugging impossible. Execute one API call at a time, check the response, then proceed
- API response = success, not file output: Writing JSON spec files to disk is a preparation step, not completion. The task is complete only when the aliyun CLI returns a success response with a valid Id. If the API call fails, fix the spec and retry — do not declare the task done by saving local files
- On error: re-read the Quick Start, do not invent new approaches: When an API call fails, compare your spec against the exact Quick Start example at the top of this document field by field. The most common cause is an invented FlowSpec field that does not exist. Copy the working example and modify only the values you need to change
- Idempotency protection for write operations: DataWorks 2024-05-18 Create APIs (CreateNode, CreateWorkflowDefinition, CreatePipelineRun, etc.) do not support a ClientToken parameter. To prevent duplicate resource creation on network retries or timeouts:
  - Before creating: Always run the pre-creation conflict check (List API) as described in "Environment Discovery" — this is the primary idempotency gate
  - After a network error or timeout on Create: Do NOT blindly retry. First call the corresponding List/Get API to check whether the resource was actually created (the server may have processed the request despite the client-side error). Only retry if the resource does not exist
  - Record RequestId: Every API response includes a RequestId field. Log it so that duplicate-creation incidents can be traced and resolved via Alibaba Cloud support
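The idempotency rules above (List-before-Create, re-check after a timeout instead of blind retry) can be sketched in Python. Here list_fn and create_fn are hypothetical wrappers around the corresponding List/Create CLI calls, not real skill APIs:

```python
def safe_create(list_fn, create_fn, name):
    """Idempotency sketch per the constraints above.

    list_fn(name)   -> list of existing resources with that name (List API wrapper)
    create_fn(name) -> created resource (Create API wrapper; may raise TimeoutError)
    """
    # Pre-creation conflict check: the primary idempotency gate
    if list_fn(name):
        raise RuntimeError(f"name conflict: {name} already exists")
    try:
        return create_fn(name)
    except TimeoutError:
        # The server may have processed the request despite the client error:
        # re-check before retrying instead of blindly creating again.
        existing = list_fn(name)
        if existing:
            return existing[0]
        return create_fn(name)  # resource absent → safe to retry once
```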
- script.path为必填字段:脚本路径必须以节点名称结尾。创建时可仅传入节点名称,服务端会自动添加工作流前缀
- 依赖通过spec.dependencies配置(不要同时写inputs.nodeOutputs):在spec.dependencies中,nodeId是自引用——必须是当前创建节点自身的name,而非上游节点的名称。depends[].output是上游节点的输出(${projectIdentifier}.UpstreamNodeName)。上游的outputs.nodeOutputs[].data与下游的depends[].output必须完全一致。上游节点必须声明outputs.nodeOutputs。⚠️ 输出名称(${projectIdentifier}.NodeName)必须在项目内全局唯一——重复会导致部署失败
- 不可变属性:节点的command(节点类型)创建后不可修改,如果类型错误,告知用户并建议创建正确类型的新节点
- 更新必须为增量更新:仅传入id + 需要修改的字段;不要传入未修改的字段,例如datasource/runtimeResource
- datasource.type可能被服务端修正:例如flink → flink_serverless;创建时使用通用类型即可
- 节点可独立存在:节点可以创建在根层级(不传入ContainerId),也可以属于某个工作流(传入ContainerId=WorkflowId)。是否放入工作流由用户决定
- 工作流的command固定为WORKFLOW:script.runtime.command必须为"WORKFLOW"
- 本Skill不支持删除操作:本Skill不提供任何删除能力。当创建或发布失败时,严禁通过删除现有对象「修复」问题。正确做法:诊断失败原因 → 告知用户具体冲突 → 让用户决定处理方式(重命名/更新现有对象)
- 创建前必须检查名称冲突:调用任何Create API前,使用对应的List API确认名称未重复(参考「环境探查」部分)。名称冲突会导致创建失败;重复的节点输出名称(outputs.nodeOutputs[].data)会导致依赖错误或发布失败
- 变更操作需要用户确认:除了创建和只读查询(Get/List)外,所有修改现有对象的OpenAPI操作(Update、Move、Rename等)必须向用户展示操作信息并获得明确确认后再执行。确认信息需包含:操作类型、目标对象名称/ID、核心变更内容。未获得用户确认前不得调用这些API。本Skill不支持删除和废弃操作
- 仅使用2024-05-18版本API:本Skill中的所有API都是2024-05-18版本的DataWorks API,严禁使用旧版API(create-file、create-folder、CreateFlowProject等)。如果API调用返回错误,首先查看troubleshooting.md;不要回退使用旧版API
- 遇到错误停止操作,不要暴力重试:如果同一错误码连续出现2次以上,说明操作方式错误。停止操作分析错误原因(查看troubleshooting.md),不要使用不同参数反复重试同一个错误的API。新API调用失败时严禁回退使用旧版API(create-file、create-business等),请重新查看本文档顶部的FlowSpec反模式表。特殊陷阱:如果aliyun help输出提示dataworks-public「有可用插件但未安装」,不要安装插件——这会导致使用已弃用的短横线命名API。请直接使用PascalCase RPC调用(例如aliyun dataworks-public CreateNode)
- CLI参数名称必须查阅文档确认,严禁猜测:调用API前,必须先查看references/api/{APIName}.md确认参数名称。常见错误:GetProject的ID参数是--Id(不是--ProjectId);UpdateNode需要--Id参数。如果不确定,使用aliyun dataworks-public {APIName} --help验证
- 仅使用PascalCase RPC,禁止短横线命名:CLI命令必须使用aliyun dataworks-public CreateNode(PascalCase格式),严禁使用aliyun dataworks-public create-node(短横线格式)。无需安装插件。如果提示命令不存在,将aliyun CLI升级到 >= 3.3.1版本
- 禁止使用封装脚本:直接在Shell中执行每条aliyun CLI命令。严禁创建.sh/.py封装脚本批量调用多个API——这会隐藏错误,导致无法调试。每次执行一个API调用,检查响应后再继续
- API响应成功才算完成,文件输出不算:将JSON Spec文件写入磁盘是准备步骤,不是任务完成。只有当aliyun CLI返回包含有效Id的成功响应时,任务才算完成。如果API调用失败,修复Spec并重试——不要以本地保存了文件为由宣称任务完成
- 遇到错误重新阅读快速入门,不要发明新方法:API调用失败时,逐字段对照本文档顶部的快速入门示例检查你的Spec。最常见的错误原因是使用了不存在的FlowSpec字段。复制可用的示例,仅修改你需要变更的值即可
- 写操作的幂等性保护:2024-05-18版本的DataWorks Create API(CreateNode、CreateWorkflowDefinition、CreatePipelineRun等)不支持ClientToken参数。为防止网络重试或超时导致重复创建资源:
  - 创建前:始终执行「环境探查」部分描述的创建前冲突检查(List API)——这是主要的幂等性防护
  - Create操作出现网络错误或超时后:不要盲目重试。首先调用对应的List/Get API检查资源是否已实际创建(即使客户端报错,服务端可能已经处理了请求)。仅当资源不存在时再重试
  - 记录RequestId:每个API响应都包含RequestId字段,记录该值,以便出现重复创建问题时可以通过阿里云支持排查解决
API Quick Reference
API快速参考
API Version: All APIs listed below are DataWorks 2024-05-18 version. CLI invocation format: aliyun dataworks-public {APIName} --Parameter --user-agent AlibabaCloud-Agent-Skills (PascalCase RPC direct invocation; DataWorks 2024-05-18 does not yet have plugin mode). Only use the APIs listed in the table below; do not search for or use other DataWorks APIs.

Detailed parameters and code templates for each API are in references/api/{APIName}.md. If a call returns an error, you can get the latest definition from https://api.aliyun.com/meta/v1/products/dataworks-public/versions/2024-05-18/apis/{APIName}/api.json

API版本:下方列出的所有API都是2024-05-18版本的DataWorks API。CLI调用格式:aliyun dataworks-public {APIName} --Parameter --user-agent AlibabaCloud-Agent-Skills(PascalCase RPC直接调用;2024-05-18版本的DataWorks暂不支持插件模式)。仅允许使用下表列出的API,不要搜索或使用其他DataWorks API。

每个API的详细参数和代码模板参考references/api/{APIName}.md。如果调用返回错误,可以从https://api.aliyun.com/meta/v1/products/dataworks-public/versions/2024-05-18/apis/{APIName}/api.json获取最新定义。

Components
组件
| API | Description |
|---|---|
| CreateComponent | Create a component |
| GetComponent | Get component details |
| UpdateComponent | Update a component |
| ListComponents | List components |
| API | 描述 |
|---|---|
| CreateComponent | 创建组件 |
| GetComponent | 获取组件详情 |
| UpdateComponent | 更新组件 |
| ListComponents | 列出组件 |
Nodes
节点
| API | Description |
|---|---|
| CreateNode | Create a data development node. project_id + scene + spec, optional container_id |
| UpdateNode | Update node information. Incremental update, only pass id + fields to change |
| MoveNode | Move a node to a specified path |
| RenameNode | Rename a node |
| GetNode | Get node details, returns the complete spec |
| ListNodes | List nodes, supports filtering by workflow |
| ListNodeDependencies | List a node's dependency nodes |
| API | 描述 |
|---|---|
| CreateNode | 创建数据开发节点。参数:project_id + scene + spec,可选container_id |
| UpdateNode | 更新节点信息。增量更新,仅传入id + 需要变更的字段 |
| MoveNode | 将节点移动到指定路径 |
| RenameNode | 重命名节点 |
| GetNode | 获取节点详情,返回完整Spec |
| ListNodes | 列出节点,支持按工作流筛选 |
| ListNodeDependencies | 列出节点的依赖节点 |
Workflow Definitions
工作流定义
| API | Description |
|---|---|
| CreateWorkflowDefinition | Create a workflow. project_id + spec |
| ImportWorkflowDefinition | Import a workflow (initial bulk import ONLY — do NOT use for updates or publishing; use UpdateNode / CreatePipelineRun instead) |
| UpdateWorkflowDefinition | Update workflow information, incremental update |
| MoveWorkflowDefinition | Move a workflow to a target path |
| RenameWorkflowDefinition | Rename a workflow |
| GetWorkflowDefinition | Get workflow details |
| ListWorkflowDefinitions | List workflows, filter by type |
| API | 描述 |
|---|---|
| CreateWorkflowDefinition | 创建工作流。参数:project_id + spec |
| ImportWorkflowDefinition | 导入工作流(仅用于初始批量导入——不要用于更新或发布;更新/发布请使用UpdateNode / CreatePipelineRun)|
| UpdateWorkflowDefinition | 更新工作流信息,增量更新 |
| MoveWorkflowDefinition | 将工作流移动到目标路径 |
| RenameWorkflowDefinition | 重命名工作流 |
| GetWorkflowDefinition | 获取工作流详情 |
| ListWorkflowDefinitions | 列出工作流,支持按类型筛选 |
Resources
资源
| API | Description |
|---|---|
| CreateResource | Create a file resource |
| UpdateResource | Update file resource information, incremental update |
| MoveResource | Move a file resource to a specified directory |
| RenameResource | Rename a file resource |
| GetResource | Get file resource details |
| ListResources | List file resources |
| API | 描述 |
|---|---|
| CreateResource | 创建文件资源 |
| UpdateResource | 更新文件资源信息,增量更新 |
| MoveResource | 将文件资源移动到指定目录 |
| RenameResource | 重命名文件资源 |
| GetResource | 获取文件资源详情 |
| ListResources | 列出文件资源 |
Functions
函数
| API | Description |
|---|---|
| CreateFunction | Create a UDF function |
| UpdateFunction | Update UDF function information, incremental update |
| MoveFunction | Move a function to a target path |
| RenameFunction | Rename a function |
| GetFunction | Get function details |
| ListFunctions | List functions |
| API | 描述 |
|---|---|
| CreateFunction | 创建UDF函数 |
| UpdateFunction | 更新UDF函数信息,增量更新 |
| MoveFunction | 将函数移动到目标路径 |
| RenameFunction | 重命名函数 |
| GetFunction | 获取函数详情 |
| ListFunctions | 列出函数 |
Publishing Pipeline
发布流水线
| API | Description |
|---|---|
| CreatePipelineRun | Create a publishing pipeline. type=Online/Offline |
| ExecPipelineRunStage | Execute a specified stage of the publishing pipeline, async requires polling |
| GetPipelineRun | Get publishing pipeline details, returns Stages status |
| ListPipelineRuns | List publishing pipelines |
| ListPipelineRunItems | Get publishing content |
| API | 描述 |
|---|---|
| CreatePipelineRun | 创建发布流水线。type=Online/Offline |
| ExecPipelineRunStage | 执行发布流水线的指定阶段,异步操作需要轮询 |
| GetPipelineRun | 获取发布流水线详情,返回阶段状态 |
| ListPipelineRuns | 列出发布流水线 |
| ListPipelineRunItems | 获取发布内容 |
Auxiliary Queries
辅助查询
| API | Description |
|---|---|
| GetProject | Get projectIdentifier by id |
| ListDataSources | List data sources |
| ListComputeResources | List compute engine bindings (EMR, Hologres, StarRocks, etc.) — supplements ListDataSources |
| ListResourceGroups | List resource groups |
| API | 描述 |
|---|---|
| GetProject | 通过ID获取projectIdentifier |
| ListDataSources | 列出数据源 |
| ListComputeResources | 列出计算引擎绑定(EMR、Hologres、StarRocks等)——补充ListDataSources的返回结果 |
| ListResourceGroups | 列出资源组 |
Reference Documentation
参考文档
| Scenario | Document |
|---|---|
| Complete list of APIs and CLI commands | references/related-apis.md |
| RAM permission policy configuration | references/ram-policies.md |
| Operation verification methods | references/verification-method.md |
| Acceptance criteria and test cases | references/acceptance-criteria.md |
| CLI installation and configuration guide | references/cli-installation-guide.md |
| Node type index (130+ types) | references/nodetypes/index.md |
| FlowSpec field reference | references/flowspec-guide.md |
| Workflow development | references/workflow-guide.md |
| Scheduling configuration | references/scheduling-guide.md |
| Publishing and unpublishing | references/deploy-guide.md |
| DI data integration | references/di-guide.md |
| Troubleshooting | references/troubleshooting.md |
| Complete examples | assets/templates/README.md |
| 适用场景 | 文档路径 |
|---|---|
| API和CLI命令完整列表 | references/related-apis.md |
| RAM权限策略配置 | references/ram-policies.md |
| 操作验证方法 | references/verification-method.md |
| 验收标准和测试用例 | references/acceptance-criteria.md |
| CLI安装和配置指南 | references/cli-installation-guide.md |
| 节点类型索引(130+种类型) | references/nodetypes/index.md |
| FlowSpec字段参考 | references/flowspec-guide.md |
| 工作流开发 | references/workflow-guide.md |
| 调度配置 | references/scheduling-guide.md |
| 发布与下线 | references/deploy-guide.md |
| DI数据集成 | references/di-guide.md |
| 问题排查 | references/troubleshooting.md |
| 完整示例 | assets/templates/README.md |