physical-ai-video-data-augmentation

Physical AI Video Data Augmentation Workflow Orchestrator

Default workflow skill for VDA execution on OSMO. It owns flow selection, preflight, cache readiness, inference-path decisions, submit-time interpolation, monitoring, and output retrieval. Component skills are consult-only.

Purpose

Run the end-to-end VDA workflow safely and reproducibly from preflight to output download.

Do NOT use this skill for container-internal tuning-only questions.

Prerequisites

Confirm these before running preflight or any submit. Missing required secrets surface as `USER_INPUT_REQUIRED:` from `scripts/preflight_credentials.sh`.

Requirement	How it is satisfied	Used for
NGC API key (optional)	`NGC_API_KEY`, `NGC_CLI_API_KEY`, or compatible `nvapi-*` token in `NVIDIA_API_KEY` / `OPENAI_API_KEY` / `VLM_API_KEY` / `LLM_API_KEY`	Optional for `nvcr_io` credential refresh and NGC REST scope probe; default VDA image refs are validated via workflow registry probes
Hugging Face token	`HF_TOKEN` (or `HUGGING_FACE_HUB_TOKEN`), or a cached token at `~/.cache/huggingface/token`	Creates the OSMO `hf_token` credential; pulls gated Cosmos/SeedVR weights
OSMO CLI access	`osmo` on `PATH`, logged in, with a default profile and a registered DATA credential profile matching `storage_url`	Submitting/monitoring workflows and listing/downloading objects
GPU pool	At least one `ONLINE` pool in `osmo pool list --mode free`; `POD_TEMPLATE` carries GPU toleration/selectors	Scheduling setup + worker tasks

Optional (only for the strict NGC org/team probe): `NGC_ORG` / `NGC_TEAM` (or `NGC_CLI_ORG` / `NGC_CLI_TEAM`). External VLM/LLM endpoint keys are validated separately, not by preflight.

Key handling rule: `nvapi-*` tokens are first-class inputs for `nvcr_io`. Never reject by token prefix alone; use workflow registry probe results as source of truth.
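As a rough illustration of the key lookup described in the requirements table, the sketch below picks the first usable NGC key from an environment mapping. The function name `resolve_ngc_key` and the lookup order are assumptions made for this example; the real logic lives in `scripts/preflight_credentials.sh` and may differ.

```python
import os

# Variable names come from the requirements table; the lookup ORDER is an assumption.
DEDICATED_VARS = ("NGC_API_KEY", "NGC_CLI_API_KEY")
SHARED_VARS = ("NVIDIA_API_KEY", "OPENAI_API_KEY", "VLM_API_KEY", "LLM_API_KEY")


def resolve_ngc_key(env):
    """Return (variable_name, token) for the first usable NGC key, or None."""
    for name in DEDICATED_VARS:
        if env.get(name):
            # Dedicated variables are accepted whatever their prefix.
            return name, env[name]
    for name in SHARED_VARS:
        token = env.get(name, "")
        # Shared endpoint variables only count when they hold an nvapi-* token.
        if token.startswith("nvapi-"):
            return name, token
    return None  # the NGC key is optional, so a miss is not an error


if __name__ == "__main__":
    found = resolve_ngc_key(dict(os.environ))
    # Print only the variable name, never the secret itself.
    print(found[0] if found else "no NGC key found")
```

The prefix is used here only to recognize a candidate in the shared variables. Whether the token actually works is still decided by the registry probe, in line with the key handling rule.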

Instructions

1. Select the workflow (`auto_labeling`, `augmentation_and_al`, `e2e`, or `e2e_super_resolution`) from user intent.
2. Provide a tentative execution-time overview before starting run actions.
3. Run preflight and readiness checks before submit.
4. Derive submit-time values from the active dataset backend (never guess `storage_url`).
5. Submit the workflow with explicit interpolation values and monitor to completion.
6. Retrieve outputs, provide side-by-side comparison evidence for augmented flows, and summarize task outcomes.

Use `run_script(...)` for script execution. Canonical examples:

```python
run_script("bash scripts/preflight_credentials.sh --workflow assets/configs/osmo/augmentation_and_al.yaml")
run_script("python3 scripts/pre_submit_guard.py --workflow assets/configs/osmo/auto_labeling.yaml")
run_script("bash scripts/prepare_demo_assets.sh /srv/sdg/data/vda_inputs")
```

Available Scripts

Use script-level `--help` for exact arguments.

Script	Role
`scripts/preflight_credentials.sh`	Secrets/control-plane preflight and workflow image access checks
`scripts/pre_submit_guard.py`	Submit-time interpolation, cache, and dataset safety checks
`scripts/prepare_demo_assets.sh`	Demo video pull + flatten for default demo path
`scripts/generate_configs.py`	Setup-time config and cookbook projection generation
`scripts/cosmos_worker.sh`	Augmentation worker execution
`scripts/pl_original_worker.sh`	Original-video auto-labeling worker execution
`scripts/pl_augmented_worker.sh`	Augmented-video auto-labeling worker execution
`scripts/osmo_barrier.py`	Multi-node barrier synchronization
`scripts/stage_run_artifacts.sh`	Local mirror of full run output + input video
`scripts/render_side_by_side.sh`	Side-by-side comparison render from local artifacts

Supported Flows

Flow	OSMO YAML	Group sequence	Typical use
`augmentation_and_al`	`assets/configs/osmo/augmentation_and_al.yaml`	setup -> augmentation -> auto_labeling_augmented	Augment one or more videos, then auto-label augmented outputs
`auto_labeling`	`assets/configs/osmo/auto_labeling.yaml`	setup -> auto_labeling	Label original videos only
`e2e`	`assets/configs/osmo/e2e.yaml`	setup -> (auto_labeling_original + augmentation) -> auto_labeling_augmented	Throughput-first path
`e2e_super_resolution`	`assets/configs/osmo/e2e_super_resolution.yaml`	setup -> auto_labeling_original -> augmentation -> auto_labeling_augmented	Sequential path with SR gate before augmentation

The legacy alias `assets/configs/osmo/augmentation_and_pl.yaml` remains for backwards compatibility.

Pick the right workflow for the user's request

User intent	Workflow
"Label my source videos" / "PL-only" / "no augmentation"	`auto_labeling`
"Create augmented videos and label them"	`augmentation_and_al`
"Run the full pipeline quickly"	`e2e`
"Run full pipeline, but gate on SR-enhanced originals first"	`e2e_super_resolution`

Disambiguation: handle vague requests before committing

Default to autonomy: ask only when missing information blocks execution.

Autonomous defaults (do NOT ask)

- If dataset source is absent, run the VDA demo path (`scripts/prepare_demo_assets.sh`) and continue with `dataset=vda-demo`.
- If flow is not explicitly requested, default to `augmentation_and_al`.
- If endpoint mode is unspecified, default to in-cluster persistent NIM reuse and automatic NIM deploy/repair when unhealthy.
- If cache is missing, run `setup_model_cache.yaml`, rerun pre-submit guard, and continue automatically on success.
- After any stage completes successfully, continue to the next stage immediately. Do not pause with "Ready when you are" or equivalent approval prompts.
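The intent table and the defaults above could be sketched as a small selector. The keyword hints, the function names, and the `endpoint_mode` label are hypothetical and only illustrate the "pick from intent, otherwise default" behavior; this is not the skill's actual selection logic.

```python
# Phrase hints loosely based on the intent table; the matching is a hypothetical heuristic.
INTENT_HINTS = [
    ("e2e_super_resolution", ("super resolution", "super-resolution", "super_resolution", "sr-enhanced")),
    ("auto_labeling", ("pl-only", "no augmentation", "label my source")),
    ("e2e", ("full pipeline", "e2e")),
    ("augmentation_and_al", ("augment",)),
]
DEFAULT_FLOW = "augmentation_and_al"


def select_flow(request_text):
    """Return the first flow whose hint appears in the request, else the default."""
    text = request_text.lower()
    for flow, hints in INTENT_HINTS:
        if any(hint in text for hint in hints):
            return flow
    return DEFAULT_FLOW


def apply_defaults(request):
    """Fill unspecified fields with the autonomous defaults; never overwrite user input."""
    resolved = dict(request)
    resolved.setdefault("flow", DEFAULT_FLOW)
    resolved.setdefault("dataset", "vda-demo")
    resolved.setdefault("endpoint_mode", "in_cluster_nim")  # hypothetical label
    return resolved
```

The more specific flows are checked first so that a request mentioning both the full pipeline and super resolution lands on `e2e_super_resolution`.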

Triggers that should pause for disambiguation

Missing input	Why it matters	Ask
`USER_INPUT_REQUIRED` from preflight	Required secret is missing	Ask one concise unblock question for exactly the missing value(s)
Storage backend prefix cannot be derived from the active dataset/upload root	Wrong scheme causes runtime storage auth mismatch	"What is the backend-native root prefix for this run?"
No ONLINE GPU pool/platform can be selected	Workflow cannot schedule setup/workers	"Which GPU pool/platform should this run target?"
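A minimal sketch of the pause triggers in the table above, assuming preflight prints each missing value on a line of the form `USER_INPUT_REQUIRED: <name>` (that line format, and the function name `blocking_question`, are assumptions for illustration):

```python
def blocking_question(preflight_output, storage_prefix, online_platforms):
    """Return the single question to ask, or None when execution can continue."""
    marker = "USER_INPUT_REQUIRED:"
    if marker in preflight_output:
        # Collect whatever follows the marker on each matching line (assumed format).
        missing = [
            line.split(marker, 1)[1].strip()
            for line in preflight_output.splitlines()
            if marker in line
        ]
        return "Please provide: " + ", ".join(missing)
    if not storage_prefix:
        return "What is the backend-native root prefix for this run?"
    if not online_platforms:
        return "Which GPU pool/platform should this run target?"
    return None  # nothing blocks execution; do not ask
```

Returning `None` in every other case reflects the default-to-autonomy rule: a question is raised only when one of the three listed inputs is missing.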

When NOT to disambiguate

- Do not ask for cookbook unless user explicitly asks to change scene profile.
- Do not offer external endpoints by default.
- Do not ask A/B cache strategy questions; default is automatic cache setup.
- Do not ask to scale down existing NIMs; this is forbidden.
- Do not invent, scrape, or generate random videos when input is missing.
- Do not use non-VDA demo sources (for example Carline adaptation assets) unless the user explicitly requests a different dataset.

Step 0: Select Flow and Gather Inputs

Input video policy (non-negotiable)

- Always preserve user-provided video inputs (dataset URL, local path, or upload folder) as first-class and preferred.
- Never replace an explicit user video with demo assets or any other source.
- If no video input is provided, default to VDA demo assets via `scripts/prepare_demo_assets.sh` (HF dataset flow) without asking extra source-selection questions.
- If the user explicitly mentions an input video or dataset, prefer and use that input instead of demo assets.
- Use only VDA demo assets (`nvidia/video-data-augmentation-demo`) for the default demo path.
- Never propose arbitrary web clip downloads or placeholder videos unless the user explicitly requests that behavior.

Collect only missing values:

- Dataset source (prefer explicit user-provided `dataset_url` or local upload folder; otherwise default to VDA demo assets and proceed).
- Flow (`auto_labeling`, `augmentation_and_al`, `e2e`, or `e2e_super_resolution`); default to `augmentation_and_al` when unspecified.
- OSMO `gpu_platform` for all VDA resources (auto-select an ONLINE platform when unambiguous; ask only when no valid option exists).
- Endpoint mode (default in-cluster NIM reuse/deploy unless explicitly overridden).

Do not guess `gpu_platform` (for example `microk8s`). Use the exact current platform label shown by `osmo pool list --mode free` (for example `gpu`).

Generate run stamp before each submit:

```bash
STAMP=$(cat /proc/sys/kernel/random/uuid | cut -c1-8)
RUN_ID="run-$STAMP"
```
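When the run stamp is generated from Python rather than the shell (for example on a host without `/proc`), an equivalent sketch could look like this. The helper name `new_run_id` is hypothetical.

```python
import uuid


def new_run_id():
    """Build a run id from the first 8 characters of a random UUID."""
    # str(uuid4()) has the same textual layout as /proc/sys/kernel/random/uuid,
    # so slicing [:8] mirrors the `cut -c1-8` in the shell version.
    stamp = str(uuid.uuid4())[:8]
    return f"run-{stamp}"


if __name__ == "__main__":
    print(new_run_id())
```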

Execution Time Overview (required before run)

Before running any mutating command (`osmo credential set`, NIM install/repair, cache workflow submit, or target VDA workflow submit), provide a short ETA overview to the user.

Keep it concise (one short paragraph or 4-6 bullets) and include:

- whether this looks like a cold start (NIM/cache missing) or warm start (NIM/cache already healthy),
- major phases with approximate durations,
- a total expected range for the selected workflow.

Baseline ranges (from observed MicroK8s + OSMO runs):

Phase	Typical duration
Credentials + preflight	~1-2 min
NIM deploy/download/warmup (if needed)	~10-15 min
Demo assets download/upload (if demo path)	~1-3 min
Model cache population (if needed)	~15-25 min
Workflow submit + queue/start	~1-3 min

Workflow runtime ranges after submit:

Flow	Typical runtime
`auto_labeling`	~6-15 min
`augmentation_and_al`	~20-35 min
`e2e`	~22-40 min
`e2e_super_resolution`	~25-45 min

Cold-start end-to-end runs are commonly ~45-80 min; warm-start runs are usually ~20-45 min depending on flow and video length.
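To make the arithmetic behind the overview concrete, the sketch below sums the table ranges for the phases that apply. It assumes the phases run one after another, which is a simplification; the function name `eta_minutes` and its parameters are illustrative only.

```python
# (low, high) minutes, copied from the baseline tables above.
PHASES = {
    "preflight": (1, 2),
    "nim_deploy": (10, 15),
    "demo_assets": (1, 3),
    "model_cache": (15, 25),
    "submit": (1, 3),
}
RUNTIME = {
    "auto_labeling": (6, 15),
    "augmentation_and_al": (20, 35),
    "e2e": (22, 40),
    "e2e_super_resolution": (25, 45),
}


def eta_minutes(flow, nim_ready, cache_ready, demo_path):
    """Sum the applicable phase ranges; returns (low, high) in minutes."""
    parts = [PHASES["preflight"], PHASES["submit"], RUNTIME[flow]]
    if not nim_ready:
        parts.append(PHASES["nim_deploy"])   # cold start: NIM must be deployed
    if not cache_ready:
        parts.append(PHASES["model_cache"])  # cold start: cache must be populated
    if demo_path:
        parts.append(PHASES["demo_assets"])
    return sum(p[0] for p in parts), sum(p[1] for p in parts)


if __name__ == "__main__":
    low, high = eta_minutes("augmentation_and_al", nim_ready=False, cache_ready=False, demo_path=True)
    print(f"cold start: ~{low}-{high} min")
```

For a cold `augmentation_and_al` demo run this gives 48 to 83 minutes, close to the ~45-80 minute range quoted above.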

Common Preconditions (all flows)

Credential and control-plane preflight

```bash
bash scripts/preflight_credentials.sh --workflow assets/configs/osmo/<mode>.yaml
```

Restricted egress:

```bash
bash scripts/preflight_credentials.sh --no-probe --workflow assets/configs/osmo/<mode>.yaml
```

Preflight does not require a workload-local `.env`. Runtime interpolation is driven by submit-time values (`dataset`, `run_id`, `gpu_platform`, `video`, `storage_url`, `skills_dir`) supplied in one `--set-string` list.

Passing `--workflow` validates pull access for the active workflow image refs (`workflow.groups[].tasks[].image`) using anonymous bearer access with credential fallback when provided. If replacement NGC/HF secrets are provided in env, preflight automatically refreshes the existing `nvcr_io` / `hf_token` credentials when present. Use `--refresh` to force overwrite even when no new env secrets were supplied:

```bash
bash scripts/preflight_credentials.sh --workflow assets/configs/osmo/<mode>.yaml --refresh
```

If output contains `USER_INPUT_REQUIRED:`, ask one concise unblock question and stop.

On workflow image `401/403`, report registry access failure after probe checks on the listed image refs; do not claim a key family (for example `nvapi-*`) is categorically unsupported.

Storage interpolation policy

`storage_url` must be derived from the actual dataset/upload backend for the current run.

```text
dataset_url=azure://storiondevxah69/osmo-workflows/datasets/vda-demo
storage_url=azure://storiondevxah69/osmo-workflows
dataset=vda-demo
```

Never silently default to stale `s3://` values on non-S3 backends.
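The derivation in the example above can be sketched as a string split. This assumes the dataset always sits at `<storage_url>/datasets/<dataset>`, as in the example; the helper name `derive_storage` is hypothetical, and other layouts would need a different rule.

```python
def derive_storage(dataset_url):
    """Split a dataset URL into (storage_url, dataset).

    Assumes the `<storage_url>/datasets/<dataset>` layout shown in the example.
    """
    marker = "/datasets/"
    prefix, sep, name = dataset_url.rstrip("/").rpartition(marker)
    # Refuse to guess: the scheme must survive and the dataset must be one path segment.
    if not sep or "://" not in prefix or not name or "/" in name:
        raise ValueError(f"cannot derive storage_url from {dataset_url!r}; ask the user")
    return prefix, name


if __name__ == "__main__":
    storage_url, dataset = derive_storage(
        "azure://storiondevxah69/osmo-workflows/datasets/vda-demo"
    )
    print(storage_url, dataset)
```

Because the scheme is carried over from the input URL, an `azure://` dataset can never end up with an `s3://` prefix. A URL that does not fit the layout raises instead of falling back to a default.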
Inference policy (non-negotiable)

- Reuse healthy in-cluster persistent NIM endpoints by default.
- If missing/unhealthy, deploy automatically; this is a prerequisite, not a user decision. Do NOT pause to ask; run the install with the VDA allow-list:

  ```bash
  export NIM_SERVICES="qwen3-vl qwen25-14b"
  skills/physical-ai-infrastructure-setup-and-resilient-scaling/components/inference-nim-operator/scripts/install.sh
  ```

- See `references/nim/README.md` for full endpoint docs and health checks.
- External endpoints are opt-in only (explicit request or explicit URLs); only then skip the in-cluster deploy.
- Never infer external mode from credential presence.
- Never scale down/delete existing NIMs to free GPUs.

Readiness guard

```bash
osmo pool list --mode free
osmo config show POD_TEMPLATE
python3 scripts/pre_submit_guard.py --workflow assets/configs/osmo/<mode>.yaml
```

Cache auto-remediation

If `pre_submit_guard.py` reports cache failure, default action is to run:

```bash
osmo workflow submit assets/configs/osmo/setup_model_cache.yaml \
  --set-string storage_url=<backend-prefix> path=data
```

Then rerun `pre_submit_guard.py` and submit the target VDA flow only after it passes. Ask user only when backend/prefix is ambiguous or cache setup fails.
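The guard, remediate, re-guard sequence can be sketched as control flow only. The two callables are supplied by the caller (for example thin wrappers that run `pre_submit_guard.py` and submit `setup_model_cache.yaml`); their names and boolean return convention are assumptions for this illustration.

```python
def ensure_cache(run_guard, submit_cache_setup):
    """Return True when the target flow may be submitted.

    run_guard() -> bool: True when the pre-submit guard passes.
    submit_cache_setup() -> bool: True when the cache workflow succeeds.
    """
    if run_guard():
        return True  # cache already healthy, nothing to remediate
    if not submit_cache_setup():
        return False  # cache setup failed: this is when the user gets asked
    # Rerun the guard exactly once; submit only if it now passes.
    return run_guard()
```

The remediation is attempted once rather than in a loop, which keeps a persistently failing cache setup from retrying indefinitely.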
Scheduling policy

VDA templates schedule setup and workers on `gpu_platform` (no `system` pool dependency for user workloads).

Submit (all flows)

Every flow uses the same submit shape; only the workflow YAML changes. Choose the YAML for the requested flow, then run the command below. Full per-flow walkthroughs (stage matrix and flow details) live in the linked references.

Flow	Workflow YAML	Walkthrough
Augmentation + auto-labeling	`assets/configs/osmo/augmentation_and_al.yaml`	`references/flows/augmentation_and_al.md`
Auto-labeling only	`assets/configs/osmo/auto_labeling.yaml`	`references/flows/auto_labeling.md`
E2E (parallel)	`assets/configs/osmo/e2e.yaml`	`references/flows/e2e.md`
E2E (super-resolution gated)	`assets/configs/osmo/e2e_super_resolution.yaml`	`references/flows/e2e_super_resolution.md`

```bash
SKILLS_DIR="$(cd "$(git rev-parse --show-toplevel)/skills/physical-ai-video-data-augmentation" && pwd)"
STAMP=$(cat /proc/sys/kernel/random/uuid | cut -c1-8)
osmo workflow submit assets/configs/osmo/<flow>.yaml \
  --pool <pool> \
  --set-string \
    dataset=<dataset> \
    run_id=run-$STAMP \
    storage_url=<backend-prefix> \
    gpu_platform=<gpu-platform> \
    video=<video-stem> \
    cosmos_model_cache_url=<backend-prefix>/data/models/cosmos_transfer \
    auto_labeling_model_cache_url=<backend-prefix>/data/models/auto_labeling \
    skills_dir="$SKILLS_DIR"
```

Compatibility note:

- Use exactly one `--set-string` flag and pass all key/value pairs after it.
- Do not repeat `--set` / `--set-string` flags in the same command; some OSMO builds only honor the last occurrence.
- Do not mix `--set` and `--set-string` in one submit command.
- Pass explicit `*_model_cache_url` values to avoid nested-template interpolation differences across OSMO environments.
- Do not brute-force permutations of flags. Use this shape directly.

Common optional overrides (append key/value pairs to the same `--set-string` list):

```bash
cookbook=<scene_profile> \
vlm_url=<openai_base_url> \
llm_url=<openai_base_url> \
cosmos_model_cache_url=<url> \
auto_labeling_model_cache_url=<url>
```

The auto-labeling-only flow has no augmentation stage, so it omits `cosmos_model_cache_url` at runtime; passing it is harmless and keeps one submit shape across flows.
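When the submit command is assembled programmatically, a sketch like the following keeps every key/value pair behind a single `--set-string` flag. The helper name `build_submit_argv` is hypothetical; the only CLI flags it emits are the ones shown in the command above.

```python
REQUIRED_KEYS = ("dataset", "run_id", "storage_url", "gpu_platform", "video", "skills_dir")


def build_submit_argv(workflow_yaml, pool, values):
    """Return an argv list for `osmo workflow submit` with exactly one --set-string."""
    missing = [key for key in REQUIRED_KEYS if not values.get(key)]
    if missing:
        raise ValueError(f"missing submit values: {missing}")
    merged = dict(values)
    prefix = merged["storage_url"].rstrip("/")
    # Explicit cache URLs, following the canonical submit shape above.
    merged.setdefault("cosmos_model_cache_url", f"{prefix}/data/models/cosmos_transfer")
    merged.setdefault("auto_labeling_model_cache_url", f"{prefix}/data/models/auto_labeling")
    argv = ["osmo", "workflow", "submit", workflow_yaml, "--pool", pool, "--set-string"]
    # All pairs follow the single flag; the flag is never repeated.
    argv.extend(f"{key}={value}" for key, value in merged.items())
    return argv
```

The resulting list can be passed to `subprocess.run(argv, check=True)`. Optional overrides such as `cookbook` or `vlm_url` are just extra entries in `values`.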

OSMO Monitoring

```bash
# Workflow status + task states
osmo workflow query <workflow_id> --format-type json \
  | jq '{status, tasks: [.groups[].tasks[] | {name, status, exit_code}]}'

# Logs for a specific task
osmo workflow logs <workflow_id> --task <task_name> -n 200

# Output retrieval
osmo data list --no-pager <output_url>
osmo data download <output_url> <local_dir>/
```
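Where `jq` is not available, the same projection can be done in Python on the JSON printed by `osmo workflow query <workflow_id> --format-type json`. The sketch only relies on the fields the jq filter names (`status`, `groups[].tasks[]`, `name`, `exit_code`); the function names and the "non-zero exit code means failed" rule are assumptions.

```python
import json


def summarize(query_json):
    """Python equivalent of the jq filter: overall status plus one entry per task."""
    doc = json.loads(query_json)
    tasks = [
        {"name": t.get("name"), "status": t.get("status"), "exit_code": t.get("exit_code")}
        for group in doc["groups"]
        for t in group["tasks"]
    ]
    return {"status": doc.get("status"), "tasks": tasks}


def failed_tasks(summary):
    """Names of tasks with a non-zero exit code.

    Status strings are not inspected because their exact values are not listed here.
    """
    return [t["name"] for t in summary["tasks"] if t["exit_code"] not in (None, 0)]
```

A task with a non-empty `failed_tasks` entry is a natural candidate for the `osmo workflow logs <workflow_id> --task <task_name> -n 200` command shown above.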


For completion artifacts, always mirror the full run output into workspace:

```bash
ROOT="$(git rev-parse --show-toplevel)"
RUN_LOCAL_DIR="$ROOT/media/vda/runs/<run_id>"
mkdir -p "$RUN_LOCAL_DIR"
osmo data download "<storage_url>/datasets/<dataset>-outputs/<run_id>/" "$RUN_LOCAL_DIR/"
```

For runs expected to exceed two minutes, send heartbeat updates at least every two minutes. For media evidence, emit one standalone `MEDIA:<absolute-path>` line per message bubble.

Execution continuity requirement:

- Heartbeats must report progress while continuing work; they are status updates, not permission prompts.
- Do not stop between green stages waiting for approval.
- Pause only on blocking failures or explicit user stop/redirect.
- If submit fails on interpolation, rerun once with the same canonical single-flag shape and corrected values; do not loop through ad-hoc flag experiments.

MEDIA formatting is strict:

- Emit exactly one line: `MEDIA:/absolute/path/to/file.mp4`
- Keep `MEDIA:` contiguous on a single line (never split across lines).
- No extra text in the same bubble.
- No code fences, bullets, or quotes around the directive.
- If render fails: retry once from a stable workspace path, then emit PNG fallback.
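The formatting rules above are easy to enforce with a small helper. This is a minimal sketch under those rules; the name `media_line` is hypothetical, and how the line is actually delivered as its own message bubble is outside its scope.

```python
import os


def media_line(path):
    """Return the single-line MEDIA directive for an absolute file path."""
    if not os.path.isabs(path):
        raise ValueError("MEDIA requires an absolute path")
    if "\n" in path or "\r" in path:
        raise ValueError("MEDIA directive must stay on one line")
    # No quotes, bullets, or fences: the directive is emitted bare.
    return f"MEDIA:{path}"
```

Each returned string should be sent as the entire content of one message, with nothing before or after it.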

Post-Run Comparison Evidence (required for augmented flows)

Applies to `augmentation_and_al`, `e2e`, and `e2e_super_resolution` after a successful run.

Required completion output (do not stop at raw output URLs):

1. Stage full outputs + input video into workspace-local path:

```bash
bash scripts/stage_run_artifacts.sh \
  --storage-url <storage_url> --dataset <dataset> --run-id <run_id> --video <video>
```

2. Render side-by-side from that local run copy:

```bash
bash scripts/render_side_by_side.sh \
  --run-local-dir "<repo>/media/vda/runs/<run_id>" --dataset <dataset> --video <video>
```

3. Emit MEDIA from the local run copy and include:

- augmentation summary from `<run_local_dir>/setup_b0/configs/manifest.yaml` (`sampled_vars` for `<video>_aug0`)
- auto-labeling summary from `<run_local_dir>/outputs/pseudo_labeled_augmented/<video>_aug0`
- for `e2e` and `e2e_super_resolution`, original-label summary from `<run_local_dir>/outputs/pseudo_labeled/<video>`

If `ffmpeg` is unavailable, emit input and augmented MEDIA from the same local run copy and still provide augmentation + auto-labeling summaries.

For demo runs (no user video provided), explicitly state that input came from `nvidia/video-data-augmentation-demo`.

Supporting files

Use these canonical locations:

- Workflows: `assets/configs/osmo/*.yaml`
- Runtime scripts: `scripts/*.sh`, `scripts/*.py`
- Flow walkthroughs: `references/flows/*.md`
- Setup and triage: `references/setup.md`, `references/troubleshooting.md`
- Images and endpoint policy: `references/container-images.md`, `references/nim/README.md`
- Cookbook tuning: `assets/cookbooks/TUNING_GUIDE.md`
