physical-ai-video-data-augmentation

Physical AI Video Data Augmentation Workflow Orchestrator

Default workflow skill for VDA execution on OSMO. It owns flow selection, preflight, cache readiness, inference-path decisions, submit-time interpolation, monitoring, and output retrieval. Component skills are consult-only.

Purpose

Run the end-to-end VDA workflow safely and reproducibly from preflight to output download.
Do NOT use this skill for container-internal tuning-only questions.

Prerequisites

Confirm these before running preflight or any submit. Missing required secrets surface as `USER_INPUT_REQUIRED:` from `scripts/preflight_credentials.sh`.

| Requirement | How it is satisfied | Used for |
| --- | --- | --- |
| NGC API key (optional) | `NGC_API_KEY`, `NGC_CLI_API_KEY`, or compatible `nvapi-*` token in `NVIDIA_API_KEY` / `OPENAI_API_KEY` / `VLM_API_KEY` / `LLM_API_KEY` | Optional for `nvcr_io` credential refresh and NGC REST scope probe; default VDA image refs are validated via workflow registry probes |
| Hugging Face token | `HF_TOKEN` (or `HUGGING_FACE_HUB_TOKEN`), or a cached token at `~/.cache/huggingface/token` | Creates the OSMO `hf_token` credential; pulls gated Cosmos/SeedVR weights |
| OSMO CLI access | `osmo` on `PATH`, logged in, with a default profile and a registered DATA credential profile matching `storage_url` | Submitting/monitoring workflows and listing/downloading objects |
| GPU pool | At least one `ONLINE` pool in `osmo pool list --mode free`; `POD_TEMPLATE` carries GPU toleration/selectors | Scheduling setup + worker tasks |

Optional (only for the strict NGC org/team probe): `NGC_ORG` + `NGC_TEAM` (or `NGC_CLI_ORG` / `NGC_CLI_TEAM`). External VLM/LLM endpoint keys are validated separately, not by preflight.
Key handling rule: `nvapi-*` tokens are first-class inputs for `nvcr_io`. Never reject by token prefix alone; use workflow registry probe results as source of truth.
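As a rough illustration of the lookup described in the prerequisites, the sketch below resolves which variable would supply each secret. The helper names `resolve_ngc_key` and `resolve_hf_token` are hypothetical, and the candidate order is an assumption; `scripts/preflight_credentials.sh` may check or validate candidates differently.

```python
from pathlib import Path

# Variable names come from the prerequisites; the lookup order is an assumption.
NGC_DIRECT = ("NGC_API_KEY", "NGC_CLI_API_KEY")
NGC_COMPATIBLE = ("NVIDIA_API_KEY", "OPENAI_API_KEY", "VLM_API_KEY", "LLM_API_KEY")
HF_VARS = ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN")


def resolve_ngc_key(env):
    """Return (variable_name, value) for the first usable NGC key, or None."""
    # Dedicated NGC variables are accepted whatever their prefix.
    for name in NGC_DIRECT:
        if env.get(name):
            return name, env[name]
    # Shared endpoint variables only count when they carry an nvapi-* token.
    for name in NGC_COMPATIBLE:
        value = env.get(name, "")
        if value.startswith("nvapi-"):
            return name, value
    return None  # the NGC key is optional, so absence is not an error


def resolve_hf_token(env, cache_file=Path.home() / ".cache/huggingface/token"):
    """Return the Hugging Face token from env or the cached token file, or None."""
    for name in HF_VARS:
        if env.get(name):
            return env[name]
    if cache_file.is_file():
        return cache_file.read_text().strip() or None
    return None
```

Passing `os.environ` as `env` checks the current shell; a plain dict works for dry runs.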

Instructions

  1. Select the workflow (`auto_labeling`, `augmentation_and_al`, `e2e`, `e2e_super_resolution`) from user intent.
  2. Provide a tentative execution-time overview before starting run actions.
  3. Run preflight and readiness checks before submit.
  4. Derive submit-time values from the active dataset backend (never guess `storage_url`).
  5. Submit the workflow with explicit interpolation values and monitor to completion.
  6. Retrieve outputs, provide side-by-side comparison evidence for augmented flows, and summarize task outcomes.
Use `run_script(...)` for script execution. Canonical examples:
python
run_script("bash scripts/preflight_credentials.sh --workflow assets/configs/osmo/augmentation_and_al.yaml")
run_script("python3 scripts/pre_submit_guard.py --workflow assets/configs/osmo/auto_labeling.yaml")
run_script("bash scripts/prepare_demo_assets.sh /srv/sdg/data/vda_inputs")

Available Scripts

Use script-level `--help` for exact arguments.

| Script | Role |
| --- | --- |
| `scripts/preflight_credentials.sh` | Secrets/control-plane preflight and workflow image access checks |
| `scripts/pre_submit_guard.py` | Submit-time interpolation, cache, and dataset safety checks |
| `scripts/prepare_demo_assets.sh` | Demo video pull + flatten for default demo path |
| `scripts/generate_configs.py` | Setup-time config and cookbook projection generation |
| `scripts/cosmos_worker.sh` | Augmentation worker execution |
| `scripts/pl_original_worker.sh` | Original-video auto-labeling worker execution |
| `scripts/pl_augmented_worker.sh` | Augmented-video auto-labeling worker execution |
| `scripts/osmo_barrier.py` | Multi-node barrier synchronization |
| `scripts/stage_run_artifacts.sh` | Local mirror of full run output + input video |
| `scripts/render_side_by_side.sh` | Side-by-side comparison render from local artifacts |

Supported Flows

| Flow | OSMO YAML | Group sequence | Typical use |
| --- | --- | --- | --- |
| `augmentation_and_al` | `assets/configs/osmo/augmentation_and_al.yaml` | setup -> augmentation -> auto_labeling_augmented | Augment one or more videos, then auto-label augmented outputs |
| `auto_labeling` | `assets/configs/osmo/auto_labeling.yaml` | setup -> auto_labeling | Label original videos only |
| `e2e` | `assets/configs/osmo/e2e.yaml` | setup -> (auto_labeling_original + augmentation) -> auto_labeling_augmented | Throughput-first path |
| `e2e_super_resolution` | `assets/configs/osmo/e2e_super_resolution.yaml` | setup -> auto_labeling_original -> augmentation -> auto_labeling_augmented | Sequential path with SR gate before augmentation |

The legacy alias `assets/configs/osmo/augmentation_and_pl.yaml` remains for backwards compatibility.
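The flow-to-YAML mapping above is regular enough to express as a lookup. The following is a minimal sketch, assuming the paths listed in this section; the function name `workflow_yaml` is hypothetical and does not exist in the skill's scripts.

```python
# Supported flows and their OSMO workflow files, as listed in this section.
FLOW_YAML = {
    "augmentation_and_al": "assets/configs/osmo/augmentation_and_al.yaml",
    "auto_labeling": "assets/configs/osmo/auto_labeling.yaml",
    "e2e": "assets/configs/osmo/e2e.yaml",
    "e2e_super_resolution": "assets/configs/osmo/e2e_super_resolution.yaml",
}

# Legacy alias kept for backwards compatibility.
LEGACY_YAML = {
    "augmentation_and_pl": "assets/configs/osmo/augmentation_and_pl.yaml",
}


def workflow_yaml(flow):
    """Return the workflow YAML path for a flow name, rejecting unknown names."""
    if flow in FLOW_YAML:
        return FLOW_YAML[flow]
    if flow in LEGACY_YAML:
        return LEGACY_YAML[flow]
    raise ValueError(f"unknown flow: {flow!r}; expected one of {sorted(FLOW_YAML)}")
```

Failing on an unknown name, rather than falling back silently, keeps a typo from submitting the wrong pipeline.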

Pick the right workflow for the user's request

| User intent | Workflow |
| --- | --- |
| "Label my source videos" / "PL-only" / "no augmentation" | `auto_labeling` |
| "Create augmented videos and label them" | `augmentation_and_al` |
| "Run the full pipeline quickly" | `e2e` |
| "Run full pipeline, but gate on SR-enhanced originals first" | `e2e_super_resolution` |

Disambiguation: handle vague requests before committing

Default to autonomy: ask only when missing information blocks execution.

Autonomous defaults (do NOT ask)

  • If dataset source is absent, run VDA demo path (`scripts/prepare_demo_assets.sh`) and continue with `dataset=vda-demo`.
  • If flow is not explicitly requested, default to `augmentation_and_al`.
  • If endpoint mode is unspecified, default to in-cluster persistent NIM reuse and automatic NIM deploy/repair when unhealthy.
  • If cache is missing, run `setup_model_cache.yaml`, rerun pre-submit guard, and continue automatically on success.
  • After any stage completes successfully, continue to the next stage immediately. Do not pause with "Ready when you are" or equivalent approval prompts.
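The defaults above amount to filling in whatever the user left unspecified. A small sketch of that rule follows; the `RunInputs` type, the `apply_defaults` function, and the `"in_cluster_nim"` label are hypothetical names chosen for illustration, not identifiers from the skill.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunInputs:
    dataset: str
    flow: str
    endpoint_mode: str
    use_demo_assets: bool


def apply_defaults(
    dataset: Optional[str] = None,
    flow: Optional[str] = None,
    endpoint_mode: Optional[str] = None,
) -> RunInputs:
    """Fill unspecified inputs with the autonomous defaults; never override user values."""
    return RunInputs(
        dataset=dataset or "vda-demo",            # demo path when no dataset is given
        flow=flow or "augmentation_and_al",       # default flow
        endpoint_mode=endpoint_mode or "in_cluster_nim",  # hypothetical label
        use_demo_assets=not dataset,              # run prepare_demo_assets.sh only then
    )
```

An explicit user value always wins: `apply_defaults(dataset="my-clips")` keeps `my-clips` and leaves `use_demo_assets` false.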

Triggers that should pause for disambiguation

| Missing input | Why it matters | Ask |
| --- | --- | --- |
| `USER_INPUT_REQUIRED` from preflight | Required secret is missing | Ask one concise unblock question for exactly the missing value(s) |
| Storage backend prefix cannot be derived from the active dataset/upload root | Wrong scheme causes runtime storage auth mismatch | "What is the backend-native root prefix for this run?" |
| No ONLINE GPU pool/platform can be selected | Workflow cannot schedule setup/workers | "Which GPU pool/platform should this run target?" |

When NOT to disambiguate

  • Do not ask for cookbook unless user explicitly asks to change scene profile.
  • Do not offer external endpoints by default.
  • Do not ask A/B cache strategy questions; default is automatic cache setup.
  • Do not ask to scale down existing NIMs; this is forbidden.
  • Do not invent, scrape, or generate random videos when input is missing.
  • Do not use non-VDA demo sources (for example Carline adaptation assets) unless the user explicitly requests a different dataset.

Step 0: Select Flow and Gather Inputs

Input video policy (non-negotiable)

  • Always preserve user-provided video inputs (dataset URL, local path, or upload folder) as first-class and preferred.
  • Never replace an explicit user video with demo assets or any other source.
  • If no video input is provided, default to VDA demo assets via `scripts/prepare_demo_assets.sh` (HF dataset flow) without asking extra source-selection questions.
  • If the user explicitly mentions an input video or dataset, prefer and use that input instead of demo assets.
  • Use only VDA demo assets (`nvidia/video-data-augmentation-demo`) for the default demo path.
  • Never propose arbitrary web clip downloads or placeholder videos unless the user explicitly requests that behavior.
Collect only missing values:
  1. Dataset source (prefer explicit user-provided `dataset_url` or local upload folder; otherwise default to VDA demo assets and proceed).
  2. Flow (`auto_labeling`, `augmentation_and_al`, `e2e`, `e2e_super_resolution`); default to `augmentation_and_al` when unspecified.
  3. OSMO `gpu_platform` for all VDA resources (auto-select an ONLINE platform when unambiguous; ask only when no valid option exists).
  4. Endpoint mode (default in-cluster NIM reuse/deploy unless explicitly overridden).
Do not guess `gpu_platform` (for example `microk8s`). Use the exact current platform label shown by `osmo pool list --mode free` (for example `gpu`).
Generate a run stamp before each submit:
bash
STAMP=$(cat /proc/sys/kernel/random/uuid | cut -c1-8)
RUN_ID="run-$STAMP"
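The shell snippet reads `/proc/sys/kernel/random/uuid`, which exists only on Linux. Where the submit is driven from Python or from a non-Linux host, an equivalent stamp can be sketched with the standard `uuid` module. The function name `make_run_id` is hypothetical.

```python
import uuid


def make_run_id():
    """Return 'run-' plus the first 8 hex characters of a random UUID."""
    stamp = str(uuid.uuid4())[:8]  # same shape as `cut -c1-8` on a UUID string
    return f"run-{stamp}"
```

Generate the ID once per submit and reuse the same value for `run_id` and for later output paths, so that staging and download commands point at the run that was actually submitted.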

Execution Time Overview (required before run)

Before running any mutating command (`osmo credential set`, NIM install/repair, cache workflow submit, or target VDA workflow submit), provide a short ETA overview to the user.
Keep it concise (one short paragraph or 4-6 bullets) and include:
  • whether this looks like a cold start (NIM/cache missing) or warm start (NIM/cache already healthy),
  • major phases with approximate durations,
  • a total expected range for the selected workflow.
Baseline ranges (from observed MicroK8s + OSMO runs):

| Phase | Typical duration |
| --- | --- |
| Credentials + preflight | ~1-2 min |
| NIM deploy/download/warmup (if needed) | ~10-15 min |
| Demo assets download/upload (if demo path) | ~1-3 min |
| Model cache population (if needed) | ~15-25 min |
| Workflow submit + queue/start | ~1-3 min |

Workflow runtime ranges after submit:

| Flow | Typical runtime |
| --- | --- |
| `auto_labeling` | ~6-15 min |
| `augmentation_and_al` | ~20-35 min |
| `e2e` | ~22-40 min |
| `e2e_super_resolution` | ~25-45 min |

Cold-start end-to-end runs are commonly ~45-80 min; warm-start runs are usually ~20-45 min depending on flow and video length.
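One way to turn the baseline ranges into a total is to add up the phases that apply. The sketch below does that under the assumption that phases run one after another with no overlap, which may overstate cold starts if NIM deploy and cache population proceed in parallel. The function and dictionary names are hypothetical.

```python
# (low, high) minutes, copied from the baseline ranges above.
PHASES = {
    "credentials_preflight": (1, 2),
    "nim_deploy": (10, 15),
    "demo_assets": (1, 3),
    "model_cache": (15, 25),
    "submit_queue": (1, 3),
}
FLOW_RUNTIME = {
    "auto_labeling": (6, 15),
    "augmentation_and_al": (20, 35),
    "e2e": (22, 40),
    "e2e_super_resolution": (25, 45),
}


def estimate_minutes(flow, nim_ready, cache_ready, demo_path):
    """Return a (low, high) total in minutes, assuming strictly sequential phases."""
    phases = ["credentials_preflight", "submit_queue"]  # always needed
    if not nim_ready:
        phases.append("nim_deploy")
    if not cache_ready:
        phases.append("model_cache")
    if demo_path:
        phases.append("demo_assets")
    low = FLOW_RUNTIME[flow][0] + sum(PHASES[p][0] for p in phases)
    high = FLOW_RUNTIME[flow][1] + sum(PHASES[p][1] for p in phases)
    return low, high
```

For a fully cold `augmentation_and_al` demo run this yields 48 to 83 minutes, in line with the ~45-80 min range quoted above.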

Common Preconditions (all flows)

  1. Credential and control-plane preflight
    bash
    bash scripts/preflight_credentials.sh --workflow assets/configs/osmo/<mode>.yaml
    Restricted egress:
    bash
    bash scripts/preflight_credentials.sh --no-probe --workflow assets/configs/osmo/<mode>.yaml
    Preflight does not require a workload-local `.env`. Runtime interpolation is driven by submit-time values (`dataset`, `run_id`, `gpu_platform`, `video`, `storage_url`, `skills_dir`) supplied in one `--set-string` list.
    Passing `--workflow` validates pull access for the active workflow image refs (`workflow.groups[].tasks[].image`) using anonymous bearer access with credential fallback when provided. If replacement NGC/HF secrets are provided in env, preflight refreshes existing `nvcr_io` / `hf_token` automatically when present. Use `--refresh` to force overwrite even when no new env secrets were supplied:
    bash
    bash scripts/preflight_credentials.sh --workflow assets/configs/osmo/<mode>.yaml --refresh
    If output contains `USER_INPUT_REQUIRED:`, ask one concise unblock question and stop.
    On workflow image `401/403`, report registry access failure after probe checks on the listed image refs; do not claim a key family (for example `nvapi-*`) is categorically unsupported.
  2. Storage interpolation policy
    `storage_url` must be derived from the actual dataset/upload backend for the current run.
    text
    dataset_url=azure://storiondevxah69/osmo-workflows/datasets/vda-demo
    storage_url=azure://storiondevxah69/osmo-workflows
    dataset=vda-demo
    Never silently default to stale `s3://` values on non-S3 backends.
  3. Inference policy (non-negotiable)
    • Reuse healthy in-cluster persistent NIM endpoints by default.
    • If missing/unhealthy, deploy automatically; this is a prerequisite, not a user decision. Do NOT pause to ask; run the install with the VDA allow-list:
    bash
    export NIM_SERVICES="qwen3-vl qwen25-14b"
    skills/physical-ai-infrastructure-setup-and-resilient-scaling/components/inference-nim-operator/scripts/install.sh
    • See `references/nim/README.md` for full endpoint docs and health checks.
    • External endpoints are opt-in only (explicit request or explicit URLs); only then skip the in-cluster deploy.
    • Never infer external mode from credential presence.
    • Never scale down/delete existing NIMs to free GPUs.
  4. Readiness guard
    bash
    osmo pool list --mode free
    osmo config show POD_TEMPLATE
    python3 scripts/pre_submit_guard.py --workflow assets/configs/osmo/<mode>.yaml
  5. Cache auto-remediation
    If `pre_submit_guard.py` reports cache failure, default action is to run:
    bash
    osmo workflow submit assets/configs/osmo/setup_model_cache.yaml \
      --set-string storage_url=<backend-prefix> path=data
    Then rerun `pre_submit_guard.py` and submit the target VDA flow only after it passes. Ask user only when backend/prefix is ambiguous or cache setup fails.
  6. Scheduling policy
    VDA templates schedule setup and workers on `gpu_platform` (no `system` pool dependency for user workloads).
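For the storage interpolation policy in item 2, the example suggests that `storage_url` and `dataset` are the two halves of `dataset_url` around a `/datasets/` segment. The sketch below encodes that reading; the layout `<storage_url>/datasets/<dataset>` is an assumption drawn from the single example shown, and `derive_storage` is a hypothetical helper, not part of `scripts/pre_submit_guard.py`.

```python
def derive_storage(dataset_url):
    """Split a dataset URL into storage_url and dataset without guessing the scheme."""
    marker = "/datasets/"
    prefix, sep, dataset = dataset_url.rstrip("/").rpartition(marker)
    # Refuse anything that does not match the assumed layout instead of guessing.
    if not sep or "://" not in prefix or not dataset or "/" in dataset:
        raise ValueError(f"cannot derive storage_url from {dataset_url!r}")
    return {"storage_url": prefix, "dataset": dataset}
```

Raising on an unexpected shape matches the disambiguation rule: when the backend prefix cannot be derived, ask for it rather than fall back to a stale value.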

Submit (all flows)

Every flow uses the same submit shape; only the workflow YAML changes. Choose the YAML for the requested flow, then run the command below. Full per-flow walkthroughs (stage matrix and flow details) live in the linked references.

| Flow | Workflow YAML | Walkthrough |
| --- | --- | --- |
| Augmentation + auto-labeling | `assets/configs/osmo/augmentation_and_al.yaml` | `references/flows/augmentation_and_al.md` |
| Auto-labeling only | `assets/configs/osmo/auto_labeling.yaml` | `references/flows/auto_labeling.md` |
| E2E (parallel) | `assets/configs/osmo/e2e.yaml` | `references/flows/e2e.md` |
| E2E (super-resolution gated) | `assets/configs/osmo/e2e_super_resolution.yaml` | `references/flows/e2e_super_resolution.md` |

bash
SKILLS_DIR="$(cd "$(git rev-parse --show-toplevel)/skills/physical-ai-video-data-augmentation" && pwd)"
STAMP=$(cat /proc/sys/kernel/random/uuid | cut -c1-8)
osmo workflow submit assets/configs/osmo/<flow>.yaml \
  --pool <pool> \
  --set-string \
    dataset=<dataset> \
    run_id=run-$STAMP \
    storage_url=<backend-prefix> \
    gpu_platform=<gpu-platform> \
    video=<video-stem> \
    cosmos_model_cache_url=<backend-prefix>/data/models/cosmos_transfer \
    auto_labeling_model_cache_url=<backend-prefix>/data/models/auto_labeling \
    skills_dir="$SKILLS_DIR"
Compatibility note:
  • Use exactly one `--set-string` flag and pass all key/value pairs after it.
  • Do not repeat `--set` / `--set-string` flags in the same command; some OSMO builds only honor the last occurrence.
  • Do not mix `--set` and `--set-string` in one submit command.
  • Pass explicit `*_model_cache_url` values to avoid nested-template interpolation differences across OSMO environments.
  • Do not brute-force permutations of flags. Use this shape directly.
Common optional overrides (append key/value pairs to the same `--set-string` list):
bash
cookbook=<scene_profile> \
vlm_url=<openai_base_url> \
llm_url=<openai_base_url> \
cosmos_model_cache_url=<url> \
auto_labeling_model_cache_url=<url>
The auto-labeling-only flow has no augmentation stage, so it omits `cosmos_model_cache_url` at runtime; passing it is harmless and keeps one submit shape across flows.
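When the submit is assembled programmatically, the single-flag rule is easy to enforce by construction. The following sketch builds the argument list in the shape shown above; `build_submit_argv` and `render` are hypothetical helpers, and the required-key list is taken from the submit-time values named in this document.

```python
import shlex

REQUIRED_KEYS = ("dataset", "run_id", "storage_url", "gpu_platform", "video", "skills_dir")


def build_submit_argv(flow_yaml, pool, values):
    """Build an `osmo workflow submit` argv with exactly one --set-string flag."""
    missing = [key for key in REQUIRED_KEYS if not values.get(key)]
    if missing:
        raise ValueError(f"missing submit-time values: {', '.join(missing)}")
    pairs = dict(values)
    base = pairs["storage_url"].rstrip("/")
    # Explicit cache URLs, following the layout in the submit command above.
    pairs.setdefault("cosmos_model_cache_url", f"{base}/data/models/cosmos_transfer")
    pairs.setdefault("auto_labeling_model_cache_url", f"{base}/data/models/auto_labeling")
    argv = ["osmo", "workflow", "submit", flow_yaml, "--pool", pool, "--set-string"]
    argv += [f"{key}={value}" for key, value in pairs.items()]
    return argv


def render(argv):
    """Return a shell-quoted command line for logging or copy-paste."""
    return shlex.join(argv)
```

Optional overrides such as `cookbook` or `vlm_url` go into the same `values` dict, so they land after the one `--set-string` flag rather than introducing a second one.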

OSMO Monitoring

Workflow status + task states

osmo workflow query <workflow_id> --format-type json \
  | jq '{status, tasks: [.groups[].tasks[] | {name, status, exit_code}]}'
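Where `jq` is not installed, the same projection can be sketched in Python. The JSON shape used here (a top-level `status` plus `groups[].tasks[]` entries with `name`, `status`, and `exit_code`) is inferred from the `jq` filter above, not from OSMO documentation, and `summarize_workflow` is a hypothetical helper.

```python
import json


def summarize_workflow(query_json):
    """Reduce `osmo workflow query ... --format-type json` output to status and tasks."""
    doc = json.loads(query_json)
    tasks = [
        {
            "name": task.get("name"),
            "status": task.get("status"),
            "exit_code": task.get("exit_code"),
        }
        for group in doc.get("groups", [])
        for task in group.get("tasks", [])
    ]
    return {"status": doc.get("status"), "tasks": tasks}
```

Unlike the `jq` filter, this version tolerates a missing `groups` key and returns an empty task list, which can be convenient while a workflow is still being scheduled.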

Logs for a specific task

osmo workflow logs <workflow_id> --task <task_name> -n 200

Output retrieval

osmo data list --no-pager <output_url>
osmo data download <output_url> <local_dir>/

For completion artifacts, always mirror the full run output into workspace:

```bash
ROOT="$(git rev-parse --show-toplevel)"
RUN_LOCAL_DIR="$ROOT/media/vda/runs/<run_id>"
mkdir -p "$RUN_LOCAL_DIR"
osmo data download "<storage_url>/datasets/<dataset>-outputs/<run_id>/" "$RUN_LOCAL_DIR/"
```

For runs expected to exceed two minutes, send heartbeat updates at least every two minutes. For media evidence, emit one standalone `MEDIA:<absolute-path>` line per message bubble.
Execution continuity requirement:
  • Heartbeats must report progress while continuing work; they are status updates, not permission prompts.
  • Do not stop between green stages waiting for approval.
  • Pause only on blocking failures or explicit user stop/redirect.
  • If submit fails on interpolation, rerun once with the same canonical single-flag shape and corrected values; do not loop through ad-hoc flag experiments.
MEDIA formatting is strict:
  • Emit exactly one line: `MEDIA:/absolute/path/to/file.mp4`
  • Keep `MEDIA:` contiguous on a single line (never split across lines).
  • No extra text in the same bubble.
  • No code fences, bullets, or quotes around the directive.
  • If render fails: retry once from a stable workspace path, then emit PNG fallback.
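The MEDIA rules can be checked mechanically before a message is sent. The sketch below is one possible check, with a hypothetical function name; it verifies the single-line, absolute-path shape but cannot tell trailing prose from a path that legitimately contains spaces.

```python
import re

# One line, starts with the contiguous "MEDIA:/" prefix, no line breaks,
# and does not end in whitespace.
_MEDIA_RE = re.compile(r"MEDIA:/[^\r\n]*\S")


def is_valid_media_message(message):
    """Return True when the whole message is a single MEDIA:<absolute-path> line."""
    return _MEDIA_RE.fullmatch(message) is not None
```

Because the pattern must match the entire message, a leading bullet, quote, or code fence fails the check, as does any second line in the same bubble.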

Post-Run Comparison Evidence (required for augmented flows)

Applies to `augmentation_and_al`, `e2e`, and `e2e_super_resolution` after a successful run.
Required completion output (do not stop at raw output URLs):
  1. Stage full outputs + input video into workspace-local path:
    bash
    bash scripts/stage_run_artifacts.sh \
      --storage-url <storage_url> --dataset <dataset> --run-id <run_id> --video <video>
  2. Render side-by-side from that local run copy:
    bash
    bash scripts/render_side_by_side.sh \
      --run-local-dir "<repo>/media/vda/runs/<run_id>" --dataset <dataset> --video <video>
  3. Emit MEDIA from the local run copy and include:
    • augmentation summary from `<run_local_dir>/setup_b0/configs/manifest.yaml` (`sampled_vars` for `<video>_aug0`)
    • auto-labeling summary from `<run_local_dir>/outputs/pseudo_labeled_augmented/<video>_aug0`
    • for `e2e` / `e2e_super_resolution`, original-label summary from `<run_local_dir>/outputs/pseudo_labeled/<video>`
If `ffmpeg` is unavailable, emit input and augmented MEDIA from the same local run copy and still provide augmentation + auto-labeling summaries.
For demo runs (no user video provided), explicitly state that input came from `nvidia/video-data-augmentation-demo`.
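The summary sources listed in step 3 depend only on the local run directory, the video stem, and the flow. A small sketch that assembles them is shown below; `evidence_paths` is a hypothetical helper, and the paths are copied from this section without checking that they exist on disk.

```python
from pathlib import PurePosixPath


def evidence_paths(run_local_dir, video, flow):
    """Return the summary source paths for a run, keyed by what they summarize."""
    root = PurePosixPath(run_local_dir)
    paths = {
        # sampled_vars for <video>_aug0 live in this manifest
        "augmentation_manifest": root / "setup_b0/configs/manifest.yaml",
        "augmented_labels": root / "outputs/pseudo_labeled_augmented" / f"{video}_aug0",
    }
    # Only the e2e flows also label the original video.
    if flow in ("e2e", "e2e_super_resolution"):
        paths["original_labels"] = root / "outputs/pseudo_labeled" / video
    return {key: str(value) for key, value in paths.items()}
```

Checking each returned path before summarizing gives an early signal that staging was incomplete.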

Supporting files

Use these canonical locations:
  • Workflows: `assets/configs/osmo/*.yaml`
  • Runtime scripts: `scripts/*.sh`, `scripts/*.py`
  • Flow walkthroughs: `references/flows/*.md`
  • Setup and triage: `references/setup.md`, `references/troubleshooting.md`
  • Images and endpoint policy: `references/container-images.md`, `references/nim/README.md`
  • Cookbook tuning: `assets/cookbooks/TUNING_GUIDE.md`