Cosmos-RL

Supervised fine-tuning (SFT) of nvidia/Cosmos-Reason2-8B on video reasoning tasks. Pretrained weights are sourced from HuggingFace, not NGC. This is a gated model that requires `HF_TOKEN`. It uses FSDP-based parallelism with `dp_shard_size` for GPU count and `dp_replicate_size` for node count (not the standard `num_gpus` / `num_nodes`).

When to Use

Use this skill to train, evaluate, quantize, or run inference on Cosmos-Reason2-8B for video question-answering and video reasoning. The core workflow is: confirm `HF_TOKEN` gating, sample annotations for `video_fps`, load the spec template, apply the critical train overrides below, then launch through the platform skill (or AutoML when enabled).

Dataclass Schemas

Generated TAO Core schemas are packaged in `schemas/<action>.schema.json`, with `schemas/manifest.json` listing available actions. Each generated schema also emits `references/spec_template_<action>.yaml` from the schema top-level `default` field. AutoML enablement is declared at the model layer in `references/skill_info.yaml` via `automl_enabled`. Runnable AutoML still requires `schemas/train.schema.json` and `references/spec_template_train.yaml` to exist and parse. Use the packaged train schema for `automl_default_parameters`, `automl_disabled_parameters`, defaults, min/max bounds, enums, option weights, math conditions, dependencies, and popular parameters. Do not expect `~/tao-core` at runtime; maintainers regenerate schemas/templates before packaging the skill bank.

Train Action Policy

This model is AutoML-enabled at the model layer. Before handling any train-stage request, read `references/skill_info.yaml` and resolve the run override from either an explicit `automl_policy` value or the user's workflow request. Treat phrases like "turn off AutoML", "disable AutoML", "no HPO", or "plain training" as `automl_policy: off` for this run only; otherwise default to `auto`. When `automl_policy: auto`, `automl_enabled: true`, and both `schemas/train.schema.json` and `references/spec_template_train.yaml` are packaged, route the train action through `tao-skill-bank:tao-run-automl` by default with this model's `skill_dir`. Preserve workflow/application overrides for datasets, specs, output directories, GPU/platform settings, parent checkpoints, and `automl_policy`. Use direct model training only when `automl_policy: off` or the packaged train schema/template is missing; in the missing-schema case, report that AutoML is enabled but not runnable for this model until schemas are generated.

Non-train actions such as `evaluate`, `inference`, `export`, and deploy flows stay in this model skill. The per-run `automl_policy` override does not change model metadata.
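The routing rules above can be sketched as two small functions. This is a minimal illustration, not the skill bank's implementation: the function names, the return shape, and the handling of `automl_enabled: false` are assumptions, and the file check only tests existence, whereas the text also requires the files to parse.

```python
from pathlib import Path

# Phrases the text lists as meaning "automl_policy: off" for this run.
OFF_PHRASES = ("turn off automl", "disable automl", "no hpo", "plain training")


def resolve_automl_policy(explicit_policy, user_request):
    """Return the per-run policy; an explicit value wins over the request text."""
    if explicit_policy is not None:
        return explicit_policy
    text = (user_request or "").lower()
    if any(phrase in text for phrase in OFF_PHRASES):
        return "off"
    return "auto"


def route_train_action(skill_dir, automl_enabled, policy):
    """Return (route, message); message is set only in the missing-schema case."""
    skill_dir = Path(skill_dir)
    # Existence check only; a fuller version would also load and parse both files.
    packaged = (
        (skill_dir / "schemas" / "train.schema.json").is_file()
        and (skill_dir / "references" / "spec_template_train.yaml").is_file()
    )
    # Assumption: a model without automl_enabled also trains directly.
    if policy == "off" or not automl_enabled:
        return "direct-train", None
    if not packaged:
        return "direct-train", (
            "AutoML is enabled but not runnable for this model "
            "until schemas are generated."
        )
    return "tao-skill-bank:tao-run-automl", None
```

The policy is resolved per run and passed in as a plain argument, so nothing here writes back to `references/skill_info.yaml`.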

Credentials

HF_TOKEN (required): HuggingFace access token. The user must accept the model agreement at https://huggingface.co/nvidia/Cosmos-Reason2-8B and provide a token with read access. Passed to the container as a `docker_env_var`.

Datasets

Dataset type is vlm in llava format; accepted intents are training, evaluation, and testing. Inputs may be dataset roots (root mode maps `<root>/annotations.json` plus `<root>` as the media path) or direct spec-key paths (when annotations and media live in different locations). Before launching train/AutoML/evaluate, sample the annotation JSON and require `video_fps` in each record: a missing `video_fps` makes the Cosmos-RL SFT loader fail with `Error processing sample: 'video_fps'` after the job starts. Stop before runner generation if it is absent and ask the user to fix the annotation files; do not start AutoML to discover this inside torchrun.
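A pre-launch check along these lines could catch the missing key before any job is submitted. This is a sketch under two assumptions: the annotation file is a top-level JSON list of records, and `video_fps` is a top-level key on each record. The function name and sampling parameters are hypothetical.

```python
import json
import random


def find_records_missing_video_fps(annotation_path, sample_size=50, seed=0):
    """Return indices of sampled records that lack a `video_fps` key."""
    with open(annotation_path, "r", encoding="utf-8") as f:
        records = json.load(f)  # assumption: a JSON list of annotation records
    if not isinstance(records, list):
        raise ValueError("expected a JSON list of annotation records")

    indices = list(range(len(records)))
    if len(indices) > sample_size:
        # Sample instead of scanning everything, as the text suggests.
        indices = sorted(random.Random(seed).sample(indices, sample_size))

    return [
        i for i in indices
        if not isinstance(records[i], dict) or "video_fps" not in records[i]
    ]
```

A non-empty result would be the signal to stop before runner generation and ask for the annotation files to be fixed.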

See `references/datasets.md` for the full training requirements, the launch intake reminder (spec-key options, root-mode mapping, container-image confirmation, and the `check_tao_launch_preflight.py` invocation), the Per-Action Dataset Requirements table, the `data_sources` mapping with direct-override examples, and the eval-dataset / auto-split policy.

Spec Construction

cosmos-rl is `mode: config`. Always start from `references/spec_template_train.yaml` (or `spec_template_evaluate.yaml` for evaluate): load it via `yaml.safe_load(...)` and apply user overrides on top. The spec the model consumes is nested dicts, not flat dotted keys; the dotted override notation denotes paths into the nested spec, so walk the path and assign at the leaf. Data source overrides are mandatory for every action and must be built from the Per-Action Dataset Requirements table in `references/datasets.md`.

See `references/spec-construction.md` for the load-template-then-override pattern and the full typical override blocks for train (including `policy.model_max_length=81920`, `dp_shard_size`, `dp_replicate_size`, and LoRA `lora_alpha` / `lora_dropout`), evaluate, quantize, and inference, plus the note that `custom.val_dataset` leaf keys are valid even when absent from the default spec object.
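The walk-the-path-and-assign-at-the-leaf rule can be illustrated with a small helper. This is a sketch, not the code in `references/spec-construction.md`: `apply_overrides` is a hypothetical name, the inline dict stands in for the result of `yaml.safe_load(...)` on the template, and the `custom.val_dataset.annotation_path` leaf and its value are made up for the example.

```python
import copy


def apply_overrides(spec, overrides):
    """Return a copy of `spec` with each dotted-path override set at its leaf."""
    result = copy.deepcopy(spec)
    for dotted_key, value in overrides.items():
        node = result
        *parents, leaf = dotted_key.split(".")
        for part in parents:
            child = node.get(part)
            if not isinstance(child, dict):
                # Create intermediate dicts for paths absent from the template.
                child = {}
                node[part] = child
            node = child
        node[leaf] = value
    return result


# Stand-in for the loaded references/spec_template_train.yaml.
template = {"policy": {"model_max_length": 40960, "parallelism": {"dp_shard_size": 1}}}
spec = apply_overrides(
    template,
    {
        "policy.parallelism.dp_shard_size": 8,
        # Hypothetical leaf key, absent from the template on purpose.
        "custom.val_dataset.annotation_path": "/data/val/annotations.json",
    },
)
```

The helper never writes a flat key such as `"policy.parallelism.dp_shard_size"` into the spec, and it leaves the loaded template unmodified so the same template can be reused across actions.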

Critical Overrides (Train)

These are the keys whose template defaults are wrong or where omission flips the run into a different mode:

| Parameter | Template Default | Required Value | Why |
| --- | --- | --- | --- |
| `policy.model_name_or_path` | `nvidia/Cosmos-Reason2-8B` | `hf_model://nvidia/Cosmos-Reason2-8B` (or local checkpoint) | The bare HF id makes cosmos-rl fetch from HF Hub at runtime; the `hf_model://` URI form pre-downloads the weights before the training command starts |
| `policy.model_max_length` | 40960 | Keep at 40960 or higher | Smaller than ~40k causes `vision_embeds` shape mismatch on video inputs |
| `train.train_batch_per_replica` | 32 | Any multiple of `train.train_policy.mini_batch` | Mismatch raises an immediate AssertionError |
| `train.train_policy.type` | `"sft"` | Keep as `"sft"` for SFT workflows | If dropped during agent regeneration, cosmos-rl flips to RL mode → rollout replica allocated → multi-node attempted → hostname errors when `num_nodes=1` |
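A guard over these four keys might look like the following, run on the final nested spec before launch. It is a hedged sketch: `check_critical_train_overrides` is a hypothetical helper, and the checks only restate the table rows rather than reproduce cosmos-rl's own validation.

```python
def check_critical_train_overrides(spec):
    """Return a list of problems found in the four critical train keys."""
    problems = []
    policy = spec.get("policy", {})
    train = spec.get("train", {})
    train_policy = train.get("train_policy", {})

    # The bare HF id triggers a runtime fetch from HF Hub.
    if policy.get("model_name_or_path") == "nvidia/Cosmos-Reason2-8B":
        problems.append(
            "policy.model_name_or_path: use hf_model://nvidia/Cosmos-Reason2-8B "
            "or a local checkpoint"
        )

    if int(policy.get("model_max_length") or 0) < 40960:
        problems.append("policy.model_max_length: keep at 40960 or higher")

    batch = train.get("train_batch_per_replica")
    mini = train_policy.get("mini_batch")
    if not batch or not mini or batch % mini != 0:
        problems.append(
            "train.train_batch_per_replica: must be a multiple of "
            "train.train_policy.mini_batch"
        )

    # A missing type is treated as an error because omission flips the run mode.
    if train_policy.get("type") != "sft":
        problems.append('train.train_policy.type: keep as "sft" for SFT workflows')

    return problems
```

An empty list means the spec passes these four checks; it says nothing about the rest of the spec.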

Parameters

`train.train_batch_per_replica` must be divisible by `train.train_policy.mini_batch`; `policy.model_max_length` must be 40960 or higher for video SFT; `policy.parallelism.dp_shard_size` should equal GPUs per node and `dp_replicate_size` the node count; `custom.vision.fps` and `custom.vision.nframes` are mutually exclusive (set exactly one). Cosmos-RL models are 8B parameters and benefit from multi-GPU FSDP sharding; recommended hardware is 8x A100 or H100 (80GB each).
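As an illustration of the sizing and exclusivity rules, the helper below derives dotted overrides from the hardware layout and a single video-sampling choice. The function name is hypothetical, and placing `dp_replicate_size` under `policy.parallelism` next to `dp_shard_size` is an assumption; confirm the path against `references/parameters.md`.

```python
def parallelism_and_vision_overrides(gpus_per_node, num_nodes, fps=None, nframes=None):
    """Build dotted overrides for FSDP sizing and video frame sampling."""
    # Exactly one of fps / nframes may be set.
    if (fps is None) == (nframes is None):
        raise ValueError(
            "set exactly one of custom.vision.fps or custom.vision.nframes"
        )

    overrides = {
        "policy.parallelism.dp_shard_size": gpus_per_node,
        # Assumed path: same parallelism block as dp_shard_size.
        "policy.parallelism.dp_replicate_size": num_nodes,
    }
    if fps is not None:
        overrides["custom.vision.fps"] = fps
    else:
        overrides["custom.vision.nframes"] = nframes
    return overrides
```

For a single node with eight GPUs sampling at 2 fps, `parallelism_and_vision_overrides(8, 1, fps=2)` yields a shard size of 8 and a replicate size of 1.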

See `references/parameters.md` for the complete parameter reference: training loop, model & policy, parallelism (including multi-node guidance and platform-skill pointers), optimization & data loading, vision encoders (fps vs nframes details and the decord/torchvision failure mode), checkpointing, validation, logging, and hardware.

Evaluate

The evaluator reads a flat TOML config with top-level keys `dataset`, `model`, `task`, `evaluation`, `vision`, `generation`, `metrics`, `results`, `num_gpus`, and `results_dir`. Task type is `""` (General Evaluator, auto-detects binary yes/no classification and computes TP/FP/TN/FN/accuracy/precision/recall/F1) or `"its_directionality"` (left/right/straight; do NOT use for collision detection). The `actions.evaluate` block in `references/skill_info.yaml` declares inputs and outputs; for SDK invocation see `skills/platform/tao-run-platform/SKILL.md`.

See `references/evaluate.md` for the config-format detail, task-type notes, LoRA evaluation (checkpoint path via `spec_overrides` with `model.enable_lora`, `model.base_model_path`, and adapter merge behavior), selective download (`{annotation, format, keys}` partial media pull), and the results format and metrics.
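For reference, the binary yes/no metrics named for the General Evaluator follow the standard confusion-matrix definitions. The sketch below is not the evaluator's code: the answer normalization, the choice of "yes" as the positive class, and the zero-division fallbacks are assumptions.

```python
def binary_metrics(predictions, labels, positive="yes"):
    """Compute TP/FP/TN/FN, accuracy, precision, recall, and F1 for yes/no answers."""
    tp = fp = tn = fn = 0
    for pred, label in zip(predictions, labels):
        pred_pos = pred.strip().lower() == positive
        label_pos = label.strip().lower() == positive
        if pred_pos and label_pos:
            tp += 1
        elif pred_pos:
            fp += 1
        elif label_pos:
            fn += 1
        else:
            tn += 1

    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        "accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1,
    }
```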

Error Patterns

Common failures include CUDA OOM in train (reduce `mini_batch` or raise `dp_shard_size`), OOM during LoRA evaluation, NaN loss, the `vision_embeds` shape mismatch (raise `model_max_length` to 40960), `train_batch_per_replica` not divisible by `mini_batch`, `train_batch_per_replica` larger than samples per rank (the `'NoneType' object has no attribute 'state_dict'` 0-step crash), stale dataset cache after changing fps/total_pixels, and the gated-repo authentication loop.

See `references/troubleshooting.md` for the full diagnosis and fix for each error pattern.

DEFT Support and Parent-Model Inference

Cosmos-RL implements the DEFT workflow contract for video QA tasks (see `config.json` and `workflow/deft/deft.md`). Gap analysis via `scripts/analyze_gaps.py` reads cosmos-rl `results.json`, compares predictions by exact string match after `.lower().strip()`, and emits a parquet of failure cases, so eval prompts must force short constrained answers. Model-specific parent-model inference mappings (evaluate/inference/quantize/train spec fields → inference functions, checkpoint metadata, and `parent_job_id` handling) live in the reference, not in `config.json`.

See `references/deft-and-inference-mappings.md` for the gap-analysis detail and limitation, and the full parent-model inference mapping table.
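The exact-match comparison explains why free-form answers show up as failures. The sketch below mirrors only the `.lower().strip()` comparison described above; it is not `scripts/analyze_gaps.py`, the `prediction` and `ground_truth` field names are hypothetical, and the parquet output step is omitted.

```python
def find_failures(results):
    """Return rows whose prediction differs from the ground truth after normalization."""
    failures = []
    for row in results:
        predicted = str(row["prediction"]).lower().strip()
        expected = str(row["ground_truth"]).lower().strip()
        if predicted != expected:
            failures.append(row)
    return failures


rows = [
    {"prediction": " Yes ", "ground_truth": "yes"},                     # matches
    {"prediction": "Yes, the car turns left.", "ground_truth": "yes"},  # counted as a failure
]
failures = find_failures(rows)
```

The second row is semantically correct but still lands in the failure set, which is the limitation that short constrained answers work around.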
