Create and modify NeMo AutoModel training and evaluation recipes, including YAML structure, builders, and execution flow.
```bash
npx skill4agent add nvidia/skills nemo-automodel-recipe-development
```

Recipe code lives under `nemo_automodel/recipes/` and example configs under `examples/`. Launch a recipe with `automodel finetune llm -c <config.yaml>`. Components are declared in YAML through `_target_`. Any config value can be overridden from the CLI (for example `--optimizer.lr`). Validation, checkpointing, and resumption are controlled by `step_scheduler.val_check_interval`, `step_scheduler.checkpoint_interval`, `validation_dataset`, and `restore_from.path`.

Execution flow:

```text
CLI (automodel finetune llm -c config.yaml)
  -> app.py parses command + domain + config
  -> recipe script (e.g. train_ft.py) main(config_path)
  -> Recipe class .setup() builds all components
  -> .run_train_validation_loop() executes training
```

Recipe classes build on `BaseRecipe` and expose `setup()` and `run_train_validation_loop()`. `setup()` constructs each component through a builder: `build_model()`, `build_optimizer()`, `build_dataloader()`, `build_loss_fn()`, `build_lr_scheduler()`, `build_step_scheduler()`, and `build_checkpoint_config()`. `torch.compile` is configured through the `compile` section.

Config structure:

```yaml
step_scheduler:
  max_steps: 1000
  num_epochs: 1
  grad_accumulation_steps: 4
  val_check_interval: 100
  checkpoint_interval: 500
  log_interval: 10
dist_env:
  master_addr: localhost
  master_port: 29500
rng:
  seed: 42
model:
  _target_: nemo_automodel.models.llm.NemotronHForCausalLM
  name_or_path: meta-llama/Llama-3.2-1B
  # additional model kwargs passed to the constructor
compile:
  enabled: false
  backend: inductor
clip_grad_norm:
  max_norm: 1.0
distributed:
  strategy: fsdp2  # fsdp2 | megatron_fsdp | ddp
  dp_size: auto
  tp_size: 1
  cp_size: 1
loss_fn:
  _target_: torch.nn.CrossEntropyLoss
dataset:
  _target_: nemo_automodel.datasets.squad.SquadDataset
  tokenizer_name_or_path: meta-llama/Llama-3.2-1B
  max_seq_length: 2048
validation_dataset:
  _target_: nemo_automodel.datasets.squad.SquadDataset
  split: validation
packed_sequence:
  enabled: false
dataloader:
  batch_size: 4
  num_workers: 4
  pin_memory: true
optimizer:
  _target_: torch.optim.AdamW
  lr: 2.0e-5
  weight_decay: 0.01
lr_scheduler:
  _target_: nemo_automodel.schedulers.CosineAnnealingWarmup
  warmup_steps: 50
  min_lr: 1.0e-6
```

Every `_target_` names a callable. Its sibling keys are passed to that callable as keyword arguments:

```yaml
optimizer:
  _target_: torch.optim.AdamW  # callable
  lr: 2.0e-5                   # kwarg
  weight_decay: 0.01           # kwarg
```

This is equivalent to `torch.optim.AdamW(lr=2e-5, weight_decay=0.01)`, with the model parameters supplied when the optimizer is built.

Override config values from the command line with dotted keys:

```bash
automodel finetune llm -c config.yaml \
  --optimizer.lr 1e-4 \
  --step_scheduler.max_steps 500 \
  --distributed.tp_size 2
```

Validation, checkpointing, and resumption:

```yaml
step_scheduler:
  val_check_interval: 100
  checkpoint_interval: 500
validation_dataset:
  _target_: nemo_automodel.datasets.squad.SquadDataset
  split: validation
restore_from:
  path: /checkpoints/step-500
```

Recipe scripts by domain:

- LLM: `nemo_automodel/recipes/llm/train_ft.py`, `nemo_automodel/recipes/llm/kd.py`, `nemo_automodel/recipes/llm/benchmark.py`
- VLM: `nemo_automodel/recipes/vlm/finetune.py`, built around `NeMoAutoModelForImageTextToText` and a `processor`
- Diffusion: `nemo_automodel/recipes/diffusion/train.py`, built around `NeMoAutoDiffusionPipeline` and a `parallel_scheme` setting
- Retrieval: `nemo_automodel/recipes/retrieval/train_bi_encoder.py`, `nemo_automodel/recipes/retrieval/train_cross_encoder.py`, `nemo_automodel/recipes/retrieval/mine_hard_negatives.py`

Training loop outline (pseudocode):

```python
step = 0
for epoch in range(num_epochs):
    for batch_idx in range(batches_per_epoch):
        # --- gradient accumulation inner loop ---
        for micro_batch in micro_batches:
            if pipeline_parallel:
                schedule.step(micro_batch)  # PP schedule (runs forward and backward)
            else:
                loss = model(micro_batch)   # direct forward
                loss.backward()
        # --- optimizer step ---
        scale_grads_and_clip_grad_norm(model, max_norm)
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()
        step += 1
        # --- logging ---
        MetricsSample(step, epoch, loss, grad_norm, lr, mem, tps, mfu)
        # --- validation (at configured intervals) ---
        if step % val_check_interval == 0:
            run_validation()
        # --- checkpoint (at configured intervals) ---
        if step % checkpoint_interval == 0:
            save_checkpoint()
```

- `scale_grads_and_clip_grad_norm()` clips gradients using `clip_grad_norm.max_norm`.
- With context parallelism (`cp_size > 1`), batches are prepared by `make_cp_batch_and_ctx()`.
- `MetricsSample` records `step`, `epoch`, `loss`, `grad_norm`, `lr`, `mem`, `tps`, and `mfu`.
- Validation runs every `step_scheduler.val_check_interval` steps on `validation_dataset`. Checkpoints are saved every `step_scheduler.checkpoint_interval` steps.

Resume training from a saved checkpoint with `restore_from`:

```yaml
restore_from:
  path: /checkpoints/step-500
```

| Problem | Cause | Fix |
|---|---|---|
| Silent config errors | Typo in `_target_` | The class path must be a valid, importable Python callable. Double-check the module path and class name. |
| Training crashes at first step | Inconsistent batch size configuration | Ensure the batch size math is consistent across all parallelism dimensions. |
| New recipe not accessible via CLI | Missing CLI command alias registration | Register the new route in the CLI app so that `automodel <command> <domain>` dispatches to the new recipe script. |
| Shape mismatch at forward pass | Dataset collate function output does not match model input signature | Verify that the collate function returns tensors with the keys and shapes the model expects. |
| OOM during validation | Validation batch size too large or gradients not disabled | Wrap validation in `torch.no_grad()` and, if needed, lower the validation batch size. |
| Checkpoint restore fails | Mismatched model architecture between checkpoint and config | Ensure the model config matches the checkpoint exactly (layer count, hidden dim, vocab size). |
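The following minimal sketch shows one way a config dict carrying `_target_` keys could be turned into objects. It is not NeMo AutoModel's own resolver. `locate` and `instantiate` are hypothetical helpers. The example targets are standard-library classes, so the sketch runs without torch or NeMo installed.

```python
import importlib
from typing import Any


def locate(path: str) -> Any:
    """Resolve a dotted path such as 'torch.optim.AdamW' to a Python object."""
    parts = path.split(".")
    # Import the longest importable module prefix, then walk the remaining
    # attributes, so targets like 'pkg.Class.from_pretrained' also resolve.
    for i in range(len(parts) - 1, 0, -1):
        try:
            obj = importlib.import_module(".".join(parts[:i]))
        except ModuleNotFoundError:
            continue
        for name in parts[i:]:
            obj = getattr(obj, name)
        return obj
    raise ImportError(f"no importable module prefix in _target_ {path!r}")


def instantiate(node: Any) -> Any:
    """Recursively replace dicts carrying '_target_' with constructed objects."""
    if isinstance(node, dict):
        # Build nested nodes first so they arrive as ready-made objects.
        kwargs = {k: instantiate(v) for k, v in node.items() if k != "_target_"}
        if "_target_" in node:
            return locate(node["_target_"])(**kwargs)
        return kwargs
    if isinstance(node, list):
        return [instantiate(item) for item in node]
    return node


if __name__ == "__main__":
    # Standard-library targets stand in for torch/NeMo classes here.
    cfg = {"_target_": "datetime.timedelta", "days": 1, "hours": 2}
    print(instantiate(cfg))  # 1 day, 2:00:00
```

The resolver walks attributes after the longest importable module prefix, so a target can point at a classmethod as well as a class. Arguments that cannot live in YAML, such as an optimizer's parameters, would have to be passed in by the builder.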
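Dotted CLI overrides such as `--optimizer.lr 1e-4` map onto nested config keys. The sketch below shows that mapping and assumes the config is a plain nested dict. `apply_overrides` and `parse_value` are hypothetical names. The real `automodel` parser may type values differently. This sketch uses `ast.literal_eval` and falls back to the raw string.

```python
import ast
from typing import Any, Dict, List


def parse_value(raw: str) -> Any:
    """Turn '1e-4' into 0.0001 and '500' into 500; keep e.g. 'auto' as a string."""
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return raw


def apply_overrides(cfg: Dict[str, Any], argv: List[str]) -> Dict[str, Any]:
    """Apply '--a.b.c value' pairs to a nested dict in place and return it."""
    if len(argv) % 2:
        raise ValueError("overrides must come in '--dotted.key value' pairs")
    for flag, raw in zip(argv[0::2], argv[1::2]):
        if not flag.startswith("--"):
            raise ValueError(f"expected '--dotted.key', got {flag!r}")
        *parents, leaf = flag[2:].split(".")
        node = cfg
        for key in parents:
            node = node.setdefault(key, {})  # create missing sections
        node[leaf] = parse_value(raw)
    return cfg


if __name__ == "__main__":
    cfg = {"optimizer": {"lr": 2.0e-5, "weight_decay": 0.01},
           "distributed": {"tp_size": 1}}
    apply_overrides(cfg, ["--optimizer.lr", "1e-4",
                          "--step_scheduler.max_steps", "500",
                          "--distributed.tp_size", "2"])
    print(cfg)
```

Keys that are not overridden, such as `weight_decay`, keep their YAML values. YAML-style booleans such as `true` stay strings in this sketch, because `ast.literal_eval` only understands Python literals.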
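Before launching, you can check the batch size math and the validation/checkpoint cadence by hand. The helper below is hypothetical and rests on two assumptions the document does not state. First, `dp_size: auto` resolves to `world_size // (tp_size * cp_size)`. Second, `dataloader.batch_size` counts samples per data-parallel rank per micro-batch. Under these assumptions, each optimizer step consumes `batch_size * grad_accumulation_steps * dp_size` samples.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunPlan:
    dp_size: int
    samples_per_step: int
    total_samples: int
    val_steps: List[int]
    ckpt_steps: List[int]


def plan_run(world_size: int, tp_size: int, cp_size: int, batch_size: int,
             grad_accumulation_steps: int, max_steps: int,
             val_check_interval: int, checkpoint_interval: int) -> RunPlan:
    """Hypothetical pre-flight check of batch math and step cadence."""
    model_parallel = tp_size * cp_size
    if world_size % model_parallel:
        raise ValueError(
            f"world_size={world_size} is not divisible by tp_size*cp_size={model_parallel}")
    # Assumption: dp_size 'auto' = ranks left over after TP and CP.
    dp_size = world_size // model_parallel
    # Assumption: batch_size is per DP rank per micro-batch.
    samples_per_step = batch_size * grad_accumulation_steps * dp_size
    steps = range(1, max_steps + 1)
    return RunPlan(
        dp_size=dp_size,
        samples_per_step=samples_per_step,
        total_samples=samples_per_step * max_steps,
        val_steps=[s for s in steps if s % val_check_interval == 0],
        ckpt_steps=[s for s in steps if s % checkpoint_interval == 0],
    )


if __name__ == "__main__":
    # Values from the example config, on a hypothetical 8-GPU node with tp_size=2.
    plan = plan_run(world_size=8, tp_size=2, cp_size=1, batch_size=4,
                    grad_accumulation_steps=4, max_steps=1000,
                    val_check_interval=100, checkpoint_interval=500)
    print(plan.dp_size, plan.samples_per_step, plan.ckpt_steps)  # 4 64 [500, 1000]
```

With the example config, validation would run 10 times and checkpoints would be written at steps 500 and 1000.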
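The training-loop outline can also be run end to end in plain PyTorch. This is a single-process sketch on a toy regression problem, not the recipe's implementation. It has no parallelism. `torch.nn.utils.clip_grad_norm_` stands in for `scale_grads_and_clip_grad_norm()`, an in-memory dict stands in for checkpoint files, and a simple warmup schedule stands in for the configured LR scheduler. The sketch demonstrates three things:

- gradient accumulation, with each micro-batch loss divided by the number of micro-batches so the summed gradients average them
- gradient clipping before the optimizer step
- validation under `torch.no_grad()`

```python
import torch
from torch import nn


def train(max_steps=40, grad_accumulation_steps=4, val_check_interval=10,
          checkpoint_interval=20, max_norm=1.0, warmup_steps=5, seed=42):
    torch.manual_seed(seed)
    # Toy linear-regression data standing in for a real dataset.
    x = torch.randn(256, 8)
    y = x @ torch.randn(8, 1)
    x_val, y_val = x[:32], y[:32]

    model = nn.Linear(8, 1)
    loss_fn = nn.MSELoss()
    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-2, weight_decay=0.01)
    # Linear warmup, then constant LR (a simplification of warmup + cosine).
    lr_scheduler = torch.optim.lr_scheduler.LambdaLR(
        optimizer, lambda s: min(1.0, (s + 1) / warmup_steps))

    val_losses, checkpoints = [], {}
    for step in range(1, max_steps + 1):
        # Gradient accumulation: several micro-batches per optimizer step.
        for _ in range(grad_accumulation_steps):
            idx = torch.randint(0, x.shape[0], (16,))
            loss = loss_fn(model(x[idx]), y[idx]) / grad_accumulation_steps
            loss.backward()  # .grad sums across micro-batches
        nn.utils.clip_grad_norm_(model.parameters(), max_norm)
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()

        if step % val_check_interval == 0:
            model.eval()
            with torch.no_grad():  # no autograd graph, so less memory
                val_losses.append(loss_fn(model(x_val), y_val).item())
            model.train()
        if step % checkpoint_interval == 0:
            # In-memory stand-in for a checkpoint writer.
            checkpoints[step] = {k: v.clone() for k, v in model.state_dict().items()}
    return val_losses, checkpoints


if __name__ == "__main__":
    val_losses, checkpoints = train()
    print([round(v, 4) for v in val_losses], sorted(checkpoints))
```

Dividing each micro-batch loss by `grad_accumulation_steps` keeps the effective gradient scale independent of how many micro-batches make up a step. The recipe's own scaling logic may differ.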