Real-time stereo depth estimation using FastFoundationStereo (FFS), the distilled bp2 commercial variant of FoundationStereo. Predicts disparity maps from stereo image pairs with ~10× lower latency than full FoundationStereo. Use when training, evaluating, exporting, or running inference for a TAO FastFoundationStereo (FFS) model. Trigger phrases include "train fast stereo", "real-time stereo disparity", "FastFoundationStereo", "distilled stereo depth".
npx skill4agent add promptingcompany/nv-skills tao-train-fast-foundation-stereo

`depth_net`, `model.model_type: FastFoundationStereo`, `FoundationStereo`, `depth-net-stereo`, `gen_trt_engine`, `evaluate`, `inference`, `references/tao-deploy-fast-foundation-stereo.md`, `references/spec_template_deploy.yaml`, `references/skill_info.yaml`, `automl_policy`, `automl_policy: off`, `auto`, `automl_policy: auto`, `automl_enabled: true`, `schemas/train.schema.json`, `references/spec_template_train.yaml`, `tao-skill-bank:tao-run-automl`, `skill_dir`, `automl_policy`, `automl_policy: off`, `evaluate`, `inference`, `export`, `automl_policy`, `model_best_bp2_serialize.pth`, `train`, `inference`, `evaluate`, `export`, `gen_trt_engine`, `train.pretrained_model_path`, `S3_TRAIN`, `S3_EVAL`, `docker run`, `docker run ... -v <host_data_root>:<host_data_root>:ro <container> ...`, `<output_dir>`, `data_sources[*].data_file`, `depth-net-stereo`

| Columns | Format | Use |
|---|---|---|
| 2 | | Stereo inference (no GT) |
| 3 | | Stereo with GT |
| 4 | | Stereo with GT and occlusion mask |
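
As a rough illustration of the column-count convention in the table above, the sketch below classifies one annotation line by how many fields it carries. It assumes the fields are whitespace-separated, which this document does not confirm, and `classify_annotation_line` is a hypothetical helper rather than part of TAO.

```python
def classify_annotation_line(line: str) -> str:
    """Map one data_file line to its use, based only on the column count."""
    uses = {
        2: "stereo inference (no GT)",
        3: "stereo with GT",
        4: "stereo with GT and occlusion mask",
    }
    fields = line.split()  # assumption: whitespace-separated fields
    if len(fields) not in uses:
        raise ValueError(f"expected 2, 3, or 4 columns, got {len(fields)}")
    return uses[len(fields)]
```

A check like this can catch a malformed annotation file before a long job starts, for example a GT column missing from a file meant for `evaluate`.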
`depth_net convert`, `depth-net-stereo`, `convert_spec.yaml`, `model_type`, `dataset_name`, `model_type: FastFoundationStereo`, `dataset_name`, `GenericDataset`

| Data category | | |
|---|---|---|
| Middlebury | | |
| KITTI | | |
| ETH3D | | |
| FSD synthetic | | |
| IsaacReal synthetic | | |
| Crestereo synthetic | | |
| Other / non-canonical | | |
`dataset_name: GenericDataset`

model:
  model_type: FastFoundationStereo
  encoder: vitl
  hidden_dims: [128]              # 1-layer GRU; NOT [128,128,128]
  n_gru_layers: 1                 # bp2 single-GRU
  corr_radius: 4
  corr_levels: 2
  n_downsample: 2
  valid_iters: 8
  max_disparity: 192              # bp2 commercial; NOT 416 (full FS default)
  volume_dim: 28                  # bp2 ckpt invariant; NOT 32 (full FS default)
  mixed_precision: false          # see references/parameters.md
  gwc_feature_normalize: true     # see references/parameters.md
  # 15 bp2 distilled width overrides: copy as-is
  motion_encoder_widths: [56, 96, 16, 12]
  motion_encoder_final: 48
  gru_hidden: 60
  gru_gating_conv_widths: [100, 168]
  disp_head_input_dim: 60
  disp_head_intermediate: 36
  disp_head_pwconv1_widths: [212, 244]
  mask_widths: [32, 16]
  stem_2_widths: [12, 16]
  spx_2_gru_widths: [16, 12, 16, 24]
  spx_gru_out: 9
  classifier_mid: 14
  cnet_conv04_widths: [60, 48]
  cam_mid_channels: 8
  cost_agg_conv_patch_padding: [0, 0, 0]

`references/spec_template_*.yaml`, `references/spec-overrides.md`, `FFS_MODEL_BLOCK`, `model.model_type: FastFoundationStereo`, `dataset.<...>.data_sources[*].dataset_name`, `dataset.<...>.data_sources[*].data_file`, `<action>.checkpoint`, `train.pretrained_model_path`, `<train.results_dir>/<task>/dn_model_latest.pth`, `ModelCheckpoint`, `train.results_dir: /workspace/results/finetune/train`, `/workspace/results/finetune/train/train/dn_model_latest.pth`, `<action>.checkpoint`, `parent_job_id`, `references/parent-model-inference.md`, `crop_size`, `dataset.test_dataset.augmentation.crop_size`, `export.input_height`, `input_width`, `references/tao-deploy-fast-foundation-stereo.md`

docker run --gpus 'device=0' --shm-size 16G --ipc=host \
--user $(id -u):$(id -g) \
-v <data_root>:<data_root>:ro \
-v <output_dir>:<output_dir> \
-v <bp2_ckpt_dir>:<bp2_ckpt_dir>:ro \
<container> \
depth_net <action> -e <spec.yaml>

`--user $(id -u):$(id -g)`, `nobody:nogroup`, `__pycache__`, `.pyc`, `references/troubleshooting.md`, `status.json`, `kpi`, `train`, `train_loss`, `Execution status: PASS`, `evaluate`, `epe`, `bp1`, `bp2`, `bp3`, `d1`, `rmse`, `abs_rel`, `sq_rel`, `rmse_log`, `inference`, `results_dir`, `evaluate`, `kpi.val/epe`, `kpi.val/bp1`, `val/`, `evaluate`, `kpi.epe`, `kpi.bp1`, `val/`, `status.json`, `gen_trt_engine`, `evaluate`

train (optional) → finetuned ckpt
evaluate (pyt) → PyT eager EPE / bp on val GT
inference (pyt) → PyT eager disparity samples (visual sanity)
export → static fp32 ONNX (recommended at 480×736 or 320×736)
gen_trt_engine → fp16 TRT engine on static ONNX path
inference (deploy) → TRT disparity samples
evaluate (deploy) → TRT EPE / bp drift vs PyT eager fp32

`train`, `export`, `dataset_name`, `data_sources`, `FSD`, `IsaacRealDataset`, `Crestereo`, `Middlebury`, `Eth3d`, `Kitti`, `GenericDataset`

| Action | Spec Key | Source | Files | List? |
|---|---|---|---|---|
| evaluate | dataset.test_dataset.data_sources | eval_dataset | data_file: annotations.txt + dataset_name | Yes |
| inference | dataset.infer_dataset.data_sources | inference_dataset | data_file: annotations.txt + dataset_name | Yes |
| train | dataset.train_dataset.data_sources | train_datasets | data_file: annotations.txt + dataset_name | Yes |
| train | dataset.val_dataset.data_sources | eval_dataset | data_file: annotations.txt + dataset_name | Yes |
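
To make the mapping in the table above concrete, here is a minimal sketch that expands dotted spec keys into the nested structure a YAML spec would carry, with each `data_sources` value as a list of `data_file` / `dataset_name` entries. `build_dataset_spec` and the example path are hypothetical, and the exact schema TAO accepts should be checked against the spec templates.

```python
# Dotted spec keys per action, copied from the table above.
ACTION_SPEC_KEYS = {
    "evaluate": ["dataset.test_dataset.data_sources"],
    "inference": ["dataset.infer_dataset.data_sources"],
    "train": [
        "dataset.train_dataset.data_sources",
        "dataset.val_dataset.data_sources",
    ],
}


def build_dataset_spec(sources):
    """Expand {dotted_key: [(data_file, dataset_name), ...]} into a nested dict."""
    known = {key for keys in ACTION_SPEC_KEYS.values() for key in keys}
    spec = {}
    for dotted_key, entries in sources.items():
        if dotted_key not in known:
            raise KeyError(f"unknown spec key: {dotted_key}")
        *parents, leaf = dotted_key.split(".")
        node = spec
        for name in parents:
            node = node.setdefault(name, {})  # shared parents are merged
        node[leaf] = [
            {"data_file": data_file, "dataset_name": dataset_name}
            for data_file, dataset_name in entries
        ]
    return spec


# Hypothetical path, for illustration only.
example = build_dataset_spec(
    {"dataset.test_dataset.data_sources": [("/data/annotations.txt", "GenericDataset")]}
)
```

Because every row is marked as a list, the helper always emits a list even for a single source, so an entry can be added later without changing the shape of the spec.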
`data_sources`, `data_file`, `dataset_name`, `model.*`, `references/spec-overrides.md`, `FFS_MODEL_BLOCK`, `dataset.val_dataset.data_sources`, `data_file`, `dataset_name`, `references/parameters.md`, `model.*`, `dataset.*`, `train.*`, `max_disparity: 192`, `gwc_feature_normalize: true`, `mixed_precision: false`, `volume_dim: 28`, `valid_iters`, `save_raw_pfm`, `epe`, `bp1`, `bp2`, `bp3`, `d1`, `rmse`, `abs_rel`, `sq_rel`, `rmse_log`, `export`, `model.mixed_precision`, `gen_trt_engine`, `gen_trt_engine.tensorrt.data_type`, `fp16`, `fp32`, `fp16`, `references/export-trt-defaults.md`, `export.batch_size`, `export.dynamic_hw`, `references/tao-deploy-fast-foundation-stereo.md`, `references/troubleshooting.md`, `shape mismatch`, `gwc_feature_normalize`, `dynamic_hw: true`, `Key 'encoder' not in 'StereoBackBone'`, `dataset_name`, `data_sources`, `max_disparity: 192`, `depth_net_stereo: not found`, `crop_size`, `Failed to import SAM3`, `config.json`, `create_job()`, `references/parent-model-inference.md`, `parent_model`, `parent_model_folder`, `parent_job_id`, `<action>.checkpoint`
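
Several of the values named above (`max_disparity: 192`, `volume_dim: 28`, `gwc_feature_normalize: true`, `mixed_precision: false`) also appear as pinned settings in the model block, alongside the single-GRU settings. The sketch below is one way to check a loaded `model` section against those values before launching a job. `check_bp2_model_block` is a hypothetical helper, and treating all seven values as hard requirements is an assumption; `references/parameters.md` is the authority on which of them may vary.

```python
# Expected values, copied from the model block in this document.
BP2_EXPECTED = {
    "model_type": "FastFoundationStereo",
    "hidden_dims": [128],
    "n_gru_layers": 1,
    "max_disparity": 192,
    "volume_dim": 28,
    "mixed_precision": False,
    "gwc_feature_normalize": True,
}


def check_bp2_model_block(model: dict) -> list:
    """Return one message per key that is missing or differs from BP2_EXPECTED."""
    problems = []
    for key, expected in BP2_EXPECTED.items():
        actual = model.get(key, "<missing>")
        if actual != expected:
            problems.append(f"model.{key}: expected {expected!r}, got {actual!r}")
    return problems
```

Running this on a spec that still carries the full FoundationStereo defaults (`max_disparity: 416`, `volume_dim: 32`) would report both keys, the kind of mismatch a `shape mismatch` error at checkpoint load may otherwise point to only indirectly.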