Depth Net Stereo
Stereo depth estimation using the FoundationStereo architecture. Predicts disparity maps from stereo image pairs for 3D reconstruction.
Uses pretrained Depth Anything v2 and EdgeNeXt encoders. Set `model.stereo_backbone.depth_anything_v2_pretrained_path` and `model.stereo_backbone.edgenext_pretrained_path`.
The mono and stereo skills both invoke the unified TAO
CLI inside the container; the mono/stereo family is selected via
(e.g.,
).
For TAO Deploy TensorRT actions (
, TensorRT
, and TensorRT
), read
references/tao-deploy-foundation-stereo.md
first. The deploy spec template lives in this skill's
references/spec_template_deploy.yaml
.
Train Action Policy
This model is AutoML-enabled at the model layer. Before handling any train-stage request, read
references/skill_info.yaml
and resolve the run override from either an explicit
value or the user's workflow request. Treat phrases like "turn off AutoML", "disable AutoML", "no HPO", or "plain training" as
for this run only; otherwise default to
. When
,
, and both
schemas/train.schema.json
and
references/spec_template_train.yaml
are packaged, route the train action through
tao-skill-bank:tao-run-automl
by default with this model's
. Preserve workflow/application overrides for datasets, specs, output directories, GPU/platform settings, parent checkpoints, and
. Use direct model training only when
or the packaged train schema/template is missing; in the missing-schema case, report that AutoML is enabled but not runnable for this model until schemas are generated.
Non-train actions such as
,
,
, and deploy flows stay in this model skill. The per-run
override does not change model metadata.
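The routing rule above can be sketched as a small decision function. This is a minimal sketch, not the skill's implementation: the names `resolve_run_override`, `route_train_action`, and `Route` are hypothetical, and it assumes that an explicit value wins over the request text and that the per-run default is "AutoML on".

```python
from dataclasses import dataclass

# Phrases the policy treats as a per-run request to skip AutoML.
OPT_OUT_PHRASES = ("turn off automl", "disable automl", "no hpo", "plain training")


def resolve_run_override(request_text, explicit=None):
    """Return True when AutoML should be used for this run.

    Assumption: an explicit value takes precedence over the request text,
    and the default (no opt-out phrase found) is True.
    """
    if explicit is not None:
        return bool(explicit)
    lowered = request_text.lower()
    return not any(phrase in lowered for phrase in OPT_OUT_PHRASES)


@dataclass
class Route:
    target: str      # "automl" or "direct"
    note: str = ""   # message to report to the user, if any


def route_train_action(model_automl_enabled, run_override, has_schema, has_template):
    """Pick the train route from model metadata, run override, and packaged files."""
    if model_automl_enabled and run_override:
        if has_schema and has_template:
            return Route("automl")
        # Missing schema/template: fall back to direct training and say why.
        return Route("direct", "AutoML is enabled but not runnable until schemas are generated")
    return Route("direct")
```

The override is computed per call and never written back, which matches the rule that it does not change model metadata.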
Workflow
Prerequisites — data accessibility
Your dataset (left + right images + GT disparity) must be reachable from inside the container:
- SDK runner: place files at the S3 paths the runner resolves (the / placeholders shown in Typical Spec Overrides). The runner handles S3 → container-path mounting transparently.
- Direct (e.g. local testing): mount the host dataset root read-only at the same in-container path:
docker run ... -v <host_data_root>:<host_data_root>:ro <container> ...
The same accessibility requirement applies to the
written by all actions.
Step 1 — Annotation file
Per-line annotation file referenced by `data_sources[*].data_file`:
| Columns | Format | Use |
|---|---|---|
| 2 | | Stereo inference (no GT) |
| 3 | `<left> <right> <disparity>` | Stereo with GT |
| 4 | `<left> <right> <disparity> <occlusion_mask>` | Stereo with GT and occlusion mask |
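Before pointing a spec at an annotation file, it can help to confirm which of the three layouts it uses. The helper below is a hypothetical sketch, not part of the TAO CLI; it assumes columns are whitespace-separated, paths contain no spaces, and every line in a file uses the same column count.

```python
# Column count -> meaning, taken from the annotation table.
COLUMN_KINDS = {
    2: "stereo inference (no GT)",
    3: "stereo with GT",
    4: "stereo with GT and occlusion mask",
}


def annotation_column_count(lines):
    """Return the single column count shared by all non-blank lines."""
    counts = set()
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) not in COLUMN_KINDS:
            raise ValueError(
                f"line {lineno}: expected 2, 3, or 4 columns, got {len(fields)}"
            )
        counts.add(len(fields))
    if len(counts) != 1:
        raise ValueError(f"empty or mixed annotation file: column counts {sorted(counts)}")
    return counts.pop()


# Usage with an in-memory example; a real file would be read with open(path).
n = annotation_column_count(["a/left.png a/right.png a/disp.png"])
print(n, "->", COLUMN_KINDS[n])
```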
If you already have one, point to it. Otherwise generate via
:
depth_net convert -e <convert_spec.yaml>
with a convert spec such as:
data_root: <directory whose immediate children are scene folders that contain your image+depth files; convert walks data_root recursively but expects per-scene subdirectories at one level below>
image_dir_pattern: [<substring matching left image paths>]
right_dir_pattern: [<substring matching right image paths>]
depth_dir_pattern: [<substring matching GT disparity paths>]
nocc_dir_pattern: [] # optional, occlusion mask paths
image_extension: '.png' # always include the leading dot
depth_extension: '.png' # form must match image_extension (the swap is a substring replace)
nocc_extension: ''
split_ratio: 0.0 # 0.0/1.0 = test-only; 0.8 = 80/20 train+val
walks
recursively, selects paths whose path-string contains
all substrings in
(AND-filter), then derives right / depth / mask paths by replacing
with the corresponding pattern's first element plus extension swap. Inspect your dataset's directory layout and identify the substrings distinguishing left, right, and GT (e.g.
vs
vs
for Middlebury).
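The selection and path-derivation behavior described above can be illustrated with plain string operations. This is a sketch of the described behavior under assumptions, not the converter's source: it assumes the substring being replaced is the first element of `image_dir_pattern`, and the directory names `left`, `right`, and `disp` are a made-up layout.

```python
def matches_all(path, patterns):
    """AND-filter: keep a path only if it contains every substring."""
    return all(p in path for p in patterns)


def derive_path(left_path, image_pattern, target_pattern, image_ext, target_ext):
    """Derive a right/depth/mask path from a left-image path by substring replace."""
    derived = left_path.replace(image_pattern, target_pattern)
    if image_ext and target_ext:
        # The extension swap is also a substring replace, which is why the
        # two extensions need the same form (leading dot on both).
        derived = derived.replace(image_ext, target_ext)
    return derived


def build_rows(paths, image_patterns, right_patterns, depth_patterns,
               image_ext=".png", depth_ext=".png"):
    """Return 3-column annotation rows for every path that passes the filter."""
    rows = []
    for left in sorted(paths):
        if not matches_all(left, image_patterns):
            continue
        right = derive_path(left, image_patterns[0], right_patterns[0], image_ext, image_ext)
        depth = derive_path(left, image_patterns[0], depth_patterns[0], image_ext, depth_ext)
        rows.append(f"{left} {right} {depth}")
    return rows


# Hypothetical layout: <scene>/left, <scene>/right, <scene>/disp
paths = ["scene1/left/0001.png", "scene1/right/0001.png", "scene1/disp/0001.png"]
for row in build_rows(paths, ["left"], ["right"], ["disp"]):
    print(row)
```

Because the replace is substring-based, a pattern that also occurs elsewhere in the path (for example in a parent directory name) would be rewritten too, so the chosen substrings should be specific to the left-image directory.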
Step 2 — Pair and based on your data
Prefer the dataset-specific class when your layout matches a supported one — it applies class-specific path conventions, evaluation crops, and (where applicable) occlusion-mask handling. Fall back to
only for layouts that do not match any registered class.
| Data category | | |
|---|---|---|
| Middlebury data | | |
| KITTI data | | |
| ETH3D data | | |
| FSD synthetic data | | |
| IsaacReal synthetic data | | |
| Crestereo synthetic data | | |
| Other / non-canonical layout | | |
See
Training Requirements → Formats for the full registered-class list. The same
value applies across train and evaluate actions (all of which use 3-column or 4-column annotations with GT disparity). The deploy-side
action follows the same rule — see
references/tao-deploy-foundation-stereo.md
. For inference with 2-column annotations (left + right, no GT), use
dataset_name: GenericDataset
regardless of data layout — the dataset-specific classes (
/
/
/
/
/
) require 3-column input and reject 2-column annotations at the dataloader level. For inference with 3-column annotations (left + right + GT), the dataset-specific class is fine.
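The column-count rule in this step can be written down as a small guard. A minimal sketch with hypothetical names: `choose_dataset_name` is not a TAO function, and `step2_class` stands for whichever class name Step 2 selected for your layout.

```python
GT_ACTIONS = {"train", "evaluate"}


def choose_dataset_name(action, num_columns, step2_class):
    """Pick the dataset_name for one data source entry."""
    if num_columns == 2:
        if action in GT_ACTIONS:
            # Train and evaluate need GT disparity, so 2-column files do not apply.
            raise ValueError(f"{action} requires 3- or 4-column annotations with GT")
        # No-GT inference: the generic class regardless of data layout.
        return "GenericDataset"
    if num_columns in (3, 4):
        return step2_class
    raise ValueError(f"unsupported column count: {num_columns}")


# "MyLayoutClass" is a placeholder, not a registered class name.
print(choose_dataset_name("inference", 2, "MyLayoutClass"))
print(choose_dataset_name("evaluate", 3, "MyLayoutClass"))
```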
Step 3 — Write spec yaml from Typical Spec Overrides
Copy the action block from
references/foundation-stereo-spec-overrides.md
(per-action
, mandatory data sources). Replace:
- from Step 2 (typically )
- `dataset.<...>.data_sources[*].dataset_name` from Step 2
- `dataset.<...>.data_sources[*].data_file` with the path from Step 1
- For deploy-side : enforce
dataset.test_dataset.batch_size: 1
(see references/tao-deploy-foundation-stereo.md
).
Shape consistency: the
in
dataset.test_dataset.augmentation.crop_size
should match
/
so the trained-model evaluator and the deploy-side TensorRT evaluator operate at the same shape — see
references/foundation-stereo-troubleshooting.md
.
Step 4 — Run
docker run --gpus 'device=0' --shm-size 16G --ipc=host \
--user $(id -u):$(id -g) \
-v <data_root>:<data_root>:ro \
-v <output_dir>:<output_dir> \
<container> \
depth_net <action> -e <spec.yaml>
Without
the container writes outputs as
, blocking host-side cleanup / retry.
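When the Step 4 command is launched from a script, building it as an argument list avoids shell-quoting mistakes. The sketch below mirrors the flags shown in the command; the function name and parameters are hypothetical, and the container image name is whatever you already use.

```python
import shlex


def build_docker_command(container, action, spec_path, data_root, output_dir,
                         uid, gid, gpu="0"):
    """Assemble the docker argv for one depth_net action."""
    return [
        "docker", "run",
        "--gpus", f"device={gpu}",
        "--shm-size", "16G",
        "--ipc=host",
        # Run as the calling user so output files are owned by that user.
        "--user", f"{uid}:{gid}",
        # Dataset root mounted read-only at the same in-container path.
        "-v", f"{data_root}:{data_root}:ro",
        # Output directory mounted read-write at the same path.
        "-v", f"{output_dir}:{output_dir}",
        container,
        "depth_net", action, "-e", spec_path,
    ]


cmd = build_docker_command(
    container="<container>", action="train", spec_path="/specs/train.yaml",
    data_root="/data/stereo", output_dir="/results/run1", uid=1000, gid=1000,
)
print(shlex.join(cmd))  # shell-quoted form, for logging or copy-paste
```

The list can be passed directly to `subprocess.run(cmd, check=True)`; on Linux, `os.getuid()` and `os.getgid()` supply the values that `$(id -u):$(id -g)` expands to.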
Step 5 — Verify
- Container exit code 0
- block populated
- For : inspect per-step directly (the entrypoint reports even when loss is NaN)
- For : rely on / / / / / (the evaluator also emits / / which are non-meaningful for stereo — see
references/foundation-stereo-parameters.md
)
- For : artifacts under
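Because a run can finish cleanly while the loss is NaN, a scripted check over the training log is a useful complement to the exit code. This is a sketch under assumptions: the `loss=<value>` / `loss: <value>` log format and the function name are hypothetical, so adjust the pattern to the lines your run actually prints.

```python
import math
import re

# Assumed log format: "... loss=<number>" or "... loss: <number>".
LOSS_RE = re.compile(r"loss\s*[=:]\s*(\S+)", re.IGNORECASE)


def find_bad_losses(log_lines):
    """Return (line_number, value) for every NaN or infinite loss value."""
    bad = []
    for lineno, line in enumerate(log_lines, start=1):
        match = LOSS_RE.search(line)
        if not match:
            continue
        try:
            value = float(match.group(1).rstrip(","))
        except ValueError:
            continue  # token after "loss" was not a number
        if not math.isfinite(value):
            bad.append((lineno, value))
    return bad


log = ["step 1 loss=0.52", "step 2 loss=nan", "finished"]
print(find_bad_losses(log))
```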
For TAO Deploy TensorRT actions (
, TensorRT
, and TensorRT
), read
references/tao-deploy-foundation-stereo.md
first. Deploy spec templates live in this skill's
folder with the
spec_template_deploy_*.yaml
prefix.
Training Requirements
- Valid values for stereo (case-insensitive): , , , , , ,
- Monitoring metric: val/loss
Per-Action Dataset Requirements
| Action | Spec Key | Source | Files | List? |
|---|---|---|---|---|
| evaluate | dataset.test_dataset.data_sources | eval_dataset | data_file: annotations.txt + dataset_name | Yes |
| inference | dataset.infer_dataset.data_sources | inference_dataset | data_file: annotations.txt + dataset_name | Yes |
| quantize | dataset.train_dataset.data_sources | train_datasets | data_file: annotations.txt + dataset_name | Yes |
| quantize | dataset.val_dataset.data_sources | eval_dataset | data_file: annotations.txt + dataset_name | Yes |
| quantize | dataset.quant_calibration_dataset.images_dir | train_datasets | images.tar.gz | No |
| train | dataset.train_dataset.data_sources | train_datasets | data_file: annotations.txt + dataset_name | Yes |
| train | dataset.val_dataset.data_sources | eval_dataset | data_file: annotations.txt + dataset_name | Yes |
Typical Spec Overrides
Data source overrides are
mandatory for every action — the agent MUST construct data source paths from the Per-Action Dataset Requirements table above and include them in
. Each
entry is a dict with
two mandatory fields:
and
.
See
references/foundation-stereo-spec-overrides.md
for the full per-action
blocks (train, evaluate, export, gen_trt_engine, inference, quantize) with
/
placeholders.
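As an illustration of the data source structure, the sketch below builds the nested override for the train, evaluate, and inference rows of the Per-Action Dataset Requirements table. It assumes the two mandatory fields are `data_file` and `dataset_name` as listed in that table; the function name and the `annotation_files` argument are hypothetical, and quantize is left out because it also needs a calibration image directory.

```python
# action -> [(dataset section in the spec, which annotation file feeds it)]
ACTION_SECTIONS = {
    "train": [("train_dataset", "train"), ("val_dataset", "eval")],
    "evaluate": [("test_dataset", "eval")],
    "inference": [("infer_dataset", "inference")],
}


def build_dataset_overrides(action, annotation_files, dataset_name):
    """Return {'dataset': {...}} with one data_sources entry per section."""
    dataset = {}
    for section, source in ACTION_SECTIONS[action]:
        entry = {
            "data_file": annotation_files[source],  # annotation file path
            "dataset_name": dataset_name,           # class chosen in Step 2
        }
        # data_sources is a list, even when it holds a single entry.
        dataset[section] = {"data_sources": [entry]}
    return {"dataset": dataset}


overrides = build_dataset_overrides(
    "train",
    {"train": "/data/train_annotations.txt", "eval": "/data/val_annotations.txt"},
    "GenericDataset",
)
print(overrides)
```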
Eval Dataset
Optional. Val dataset configured via
dataset.val_dataset.data_sources
(each entry needs
and
).
Important Parameters
Key defaults:
=
(only selectable type);
(top-level, not under
) schema default
but
FS small NGC ckpt requires , override explicitly;
default 416;
default 1e-4;
fp32 (recommended) or fp16 (no bf16);
default
. The
field name is
, not
.
See
references/foundation-stereo-parameters.md
for the full parameter glossary (all
,
,
,
fields with defaults and ranges) and the
Evaluation Metrics reference (which
/
/
/
to trust and why
/
/
are non-meaningful for stereo).
Multi-GPU / Multi-Node
Launch method: Lightning-managed (single
process, Lightning spawns workers).
| Spec Key | Description | Default |
|---|---|---|
| | Number of GPUs | 1 |
| | GPU device indices | [0] |
| | Number of nodes | 1 |
| `train.distributed_strategy` | or | |
Same DDP/FSDP behavior as depth-net-mono. Multi-node requires
,
,
,
env vars.
Export / TRT Defaults
TRT data types FP32 / FP16. Static-shape ONNX (
) and batch-only dynamic ONNX (
) both support
; height and width are always pinned to the trace shape (H/W-dynamic engines are not supported — build separate engines per (H, W)). For the NGC release (576×960), set
,
,
.
See
references/foundation-stereo-export-trt-hardware.md
for the full export / TRT defaults (the opset-vs-
pairing rules, determinism notes,
GPU-memory thresholds) and the
Hardware requirements. See
references/tao-deploy-foundation-stereo.md
for the three supported deploy paths and the validation table.
Full TAO Deploy reference: tao-deploy-foundation-stereo.
Error Patterns
Common issues: disparity overflow (reduce
); missing pretrained paths (set both
model.stereo_backbone.depth_anything_v2_pretrained_path
and
model.stereo_backbone.edgenext_pretrained_path
);
Key 'encoder' not in 'StereoBackBone'
(
is top-level
);
Key 'dataset_name' is not in struct
(each
entry needs both
and
);
bash: exec: depth_net_stereo: not found
(entrypoint is
, no suffix).
See
references/foundation-stereo-troubleshooting.md
for the full error patterns plus the pyt-vs-deploy
discussion (the pyt
path runs at native image resolution and ignores
, with the Middlebury resolution guidance) and the
Shape consistency rule.
Spec Param / Parent Model Inference
Model-specific inference mappings belong in MD, not in
. Generated runners read these mappings and apply them with SDK helpers before
(mirrors the old microservices
flow). For
/
, pass the upstream train/export/AutoML child job id as
; the SDK lists the parent result folder, filters checkpoint artifacts, and returns the selected model file or folder. Do not add these mappings back to
and do not patch generated runner scripts to guess checkpoint paths.
See `references/foundation-stereo-spec-param-inference.md` for the full per-action inference-mapping table (train / evaluate / inference / export / gen_trt_engine / quantize, including the train pretrained-path link/destination and resume-checkpoint mappings).