tao-train-depth-anything-v2
Depth Net Mono
Monocular depth estimation using Metric Depth Anything v2 or Relative Depth Anything architectures. Predicts per-pixel depth from single RGB images.
Pretrained checkpoint loading varies by model variant and use case; see the Pretrained checkpoint loading use-case matrix in `references/parameters.md`.

The mono and stereo skills both invoke the unified TAO `depth_net` CLI inside the container; the mono/stereo family is selected via `model.model_type` (full parameter glossary in `references/parameters.md`).

For TAO Deploy TensorRT actions (`gen_trt_engine`, TensorRT `evaluate`, and TensorRT `inference`), read `references/tao-deploy-depth-anything-v2.md` first. The deploy spec template lives in this skill's `references/spec_template_deploy.yaml`.
Train Action Policy
This model is AutoML-enabled at the model layer. Before handling any train-stage request, read `references/skill_info.yaml` and resolve the `automl_policy` run override from either an explicit value or the user's workflow request. Treat phrases like "turn off AutoML", "disable AutoML", "no HPO", or "plain training" as `automl_policy: off` for this run only; otherwise default to `auto`. When `automl_policy: auto`, `automl_enabled: true`, and both `schemas/train.schema.json` and `references/spec_template_train.yaml` are packaged, route the train action through `tao-skill-bank:tao-run-automl` by default with this model's `skill_dir`. Preserve workflow/application overrides for datasets, specs, output directories, GPU/platform settings, parent checkpoints, and `automl_policy`. Use direct model training only when `automl_policy: off` or the packaged train schema/template is missing; in the missing-schema case, report that AutoML is enabled but not runnable for this model until the schemas are generated.

Non-train actions such as `evaluate`, `inference`, `export`, and deploy flows stay in this model skill. The per-run `automl_policy` override does not change model metadata.
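The routing rules above can be condensed into a small decision function. This is a minimal sketch: `resolve_automl_policy` and `route_train` are hypothetical names, not TAO or tao-skill-bank APIs, and checking whether the schema and template files are actually packaged is left to the caller.

```python
from typing import Optional, Tuple

# Hypothetical helpers that restate the policy above; not TAO or tao-skill-bank APIs.
OFF_PHRASES = ("turn off automl", "disable automl", "no hpo", "plain training")


def resolve_automl_policy(explicit: Optional[str], request: str) -> str:
    """Per-run policy: an explicit value wins, then phrase matching, else 'auto'."""
    if explicit in ("off", "auto"):
        return explicit
    text = request.lower()
    return "off" if any(phrase in text for phrase in OFF_PHRASES) else "auto"


def route_train(policy: str, automl_enabled: bool, schema_packaged: bool,
                template_packaged: bool) -> Tuple[str, Optional[str]]:
    """Return (route, note_for_user); model metadata is never modified here."""
    if policy == "off" or not automl_enabled:
        return "direct", None
    if schema_packaged and template_packaged:
        return "tao-skill-bank:tao-run-automl", None
    # Missing schema/template: fall back to direct training and say why.
    return "direct", ("AutoML is enabled but not runnable for this model "
                      "until schemas are generated")
```

Returning the user-facing note together with the route keeps the "report it" requirement from being forgotten in the missing-schema case.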
Workflow
Prerequisites: data accessibility

Your dataset (RGB images + GT depth files) must be reachable from inside the container:

- SDK runner: place files at the S3 paths the runner resolves (the `S3_TRAIN`/`S3_EVAL` placeholders shown in the spec overrides). The runner handles S3 → container-path mounting transparently.
- Direct `docker run` (e.g. local testing): mount the host dataset root read-only at the same in-container path:

```bash
docker run ... -v <host_data_root>:<host_data_root>:ro <container> ...
```

The same accessibility requirement applies to the `<output_dir>` written by all actions.
Step 1: Annotation file
Per-line annotation file referenced by `data_sources[*].data_file`:

| Columns | Format | Use |
|---|---|---|
| 1 | RGB image path | Mono inference (no GT) |
| 2 | RGB image path + GT depth path | Mono with GT |
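Before pointing a spec at an existing annotation file, it can help to confirm that every line has the column count from the table above. The sketch below assumes whitespace-separated columns (the exact separator `depth_net` expects is not stated here, so paths containing spaces would need a different split); `annotation_mode` is a hypothetical helper.

```python
from typing import Iterable


def annotation_mode(lines: Iterable[str]) -> str:
    """Classify an annotation file as 1-column (no GT) or 2-column (with GT).

    Assumption: columns are whitespace-separated; blank lines are ignored.
    """
    counts = set()
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue
        if len(fields) not in (1, 2):
            raise ValueError(f"line {lineno}: expected 1 or 2 columns, got {len(fields)}")
        counts.add(len(fields))
    if len(counts) != 1:
        raise ValueError("file is empty or mixes 1- and 2-column lines")
    return "mono inference (no GT)" if counts == {1} else "mono with GT"
```

A mixed file is rejected rather than guessed at, since a 1-column line in a GT file would otherwise surface much later as a data-loading error.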
If you already have one, point to it. Otherwise, generate it with `depth_net convert`:

```bash
depth_net convert -e <convert_spec.yaml>
```

`convert_spec.yaml`:

```yaml
# data_root: directory whose immediate children are scene/sample folders that
# contain your image+depth files; convert walks data_root recursively but
# expects the per-scene subdirectories one level below it.
data_root: <data_root>
image_dir_pattern: [<substring matching left/RGB image paths>]
depth_dir_pattern: [<substring matching GT depth paths>]
image_extension: ''   # optional .endswith filter, e.g. '.jpg'
depth_extension: ''   # optional, swapped in during depth-path derivation, e.g. '.png'
split_ratio: 0.0      # 0.0/1.0 = test-only; 0.8 = 80/20 train+val
```
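The spec comments suggest how `convert` pairs files: keep images matching the image pattern and extension, derive the depth path by substituting the pattern and swapping the extension, then split. The sketch below is an assumption-based illustration of that idea, not the tool's actual code. `pair_paths` and `split_pairs` are hypothetical names, and collecting the path list (e.g. with `os.walk` over `data_root`) is left out.

```python
import random
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def pair_paths(paths: List[str], image_pat: str, depth_pat: str,
               image_ext: str = "", depth_ext: str = "") -> List[Pair]:
    """Return (image, depth) pairs whose derived depth path exists in `paths`."""
    available = set(paths)
    pairs = []
    for p in sorted(paths):
        if image_pat not in p or not p.endswith(image_ext):
            continue  # not an RGB image under this spec
        depth = p.replace(image_pat, depth_pat, 1)
        if image_ext and depth_ext:
            depth = depth[: -len(image_ext)] + depth_ext  # swap the extension
        if depth in available:
            pairs.append((p, depth))
    return pairs


def split_pairs(pairs: List[Pair], ratio: float, seed: int = 0) -> Dict[str, List[Pair]]:
    """ratio 0.0 or 1.0 -> test-only; otherwise a seeded train/val split."""
    if ratio in (0.0, 1.0):
        return {"test": list(pairs)}
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * ratio)
    return {"train": shuffled[:cut], "val": shuffled[cut:]}
```

Images whose derived depth file is missing are silently skipped here; a stricter variant could report them instead.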
Step 2: Pair `model_type` and `dataset_name` based on your data
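Default: use the generic class for each task.

| Data category | `model.model_type` | `dataset_name` |
|---|---|---|
| Disparity-encoded data (pixels) | `RelativeDepthAnything` | `RelativeMonoDataset` |
| Metric depth (meters) | `MetricDepthAnything` | `MetricMonoDataset` |
| Mono inference (no GT, any image) | matches train choice | generic class matching `model_type` |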
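Dataset-specific class: switch to it when the data needs preprocessing that the generic class does not perform.

| Special case | `model.model_type` | `dataset_name` | What the class adds |
|---|---|---|---|
| NYU (metric) | `MetricDepthAnything` | `NYUDV2` | mm→m unit conversion + Eigen evaluation crop |
| NYU (relative) | `RelativeDepthAnything` | `NYUDV2Relative` | same |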
Using a generic class on data that requires unit conversion (e.g. raw NYU uint16 PNGs) results in an empty valid mask and a silent `train_loss = NaN`. Match the class to your data's encoding.
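To see why the failure is silent, consider a loss that only averages over pixels whose GT lies inside a depth range. The range limits and masking rule below are assumptions chosen for illustration, not the `depth_net` implementation: raw NYU values are in millimetres, so none fall under a 10 m cap, the mask is empty, and the mean becomes NaN instead of raising an error.

```python
import math
from typing import Sequence

# Illustration only: the masking rule and depth limits are assumptions.
MIN_DEPTH_M, MAX_DEPTH_M = 1e-3, 10.0


def masked_abs_error(pred_m: Sequence[float], gt: Sequence[float]) -> float:
    """Mean |pred - gt| over pixels whose GT lies inside (MIN_DEPTH_M, MAX_DEPTH_M)."""
    valid = [(p, g) for p, g in zip(pred_m, gt) if MIN_DEPTH_M < g < MAX_DEPTH_M]
    if not valid:
        # A tensor mean over an empty mask evaluates to NaN instead of raising.
        return float("nan")
    return sum(abs(p - g) for p, g in valid) / len(valid)


raw_nyu_mm = [1500, 2750, 4000]  # uint16 PNG values in millimetres
pred_m = [1.4, 2.8, 4.1]         # model output in metres

loss_generic = masked_abs_error(pred_m, raw_nyu_mm)                   # no mm->m conversion
loss_nyu = masked_abs_error(pred_m, [v / 1000 for v in raw_nyu_mm])  # conversion applied
```

With the conversion, all three pixels are valid and the loss is about 0.083 m; without it, every pixel is masked out.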
Step 3: Write the spec YAML from the spec overrides
Copy the action block from `references/spec-overrides.md`, then replace:

- `model.model_type` with the value from Step 2
- `dataset.<...>.data_sources[*].dataset_name` with the value from Step 2
- `data_sources[*].data_file` with the path from Step 1 (an S3 path under the SDK runner, a host path for direct `docker run`)
- For a metric finetune, additionally apply the Metric Variant Finetuning Recipe in `references/finetuning.md`.

For mono training, set `train.precision: fp32` (recommended) or `bf16` (Ampere SM80+, alternative).
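If specs are assembled programmatically, the Step 3 replacements amount to setting a few dotted keys in a copy of the action block. This is a minimal sketch with hypothetical helpers (`set_dotted`, `fill_train_block`). It treats the spec as a plain nested dict and uses only the keys named in Step 3. It fills a train block with one train and one val source.

```python
import copy
from typing import Any, Dict


def set_dotted(spec: Dict[str, Any], dotted_key: str, value: Any) -> None:
    """Set spec['a']['b']['c'] = value for 'a.b.c', creating nested dicts as needed."""
    node = spec
    *parents, leaf = dotted_key.split(".")
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def fill_train_block(template: Dict[str, Any], model_type: str, dataset_name: str,
                     train_file: str, val_file: str, precision: str = "fp32") -> Dict[str, Any]:
    """Return a filled copy of a train action block; the template is not modified."""
    if precision not in ("fp32", "bf16"):
        raise ValueError("this sketch only accepts fp32 or bf16 for mono training")
    spec = copy.deepcopy(template)
    set_dotted(spec, "model.model_type", model_type)
    set_dotted(spec, "train.precision", precision)
    for split, path in (("train_dataset", train_file), ("val_dataset", val_file)):
        set_dotted(spec, f"dataset.{split}.data_sources",
                   [{"data_file": path, "dataset_name": dataset_name}])
    return spec
```

Deep-copying the template keeps the copied action block reusable for the next action or run.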
Step 4: Run
```bash
docker run --gpus 'device=0' --shm-size 16G --ipc=host \
  --user $(id -u):$(id -g) \
  -v <data_root>:<data_root>:ro \
  -v <output_dir>:<output_dir> \
  <container> \
  depth_net <action> -e <spec.yaml>
```

Without `--user $(id -u):$(id -g)`, the container writes outputs as `nobody:nogroup`, which blocks host-side cleanup and retries.
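When launching from a Python script rather than a shell, the same command can be built as an argument list. This is a sketch with a hypothetical `docker_argv` helper. `os.getuid()`/`os.getgid()` play the role of `$(id -u)`/`$(id -g)` and exist only on POSIX systems.

```python
import os
from typing import List


def docker_argv(container: str, action: str, spec: str, data_root: str,
                output_dir: str, gpu: int = 0) -> List[str]:
    """Build the Step 4 `docker run` command as an argv list (POSIX only)."""
    user = f"{os.getuid()}:{os.getgid()}"  # equivalent of --user $(id -u):$(id -g)
    return [
        "docker", "run", "--gpus", f"device={gpu}", "--shm-size", "16G", "--ipc=host",
        "--user", user,
        "-v", f"{data_root}:{data_root}:ro",   # dataset mounted read-only, same path
        "-v", f"{output_dir}:{output_dir}",    # outputs written back as the host user
        container, "depth_net", action, "-e", spec,
    ]
```

The list can be passed to `subprocess.run(argv, check=True)`. No shell is involved, so the quotes around `device=0` from the shell version are not needed.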
Step 5: Verify
- Container exit code 0
- `kpi` block populated in `status.json`
- For `train`: inspect per-step `train_loss` directly; the entrypoint reports `Execution status: PASS` even when `train_loss = NaN` (see the Sanity-run PASS criteria in `references/finetuning.md`)
- For `evaluate`/`inference`: artifacts under `results_dir`
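The checks above can be combined so that a run is never accepted on `Execution status: PASS` alone. This is a sketch under stated assumptions: `status.json` is parsed either as one JSON document or as one JSON object per line, since its exact layout is not shown here. Per-step losses are assumed to have been extracted from the logs already. `verification_problems` is a hypothetical helper, and the sample KPI values in the usage are illustrative.

```python
import json
import math
from typing import Any, List, Sequence


def _status_entries(text: str) -> List[Any]:
    """Parse status.json as one JSON document, or else as JSON lines (assumption)."""
    try:
        doc = json.loads(text)
        return doc if isinstance(doc, list) else [doc]
    except json.JSONDecodeError:
        entries = []
        for line in text.splitlines():
            try:
                entries.append(json.loads(line))
            except json.JSONDecodeError:
                pass  # skip blank or partial lines
        return entries


def verification_problems(exit_code: int, status_text: str,
                          train_losses: Sequence[float]) -> List[str]:
    """Return reasons the run should not be treated as passing (empty list = OK)."""
    problems = []
    if exit_code != 0:
        problems.append(f"container exit code {exit_code}")
    entries = _status_entries(status_text)
    if not any(isinstance(e, dict) and e.get("kpi") for e in entries):
        problems.append("status.json kpi block missing or empty")
    bad_steps = [i for i, loss in enumerate(train_losses) if not math.isfinite(loss)]
    if bad_steps:
        problems.append(f"non-finite train_loss at steps {bad_steps[:5]}")
    return problems
```

Because the NaN check is independent of the exit code and the status file, it also catches the silent `train_loss = NaN` case described above.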
For TAO Deploy TensorRT actions (`gen_trt_engine`, TensorRT `evaluate`, and TensorRT `inference`), read `references/tao-deploy-depth-anything-v2.md` first. Deploy spec templates live in this skill's `references/` folder and are named `spec_template_deploy_*.yaml`.
Training Requirements
- Valid `dataset_name` values for mono `data_sources` (case-insensitive): `ThreeDVLM`, `FSD`, `NvCLIP`, `IssacStereo`, `Crestereo`, `Middlebury`, `NYUDV2`, `NYUDV2Relative`, `RelativeMonoDataset`, `MetricMonoDataset`. `NYUDV2` carries metric depth GT (meters); pair it with `MetricDepthAnything`. `NYUDV2Relative` is the same data with relative-depth conventions; pair it with `RelativeDepthAnything`.
- Monitoring metric: `val/loss`
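A pre-flight check for a data source entry can enforce both the case-insensitive name list and the NYU pairings stated in this section. This is a minimal sketch. `check_data_source` is a hypothetical helper, and it checks pairings only for the two NYU classes, since those are the ones this section states explicitly.

```python
from typing import Optional

VALID_MONO_DATASET_NAMES = (
    "ThreeDVLM", "FSD", "NvCLIP", "IssacStereo", "Crestereo", "Middlebury",
    "NYUDV2", "NYUDV2Relative", "RelativeMonoDataset", "MetricMonoDataset",
)

# Pairings stated in Training Requirements (keys lower-cased for matching).
REQUIRED_MODEL_TYPE = {
    "nyudv2": "MetricDepthAnything",
    "nyudv2relative": "RelativeDepthAnything",
}


def check_data_source(dataset_name: str, model_type: str) -> Optional[str]:
    """Return an error message, or None if the entry looks consistent."""
    canonical = {name.lower(): name for name in VALID_MONO_DATASET_NAMES}
    key = dataset_name.lower()  # dataset_name matching is case-insensitive
    if key not in canonical:
        return f"unknown dataset_name {dataset_name!r}"
    expected = REQUIRED_MODEL_TYPE.get(key)
    if expected and expected != model_type:
        return f"{canonical[key]} should be paired with {expected}, not {model_type}"
    return None
```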
Per-Action Dataset Requirements
| Action | Spec Key | Source | Files | List? |
|---|---|---|---|---|
| evaluate | dataset.test_dataset.data_sources | eval_dataset | data_file: annotations.txt + dataset_name | Yes |
| inference | dataset.infer_dataset.data_sources | inference_dataset | data_file: annotations.txt + dataset_name | Yes |
| quantize | dataset.train_dataset.data_sources | train_datasets | data_file: annotations.txt + dataset_name | Yes |
| quantize | dataset.val_dataset.data_sources | eval_dataset | data_file: annotations.txt + dataset_name | Yes |
| quantize | dataset.quant_calibration_dataset.images_dir | train_datasets | images.tar.gz | No |
| train | dataset.train_dataset.data_sources | train_datasets | data_file: annotations.txt + dataset_name | Yes |
| train | dataset.val_dataset.data_sources | eval_dataset | data_file: annotations.txt + dataset_name | Yes |
Spec Overrides
Data source overrides are mandatory for every action. Construct the data source paths from the Per-Action Dataset Requirements table above and include them in `spec_overrides`; each `data_sources` entry is a dict with two mandatory fields, `data_file` and `dataset_name`. See `references/spec-overrides.md` for the full per-action `train`/`evaluate`/`export`/`inference`/`quantize` override blocks and the precision recommendations.
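The Per-Action Dataset Requirements table can be transcribed into a lookup so that the mandatory `data_sources` entries are generated consistently. This is a sketch with a hypothetical `data_source_overrides` helper. It assumes overrides are expressed as a nested dict (the exact `spec_overrides` shape used by the runner may differ) and treats train's val split as optional, following the Eval Dataset note. `quantize`'s non-list `images_dir` is left to the caller.

```python
from typing import Dict, List

# (spec split key, source name, required) per action, from the table above.
DATA_SOURCE_KEYS = {
    "train": [("train_dataset", "train_datasets", True),
              ("val_dataset", "eval_dataset", False)],
    "evaluate": [("test_dataset", "eval_dataset", True)],
    "inference": [("infer_dataset", "inference_dataset", True)],
    "quantize": [("train_dataset", "train_datasets", True),
                 ("val_dataset", "eval_dataset", True)],
}


def data_source_overrides(action: str, sources: Dict[str, List[str]],
                          dataset_name: str) -> dict:
    """Build nested dataset.<split>.data_sources overrides for one action."""
    dataset = {}
    for split_key, source, required in DATA_SOURCE_KEYS[action]:
        paths = sources.get(source, [])
        if not paths:
            if required:
                raise ValueError(f"{action} needs at least one {source} annotation file")
            continue
        dataset[split_key] = {"data_sources": [
            {"data_file": path, "dataset_name": dataset_name} for path in paths
        ]}
    return {"dataset": dataset}
```

Raising on a missing required source turns an incomplete override into an immediate error instead of a failed job.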
Eval Dataset
Optional. The validation dataset is configured via `dataset.val_dataset.data_sources` (each entry needs `data_file` and `dataset_name`).
Important Parameters
The full parameter glossary (`model.*`, `train.*`, `dataset.*`, `export.*`, and `inference.*` fields with options, defaults, and sources) and the Pretrained checkpoint loading use-case matrix live in `references/parameters.md`. Key starting points: `model.model_type` (default `MetricDepthAnything`), `model.encoder` (default `vitl`), `train.optim.lr` (default 1e-4, AdamW), `train.precision` (`fp32` recommended), and `dataset.{train,val,test,infer}_dataset.augmentation.crop_size` (default `[518, 518]`).
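When changing `crop_size`, it can help to check that both sides divide evenly into ViT patches. The sketch below assumes a patch size of 14, as in the DINOv2 ViT-L/14 encoder used by Depth Anything V2 (the default 518 is 37 × 14). Whether `depth_net` strictly requires multiples of 14 is not stated here, so treat the check as a conservative guideline. `patch_grid` is a hypothetical helper.

```python
from typing import Tuple

PATCH_SIZE = 14  # assumption: ViT-L/14 patch size, as in Depth Anything V2's encoder


def patch_grid(crop_size: Tuple[int, int]) -> Tuple[int, int]:
    """Return the (rows, cols) patch grid for a crop, rejecting non-multiples."""
    h, w = crop_size
    if h % PATCH_SIZE or w % PATCH_SIZE:
        raise ValueError(f"crop_size {crop_size} is not a multiple of {PATCH_SIZE}")
    return h // PATCH_SIZE, w // PATCH_SIZE
```

The token count is rows × cols (1369 for the default), and attention cost grows with its square, which is one reason larger inputs may need activation checkpointing.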
Finetuning Recipes
The Relative and Metric variant finetuning recipes are in `references/finetuning.md`. They cover the required spec keys, the metric `dataset.{normalize_depth, min_depth, max_depth}` block required in both the train AND export specs, trainer-enforced defaults (`clip_grad_norm: 0.1`, `warmup_steps: 20`, `weight_decay: 1e-4`), sanity-run overrides, and the Sanity-run PASS criteria for catching a silent `train_loss = NaN`. Both recipes use `train.optim.lr: 5e-6` with `LambdaLR` (the AdamW default of `1e-4` is too aggressive when finetuning from a converged/pretrained backbone).
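`LambdaLR` scales the base learning rate by a user-supplied function of the step. This is a minimal sketch of a linear warmup over `warmup_steps: 20` toward `5e-6`. Holding the rate constant after warmup is an assumption; the recipe in `references/finetuning.md` may decay it. In PyTorch the function would be passed as `torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_factor)`.

```python
BASE_LR = 5e-6      # train.optim.lr from the finetuning recipes
WARMUP_STEPS = 20   # trainer-enforced warmup_steps


def warmup_factor(step: int) -> float:
    """Multiplier on BASE_LR: linear ramp over WARMUP_STEPS, then constant (assumed)."""
    if step < WARMUP_STEPS:
        return (step + 1) / WARMUP_STEPS  # step 0 already gets a non-zero LR
    return 1.0


def lr_at(step: int) -> float:
    return BASE_LR * warmup_factor(step)
```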
Multi-GPU / Multi-Node
Launch method: `python`, Lightning-managed (a single process; Lightning spawns the workers).

| Spec Key | Description | Default |
|---|---|---|
| Number of GPUs | 1 |
| GPU device indices | [0] |
| Number of nodes | 1 |
| | |
- `ddp` with activation checkpointing: `find_unused_parameters=False`
- `ddp` without activation checkpointing: `find_unused_parameters=True`
- `fsdp` forces precision to FP16

Multi-node env vars (set by the orchestrator): `WORLD_SIZE`, `NODE_RANK`, `MASTER_ADDR`, `MASTER_PORT`, `NUM_GPU_PER_NODE`.
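The strategy rules and the environment contract above can be captured as a pre-launch check. This is a sketch with hypothetical helpers. `find_unused_parameters` is the keyword Lightning's `DDPStrategy` accepts. The `forced_precision` key is only an illustrative label for the FSDP rule, not a real setting.

```python
import os
from typing import Dict, List, Mapping

MULTINODE_ENV = ("WORLD_SIZE", "NODE_RANK", "MASTER_ADDR", "MASTER_PORT", "NUM_GPU_PER_NODE")


def strategy_settings(strategy: str, activation_checkpointing: bool) -> Dict[str, object]:
    """Encode the ddp/fsdp rules above as a plain dict."""
    if strategy == "ddp":
        return {"find_unused_parameters": not activation_checkpointing}
    if strategy == "fsdp":
        return {"forced_precision": "fp16"}  # illustrative label: fsdp forces FP16
    raise ValueError(f"unsupported strategy {strategy!r}")


def missing_multinode_env(env: Mapping[str, str] = os.environ) -> List[str]:
    """Names of orchestrator-provided variables that are unset or empty."""
    return [name for name in MULTINODE_ENV if not env.get(name)]
```

Running `missing_multinode_env()` on each node before launch surfaces a misconfigured orchestrator early, instead of as a rendezvous timeout.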
Export / TRT Defaults
- TRT data types: FP32, BF16 (Ampere SM80+). FP16 is not supported for the ViT-L mono backbone.
- Recommended TRT precision: `bf16`. Use `fp32` if BF16 hardware is unavailable.
Full TAO Deploy reference: tao-deploy-depth-anything-v2.
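The TRT precision choice depends only on whether the GPU is Ampere (SM80) or newer. This is a minimal sketch with a hypothetical `trt_precision` helper. With PyTorch installed, `torch.cuda.get_device_capability()` returns the `(major, minor)` tuple it expects.

```python
from typing import Tuple


def trt_precision(compute_capability: Tuple[int, int]) -> str:
    """Recommended TRT data type for the ViT-L mono backbone.

    BF16 needs SM80+; FP16 is never returned because this backbone does not support it.
    """
    return "bf16" if compute_capability >= (8, 0) else "fp32"
```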
Hardware
Minimum 1 GPU, recommended 2 GPUs, each with 24 GB+ VRAM; the ViT-Large encoder is memory-intensive. Use `fp32` (recommended) or `bf16` (Ampere SM80+, alternative) for training. Activation checkpointing is available for larger inputs.
Error Patterns
Common failure signatures and their fixes are documented in `references/troubleshooting.md`: depth range mismatch, missing pretrained weights, `Key 'encoder' not in 'MonoBackBone'`, missing `dataset_name`, `depth_net_mono: not found`, metric variant hyperparameter sourcing, and the export error that refuses to overwrite an existing ONNX file.
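To jump to the right troubleshooting entry faster, a log can be scanned for the literal signatures quoted above. This is a minimal sketch with a hypothetical `known_failures` helper. It matches only the two exact strings given in this section; the other failure modes have no quoted signature here.

```python
from typing import Iterable, List

# Literal signatures quoted in this section; fixes live in references/troubleshooting.md.
KNOWN_SIGNATURES = (
    "Key 'encoder' not in 'MonoBackBone'",
    "depth_net_mono: not found",
)


def known_failures(log_lines: Iterable[str]) -> List[str]:
    """Return each known signature found in the log, in first-seen order."""
    found: List[str] = []
    for line in log_lines:
        for sig in KNOWN_SIGNATURES:
            if sig in line and sig not in found:
                found.append(sig)
    return found
```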
Spec Param / Parent Model Inference
Model-specific inference mappings (the full per-action spec-field → inference-function table, plus `parent_model`/`parent_job_id` resolution guidance) are in `references/spec-param-inference.md`. These mappings belong in MD, not in the model's `config.json` (`depth_net_mono.config.json`); generated runners should read that reference and apply the mappings with SDK helpers before calling `create_job()`.