tao-train-depth-anything-v2

Depth Net Mono

Monocular depth estimation using the Metric Depth Anything v2 or Relative Depth Anything architectures. The model predicts per-pixel depth from a single RGB image.
Pretrained checkpoint loading varies by model variant and use case; see the "Pretrained checkpoint loading" use case matrix in `references/parameters.md`.
The mono and stereo skills both invoke the unified TAO `depth_net` CLI inside the container; the mono/stereo family is selected via `model.model_type` (full parameter glossary in `references/parameters.md`).
For TAO Deploy TensorRT actions (`gen_trt_engine`, TensorRT `evaluate`, and TensorRT `inference`), read `references/tao-deploy-depth-anything-v2.md` first. The deploy spec template lives in this skill's `references/spec_template_deploy.yaml`.

Train Action Policy

This model is AutoML-enabled at the model layer. Before handling any train-stage request, read `references/skill_info.yaml` and resolve the run override from either an explicit `automl_policy` value or the user's workflow request. Treat phrases like "turn off AutoML", "disable AutoML", "no HPO", or "plain training" as `automl_policy: off` for this run only; otherwise default to `auto`. When `automl_policy: auto`, `automl_enabled: true`, and both `schemas/train.schema.json` and `references/spec_template_train.yaml` are packaged, route the train action through `tao-skill-bank:tao-run-automl` by default with this model's `skill_dir`. Preserve workflow/application overrides for datasets, specs, output directories, GPU/platform settings, parent checkpoints, and `automl_policy`. Use direct model training only when `automl_policy: off` or the packaged train schema/template is missing; in the missing-schema case, report that AutoML is enabled but not runnable for this model until schemas are generated.
Non-train actions such as `evaluate`, `inference`, `export`, and deploy flows stay in this model skill. The per-run `automl_policy` override does not change model metadata.
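The routing rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the returned route labels are hypothetical, and giving an explicit `automl_policy` value precedence over phrase matching is an assumption, not something the policy text spells out.

```python
from typing import Optional

# Phrases the policy text treats as a request to disable AutoML for this run.
OFF_PHRASES = ("turn off automl", "disable automl", "no hpo", "plain training")


def resolve_automl_policy(request_text: str, explicit: Optional[str] = None) -> str:
    """Return the per-run policy. Assumption: an explicit value wins over phrases."""
    if explicit is not None:
        return explicit
    lowered = request_text.lower()
    if any(phrase in lowered for phrase in OFF_PHRASES):
        return "off"
    return "auto"


def choose_train_route(policy: str, automl_enabled: bool,
                       has_train_schema: bool, has_train_template: bool) -> str:
    """Pick a train route; the returned labels are hypothetical names for this sketch."""
    if policy == "auto" and automl_enabled:
        if has_train_schema and has_train_template:
            return "automl"
        # AutoML is enabled but cannot run until the schema/template is packaged.
        return "direct (report: AutoML enabled but not runnable)"
    return "direct"
```

For example, `resolve_automl_policy("plain training on my data")` yields `"off"`, and `choose_train_route("off", True, True, True)` yields `"direct"`.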

Workflow

Prerequisites — data accessibility

Your dataset (RGB images + GT depth files) must be reachable from inside the container:
  • SDK runner: place files at the S3 paths the runner resolves (the `S3_TRAIN` / `S3_EVAL` placeholders shown in the spec overrides). The runner handles S3 → container-path mounting transparently.
  • Direct `docker run` (e.g. local testing): mount the host dataset root read-only at the same in-container path:
docker run ... -v <host_data_root>:<host_data_root>:ro <container> ...
The same accessibility requirement applies to the `<output_dir>` written by all actions; mount it writable (without `:ro`).

Step 1 — Annotation file

Per-line annotation file referenced by `data_sources[*].data_file`:

| Columns | Format | Use |
|---|---|---|
| 1 | `<image>` | Mono inference (no GT) |
| 2 | `<image> <gt_depth>` | Mono with GT |
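
A quick way to sanity-check an annotation file before training is to parse it and confirm every line has one or two columns. The sketch below assumes the columns are whitespace-separated and that paths contain no spaces; `parse_annotation_lines` is a hypothetical helper, not part of the `depth_net` CLI.

```python
from typing import List, Optional, Tuple


def parse_annotation_lines(lines: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Split each non-empty line into (image, gt_depth); gt_depth is None for 1-column lines."""
    records = []
    for number, raw in enumerate(lines, start=1):
        fields = raw.split()  # assumption: whitespace-separated, no spaces inside paths
        if not fields:
            continue  # skip blank lines
        if len(fields) > 2:
            raise ValueError(f"line {number}: expected 1 or 2 columns, got {len(fields)}")
        image = fields[0]
        gt_depth = fields[1] if len(fields) == 2 else None
        records.append((image, gt_depth))
    return records
```
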
If you already have an annotation file, point to it. Otherwise generate one via `depth_net convert`:
depth_net convert -e <convert_spec.yaml>
`convert_spec.yaml` template:
data_root: <directory whose immediate children are scene/sample folders that contain your image+depth files; convert walks data_root recursively but expects per-scene subdirectories at one level below>
image_dir_pattern: [<substring matching left/RGB image paths>]
depth_dir_pattern: [<substring matching GT depth paths>]
image_extension: ''     # optional .endswith filter, e.g. '.jpg'
depth_extension: ''     # optional, swapped during depth derivation, e.g. '.png'
split_ratio: 0.0        # 0.0/1.0 = test-only; 0.8 = 80/20 train+val
`convert` walks `data_root` recursively, selects paths whose path string contains all substrings in `image_dir_pattern` (AND-filter), then derives the depth path by replacing `image_dir_pattern[0]` with `depth_dir_pattern[0]` and `image_extension` with `depth_extension`. Inspect your dataset's directory layout and identify the substring distinguishing RGB images from depth files (e.g. `rgb_` vs `sync_depth_`).
`data_root` must point at the parent that contains the per-scene subdirectories (e.g. for NYU eval, use `/data/nyu_v2/eval/test`, not `/data/nyu_v2/eval/test/bathroom`, which limits the walk to a single scene). Always include the leading dot in `image_extension` / `depth_extension` (e.g. `'.jpg'` not `'jpg'`); the substring swap is form-sensitive and a mismatch silently corrupts derived paths.
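
The filter-and-swap behaviour described above can be approximated as follows. This is a sketch of the described logic, not the actual `convert` implementation; `derive_depth_path` and the file name `rgb_00045.jpg` are hypothetical, and using plain `str.replace` for the swap is an assumption.

```python
from typing import List, Optional


def derive_depth_path(image_path: str,
                      image_dir_pattern: List[str],
                      depth_dir_pattern: List[str],
                      image_extension: str = "",
                      depth_extension: str = "") -> Optional[str]:
    """Return the derived depth path, or None if the filters reject image_path."""
    if not image_dir_pattern or not depth_dir_pattern:
        raise ValueError("both pattern lists need at least one entry")
    # AND-filter: every substring in image_dir_pattern must appear in the path.
    if not all(sub in image_path for sub in image_dir_pattern):
        return None
    # Optional extension filter.
    if image_extension and not image_path.endswith(image_extension):
        return None
    # Swap the first image pattern for the first depth pattern.
    depth_path = image_path.replace(image_dir_pattern[0], depth_dir_pattern[0])
    # Swap the extension only when one was given (replacing "" would corrupt the string).
    if image_extension:
        depth_path = depth_path.replace(image_extension, depth_extension)
    return depth_path


image = "/data/nyu_v2/eval/test/bathroom/rgb_00045.jpg"  # hypothetical file name
good = derive_depth_path(image, ["rgb_"], ["sync_depth_"], ".jpg", ".png")
bad = derive_depth_path(image, ["rgb_"], ["sync_depth_"], ".jpg", "png")  # missing dot
```

Here `good` ends in `sync_depth_00045.png`, while `bad` ends in `sync_depth_00045png`: no error is raised, which is why inconsistent extension forms are easy to miss.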

Step 2 — Pair `model_type` and `dataset_name` based on your data

Default: the generic class for each task.

| Data category | `model_type` | `dataset_name` |
|---|---|---|
| Disparity-encoded data (pixels) | `RelativeDepthAnything` | `RelativeMonoDataset` |
| Metric depth (meters) | `MetricDepthAnything` | `MetricMonoDataset` |
| Mono inference (no GT, any image) | matches train choice | `RelativeMonoDataset` or `MetricMonoDataset` |

Dataset-specific class: switch when the data needs preprocessing the generic class does not perform.

| Special case | `model_type` | `dataset_name` | What the class adds |
|---|---|---|---|
| NYU `sync_depth_*.png` (raw uint16 millimetres), relative | `RelativeDepthAnything` | `NYUDV2Relative` | mm→m unit conversion + Eigen evaluation crop |
| NYU `sync_depth_*.png` (raw uint16 millimetres), metric | `MetricDepthAnything` | `NYUDV2` | same |

Using a generic class on data that requires unit conversion (e.g. raw NYU uint16 PNGs) results in an empty valid mask and silent `train_loss = NaN`. Match the class to your data's encoding.
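
Because a mismatched pair fails silently, it can help to check the pairing before launching a run. The lookup below encodes only the four pairings listed in the tables above; `check_pairing` is a hypothetical helper, and the lower-casing relies on `dataset_name` being case-insensitive.

```python
# Pairings taken from the two tables above; keys are lower-cased dataset_name values.
EXPECTED_MODEL_TYPE = {
    "relativemonodataset": "RelativeDepthAnything",
    "metricmonodataset": "MetricDepthAnything",
    "nyudv2relative": "RelativeDepthAnything",
    "nyudv2": "MetricDepthAnything",
}


def check_pairing(model_type: str, dataset_name: str) -> bool:
    """True when the pair matches the tables; raises KeyError for names not listed there."""
    expected = EXPECTED_MODEL_TYPE[dataset_name.lower()]
    return model_type == expected
```
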

Step 3 — Write spec yaml from the spec overrides

Copy the action block from `references/spec-overrides.md`. Replace:
  • `model.model_type` with the value from Step 2
  • `dataset.<...>.data_sources[*].dataset_name` with the value from Step 2
  • `data_sources[*].data_file` with the path from Step 1 (S3 path under SDK runner, host path for direct docker)
  • For metric finetune: additionally apply the Metric Variant Finetuning Recipe in `references/finetuning.md`.
For mono training, set `train.precision: fp32` (recommended) or `bf16` (Ampere SM80+, alternative).

Step 4 — Run

docker run --gpus 'device=0' --shm-size 16G --ipc=host \
  --user $(id -u):$(id -g) \
  -v <data_root>:<data_root>:ro \
  -v <output_dir>:<output_dir> \
  <container> \
  depth_net <action> -e <spec.yaml>
Without `--user $(id -u):$(id -g)` the container writes outputs as `nobody:nogroup`, blocking host-side cleanup and retry.
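
When the command is launched from a script rather than a shell, the same arguments can be assembled as a list. This is a minimal sketch for POSIX hosts; `build_docker_command` is a hypothetical helper, and the container image is passed in as a parameter because the source leaves `<container>` unspecified.

```python
import os
from typing import List


def build_docker_command(container: str, action: str, spec_path: str,
                         data_root: str, output_dir: str, gpu: str = "0") -> List[str]:
    """Assemble the docker run argv shown above as a list suitable for subprocess.run."""
    return [
        "docker", "run",
        "--gpus", f"device={gpu}",
        "--shm-size", "16G",
        "--ipc=host",
        # Run as the calling user so outputs are not owned by nobody:nogroup.
        "--user", f"{os.getuid()}:{os.getgid()}",
        "-v", f"{data_root}:{data_root}:ro",   # dataset mounted read-only
        "-v", f"{output_dir}:{output_dir}",    # outputs mounted writable
        container,
        "depth_net", action, "-e", spec_path,
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which raises on a non-zero container exit code.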

Step 5 — Verify

  • Container exit code 0
  • `status.json` `kpi` block populated
  • For `train`: inspect per-step `train_loss` directly; the entrypoint reports `Execution status: PASS` even when `train_loss = NaN` (see the Sanity-run PASS criteria in `references/finetuning.md`)
  • For `evaluate` / `inference`: artifacts under `results_dir`
For TAO Deploy TensorRT actions (`gen_trt_engine`, TensorRT `evaluate`, and TensorRT `inference`), read `references/tao-deploy-depth-anything-v2.md` first. Deploy spec templates live in this skill's `references/` folder and are named `spec_template_deploy_*.yaml`.
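
Since a PASS status does not rule out NaN losses, the per-step check can be automated once the loss values have been collected. The sketch below takes an already-parsed list of floats; how the values are extracted from logs or `status.json` is not covered here, and the authoritative criteria remain those in `references/finetuning.md`.

```python
import math
from typing import List, Sequence


def non_finite_steps(train_losses: Sequence[float]) -> List[int]:
    """Return the indices of steps whose train_loss is NaN or infinite."""
    return [step for step, loss in enumerate(train_losses) if not math.isfinite(loss)]


def losses_look_sane(train_losses: Sequence[float]) -> bool:
    """True only if at least one loss was logged and every logged loss is finite."""
    return len(train_losses) > 0 and not non_finite_steps(train_losses)
```
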
Training Requirements

  • Valid `dataset_name` values for mono `data_sources` (case-insensitive): `ThreeDVLM`, `FSD`, `NvCLIP`, `IssacStereo`, `Crestereo`, `Middlebury`, `NYUDV2`, `NYUDV2Relative`, `RelativeMonoDataset`, `MetricMonoDataset`. `NYUDV2` carries metric depth GT (meters), so pair it with `MetricDepthAnything`; `NYUDV2Relative` is the same data with relative-depth conventions, so pair it with `RelativeDepthAnything`.
  • Monitoring metric: `val/loss`

Per-Action Dataset Requirements

| Action | Spec Key | Source | Files | List? |
|---|---|---|---|---|
| evaluate | `dataset.test_dataset.data_sources` | eval_dataset | `data_file`: annotations.txt + `dataset_name` | Yes |
| inference | `dataset.infer_dataset.data_sources` | inference_dataset | `data_file`: annotations.txt + `dataset_name` | Yes |
| quantize | `dataset.train_dataset.data_sources` | train_datasets | `data_file`: annotations.txt + `dataset_name` | Yes |
| quantize | `dataset.val_dataset.data_sources` | eval_dataset | `data_file`: annotations.txt + `dataset_name` | Yes |
| quantize | `dataset.quant_calibration_dataset.images_dir` | train_datasets | images.tar.gz | No |
| train | `dataset.train_dataset.data_sources` | train_datasets | `data_file`: annotations.txt + `dataset_name` | Yes |
| train | `dataset.val_dataset.data_sources` | eval_dataset | `data_file`: annotations.txt + `dataset_name` | Yes |

Spec Overrides

Data source overrides are mandatory for every action: construct the data source paths from the Per-Action Dataset Requirements table above and include them in `spec_overrides`; each `data_sources` entry is a dict with the two mandatory fields `data_file` and `dataset_name`. See `references/spec-overrides.md` for the full per-action `train` / `evaluate` / `export` / `inference` / `quantize` override blocks and the precision recommendations.
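
As an illustration of the shape of a `data_sources` entry, the sketch below builds a train-action override as a nested Python dict. The nested layout mirrors the dotted spec keys named in this document, but whether `spec_overrides` expects nested dicts or dotted keys is an assumption; the helper names are hypothetical, and `references/spec-overrides.md` remains the authoritative source.

```python
from typing import Any, Dict


def data_source(data_file: str, dataset_name: str) -> Dict[str, str]:
    """One data_sources entry with its two mandatory fields."""
    return {"data_file": data_file, "dataset_name": dataset_name}


def build_train_overrides(model_type: str, dataset_name: str,
                          train_file: str, val_file: str,
                          precision: str = "fp32") -> Dict[str, Any]:
    """Assumed nested layout for the train action; data_sources values are lists."""
    return {
        "model": {"model_type": model_type},
        "train": {"precision": precision},
        "dataset": {
            "train_dataset": {"data_sources": [data_source(train_file, dataset_name)]},
            "val_dataset": {"data_sources": [data_source(val_file, dataset_name)]},
        },
    }
```
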

Eval Dataset

Optional. The validation dataset is configured via `dataset.val_dataset.data_sources` (each entry needs `data_file` and `dataset_name`).

Important Parameters

The full parameter glossary (`model.*`, `train.*`, `dataset.*`, `export.*`, `inference.*` fields with options, defaults, and sources) and the "Pretrained checkpoint loading" use case matrix live in `references/parameters.md`. Key starting points: `model.model_type` (default `MetricDepthAnything`), `model.encoder` (default `vitl`), `train.optim.lr` (default 1e-4, AdamW), `train.precision` (`fp32` recommended), `dataset.{train,val,test,infer}_dataset.augmentation.crop_size` (default `[518, 518]`).

Finetuning Recipes

The Relative and Metric variant finetuning recipes are in `references/finetuning.md`. They cover the required spec keys, the metric `dataset.{normalize_depth, min_depth, max_depth}` block required in both train AND export specs, trainer-enforced defaults (`clip_grad_norm: 0.1`, `warmup_steps: 20`, `weight_decay: 1e-4`), sanity-run overrides, and the Sanity-run PASS criteria for catching silent `train_loss = NaN`. Both recipes use `train.optim.lr: 5e-6` with `LambdaLR` (the AdamW default `1e-4` is too aggressive when finetuning from a converged/pretrained backbone).

Multi-GPU / Multi-Node

Launch method: Lightning-managed (single `python` process, Lightning spawns workers).

| Spec Key | Description | Default |
|---|---|---|
| `train.num_gpus` | Number of GPUs | 1 |
| `train.gpu_ids` | GPU device indices | [0] |
| `train.num_nodes` | Number of nodes | 1 |
| `train.distributed_strategy` | `ddp` or `fsdp` | `ddp` |

  • `ddp` with activation checkpointing: `find_unused_parameters=False`
  • `ddp` without: `find_unused_parameters=True`
  • `fsdp` forces precision to FP16
Multi-node env vars (set by orchestrator): `WORLD_SIZE`, `NODE_RANK`, `MASTER_ADDR`, `MASTER_PORT`, `NUM_GPU_PER_NODE`.
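
Before a multi-node launch it can be useful to confirm that the orchestrator actually set the variables listed above. The helpers below are a hypothetical pre-flight check, not part of TAO; the second function simply restates the `ddp` rule from the bullets.

```python
from typing import List, Mapping

MULTINODE_VARS = ("WORLD_SIZE", "NODE_RANK", "MASTER_ADDR", "MASTER_PORT", "NUM_GPU_PER_NODE")


def missing_multinode_vars(env: Mapping[str, str]) -> List[str]:
    """Return the multi-node variables that are unset or empty in env."""
    return [name for name in MULTINODE_VARS if not env.get(name)]


def ddp_find_unused_parameters(activation_checkpointing: bool) -> bool:
    """ddp rule from the bullets: False with activation checkpointing, True without."""
    return not activation_checkpointing
```

A typical call is `missing_multinode_vars(os.environ)` (after `import os`), which returns an empty list when everything is in place.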

Export / TRT Defaults

  • TRT data types: FP32, BF16 (Ampere SM80+). FP16 is not supported for the ViT-L mono backbone.
  • Recommended TRT precision: `bf16`. Use `fp32` if BF16 hardware is unavailable.
Full TAO Deploy reference: tao-deploy-depth-anything-v2.

Hardware

Minimum 1 GPU, recommended 2 GPUs, each with 24GB+ VRAM. The ViT-Large encoder is memory intensive. Use `fp32` (recommended) or `bf16` (Ampere SM80+, alternative) for training. Activation checkpointing is available for larger inputs.

Error Patterns

Common failure signatures and fixes are documented in `references/troubleshooting.md`: depth range mismatch, missing pretrained weights, `Key 'encoder' not in 'MonoBackBone'`, missing `dataset_name`, `depth_net_mono: not found`, metric variant hyperparameter sourcing, and the export refuse-to-overwrite ONNX error.

Spec Param / Parent Model Inference

Model-specific inference mappings (the full `depth_net_mono.config.json` per-action spec-field → inference-function table, plus `parent_model` / `parent_job_id` resolution guidance) are in `references/spec-param-inference.md`. These mappings belong in MD, not in `config.json`; generated runners should read that reference and apply the mappings with SDK helpers before `create_job()`.