tao-train-depth-anything-v2


Depth Net Mono

Monocular depth estimation using Metric Depth Anything v2 or Relative Depth Anything architectures. Predicts per-pixel depth from single RGB images.

Pretrained checkpoint loading varies by model variant and use case; see the Pretrained checkpoint loading use case matrix in `references/parameters.md`.

The mono and stereo skills both invoke the unified TAO `depth_net` CLI inside the container; the mono/stereo family is selected via `model.model_type` (full parameter glossary in `references/parameters.md`).

For TAO Deploy TensorRT actions (`gen_trt_engine`, TensorRT `evaluate`, and TensorRT `inference`), read `references/tao-deploy-depth-anything-v2.md` first. The deploy spec template lives in this skill's `references/spec_template_deploy.yaml`.


Train Action Policy


This model is AutoML-enabled at the model layer. Before handling any train-stage request, read `references/skill_info.yaml` and resolve the run override from either an explicit `automl_policy` value or the user's workflow request. Treat phrases like "turn off AutoML", "disable AutoML", "no HPO", or "plain training" as `automl_policy: off` for this run only; otherwise default to `auto`. When `automl_policy: auto`, `automl_enabled: true`, and both `schemas/train.schema.json` and `references/spec_template_train.yaml` are packaged, route the train action through `tao-skill-bank:tao-run-automl` by default with this model's `skill_dir`. Preserve workflow/application overrides for datasets, specs, output directories, GPU/platform settings, parent checkpoints, and `automl_policy`. Use direct model training only when `automl_policy: off` or the packaged train schema/template is missing; in the missing-schema case, report that AutoML is enabled but not runnable for this model until schemas are generated.

Non-train actions such as `evaluate`, `inference`, `export`, and deploy flows stay in this model skill. The per-run `automl_policy` override does not change model metadata.
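The routing rule above condenses into a small decision function. This is a minimal sketch for illustration only: `resolve_automl_policy`, `train_assets_packaged`, and `resolve_train_route` are hypothetical names, the phrase matching is a naive substring check, and the returned strings are labels rather than real skill-bank calls.

```python
from pathlib import Path

# Phrases the policy treats as a per-run `automl_policy: off` request.
OFF_PHRASES = ("turn off automl", "disable automl", "no hpo", "plain training")


def resolve_automl_policy(explicit_policy=None, user_request=""):
    """Return 'off' or 'auto' for this run only (model metadata is untouched)."""
    if explicit_policy is not None:
        return explicit_policy
    text = user_request.lower()
    if any(phrase in text for phrase in OFF_PHRASES):
        return "off"
    return "auto"


def train_assets_packaged(skill_dir):
    """True when both AutoML inputs named in the policy exist under skill_dir."""
    root = Path(skill_dir)
    return (root / "schemas" / "train.schema.json").is_file() and (
        root / "references" / "spec_template_train.yaml"
    ).is_file()


def resolve_train_route(policy, automl_enabled, packaged):
    """Return (route, note) for a train-stage request."""
    if policy == "auto" and automl_enabled and packaged:
        return "tao-skill-bank:tao-run-automl", ""
    if policy == "auto" and automl_enabled:
        # Missing schema/template: fall back, but tell the user why.
        return "direct_train", "AutoML is enabled but not runnable until schemas are generated"
    return "direct_train", ""
```

Because the policy is resolved per call, nothing here writes back to `references/skill_info.yaml`, which matches the rule that the per-run override does not change model metadata.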


Workflow


Prerequisites — data accessibility


Your dataset (RGB images + GT depth files) must be reachable from inside the container:

- SDK runner: place files at the S3 paths the runner resolves (the `S3_TRAIN` / `S3_EVAL` placeholders shown in the spec overrides). The runner handles S3 → container-path mounting transparently.
- Direct `docker run` (e.g. local testing): mount the host dataset root read-only at the same in-container path:

```
docker run ... -v <host_data_root>:<host_data_root>:ro <container> ...
```

The same accessibility requirement applies to the `<output_dir>` written by all actions.
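Because direct `docker run` mounts each host root at the same in-container path, a quick pre-flight check is to confirm that every path your annotation file references (and the output directory) sits under one of the roots you plan to mount. A minimal sketch, assuming absolute paths in the annotation file; `paths_outside_mounts` is a hypothetical helper, not a TAO utility.

```python
from pathlib import PurePosixPath


def paths_outside_mounts(paths, mount_roots):
    """Return the paths that do not lie under any of the given mount roots.

    With `-v <root>:<root>` mounts, a host path is visible inside the container
    at the same location only if it is the root itself or a descendant of it.
    """
    roots = [PurePosixPath(r) for r in mount_roots]
    outside = []
    for p in paths:
        pp = PurePosixPath(p)
        # Compare path components, not raw string prefixes.
        if not any(pp == root or root in pp.parents for root in roots):
            outside.append(p)
    return outside
```

Comparing path components rather than string prefixes matters: `/data/nyu_v2/a.jpg` is not under a mount of `/data/nyu`, even though the string starts with it.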


Step 1 — Annotation file


Per-line annotation file referenced by `data_sources[*].data_file`:

| Columns | Format | Use |
|---|---|---|
| 1 | `<image>` | Mono inference (no GT) |
| 2 | `<image> <gt_depth>` | Mono with GT |
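A minimal reader for the two layouts in the table, useful for sanity-checking a hand-made or generated file. It assumes whitespace-separated columns and paths without embedded spaces, which the table implies but does not state; `parse_annotation_lines` is a hypothetical name.

```python
def parse_annotation_lines(lines):
    """Parse mono annotation lines into (image, gt_depth_or_None) tuples.

    1 column  -> `<image>`             (mono inference, no GT)
    2 columns -> `<image> <gt_depth>`  (mono with GT)
    Blank lines are skipped; any other column count raises ValueError.
    """
    records = []
    for lineno, raw in enumerate(lines, start=1):
        cols = raw.split()
        if not cols:
            continue
        if len(cols) == 1:
            records.append((cols[0], None))
        elif len(cols) == 2:
            records.append((cols[0], cols[1]))
        else:
            raise ValueError(f"line {lineno}: expected 1 or 2 columns, got {len(cols)}")
    return records
```

Usage: `with open("annotations.txt") as f: records = parse_annotation_lines(f)`; a training file should yield no `None` GT entries.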

If you already have an annotation file, point to it. Otherwise, generate one with `depth_net convert`:

```
depth_net convert -e <convert_spec.yaml>
```

`convert_spec.yaml` template:

```yaml
data_root: <directory whose immediate children are scene/sample folders that contain your image+depth files; convert walks data_root recursively but expects per-scene subdirectories at one level below>
image_dir_pattern: [<substring matching left/RGB image paths>]
depth_dir_pattern: [<substring matching GT depth paths>]
image_extension: ''     # optional .endswith filter, e.g. '.jpg'
depth_extension: ''     # optional, swapped during depth derivation, e.g. '.png'
split_ratio: 0.0        # 0.0/1.0 = test-only; 0.8 = 80/20 train+val
```

`convert` walks `data_root` recursively, selects paths whose path string contains all substrings in `image_dir_pattern` (AND filter), then derives the depth path by replacing `image_dir_pattern[0]` with `depth_dir_pattern[0]` and `image_extension` with `depth_extension`. Inspect your dataset's directory layout and identify the substring that distinguishes RGB images from depth files (e.g. `rgb_` for images vs `sync_depth_` for depth).

`data_root` must point at the parent that contains the per-scene subdirectories (e.g. for NYU eval, use `/data/nyu_v2/eval/test`, not `/data/nyu_v2/eval/test/bathroom`; the latter limits the walk to a single scene). Always include the leading dot in `image_extension` and `depth_extension` (e.g. `'.jpg'`, not `'jpg'`); the substring swap is form-sensitive, and a mismatch silently corrupts the derived paths.
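The selection and derivation rules above can be mimicked in a few lines to test your patterns before running `convert`. This is an approximation written from the description, not the `depth_net convert` source: it assumes Python `str.replace` semantics for the substring swap (every occurrence is replaced) and ignores `split_ratio`; `derive_depth_path` and `derive_pairs` are hypothetical names.

```python
import os


def derive_depth_path(image_path, image_dir_pattern, depth_dir_pattern,
                      image_extension="", depth_extension=""):
    """Swap image_dir_pattern[0] -> depth_dir_pattern[0], then the extensions."""
    depth_path = image_path.replace(image_dir_pattern[0], depth_dir_pattern[0])
    if image_extension:
        depth_path = depth_path.replace(image_extension, depth_extension)
    return depth_path


def derive_pairs(data_root, image_dir_pattern, depth_dir_pattern,
                 image_extension="", depth_extension=""):
    """Walk data_root and collect (image, depth) pairs as described above."""
    pairs = []
    for dirpath, _dirnames, filenames in os.walk(data_root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # AND filter: every substring in image_dir_pattern must appear.
            if not all(s in path for s in image_dir_pattern):
                continue
            # Optional .endswith filter on the image extension.
            if image_extension and not path.endswith(image_extension):
                continue
            depth = derive_depth_path(path, image_dir_pattern, depth_dir_pattern,
                                      image_extension, depth_extension)
            pairs.append((path, depth))
    return pairs
```

Under these assumed semantics, the leading-dot rule is easy to see: with `image_extension: 'jpg'`, an image at `/d/jpgscans/rgb_1.jpg` maps to `/d/pngscans/sync_depth_1.png`, a directory that does not exist, while `'.jpg'` leaves the directory name intact.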


Step 2 — Pair `model_type` and `dataset_name` based on your data


Default: use the generic class for each task.

| Data category | `model_type` | `dataset_name` |
|---|---|---|
| Disparity-encoded data (pixels) | `RelativeDepthAnything` | `RelativeMonoDataset` |
| Metric depth (meters) | `MetricDepthAnything` | `MetricMonoDataset` |
| Mono inference (no GT, any image) | matches train choice | `RelativeMonoDataset` or `MetricMonoDataset` |

Dataset-specific class: switch when the data needs preprocessing that the generic class does not perform.

| Special case | `model_type` | `dataset_name` | What the class adds |
|---|---|---|---|
| NYU `sync_depth_*.png` (raw uint16 millimetres), relative | `RelativeDepthAnything` | `NYUDV2Relative` | mm→m unit conversion + Eigen evaluation crop |
| NYU `sync_depth_*.png` (raw uint16 millimetres), metric | `MetricDepthAnything` | `NYUDV2` | same |

Using a generic class on data that requires unit conversion (e.g. raw NYU uint16 PNGs) results in an empty valid mask and a silent `train_loss = NaN`. Match the class to your data's encoding.
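Why a unit mismatch shows up as NaN rather than as an error: if the loss is averaged only over pixels whose GT falls inside a metre-scale depth range, raw millimetre values (e.g. 2500 for 2.5 m) all fall outside it and the mask is empty. The sketch below assumes a `min_depth < gt < max_depth` mask, illustrative range values, and a plain mean absolute error; the actual mask and loss in TAO may differ.

```python
import math

MIN_DEPTH, MAX_DEPTH = 1e-3, 10.0  # assumed metre-scale range (illustrative values)


def masked_mean_abs_error(pred_m, gt, min_depth=MIN_DEPTH, max_depth=MAX_DEPTH):
    """Mean |pred - gt| over pixels with min_depth < gt < max_depth.

    Returns NaN for an empty mask, mirroring what a NumPy/PyTorch mean over an
    empty selection produces.
    """
    errs = [abs(p - g) for p, g in zip(pred_m, gt) if min_depth < g < max_depth]
    return sum(errs) / len(errs) if errs else float("nan")


raw_nyu_mm = [2500, 3100, 4200]  # uint16 millimetres, as stored in sync_depth_*.png
pred = [2.4, 3.0, 4.5]           # model prediction in metres

print(masked_mean_abs_error(pred, raw_nyu_mm))                      # nan: every pixel masked out
print(masked_mean_abs_error(pred, [d / 1000 for d in raw_nyu_mm]))  # finite after mm -> m
```

The NYU-specific classes perform the mm→m conversion before masking, which is why switching `dataset_name` fixes the NaN without touching the loss.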


Step 3 — Write spec yaml from the spec overrides


Copy the action block from `references/spec-overrides.md`, then replace:

- `model.model_type` with the value from Step 2
- `dataset.<...>.data_sources[*].dataset_name` with the value from Step 2
- `data_sources[*].data_file` with the path from Step 1 (S3 path under the SDK runner, host path for direct docker)
- For metric finetuning: additionally apply the Metric Variant Finetuning Recipe in `references/finetuning.md`.

For mono training, set `train.precision: fp32` (recommended) or `bf16` (alternative; requires Ampere SM80+).
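The Step 3 substitutions amount to setting a few dotted keys in the copied block. A minimal sketch, assuming the block has been loaded as a nested dict (e.g. with a YAML parser) and that each split has a single data source; `set_dotted` and `fill_train_block` are hypothetical helpers, and the file paths you pass in stand for your Step 1 annotation files.

```python
import copy


def set_dotted(spec, dotted_key, value):
    """Set spec['a']['b']['c'] for dotted_key 'a.b.c', creating dicts as needed."""
    node = spec
    *parents, leaf = dotted_key.split(".")
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def fill_train_block(block, model_type, dataset_name, train_file, val_file,
                     precision="fp32"):
    """Return a copy of a train block with the Step 2/3 values filled in."""
    spec = copy.deepcopy(block)  # leave the copied template untouched
    set_dotted(spec, "model.model_type", model_type)
    set_dotted(spec, "train.precision", precision)  # fp32 recommended; bf16 needs SM80+
    for split, data_file in (("train_dataset", train_file), ("val_dataset", val_file)):
        # One source per split assumed; each entry needs data_file + dataset_name.
        set_dotted(spec, f"dataset.{split}.data_sources",
                   [{"data_file": data_file, "dataset_name": dataset_name}])
    return spec
```

Keeping the template immutable (`deepcopy`) makes it safe to fill the same copied block once per run with different datasets.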


Step 4 — Run


```
docker run --gpus 'device=0' --shm-size 16G --ipc=host \
  --user $(id -u):$(id -g) \
  -v <data_root>:<data_root>:ro \
  -v <output_dir>:<output_dir> \
  <container> \
  depth_net <action> -e <spec.yaml>
```

Without `--user $(id -u):$(id -g)`, the container writes outputs as `nobody:nogroup`, which blocks host-side cleanup and retries.
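If you launch from a Python script rather than a shell, the same command can be assembled as an argument list so the `--user` mapping is never dropped. A sketch only: `build_docker_cmd` is hypothetical, and it uses `os.getuid()` / `os.getgid()` (POSIX only) in place of `$(id -u)` / `$(id -g)`.

```python
import os


def build_docker_cmd(container, action, spec_path, data_root, output_dir,
                     gpu="0", shm_size="16G"):
    """Mirror the Step 4 `docker run` invocation as an argv list."""
    return [
        "docker", "run",
        "--gpus", f"device={gpu}",
        "--shm-size", shm_size,
        "--ipc=host",
        # Run as the host user so outputs are not owned by nobody:nogroup.
        "--user", f"{os.getuid()}:{os.getgid()}",
        "-v", f"{data_root}:{data_root}:ro",  # dataset: read-only, same path inside
        "-v", f"{output_dir}:{output_dir}",   # outputs: writable, same path inside
        container,
        "depth_net", action, "-e", spec_path,
    ]
```

Pass the list to `subprocess.run(cmd, check=True)`; because no shell is involved, the `'device=0'` quoting from the shell version is unnecessary.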


Step 5 — Verify


- Container exit code is 0.
- The `kpi` block in `status.json` is populated.
- For `train`: inspect per-step `train_loss` directly; the entrypoint reports `Execution status: PASS` even when `train_loss = NaN` (see the Sanity-run PASS criteria in `references/finetuning.md`).
- For `evaluate` / `inference`: artifacts exist under `results_dir`.
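The checks above can be scripted. A minimal sketch, assuming `status.json` is a JSON object with a top-level `kpi` key and that you have already extracted the per-step `train_loss` values from the training log into a list; neither the exact `status.json` schema nor the log format is specified here, and the helper names are hypothetical.

```python
import json
import math
from pathlib import Path


def kpi_populated(status_text):
    """True if the status JSON has a non-empty top-level 'kpi' block (assumed layout)."""
    status = json.loads(status_text)
    return bool(status.get("kpi"))


def losses_are_finite(train_losses):
    """Catch the silent NaN case: a PASS from the entrypoint is not enough."""
    return len(train_losses) > 0 and all(math.isfinite(x) for x in train_losses)


def has_artifacts(results_dir):
    """True if results_dir exists and contains at least one entry."""
    p = Path(results_dir)
    return p.is_dir() and any(p.iterdir())
```

Treat a run as verified only when all three checks that apply to the action pass, in addition to exit code 0.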

For TAO Deploy TensorRT actions (`gen_trt_engine`, TensorRT `evaluate`, and TensorRT `inference`), read `references/tao-deploy-depth-anything-v2.md` first. Deploy spec templates live in this skill's `references/` folder and are named `spec_template_deploy_*.yaml`.


Training Requirements


Valid `dataset_name` values for mono `data_sources` (case-insensitive): `ThreeDVLM`, `FSD`, `NvCLIP`, `IssacStereo`, `Crestereo`, `Middlebury`, `NYUDV2`, `NYUDV2Relative`, `RelativeMonoDataset`, `MetricMonoDataset`.

`NYUDV2` carries metric depth GT (meters); pair it with `MetricDepthAnything`. `NYUDV2Relative` is the same data with relative-depth conventions; pair it with `RelativeDepthAnything`.

Monitoring metric: val/loss
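A pre-flight check for the rules above: case-insensitive membership in the valid `dataset_name` list, plus the two NYU pairings. A sketch; `check_data_source` is a hypothetical helper, only the pairings stated in this section are enforced, and `model_type` is compared exactly because its case sensitivity is not documented here.

```python
VALID_MONO_DATASET_NAMES = {
    "ThreeDVLM", "FSD", "NvCLIP", "IssacStereo", "Crestereo", "Middlebury",
    "NYUDV2", "NYUDV2Relative", "RelativeMonoDataset", "MetricMonoDataset",
}
# Lower-case lookup so 'nyudv2' resolves to 'NYUDV2'.
_CANONICAL = {name.lower(): name for name in VALID_MONO_DATASET_NAMES}

# Pairings stated above: NYU metric GT vs. relative-depth conventions.
REQUIRED_MODEL_TYPE = {
    "NYUDV2": "MetricDepthAnything",
    "NYUDV2Relative": "RelativeDepthAnything",
}


def check_data_source(dataset_name, model_type):
    """Return a list of problems (empty if the pair looks valid)."""
    canonical = _CANONICAL.get(dataset_name.lower())
    if canonical is None:
        return [f"unknown dataset_name {dataset_name!r}"]
    expected = REQUIRED_MODEL_TYPE.get(canonical)
    if expected and model_type != expected:
        return [f"{canonical} should be paired with {expected}, not {model_type}"]
    return []
```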


Per-Action Dataset Requirements


| Action | Spec Key | Source | Files | List? |
|---|---|---|---|---|
| evaluate | `dataset.test_dataset.data_sources` | eval_dataset | `data_file`: annotations.txt + `dataset_name` | Yes |
| inference | `dataset.infer_dataset.data_sources` | inference_dataset | `data_file`: annotations.txt + `dataset_name` | Yes |
| quantize | `dataset.train_dataset.data_sources` | train_datasets | `data_file`: annotations.txt + `dataset_name` | Yes |
| quantize | `dataset.val_dataset.data_sources` | eval_dataset | `data_file`: annotations.txt + `dataset_name` | Yes |
| quantize | `dataset.quant_calibration_dataset.images_dir` | train_datasets | images.tar.gz | No |
| train | `dataset.train_dataset.data_sources` | train_datasets | `data_file`: annotations.txt + `dataset_name` | Yes |
| train | `dataset.val_dataset.data_sources` | eval_dataset | `data_file`: annotations.txt + `dataset_name` | Yes |
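The table can be encoded as a lookup so a runner can confirm that every required key is present in `spec_overrides` before submitting a job. A sketch, assuming overrides are nested dicts keyed by the dotted path segments; `REQUIRED_SOURCES` and `missing_data_sources` are hypothetical names. Note that it enforces `val_dataset` for `train` as the table lists, although the Eval Dataset section marks the validation set optional.

```python
# Spec keys from the Per-Action Dataset Requirements table.
# True = list of {data_file, dataset_name} dicts; False = single path value.
REQUIRED_SOURCES = {
    "evaluate": {"dataset.test_dataset.data_sources": True},
    "inference": {"dataset.infer_dataset.data_sources": True},
    "quantize": {
        "dataset.train_dataset.data_sources": True,
        "dataset.val_dataset.data_sources": True,
        "dataset.quant_calibration_dataset.images_dir": False,
    },
    "train": {
        "dataset.train_dataset.data_sources": True,
        "dataset.val_dataset.data_sources": True,
    },
}


def _get_dotted(spec, dotted_key):
    """Return the value at dotted_key, or None if any segment is missing."""
    node = spec
    for key in dotted_key.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def missing_data_sources(action, spec_overrides):
    """List problems with the data source overrides required for `action`."""
    problems = []
    for key, is_list in REQUIRED_SOURCES.get(action, {}).items():
        value = _get_dotted(spec_overrides, key)
        if value is None:
            problems.append(f"missing {key}")
            continue
        if not is_list:
            continue
        if not isinstance(value, list) or not value:
            problems.append(f"{key} must be a non-empty list")
            continue
        for i, entry in enumerate(value):
            for field in ("data_file", "dataset_name"):
                if not isinstance(entry, dict) or not entry.get(field):
                    problems.append(f"{key}[{i}] missing {field}")
    return problems
```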


Spec Overrides


Data source overrides are mandatory for every action: construct the data source paths from the Per-Action Dataset Requirements table above and include them in `spec_overrides`. Each `data_sources` entry is a dict with two mandatory fields, `data_file` and `dataset_name`. See `references/spec-overrides.md` for the full per-action `train` / `evaluate` / `export` / `inference` / `quantize` override blocks and the precision recommendations.


Eval Dataset


Optional. The validation dataset is configured via `dataset.val_dataset.data_sources` (each entry needs `data_file` and `dataset_name`).


Important Parameters


The full parameter glossary (`model.*`, `train.*`, `dataset.*`, `export.*`, `inference.*` fields with options, defaults, and sources) plus the Pretrained checkpoint loading use case matrix live in `references/parameters.md`. Key starting points:

- `model.model_type` (default `MetricDepthAnything`)
- `model.encoder` (default `vitl`)
- `train.optim.lr` (default `1e-4`, AdamW)
- `train.precision` (`fp32` recommended)
- `dataset.{train,val,test,infer}_dataset.augmentation.crop_size` (default `[518, 518]`)
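For reference, the starting points above as a nested dict, plus one consistency check. The check rests on an assumption not stated in this document: Depth Anything v2 uses a DINOv2 ViT backbone with 14-pixel patches, and the default 518 = 37 × 14, so keeping `crop_size` at a multiple of 14 is a reasonable rule of thumb. `KEY_STARTING_POINTS` and `crop_size_ok` are illustrative names only.

```python
# Starting points listed under "Important Parameters"; see references/parameters.md
# for the full glossary. `precision` is the recommended value, not necessarily the default.
KEY_STARTING_POINTS = {
    "model": {"model_type": "MetricDepthAnything", "encoder": "vitl"},
    "train": {"optim": {"lr": 1e-4}, "precision": "fp32"},  # AdamW optimizer
    "dataset": {
        f"{split}_dataset": {"augmentation": {"crop_size": [518, 518]}}
        for split in ("train", "val", "test", "infer")
    },
}

PATCH_SIZE = 14  # assumption: DINOv2 ViT patch size used by the encoder


def crop_size_ok(crop_size, patch=PATCH_SIZE):
    """True if both crop dimensions are positive multiples of the patch size."""
    return all(d > 0 and d % patch == 0 for d in crop_size)
```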


Finetuning Recipes


Relative and Metric variant finetuning recipes are in `references/finetuning.md`, including the required spec keys, the metric `dataset.{normalize_depth, min_depth, max_depth}` block (required in both the train AND export specs), trainer-enforced defaults (`clip_grad_norm: 0.1`, `warmup_steps: 20`, `weight_decay: 1e-4`), sanity-run overrides, and the Sanity-run PASS criteria for catching a silent `train_loss = NaN`. Both recipes use `train.optim.lr: 5e-6` with `LambdaLR` (the AdamW default of `1e-4` is too aggressive when finetuning from a converged/pretrained backbone).
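To see what the finetuning learning rate does over time, here is a multiplier function of the kind `torch.optim.lr_scheduler.LambdaLR` accepts (LambdaLR scales each base LR by `lr_lambda(step)`). The schedule shape is an assumption, not taken from the recipes: linear warmup over the trainer-enforced `warmup_steps: 20`, then polynomial decay to zero over a hypothetical `total_steps`. Check `references/finetuning.md` for the actual schedule.

```python
BASE_LR = 5e-6     # train.optim.lr used by both finetuning recipes
WARMUP_STEPS = 20  # trainer-enforced default


def make_lr_lambda(total_steps, warmup_steps=WARMUP_STEPS, power=0.9):
    """Return step -> LR multiplier: linear warmup, then polynomial decay (assumed shape)."""
    def lr_lambda(step):
        if step < warmup_steps:
            return (step + 1) / warmup_steps
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return max(0.0, 1.0 - progress) ** power
    return lr_lambda


# With PyTorch (not executed here):
#   scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, make_lr_lambda(total_steps=1000))
lr_at = make_lr_lambda(total_steps=1000)
for step in (0, 19, 20, 500, 1000):
    print(step, BASE_LR * lr_at(step))
```

Whatever the exact shape, the peak LR here is `5e-6`, 20 times smaller than the `1e-4` AdamW default, which is the point of the recipe's override.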


Multi-GPU / Multi-Node


Launch method: Lightning-managed (a single `python` process; Lightning spawns the workers).

| Spec Key | Description | Default |
|---|---|---|
| `train.num_gpus` | Number of GPUs | 1 |
| `train.gpu_ids` | GPU device indices | [0] |
| `train.num_nodes` | Number of nodes | 1 |
| `train.distributed_strategy` | `ddp` or `fsdp` | `ddp` |

- `ddp` with activation checkpointing: `find_unused_parameters=False`
- `ddp` without activation checkpointing: `find_unused_parameters=True`
- `fsdp` forces precision to FP16

Multi-node env vars (set by the orchestrator): `WORLD_SIZE`, `NODE_RANK`, `MASTER_ADDR`, `MASTER_PORT`, `NUM_GPU_PER_NODE`.
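The strategy rules and the orchestrator variables reduce to two small functions. A sketch with hypothetical names: `strategy_kwargs` mirrors the bullets above, and `check_multinode_env` only checks that the variables are present and that `NODE_RANK` is a valid node index; the exact meaning of `WORLD_SIZE` is left to the orchestrator and is not checked.

```python
import os

REQUIRED_ENV = ("WORLD_SIZE", "NODE_RANK", "MASTER_ADDR", "MASTER_PORT", "NUM_GPU_PER_NODE")


def strategy_kwargs(distributed_strategy="ddp", activation_checkpointing=False):
    """Mirror the rules above for the chosen train.distributed_strategy."""
    if distributed_strategy == "ddp":
        # As stated above: False with activation checkpointing, True without it.
        return {"find_unused_parameters": not activation_checkpointing}
    if distributed_strategy == "fsdp":
        return {"precision": "fp16"}  # fsdp forces FP16
    raise ValueError(f"unsupported strategy: {distributed_strategy!r}")


def check_multinode_env(num_nodes, env=None):
    """Return a list of problems with the orchestrator-provided variables."""
    env = os.environ if env is None else env
    problems = [f"missing {k}" for k in REQUIRED_ENV if k not in env]
    if "NODE_RANK" in env and not 0 <= int(env["NODE_RANK"]) < num_nodes:
        problems.append("NODE_RANK must be in [0, train.num_nodes)")
    return problems
```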


Export / TRT Defaults


- TRT data types: FP32 and BF16 (Ampere SM80+). FP16 is not supported for the ViT-L mono backbone.
- Recommended TRT precision: `bf16`. Use `fp32` if BF16 hardware is unavailable.

Full TAO Deploy reference: tao-deploy-depth-anything-v2.
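Picking the TensorRT precision from the GPU's compute capability can be sketched as below. The SM80 threshold and the FP16 restriction come from the text above; obtaining the capability (for example `torch.cuda.get_device_capability()`, or `nvidia-smi --query-gpu=compute_cap --format=csv` on recent drivers) is left to the caller, and `pick_trt_precision` is a hypothetical helper.

```python
SUPPORTED_TRT_PRECISIONS = ("fp32", "bf16")


def pick_trt_precision(compute_capability, requested=None):
    """Return 'bf16' on SM80+ (Ampere or newer), else 'fp32'; reject unsupported types."""
    bf16_ok = tuple(compute_capability) >= (8, 0)
    if requested is None:
        return "bf16" if bf16_ok else "fp32"
    requested = requested.lower()
    if requested not in SUPPORTED_TRT_PRECISIONS:
        raise ValueError(f"{requested} is not supported for the ViT-L mono backbone; use fp32 or bf16")
    if requested == "bf16" and not bf16_ok:
        raise ValueError("bf16 needs Ampere SM80+; use fp32 on this GPU")
    return requested
```

The same SM80 rule applies to the `bf16` training option described under Hardware.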


Hardware


Minimum 1 GPU, recommended 2 GPUs, with 24 GB+ VRAM per GPU; the ViT-Large encoder is memory intensive. Use `fp32` (recommended) or `bf16` (alternative; requires Ampere SM80+) for training. Activation checkpointing is available for larger inputs.


Error Patterns


Common failure signatures and fixes are documented in `references/troubleshooting.md`: depth range mismatch, missing pretrained weights, `Key 'encoder' not in 'MonoBackBone'`, missing `dataset_name`, `depth_net_mono: not found`, metric variant hyperparameter sourcing, and the export refuse-to-overwrite ONNX error.


Spec Param / Parent Model Inference


Model-specific inference mappings (the full `depth_net_mono.config.json` per-action spec-field → inference-function table, plus `parent_model` / `parent_job_id` resolution guidance) are in `references/spec-param-inference.md`. These mappings belong in Markdown, not in `config.json`; generated runners should read that reference and apply the mappings with SDK helpers before calling `create_job()`.
