tao-train-fast-foundation-stereo

Depth Net Fast Stereo

Real-time stereo depth estimation using FastFoundationStereo (FFS) — the bp2 commercial distilled variant of FoundationStereo. Predicts disparity maps from rectified stereo image pairs with per-layer pruned widths for real-time inference.

The mono, stereo, and fast-stereo skills share the unified TAO `depth_net` CLI; FFS is selected via `model.model_type: FastFoundationStereo`. FFS differs from `FoundationStereo` only in pruned per-layer widths and a serialized forward path; everything else (entrypoint, action verbs, dataset classes, deploy chain) is identical to `depth-net-stereo`.

For TAO Deploy TensorRT actions (`gen_trt_engine`, TensorRT `evaluate`, TensorRT `inference`), read `references/tao-deploy-fast-foundation-stereo.md` first. The deploy spec template lives at `references/spec_template_deploy.yaml`.

Train Action Policy

This model is AutoML-enabled at the model layer. Before handling any train-stage request, read `references/skill_info.yaml` and resolve the run override from either an explicit `automl_policy` value or the user's workflow request. Treat phrases like "turn off AutoML", "disable AutoML", "no HPO", or "plain training" as `automl_policy: off` for this run only; otherwise default to `auto`. When `automl_policy: auto`, `automl_enabled: true`, and both `schemas/train.schema.json` and `references/spec_template_train.yaml` are packaged, route the train action through `tao-skill-bank:tao-run-automl` by default with this model's `skill_dir`. Preserve workflow/application overrides for datasets, specs, output directories, GPU/platform settings, parent checkpoints, and `automl_policy`. Use direct model training only when `automl_policy: off` or the packaged train schema/template is missing; in the missing-schema case, report that AutoML is enabled but not runnable for this model until schemas are generated.

Non-train actions such as `evaluate`, `inference`, `export`, and deploy flows stay in this model skill. The per-run `automl_policy` override does not change model metadata.
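The routing rules above can be sketched as two small functions. This is only an illustration of the decision logic: the function names, the boolean inputs, and the handling of `automl_enabled: false` (not covered by the text) are assumptions, not part of the skill.

```python
# Phrases the policy text says should be read as automl_policy: off.
OFF_PHRASES = ("turn off automl", "disable automl", "no hpo", "plain training")


def resolve_automl_policy(explicit_policy, user_request):
    """Return the per-run policy: an explicit value wins, else scan the request."""
    if explicit_policy is not None:
        return explicit_policy
    text = (user_request or "").lower()
    if any(phrase in text for phrase in OFF_PHRASES):
        return "off"
    return "auto"


def route_train(policy, automl_enabled, has_schema, has_template):
    """Return (route, message). message is set only in the missing-schema case."""
    # Assumption: a model with automl_enabled false trains directly.
    if policy == "off" or not automl_enabled:
        return "direct", None
    if has_schema and has_template:
        return "tao-skill-bank:tao-run-automl", None
    return "direct", (
        "AutoML is enabled but not runnable for this model "
        "until schemas are generated."
    )
```

For example, `route_train(resolve_automl_policy(None, "plain training on my data"), True, True, True)` falls through to direct training because the request contains one of the opt-out phrases.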

Two Use Cases

FFS ships with a pre-trained bp2 commercial checkpoint (`model_best_bp2_serialize.pth`).

- Raw deploy: use the bp2 checkpoint as-is. Skip `train`; run `inference` / `evaluate` / `export` / `gen_trt_engine` directly with the bp2 file as the action's checkpoint.
- Finetune on user data: set `train.pretrained_model_path` to the bp2 file, train on user data, then verify and deploy the resulting checkpoint. The full 7-action sequence (train → evaluate pyt → inference pyt → export → gen_trt_engine → inference deploy → evaluate deploy) is supported.

Workflow

Prerequisites — data accessibility

Your dataset (left + right images + GT disparity for train / evaluate, left + right only for inference) must be reachable from inside the container:

- SDK runner: place files at the S3 paths the runner resolves (`S3_TRAIN` / `S3_EVAL` placeholders shown in the spec overrides).
- Direct `docker run` (e.g. local testing): mount the host dataset root read-only at the same in-container path:

docker run ... -v <host_data_root>:<host_data_root>:ro <container> ...

The same accessibility requirement applies to the `<output_dir>` written by all actions, and to the bp2 checkpoint path.

Step 1 — Annotation file

Per-line annotation file referenced by `data_sources[*].data_file`. Schema is identical to `depth-net-stereo`:

| Columns | Format | Use |
| --- | --- | --- |
| 2 | `<left> <right>` | Stereo inference (no GT) |
| 3 | `<left> <right> <disparity>` | Stereo with GT |
| 4 | `<left> <right> <disparity> <occlusion_mask>` | Stereo with GT and occlusion mask |

Generate via `depth_net convert` if needed; see the `depth-net-stereo` skill for the `convert_spec.yaml` template.
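A quick pre-flight check of an annotation file can catch column-count mistakes before a container run. The sketch below assumes fields are whitespace-separated, paths contain no spaces, and every line in one file uses the same column count; the helper names are hypothetical and not part of `depth_net`.

```python
# Column count -> use, copied from the table above.
USES = {
    2: "stereo inference (no GT)",
    3: "stereo with GT",
    4: "stereo with GT and occlusion mask",
}


def classify_annotation_line(line):
    """Return (column_count, use) for one annotation line."""
    fields = line.split()  # assumption: whitespace-separated paths
    if len(fields) not in USES:
        raise ValueError(f"expected 2, 3, or 4 columns, got {len(fields)}: {line!r}")
    return len(fields), USES[len(fields)]


def check_annotation_lines(lines):
    """Return the single column count shared by all non-empty lines."""
    counts = {classify_annotation_line(l)[0] for l in lines if l.strip()}
    if len(counts) != 1:
        raise ValueError(f"inconsistent or empty annotation file: {sorted(counts)}")
    return counts.pop()
```

Calling `check_annotation_lines(open(path))` on a file intended for training should return 3 or 4; a result of 2 means the file carries no ground truth and only suits inference.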

Step 2 — Pair `model_type` and `dataset_name` based on your data

Use `model_type: FastFoundationStereo` for FFS. The `dataset_name` choice mirrors the stereo skill: pick the dataset-specific class when your layout matches a registered one, otherwise `GenericDataset`.

| Data category | `model_type` | `dataset_name` |
| --- | --- | --- |
| Middlebury | `FastFoundationStereo` | `Middlebury` |
| KITTI | `FastFoundationStereo` | `Kitti` |
| ETH3D | `FastFoundationStereo` | `Eth3d` |
| FSD synthetic | `FastFoundationStereo` | `FSD` |
| IsaacReal synthetic | `FastFoundationStereo` | `IsaacRealDataset` |
| Crestereo synthetic | `FastFoundationStereo` | `Crestereo` |
| Other / non-canonical | `FastFoundationStereo` | `GenericDataset` |

For inference with 2-column annotations (left + right, no GT), use `dataset_name: GenericDataset` regardless of layout.
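The pairing rule can be written as a lookup with a fallback. This is a minimal sketch: the lowercase category keys are hypothetical labels chosen for illustration, while the returned `dataset_name` values come from the table above.

```python
# Hypothetical category labels mapped to the registered dataset classes.
REGISTERED = {
    "middlebury": "Middlebury",
    "kitti": "Kitti",
    "eth3d": "Eth3d",
    "fsd": "FSD",
    "isaacreal": "IsaacRealDataset",
    "crestereo": "Crestereo",
}


def pick_dataset_name(category, has_gt=True):
    """Return the dataset_name to pair with model_type FastFoundationStereo."""
    if not has_gt:
        # 2-column (left + right) inference always uses GenericDataset.
        return "GenericDataset"
    return REGISTERED.get(category.strip().lower(), "GenericDataset")
```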

Step 3 — Set the bp2 distilled width overrides

FFS requires 15 model-section width override fields whose values match the bp2 commercial checkpoint exactly. Omitting any field falls back to TAO defaults that do not match the bp2 ckpt and produce shape-mismatch errors at forward time.

yaml

model:
  model_type: FastFoundationStereo
  encoder: vitl
  hidden_dims: [128]                    # 1-layer GRU; NOT [128,128,128]
  n_gru_layers: 1                       # bp2 single-GRU
  corr_radius: 4
  corr_levels: 2
  n_downsample: 2
  valid_iters: 8
  max_disparity: 192                    # bp2 commercial; NOT 416 (full FS default)
  volume_dim: 28                       # bp2 ckpt invariant; NOT 32 (full FS default)
  mixed_precision: false                # see references/parameters.md
  gwc_feature_normalize: true           # see references/parameters.md

  # 15 bp2 distilled width overrides — copy as-is
  motion_encoder_widths: [56, 96, 16, 12]
  motion_encoder_final: 48
  gru_hidden: 60
  gru_gating_conv_widths: [100, 168]
  disp_head_input_dim: 60
  disp_head_intermediate: 36
  disp_head_pwconv1_widths: [212, 244]
  mask_widths: [32, 16]
  stem_2_widths: [12, 16]
  spx_2_gru_widths: [16, 12, 16, 24]
  spx_gru_out: 9
  classifier_mid: 14
  cnet_conv04_widths: [60, 48]
  cam_mid_channels: 8
  cost_agg_conv_patch_padding: [0, 0, 0]

The spec templates at `references/spec_template_*.yaml` carry this block as the canonical source.
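Because a single missing field only surfaces as a shape mismatch at forward time, it can help to check the `model` section before launching. The sketch below assumes the spec has already been loaded into a plain dict; the function name is hypothetical and the expected values are copied from the block above.

```python
# The 15 bp2 distilled width overrides, copied from the model block above.
BP2_WIDTHS = {
    "motion_encoder_widths": [56, 96, 16, 12],
    "motion_encoder_final": 48,
    "gru_hidden": 60,
    "gru_gating_conv_widths": [100, 168],
    "disp_head_input_dim": 60,
    "disp_head_intermediate": 36,
    "disp_head_pwconv1_widths": [212, 244],
    "mask_widths": [32, 16],
    "stem_2_widths": [12, 16],
    "spx_2_gru_widths": [16, 12, 16, 24],
    "spx_gru_out": 9,
    "classifier_mid": 14,
    "cnet_conv04_widths": [60, 48],
    "cam_mid_channels": 8,
    "cost_agg_conv_patch_padding": [0, 0, 0],
}


def check_bp2_widths(model_section):
    """Return a list of problems; an empty list means all 15 overrides match."""
    problems = []
    for key, expected in BP2_WIDTHS.items():
        if key not in model_section:
            problems.append(f"missing {key}")
        elif model_section[key] != expected:
            problems.append(f"{key}: expected {expected}, got {model_section[key]}")
    return problems
```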

Step 4 — Write spec yaml from the spec overrides

Copy the action block from `references/spec-overrides.md` (per-action Python override dicts plus the shared `FFS_MODEL_BLOCK`). Replace:

- `model.model_type: FastFoundationStereo` (already set)
- `dataset.<...>.data_sources[*].dataset_name` from Step 2
- `dataset.<...>.data_sources[*].data_file` with the path from Step 1
- For raw deploy use cases (no train): set `<action>.checkpoint` to the bp2 file path
- For finetune use cases: set `train.pretrained_model_path` to the bp2 file path

Chained train → next action checkpoint path: For local Docker chaining (no SDK runner), the trained checkpoint lives at `<train.results_dir>/<task>/dn_model_latest.pth` (Lightning `ModelCheckpoint` nests under the task name). Example: `train.results_dir: /workspace/results/finetune/train` produces `/workspace/results/finetune/train/train/dn_model_latest.pth`. Use that nested path for the next action's `<action>.checkpoint`. SDK-runner deploys resolve this automatically via `parent_job_id`; see `references/parent-model-inference.md`.

Shape consistency: `crop_size` (`dataset.test_dataset.augmentation.crop_size`) should match `export.input_height` / `input_width` for end-to-end pyt-vs-deploy comparability; see the shape table in `references/tao-deploy-fast-foundation-stereo.md`.
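The nested checkpoint path for local Docker chaining is easy to get wrong by one directory level. A minimal sketch of the rule, with a hypothetical helper name and the task name defaulting to `train` as in the example above:

```python
import posixpath


def chained_checkpoint(results_dir, task="train"):
    """Build <train.results_dir>/<task>/dn_model_latest.pth (in-container path)."""
    return posixpath.join(results_dir, task, "dn_model_latest.pth")


print(chained_checkpoint("/workspace/results/finetune/train"))
# /workspace/results/finetune/train/train/dn_model_latest.pth
```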

Step 5 — Run

docker run --gpus 'device=0' --shm-size 16G --ipc=host \
  --user $(id -u):$(id -g) \
  -v <data_root>:<data_root>:ro \
  -v <output_dir>:<output_dir> \
  -v <bp2_ckpt_dir>:<bp2_ckpt_dir>:ro \
  <container> \
  depth_net <action> -e <spec.yaml>

Without `--user $(id -u):$(id -g)` the container writes outputs as `nobody:nogroup`, blocking host-side cleanup / retry.

For the local bind-mount `__pycache__` caveat (QA / development only: clearing stale `.pyc` files that shadow patched source), see `references/troubleshooting.md` → "Local bind-mount tip".
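When chaining several actions from a script, the command above can be assembled as an argument list so that paths are never re-quoted by hand. This is a sketch only: the function name and parameters are hypothetical, and the flags are exactly those shown in the command above.

```python
import os
import shlex


def build_docker_cmd(container, action, spec, data_root, output_dir, ckpt_dir,
                     uid=None, gid=None):
    """Return the docker argv for one depth_net action."""
    uid = os.getuid() if uid is None else uid
    gid = os.getgid() if gid is None else gid
    return [
        "docker", "run", "--gpus", "device=0", "--shm-size", "16G", "--ipc=host",
        "--user", f"{uid}:{gid}",              # outputs stay owned by the host user
        "-v", f"{data_root}:{data_root}:ro",   # same path inside and outside
        "-v", f"{output_dir}:{output_dir}",
        "-v", f"{ckpt_dir}:{ckpt_dir}:ro",
        container, "depth_net", action, "-e", spec,
    ]
```

`shlex.join(build_docker_cmd(...))` gives a copy-pasteable shell line, and the list itself can be passed to `subprocess.run(cmd, check=True)`.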

Step 6 — Verify

- Container exit code 0
- `status.json` `kpi` block populated
- For `train`: inspect per-step `train_loss` directly (the entrypoint reports `Execution status: PASS` even when loss is NaN)
- For `evaluate`: rely on `epe` / `bp1` / `bp2` / `bp3` / `d1` / `rmse` (the evaluator also emits `abs_rel` / `sq_rel` / `rmse_log`, which are not meaningful for stereo)
- For `inference`: artifacts under `results_dir`
- KPI namespace difference between pyt and deploy: pyt `evaluate` writes the metric set under `kpi.val/epe`, `kpi.val/bp1`, etc. (namespaced by Lightning's `val/` prefix). Deploy `evaluate` (TRT engine path) writes the same metric set under `kpi.epe`, `kpi.bp1`, etc. (no `val/` prefix). Downstream verification scripts that read `status.json` need to handle both shapes.
- Validate drift on your own dataset: if you compare TAO FFS deploy (`gen_trt_engine` + TRT `evaluate`) against the upstream FFS deploy path on the same input, expect a small residual mean_abs disparity drift (TAO export graph + TRT 10.13 interaction; not improvable at the source-code level). The exact magnitude is dataset and hardware dependent; measure on your own data and decide whether the drift is acceptable for your downstream task.
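A verification script can normalize the two KPI shapes before comparing pyt and deploy results. The sketch below assumes the `kpi` block is a flat mapping whose keys are either `val/<metric>` (pyt) or `<metric>` (deploy); the exact layout of `status.json` is not shown in this document, so treat the structure and the helper name as assumptions.

```python
# Metrics that are meaningful for stereo, per the list above.
STEREO_METRICS = ("epe", "bp1", "bp2", "bp3", "d1", "rmse")


def stereo_kpis(kpi):
    """Return {metric: value} from either the pyt or the deploy KPI shape."""
    out = {}
    for name in STEREO_METRICS:
        for key in (f"val/{name}", name):
            if key in kpi:
                out[name] = kpi[key]
                break
    return out
```

Metrics such as `abs_rel`, `sq_rel`, and `rmse_log` are dropped on purpose, since they are not meaningful for stereo.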

7-action deploy flow

train (optional)            → finetuned ckpt
evaluate (pyt)              → PyT eager EPE / bp on val GT
inference (pyt)             → PyT eager disparity samples (visual sanity)
export                      → static fp32 ONNX (recommended at 480×736 or 320×736)
gen_trt_engine             → fp16 TRT engine on static ONNX path
inference (deploy)         → TRT disparity samples
evaluate (deploy)          → TRT EPE / bp drift vs PyT eager fp32

Skip `train` for raw-bp2 deploy. The remaining 6 actions (or the 4 deploy-only verbs starting from `export`) cover both use cases.

Full TAO Deploy reference: tao-deploy-fast-foundation-stereo.
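The three variants of the flow (full finetune, raw-bp2, deploy-only) are slices of the same ordered list. A small sketch, with a hypothetical `plan` helper and step labels taken from the flow above:

```python
FLOW = [
    "train",
    "evaluate (pyt)",
    "inference (pyt)",
    "export",
    "gen_trt_engine",
    "inference (deploy)",
    "evaluate (deploy)",
]


def plan(finetune, deploy_only=False):
    """Return the ordered actions for the chosen use case."""
    if deploy_only:
        return FLOW[FLOW.index("export"):]  # the 4 deploy-only verbs
    return list(FLOW) if finetune else FLOW[1:]  # raw-bp2 skips train
```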

Training Requirements

Valid `dataset_name` values for stereo `data_sources` (case-insensitive): `FSD`, `IsaacRealDataset`, `Crestereo`, `Middlebury`, `Eth3d`, `Kitti`, `GenericDataset`.

Monitoring metric: `val/loss`

Per-Action Dataset Requirements

| Action | Spec Key | Source | Files | List? |
| --- | --- | --- | --- | --- |
| evaluate | `dataset.test_dataset.data_sources` | eval_dataset | data_file: annotations.txt + dataset_name | Yes |
| inference | `dataset.infer_dataset.data_sources` | inference_dataset | data_file: annotations.txt + dataset_name | Yes |
| train | `dataset.train_dataset.data_sources` | train_datasets | data_file: annotations.txt + dataset_name | Yes |
| train | `dataset.val_dataset.data_sources` | eval_dataset | data_file: annotations.txt + dataset_name | Yes |

Data source overrides are mandatory for every action. Each `data_sources` entry needs both `data_file` and `dataset_name`. The `model.*` width fields from Step 3 are also mandatory. See `references/spec-overrides.md` for the complete per-action override dicts (train finetune, raw-bp2 evaluate / inference / export) and the shared `FFS_MODEL_BLOCK`.
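The two mandatory fields per `data_sources` entry can be checked before submitting a job. This sketch assumes each entry is a plain dict, as in the Python override dicts; the helper name is hypothetical, and the accepted names are the values listed under Training Requirements.

```python
VALID_DATASET_NAMES = (
    "FSD", "IsaacRealDataset", "Crestereo",
    "Middlebury", "Eth3d", "Kitti", "GenericDataset",
)
_VALID_LOWER = {name.lower() for name in VALID_DATASET_NAMES}


def check_data_sources(data_sources):
    """Return a list of problems; an empty list means every entry is usable."""
    problems = []
    for i, entry in enumerate(data_sources):
        for field in ("data_file", "dataset_name"):
            if not entry.get(field):
                problems.append(f"data_sources[{i}]: missing {field}")
        name = entry.get("dataset_name")
        # dataset_name matching is case-insensitive
        if name and name.lower() not in _VALID_LOWER:
            problems.append(f"data_sources[{i}]: unknown dataset_name {name!r}")
    return problems
```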

Eval Dataset

Optional. The validation dataset is configured via `dataset.val_dataset.data_sources` (each entry needs `data_file` and `dataset_name`).

Parameters, Metrics, Hardware

See `references/parameters.md` for the full parameter glossary (`model.*`, `dataset.*`, `train.*` knobs including `max_disparity: 192`, `gwc_feature_normalize: true`, `mixed_precision: false`, `volume_dim: 28`, `valid_iters`, `save_raw_pfm`), the evaluation-metric table (`epe`, `bp1`, `bp2`, `bp3`, `d1`, `rmse` are meaningful; `abs_rel`, `sq_rel`, `rmse_log` are not), multi-GPU / multi-node spec keys, and hardware requirements.

Export / TRT Defaults

`export` always emits a fp32 ONNX regardless of `model.mixed_precision`; the fp16 vs fp32 selection happens at `gen_trt_engine` via `gen_trt_engine.tensorrt.data_type`. Recommended TRT precision for FFS-bp2 is `fp16` on the static-shape ONNX path (lowest drift). The dynamic-shape path supports both `fp32` (default; static-fp32 parity) and `fp16` (latency-critical multi-resolution; higher drift, may NaN under some checkpoint states; fall back to fp32 if observed).

See `references/export-trt-defaults.md` for the full TRT/ONNX defaults and the four-way export use-case matrix (`export.batch_size`, `export.dynamic_hw`; dynamic H/W is FFS-only). See `references/tao-deploy-fast-foundation-stereo.md` for the deployment matrix and static-vs-dynamic shape guidance.
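The precision guidance can be condensed into a small override builder. This is a sketch under assumptions: the function names and the `latency_critical` flag are hypothetical, and the strings `fp16` / `fp32` are assumed to be the accepted values of `gen_trt_engine.tensorrt.data_type`; check `references/export-trt-defaults.md` before relying on them.

```python
def trt_data_type(dynamic_hw, latency_critical=False):
    """Pick the TRT precision suggested for FFS-bp2."""
    if not dynamic_hw:
        return "fp16"  # static-shape ONNX path: recommended, lowest drift
    # dynamic-shape path: fp32 by default, fp16 only when latency matters
    return "fp16" if latency_critical else "fp32"


def gen_trt_override(dynamic_hw, latency_critical=False):
    """Return a nested override dict for the gen_trt_engine action."""
    return {
        "gen_trt_engine": {
            "tensorrt": {"data_type": trt_data_type(dynamic_hw, latency_critical)}
        }
    }
```

If a dynamic fp16 engine produces NaN outputs, rebuilding with `gen_trt_override(True)` returns to the fp32 default.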

Troubleshooting

See `references/troubleshooting.md` for error patterns and fixes, including `shape mismatch` at forward (missing width override), missing `gwc_feature_normalize` (TAO Core too old), the `dynamic_hw: true` warning on FS / mono export, `Key 'encoder' not in 'StereoBackBone'`, missing `dataset_name` in `data_sources`, negative disparity, larger-than-expected disparity drift (missing `max_disparity: 192`), `depth_net_stereo: not found`, decorative pyt-eval `crop_size`, the cosmetic `Failed to import SAM3` warning, and silent dynamic-deploy stride-incompatibility.

Spec Param / Parent Model Inference

Model-specific inference mappings belong in this skill, not in `config.json`. Generated runners should apply the mappings with SDK helpers before `create_job()`. See `references/parent-model-inference.md` for the full per-action spec-field → inference-function mapping table.

For `parent_model` or `parent_model_folder`, pass the upstream train / export / AutoML child job id as `parent_job_id`. The SDK lists the parent result folder, filters checkpoint artifacts, and returns the selected model file or folder. For raw-bp2 use cases without a parent train job, set the `<action>.checkpoint` field explicitly to the bp2 file path. Do not patch generated runner scripts to guess checkpoint paths.
