tao-train-single-step

Normal Train

Standard supervised fine-tuning: train a model on a labeled dataset, optionally evaluate, then optionally export. This is the most common TAO workflow for adapting a pretrained model to a new dataset.

Steps

train: executed through AutoML when the selected model has `automl_enabled: true` and `automl_policy` is `auto`; set `automl_policy=off` for a plain single training run
eval: executed if `eval_dataset_uri` is resolved
export: optional, on user request after training
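
The stage selection above can be sketched in a few lines of Python. This is a minimal illustration only: `plan_stages` and its parameters are hypothetical names, not part of the TAO skill bank, and the example URI is a placeholder.

```python
from typing import List, Optional


def plan_stages(eval_dataset_uri: Optional[str], export_requested: bool) -> List[str]:
    """Return the ordered stages for one run (hypothetical helper)."""
    stages = ["train"]  # train always runs
    if eval_dataset_uri:  # eval only when an eval dataset is resolved
        stages.append("eval")
    if export_requested:  # export only on explicit user request
        stages.append("export")
    return stages


# Placeholder URI, shown only to exercise the eval branch.
print(plan_stages("s3://bucket/eval/", export_requested=False))
```

Whether the train stage goes through AutoML is a separate decision and does not change which stages run.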

Prerequisites

Required

model: A compatible TAO model (e.g., clip, nvdinov2, grounding_dino)
train_dataset_uri: URI of the training dataset (e.g., `s3://bucket/train/`)
platform: Ask the user to choose from the generated supported-platform list: `${TAO_SKILL_BANK_PATH:-~/tao-skills-external}/scripts/list_tao_platforms.py --format text`
container image confirmation: Resolve the default image from the selected model/action config, show it to the user, and require confirmation or `image=<override>` before creating runner files or submitting training.

Optional

eval_dataset_uri: Some model skills mark this as required; check the resolved model skill before treating it as optional.
base_checkpoint: If not provided, defaults to the NGC pretrained checkpoint listed in the model skill, or trains from scratch if no NGC checkpoint exists.
automl_policy: `auto` by default; set `off` to bypass model-level AutoML for this run while leaving model metadata unchanged.
image override: Use `image=<override>` to pin a specific TAO toolkit build after reviewing the resolved default.
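
The defaults described above can be summarized as a small resolution step. This is a sketch under stated assumptions: `resolve_options`, its argument names, and the placeholder checkpoint and image strings are all hypothetical and do not reflect how the skill stores these values.

```python
from typing import Dict, Optional


def resolve_options(
    user: Dict[str, str],
    ngc_checkpoint: Optional[str],
    default_image: str,
) -> Dict[str, Optional[str]]:
    """Apply the documented defaults to user-supplied options (hypothetical)."""
    return {
        # AutoML policy is `auto` unless the user turns it off for this run.
        "automl_policy": user.get("automl_policy", "auto"),
        # User checkpoint first, then the NGC one; None means train from scratch.
        "base_checkpoint": user.get("base_checkpoint") or ngc_checkpoint,
        # An explicit image override wins over the resolved default.
        "image": user.get("image", default_image),
    }


# Placeholder values for illustration only.
print(resolve_options({"automl_policy": "off"}, None, "example/default-image"))
```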

Launch Intake

After the user confirms they want this standard train/eval/export workflow, ask which supported platform they intend to run on. Generate the choices with `scripts/list_tao_platforms.py --format text`; do not scan platform docs or folders.

Before creating a plain train runner, inspect the selected model's metadata with `scripts/list_tao_models.py --scope automl --format json` or read `skills/models/<network>/references/skill_info.yaml`. If `automl_enabled` is true and the helper reports a valid train schema for that model, route the train stage through `skills/applications/tao-run-automl` by default. Only stay on the plain train path when `automl_policy=off`, the user explicitly asks for no HPO/AutoML, or AutoML is enabled but not runnable because the model's train schema is not packaged yet.

Also ask whether long-running monitoring should stay enabled and how many minutes between status updates. Defaults: enabled, 5 minutes.

After the model/action are known, run `scripts/resolve_tao_image.py --model <network> --action train --format text` and ask whether to use the resolved image or an `image=<override>`. Do not create the tao-train-single-step runner until the image is confirmed.

After platform selection, run `scripts/list_tao_platforms.py --platform <platform> --format text` and ask only for credentials relevant to that platform, plus any selected-model credentials. Do not ask for unrelated platform credentials.
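
The AutoML routing rule in this section reduces to a short decision function. The sketch below is illustrative: `choose_train_route`, its parameter names, and the `"automl"` / `"plain"` return labels are assumptions made for the example, not identifiers from the skill bank.

```python
def choose_train_route(
    automl_enabled: bool,
    schema_valid: bool,
    automl_policy: str = "auto",
    user_declined_automl: bool = False,
) -> str:
    """Pick the train path: "automl" or "plain" (hypothetical labels)."""
    # An explicit opt-out always keeps the run on the plain train path.
    if automl_policy == "off" or user_declined_automl:
        return "plain"
    # AutoML is the default only when it is enabled AND actually runnable.
    if automl_enabled and schema_valid:
        return "automl"
    # Enabled but schema not packaged yet, or not enabled at all.
    return "plain"


print(choose_train_route(automl_enabled=True, schema_valid=True))
```

Checking the opt-out first keeps the model metadata untouched: the function reads `automl_enabled` but never needs to change it.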
