tao-train-single-step
Normal Train
Standard supervised fine-tuning: train a model on a labeled dataset, optionally evaluate, then optionally export. The most common TAO workflow for adapting a pretrained model to a new dataset.
Steps
- train: executed through AutoML when the selected model has `automl_enabled: true` and `automl_policy` is `auto`; set `automl_policy=off` for a plain single training run
- eval: executed if `eval_dataset_uri` is resolved
- export: optional, on user request after training
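
As a rough illustration of how the three steps gate on their inputs, the sketch below builds the ordered stage list for one run. The helper name `plan_stages`, its parameters, and the `s3://bucket/eval/` URI are hypothetical placeholders for illustration, not part of the TAO skill bank.

```python
from typing import List, Optional


def plan_stages(eval_dataset_uri: Optional[str], export_requested: bool) -> List[str]:
    """Return the ordered stages for one run (hypothetical helper)."""
    stages = ["train"]  # the train stage always runs
    if eval_dataset_uri:
        # eval runs only when an eval dataset URI was resolved
        stages.append("eval")
    if export_requested:
        # export runs only on explicit user request, after training
        stages.append("export")
    return stages


if __name__ == "__main__":
    # Placeholder URI, used only to show the call shape.
    print(plan_stages("s3://bucket/eval/", export_requested=False))
```

Whether the train stage itself goes through AutoML is a separate decision and is deliberately left out of this helper.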
Prerequisites
Required
- model: A compatible TAO model (e.g., clip, nvdinov2, grounding_dino)
- train_dataset_uri: URI of the training dataset (e.g., `s3://bucket/train/`)
- platform: Ask from the generated supported-platform list: `${TAO_SKILL_BANK_PATH:-~/tao-skills-external}/scripts/list_tao_platforms.py --format text`
- container image confirmation: resolve the default image from the selected model/action config, show it to the user, and require confirmation or `image=<override>` before creating runner files or submitting training.
Optional
- eval_dataset_uri: Some model skills mark this as required — check the resolved model skill before treating it as optional.
- base_checkpoint: If not provided, defaults to the NGC pretrained checkpoint listed in the model skill, or trains from scratch if no NGC checkpoint exists.
- automl_policy: `auto` by default; set `off` to bypass model-level AutoML for this run while leaving model metadata unchanged.
- image override: Use `image=<override>` to pin a specific TAO toolkit build after reviewing the resolved default.
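
The fallback rules for the optional inputs above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, `None` is used here to mean "train from scratch", and the checkpoint and image strings in the usage lines are placeholders rather than real NGC or container names.

```python
from typing import Optional


def resolve_base_checkpoint(user_checkpoint: Optional[str],
                            ngc_checkpoint: Optional[str]) -> Optional[str]:
    """Prefer the user's checkpoint, then the NGC pretrained checkpoint
    listed in the model skill; None means train from scratch."""
    if user_checkpoint:
        return user_checkpoint
    return ngc_checkpoint


def resolve_automl_policy(user_policy: Optional[str]) -> str:
    """Default to "auto" when the user gives no policy."""
    return user_policy or "auto"


def resolve_image(default_image: str, override: Optional[str]) -> str:
    """Use the override only when one was given; otherwise keep the
    default resolved from the model/action config."""
    return override or default_image


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    print(resolve_base_checkpoint(None, "ngc-pretrained-placeholder"))
    print(resolve_automl_policy(None))
    print(resolve_image("default-image-placeholder", None))
```

In every case the user still has to confirm the resolved image before any runner file is created; the helper only picks which value to show.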
Launch Intake
After the user confirms they want this standard train/eval/export workflow,
ask which supported platform they intend to run on. Generate the choices with
`scripts/list_tao_platforms.py --format text`; do not scan platform docs or
folders.

Before creating a plain train runner, inspect the selected model's metadata
with `scripts/list_tao_models.py --scope automl --format json` or read
`skills/models/<network>/references/skill_info.yaml`. If `automl_enabled` is
true and the helper reports a valid train schema for that model, route the
train stage through `skills/applications/tao-run-automl` by default. Only stay
on the plain train path when `automl_policy=off`, the user explicitly asks for
no HPO/AutoML, or AutoML is enabled but not runnable because the model's train
schema is not packaged yet.

Also ask whether long-running monitoring should stay enabled and how many
minutes to wait between status updates. Defaults: enabled, 5 minutes.

After the model/action are known, run
`scripts/resolve_tao_image.py --model <network> --action train --format text`
and ask whether to use the resolved image or an `image=<override>`. Do not
create the tao-train-single-step runner until the image is confirmed.

After platform selection, run
`scripts/list_tao_platforms.py --platform <platform> --format text` and ask
only for credentials relevant to that platform, plus any selected-model
credentials. Do not ask for unrelated platform credentials.
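
The AutoML routing rule in this section can be read as a small decision function. The sketch below is one possible encoding, not the skill bank's actual implementation: `choose_train_route`, its parameters, and the returned reason strings are all hypothetical, and the two booleans are assumed to come from the model metadata and the schema check described above.

```python
from typing import Tuple

AUTOML_ROUTE = "skills/applications/tao-run-automl"
PLAIN_ROUTE = "plain-train"  # hypothetical label for the plain train path


def choose_train_route(automl_enabled: bool,
                       train_schema_valid: bool,
                       automl_policy: str = "auto",
                       user_declined_automl: bool = False) -> Tuple[str, str]:
    """Return (route, reason) for the train stage."""
    if automl_policy == "off":
        return PLAIN_ROUTE, "automl_policy=off"
    if user_declined_automl:
        return PLAIN_ROUTE, "user asked for no HPO/AutoML"
    if not automl_enabled:
        return PLAIN_ROUTE, "model is not AutoML-enabled"
    if not train_schema_valid:
        # AutoML is enabled but not runnable: train schema not packaged yet
        return PLAIN_ROUTE, "train schema not packaged"
    return AUTOML_ROUTE, "automl_enabled with a valid train schema"


if __name__ == "__main__":
    route, reason = choose_train_route(automl_enabled=True, train_schema_valid=True)
    print(route, "-", reason)
```

Returning the reason alongside the route makes it easy to tell the user why a plain run was chosen, which matters when AutoML is enabled but its schema is missing.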