tao-finetune-huggingface-model
<!--
Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
Local NVIDIA GPU fine-tuning for HuggingFace models, grounded in live-fetched
documentation with curated references as a fallback safety net. One NGC
container, a small set of focused scripts, one push to HF Hub. Behavior is
governed by the rules in this file — follow them, do not improvise.
Order of authority (highest first): (1) user input → (2) live research
(model card, HF repo example, author script, task docs, paper; always fetched in
Step 3) → (3) curated `references/*.md` (fallback when live research is silent) →
(4) training-data memory (last resort, suspect). On conflict, live research wins
for the specific model + current API. See `references/core-rules.md` for the
full order and conflict-resolution rules.
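The authority order works like a first-non-empty lookup across the four sources. The sketch below assumes each source is a plain dict keyed by slot name. `SOURCES` and `resolve_slot` are hypothetical names and do not come from the skill's reference files.

```python
from typing import Any, Mapping, Optional, Tuple

# Hypothetical source names, highest authority first (mirrors the order above).
SOURCES = ("user_input", "live_research", "curated", "memory")


def resolve_slot(slot: str,
                 sources: Mapping[str, Mapping[str, Any]]) -> Tuple[Optional[Any], Optional[str]]:
    """Return (value, source_name) from the highest-authority source that has the slot.

    None or "" counts as 'silent' and falls through to the next source.
    Returns (None, None) when no source has the slot.
    """
    for name in SOURCES:
        value = sources.get(name, {}).get(slot)
        if value is not None and value != "":
            return value, name
    return None, None


# Example: live research overrides the curated fallback for the learning rate.
sources = {
    "user_input": {},
    "live_research": {"learning_rate": 5e-5},
    "curated": {"learning_rate": 1e-4, "scheduler": "cosine"},
}
print(resolve_slot("learning_rate", sources))  # (5e-05, 'live_research')
print(resolve_slot("scheduler", sources))      # ('cosine', 'curated')
```

The function also reports which source won. That makes it easy to log a `memory` hit as suspect.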
Inputs
Required:
- `model_id`: HuggingFace model ID, e.g. `google/vit-base-patch16-224`

Conditional credentials (loaded by the SessionStart hook from `~/.config/tao/.env`):
- `HF_TOKEN`: only when the model/dataset is gated (read) or `push_to_hub` is on (write); public + `push_to_hub: false` runs don't need it. The agent never reads the value; it only checks presence with `[ -n "$HF_TOKEN" ]`.
- `WANDB_API_KEY`, `WANDB_PROJECT`: only when WandB is enabled; set `WANDB_MODE=disabled` to opt out.
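The credential rules above come down to a small decision plus a presence-only check. In this sketch, `wandb_enabled`, `required_credentials`, and `missing_credentials` are hypothetical helpers. Treating WandB as enabled unless `WANDB_MODE=disabled` is an assumption drawn from the opt-out wording.

```python
import os
from typing import List


def wandb_enabled() -> bool:
    """Assumption: WandB is on unless WANDB_MODE=disabled opts out."""
    return os.environ.get("WANDB_MODE", "") != "disabled"


def required_credentials(gated: bool, push_to_hub: bool, use_wandb: bool) -> List[str]:
    """Env var names this run needs: HF_TOKEN for gated reads or Hub writes, WandB keys if enabled."""
    needed: List[str] = []
    if gated or push_to_hub:
        needed.append("HF_TOKEN")
    if use_wandb:
        needed += ["WANDB_API_KEY", "WANDB_PROJECT"]
    return needed


def missing_credentials(needed: List[str]) -> List[str]:
    """Presence-only check, the Python analogue of `[ -n "$VAR" ]`.

    Values are tested for non-emptiness only; they are never printed or logged.
    """
    return [name for name in needed if not os.environ.get(name)]


# A gated model with push_to_hub off and WandB disabled needs only HF_TOKEN.
print(required_credentials(gated=True, push_to_hub=False, use_wandb=False))  # ['HF_TOKEN']
```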
Dataset (exactly one):
- `dataset_id`: HuggingFace dataset ID (source: `hf`)
- `local_dataset_path`: local folder or file (source: `local`); optional `local_dataset_format` ∈ {auto, imagefolder, coco, voc, jsonl, arrow, parquet, csv} (default: auto-detect).
- (omit both): the agent recommends popular datasets (source: `recommend`)
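The sketch below encodes the exactly-one rule. It assumes empty strings count as unset. `dataset_source` is a hypothetical helper, and raising on ambiguous input stands in for "stop and ask".

```python
from typing import Optional

LOCAL_FORMATS = {"auto", "imagefolder", "coco", "voc", "jsonl", "arrow", "parquet", "csv"}


def dataset_source(dataset_id: Optional[str] = None,
                   local_dataset_path: Optional[str] = None,
                   local_dataset_format: str = "auto") -> str:
    """Return 'hf', 'local', or 'recommend'; raise on ambiguous or invalid input."""
    if dataset_id and local_dataset_path:
        raise ValueError("set exactly one of dataset_id / local_dataset_path, not both")
    if local_dataset_path:
        if local_dataset_format not in LOCAL_FORMATS:
            raise ValueError(f"unsupported local_dataset_format: {local_dataset_format!r}")
        return "local"
    if dataset_id:
        return "hf"
    return "recommend"  # neither given: the agent proposes popular datasets


print(dataset_source(dataset_id="cifar10"))  # hf
```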
Optional (have defaults): `task_type` (auto-detected); `n_train=10000`,
`n_eval=1000`, `n_epochs=3`, `lora_r=16`; `output_dir=./output/<model_short_name>`;
`hf_model_repo` (push target; if unset and `HF_TOKEN` has write access,
auto-derived as `<whoami>/<model_short_name>-finetuned`); `push_to_hub=True`
(set `False` to skip); `skip_baseline=False` (set `True` to skip the zero-shot
baseline eval).

Optional deliverables (off by default): `emit_progress_log` →
`output_dir/PROGRESS.md` (per-step ✅/⚠️/❌ journal); `emit_report` →
`reports/report.{pdf,html}` with curves & samples; `emit_unit_tests` →
`tests/` with fake-data heterogeneous-batch tests.

All values live in `output_dir/config.yaml`. Never hardcode them in Python.
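The sketch below shows one way to assemble the values that get written into `config.yaml`. It rests on two assumptions. First, `model_short_name` is taken to be the last path segment of `model_id`; the skill does not define it here. Second, `whoami` is `None` when `HF_TOKEN` lacks write access. Raising in that case is this sketch's stand-in for asking the user. `DEFAULTS` and `build_config` are hypothetical.

```python
from typing import Any, Dict, Optional

# Defaults listed above; in a real run these end up in output_dir/config.yaml,
# and the generated training scripts read them from there.
DEFAULTS: Dict[str, Any] = {
    "n_train": 10000, "n_eval": 1000, "n_epochs": 3, "lora_r": 16,
    "push_to_hub": True, "skip_baseline": False,
    "emit_progress_log": False, "emit_report": False, "emit_unit_tests": False,
}


def build_config(model_id: str, whoami: Optional[str],
                 overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Merge user overrides over DEFAULTS and fill the derived fields (hypothetical helper).

    Assumes model_short_name is the last path segment of model_id, and that
    whoami is None when HF_TOKEN has no write access.
    """
    cfg: Dict[str, Any] = {**DEFAULTS, **(overrides or {}), "model_id": model_id}
    short = model_id.rstrip("/").split("/")[-1]
    cfg.setdefault("output_dir", f"./output/{short}")
    if cfg["push_to_hub"] and not cfg.get("hf_model_repo"):
        if whoami is None:
            # No push target can be derived: stop and ask instead of guessing.
            raise ValueError("push_to_hub needs hf_model_repo or a write-capable HF_TOKEN")
        cfg["hf_model_repo"] = f"{whoami}/{short}-finetuned"
    return cfg


print(build_config("google/vit-base-patch16-224", "alice")["hf_model_repo"])
# alice/vit-base-patch16-224-finetuned
```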
Execution platform
This skill orchestrates what to run; the platform skills own how (read them
first, do not redraft their conventions here):
`tao-setup-nvidia-gpu-host` (GPU host runtime: driver 580, CUDA Toolkit 13.0,
NVIDIA Container Toolkit 1.19.0), `tao-run-on-docker` (`docker run` flags, NGC
auth, `--gpus`, mounts, env passthrough, `--ipc=host`/`--shm-size`, error
modes), and `tao-run-on-local-docker` (local Docker job preflight: daemon
reachable, GPU smoke).

Default platform: `local-docker`. Build a one-off image (`run-<short>:latest`)
and run it on the local Docker daemon. Ask only if the user needs a different
backend (Brev, Lepton/SLURM/Kubernetes). See `references/execution-platform.md`
for that path plus the alternate-backend routing, the GPU-runtime preflight, the
credentials policy, and the `docker run` conventions.
References: fallback safety net
Curated `references/*.md` files are consulted only when live research is silent,
ambiguous, or unavailable; live docs always win for the specific model + current
API. The workflow steps below link the file each step needs directly. Before
falling back, log the live source you tried and why it was insufficient (in
`config.yaml` `notes:`, and in PROGRESS.md if enabled). `[FETCH LIVE]` markers in
`cv-scripts.md`/`vlm-scripts.md` are a research checklist, not code to inline;
if a block has no Step 3 finding, refetch the listed URL.

See `references/reference-index.md` for the complete index: every always-on
reference plus the three opt-in ones gated by a flag (`progress-tracking.md` ←
`emit_progress_log`, `testing.md` ← `emit_unit_tests`, `reporting.md` ←
`emit_report`), each with its per-step role.
Core rules
The non-negotiable behaviors. Full text in `references/core-rules.md`.
Short version:
- Your HF-library knowledge is outdated. Fetch live docs before writing any ML code; never generate trainer args / collator / transforms from memory (Step 3).
- Smoke-test on real data with `--max_steps 1` before any full run.
- Never silently substitute model_id, dataset_id, or training_method; stop and ask.
- Error recovery is minimal-change. OOM → halve batch, double grad_accum, enable gradient checkpointing (don't switch to LoRA without approval); NaN → reduce LR 10×; flat loss → inspect collator; same error 3× → stop and ask.
- Verify dataset columns BEFORE writing the collator. Renames → handle in `prepare_data.py`; restructuring → stop and ask.
- Hardware sizing (bf16): ≤3B → 24 GB, 7–13B → 80 GB, 30B+ → multi-GPU or LoRA on 1× 80 GB, 70B+ → 8× 80 GB or LoRA. Won't fit + no LoRA request → ask.
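The minimal-change recovery rule can be written as a pure function. This is a sketch only. The error labels (`"oom"`, `"nan"`, `"flat_loss"`) and `recover` are hypothetical; the config keys follow HF `TrainingArguments` field names. A return value of `None` means no config change applies: inspect the collator for a flat loss, otherwise stop and ask.

```python
from typing import Any, Dict, List, Optional


def recover(error: str, cfg: Dict[str, Any], history: List[str]) -> Optional[Dict[str, Any]]:
    """Return a minimally changed copy of cfg for a known error, or None.

    None means no config change applies: stop and ask (same error a third
    time, or an unknown error), or inspect the collator (flat loss).
    `error` and `history` use hypothetical short labels: "oom", "nan", "flat_loss".
    """
    if history.count(error) >= 2:  # this would be the third occurrence
        return None
    new = dict(cfg)  # never mutate the caller's config
    if error == "oom":
        # Halve the per-device batch and double accumulation (same effective batch
        # when the batch size is even), then enable gradient checkpointing. No LoRA switch.
        new["per_device_train_batch_size"] = max(1, cfg["per_device_train_batch_size"] // 2)
        new["gradient_accumulation_steps"] = cfg["gradient_accumulation_steps"] * 2
        new["gradient_checkpointing"] = True
        return new
    if error == "nan":
        new["learning_rate"] = cfg["learning_rate"] / 10
        return new
    return None


cfg = {"per_device_train_batch_size": 16, "gradient_accumulation_steps": 1,
       "gradient_checkpointing": False, "learning_rate": 1e-4}
print(recover("oom", cfg, []))
```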
Workflow: 6 steps
Single pass, sequential. Each step has a clear gate before the next begins.
Step 1: Inspect & qualify
Decide whether to proceed at all. 1a. Probe model and 1b. Probe dataset
via two CPU-only containerized `python:3.12-slim` probes (no host Python
prereqs): the model probe reports `model_type`, `architectures`, `tags`, and
head counts; the dataset probe verifies loadability + column schema. Detect
`task` from `architectures` + `tags` + card body (if the card is silent on the
`AutoModelFor...` class, use `references/model-discovery.md` and log it under
`notes:`). For `source = recommend`, present 3–5 picks from
`references/dataset-recommendations.md`; for `source = local`, use the
`references/dataset-sources.md` loaders. 1c. Accept/reject, 1d. walk
`references/compat-workarounds.md`, recording matches in `config.yaml`
`applicable_workarounds:`, then 1e. write the `config.yaml` skeleton.

See `references/step1-probes.md` for the full probe scripts + `docker run`
invocations, the Docker-daemon preflight, prerequisites (`MODEL_ID`, optional
`DATASET_ID`/`HF_TOKEN`, and `OUTPUT_DIR`, default `./output/<model_short_name>`,
bind-mounted by Steps 4–5), dataset-column verification + rename rule, the full
reject criteria, compat-walk detail, the exact skeleton, and `.probe` cleanup.

Gate: `config.yaml` exists with model, dataset, task, applicable_workarounds.
Do not proceed if any field is missing.
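The sketch below shows task detection from the probe output, and it makes several assumptions. The suffix table is a hypothetical starting point; its three suffixes match real `transformers` class names such as `ViTForImageClassification`, `DetrForObjectDetection`, and `SegformerForSemanticSegmentation`. Checking `architectures` before `tags` is this sketch's choice, and the card body is not parsed.

```python
from typing import Iterable, Optional

# Hypothetical starting table; extend it from references/model-discovery.md.
ARCH_SUFFIX_TO_TASK = {
    "ForImageClassification": "image-classification",
    "ForObjectDetection": "object-detection",
    "ForSemanticSegmentation": "image-segmentation",
}
# Hub task tags recognized as-is (VLMs are usually found this way).
KNOWN_TASK_TAGS = set(ARCH_SUFFIX_TO_TASK.values()) | {"image-text-to-text"}


def detect_task(architectures: Iterable[str], tags: Iterable[str]) -> Optional[str]:
    """Guess the task from config.architectures first, then from Hub tags.

    None means both are silent: fall back to references/model-discovery.md
    and log the fallback under notes:.
    """
    for arch in architectures:
        for suffix, task in ARCH_SUFFIX_TO_TASK.items():
            if arch.endswith(suffix):
                return task
    for tag in tags:
        if tag in KNOWN_TASK_TAGS:
            return tag
    return None


print(detect_task(["ViTForImageClassification"], ["vision"]))  # image-classification
```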
Step 2: Hardware audit & NGC image
Verify Docker + GPU + disk, pick the NGC PyTorch image live, and finalize
hardware-dependent compat rules.

2a. Audit (hard gate) via `tao-setup-nvidia-gpu-host --check-only` (driver
branch 580, CUDA Toolkit 13.0, NVIDIA Container Toolkit 1.19.0). On failure, ask
the user to authorize the install, then re-run. Soft-warn on free disk
`< 100 GB`; check only the credentials this run needs; do not proceed to Step 4
on a hard fail. Record `gpu_count`, `gpu_name`, `driver_major`,
`vram_gb_per_gpu`.

2b. Pick NGC image (live): the highest-versioned PyTorch NGC image with
`Min driver ≤ driver_major` and container CUDA `≤` host CUDA Toolkit (never
reject an image for an `aN`/`bN`/`rcN` suffix). If WebFetch fails, use the
`references/hardware-container.md` fallback.

2c. Re-evaluate `hw`-dependent compat rules.

2d. Model-fit check: bf16 `param_bytes ≈ 2×param_count`; if that exceeds 60% of
`vram_gb_per_gpu × 1e9`, recommend LoRA.

See `references/hardware-audit-ngc.md` for the full audit script, the soft-warn +
`MIN_DISK_GB` override, live-selection rules, the support-matrix WebFetch URL,
the `24.09-py3` / SDPA+GQA `attn_implementation: "eager"` fallback, and the
`could not select device driver` failure note.

Gate: `config.yaml` has `ngc_image`, `gpu_count`, `gpu_name`, `driver_major`,
`vram_gb_per_gpu`. Hardware-dependent compat fixes are recorded.
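Steps 2b and 2d are sketched below. `NgcImage` rows would come from the live support matrix; the rows in the example are made-up values, not real matrix entries. Comparing CUDA on major.minor and reading only the numeric prefix of version strings are assumptions of this sketch. `pick_ngc_image` and `recommend_lora` are hypothetical helpers.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class NgcImage(NamedTuple):
    tag: str         # e.g. "25.01-py3" (YY.MM-py3)
    min_driver: int  # minimum driver branch listed for this image
    cuda: str        # CUDA version inside the container, e.g. "12.8.0"


def version_tuple(text: str) -> Tuple[int, ...]:
    """Numeric prefix of a version string; a0/b1/rc2-style suffixes are ignored, never rejected."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"not a version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def pick_ngc_image(rows: List[NgcImage], driver_major: int, host_cuda: str) -> Optional[str]:
    """2b: highest tag with min_driver <= driver_major and container CUDA <= host CUDA (major.minor)."""
    host = version_tuple(host_cuda)[:2]
    ok = [r for r in rows
          if r.min_driver <= driver_major and version_tuple(r.cuda)[:2] <= host]
    if not ok:
        return None  # nothing compatible in the fetched matrix: surface this, don't guess
    return max(ok, key=lambda r: version_tuple(r.tag)).tag


def recommend_lora(param_count: int, vram_gb_per_gpu: float) -> bool:
    """2d: bf16 weights take about 2 bytes/param; recommend LoRA above 60% of one GPU's VRAM."""
    param_bytes = 2 * param_count
    return param_bytes > 0.6 * vram_gb_per_gpu * 1e9


# Made-up rows for illustration only; real values come from the live support matrix.
rows = [NgcImage("24.09-py3", 560, "12.6.1"),
        NgcImage("25.01-py3", 570, "12.8.0"),
        NgcImage("26.01-py3", 590, "13.1.0")]
print(pick_ngc_image(rows, driver_major=580, host_cuda="13.0"))  # 25.01-py3
```

The fit check counts weights only. Optimizer state and activations come on top, which is why the threshold sits well below 100%.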
Step 3: Research the recipe
Fetch the live recipe: the agent's `transformers`/`trl`/`peft` memory is
suspect, so Step 3 is non-negotiable. Walk `references/research-priorities.md`
in priority order (Priority 1 → 6).
Stop once you have, for the detected task: the `AutoModel`/processor class,
train + eval transforms, collator, `compute_metrics`, and hyperparameter hints
(LR, batch size, epochs, scheduler). Record findings in `meta/recipe.md` and
append source URLs to `config.yaml: research_sources:`. If a slot has no live
finding, fall back to the matching scaffold (`cv-scripts.md` /
`vlm-scripts.md`) and log "fallback to scaffold — no live source for <slot>"
under `notes:`. Conflict-resolution rules: `references/research-priorities.md`.

Gate: every required slot above is filled, with a source URL or an explicit
scaffold-fallback note.
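The slot-filling rule and its gate are sketched below. The slot names and `fill_recipe` are hypothetical. `live` maps a slot to a `(value, source_url)` pair found in Step 3, and `scaffold` maps a slot to the matching scaffold default.

```python
from typing import Dict, List, Tuple

# Hypothetical names for the slots listed above.
REQUIRED_SLOTS = ("model_class", "transforms", "collator", "compute_metrics", "hyperparameters")


def fill_recipe(live: Dict[str, Tuple[str, str]], scaffold: Dict[str, str],
                research_sources: List[str], notes: List[str]) -> Dict[str, str]:
    """Fill every required slot from live research, else from the scaffold.

    live maps slot -> (value, source_url); research_sources and notes stand in
    for the config.yaml lists and are appended to in place. Raises if a slot has
    neither a live finding nor a scaffold default (the gate fails).
    """
    recipe: Dict[str, str] = {}
    for slot in REQUIRED_SLOTS:
        if slot in live:
            value, url = live[slot]
            recipe[slot] = value
            if url not in research_sources:
                research_sources.append(url)
        elif slot in scaffold:
            recipe[slot] = scaffold[slot]
            notes.append(f"fallback to scaffold — no live source for {slot}")
        else:
            raise RuntimeError(f"Step 3 gate failed: nothing for slot {slot!r}")
    return recipe
```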
Step 4: Generate project & smoke-test
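Write all scripts, build the image, prepare data, and run a 1-step smoke test on
real data (one `docker build`, two `docker run`s).

4a. Generate project files in `output_dir/`: `config.yaml`, `Dockerfile`,
`requirements.txt`, `prepare_data.py`, `train.py`, `run_eval.py` (the eval script
MUST be `run_eval.py`, never `evaluate.py`, which would shadow HF's `evaluate`
library), `infer.py`, `merge_lora.py` for VLM-LoRA, and `.gitignore`. Authority
order: Step 3 live research → scaffold reference (`cv-scripts.md` /
`vlm-scripts.md`) for structure only, never their `[FETCH LIVE]` blocks. Apply
each `applicable_workarounds` entry as a Dockerfile block, requirements pin,
config override, or runtime env var. Every generated `.py` begins with the
NVIDIA Apache-2.0 `#`-comment copyright header (the emitter must fail
otherwise). If `emit_unit_tests: true`, also generate `tests/` per
`references/testing.md`. See `references/project-scaffold.md` for the full file
table, the exact copyright header, and the Dockerfile template (deps → compat →
code layer order).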
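The emitter-side header check is sketched below. It assumes the header carries the same copyright and license lines as the notice at the top of this file; the authoritative text is in `references/project-scaffold.md`. `header_problems` and `check_or_fail` are hypothetical.

```python
from typing import List

# Assumed header lines (taken from the notice at the top of this file); the
# authoritative text lives in references/project-scaffold.md.
REQUIRED_HEADER_SNIPPETS = (
    "NVIDIA CORPORATION. All rights reserved.",
    'Licensed under the Apache License, Version 2.0 (the "License");',
)


def header_problems(source: str, max_lines: int = 20) -> List[str]:
    """Problems with the leading '#' comment block of a generated .py file ([] = OK)."""
    header: List[str] = []
    for line in source.splitlines()[:max_lines]:
        if not line.startswith("#"):
            break
        header.append(line)
    problems: List[str] = []
    if not header:
        problems.append("file does not start with a # comment block")
    text = "\n".join(header)
    for snippet in REQUIRED_HEADER_SNIPPETS:
        if snippet not in text:
            problems.append(f"missing header text: {snippet!r}")
    return problems


def check_or_fail(path: str, source: str) -> None:
    """Emitter-side gate: refuse to emit a .py file whose header is missing or incomplete."""
    problems = header_problems(source)
    if problems:
        raise ValueError(f"{path}: " + "; ".join(problems))
```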
4b. Build, prepare, smoke: run `docker build -t run-<short>:latest .`, then
`references/docker-runs.md` §1 (build), §2 (prepare_data), and §3 (smoke,
`--smoke --max_steps 1`). §3 lists the smoke pass criteria (no exception, finite
loss, `grad_norm > 0` at step 1). If `emit_unit_tests: true`, also run
`pytest tests/` inside the container. Any failure → STOP.

4c. Preflight summary: print the boxed `─ PREFLIGHT ─` summary (reference
URL, dataset columns, push_to_hub repo, wandb monitoring, ngc_image, hardware,
smoke result) and verify every field is filled before launching full training.
Exact format: `references/project-scaffold.md`.

Gate: project files written, image built, smoke PASSED, preflight has no
blank fields.
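The smoke pass criteria and the no-blank-fields preflight check could be expressed as shown below. The metric keys `loss` and `grad_norm` mirror what recent HF `Trainer` versions log. The `PREFLIGHT_FIELDS` names are hypothetical, and the boxed rendering is left to `references/project-scaffold.md`.

```python
import math
from typing import List, Mapping, Optional

# Hypothetical keys for the fields listed in 4c.
PREFLIGHT_FIELDS = ("reference_url", "dataset_columns", "push_to_hub_repo",
                    "wandb_monitoring", "ngc_image", "hardware", "smoke_result")


def smoke_passed(error: Optional[BaseException], step1_logs: Mapping[str, float]) -> bool:
    """§3 criteria: no exception, finite loss, and grad_norm > 0 at step 1."""
    if error is not None:
        return False
    loss = step1_logs.get("loss")
    grad_norm = step1_logs.get("grad_norm")
    if loss is None or grad_norm is None:
        return False
    return math.isfinite(loss) and grad_norm > 0  # NaN grad_norm also fails (> 0 is False)


def blank_preflight_fields(summary: Mapping[str, object]) -> List[str]:
    """Fields that are missing or empty; full training may start only if this is []."""
    return [f for f in PREFLIGHT_FIELDS if summary.get(f) in (None, "", [], {})]
```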
Step 5: Train, evaluate, infer
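Run in order, all commands in `references/docker-runs.md`: 5a baseline eval
(§4, skip if `skip_baseline: true`), 5b full training, detached (§5), 5c
LoRA merge (§6, VLM-with-LoRA only), 5d post-train eval (§7), 5e
inference on 5 samples (§8). Multi-GPU: launch with
`torchrun --nproc_per_node=$gpu_count train.py` in place of `python train.py`
(torchrun takes the script path directly). Watch `docker logs -f hft_train`:
loss should drop within 10–20 steps (flat → stop; NaN → reduce LR; OOM → halve
batch; full recovery in `references/core-rules.md` +
`references/error-playbook.md`). If `emit_report: true`, run `report.py` after
Step 5e per `references/reporting.md`.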
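The rule "loss should drop within 10–20 steps" can be checked with a simple heuristic. This sketch makes explicit assumptions: one loss value per logged step, positive losses, a 20-step window, and a 5% relative drop between the first and last quarter of the window as the bar for "dropping". None of these thresholds come from the skill. `classify_loss` is hypothetical.

```python
import math
from statistics import mean
from typing import Sequence


def classify_loss(losses: Sequence[float], window: int = 20, min_drop: float = 0.05) -> str:
    """Label the first `window` logged losses: 'nan', 'too_early', 'dropping', or 'flat'.

    'dropping' means the mean of the last quarter of the window is at least
    `min_drop` (relative) below the mean of the first quarter. Assumes positive
    losses; window and threshold are illustrative, not values from this skill.
    """
    head = list(losses[:window])
    if any(not math.isfinite(x) for x in head):
        return "nan"          # NaN/inf -> reduce LR 10x
    if len(head) < window:
        return "too_early"    # keep watching
    q = max(1, window // 4)
    start, end = mean(head[:q]), mean(head[-q:])
    return "dropping" if end <= start * (1 - min_drop) else "flat"  # flat -> stop
```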
Gate: all of the following hold: `checkpoints/final/` (or `checkpoints/merged/`
for LoRA) exists; `reports/eval_results.json` has a numeric primary metric;
`reports/baseline_results.json` exists (unless skipped);
`reports/inference_samples/` has 5 samples; the wandb URL shows descending loss.
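The file-based parts of this gate can be checked mechanically. The sketch assumes `eval_results.json` is a flat JSON object and that the caller knows its primary-metric key. `step5_gate` is hypothetical, and the wandb loss check stays manual.

```python
import json
from pathlib import Path
from typing import List


def step5_gate(output_dir: str, primary_metric: str, lora: bool, skip_baseline: bool) -> List[str]:
    """Return unmet Step 5 gate conditions (empty list = gate passed)."""
    root = Path(output_dir)
    failures: List[str] = []
    ckpt = root / "checkpoints" / ("merged" if lora else "final")
    if not ckpt.is_dir():
        failures.append(f"missing {ckpt}")
    eval_path = root / "reports" / "eval_results.json"
    try:
        value = json.loads(eval_path.read_text())[primary_metric]
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            failures.append(f"{primary_metric} is not numeric")
    except (OSError, ValueError, KeyError, TypeError):  # missing file, bad JSON, no key
        failures.append(f"no {primary_metric} in {eval_path}")
    if not skip_baseline and not (root / "reports" / "baseline_results.json").is_file():
        failures.append("missing reports/baseline_results.json")
    samples = root / "reports" / "inference_samples"
    n = len([p for p in samples.iterdir() if p.is_file()]) if samples.is_dir() else 0
    if n < 5:
        failures.append(f"expected 5 inference samples, found {n}")
    return failures
```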
Step 6: Push & emit rerun skill
Publish the run and make it reproducible without re-research.

6a. Push to HF Hub: use `references/hub-push.md` (pushes the merged or final
weights, a generated model card `README.md`, `results/{eval,baseline}_results.json`,
`config.yaml`, `Dockerfile`, `requirements.txt`, `inference_samples/*.jpg`, and
`report.{pdf,html}` if `emit_report: true`). Skip iff `push_to_hub: false` is
explicit in `config.yaml`.

6b. Emit a rerun skill at `<output_dir>/skills/run-<short>/SKILL.md` per
`references/pipeline-skill-template.md`. Every `<placeholder>` must be replaced
with a real value (literal placeholders are a bug). Include the full YAML
frontmatter (`license`, `compatibility`, `metadata`, `allowed-tools`) and the
NVIDIA copyright notice in an HTML comment immediately after the closing `---`,
as in that template. The emitter must fail unless the emitted `SKILL.md`
contains those fields and the copyright comment.
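The 6b emitter check is sketched below. It assumes the frontmatter uses simple top-level `key:` lines and treats any `<lowercase_name>` token as an unfilled placeholder; a real implementation would more likely parse the YAML properly. Blank lines between the closing `---` and the comment are tolerated here. `skill_md_problems` is hypothetical.

```python
import re
from typing import List

REQUIRED_FRONTMATTER_KEYS = ("license", "compatibility", "metadata", "allowed-tools")
# Assumption: leftover placeholders look like <short> or <model_short_name>.
PLACEHOLDER = re.compile(r"<[a-z][a-z0-9_-]*>")


def skill_md_problems(text: str) -> List[str]:
    """Check an emitted SKILL.md against the 6b rules; an empty list means it passes."""
    match = re.match(r"---\n(.*?)\n---\n", text, re.DOTALL)
    if match is None:
        return ["missing YAML frontmatter delimited by ---"]
    keys = {line.split(":", 1)[0].strip() for line in match.group(1).splitlines()
            if line and not line.startswith((" ", "\t")) and ":" in line}
    problems = [f"frontmatter missing {k}" for k in REQUIRED_FRONTMATTER_KEYS if k not in keys]
    after = text[match.end():].lstrip("\n")
    if not (after.startswith("<!--") and "NVIDIA CORPORATION" in after.split("-->", 1)[0]):
        problems.append("NVIDIA copyright HTML comment must follow the closing ---")
    problems += [f"unfilled placeholder {p}" for p in sorted(set(PLACEHOLDER.findall(text)))]
    return problems
```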
Gate (Done criteria): all of the following hold: the Step 5 gate is met; the HF
Hub repo exists at the resolved URL with weights + card + `results/` (unless
`push_to_hub: false`); `<output_dir>/skills/run-<short>/SKILL.md` exists with no
`<placeholder>` left, with metadata + copyright HTML comment per
`pipeline-skill-template.md`.

Final message to user (terse, with direct URLs): wandb URL; HF Hub URL;
primary metric baseline → fine-tuned (Δ); path to `reports/inference_samples/`;
path to `<output_dir>/skills/run-<short>/SKILL.md`.
Error playbook
On a known runtime error, consult `references/error-playbook.md` before
redesigning anything. Its symptom → minimal-fix table covers NGC ENTRYPOINT,
SDPA+GQA, the `transformers>=4.51` regression, numpy 2.x ABI, Albumentations
bbox, PEFT + gradient_checkpointing, SmolVLM SDPA, LoRA target-regex, missing CV
augmentation, OOM at step 0, and more. When a row fires twice across runs, lift
it into `references/compat-workarounds.md` with a `detect` rule, so Step 1d
applies it automatically before the error can fire.
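The sketch below shows how a lifted playbook row might look once it has a `detect` rule. `Workaround`, `applicable_workarounds`, and the single catalog entry are hypothetical. The entry flags grouped-query attention (fewer key/value heads than attention heads) on a `24.x` image, and it assumes `ngc_image` stores only the tag. Because it depends on `ngc_image`, it is the kind of hardware-dependent rule that Step 2c re-evaluates.

```python
from typing import Any, Callable, Dict, List, NamedTuple


class Workaround(NamedTuple):
    name: str
    detect: Callable[[Dict[str, Any]], bool]  # predicate over facts from the Step 1/2 probes
    fix: str  # applied in Step 4a as a Dockerfile block, pin, config override, or env var


def applicable_workarounds(facts: Dict[str, Any], catalog: List[Workaround]) -> List[str]:
    """Step 1d walk: names of every catalog entry whose detect rule matches."""
    return [w.name for w in catalog if w.detect(facts)]


def _gqa_on_24x_image(facts: Dict[str, Any]) -> bool:
    """Grouped-query attention (fewer KV heads than query heads) on a 24.x image tag."""
    heads = facts.get("num_attention_heads", 0)
    kv_heads = facts.get("num_key_value_heads", heads)  # absent key means no GQA
    return kv_heads < heads and str(facts.get("ngc_image", "")).startswith("24.")


# Hypothetical entry, as if lifted from the SDPA+GQA playbook row.
CATALOG = [Workaround("sdpa_gqa_eager", _gqa_on_24x_image, 'attn_implementation: "eager"')]
```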
Communication style
Terse: no filler, no restating the request; always include direct Hub + wandb
URLs; on error, state what went wrong, why, and what you changed (no menus, no
"Option A/B/C" when the answer is clear; act). Full text:
`references/core-rules.md`.
Example pipelines
- tao-rerun-convnext-cifar10 — facebook/convnext-tiny-224 on cifar10 (image-classification, 10 classes, subset 5000/1000).
- tao-rerun-detr-cppe5 — facebook/detr-resnet-50 on cppe-5 (object-detection, 5 classes, subset 800/200).
- tao-rerun-segformer-foodseg103 — nvidia/mit-b0 on EduardoPacheco/FoodSeg103 (semantic segmentation, 103 classes + background, subset 1000/200).
- tao-rerun-smolvlm-vqav2 — HuggingFaceTB/SmolVLM-256M-Instruct on merve/vqav2-small (image-text-to-text VLM LoRA, subset 500/100, 5 epochs).