tao-run-automl-deft-pipeline

AutoML + DEFT Pipeline

A workflow-bridge skill that runs three phases in sequence by delegating to two existing skills: `tao-run-automl` for HPO, and a DEFT application skill (default `tao-run-deft-aoi` for AOI; other `skills/applications/deft-*` skills for non-AOI cases) for the iterative data-improvement loop.
This skill does not re-implement AutoML or DEFT. It owns only the connective tissue: the HPO spec inputs, the spec handoff between AutoML and DEFT, and the post-DEFT AutoML re-run on the augmented dataset.

When this skill applies

  • User asks to "run the AOI workflow" or "improve my AOI ChangeNet model": default to this skill, not `tao-run-deft-aoi` directly. The bare DEFT loop is the inner stage of this pipeline.
  • User wants AutoML and DEFT chained on the same model/dataset.
  • User says "AutoML at both ends", "tune HPs then DEFT", "warm-start DEFT", or "AutoML before and after DEFT".
  • User has an AutoML-tuned spec and asks how to feed it into DEFT.

When this skill does NOT apply

  • User explicitly asks for the DEFT loop only ("run JUST the DEFT loop", "skip AutoML") → use `tao-run-deft-aoi` directly.
  • User wants only AutoML with no follow-on DEFT → use `tao-run-automl` directly.
  • User is doing zero-shot eval, RAG, or non-training workflows.

The mental model

Phase 1 (AutoML baseline)        Phase 2 (DEFT loop, plain train)        Phase 3 (AutoML refinement)
─────────────────────────        ────────────────────────────────        ───────────────────────────
specs/baseline_spec.yaml         (Phase 1 winner pre-seeds baseline      ${RESULTS_DIR}/iter${N}/dataset/
train/base/training_set.csv       — DEFT skips its baseline train)       train_combined_iter${N}.csv
        │                                       │                                       │
        ▼                                       ▼                                       ▼
[ AutoML HPO sweep ]               [ DEFT: baseline-inference → RCA       [ AutoML HPO sweep ]
   N recommendations                 → iter 1..N (plain retrain) ]        re-tunes HPs against the
   pick best by val_loss / FAR      RCA / route / SDG / mining             DEFT-augmented dataset
        │                                       │                                       │
        ▼                                       ▼                                       ▼
best HPs spec + ckpt ─────►      DEFT-augmented CSV ───────────►        final best checkpoint
                                 + iter winner checkpoint               (the deliverable; no
                                 (Phase 3 warm-starts from it)           further retrain)
The handoffs are:
  • Phase 1 → Phase 2: a spec file AND the winning checkpoint. Retraining the same HPs in DEFT's baseline step would waste compute, so the bridge deep-merges Phase 1's winning HPs onto `baseline_spec.yaml`, copies the winning checkpoint into `${RESULTS_DIR}/baseline/train/` under the filename DEFT expects, and pre-populates `deft_state.json` and `loop_log.jsonl` so that DEFT resumes at baseline inference → evaluate → RCA → iter 1. DEFT itself stays plain-train (`automl_policy: off` is preserved). The verbatim four-step procedure is in references/handoff.md.
  • Phase 2 → Phase 3: a training CSV AND the iteration winner's checkpoint. The CSV (`train_combined_iter${N_final}.csv`) is AutoML's training data; the checkpoint (`iterations.<best>.best_ckpt_path` from `deft_state.json`) is wired into each rec's `train.pretrained_model_path` so that Phase 3 fine-tunes from Phase 2's winner rather than training from scratch. Without this warm start, Phase 3 routinely regresses relative to the iteration winner. Phase 3's winning checkpoint is the deliverable; there is no separate retrain after Phase 3. See references/handoff.md.
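The deep-merge step in the Phase 1 → Phase 2 handoff can be sketched as follows, assuming the spec has already been loaded into nested Python dicts (for example with a YAML library) and will be written back out afterwards. The key names in the demo are hypothetical and are not the real ChangeNet spec schema; references/handoff.md remains the authoritative procedure.

```python
import copy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with `override` merged recursively onto `base`.

    Nested dicts are merged key by key; any other override value (scalar,
    list, None) replaces the base value wholesale. Neither input is mutated.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


if __name__ == "__main__":
    # Hypothetical keys, for illustration only.
    baseline_spec = {"train": {"num_epochs": 20, "optim": {"lr": 1e-4, "momentum": 0.9}}}
    winning_hps = {"train": {"optim": {"lr": 3e-4}}}
    print(deep_merge(baseline_spec, winning_hps))
```

A recursive merge matters here because a shallow `dict.update` would replace the whole `train` section and silently drop hand-authored keys that AutoML never touched.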

Why three phases instead of two

  • Phase 1 alone finds good HPs on the original training distribution, but the model still has the distributional gaps DEFT is designed to fill.
  • Phase 2 alone (just DEFT) fills the gaps but uses whatever HPs `specs/baseline_spec.yaml` was hand-authored with, which are usually not optimal.
  • Skipping Phase 1 (DEFT followed by Phase 3) still runs AutoML against the augmented dataset, but without a tuned baseline the DEFT loop's iteration cost is higher: slower convergence and more iterations to hit the KPI.
Running all three: AutoML tunes cheaply once on the original data, DEFT does the heavy data work with reasonable HPs, then AutoML tunes again on the now-richer dataset. Of the three, Phase 3 matters most for the final deployed FAR/recall.

Cost up-front

The pipeline is sequential. Total wall-clock ≈ Phase 1 (N_automl × per-rec train) + Phase 2 (M iterations × per-iter cost) + Phase 3 (N_automl × per-rec train).
Phase 2 has no separate baseline train: Phase 1's winning checkpoint is reused as DEFT's baseline, so the baseline cost lands inside Phase 1's N_automl trainings rather than as an extra retrain. Surface this to the user before kickoff. Phase 2's iterations typically still dominate (each includes SDG + retrain), but Phase 1 and Phase 3 each add several hours on a single-GPU box. Base the estimate on the per-job time from the user's setup (if they have one) rather than guessing minutes. See references/pitfalls.md for the per-phase cost breakdown.
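The wall-clock formula above can be written as a small helper. The per-job times are inputs the user supplies from their own setup; the demo numbers are purely illustrative. A separate Phase 3 per-rec time is allowed on the assumption that training on the larger augmented CSV may take longer than in Phase 1.

```python
from typing import Dict, Optional


def estimate_wall_clock(
    n_automl: int,
    per_rec_hours: float,
    m_iterations: int,
    per_iter_hours: float,
    per_rec_hours_phase3: Optional[float] = None,
) -> Dict[str, float]:
    """Sequential wall-clock estimate, in hours, for the three phases."""
    if per_rec_hours_phase3 is None:
        per_rec_hours_phase3 = per_rec_hours
    # Phase 1 already contains the run that becomes DEFT's baseline.
    phase1 = n_automl * per_rec_hours
    # Phase 2 has no separate baseline train; each iteration includes SDG + retrain.
    phase2 = m_iterations * per_iter_hours
    phase3 = n_automl * per_rec_hours_phase3
    return {"phase1": phase1, "phase2": phase2, "phase3": phase3,
            "total": phase1 + phase2 + phase3}


if __name__ == "__main__":
    # Illustrative inputs only; use the user's measured per-job times.
    print(estimate_wall_clock(n_automl=8, per_rec_hours=0.5, m_iterations=3, per_iter_hours=4.0))
```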

Consolidated Pre-Flight — one gate, all three phases

The pipeline has exactly one user gate. Before any side-effecting action (docker pull, docker login, any job-launch call delegated to a downstream skill, or file mutations under `${RESULTS_DIR}/`), the agent must produce a single consolidated Pre-Flight Summary that subsumes every downstream skill's preflight. Once the user approves, the run proceeds autonomously through all three phases with no further interactive pauses.
The user explicitly does not want to be paged between phases. The DEFT loop's own inline `## Pre-Flight Summary` gate becomes a zero-question display step (every value pre-supplied), as does `tao-run-automl`'s shared launch preflight in Phases 1 and 3.
Before printing the gate, the agent must read every downstream preflight section in full, run every read-only check those sections prescribe, and surface each outcome in the summary. Running every step of the DEFT skill's `## Pre-Flight` is mandatory: if any step is skipped, the consolidated gate is invalid and the pipeline must not advance. The summary must include, in order: (1) workspace/host/platform/network, (2) credentials SET/UNSET status, (3) resolved container image URIs with PRESENT/MISSING, (4) dataset table with leakage check, (5) Phase 1 config, (6) Phase 2 config including the pre-seeded baseline source, (7) Phase 3 config, (8) compute estimate, (9) the confirmation line. After the gate, pass every collected value through to each downstream skill so that it has nothing left to ask. The only allowed post-gate pauses are mid-run hard-stop safety gates (e.g. DEFT's KPI regression gate); call these out in the summary.
See references/preflight.md for the full build procedure, the exact mandatory contents of each summary section (including the GPU memory rule of thumb, DEFT loop defaults, and required inputs, verbatim), the downstream gate-suppression inputs, and the fallback for when an older skill-bank version hard-codes its own STOP gate.
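A minimal sketch of assembling the consolidated summary in the mandated order and refusing to emit a partial gate. The section keys and any credential variable names are hypothetical labels; the real contents of each section are specified in references/preflight.md. Credentials are reported only as SET/UNSET and never echoed.

```python
import os
from typing import Dict, Iterable, Mapping, Optional

# Hypothetical section keys, in the order the consolidated summary must print them.
SECTION_ORDER = (
    "workspace_host_platform_network",
    "credentials",
    "container_images",
    "datasets_and_leakage",
    "phase1_config",
    "phase2_config",
    "phase3_config",
    "compute_estimate",
    "confirmation",
)


def credential_status(names: Iterable[str], env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Report SET/UNSET per variable without ever exposing the secret value."""
    env = os.environ if env is None else env
    return {name: "SET" if env.get(name) else "UNSET" for name in names}


def build_preflight_summary(sections: Mapping[str, str]) -> str:
    """Join all nine sections in order; a missing or empty section invalidates the gate."""
    missing = [key for key in SECTION_ORDER if not sections.get(key, "").strip()]
    if missing:
        raise ValueError(f"pre-flight gate invalid, missing sections: {missing}")
    blocks = [f"({i}) {key}\n{sections[key].strip()}" for i, key in enumerate(SECTION_ORDER, start=1)]
    return "\n\n".join(blocks)
```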

Phase 1 — AutoML baseline

Invoke `tao-skill-bank:tao-run-automl` with:

| Input | AOI default | Notes |
|---|---|---|
| `network_arch` | `visual-changenet` | Same model the DEFT loop expects |
| `train_dataset_uri` | `<workspace>/train/base/training_set.csv` | Same training set DEFT will start from |
| `eval_dataset_uri` | `<workspace>/train/base/validation_set.csv` | Held out; must NOT be the KPI test set (`<workspace>/kpi/testing_set.csv`), since that set is reserved for DEFT's final reporting |
| `metric` | FAR @ 100% recall (preferred) or `val_loss` | See references/pitfalls.md: ChangeNet AOI is class-imbalanced, so `val_loss` alone can mode-collapse |
| `algorithm` | `bayesian` | Use LLM-brain or `autoresearch` if compute is tight |
| `automl_max_recommendations` | 5–10 for AOI | More recs give better HPs, but compute grows linearly |
| `spec_overrides` | Pin epochs / batch_size; sweep optimizer-related HPs only | Otherwise AutoML wanders into long-train regimes that exceed Phase 2's budget |

After the sweep finishes, AutoML's `result["best"]["specs"]` is the winning hyperparameter dict.
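One common way to compute FAR @ 100% recall, shown as a sketch. It assumes the model emits a per-sample defect score, treats defects as the positive class, and defines FAR as the percentage of true PASS samples that get flagged at the highest threshold that still flags every defect. The exact definition used by the DEFT/ChangeNet evaluation may differ, so check it before relying on these numbers.

```python
from typing import Sequence


def far_at_full_recall(scores: Sequence[float], is_defect: Sequence[bool]) -> float:
    """FAR (%) over PASS samples at the highest threshold with 100% defect recall."""
    if len(scores) != len(is_defect):
        raise ValueError("scores and labels must have the same length")
    defect_scores = [s for s, d in zip(scores, is_defect) if d]
    pass_scores = [s for s, d in zip(scores, is_defect) if not d]
    if not defect_scores or not pass_scores:
        raise ValueError("need at least one defect and one PASS sample")
    # Flagging everything at or above the weakest defect's score catches every defect.
    threshold = min(defect_scores)
    false_alarms = sum(1 for s in pass_scores if s >= threshold)
    return 100.0 * false_alarms / len(pass_scores)


if __name__ == "__main__":
    scores = [0.9, 0.2, 0.6, 0.1, 0.7]
    labels = [True, False, True, False, False]
    print(far_at_full_recall(scores, labels))  # one of three PASS samples flagged
    print(far_at_full_recall([0.01] * 4, [True, False, False, True]))  # 100.0: collapsed model
```

The second demo call shows why this metric resists the mode collapse noted in the table: a model that gives every sample the same low defect score may still look acceptable on `val_loss` under heavy class imbalance, but its FAR @ 100% recall is 100%.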

Handoff to Phase 2

Phase 1 hands over two artifacts: the winning spec and the winning checkpoint. Instead of retraining the same HPs in DEFT's baseline step, pre-seed DEFT's baseline state from Phase 1's outputs so that DEFT starts at baseline inference → evaluate → RCA → iter 1. The four steps are: write the merged `baseline_spec_automl.yaml`; copy the winning checkpoint into `${RESULTS_DIR}/baseline/train/`; initialise `deft_state.json` with `iterations.baseline.stage_completed == "train"` (and append the matching `loop_log.jsonl` entry); then invoke DEFT. They are given verbatim, with the exact code, in references/handoff.md. `automl_policy: off` inside the loop is preserved.
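A sketch of the checkpoint-copy and state-seeding steps, assuming the caller supplies the checkpoint filename DEFT expects (the `model_best.pth` in the demo is a hypothetical placeholder). Only `iterations.baseline.stage_completed == "train"` comes from this skill; the extra state field and the `loop_log.jsonl` entry fields are hypothetical, and the exact versions are in references/handoff.md.

```python
import json
import shutil
import tempfile
from pathlib import Path


def preseed_deft_baseline(results_dir: str, winning_ckpt: str, expected_ckpt_name: str) -> Path:
    """Seed DEFT's baseline from Phase 1 so the loop skips its baseline train."""
    root = Path(results_dir)
    train_dir = root / "baseline" / "train"
    train_dir.mkdir(parents=True, exist_ok=True)
    dest = train_dir / expected_ckpt_name
    shutil.copy2(winning_ckpt, dest)  # copy, so Phase 1's artifact stays intact

    state_path = root / "deft_state.json"
    state = json.loads(state_path.read_text()) if state_path.exists() else {}
    baseline = state.setdefault("iterations", {}).setdefault("baseline", {})
    baseline["stage_completed"] = "train"   # from the skill text
    baseline["best_ckpt_path"] = str(dest)  # hypothetical field for this stage
    state_path.write_text(json.dumps(state, indent=2))

    # JSONL: one JSON object per line, appended. Entry fields are hypothetical.
    entry = {"iteration": "baseline", "stage": "train", "source": "phase1_automl", "ckpt": str(dest)}
    with open(root / "loop_log.jsonl", "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ckpt = Path(tmp) / "phase1_best.pth"
        ckpt.write_bytes(b"weights")
        print(preseed_deft_baseline(str(Path(tmp) / "results"), str(ckpt), "model_best.pth"))
```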

Quality check before handing off

Run a quick eval of the winning checkpoint against the held-out set and check two things: the per-class prediction counts (if the model collapsed to a single class, evaluate the 2nd or 3rd best recommendation instead), and a comparison against a zero-shot ChangeNet baseline (if AutoML did not improve over zero-shot, surface that to the user and pause). See references/handoff.md.
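The two checks can be sketched as a small selection function. It assumes each AutoML recommendation has already been evaluated on the held-out set, is ranked best-first, and carries hypothetical `pred_counts` and `far` fields (FAR, lower is better); the zero-shot FAR is assumed to come from a separate eval of zero-shot ChangeNet on the same set.

```python
from typing import Any, Dict, Sequence


def is_collapsed(pred_counts: Dict[str, int]) -> bool:
    """True when every prediction falls into a single class."""
    return sum(1 for n in pred_counts.values() if n > 0) <= 1


def choose_handoff_candidate(
    ranked: Sequence[Dict[str, Any]], zero_shot_far: float, max_rank: int = 3
) -> Dict[str, Any]:
    """Pick the best non-collapsed rec among the top `max_rank`, and flag a pause
    if it does not beat zero-shot ChangeNet."""
    for rec in ranked[:max_rank]:
        if is_collapsed(rec["pred_counts"]):
            continue  # fall back to the 2nd or 3rd best
        improved = rec["far"] < zero_shot_far
        return {"rec": rec, "improved_over_zero_shot": improved, "pause": not improved}
    # Nothing usable in the top ranks: surface to the user and pause.
    return {"rec": None, "improved_over_zero_shot": False, "pause": True}
```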

Phase 2 — DEFT loop (plain training, baseline pre-seeded from Phase 1)

Invoke `tao-skill-bank:tao-run-deft-aoi` (read its SKILL.md for the full interface). For non-AOI applications, invoke the matching DEFT skill; the handoff shape is the same.
The DEFT loop's baseline-train sub-step is skipped. Phase 1 already produced a checkpoint trained with the winning HPs, and Phase 1's handoff (see above) pre-populated `${RESULTS_DIR}/baseline/train/` and `${RESULTS_DIR}/deft_state.json`, so DEFT resumes at baseline inference → evaluate → RCA → iter 1. The rest of the DEFT loop runs unchanged. Do not modify its `automl_policy: off` invariant.
The DEFT loop owns:
  • The Pre-Flight Summary display step, which is not a fresh user gate. The Consolidated Pre-Flight (above) is the single gate; the DEFT summary still prints as an audit-trail display of the pre-seeded `baseline/train/` source, but it must not re-prompt, since every input was collected at the consolidated gate.
  • Baseline inference → evaluate → RCA on the pre-seeded checkpoint, and the full per-iteration RCA → routing → SDG → mining → assemble → train cycle.
  • KPI gating and stop conditions; the `${RESULTS_DIR}/` layout, `deft_state.json`, `loop_log.jsonl`, and `DEFT_Loop_Report.html`.
After the loop exits (KPI met or `max_iterations` reached), capture two values from `deft_state.json`:
  • `iterations.<best>.best_ckpt_path`: the loop's best plain-train checkpoint
  • The final iteration label `N_final`: used to locate the augmented training CSV
If the DEFT loop hard-stops on an unrecoverable gate, skip Phase 3: there is no validated augmented CSV to feed AutoML.
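Locating the Phase 3 training CSV can be sketched as below. It assumes `deft_state.json` keys its `iterations` map by labels of the form `iter1`, `iter2`, and so on (a hypothetical convention; other keys such as `baseline` are ignored), and that the caller knows whether DEFT hard-stopped. A `None` result means Phase 3 is skipped.

```python
import json
import re
import tempfile
from pathlib import Path
from typing import Optional

ITER_LABEL = re.compile(r"^iter(\d+)$")


def phase3_train_csv(results_dir: str, hard_stopped: bool) -> Optional[Path]:
    """Return train_combined_iter{N_final}.csv, or None if Phase 3 must be skipped."""
    if hard_stopped:
        return None  # no validated augmented CSV after an unrecoverable gate
    root = Path(results_dir)
    state = json.loads((root / "deft_state.json").read_text())
    numbers = []
    for label in state.get("iterations", {}):
        match = ITER_LABEL.match(label)
        if match:
            numbers.append(int(match.group(1)))
    if not numbers:
        return None  # DEFT never completed a numbered iteration
    n_final = max(numbers)  # numeric max, so iter10 sorts after iter9
    csv_path = root / f"iter{n_final}" / "dataset" / f"train_combined_iter{n_final}.csv"
    return csv_path if csv_path.is_file() else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        state = {"iterations": {"baseline": {}, "iter1": {}, "iter2": {}}}
        (root / "deft_state.json").write_text(json.dumps(state))
        csv = root / "iter2" / "dataset" / "train_combined_iter2.csv"
        csv.parent.mkdir(parents=True)
        csv.write_text("placeholder\n")
        print(phase3_train_csv(tmp, hard_stopped=False))
```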

Phase 3 — AutoML refinement on the DEFT-augmented dataset

Re-invoke `tao-skill-bank:tao-run-automl` with the augmented training CSV as the train dataset, the same held-out validation CSV as before, and Phase 2's iteration-winner checkpoint as the warm start:

| Input | AOI value |
|---|---|
| `network_arch` | `visual-changenet` |
| `train_dataset_uri` | `${RESULTS_DIR}/iter${N_final}/dataset/train_combined_iter${N_final}.csv` |
| `eval_dataset_uri` | Same as Phase 1 (`<workspace>/train/base/validation_set.csv`), to keep the comparison apples-to-apples |
| `metric` | Same metric as Phase 1 |
| `algorithm` | Same as Phase 1 |
| `automl_max_recommendations` | 5–10 |
| Initial spec | Start from `<workspace>/specs/baseline_spec_automl.yaml` (Phase 1's winner), which gives the sweep a strong centroid to refine around |
| Warm-start checkpoint | `iterations.<best>.best_ckpt_path` from `${RESULTS_DIR}/deft_state.json`; set `spec_overrides["train"]["pretrained_model_path"]` to this path. Each Phase 3 rec then fine-tunes from Phase 2's winner instead of training from scratch. |

The warm start is mandatory. Without it, every rec starts from random init with only 10–20 epochs to reconverge, Phase 3's `val_loss` regresses by 0.03–0.05 vs iter1, and the `_pick_best` safety net silently rolls back to the iteration winner, wasting Phase 3's compute. The following are all in references/handoff.md: the concrete `spec_overrides` code (selecting the lowest-`far_pct` iteration, excluding any prior `final_automl`), the broad-exploration tradeoff, output to `${RESULTS_DIR}/final_automl/`, and wiring Phase 3's checkpoint back into the DEFT report via `iterations.final_automl` plus re-running `prepare_inference_spec.py` (with the `_pick_best` regression safety net).
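A sketch of the warm-start wiring described above: pick the iteration with the lowest `far_pct`, excluding any prior `final_automl` entry, and point `spec_overrides["train"]["pretrained_model_path"]` at its checkpoint. It assumes each `iterations.<label>` entry stores `far_pct` and `best_ckpt_path` side by side, which is a guess at the state layout; the concrete code lives in references/handoff.md.

```python
import copy
from typing import Any, Dict


def select_warm_start(state: Dict[str, Any]) -> str:
    """best_ckpt_path of the lowest-far_pct iteration, ignoring any prior final_automl."""
    candidates = {
        label: entry
        for label, entry in state.get("iterations", {}).items()
        if label != "final_automl"
        and entry.get("far_pct") is not None
        and entry.get("best_ckpt_path")
    }
    if not candidates:
        raise ValueError("no eligible DEFT iteration to warm-start from")
    best = min(candidates, key=lambda label: candidates[label]["far_pct"])
    return candidates[best]["best_ckpt_path"]


def phase3_spec_overrides(state: Dict[str, Any], base_overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Copy the caller's overrides and wire in the Phase 2 winner as the warm start."""
    overrides = copy.deepcopy(base_overrides)
    overrides.setdefault("train", {})["pretrained_model_path"] = select_warm_start(state)
    return overrides
```

Excluding `final_automl` presumably keeps a re-run of Phase 3 from warm-starting off its own previous output.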

Pitfalls and quality checks

These apply to both AutoML phases; bake them into agent behavior rather than pasting them in once. The full detail is in references/pitfalls.md:
  • Metric pitfalls (AOI is class-imbalanced). ChangeNet AOI is PASS-dominant, so `val_loss` can mode-collapse to a zero-recall, PASS-everything model. Prefer FAR @ 100% recall directly, gate `val_loss` with a `pred_counts` sanity check, or choose the top-K by FAR @ 100% recall. For balanced or regression tasks, `val_loss` is fine.
  • Run-to-run noise. AutoML can show 2–3× metric variance for the same config. If the winner looks suspiciously better than the runner-up, re-run it with a fresh seed before committing the spec to Phase 2.
  • Cleanliness (data leakage). Both AutoML phases use a validation set distinct from the KPI test set (`kpi/testing_set.csv`), which stays untouched until DEFT's evaluate stage. Phase 3 trains on the augmented CSV but keeps the same validation set, so Phase 1 and Phase 3 numbers stay comparable.
  • Compute budget. Surface the per-phase structure up front, and give a wall-clock range only after the user supplies their per-job time.
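The "suspiciously better" check from the run-to-run noise item can be made explicit as below, assuming a lower-is-better metric such as FAR. The 2× default threshold is a hypothetical choice motivated by the 2–3× variance noted above, not a value from the skill; tune it to the noise level you actually observe.

```python
def needs_reseed(winner: float, runner_up: float, max_ratio: float = 2.0) -> bool:
    """True if a lower-is-better winner beats the runner-up by max_ratio or more.

    A gap that large could be explained by run-to-run noise alone, so the winner
    should be re-run with a fresh seed before its spec is committed to Phase 2.
    """
    if winner < 0 or runner_up < 0:
        raise ValueError("metric values must be non-negative")
    if winner == 0:
        # Any positive runner-up over a perfect score is an unbounded ratio.
        return runner_up > 0
    return runner_up / winner >= max_ratio
```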

Quick Start (AOI worked example)

When starting fresh from "run the AOI workflow", the agent sends the user a worded three-phase message (Phase 1 AutoML baseline → Phase 2 DEFT loop → Phase 3 AutoML refinement, with the cost framing and an "OK to proceed?" close). After confirmation, it invokes `tao-run-automl` (Phase 1), writes the merged spec, pre-seeds `deft_state.json`, invokes `tao-run-deft-aoi` (Phase 2) with every input pre-supplied, and invokes `tao-run-automl` again (Phase 3), with no further pauses unless a downstream skill hits an unrecoverable hard-stop gate. Finally, it summarizes the trajectory (baseline AutoML best → DEFT iter 1 → ... → DEFT iter N_final → Phase 3 best).
See references/quick-start.md for the verbatim customer-facing message and the exact post-confirmation invoke sequence.
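The post-confirmation sequence can be sketched as a driver with the downstream skills injected as plain callables. None of these callables is a real API: each stands in for delegating to the named skill, and the result fields (`metric`, `iterations`, `hard_stop`) are hypothetical. The point is the control flow: no pauses between phases, Phase 3 skipped on a DEFT hard stop, and a trajectory summary at the end.

```python
from typing import Any, Callable, Dict, List

Step = Callable[..., Dict[str, Any]]


def run_pipeline(phase1: Step, handoff: Step, phase2: Step, phase3: Step) -> List[str]:
    """Run all three phases after the single pre-flight gate and return the trajectory."""
    trajectory: List[str] = []
    p1 = phase1()  # stands in for tao-run-automl on the original data
    trajectory.append(f"Phase 1 AutoML best: {p1['metric']}")
    handoff(p1)    # merged spec + checkpoint copy + pre-seeded deft_state.json
    p2 = phase2()  # stands in for tao-run-deft-aoi, every input pre-supplied
    trajectory.extend(f"DEFT {label}: {value}" for label, value in p2["iterations"])
    if p2.get("hard_stop"):
        trajectory.append("Phase 3 skipped: DEFT hard-stopped")
        return trajectory
    p3 = phase3(p2)  # stands in for tao-run-automl on the augmented CSV, warm-started
    trajectory.append(f"Phase 3 AutoML best: {p3['metric']}")
    return trajectory
```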

Non-AOI DEFT applications

The same three-phase pattern applies to other DEFT skills: swap `network_arch`, the Phase 2 DEFT skill, the spec/checkpoint path conventions, and the Phase 3 augmented-CSV path. The handoff shape is identical (Phase 1 emits a spec + checkpoint that pre-seeds the DEFT baseline, Phase 2 emits an augmented dataset, Phase 3 emits the final checkpoint), and the baseline-skip mechanism is generic to any DEFT-style loop with a resumable baseline state. See references/quick-start.md.

See also

  • `tao-skill-bank:tao-run-automl`: AutoML interface, algorithms, HP ranges
  • `tao-skill-bank:tao-run-deft-aoi`: full DEFT AOI loop (Phase 2 default)
  • `tao-skill-bank:tao-train-visual-changenet`: underlying ChangeNet train/eval/infer skill (used by both AutoML and DEFT)
  • Other `skills/applications/deft-*` skills: non-AOI Phase 2 targets
  • references/preflight.md: building the consolidated pre-flight gate
  • references/handoff.md: Phase 1→2 pre-seed, the quality check before Phase 2, Phase 3 warm-start + report wiring
  • references/pitfalls.md: metric, noise, leakage, and compute-budget guidance
  • references/quick-start.md: verbatim worked-example message and non-AOI variant