tao-run-automl-deft-pipeline

AutoML + DEFT Pipeline

A workflow-bridge skill that runs three phases in sequence by delegating to two existing skills: `tao-run-automl` for HPO, and a DEFT application skill (default `tao-run-deft-aoi` for AOI; other `skills/applications/deft-*` skills for non-AOI cases) for the iterative data-improvement loop.

This skill does not re-implement AutoML or DEFT. It owns only the connective tissue: HPO spec inputs, the spec-handoff between AutoML and DEFT, and the post-DEFT AutoML re-run on the augmented dataset.

When this skill applies

User asks to "run the AOI workflow" or "improve my AOI ChangeNet model": default to this skill, not `tao-run-deft-aoi` directly. The bare DEFT loop is the inner stage of this pipeline.
User wants AutoML and DEFT chained on the same model/dataset
User says "AutoML at both ends", "tune HPs then DEFT", "warm-start DEFT", "AutoML before and after DEFT"
User has an AutoML-tuned spec and asks how to feed it into DEFT

When this skill does NOT apply

User explicitly asks for the DEFT loop only ("run JUST the DEFT loop", "skip AutoML") → use `tao-run-deft-aoi` directly
User wants only AutoML with no follow-on DEFT → use `tao-run-automl` directly
User is doing zero-shot eval, RAG, or non-training workflows

The mental model

Phase 1 (AutoML baseline)        Phase 2 (DEFT loop, plain train)        Phase 3 (AutoML refinement)
─────────────────────────        ────────────────────────────────        ───────────────────────────
specs/baseline_spec.yaml         (Phase 1 winner pre-seeds baseline      ${RESULTS_DIR}/iter${N}/dataset/
train/base/training_set.csv       — DEFT skips its baseline train)       train_combined_iter${N}.csv
        │                                       │                                       │
        ▼                                       ▼                                       ▼
[ AutoML HPO sweep ]               [ DEFT: baseline-inference → RCA       [ AutoML HPO sweep ]
   N recommendations                 → iter 1..N (plain retrain) ]        re-tunes HPs against the
   pick best by val_loss / FAR      RCA / route / SDG / mining             DEFT-augmented dataset
        │                                       │                                       │
        ▼                                       ▼                                       ▼
best HPs spec + ckpt ─────►      DEFT-augmented CSV ───────────►        final best checkpoint
                                 + iter winner checkpoint               (the deliverable; no
                                 (Phase 3 warm-starts from it)           further retrain)

The handoffs are:

Phase 1 → Phase 2: a spec file AND the winning checkpoint. Retraining the same HPs in DEFT's baseline step is wasted compute, so the bridge deep-merges Phase 1's winning HPs onto `baseline_spec.yaml`, copies the winning checkpoint into `${RESULTS_DIR}/baseline/train/` under the filename DEFT expects, and pre-populates `deft_state.json` + `loop_log.jsonl` so DEFT resumes at baseline inference → evaluate → RCA → iter 1. DEFT itself stays plain-train (`automl_policy: off` preserved). The verbatim 4-step procedure is in `references/handoff.md`.
Phase 2 → Phase 3: a training CSV AND the iter winner's checkpoint. The CSV (`train_combined_iter${N_final}.csv`) is AutoML's training data; the checkpoint (`iterations.<best>.best_ckpt_path` from `deft_state.json`) is wired into each rec's `train.pretrained_model_path` so Phase 3 fine-tunes from Phase 2's winner rather than from scratch. Without this warm-start Phase 3 routinely regresses vs the iter winner. Phase 3's winning checkpoint is the deliverable; no separate retrain follows Phase 3. See `references/handoff.md`.
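
The deep-merge in the Phase 1 → Phase 2 handoff can be pictured with a minimal sketch. This is not the procedure from `references/handoff.md`; the function name `deep_merge` and the spec keys and values below are hypothetical, and YAML loading and writing are left out so the example has no third-party dependencies.

```python
import copy


def deep_merge(base, overrides):
    """Return a new dict with `overrides` recursively merged onto `base`.

    Nested dicts are merged key by key; any other value in `overrides`
    replaces the value in `base`. Neither input is mutated.
    """
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical spec fragments, for illustration only.
baseline_spec = {
    "train": {"num_epochs": 20, "optim": {"lr": 1e-3, "weight_decay": 0.01}}
}
winning_hps = {"train": {"optim": {"lr": 3e-4}}}

merged_spec = deep_merge(baseline_spec, winning_hps)
```

Only the keys AutoML actually tuned are overwritten; everything else in the hand-authored baseline spec (here the pinned epoch count and the weight decay) survives the merge.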

Why three phases instead of two

Phase 1 alone finds good HPs on the original training distribution, but the model still has the distributional gaps DEFT is designed to fill.
Phase 2 alone (just DEFT) fills the gaps but uses whatever HPs `specs/baseline_spec.yaml` was hand-authored with, which are usually not optimal.
Phase 3 alone (AutoML on the augmented dataset, with no Phase 1) still depends on DEFT to produce that dataset, and without a tuned baseline the DEFT loop's iteration cost is higher (slower convergence, more iterations to hit the KPI).

Running all three: AutoML cheap-tunes once on the original data, DEFT does the heavy data work with reasonable HPs, then AutoML tunes again on the now-richer dataset. Phase 3 is the most important of the three for the final deployed FAR/recall.

Cost up-front

The pipeline is sequential. Total wall-clock ≈ Phase 1 (N_automl × per-rec train) + Phase 2 (M iterations × per-iter cost) + Phase 3 (N_automl × per-rec train).

Note that Phase 2 has no separate baseline train: Phase 1's winning checkpoint is reused as DEFT's baseline, so the baseline cost lands inside Phase 1's N_automl trainings rather than as an extra retrain. Surface this to the user before kickoff. Typically Phase 2's iterations still dominate (each includes SDG + retrain), but Phase 1 and Phase 3 each add several hours on a single-GPU box. Use the per-job estimate from the user's setup (if they have one) rather than guessing minutes. See `references/pitfalls.md` for the per-phase cost breakdown.
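
The wall-clock formula above can be written as a small helper. This is a sketch only: the function name and the example durations are placeholders, and real per-job times should come from the user's own setup, as noted above.

```python
def estimate_wall_clock_hours(n_automl, per_rec_hours, m_iterations, per_iter_hours):
    """Sum the three sequential phases.

    Phase 1 and Phase 3 each run n_automl recommendations; Phase 2 runs
    m_iterations DEFT iterations and has no separate baseline train.
    """
    phase1 = n_automl * per_rec_hours
    phase2 = m_iterations * per_iter_hours
    phase3 = n_automl * per_rec_hours
    return {
        "phase1": phase1,
        "phase2": phase2,
        "phase3": phase3,
        "total": phase1 + phase2 + phase3,
    }


# Placeholder inputs: 5 recs at 1 h each, 3 DEFT iterations at 4 h each.
estimate = estimate_wall_clock_hours(
    n_automl=5, per_rec_hours=1.0, m_iterations=3, per_iter_hours=4.0
)
```

With these placeholder inputs Phase 2 accounts for 12 of the 22 total hours, which matches the expectation that the DEFT iterations usually dominate.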

Consolidated Pre-Flight — one gate, all three phases

The pipeline has exactly one user gate. Before any side-effecting action (docker pull, docker login, any job-launch call delegated to a downstream skill, file mutations under `${RESULTS_DIR}/`), the agent must produce a single consolidated Pre-Flight Summary that subsumes every downstream skill's preflight. Once the user approves, the run is autonomous through all three phases, with no further interactive pauses.

The user explicitly does not want to be paged between phases. The DEFT loop's own inline `## Pre-Flight Summary` gate becomes a zero-question display step (every value pre-supplied), as does `tao-run-automl`'s shared launch preflight in Phases 1 and 3.

Before printing the gate the agent must read every downstream preflight section in full and run every read-only check those sections prescribe, surfacing each outcome in the summary. Running every step of the DEFT skill's `## Pre-Flight` is mandatory: if any step is skipped the consolidated gate is invalid and the pipeline must not advance. The summary must include, in order: (1) workspace/host/platform/network, (2) credentials SET/UNSET status, (3) resolved container image URIs with PRESENT/MISSING, (4) dataset table with leakage check, (5) Phase 1 config, (6) Phase 2 config incl. pre-seeded baseline source, (7) Phase 3 config, (8) compute estimate, (9) the confirmation line. After the gate, pass every collected value through to each downstream skill so it has nothing to ask. The only allowed post-gate pauses are mid-run hard-stop safety gates (e.g. DEFT's KPI regression gate); call them out in the summary.

See `references/preflight.md` for the full build procedure, the exact mandatory contents of each summary section (with the GPU memory rule of thumb, DEFT loop defaults, and required inputs verbatim), the downstream gate-suppression inputs, and the fallback when an older skill-bank version hard-codes its own STOP gate.

Phase 1 — AutoML baseline

Invoke `tao-skill-bank:tao-run-automl` with:

Input	AOI default	Notes
`network_arch`	`visual-changenet`	Same model the DEFT loop expects
`train_dataset_uri`	`<workspace>/train/base/training_set.csv`	Same training set DEFT will start from
`eval_dataset_uri`	`<workspace>/train/base/validation_set.csv`	Held-out — must NOT be the KPI test set ( `<workspace>/kpi/testing_set.csv` ), since that set is reserved for DEFT's final reporting
`metric`	FAR @ 100% recall (preferred) or `val_loss`	See `references/pitfalls.md` — ChangeNet AOI is class-imbalanced, val_loss alone can mode-collapse
`algorithm`	`bayesian`	LLM-brain or `autoresearch` if compute is tight
`automl_max_recommendations`	5–10 for AOI	More recs = better HPs but linear in compute
`spec_overrides`	Pin epochs / batch_size; sweep optimizer-related HPs only	Otherwise AutoML wanders into long-train regimes that blow Phase 2's budget

After the sweep finishes, AutoML's `result["best"]["specs"]` is the winning hyperparameter dict.

Handoff to Phase 2

Phase 1 hands over two artifacts: the winning spec and the winning checkpoint. Instead of retraining the same HPs in DEFT's baseline step, pre-seed DEFT's baseline state from Phase 1's outputs so DEFT starts at baseline inference → evaluate → RCA → iter 1. There are four steps: write the merged `baseline_spec_automl.yaml`, copy the winning checkpoint into `${RESULTS_DIR}/baseline/train/`, initialise `deft_state.json` with `iterations.baseline.stage_completed == "train"` (and append the matching `loop_log.jsonl` entry), then invoke DEFT. They are given verbatim, with the exact code, in `references/handoff.md`. `automl_policy: off` inside the loop is preserved.
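
As a rough illustration of the state pre-seeding step (the authoritative code lives in `references/handoff.md` and is not reproduced here), the sketch below writes a `deft_state.json` whose baseline entry is already marked as trained and appends one line to `loop_log.jsonl`. Only `iterations.baseline.stage_completed == "train"` comes from the text above; the function name, the `best_ckpt_path` key on the baseline entry, and the log-entry fields are assumptions.

```python
import json
from pathlib import Path


def preseed_baseline_state(results_dir, ckpt_path):
    """Mark DEFT's baseline train stage as done so the loop resumes at inference."""
    results = Path(results_dir)
    results.mkdir(parents=True, exist_ok=True)

    state = {
        "iterations": {
            "baseline": {
                "stage_completed": "train",        # field named in the skill text
                "best_ckpt_path": str(ckpt_path),  # assumed key for the copied checkpoint
            }
        }
    }
    (results / "deft_state.json").write_text(json.dumps(state, indent=2))

    # Assumed log-entry shape; the real schema is defined by the DEFT skill.
    log_entry = {"iteration": "baseline", "stage": "train", "source": "phase1_automl"}
    with (results / "loop_log.jsonl").open("a") as log:
        log.write(json.dumps(log_entry) + "\n")
    return state
```

Appending to the log rather than overwriting it keeps any earlier audit entries intact if the handoff is re-run.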

Quality check before handing off

Run a quick eval of the winning checkpoint against the held-out set: per-class prediction counts (if it collapsed to one class, evaluate the 2nd or 3rd best instead) and a comparison to a zero-shot ChangeNet baseline (if AutoML did not improve over zero-shot, surface that and pause). See `references/handoff.md`.
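
The per-class collapse check might look like the following sketch. The function names, the shape of the ranked list, and the `FAIL` label are hypothetical; the only rule taken from the text is that a candidate predicting a single class is passed over in favour of the 2nd or 3rd best.

```python
def collapsed_to_one_class(pred_counts):
    """True when at most one class received any predictions."""
    nonzero = [count for count in pred_counts.values() if count > 0]
    return len(nonzero) <= 1


def pick_handoff_candidate(ranked):
    """Return the first non-collapsed candidate among the top three.

    `ranked` is a list of (name, pred_counts) tuples, best metric first.
    Returns None when all of the top three collapsed, which should be
    surfaced to the user rather than handed to Phase 2.
    """
    for name, pred_counts in ranked[:3]:
        if not collapsed_to_one_class(pred_counts):
            return name
    return None


# Hypothetical eval output: the top rec predicts PASS for every sample.
ranked = [
    ("rec_a", {"PASS": 100, "FAIL": 0}),
    ("rec_b", {"PASS": 90, "FAIL": 10}),
]
chosen = pick_handoff_candidate(ranked)
```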

Phase 2 — DEFT loop (plain training, baseline pre-seeded from Phase 1)

Invoke `tao-skill-bank:tao-run-deft-aoi` (read its `SKILL.md` for the full interface). For non-AOI applications, invoke the matching DEFT skill; the handoff shape is the same.

The DEFT loop's baseline-train sub-step is skipped. Phase 1 already produced a checkpoint trained at the winning HPs, and Phase 1's handoff (see above) pre-populated `${RESULTS_DIR}/baseline/train/` and `${RESULTS_DIR}/deft_state.json` so DEFT resumes at baseline inference → evaluate → RCA → iter 1. The rest of the DEFT loop runs unchanged. Do not modify its `automl_policy: off` invariant.

The DEFT loop owns:

The Pre-Flight Summary display step, which is not a fresh user gate. The Consolidated Pre-Flight (above) is the single gate; the DEFT summary still prints as an audit-trail display of the pre-seeded `baseline/train/` source but must not re-prompt, since every input was collected in the consolidated gate.
Baseline inference → evaluate → RCA on the pre-seeded checkpoint, and the full per-iteration RCA → routing → SDG → mining → assemble → train cycle.
KPI gating and stop conditions; the `${RESULTS_DIR}/` layout, `deft_state.json`, `loop_log.jsonl`, and `DEFT_Loop_Report.html`.

After the loop exits (KPI met or `max_iterations` reached), capture two values from `deft_state.json`:

`iterations.<best>.best_ckpt_path`: the loop's best plain-train checkpoint
The final iteration label `N_final`: used to locate the augmented training CSV

If the DEFT loop hard-stops on an unrecoverable gate, skip Phase 3. There is no validated augmented CSV to feed AutoML.
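
Locating the augmented CSV from `N_final` can be sketched as below. The path layout is the one used throughout this document (`${RESULTS_DIR}/iter${N_final}/dataset/train_combined_iter${N_final}.csv`); the function name and the choice to return `None` when the file is missing are assumptions for illustration.

```python
from pathlib import Path


def phase3_train_csv(results_dir, n_final):
    """Return the augmented training CSV for Phase 3, or None if it is absent.

    A None result means there is no augmented dataset to feed AutoML,
    so Phase 3 should be skipped.
    """
    csv_path = (
        Path(results_dir)
        / f"iter{n_final}"
        / "dataset"
        / f"train_combined_iter{n_final}.csv"
    )
    return csv_path if csv_path.is_file() else None
```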

Phase 3 — AutoML refinement on the DEFT-augmented dataset

Re-invoke `tao-skill-bank:tao-run-automl` with the augmented training CSV as the train dataset, the same held-out validation CSV as before, and Phase 2's iter winner checkpoint as the warm-start:

Input	AOI value
`network_arch`	`visual-changenet`
`train_dataset_uri`	`${RESULTS_DIR}/iter${N_final}/dataset/train_combined_iter${N_final}.csv`
`eval_dataset_uri`	Same as Phase 1 ( `<workspace>/train/base/validation_set.csv` ) — keep the comparison apples-to-apples
`metric`	Same metric as Phase 1
`algorithm`	Same as Phase 1
`automl_max_recommendations`	5–10
Initial spec	Start from `<workspace>/specs/baseline_spec_automl.yaml` (Phase 1's winner) — gives the sweep a strong centroid to refine around
Warm-start checkpoint	`iterations.<best>.best_ckpt_path` from `${RESULTS_DIR}/deft_state.json` — set `spec_overrides["train"]["pretrained_model_path"]` to this path. Each Phase 3 rec then fine-tunes from Phase 2's winner instead of training from scratch.

The warm-start is mandatory: with no warm-start, every rec starts from random init with only 10-20 epochs to reconverge, Phase 3's `val_loss` regresses 0.03-0.05 vs iter1, and the `_pick_best` safety net silently rolls back to the iter winner, wasting Phase 3's compute. `references/handoff.md` covers the rest: the concrete `spec_overrides` code (selecting the lowest-`far_pct` iteration, excluding any prior `final_automl`), the broad-exploration tradeoff, output to `${RESULTS_DIR}/final_automl/`, and wiring Phase 3's checkpoint back into the DEFT report via `iterations.final_automl` plus re-running `prepare_inference_spec.py` (with the `_pick_best` regression safety net).
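
The warm-start selection might be sketched as follows. This is not the code from `references/handoff.md`: it assumes each entry under `iterations` in `deft_state.json` carries a `far_pct` and a `best_ckpt_path`, and the function names are hypothetical. The `spec_overrides["train"]["pretrained_model_path"]` target is the one named in the table above.

```python
def select_warm_start(state):
    """Pick the iteration with the lowest far_pct, ignoring any prior final_automl."""
    candidates = {
        label: info
        for label, info in state["iterations"].items()
        if label != "final_automl"
        and "far_pct" in info
        and "best_ckpt_path" in info
    }
    if not candidates:
        raise ValueError("no iteration with far_pct and best_ckpt_path found")
    best_label = min(candidates, key=lambda label: candidates[label]["far_pct"])
    return best_label, candidates[best_label]["best_ckpt_path"]


def build_spec_overrides(state):
    """Wire the selected checkpoint into every Phase 3 rec as the warm-start."""
    _, ckpt_path = select_warm_start(state)
    return {"train": {"pretrained_model_path": ckpt_path}}


# Hypothetical state for illustration only.
example_state = {
    "iterations": {
        "baseline": {"far_pct": 12.0, "best_ckpt_path": "/r/baseline/train/a.pth"},
        "iter1": {"far_pct": 7.5, "best_ckpt_path": "/r/iter1/train/b.pth"},
        "final_automl": {"far_pct": 1.0, "best_ckpt_path": "/r/final_automl/c.pth"},
    }
}
overrides = build_spec_overrides(example_state)
```

Excluding `final_automl` matters on re-runs: otherwise a previous Phase 3 result would be selected as its own warm-start.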

Pitfalls and quality checks

These apply to both AutoML phases: bake them into agent behavior, don't just paste once. The full detail is in `references/pitfalls.md`.

Metric pitfalls (AOI is class-imbalanced). ChangeNet AOI is PASS-dominant; `val_loss` can mode-collapse to a zero-recall PASS-everything model. Prefer FAR @ 100%-recall directly, or gate val_loss with a `pred_counts` sanity check, or decide top-K by FAR @ 100%-recall. For balanced / regression tasks, val_loss is fine.
Run-to-run noise. AutoML can show 2–3× metric variance for the same config. If the winner looks suspiciously better than the runner-up, re-run with a fresh seed before committing the spec to Phase 2.
Cleanliness (data leakage). Both AutoML phases use a validation set distinct from the KPI test set (`kpi/testing_set.csv`), which stays untouched until DEFT's evaluate stage. Phase 3 trains on the augmented CSV but keeps the same val set so Phase 1 and Phase 3 numbers stay comparable.
Compute budget. Surface the per-phase structure up front and only give a wall-clock range after the user supplies their per-job time.
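
A read-only leakage check between the validation CSV and the KPI test CSV could be sketched like this. It assumes both files have a header row and that a leaked sample shows up as an identical data row; the real column layout of these CSVs is not specified here, and the function names are hypothetical.

```python
import csv


def read_rows(path):
    """Load a CSV's data rows as a set of tuples, skipping the header row."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # assumed header row
        return {tuple(row) for row in reader}


def leaked_rows(val_csv, kpi_csv):
    """Return rows that appear in both the validation set and the KPI test set."""
    return read_rows(val_csv) & read_rows(kpi_csv)
```

An empty result is the expected outcome; any overlap belongs in the dataset table of the consolidated Pre-Flight Summary before anything is launched.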

Quick Start (AOI worked example)

When starting fresh from "run the AOI workflow", the agent delivers a three-phase worded message to the user (Phase 1 AutoML baseline → Phase 2 DEFT loop → Phase 3 AutoML refinement, with the cost framing and "OK to proceed?" close). After confirmation it invokes `tao-run-automl` (Phase 1), writes the merged spec, pre-seeds `deft_state.json`, invokes `tao-run-deft-aoi` (Phase 2) with every input pre-supplied, and invokes `tao-run-automl` again (Phase 3), with no further pauses unless a downstream skill hits an unrecoverable hard-stop gate. It then summarizes the trajectory (baseline AutoML best → DEFT iter 1 → ... → DEFT iter N_final → Phase 3 best).

See `references/quick-start.md` for the verbatim customer-facing message and the exact post-confirmation invoke sequence.

Non-AOI DEFT applications

The same three-phase pattern applies to other DEFT skills: swap `network_arch`, the Phase 2 DEFT skill, the spec/checkpoint path conventions, and the Phase 3 augmented-CSV path. The handoff shape (Phase 1 emits spec + checkpoint that pre-seeds the DEFT baseline, Phase 2 emits an augmented dataset, Phase 3 emits the final checkpoint) is identical, and the baseline-skip mechanism is generic to any DEFT-style loop with a resumable baseline state. See `references/quick-start.md`.
