AutoML + DEFT Pipeline

A workflow-bridge skill that runs three phases in sequence by delegating to two existing skills: `tao-run-automl` for HPO and a DEFT application skill (default `tao-run-deft-aoi` for AOI; other `skills/applications/deft-*` skills for non-AOI cases) for the iterative data-improvement loop.

This skill does not re-implement AutoML or DEFT. It owns only the connective tissue: HPO spec inputs, the spec-handoff between AutoML and DEFT, and the post-DEFT AutoML re-run on the augmented dataset.

When this skill applies

User asks to "run the AOI workflow" or "improve my AOI ChangeNet model": default to this skill, not `tao-run-deft-aoi` directly. The bare DEFT loop is the inner stage of this pipeline.
User wants AutoML and DEFT chained on the same model/dataset
User says "AutoML at both ends", "tune HPs then DEFT", "warm-start DEFT", "AutoML before and after DEFT"
User has an AutoML-tuned spec and asks how to feed it into DEFT

When this skill does NOT apply

User explicitly asks for the DEFT loop only ("run JUST the DEFT loop", "skip AutoML") → use `tao-run-deft-aoi` directly
User wants only AutoML with no follow-on DEFT → use `tao-run-automl` directly
User is doing zero-shot eval, RAG, or non-training workflows

The mental model

Phase 1 (AutoML baseline)        Phase 2 (DEFT loop, plain train)        Phase 3 (AutoML refinement)
─────────────────────────        ────────────────────────────────        ───────────────────────────
specs/baseline_spec.yaml         (Phase 1 winner pre-seeds baseline      ${RESULTS_DIR}/iter${N}/dataset/
train/base/training_set.csv       — DEFT skips its baseline train)       train_combined_iter${N}.csv
        │                                       │                                       │
        ▼                                       ▼                                       ▼
[ AutoML HPO sweep ]               [ DEFT: baseline-inference → RCA       [ AutoML HPO sweep ]
   N recommendations                 → iter 1..N (plain retrain) ]        re-tunes HPs against the
   pick best by val_loss / FAR      RCA / route / SDG / mining             DEFT-augmented dataset
        │                                       │                                       │
        ▼                                       ▼                                       ▼
best HPs spec + ckpt ─────►      DEFT-augmented CSV ───────────►        final best checkpoint
                                 + iter winner checkpoint               (the deliverable; no
                                 (Phase 3 warm-starts from it)           further retrain)

The handoffs are:

Phase 1 → Phase 2: a spec file AND the winning checkpoint. Retraining the same HPs in DEFT's baseline step is wasted compute, so the bridge deep-merges Phase 1's winning HPs onto `baseline_spec.yaml`, copies the winning checkpoint into `${RESULTS_DIR}/baseline/train/` under the filename DEFT expects, and pre-populates `deft_state.json` + `loop_log.jsonl` so DEFT resumes at baseline inference → evaluate → RCA → iter 1. DEFT itself stays plain-train (`automl_policy: off` preserved). The verbatim 4-step procedure is in `references/handoff.md`.
Phase 2 → Phase 3: a training CSV AND the iter winner's checkpoint. The CSV (`train_combined_iter${N_final}.csv`) is AutoML's training data; the checkpoint (`iterations.<best>.best_ckpt_path` from `deft_state.json`) is wired into each rec's `train.pretrained_model_path` so Phase 3 fine-tunes from Phase 2's winner rather than from scratch. Without this warm-start, Phase 3 routinely regresses vs the iter winner. Phase 3's winning checkpoint is the deliverable; there is no separate retrain after Phase 3. See `references/handoff.md`.

Why three phases instead of two

Phase 1 alone finds good HPs on the original training distribution, but the model still has the distributional gaps DEFT is designed to fill.
Phase 2 alone (just DEFT) fills the gaps but uses whatever HPs `specs/baseline_spec.yaml` was hand-authored with, which are usually not optimal.
Phase 3 alone would run AutoML against the augmented dataset, but without a tuned baseline the DEFT loop's iteration cost is higher (slower convergence, more iterations to hit the KPI).

Running all three: AutoML cheap-tunes once on the original data, DEFT does the heavy data work with reasonable HPs, then AutoML tunes again on the now-richer dataset. Phase 3 is the most important of the three for the final deployed FAR/recall.

Cost up-front

The pipeline is sequential. Total wall-clock ≈ Phase 1 (N_automl × per-rec train) + Phase 2 (M iterations × per-iter cost) + Phase 3 (N_automl × per-rec train).

Note that Phase 2 has no separate baseline train: Phase 1's winning checkpoint is reused as DEFT's baseline, so the baseline cost lands inside Phase 1's N_automl trainings rather than as an extra retrain. Surface this to the user before kickoff. Typically Phase 2's iterations still dominate (each includes SDG + retrain), but Phase 1 and Phase 3 each add several hours on a single-GPU box. Use the per-job estimate from the user's setup (if they have one) rather than guessing minutes. See `references/pitfalls.md` for the per-phase cost breakdown.
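
The wall-clock formula above can be turned into a small estimator. This is a minimal sketch, assuming the user supplies measured per-job times in hours; the `PipelineCost` container and `estimate_wall_clock` function are hypothetical names that are not part of either downstream skill.

```python
from dataclasses import dataclass


@dataclass
class PipelineCost:
    """Per-job timings supplied by the user, in hours (hypothetical container)."""
    n_automl: int            # recommendations per AutoML phase
    per_rec_train_h: float   # wall-clock of one AutoML recommendation
    m_iterations: int        # DEFT iterations expected in Phase 2
    per_iter_h: float        # one DEFT iteration (SDG + retrain)


def estimate_wall_clock(cost: PipelineCost) -> dict[str, float]:
    """Return per-phase and total hours for the sequential pipeline."""
    phase1 = cost.n_automl * cost.per_rec_train_h
    phase2 = cost.m_iterations * cost.per_iter_h  # no separate baseline train
    phase3 = cost.n_automl * cost.per_rec_train_h
    return {
        "phase1": phase1,
        "phase2": phase2,
        "phase3": phase3,
        "total": phase1 + phase2 + phase3,
    }


if __name__ == "__main__":
    # Illustrative numbers only; substitute the user's measured per-job times.
    print(estimate_wall_clock(PipelineCost(5, 1.0, 3, 4.0)))
```

Reporting the three phases separately keeps the structure visible to the user even when the per-job times are still rough.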

Consolidated Pre-Flight — one gate, all three phases

The pipeline has exactly one user gate. Before any side-effecting action (docker pull, docker login, any job-launch call delegated to a downstream skill, file mutations under `${RESULTS_DIR}/`), the agent must produce a single consolidated Pre-Flight Summary that subsumes every downstream skill's preflight. Once the user approves, the run is autonomous through all three phases, with no further interactive pauses.

The user explicitly does not want to be paged between phases. The DEFT loop's own inline `## Pre-Flight Summary` gate becomes a zero-question display step (every value pre-supplied), as does `tao-run-automl`'s shared launch preflight in Phases 1 and 3.

Before printing the gate the agent must read every downstream preflight section in full and run every read-only check those sections prescribe, surfacing each outcome in the summary. Running every step of the DEFT skill's `## Pre-Flight` is mandatory; if any step is skipped, the consolidated gate is invalid and the pipeline must not advance. The summary must include, in order: (1) workspace/host/platform/network, (2) credentials SET/UNSET status, (3) resolved container image URIs with PRESENT/MISSING, (4) dataset table with leakage check, (5) Phase 1 config, (6) Phase 2 config incl. pre-seeded baseline source, (7) Phase 3 config, (8) compute estimate, (9) the confirmation line. After the gate, pass every collected value through to each downstream skill so it has nothing to ask. The only allowed post-gate pauses are mid-run hard-stop safety gates (e.g. DEFT's KPI regression gate); call them out in the summary.

See `references/preflight.md` for the full build procedure, the exact mandatory contents of each summary section (with the GPU memory rule of thumb, DEFT loop defaults, and required inputs verbatim), the downstream gate-suppression inputs, and the fallback when an older skill-bank version hard-codes its own STOP gate.
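
The leakage check listed under summary item (4) is a read-only check, so it can run before the gate. The following is a rough sketch only: it assumes each split is a CSV with a header row and that identical rows mean the same sample. The actual check prescribed by the downstream skills may compare specific columns instead, and `load_rows` and `leakage_report` are hypothetical helper names.

```python
import csv
from pathlib import Path


def load_rows(csv_path: Path) -> set[tuple[str, ...]]:
    """Read a CSV into a set of row tuples, skipping the first line."""
    with csv_path.open(newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # assumption: the first line is a header
        return {tuple(row) for row in reader}


def leakage_report(train: Path, val: Path, kpi: Path) -> dict[str, int]:
    """Count rows shared between each pair of splits; all zeros means clean."""
    rows = {
        "train": load_rows(train),
        "val": load_rows(val),
        "kpi": load_rows(kpi),
    }
    return {
        "train_val": len(rows["train"] & rows["val"]),
        "train_kpi": len(rows["train"] & rows["kpi"]),
        "val_kpi": len(rows["val"] & rows["kpi"]),
    }
```

Any non-zero count would be surfaced in the dataset table rather than silently corrected, since the gate exists so the user can decide.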

Phase 1 — AutoML baseline

Invoke `tao-skill-bank:tao-run-automl` with:

| Input | AOI default | Notes |
| --- | --- | --- |
| `network_arch` | `visual-changenet` | Same model the DEFT loop expects |
| `train_dataset_uri` | `<workspace>/train/base/training_set.csv` | Same training set DEFT will start from |
| `eval_dataset_uri` | `<workspace>/train/base/validation_set.csv` | Held-out; must NOT be the KPI test set (`<workspace>/kpi/testing_set.csv`), since that set is reserved for DEFT's final reporting |
| `metric` | FAR @ 100% recall (preferred) or `val_loss` | See `references/pitfalls.md`: ChangeNet AOI is class-imbalanced, so val_loss alone can mode-collapse |
| `algorithm` | `bayesian` | LLM-brain or `autoresearch` if compute is tight |
| `automl_max_recommendations` | 5–10 for AOI | More recs = better HPs but linear in compute |
| `spec_overrides` | Pin epochs / batch_size; sweep optimizer-related HPs only | Otherwise AutoML wanders into long-train regimes that blow Phase 2's budget |

After the sweep finishes, AutoML's `result["best"]["specs"]` is the winning hyperparameter dict.

Handoff to Phase 2

Phase 1 hands over two artifacts: the winning spec and the winning checkpoint. Instead of retraining the same HPs in DEFT's baseline step, pre-seed DEFT's baseline state from Phase 1's outputs so DEFT starts at baseline inference → evaluate → RCA → iter 1. The four steps are: write the merged `baseline_spec_automl.yaml`, copy the winning checkpoint into `${RESULTS_DIR}/baseline/train/`, initialise `deft_state.json` with `iterations.baseline.stage_completed == "train"` (and append the matching `loop_log.jsonl` entry), then invoke DEFT. They are given verbatim with the exact code in `references/handoff.md`. `automl_policy: off` inside the loop is preserved.
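
To make the shape of the handoff concrete, here is a minimal sketch of the merge and pre-seed steps. It is not the code from `references/handoff.md`: the checkpoint filename DEFT expects is left as a parameter, the `loop_log.jsonl` record layout is invented for illustration, and only the `iterations.baseline.stage_completed` key comes from the text above. The helper names `deep_merge` and `preseed_baseline` are hypothetical.

```python
import copy
import json
import shutil
from pathlib import Path


def deep_merge(base: dict, overrides: dict) -> dict:
    """Recursively overlay `overrides` on a copy of `base`."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def preseed_baseline(results_dir: Path, winning_ckpt: Path, expected_name: str) -> Path:
    """Copy the Phase 1 checkpoint and mark DEFT's baseline train as done.

    `expected_name` is the filename DEFT looks for; it is a parameter here
    because the real name is defined in references/handoff.md.
    """
    train_dir = results_dir / "baseline" / "train"
    train_dir.mkdir(parents=True, exist_ok=True)
    target = train_dir / expected_name
    shutil.copy2(winning_ckpt, target)

    # Assumed minimal state shape: only stage_completed comes from the text.
    state = {"iterations": {"baseline": {"stage_completed": "train"}}}
    (results_dir / "deft_state.json").write_text(json.dumps(state, indent=2))

    # Hypothetical log record; the real loop_log.jsonl schema may differ.
    record = {"iteration": "baseline", "stage": "train", "source": "phase1_automl"}
    with (results_dir / "loop_log.jsonl").open("a") as log:
        log.write(json.dumps(record) + "\n")
    return target
```

In this sketch `deep_merge(baseline_spec, result["best"]["specs"])` would produce the dict that gets serialized as `baseline_spec_automl.yaml`; a deep merge is used so that hyperparameters AutoML did not sweep keep their hand-authored values.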

Quality check before handing off

Run a quick eval of the winning checkpoint against the held-out set: per-class prediction counts (if it collapsed to one class, evaluate the 2nd or 3rd best instead) and a comparison to a zero-shot ChangeNet baseline (if AutoML did not improve over zero-shot, surface that and pause). See `references/handoff.md`.
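
The collapse check and the fall-through to the 2nd or 3rd best recommendation might look roughly like this. It is a sketch under assumptions: each candidate is a dict carrying a `pred_counts` mapping from class name to prediction count, candidates arrive already ranked best-first, and the class labels in the usage note are placeholders. None of these names are confirmed by the downstream skills.

```python
from typing import Optional


def is_collapsed(pred_counts: dict[str, int]) -> bool:
    """True when at most one class receives any predictions."""
    predicted_classes = sum(1 for count in pred_counts.values() if count > 0)
    return predicted_classes <= 1


def pick_handoff_candidate(ranked: list[dict], top_k: int = 3) -> Optional[dict]:
    """Return the best-ranked candidate that has not collapsed, if any."""
    for candidate in ranked[:top_k]:
        if not is_collapsed(candidate["pred_counts"]):
            return candidate
    return None  # every top candidate collapsed: surface this and pause
```

For example, a ranked list whose first entry has counts `{"PASS": 100, "DEFECT": 0}` and whose second has `{"PASS": 90, "DEFECT": 10}` would hand off the second entry.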

Phase 2 — DEFT loop (plain training, baseline pre-seeded from Phase 1)

Invoke `tao-skill-bank:tao-run-deft-aoi` (read its `SKILL.md` for the full interface). For non-AOI applications, invoke the matching DEFT skill; the handoff shape is the same.

The DEFT loop's baseline-train sub-step is skipped. Phase 1 already produced a checkpoint trained at the winning HPs, and Phase 1's handoff (see above) pre-populated `${RESULTS_DIR}/baseline/train/` and `${RESULTS_DIR}/deft_state.json` so DEFT resumes at baseline inference → evaluate → RCA → iter 1. The rest of the DEFT loop runs unchanged. Do not modify its `automl_policy: off` invariant.

The DEFT loop owns:

The Pre-Flight Summary display step, not a fresh user gate. The Consolidated Pre-Flight (above) is the single gate; the DEFT summary still prints as an audit-trail display of the pre-seeded `baseline/train/` source but must not re-prompt, since every input was collected in the consolidated gate.
Baseline inference → evaluate → RCA on the pre-seeded checkpoint, and the full per-iteration RCA → routing → SDG → mining → assemble → train cycle.
KPI gating and stop conditions; `${RESULTS_DIR}/` layout, `deft_state.json`, `loop_log.jsonl`, `DEFT_Loop_Report.html`.

After the loop exits (KPI met or `max_iterations` reached), capture two values from `deft_state.json`:

`iterations.<best>.best_ckpt_path`: the loop's best plain-train checkpoint
The final iteration label `N_final`: used to locate the augmented training CSV

If the DEFT loop hard-stops on an unrecoverable gate, skip Phase 3. There is no validated augmented CSV to feed AutoML.
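
Capturing the two values could be sketched as follows. The state layout is an assumption: the sketch supposes `iterations` is a mapping keyed by label, that each scored entry carries `far_pct` and `best_ckpt_path`, and that iteration labels look like `iter1`, `iter2`, and so on. The real schema is owned by the DEFT skill, and `capture_phase2_outputs` is a hypothetical helper.

```python
import json
import re
from pathlib import Path


def capture_phase2_outputs(results_dir: Path) -> tuple[str, int]:
    """Return (best checkpoint path, N_final) read from deft_state.json."""
    state = json.loads((results_dir / "deft_state.json").read_text())
    iterations = state["iterations"]

    # Skip any prior Phase 3 entry and entries without a measured far_pct.
    candidates = {
        label: entry
        for label, entry in iterations.items()
        if label != "final_automl" and "far_pct" in entry
    }
    if not candidates:
        raise ValueError("no scored iteration found; skip Phase 3")
    best_label = min(candidates, key=lambda label: candidates[label]["far_pct"])

    # Assumption: iteration labels look like "iter1", "iter2", ...
    numbers = [
        int(match.group(1))
        for label in iterations
        if (match := re.fullmatch(r"iter(\d+)", label))
    ]
    if not numbers:
        raise ValueError("no completed DEFT iteration found; skip Phase 3")
    return candidates[best_label]["best_ckpt_path"], max(numbers)
```

Note that the best checkpoint and `N_final` need not come from the same iteration: the checkpoint is chosen by metric, while the augmented CSV is always the one from the last iteration.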

Phase 3 — AutoML refinement on the DEFT-augmented dataset

Re-invoke `tao-skill-bank:tao-run-automl` with the augmented training CSV as the train dataset, the same held-out validation CSV as before, and Phase 2's iter winner checkpoint as the warm-start:

| Input | AOI value |
| --- | --- |
| `network_arch` | `visual-changenet` |
| `train_dataset_uri` | `${RESULTS_DIR}/iter${N_final}/dataset/train_combined_iter${N_final}.csv` |
| `eval_dataset_uri` | Same as Phase 1 (`<workspace>/train/base/validation_set.csv`); keep the comparison apples-to-apples |
| `metric` | Same metric as Phase 1 |
| `algorithm` | Same as Phase 1 |
| `automl_max_recommendations` | 5–10 |
| Initial spec | Start from `<workspace>/specs/baseline_spec_automl.yaml` (Phase 1's winner); gives the sweep a strong centroid to refine around |
| Warm-start checkpoint | `iterations.<best>.best_ckpt_path` from `${RESULTS_DIR}/deft_state.json`; set `spec_overrides["train"]["pretrained_model_path"]` to this path. Each Phase 3 rec then fine-tunes from Phase 2's winner instead of training from scratch. |

The warm-start is mandatory: with no warm-start, every rec starts from random init with only 10–20 epochs to reconverge, Phase 3's `val_loss` regresses 0.03–0.05 vs iter1, and the `_pick_best` safety net silently rolls back to the iter winner, wasting Phase 3's compute. The concrete `spec_overrides` code (selecting the lowest-`far_pct` iteration, excluding any prior `final_automl`), the broad-exploration tradeoff, output to `${RESULTS_DIR}/final_automl/`, and wiring Phase 3's checkpoint back into the DEFT report via `iterations.final_automl` + re-running `prepare_inference_spec.py` (with the `_pick_best` regression safety net) are all in `references/handoff.md`.
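
As an illustration of how the Phase 3 inputs differ from Phase 1, the sketch below derives them from a Phase 1 input dict. It is not the code from `references/handoff.md`; `build_phase3_inputs` is a hypothetical helper, and the assumption is that the skill inputs can be represented as a plain dict with the key names used in the tables above.

```python
from pathlib import Path


def build_phase3_inputs(
    results_dir: Path,
    workspace: Path,
    best_ckpt_path: str,
    n_final: int,
    phase1_inputs: dict,
) -> dict:
    """Derive Phase 3 inputs: new train CSV, same val set, warm-start ckpt."""
    inputs = dict(phase1_inputs)  # keeps network_arch, metric, algorithm
    inputs["train_dataset_uri"] = str(
        results_dir / f"iter{n_final}" / "dataset" / f"train_combined_iter{n_final}.csv"
    )
    inputs["eval_dataset_uri"] = str(
        workspace / "train" / "base" / "validation_set.csv"
    )

    # Copy nested dicts so the Phase 1 inputs are not mutated.
    overrides = dict(inputs.get("spec_overrides", {}))
    train = dict(overrides.get("train", {}))
    train["pretrained_model_path"] = best_ckpt_path  # mandatory warm-start
    overrides["train"] = train
    inputs["spec_overrides"] = overrides
    return inputs
```

Copying the Phase 1 inputs first keeps the metric and algorithm identical across both AutoML phases, which is what makes the Phase 1 and Phase 3 numbers comparable.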

Pitfalls and quality checks

These apply to both AutoML phases; bake them into agent behavior, don't just paste once. The full detail is in `references/pitfalls.md`.

Metric pitfalls (AOI is class-imbalanced). ChangeNet AOI is PASS-dominant; `val_loss` can mode-collapse to a zero-recall PASS-everything model. Prefer FAR @ 100%-recall directly, or gate val_loss with a `pred_counts` sanity check, or decide top-K by FAR @ 100%-recall. For balanced / regression tasks, val_loss is fine.
Run-to-run noise. AutoML can show 2–3× metric variance for the same config. If the winner looks suspiciously better than the runner-up, re-run with a fresh seed before committing the spec to Phase 2.
Cleanliness (data leakage). Both AutoML phases use a validation set distinct from the KPI test set (`kpi/testing_set.csv`), which stays untouched until DEFT's evaluate stage. Phase 3 trains on the augmented CSV but keeps the same val set so Phase 1 and Phase 3 numbers stay comparable.
Compute budget. Surface the per-phase structure up front and only give a wall-clock range after the user supplies their per-job time.
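
For readers unfamiliar with the preferred metric, FAR @ 100% recall can be computed from per-sample scores as sketched below. This uses a common definition (the share of PASS samples flagged at the highest threshold that still catches every defect) and assumes higher scores mean more defect-like. The skill's own `far_pct` computation may differ in detail, and `far_at_full_recall` is a hypothetical name.

```python
def far_at_full_recall(scores: list[float], labels: list[int]) -> float:
    """False-alarm rate at the highest threshold that still flags every defect.

    labels: 1 = defect, 0 = PASS; scores: higher means more defect-like.
    """
    defect_scores = [s for s, y in zip(scores, labels) if y == 1]
    pass_scores = [s for s, y in zip(scores, labels) if y == 0]
    if not defect_scores or not pass_scores:
        raise ValueError("need at least one defect and one PASS sample")

    threshold = min(defect_scores)  # every defect scores >= threshold
    false_alarms = sum(1 for s in pass_scores if s >= threshold)
    return false_alarms / len(pass_scores)
```

A PASS-everything model scores every defect low, which pushes the threshold down and the false-alarm rate up, so this metric penalizes exactly the collapse that `val_loss` can hide.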

Quick Start (AOI worked example)

When starting fresh from "run the AOI workflow", the agent delivers a three-phase worded message to the user (Phase 1 AutoML baseline → Phase 2 DEFT loop → Phase 3 AutoML refinement, with the cost framing and "OK to proceed?" close). After confirmation it invokes `tao-run-automl` (Phase 1), writes the merged spec, pre-seeds `deft_state.json`, invokes `tao-run-deft-aoi` (Phase 2) with every input pre-supplied, and invokes `tao-run-automl` again (Phase 3), with no further pauses unless a downstream skill hits an unrecoverable hard-stop gate. It then summarizes the trajectory (baseline AutoML best → DEFT iter 1 → ... → DEFT iter N_final → Phase 3 best).

See `references/quick-start.md` for the verbatim customer-facing message and the exact post-confirmation invoke sequence.

Non-AOI DEFT applications

The same three-phase pattern applies to other DEFT skills: swap `network_arch`, the Phase 2 DEFT skill, the spec/checkpoint path conventions, and the Phase 3 augmented-CSV path. The handoff shape (Phase 1 emits spec + checkpoint that pre-seeds the DEFT baseline, Phase 2 emits an augmented dataset, Phase 3 emits the final checkpoint) is identical, and the baseline-skip mechanism is generic to any DEFT-style loop with a resumable baseline state. See `references/quick-start.md`.
