Physical AI Defect Image Generation Workflow Orchestrator
End-to-end orchestration of defect image generation, augmentation, and labeling pipelines for AOI (Automated Optical Inspection) datasets. Every flow has a canonical OSMO workflow YAML in
that chains all steps non-interactively. Use-case cookbooks in
provide PCBA usd2roi/image-edit configs and AnomalyGen training configs for PCBA, metal surface, and glass inspection. This skill governs flow selection, data handoffs, and submit commands; component internals live in each component's
.
Supported Flows
| Flow | Entry point | OSMO YAML | Steps | Use cases |
|---|---|---|---|---|
| Day 0 — Texture Defects | CAD scene USD ( ships in the cookbook) | `texture_defect_generation_day0.yaml` | usd2roi (scan_grid + per-cell ROI crops) → image-edit augmentation (`nvidia/Qwen-Image-Edit-NVPCB-OVSL2SL`) → finetune-or-passthrough → infer (anomalygen labels inline, including missing-component) | PCBA |
| Day 0 — Good Image (usd2roi + Image-Edit) | CAD scene USD + per-board / / | `good_image_generation.yaml` | usd2roi-render (scan_grid + per-cell ROI crop) → Qwen Image-Edit (OVSL2SL appearance transfer) | PCBA clean-image set (ChangeNet golden halves, finetune positives, real-photo pairing) |
| Day 0 — Structural Defects | CAD scene USD + per-board | `structural_defect_generation.yaml` | isaac-render (pose defects: shift / tombstone / sideflip) + per-component crop (single pod) → Qwen Image-Edit (OVSL2SL lighting transfer; pose geometry preserved) | PCBA pose-defect set; ChangeNet defect halves |
| Day 1 — Infer + Label (real-photo alignment, DEFAULT) | CAD-derived USD + real PCBA photo (both ship in ) | `texture_defect_generation_day1_real_alignment.yaml` | usd2roi day-1 render → MI register → per-ROI crop → yq-render config → finetune-or-passthrough → infer (anomalygen labels inline) | Default PCBA Day 1. Raw AOI screenshot of any usd2roi-supported board |
| Day 1 — Infer + Label (manual ROI) | Pre-captured clean images + ROI masks (NGC artifact or user upload) | `texture_defect_generation_day1_manual_roi.yaml` | yq-render config → finetune-or-passthrough → infer (anomalygen labels inline) | Metal surface, glass (no USD/real-photo flow); PCBA only when user explicitly asks for pre-captured ROI experimentation |
| Finetune Only | Labeled anomaly URL artifact | | yq-render config → finetune (validate_dataset → prep_testcase → torchrun) | Any use case; produces checkpoint for Day 0 or Day 1. Requires raw training data under `<dig_url_root>/datasets/<usecase>/raw` (see `assets/configs/setup/setup_<usecase>.yaml`). |
All flows run on OSMO. Day 0 flows require
(Qwen Image-Edit OVSL2SL — existing URL or local deploy from
); Finetune Only has no external endpoints.
Pick the right workflow for the user's defect class
| Defect class | Workflow | Mechanism |
|---|---|---|
| Clean / good / scan-grid / pairs | `good_image_generation.yaml` | usd2roi-render + Qwen Image-Edit |
| Texture defects (solder bridge, scratch, discoloration) AND missing-component (handled natively by AnomalyGen, NOT structural) | `texture_defect_generation_day0.yaml` | Qwen Image-Edit + AnomalyGen AMP/SDG |
| Structural / pose defects (tombstone, shift, sideflip) | `structural_defect_generation.yaml` | IsaacSim pose perturbation |
| Day 1 inference + labeling on a real image | `texture_defect_generation_day1_real_alignment.yaml` (PCBA default) or `texture_defect_generation_day1_manual_roi.yaml` (metal/glass; PCBA only when user explicitly asks for pre-captured ROI / skip-alignment) | usd2roi day-1 registration (real-alignment) or direct inference (manual-ROI) |
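The selection table above can be sketched as a small lookup. This is a minimal illustration, not part of the skill: the function name `pick_workflow`, the class keywords, and the `manual-roi` mode argument are hypothetical labels chosen for the sketch; only the YAML file names and the routing rules come from the table.

```bash
# Hypothetical helper: map a defect class to the workflow YAML from the table.
# Usage: pick_workflow <defect_class> [usecase] [mode]
pick_workflow() {
  local defect_class="$1" usecase="${2:-pcba}" mode="${3:-}"
  case "$defect_class" in
    good|clean|scan-grid|pairs)
      echo "good_image_generation.yaml" ;;
    texture|missing-component)
      # Missing-component is handled by AnomalyGen, so it routes with texture.
      echo "texture_defect_generation_day0.yaml" ;;
    structural|pose|tombstone|shift|sideflip)
      echo "structural_defect_generation.yaml" ;;
    day1)
      # PCBA defaults to real-alignment; metal/glass have no USD flow.
      if [ "$usecase" = "pcba" ] && [ "$mode" != "manual-roi" ]; then
        echo "texture_defect_generation_day1_real_alignment.yaml"
      else
        echo "texture_defect_generation_day1_manual_roi.yaml"
      fi ;;
    *)
      # Anything else is ambiguous: ask the user instead of guessing.
      echo "ambiguous defect class: $defect_class" >&2
      return 1 ;;
  esac
}
```

For example, `pick_workflow day1 glass` prints the manual-ROI YAML, while an unrecognized class returns non-zero so the caller can fall back to disambiguation.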
ChangeNet golden/defect pairs: submit
good_image_generation.yaml
+
structural_defect_generation.yaml
with the same
(two-submission pairing convention).
Day 0 and Day 1 share the same downstream shape: a Jinja-gated
(omitted when
use_pretrained_checkpoint=true
) feeding
. Day 0 prepends
+
; Day 1 starts from
<dig_url_root>/datasets/<usecase>/raw
. Per-stage detail: each flow's walkthrough.
User intent → knob mapping
Every OV flow is two-stage:
caps the
final per-cell crops (stage 2);
caps
raw scan-grid patches (stage 1, each yielding multiple crops).
DO NOT auto-map "generate N images" → (wrong stage).
does not exist on
structural_defect_generation.yaml
(one crop per component — use
) or
texture_defect_generation_day1_real_alignment.yaml
(narrow via the cookbook's
whitelist). Full knob table, smoke-test recipes, defaults, caveats:
references/knob_mapping.md
.
Structural-defect sizing (no knob exists)
Structural output is
non-linear in — doubling frames adds ~1.6–1.7× crops, not 2×. Don't use
(no effect) or
(fails). Validated yield table + target-size formula:
references/flows/structural_defect_generation.md
§"Sizing the output". For ambiguous "generate N images", surface the calibration table via
.
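As a rough back-of-envelope illustration of the non-linearity, the sketch below extrapolates a crop count over repeated frame doublings. It assumes the ~1.6–1.7× figure is a multiplier on the crop count and that it holds for every doubling, which the text above does not state; the function name and the baseline count are hypothetical. The validated yield table in the structural flow reference remains the authority.

```bash
# Hypothetical estimator: crop-count range after k doublings of the frame count.
# $1 = crop count measured at a baseline frame count (hypothetical input)
# $2 = number of times the frame count is doubled
estimate_crops() {
  awk -v base="$1" -v k="$2" 'BEGIN {
    # Low and high bounds use the 1.6x and 1.7x per-doubling multipliers.
    printf "%.0f %.0f\n", base * 1.6 ^ k, base * 1.7 ^ k
  }'
}
```

Under these assumptions, a baseline of 100 crops would grow to roughly 256–289 after two doublings, whereas linear scaling would predict 400.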
Disambiguation: handle vague requests before committing
Underspecified prompts ("generate me some images", "run the PCBA flow", "give me defects")
must not be resolved by silently assuming a flow / usecase / knob mapping. When intent is ambiguous, pause and present candidate interpretations via
(2–4 mutually exclusive options) before submitting. Disambiguate the load-bearing choices:
which flow, which use case, what stage a count refers to, finetune vs. passthrough.
Settled defaults you should NOT disambiguate: PCBA Day 1 → real-alignment; board →
; image-edit endpoint → local cluster service (
);
use_pretrained_checkpoint=true
; Day 1 real-alignment
default_spatial_dependency=cad
(fall back to
only when CAD masks are unavailable, see
references/flows/texture_defect_generation_day1_real_alignment.md
).
is the one exception — NO silent default. First-time (no memory entry), MUST elicit via
before any submit /
/
.
is a
suggestion to confirm, never auto-picked (~80 GB+ lands there). Later runs may reuse the remembered value silently. See Step 0 + memory rules (§4).
Full trigger table, prompt construction, and when-NOT-to-ask exceptions: references/disambiguation.md
— load before assembling
options for any vague request.
Step 0: Select Flow, Cookbook, and Gather Inputs
Before this step, if the request is vague (e.g. "generate me images", "run the PCBA flow", "give me defects"), pause and run the disambiguation cheat sheet above — present candidate interpretations via
and let the user pick. Don't auto-pick a load-bearing default the user didn't actually choose.
First-time gate
If memory has no entries for this user, ASK the up-front preference questions in ONE
call BEFORE any preflight /
/
/
, save to memory (§4), then proceed. Bundle:
- — MUST be elicited, not auto-picked. Offer as a confirmable suggestion; else user provides their own OSMO-supported storage prefix. ~80 GB+ lands here. No escape hatch other than memory-recall of a previously confirmed value.
- Default OSMO — candidates from → .
- Pod-template confirmation — only when
osmo config show POD_TEMPLATE
returns 403 (§2 has the exact question).
- Image-edit endpoint — Day 0 only: Option A (existing URL) vs Option B (deploy local NIM).
Subsequent conversations read these silently from memory. Per-flow choices (use case, checkpoint vs finetune, board, knobs) are asked each time — see below.
Preflight ordering (after the first-time gate)
Run §1
→ §2
preflight_pod_template.sh
→ §3
preflight_urls.sh <flow> <usecase>
→ §4 generate the run stamp.
Cadence: §1 and §2 are once-per-conversation gates with cross-conversation memory caching (see §4a in
references/preconditions.md
) — skip when memory records them as already verified / user-confirmed. §3 runs before every submit (varies by flow). §4 is the agent's job — fresh
per submit.
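The ordering above can be sketched as a wrapper that runs the three script gates in sequence and stops at the first non-zero exit. This is an illustrative sketch only: `run_gate`, `run_preflights`, and the optional scripts-directory argument are hypothetical, the scripts are called without any flags, and the once-per-conversation memory caching for §1 and §2 is left out.

```bash
# Run one gate; report and propagate a non-zero exit code.
run_gate() {
  local label="$1"; shift
  local rc=0
  "$@" || rc=$?
  [ "$rc" -eq 0 ] || echo "$label exited $rc" >&2
  return "$rc"
}

# Usage: run_preflights <dig_url_root> <flow> <usecase> [scripts_dir]
run_preflights() {
  local dig_url_root="$1" flow="$2" usecase="$3" dir="${4:-scripts}"
  run_gate "§1 credentials" "$dir/preflight_credentials.sh" || return $?
  run_gate "§2 pod template" "$dir/preflight_pod_template.sh" || return $?
  # §3 reads the artifact root from the DIG_URL_ROOT environment variable.
  run_gate "§3 URL artifacts" env DIG_URL_ROOT="$dig_url_root" \
    "$dir/preflight_urls.sh" "$flow" "$usecase" || return $?
}
```

The caller receives the failing gate's exit code unchanged, so per-gate branching (for example on the pod-template codes) stays outside the wrapper; generating the run stamp (§4) is likewise left to the caller.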
Pod-template enforcement is two layers: the pre-submit
preflight_pod_template.sh
gate (§2) plus an in-pod runtime preflight on every OV + training task (fails fast on missing
/usr/share/nvidia/nvoptix.bin
or
< 16 GiB). Runtime failure despite §2 passing → template was patched out → route to
physical-ai-infrastructure-setup-and-resilient-scaling
. Missing creds / URL artifacts → offer to submit
+
setup/setup_pretrained.yaml
first.
Then ask the user in one message — per-flow choices only (the first-time gate above already covered
, pool, pod-template, and endpoint preferences; pull those from memory):
- Use case — PCBA (use Day 0 + pcb cookbook), metal surface (Day 1 + metal_surface cookbook), glass (Day 1 + glass cookbook), or custom?
- Checkpoint available? — If yes (
use_pretrained_checkpoint=true
), use <dig_url_root>/models/<usecase>
and provide . If no, finetune from <dig_url_root>/datasets/<usecase>/raw
.
- Local-NIM pool capacity check (Day 0 Option B only) — before , check via
physical-ai-infrastructure-setup-and-resilient-scaling
. cannot host NIM + DIG concurrently → ask user to add GPUs or switch to Option A. is always nvidia/Qwen-Image-Edit-NVPCB-OVSL2SL
, never generic .
- Save user preferences to memory — after the first-time gate (and after any submit diverging from a documented default), persist load-bearing choices (, OSMO pool, default board, image-edit endpoint, pod-template state, osmo-admin role). Never save (constant — saving invites drift) or ephemeral state (STAMP, one-off ). Full table:
references/preconditions.md
§4a "Memory rules". Read relevant memories at the start of every new conversation and apply silently.
Review the relevant flow reference before asking — most values have sensible defaults. Day 1 routing: PCBA defaults to
; metal/glass have no USD flow so always
; don't ask the user "manual or real-alignment?" for PCBA unless they explicitly ask to skip alignment.
Common Preconditions (all flows)
Quick reference. Long-form:
references/preconditions.md
.
- OSMO credentials + tokens — once per conversation.
If a exists in the workspace, source it first (
) so
is exported. Run
scripts/preflight_credentials.sh
; authoritative check is the OSMO cred
is provisioned (images are public on
— no registry cred needed). Pass
in restricted-egress shells. See
references/preconditions.md
§1.
- Pod template — once per conversation, with cross-conversation memory caching (see Step 0 §6). Skip when memory records the cluster verified / user-confirmed / 409-skipped. Otherwise run
scripts/preflight_pod_template.sh
and branch on exit code (0=verified / 1=patch via infra skill / 2=ask-user (HTTP 403) / 3=skip (HTTP 409) / 4=env-fix). Full branching prose and prompts in
references/preconditions.md
§2.
- Required URL artifacts — before every submit. Run
DIG_URL_ROOT=<dig_url_root> scripts/preflight_urls.sh <flow> <usecase> [variant]
. If anything is missing,
stop and submit the relevant + setup/setup_pretrained.yaml
first (the OSMO setup workflows) — see
.
Never download assets locally to work around a problem; if setup fails on credentials, ask the user to rectify them and re-submit on OSMO. Per-flow checklist:
| Flow | Use case | Required URL artifacts under |
|---|---|---|
| Day 0 — Texture Defects | PCBA | , , , |
| Day 0 — Good Image | PCBA | only |
| Day 0 — Structural Defects | PCBA | only |
| Day 1 | Metal surface | , , `datasets/metal_surface/raw` |
| Day 1 | Glass | , , |
| Day 1 real-photo alignment | PCBA | Day 1 PCBA plus |
| Finetune Only | Any | , |
Built-in
values are
,
,
. See
references/preconditions.md
§3.
- Name stamping — regenerate
STAMP=$(cat /proc/sys/kernel/random/uuid | cut -c1-8)
before every submit and pass
. Production YAMLs ship no
default. See
references/preconditions.md
§4.
- Glass case (UC3) — Roboflow zip — only for
. Upload
to an OSMO URL prefix first; pass
--set uc3_zip_url_root=<prefix>
. Full procedure:
§"Glass case (UC3)".
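The pod-template exit codes listed in the preconditions above lend themselves to a `case` branch. The sketch below only maps each documented code to its documented outcome; the function name `pod_template_action` is hypothetical, and the actual prompts and follow-up steps live in `references/preconditions.md` §2.

```bash
# Hypothetical helper: translate a preflight_pod_template.sh exit code
# into the action named in the preconditions list.
pod_template_action() {
  case "$1" in
    0) echo "verified" ;;
    1) echo "patch via infra skill" ;;
    2) echo "ask user (HTTP 403)" ;;
    3) echo "skip (HTTP 409)" ;;
    4) echo "fix environment" ;;
    *) echo "unknown exit code: $1" >&2; return 1 ;;
  esac
}
```

A caller might run `rc=0; scripts/preflight_pod_template.sh || rc=$?` followed by `pod_template_action "$rc"`, treating an unknown code as a reason to stop and inspect the script output.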
Flow walkthroughs
Each flow's full walkthrough — group diagrams, prerequisites, submit-command variants, data handoffs, per-stage troubleshooting — lives under
. The agent should read the matching file before submitting any flow it hasn't run in the current conversation.
| Flow | Workflow YAML | Walkthrough |
|---|---|---|
| Day 0 — Texture Defects (PCBA) | `assets/configs/texture_defect_generation_day0.yaml` | `references/flows/texture_defect_generation_day0.md` |
| Day 0 — Good Image (PCBA) | `assets/configs/good_image_generation.yaml` | `references/flows/good_image_generation.md` |
| Day 0 — Structural Defects (PCBA) | `assets/configs/structural_defect_generation.yaml` | `references/flows/structural_defect_generation.md` |
| Day 1 — Infer + Label (real-photo alignment, default PCBA) | `assets/configs/texture_defect_generation_day1_real_alignment.yaml` | `references/flows/texture_defect_generation_day1_real_alignment.md` |
| Day 1 — Infer + Label (manual ROI, metal/glass + PCBA experimentation) | `assets/configs/texture_defect_generation_day1_manual_roi.yaml` | `references/flows/texture_defect_generation_day1_manual_roi.md` |
| Finetune Only | `assets/configs/finetune.yaml` | `references/flows/finetune.md` |
Cross-flow invariants
use_pretrained_checkpoint=true
(default) → passthrough against . Set to to insert an in-pod group (cookbook yq-patched in-pod, no pre-submit render step).
- Day 0 emits per-cell `crop/<MATERIAL>/<cell>/...` trees; Day 1 emits per-ROI crops registered against the USD; structural emits flat per-component crops.
- Shipped per-usecase + defaults: see
references/preconditions.md
§"Shipped checkpoint and defaults".
OSMO Monitoring
Load before any , , or action in this skill. It defines the polling cadence, task-status interpretation, log-pull escalation thresholds, failure-classification routing, and what to surface to the user vs. silently retry. Do not assemble a post-submit watch loop or status summary from memory — re-read it on the first such action of every conversation.
```bash
osmo workflow query <workflow_id> --format-type json | jq '{status, tasks: [.groups[].tasks[] | {name, status, exit_code}]}'
osmo workflow logs <workflow_id> -t <task_name> -n 200
osmo data download <dig_url_root>/runs/<name>/anomaly ./output/anomaly-<name>/
```
Monitoring discipline:
. Retrieval:
references/output_retrieval.md
. Presentation:
references/output_rendering.md
. Gotchas:
references/troubleshooting.md
.
Response Template
For "show me the plan / recipe" requests, emit your final response with these labeled sections (so nothing truncates mid-recipe):
Preflights: `scripts/preflight_credentials.sh`; `scripts/preflight_urls.sh <0|1|finetune> <usecase> [variant]`
Required URL Artifacts under : enumerate per Common Preconditions §3 for the chosen flow.
Submit Command:
```bash
STAMP=$(cat /proc/sys/kernel/random/uuid | cut -c1-8)
osmo workflow submit assets/configs/<yaml> --pool <pool> \
--set name=<flow>-$STAMP dig_url_root=<root> usecase=<usecase> \
image_edit_endpoint=<endpoint> image_edit_model=nvidia/Qwen-Image-Edit-NVPCB-OVSL2SL \
checkpoint_step=<step> 'anomaly_types_json=<types>'
```
Monitoring: load
before running the submit; apply its polling cadence + log-pull thresholds after
returns a workflow id.
Output Location: `<dig_url_root>/runs/<flow>-$STAMP/anomaly/` (per-flow override: see flow walkthrough).
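The stamping and output-location conventions used in the template above can be sketched as three small helpers. The function names are hypothetical, and the `uuidgen` fallback for systems without `/proc/sys/kernel/random/uuid` is an assumption added for portability rather than something the skill prescribes.

```bash
# Generate a fresh 8-character stamp; call once per submit.
gen_stamp() {
  if [ -r /proc/sys/kernel/random/uuid ]; then
    cut -c1-8 /proc/sys/kernel/random/uuid
  else
    # Assumed fallback for non-Linux shells.
    uuidgen | tr '[:upper:]' '[:lower:]' | cut -c1-8
  fi
}

# Workflow name: <flow>-<stamp>
run_name() {
  printf '%s-%s\n' "$1" "$2"
}

# Default output location: <dig_url_root>/runs/<name>/anomaly/
output_location() {
  printf '%s/runs/%s/anomaly/\n' "$1" "$2"
}
```

Computing the name once, for example `NAME=$(run_name <flow> "$(gen_stamp)")`, and reusing it for both the submit and the later download keeps the two paths from drifting apart.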
Supporting files
Full inventory — workflow YAMLs, cookbooks, scripts table, references, evals, component skills — in
. Top-level dirs:
,
,
,
,
.