physical-ai-defect-image-generation


Physical AI Defect Image Generation Workflow Orchestrator


End-to-end orchestration of defect image generation, augmentation, and labeling pipelines for AOI (Automated Optical Inspection) datasets. Every flow has a canonical OSMO workflow YAML in `assets/configs/` that chains all steps non-interactively. Use-case cookbooks in `assets/cookbooks/` provide PCBA usd2roi/image-edit configs and AnomalyGen training configs for PCBA, metal surface, and glass inspection. This skill governs flow selection, data handoffs, and submit commands; component internals live in each component's `SKILL.md`.

Supported Flows

| Flow | Entry point | OSMO YAML | Steps | Use cases |
| --- | --- | --- | --- | --- |
| Day 0: Texture Defects | CAD scene USD (`pcba_target.yaml` ships in the cookbook) | `texture_defect_generation_day0.yaml` | usd2roi (scan_grid + per-cell ROI crops) → image-edit augmentation (`nvidia/Qwen-Image-Edit-NVPCB-OVSL2SL`) → finetune-or-passthrough → infer (anomalygen labels inline, including missing-component) | PCBA |
| Day 0: Good Image (usd2roi + Image-Edit) | CAD scene USD + per-board `pcba_target.yaml` / `day0_image.yaml` / `day0_crop.yaml` | `good_image_generation.yaml` | usd2roi-render (scan_grid + per-cell ROI crop) → Qwen Image-Edit (OVSL2SL appearance transfer) | PCBA clean-image set (ChangeNet golden halves, finetune positives, real-photo pairing) |
| Day 0: Structural Defects | CAD scene USD + per-board `pcba_target.yaml` | `structural_defect_generation.yaml` | isaac-render (pose defects: shift / tombstone / sideflip) + per-component crop (single pod) → Qwen Image-Edit (OVSL2SL lighting transfer; pose geometry preserved) | PCBA pose-defect set; ChangeNet defect halves |
| Day 1: Infer + Label (real-photo alignment, DEFAULT) | CAD-derived USD + real PCBA photo (both ship in `datasets/pcb/assets`) | `texture_defect_generation_day1_real_alignment.yaml` | usd2roi day-1 render → MI register → per-ROI crop → yq-render config → finetune-or-passthrough → infer (anomalygen labels inline) | Default PCBA Day 1. Raw AOI screenshot of any usd2roi-supported board |
| Day 1: Infer + Label (manual ROI) | Pre-captured clean images + ROI masks (NGC artifact or user upload) | `texture_defect_generation_day1_manual_roi.yaml` | yq-render config → finetune-or-passthrough → infer (anomalygen labels inline) | Metal surface, glass (no USD/real-photo flow); PCBA only when user explicitly asks for pre-captured ROI experimentation |
| Finetune Only | Labeled anomaly URL artifact | `finetune.yaml` | yq-render config → finetune (validate_dataset → prep_testcase → torchrun) | Any use case; produces checkpoint for Day 0 or Day 1. Requires raw training data under `<dig_url_root>/datasets/<usecase>/raw` (see `assets/configs/setup/setup_<usecase>.yaml`). |

All flows run on OSMO. Day 0 flows require `image_edit_endpoint` (Qwen Image-Edit OVSL2SL, either an existing URL or a local deploy from `references/nim/`); Finetune Only has no external endpoints.

Pick the right workflow for the user's defect class

| Defect class | Workflow | Mechanism |
| --- | --- | --- |
| Clean / good / scan-grid / `normal_img + cad_mask` pairs | `good_image_generation.yaml` | usd2roi-render + Qwen Image-Edit |
| Texture defects (solder bridge, scratch, discoloration) AND missing-component (handled natively by AnomalyGen, NOT structural) | `texture_defect_generation_day0.yaml` | Qwen Image-Edit + AnomalyGen AMP/SDG |
| Structural / pose defects (tombstone, shift, sideflip) | `structural_defect_generation.yaml` | IsaacSim pose perturbation |
| Day 1 inference + labeling on a real image | `texture_defect_generation_day1_real_alignment.yaml` (PCBA default) or `texture_defect_generation_day1_manual_roi.yaml` (metal/glass; PCBA only when user explicitly asks for pre-captured ROI / skip-alignment) | usd2roi day-1 registration (real-alignment) or direct inference (manual-ROI) |

ChangeNet golden/defect pairs: submit `good_image_generation.yaml` + `structural_defect_generation.yaml` with the same `--set name=` (two-submission pairing convention).
Day 0 and Day 1 share the same downstream shape: a Jinja-gated `finetune-job` (omitted when `use_pretrained_checkpoint=true`) feeding `anomaly-infer`. Day 0 prepends `usd2roi-render` + `augment-image-edit`; Day 1 starts from `<dig_url_root>/datasets/<usecase>/raw`. Per-stage detail: each flow's walkthrough.
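
The defect-class routing above can be sketched as a small lookup. This is only an illustration: the function name `pick_workflow` and the keyword spellings it accepts are hypothetical, and only the YAML file names come from the table.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a defect-class keyword to the workflow YAML
# listed for it in the routing table. The keywords are illustrative only.
pick_workflow() {
  case "$1" in
    good|clean|scan-grid)
      echo "good_image_generation.yaml" ;;
    texture|missing-component)   # missing-component is handled by AnomalyGen, not structural
      echo "texture_defect_generation_day0.yaml" ;;
    structural|pose)
      echo "structural_defect_generation.yaml" ;;
    day1-real-alignment)
      echo "texture_defect_generation_day1_real_alignment.yaml" ;;
    day1-manual-roi)
      echo "texture_defect_generation_day1_manual_roi.yaml" ;;
    *)
      echo "unknown defect class: $1" >&2
      return 1 ;;
  esac
}
```

An unknown keyword returns non-zero rather than guessing, which matches the rule that vague requests are disambiguated with the user instead of silently mapped.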

User intent → knob mapping

Every OV flow is two-stage: `crop_max_emit=N` caps the final per-cell crops (stage 2); `render_patches=N` caps raw scan-grid patches (stage 1, each yielding multiple crops). DO NOT auto-map "generate N images" → `render_patches=N` (wrong stage). `crop_max_emit` does not exist on `structural_defect_generation.yaml` (one crop per component, so use `render_patches`) or `texture_defect_generation_day1_real_alignment.yaml` (narrow via the cookbook's `crop.classes` whitelist). Full knob table, smoke-test recipes, defaults, caveats: `references/knob_mapping.md`.
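
As a rough sketch of the mapping rule above, the helper below turns a requested image count into a `--set` knob, or refuses when the workflow has no `crop_max_emit`. The function name `count_knob` and its return codes are hypothetical, and treating `good_image_generation.yaml` and `texture_defect_generation_day0.yaml` as the flows that accept `crop_max_emit` is an inference from the text, not a confirmed list; `references/knob_mapping.md` remains authoritative.

```bash
#!/usr/bin/env bash
# Hypothetical helper: translate "generate N images" into a --set knob.
# Return codes (illustrative): 0 = knob printed, 1 = bad input,
# 2 = the workflow has no crop_max_emit, so the count must be handled another way.
count_knob() {
  local yaml="$1" n="$2"
  case "$n" in
    ''|*[!0-9]*|0) echo "count must be a positive integer" >&2; return 1 ;;
  esac
  case "$yaml" in
    good_image_generation.yaml|texture_defect_generation_day0.yaml)
      # Assumed to expose crop_max_emit (stage 2 cap on final crops).
      echo "crop_max_emit=$n" ;;
    structural_defect_generation.yaml)
      echo "no crop_max_emit: size the run via render_patches" >&2
      return 2 ;;
    texture_defect_generation_day1_real_alignment.yaml)
      echo "no crop_max_emit: narrow via the cookbook's crop.classes whitelist" >&2
      return 2 ;;
    *)
      echo "unknown workflow: $yaml" >&2
      return 1 ;;
  esac
}
```

Note that the helper never emits `render_patches=N` for a plain image count, since that would cap the wrong stage.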

Structural-defect sizing (no `crop_max_emit` knob exists)

Structural output is non-linear in `render_patches`: doubling the frame count yields roughly 1.6–1.7× the crops, not 2×. Don't use `crop_max_emit` (no effect) or `render_patches=0` (fails). Validated yield table + target-size formula: `references/flows/structural_defect_generation.md` §"Sizing the output". For ambiguous "generate N images", surface the calibration table via `AskUserQuestion`.

Disambiguation: handle vague requests before committing

Underspecified prompts ("generate me some images", "run the PCBA flow", "give me defects") must not be resolved by silently assuming a flow / usecase / knob mapping. When intent is ambiguous, pause and present candidate interpretations via `AskUserQuestion` (2–4 mutually exclusive options) before submitting. Disambiguate the load-bearing choices: which flow, which use case, what stage a count refers to, finetune vs. passthrough.
Settled defaults you should NOT disambiguate: PCBA Day 1 → real-alignment; board → `0603_H100`; image-edit endpoint → local cluster service (`references/nim/`); `use_pretrained_checkpoint=true`; Day 1 real-alignment `default_spatial_dependency=cad` (fall back to `free` only when CAD masks are unavailable, see `references/flows/texture_defect_generation_day1_real_alignment.md`).
`dig_url_root` is the one exception: NO silent default. On first use (no memory entry), you MUST elicit it via `AskUserQuestion` before any submit / `osmo data upload` / `preflight_urls.sh`. `s3://osmo-workflows/dig` is a suggestion to confirm, never auto-picked (~80 GB+ lands there). Later runs may reuse the remembered value silently. See Step 0 + memory rules (§4).
Full trigger table, prompt construction, and when-NOT-to-ask exceptions: `references/disambiguation.md`. Load it before assembling `AskUserQuestion` options for any vague request.

Step 0: Select Flow, Cookbook, and Gather Inputs

Before this step, if the request is vague (e.g. "generate me images", "run the PCBA flow", "give me defects"), pause and run the disambiguation cheat sheet above: present candidate interpretations via `AskUserQuestion` and let the user pick. Don't auto-pick a load-bearing default the user didn't actually choose.

First-time gate

If memory has no entries for this user, ASK the up-front preference questions in ONE `AskUserQuestion` call BEFORE any preflight / `osmo` / `kubectl` / `osmo data upload`, save the answers to memory (§4), then proceed. Bundle:
  • `dig_url_root`: MUST be elicited, not auto-picked. Offer `s3://osmo-workflows/dig` as a confirmable suggestion; otherwise the user provides their own OSMO-supported storage prefix. ~80 GB+ lands here. No escape hatch other than memory-recall of a previously confirmed value.
  • Default OSMO `--pool`: candidates from `osmo profile list` (`pool.accessible`).
  • Pod-template confirmation: only when `osmo config show POD_TEMPLATE` returns 403 (§2 has the exact question).
  • Image-edit endpoint (Day 0 only): Option A (existing URL) vs Option B (deploy local NIM).
Subsequent conversations read these silently from memory. Per-flow choices (use case, checkpoint vs finetune, board, knobs) are asked each time; see below.
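
The "recall from memory or ask" rule for `dig_url_root` might look like the sketch below. The memory store is modeled as a plain `KEY=VALUE` file purely for illustration; the file format and the name `recall_dig_url_root` are assumptions, and the real agent memory mechanism is not described in this document.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a previously confirmed dig_url_root, or fail so
# the caller knows it must elicit the value from the user. There is
# deliberately no built-in fallback value.
recall_dig_url_root() {
  local memory_file="$1" value
  [ -f "$memory_file" ] || return 1
  # Take the most recent dig_url_root=... entry, if any.
  value=$(sed -n 's/^dig_url_root=//p' "$memory_file" | tail -n 1)
  [ -n "$value" ] || return 1
  printf '%s\n' "$value"
}
```

A caller would branch on the exit status: on success reuse the value silently, on failure ask the user (offering `s3://osmo-workflows/dig` only as a suggestion to confirm) before running any preflight or upload.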

Preflight ordering (after the first-time gate)

Run §1 `preflight_credentials.sh` → §2 `preflight_pod_template.sh` → §3 `preflight_urls.sh <flow> <usecase>` → §4 generate the run stamp. Cadence: §1 and §2 are once-per-conversation gates with cross-conversation memory caching (see §4a in `references/preconditions.md`); skip them when memory records them as already verified / user-confirmed. §3 runs before every submit (varies by flow). §4 is the agent's job: a fresh `$STAMP` per submit.
Pod-template enforcement is two layers: the pre-submit `preflight_pod_template.sh` gate (§2) plus an in-pod runtime preflight on every OV + training task (fails fast on missing `/usr/share/nvidia/nvoptix.bin` or `/dev/shm` < 16 GiB). Runtime failure despite §2 passing → template was patched out → route to `physical-ai-infrastructure-setup-and-resilient-scaling`. Missing creds / URL artifacts → offer to submit `setup/setup_<case>.yaml` + `setup/setup_pretrained.yaml` first.
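
For orientation, the two runtime conditions named above (the OptiX binary and the `/dev/shm` size) could be checked roughly as follows. This is a sketch, not the shipped in-pod preflight: the function names are hypothetical, and reading "16 GiB" as the total size of the mount reported by `df` is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical check: is a size given in KiB at least the required number of GiB?
shm_gib_ok() {            # $1 = size in KiB, $2 = minimum size in GiB
  [ "$1" -ge $(( $2 * 1024 * 1024 )) ]
}

# Hypothetical fail-fast preflight mirroring the two conditions in the text.
runtime_preflight() {
  local optix="${1:-/usr/share/nvidia/nvoptix.bin}" shm_dir="${2:-/dev/shm}" shm_kib
  [ -f "$optix" ] || { echo "missing $optix" >&2; return 1; }
  # POSIX df output: the second column of the data row is the total size in KiB.
  shm_kib=$(df -Pk "$shm_dir" | awk 'NR==2 {print $2}')
  shm_gib_ok "$shm_kib" 16 || { echo "$shm_dir is smaller than 16 GiB" >&2; return 1; }
}
```

If a check like this fails inside the pod even though the pre-submit gate passed, the text above routes the problem to the infrastructure skill rather than retrying.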
Then ask the user in one message, per-flow choices only (the first-time gate above already covered `dig_url_root`, pool, pod-template, and endpoint preferences; pull those from memory):
  1. Use case: PCBA (use Day 0 + pcb cookbook), metal surface (Day 1 + metal_surface cookbook), glass (Day 1 + glass cookbook), or custom?
  2. Checkpoint available? If yes (`use_pretrained_checkpoint=true`), use `<dig_url_root>/models/<usecase>` and provide `checkpoint_step`. If no, finetune from `<dig_url_root>/datasets/<usecase>/raw`.
  3. Local-NIM pool capacity check (Day 0 Option B only): before `kubectl apply`, check `Total Capacity` via `physical-ai-infrastructure-setup-and-resilient-scaling`. `Total Capacity < 2` cannot host NIM + DIG concurrently → ask user to add GPUs or switch to Option A. `image_edit_model` is always `nvidia/Qwen-Image-Edit-NVPCB-OVSL2SL`, never generic `qwen-image-edit`.
  4. Save user preferences to memory: after the first-time gate (and after any submit diverging from a documented default), persist load-bearing choices (`dig_url_root`, OSMO pool, default board, image-edit endpoint, pod-template state, osmo-admin role). Never save `image_edit_model` (constant; saving invites drift) or ephemeral state (STAMP, one-off `anomaly_types_json`). Full table: `references/preconditions.md` §4a "Memory rules". Read relevant memories at the start of every new conversation and apply silently.
Review the relevant flow reference before asking; most values have sensible defaults. Day 1 routing: PCBA defaults to `real_alignment`; metal/glass have no USD flow so always `manual_roi`; don't ask the user "manual or real-alignment?" for PCBA unless they explicitly ask to skip alignment.
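
The Day 1 routing default and the checkpoint-versus-finetune choice above can be summarized in two small functions. Both names (`day1_workflow`, `training_or_checkpoint_path`) are hypothetical and the sketch covers only the built-in use cases; an explicit user request to skip alignment on PCBA would override the first function.

```bash
#!/usr/bin/env bash
# Hypothetical helper: default Day 1 workflow for a built-in use case.
day1_workflow() {
  case "$1" in
    pcb)                 echo "texture_defect_generation_day1_real_alignment.yaml" ;;
    metal_surface|glass) echo "texture_defect_generation_day1_manual_roi.yaml" ;;  # no USD flow
    *)                   echo "unknown usecase: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: where the run reads from, given the checkpoint choice.
# $1 = dig_url_root, $2 = usecase, $3 = use_pretrained_checkpoint (true|false)
training_or_checkpoint_path() {
  if [ "$3" = "true" ]; then
    echo "$1/models/$2"          # passthrough against an existing checkpoint
  else
    echo "$1/datasets/$2/raw"    # raw data used to finetune
  fi
}
```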

Common Preconditions (all flows)

Quick reference. Long-form: `references/preconditions.md`.
  1. OSMO credentials + tokens: once per conversation. If a `.env` exists in the workspace, source it first (`set -a; . ./.env; set +a`) so `HF_TOKEN` is exported. Run `scripts/preflight_credentials.sh`; the authoritative check is that the OSMO cred `hf-token` is provisioned (images are public on `nvcr.io/nvidia/`, so no registry cred is needed). Pass `--no-probe` in restricted-egress shells. See `references/preconditions.md` §1.
  2. Pod template: once per conversation, with cross-conversation memory caching (see Step 0 and §4a in `references/preconditions.md`). Skip when memory records the cluster as verified / user-confirmed / 409-skipped. Otherwise run `scripts/preflight_pod_template.sh` and branch on exit code (0=verified / 1=patch via infra skill / 2=ask-user (HTTP 403) / 3=skip (HTTP 409) / 4=env-fix). Full branching prose and prompts in `references/preconditions.md` §2.
  3. Required URL artifacts: before every submit. Run `DIG_URL_ROOT=<dig_url_root> scripts/preflight_urls.sh <flow> <usecase> [variant]`. If anything is missing, stop and submit the relevant `setup/setup_<case>.yaml` + `setup/setup_pretrained.yaml` first (the OSMO setup workflows); see `references/setup.md`. Never download assets locally to work around a problem; if setup fails on credentials, ask the user to rectify them and re-submit on OSMO. Per-flow checklist:

    | Flow | Use case | Required URL artifacts under `<dig_url_root>` |
    | --- | --- | --- |
    | Day 0: Texture Defects | PCBA | `models/pretrained`, `models/pcb`, `datasets/pcb/raw`, `datasets/pcb/assets` |
    | Day 0: Good Image | PCBA | `datasets/pcb/assets` only |
    | Day 0: Structural Defects | PCBA | `datasets/pcb/assets` only |
    | Day 1 | Metal surface | `models/pretrained`, `models/metal_surface`, `datasets/metal_surface/raw` |
    | Day 1 | Glass | `models/pretrained`, `models/glass`, `datasets/glass/raw` |
    | Day 1 real-photo alignment | PCBA | Day 1 PCBA plus `datasets/pcb/assets` |
    | Finetune Only | Any | `models/pretrained`, `datasets/<usecase>/raw` |

    Built-in `usecase` values are `pcb`, `metal_surface`, `glass`. See `references/preconditions.md` §3.
  4. Name stamping: regenerate `STAMP=$(cat /proc/sys/kernel/random/uuid | cut -c1-8)` before every submit and pass `--set name=<flow>-$STAMP`. Production YAMLs ship no `name` default. See `references/preconditions.md` §4.
  5. Glass case (UC3), Roboflow zip: only for `setup_glass.yaml`. Upload `mobile_screen.zip` to an OSMO URL prefix first; pass `--set uc3_zip_url_root=<prefix>`. Full procedure: `references/setup.md` §"Glass case (UC3)".
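
The exit-code branching in precondition 2 can be sketched as a single `case` statement. The function name `pod_template_action` and the action labels it prints are hypothetical shorthand for the branches listed above; the exact prompts and follow-up steps live in `references/preconditions.md` §2.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map the exit code of scripts/preflight_pod_template.sh
# to the branch named in the preconditions list.
pod_template_action() {
  case "$1" in
    0) echo "verified" ;;
    1) echo "patch-via-infra-skill" ;;
    2) echo "ask-user" ;;    # HTTP 403
    3) echo "skip" ;;        # HTTP 409
    4) echo "env-fix" ;;
    *) echo "unexpected exit code: $1" >&2; return 1 ;;
  esac
}

# Example wiring (not executed here):
#   rc=0; scripts/preflight_pod_template.sh || rc=$?
#   action=$(pod_template_action "$rc")
```

Capturing the exit code with `|| rc=$?` keeps the pattern safe under `set -e`, where a bare failing command would abort the script before the branch runs.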

Flow walkthroughs

Each flow's full walkthrough (group diagrams, prerequisites, submit-command variants, data handoffs, per-stage troubleshooting) lives under `references/flows/`. The agent should read the matching file before submitting any flow it hasn't run in the current conversation.

| Flow | Workflow YAML | Walkthrough |
| --- | --- | --- |
| Day 0: Texture Defects (PCBA) | `assets/configs/texture_defect_generation_day0.yaml` | `references/flows/texture_defect_generation_day0.md` |
| Day 0: Good Image (PCBA) | `assets/configs/good_image_generation.yaml` | `references/flows/good_image_generation.md` |
| Day 0: Structural Defects (PCBA) | `assets/configs/structural_defect_generation.yaml` | `references/flows/structural_defect_generation.md` |
| Day 1: Infer + Label (real-photo alignment, default PCBA) | `assets/configs/texture_defect_generation_day1_real_alignment.yaml` | `references/flows/texture_defect_generation_day1_real_alignment.md` |
| Day 1: Infer + Label (manual ROI, metal/glass + PCBA experimentation) | `assets/configs/texture_defect_generation_day1_manual_roi.yaml` | `references/flows/texture_defect_generation_day1_manual_roi.md` |
| Finetune Only | `assets/configs/finetune.yaml` | `references/flows/finetune.md` |

Cross-flow invariants

  • `use_pretrained_checkpoint=true` (default) → passthrough against `models/<usecase>`. Set to `false` to insert an in-pod `finetune-job` group (cookbook yq-patched in-pod, no pre-submit render step).
  • Day 0 emits per-cell `crop/<MATERIAL>/<cell>/...` trees; Day 1 emits per-ROI crops registered against the USD; structural emits flat per-component crops.
  • Shipped per-usecase `checkpoint_step` + `anomaly_types_json` defaults: see `references/preconditions.md` §"Shipped checkpoint and `anomaly_types_json` defaults".

OSMO Monitoring

Load `references/monitoring.md` before any `osmo workflow submit`, `osmo workflow query`, or `osmo workflow logs` action in this skill. It defines the polling cadence, task-status interpretation, log-pull escalation thresholds, failure-classification routing, and what to surface to the user vs. silently retry. Do not assemble a post-submit watch loop or status summary from memory; re-read it on the first such action of every conversation.

```bash
# Summarize the workflow status and each task's name, status, and exit code
osmo workflow query <workflow_id> --format-type json | jq '{status, tasks: [.groups[].tasks[] | {name, status, exit_code}]}'
# Pull recent logs for a single task
osmo workflow logs <workflow_id> -t <task_name> -n 200
# Download the anomaly outputs of a finished run
osmo data download <dig_url_root>/runs/<name>/anomaly ./output/anomaly-<name>/
```

Monitoring discipline: `references/monitoring.md`. Retrieval: `references/output_retrieval.md`. Presentation: `references/output_rendering.md`. Gotchas: `references/troubleshooting.md`.

Response Template

For "show me the plan / recipe" requests, emit your final response with these labeled sections (so nothing truncates mid-recipe):

Workflow: <flow name>, `assets/configs/<yaml>`
Preflights: `scripts/preflight_credentials.sh`; `scripts/preflight_urls.sh <0|1|finetune> <usecase> [variant]`
Required URL Artifacts under `<dig_url_root>`: enumerate per Common Preconditions §3 for the chosen flow.
Submit Command:

```bash
STAMP=$(cat /proc/sys/kernel/random/uuid | cut -c1-8)
osmo workflow submit assets/configs/<yaml> --pool <pool> \
  --set name=<flow>-$STAMP dig_url_root=<root> usecase=<usecase> \
        image_edit_endpoint=<endpoint> image_edit_model=nvidia/Qwen-Image-Edit-NVPCB-OVSL2SL \
        checkpoint_step=<step> 'anomaly_types_json=<types>'
```

Monitoring: load `references/monitoring.md` before running the submit; apply its polling cadence + log-pull thresholds after `osmo workflow submit` returns a workflow id.
Output Location: `<dig_url_root>/runs/<flow>-$STAMP/anomaly/` (per-flow override: see flow walkthrough).
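
The stamp in the submit command reads `/proc/sys/kernel/random/uuid`, which exists on Linux only. A minimal sketch of a stamp helper with a fallback is shown below; the function names `new_stamp` and `run_name` are hypothetical, and the `/dev/urandom` fallback is an assumption for hosts without `/proc`, not something the skill ships.

```bash
#!/usr/bin/env bash
# Hypothetical helper: 8 hex characters, fresh on every call.
new_stamp() {
  if [ -r /proc/sys/kernel/random/uuid ]; then
    # Same source as the template: the first 8 characters of a kernel UUID.
    cut -c1-8 /proc/sys/kernel/random/uuid
  else
    # Assumed fallback: 4 random bytes printed as 8 hex digits.
    od -An -N4 -tx1 /dev/urandom | tr -d ' \n'
    echo
  fi
}

# Hypothetical helper: build the value passed as --set name=<flow>-<stamp>.
run_name() {
  printf '%s-%s\n' "$1" "$(new_stamp)"
}
```

Calling `run_name` once per submit keeps the "fresh stamp per submit" rule; for the ChangeNet pairing convention the same generated name would be reused for both submissions.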

Supporting files

Full inventory (workflow YAMLs, cookbooks, scripts table, references, evals, component skills) in `references/contents.md`. Top-level dirs: `assets/configs/`, `assets/cookbooks/`, `scripts/`, `references/`, `evals/`.