Visual ChangeNet
Visual ChangeNet is a TAO Toolkit model for visual inspection and defect detection. It supports two tasks:
- Classify — Binary image classification using a siamese-style architecture with a shared backbone (C-RADIO ViT) and a learnable difference module. Compares image pairs to classify defects as PASS/NO_PASS.
- Segment — Pixel-level change segmentation using a ViT-Large NVDINOv2 backbone. Compares before/after image pairs to produce a binary change mask.
The backbone weight (`c_radio_v2_vit_base_patch16_224`) is the `nvidia/C-RADIOv2-B` model from HuggingFace, distributed as `model.safetensors` (~393 MB).
The TAO 7.0.0-rc container does not auto-fetch from HF URLs: `ptm_utils.load_pretrained_weights()` hands the configured value directly to a file loader such as `safetensors.torch.load_file(path)`. Passing an `https://huggingface.co/...` URL or a repo id therefore makes the run fail within a few seconds. Stage the file locally before launch:
```bash
python3 -c "from huggingface_hub import hf_hub_download; import shutil; \
shutil.copy(hf_hub_download('nvidia/C-RADIOv2-B', 'model.safetensors'), '<workspace>/backbone/c_radio_v2_b.safetensors')"
```
Mount it into the container (`-v <workspace>/backbone/c_radio_v2_b.safetensors:/data/pretrained_models/C-RADIOv2_B.safetensors`) and set the spec `model.backbone.pretrained_backbone_path` to the container path.
is only needed at staging time, not at training time.
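Because the loader receives the path value as-is, a pre-launch check can catch a URL or repo id before the container starts. The following is a minimal sketch, not part of TAO; the function name `check_local_backbone_path` is hypothetical, and it only verifies that the value is an existing local file.

```python
from pathlib import Path
from urllib.parse import urlparse


def check_local_backbone_path(value: str) -> Path:
    """Return the path if it is an existing local file, else raise ValueError.

    Hypothetical helper for illustration; not a TAO API.
    """
    # Anything with a URL scheme (https://, hf://, ...) cannot be read by a file loader.
    scheme = urlparse(value).scheme
    if len(scheme) > 1:  # length check avoids misreading Windows drive letters
        raise ValueError(f"{value!r} is a URL; stage the file locally first")
    path = Path(value)
    # A repo id such as 'nvidia/C-RADIOv2-B' parses as a relative path that does not exist.
    if not path.is_file():
        raise ValueError(f"{value!r} is not an existing local file")
    return path
```

Run it against the host-side file before mounting, or against the container path from inside the container.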
Dataclass Schemas
Generated TAO Core schemas are packaged in
schemas/<action>.schema.json
, with
listing available actions. Each generated schema also emits
references/spec_template_<action>.yaml
from the schema top-level
field. AutoML enablement is declared at the model layer in
references/skill_info.yaml
via
. Runnable AutoML still requires
schemas/train.schema.json
and
references/spec_template_train.yaml
to exist and parse. Use the packaged train schema for
automl_default_parameters
,
automl_disabled_parameters
, defaults, min/max bounds, enums, option weights, math conditions, dependencies, and popular parameters. Do not expect
at runtime; maintainers regenerate schemas/templates before packaging the skill bank.
Train Action Policy
This model is AutoML-enabled at the model layer. Before handling any train-stage request, read
references/skill_info.yaml
and resolve the run override from either an explicit
value or the user's workflow request. Treat phrases like "turn off AutoML", "disable AutoML", "no HPO", or "plain training" as
for this run only; otherwise default to
. When
,
, and both
schemas/train.schema.json
and
references/spec_template_train.yaml
are packaged, route the train action through
tao-skill-bank:tao-run-automl
by default with this model's
. Preserve workflow/application overrides for datasets, specs, output directories, GPU/platform settings, parent checkpoints, and
. Use direct model training only when
or the packaged train schema/template is missing; in the missing-schema case, report that AutoML is enabled but not runnable for this model until schemas are generated.
Non-train actions such as
,
,
, and deploy flows stay in this model skill. The per-run
override does not change model metadata.
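The routing rules above can be sketched as a small decision function. This is an illustrative sketch only: the parameter names and the `"direct-training"` label are hypothetical, and it checks only that the two packaged files exist, not that they parse. The file paths, the disable phrases, and the `tao-skill-bank:tao-run-automl` route name are taken from the text.

```python
from pathlib import Path

# Phrases quoted in the policy that switch AutoML off for a single run.
DISABLE_PHRASES = ("turn off automl", "disable automl", "no hpo", "plain training")


def resolve_train_route(model_automl_enabled, request_text, skill_dir, explicit_override=None):
    """Return (route, note) for a train-stage request.

    All parameter names are hypothetical placeholders for the metadata
    fields that the skill actually reads.
    """
    if explicit_override is not None:
        run_automl = explicit_override
    else:
        text = request_text.lower()
        run_automl = not any(phrase in text for phrase in DISABLE_PHRASES)

    # The per-run override never changes the model-level flag; it only gates this run.
    if not (model_automl_enabled and run_automl):
        return "direct-training", ""

    skill = Path(skill_dir)
    required = [
        skill / "schemas" / "train.schema.json",
        skill / "references" / "spec_template_train.yaml",
    ]
    if all(p.is_file() for p in required):
        return "tao-skill-bank:tao-run-automl", ""
    return "direct-training", "AutoML is enabled but not runnable until schemas are generated"
```

An explicit override takes precedence over phrase matching, which mirrors the "either an explicit value or the user's workflow request" ordering in the policy.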
For TAO Deploy TensorRT actions (
, TensorRT
, and TensorRT
for classify and segment variants), read
references/tao-deploy-visual-changenet.md
first. Deploy spec templates live in this skill's
folder with the
spec_template_deploy_*.yaml
prefix.
Tasks
Classify (default)
Uses actions:
,
,
. Defaults template:
references/spec_template_train.yaml
.
Segment
Uses actions:
,
,
. Defaults template:
references/spec_template_segment.yaml
.
Segmentation requires compiling custom CUDA ops (`MultiScaleDeformableAttention`) on first run, which takes ~5 minutes. The ViT adapter backbone uses these for multi-scale feature extraction.
Dataset structure for segmentation differs from classify — uses paired directories (
,
,
,
) instead of CSV files. See
in the defaults.
Datasets, Spec Overrides, and Data Format
Visual ChangeNet has two task modes with different dataset types and data source structures. Classify uses a 4-column CSV (`input_path,golden_path,label,object_name`) plus an images directory; segment uses a paired directory structure (
,
,
,
) under a single
. Data source overrides are
mandatory for every action — the agent MUST construct data source paths and include them in
.
See `references/dataset-and-specs.md` for the full per-action dataset requirement tables (classify and segment), every spec-override example (train, export, quantize, evaluate, inference, gen_trt_engine for both variants), the classify CSV format, evaluate/inference and segment input fields, lighting conventions, segment data layout, and the
multi-lighting configuration.
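A quick structural check of a classify CSV can catch column mismatches before a job is launched. The sketch below assumes the file carries a header row with exactly the four column names listed above; that header assumption and the function name `validate_classify_csv` are not confirmed by this document, so consult `references/dataset-and-specs.md` for the authoritative format.

```python
import csv
import io

EXPECTED_COLUMNS = ["input_path", "golden_path", "label", "object_name"]


def validate_classify_csv(text):
    """Return a list of problems found in classify CSV text (empty if none).

    Assumes a header row naming the four columns; hypothetical helper.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header != EXPECTED_COLUMNS:
        return [f"header is {header}, expected {EXPECTED_COLUMNS}"]
    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(EXPECTED_COLUMNS):
            problems.append(f"line {line_no}: {len(row)} columns")
        elif any(not cell.strip() for cell in row):
            problems.append(f"line {line_no}: empty field")
    return problems
```

The check is purely structural: it does not verify that the referenced images exist or that label values are valid.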
Local Docker Invocation
Without the TAO SDK, resolve the TAO pyt image from
and invoke
visual_changenet <action>
directly with
and the backbone
mounted as a single file. See
references/local-docker-invocation.md
for the full
command, the shared-memory requirement, the backbone mount detail, and the checkpoint/results_dir command-line override pattern.
Parameters, Hardware, and Error Patterns
Key knobs include `train.validation_interval` (default 50, must be ≤ num_epochs), `train.checkpoint_interval` (default 200, must be ≤ num_epochs),
(default 100),
`model.classify.eval_margin` (default 0.3, the primary precision/recall threshold), and `train.classify.cls_weight` (default [1.0, 10.0]). Minimum hardware is 1 GPU with 16 GB+ VRAM; 8 GPUs (DDP) are recommended for production. GPU count is managed internally by TAO, so do not set
.
See `references/parameters-and-troubleshooting.md` for the full parameter reference, hardware guidance, and the complete error-pattern catalog (checkpoint not found, CSV format mismatch, image extension mismatch, OOM, low eval accuracy, the contrastive-loss assertion, non-convergence, the segment-only MultiScaleDeformableAttention build, Lightning epoch misconfiguration, PYTHONPATH/ModuleNotFoundError, and epoch defaults).
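The two interval constraints named above (each interval must be ≤ num_epochs) are easy to verify before submitting a spec. The sketch below assumes the train section is available as a plain dict and that the epoch count lives under a `num_epochs` key; both the key layout and the function name are assumptions for illustration.

```python
def check_intervals(train_spec):
    """Return error messages for intervals that exceed the epoch count.

    `train_spec` is a hypothetical dict view of the spec's train section.
    """
    num_epochs = train_spec["num_epochs"]
    errors = []
    for key in ("validation_interval", "checkpoint_interval"):
        interval = train_spec.get(key)
        # Keys left unset are skipped; this sketch does not substitute defaults.
        if interval is not None and interval > num_epochs:
            errors.append(f"train.{key}={interval} exceeds num_epochs={num_epochs}")
    return errors
```

For example, a short run that lowers the epoch count without lowering both intervals would be flagged here rather than failing later in the job.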
Spec Param / Parent Model Inference
Model-specific inference mappings belong in this MD file, not in
. Generated runners should read this section and apply the mappings with SDK helpers before
. This mirrors the old microservices
flow.
Inference mappings from this model skill:
| Action | Spec Field | Inference Function | Meaning |
|---|---|---|---|
| evaluate | | | current job results directory |
| inference | | | current job results directory |
| train | | | current job results directory |
| train | `train.resume_training_checkpoint_path` | | model file inferred from the current job results folder |
For
or
, pass the upstream train/export/AutoML child job id as
. The SDK lists the parent result folder, filters checkpoint artifacts, and returns the selected model file or folder. Do not add these mappings back to
and do not patch generated runner scripts to guess checkpoint paths.
Deployment
- tao-deploy-visual-changenet — Visual ChangeNet deploy workflow for TensorRT engine generation, TensorRT evaluation, and TensorRT inference using TAO Deploy.