NV-Segment-CT Finetune

Purpose

Used for smoke or dataset finetuning of NV-Segment-CT VISTA3D on CT NIfTI labels. Not for clinical validation.
Wraps the upstream MONAI bundle entrypoint; do not replace it with handwritten training or inference code.

Manifest inputs are

dataset_dir

datalist

target_anatomy

label_mapping

smoke

sanity

auto_seg

, and

skip_formal_eval

Manifest outputs are
```
finetuned_ckpt
```
and schema-checked
```
result_json
```
.

Instructions

Run
```
scripts/run_finetune.py
```
; do not patch files under
```
bundle/
```
or upstream checkouts during normal skill use.
For standalone Bash, include the fresh-environment setup line before the wrapper; benchmark venvs start empty.
Run the committed script in place from the repo root. Do not copy this skill to a runtime directory, and do not use
```
rm
```
or cleanup commands in generated invocations.

If a host exposes

run_script

, use

run_script("scripts/run_finetune.py", args=[...])

; otherwise run from the repo root.

For the shortest workflow check, use
```
--smoke
```
; for MSD Task06 Lung Tumor reproduction, use
```
--sanity
```
.
Read
```
references/task06-and-results.md
```
only when you need Task06 reference details, output-field definitions, or manual bundle setup notes.

Available Scripts

Script Purpose Arguments

Script	Purpose	Arguments
`scripts/run_finetune.py`	Primary entrypoint declared by `skill_manifest.yaml` ; stages configs, runs MONAI, and writes `output.json` .	`[FIXTURE_OR_DATASET] --output-dir OUT_DIR [--smoke] [--sanity] [--auto-seg] [--dataset-dir DIR] [--datalist JSON] [--target-anatomy TEXT] [--label-mapping JSON] [--patch-size JSON]`

scripts/run_finetune.py

Primary entrypoint declared by

skill_manifest.yaml

; stages configs, runs MONAI, and writes

output.json

[FIXTURE_OR_DATASET] --output-dir OUT_DIR [--smoke] [--sanity] [--auto-seg] [--dataset-dir DIR] [--datalist JSON] [--target-anatomy TEXT] [--label-mapping JSON] [--patch-size JSON]

Prerequisites

Python 3.10+ with CUDA-capable Torch for GPU runs.

Runtime packages from

skill_manifest.yaml

, especially

monai==1.4.0

numpy<2

nibabel

scipy

typer

PyYAML

fire

pytorch-ignite

einops

, and

huggingface_hub

Optional environment variables:
```
CUDA_VISIBLE_DEVICES
```
restricts visible GPUs;
```
NPROC_PER_NODE
```
overrides GPU count and values
```
>=2
```
select multi-GPU mode for non-sanity runs.

Side effects: writes generated bundle configs under

skills/nv-segment-ct-finetune/bundle/configs/

, including

skills/nv-segment-ct-finetune/bundle/configs/auto_override.json

skills/nv-segment-ct-finetune/bundle/configs/train_continual_task06_lung.json

, and

skills/nv-segment-ct-finetune/bundle/configs/dfw_no_logging.json

; writes checkpoints/evidence under

--output-dir

, may cache model assets under

~/.cache/huggingface/

, and may contact

https://huggingface.co

https://raw.githubusercontent.com

Fresh environment setup:

bash

python -m pip install "monai==1.4.0" "numpy<2" pytorch-ignite einops nibabel scipy typer PyYAML fire huggingface_hub

Known upstream compatibility constraints:

DFW Task06 reference: Python
```
3.10.16
```
, MONAI
```
1.4.0
```
, Torch
```
2.7.0+cu126
```
.
Use exact
```
monai==1.4.0
```
for smoke, sanity, and evidence runs; MONAI 1.5.x can crash the upstream finetune loss on boolean labels.
Do not float the dependency as
```
monai>=1.4,<1.6
```
in generated commands.

Usage

Smoke-scale workflow check:

bash

python -m pip install "monai==1.4.0" "numpy<2" pytorch-ignite einops nibabel scipy typer PyYAML fire huggingface_hub && \
python skills/nv-segment-ct-finetune/scripts/run_finetune.py \
  PATH_TO_DATASET \
  --smoke \
  --patch-size '[64,64,64]' \
  --output-dir runs/nvseg_smoke

Use the staged dataset as

PATH_TO_DATASET

. For the micro fixture, use

skills/nv-segment-ct-finetune/fixtures/spleen_micro

. Smoke mode proves wiring, config generation, checkpoint loading, and runtime compatibility; it is not a quality bar.

MSD Task06 Lung Tumor sanity reproduction:

bash

python skills/nv-segment-ct-finetune/scripts/run_finetune.py \
  /path/to/Task06 \
  --sanity \
  --output-dir runs/nvseg_task06_sanity

The sanity preset follows the single-GPU DFW recipe: fold-0 validation, label mapping

[[1, 23]]

for

lung tumor

, automatic class-prompt segmentation, patch

[128,128,128]

, 5 epochs, and original-spacing

configs/evaluate.json

scoring before and after training. Expected reference range is pretrained Dice about

0.6697

, training-best Dice about

0.6905

, and fine-tuned formal Dice about

0.6836

User-data finetune:

bash

python skills/nv-segment-ct-finetune/scripts/run_finetune.py \
  --dataset-dir /path/to/dataset \
  --datalist /path/to/datalist.json \
  --target-anatomy "lung tumor" \
  --auto-seg \
  --epochs 5 \
  --patch-size '[128,128,128]' \
  --output-dir runs/nvseg_user_finetune

Use

--label-mapping '[[1, 23]]'

when local label values are custom or the anatomy name is ambiguous.

Examples

Smoke run on a staged tiny dataset:

bash

python skills/nv-segment-ct-finetune/scripts/run_finetune.py \
  runs/with_vs_without_nv/_inputs/nv_segment_ct_finetune/input_dataset \
  --smoke \
  --patch-size '[64,64,64]' \
  --output-dir runs/nvseg_smoke

Task06 sanity run on a local MSD cache:

bash

python skills/nv-segment-ct-finetune/scripts/run_finetune.py \
  .workbench_data/datasets/Task06_Lung \
  --sanity \
  --output-dir runs/nvseg_task06_sanity

Data Contract

Preferred layout:

dataset/imagesTr/*.nii.gz

and

dataset/labelsTr/*.nii.gz

Labels must align one-to-one with images by basename.
The target label value must be present in the training labels.
Use a datalist when patient-level splitting matters. The bundle default
```
fold
```
is
```
0
```
, so
```
fold: 0
```
entries are validation and all other folds are training.
Every trained foreground label must map to an existing VISTA3D global class id from
```
bundle/label_dict.json
```
; this skill cannot invent a new class.

Results

Check

output.json

in the run directory first:

```
formal_pretrained_val_dice
```
and
```
formal_finetuned_val_dice
```
: original-spacing pre/post scores when formal eval is enabled.

training_start_val_dice

val_dice_per_epoch

, and

training_best_val_dice

: training-time validation trace.

finetuned_ckpt_matches_pretrained_weights

: detects the epoch-0 checkpoint trap when

val_at_start=true

```
recommended_ckpt
```
: checkpoint to keep. Do not blindly use the last epoch or
```
model_finetune.pt
```
.
```
runtime.oom
```
,
```
runtime.peak_gpu_mb
```
, and phase logs: distinguish OOM, slow validation, and process failure.

Decision rule: prefer formal original-spacing pre/post scores when present; reject tensor-identical "fine-tuned" checkpoints for sanity recovery; treat

improved: false

as valid evidence rather than a wrapper failure.

Limitations

Thin wrapper. Training, validation, transforms, and checkpointing are delegated to the upstream bundle in
```
bundle/
```
.
The auto-derived plan is heuristic; caller-provided
```
--patch-size
```
,
```
--cache-rate
```
,
```
--epochs
```
, and
```
--learning-rate
```
win.
The Task06 sanity recipe intentionally forces single-GPU execution to match the DFW reference. Multi-GPU mode for other datasets requires host
```
torchrun
```
support.
The paired verifier is CPU-only and audits the evidence pack; it does not re-run GPU segmentation.
Not for clinical deployment, clinical interpretation, autonomous diagnosis, or regulatory submission.

Troubleshooting

Error	Cause	Fix
Missing dependency or import error	Runtime drift from `skill_manifest.yaml` .	Install the packages above or use the documented environment.
Low Task06 pretrained Dice	Wrong config, wrong checkpoint, data split drift, or dependency drift.	Compare environment fields and staged configs before changing training logic.
`model_finetune.pt` matches pretrained	`val_at_start=true` selected epoch 0 as best.	Use `recommended_ckpt` ; treat sanity recovery as failed unless a changed checkpoint improves formal Dice.
Missing formal Dice fields	Formal eval failed or was skipped.	Inspect `eval_pretrained.log` , `eval_finetuned.log` , and `metrics.csv` .
GPU out of memory	Patch/cache settings too large.	Reduce `--patch-size` , lower `--cache-rate` , or reduce workers.
No validation cases	Datalist lacks `fold: 0` .	Provide at least one validation entry.

Verification

Run the implemented verifier when quality gates matter:

bash

python -m eval_engine.run_trusted skills/nv-segment-ct-finetune \
  --fixture skills/nv-segment-ct-finetune/fixtures/spleen_micro \
  --out runs/nvseg_trusted

nv-segment-ct-finetune

NPX Install

Tags

SKILL.md Content

NV-Segment-CT Finetune

Purpose

Instructions

Available Scripts

Prerequisites

Usage

Examples

Data Contract

Results

Limitations

Troubleshooting

Verification