Structured framework for verifying numerical parity of HF<->MCore weight conversions. References existing tools and the add-model-support skill.
npx skill4agent add nvidia/skills parity-testing

Related skill: add-model-support

| What you want to verify | Tool | GPU? | When to use |
|---|---|---|---|
| All weights round-trip exactly (single GPU) | | No | First check after writing a bridge |
| Weights round-trip with TP/PP/EP | | Yes | After single-GPU passes |
| Forward-pass logit equivalence | | Yes | After round-trip passes |
| Text generation sanity | | Yes | Large models that OOM compare.py |
| Programmatic weight check | | Yes | Inside Python scripts |
| VLM generation sanity | | Yes | VLM models |
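
The first row of the table asks for an exact round-trip, not an approximate one. The sketch below illustrates why that is a reasonable bar: a conversion that only moves data (concatenating, splitting, reshaping) performs no arithmetic, so converting and converting back should reproduce every weight bit for bit. The `to_fused` and `from_fused` functions and the key names are hypothetical stand-ins for a bridge mapping, not the repository's API, and the fused layout is an assumption chosen for illustration.

```python
import torch


def to_fused(sd):
    """Toy 'export': concatenate q/k/v projections into one fused weight (hypothetical layout)."""
    fused = torch.cat([sd["q.weight"], sd["k.weight"], sd["v.weight"]], dim=0)
    return {"qkv.weight": fused}


def from_fused(sd):
    """Toy 'import': split the fused weight back into three equal q/k/v chunks."""
    q, k, v = sd["qkv.weight"].chunk(3, dim=0)
    return {"q.weight": q, "k.weight": k, "v.weight": v}


torch.manual_seed(0)
original = {name: torch.randn(4, 8) for name in ("q.weight", "k.weight", "v.weight")}
restored = from_fused(to_fused(original))

# cat/chunk only move data, so the per-parameter max difference should be exactly zero.
max_diffs = {key: (original[key] - restored[key]).abs().max().item() for key in original}
for key, diff in max_diffs.items():
    print(f"{key}: max_diff={diff:.6e}")
```

Any nonzero difference in a check like this usually points at a mapping bug (wrong split order, wrong dimension, missing key) and not at floating-point noise.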

Scripts referenced below live under examples/conversion/.

# Single-GPU round-trip
uv run python examples/conversion/hf_megatron_roundtrip.py \
  --hf-model-id <org>/<model>

# Multi-GPU with TP=2
uv run python -m torch.distributed.run --nproc_per_node=2 \
  examples/conversion/hf_megatron_roundtrip_multi_gpu.py \
  --hf-model-id <org>/<model> --tp 2

# Multi-GPU with PP=2
uv run python -m torch.distributed.run --nproc_per_node=2 \
  examples/conversion/hf_megatron_roundtrip_multi_gpu.py \
  --hf-model-id <org>/<model> --pp 2

Expected result: max_diff == 0.0 for every weight.

from megatron.bridge.models.conversion.utils import weights_verification_table
weights_verification_table(bridge, hf_pretrained, megatron_model)

# Compare logits (loads both HF and Megatron models)
uv run python -m torch.distributed.run --nproc_per_node=2 \
  examples/conversion/compare_hf_and_megatron/compare.py \
  --hf_model_path <org>/<model> --tp 2 \
  --prompt "The capital of France is"

For large models that OOM in compare.py, run a text-generation sanity check instead:

uv run python -m torch.distributed.run --nproc_per_node=2 \
  examples/conversion/hf_to_megatron_generate_text.py \
  --hf_model_path <org>/<model> --tp 2 \
  --prompt "The capital of France is" --max_new_tokens 50

Expected tolerances by test level (see also the add-model-support skill):

| Test Level | Dtype | Device | Max Diff | Cosine Sim |
|---|---|---|---|---|
| Round-trip | float32 | CPU | 0.0 (exact) | 1.0 (exact) |
| Forward pass | bfloat16 | GPU | < 1e-2 | > 0.9999 |
| Forward pass | float16 | GPU | < 1e-3 | > 0.99999 |
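
The thresholds in this table can be encoded as a small lookup so that a script fails loudly when a comparison falls outside them. This is a minimal sketch and not part of the repository: `TOLERANCES` and `within_tolerance` are hypothetical names, dtypes are represented as plain strings for simplicity, and the bounds are copied directly from the table.

```python
# Hypothetical lookup mirroring the tolerance table:
# (test level, dtype) -> (max-diff bound, cosine-similarity bound).
TOLERANCES = {
    ("round-trip", "float32"): (0.0, 1.0),
    ("forward", "bfloat16"): (1e-2, 0.9999),
    ("forward", "float16"): (1e-3, 0.99999),
}


def within_tolerance(level, dtype, max_diff, cos_sim):
    """Return True if the metrics satisfy the table's bounds for this level and dtype."""
    max_bound, cos_bound = TOLERANCES[(level, dtype)]
    if level == "round-trip":
        # Round-trip must be exact. A zero max diff already implies identical tensors,
        # so the computed cosine similarity (which can carry float rounding) is not checked.
        return max_diff == 0.0
    # Forward pass: strict inequalities, as written in the table.
    return max_diff < max_bound and cos_sim > cos_bound


print(within_tolerance("forward", "bfloat16", max_diff=5e-3, cos_sim=0.99995))
```

The round-trip branch deliberately ignores `cos_sim`: a cosine similarity computed in floating point can land a hair below 1.0 even for identical tensors, so comparing it for equality would produce false failures.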

import torch


def compare_tensors(a, b, name=""):
    """Compare two tensors and report similarity metrics."""
    max_diff = (a - b).abs().max().item()
    mean_diff = (a - b).abs().mean().item()
    cos_sim = torch.nn.functional.cosine_similarity(
        a.flatten().float(), b.flatten().float(), dim=0,
    ).item()
    print(f"{name}: max_diff={max_diff:.6e}, mean_diff={mean_diff:.6e}, cosine_sim={cos_sim:.8f}")
    return max_diff, mean_diff, cos_sim


def compare_state_dicts(sd_a, sd_b, prefix=""):
    """Compare two state dicts key-by-key, reporting per-parameter differences."""
    keys_a, keys_b = set(sd_a.keys()), set(sd_b.keys())
    missing, extra = keys_a - keys_b, keys_b - keys_a
    if missing:
        print(f"{prefix}Missing keys: {sorted(missing)}")
    if extra:
        print(f"{prefix}Extra keys: {sorted(extra)}")
    max_diffs = {}
    for key in sorted(keys_a & keys_b):
        diff = (sd_a[key].float() - sd_b[key].float()).abs().max().item()
        if diff > 0:
            max_diffs[key] = diff
            print(f"{prefix}{key}: max_diff={diff:.6e}")
    if not max_diffs and not missing and not extra:
        print(f"{prefix}All {len(keys_a & keys_b)} parameters match exactly.")
    return missing, extra, max_diffs

Related functions and skills: mapping_registry(), nccl-contiguous-tensors, provider_bridge(), add-model-support

| Component | Path |
|---|---|
| Single-GPU round-trip | |
| Multi-GPU round-trip | |
| Forward-pass comparison | |
| Text generation | |
| VLM generation | |
| Checkpoint CLI | |
| Toy model creator | |
| Verification utility | |
| Adapter verification | |