parity-testing

Parity Testing for Megatron Bridge

This skill provides the decision framework for choosing the right verification tool and interpreting results. For the full model onboarding workflow (which includes parity testing as milestones 1 and 2), see the `add-model-support` skill.

Quick Decision: Which Tool to Run

| What you want to verify | Tool | GPU? | When to use |
|---|---|---|---|
| All weights round-trip exactly (single GPU) | `hf_megatron_roundtrip.py` | No | First check after writing a bridge |
| Weights round-trip with TP/PP/EP | `hf_megatron_roundtrip_multi_gpu.py` | Yes | After single-GPU passes |
| Forward-pass logit equivalence | `compare_hf_and_megatron/compare.py` | Yes | After round-trip passes |
| Text generation sanity | `hf_to_megatron_generate_text.py` | Yes | Large models that OOM in `compare.py` |
| Programmatic weight check | `weights_verification_table()` | Yes | Inside Python scripts |
| VLM generation sanity | `hf_to_megatron_generate_vlm.py` | Yes | VLM models |

All scripts live under `examples/conversion/`. `weights_verification_table()` is defined in `src/megatron/bridge/models/conversion/utils.py`.

3-Level Test Strategy

Level 1: State Dict Round-Trip (exact match)

The fastest and most fundamental check. If mappings can't perfectly round-trip weights, nothing else will work.

```bash
# Single-GPU round-trip
uv run python examples/conversion/hf_megatron_roundtrip.py \
    --hf-model-id <org>/<model>

# Multi-GPU with TP=2
uv run python -m torch.distributed.run --nproc_per_node=2 \
    examples/conversion/hf_megatron_roundtrip_multi_gpu.py \
    --hf-model-id <org>/<model> --tp 2

# Multi-GPU with PP=2
uv run python -m torch.distributed.run --nproc_per_node=2 \
    examples/conversion/hf_megatron_roundtrip_multi_gpu.py \
    --hf-model-id <org>/<model> --pp 2
```

**Expected:** Every weight shows a checkmark in the "Matches Original" column. Any "X" means the parameter mapping for that weight has an error.

**Tolerance:** Exact match (`max_diff == 0.0`). Round-trip conversions are
pure tensor reshaping — no floating-point arithmetic is involved.
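
To see why exact equality is the right bar, consider what a tensor-parallel split does to a weight. The sketch below is illustrative only. `shard_and_gather` is a hypothetical helper that assumes an even split along one dimension, and it is not the Bridge's mapping code. Slicing and concatenation only move values, so the reassembled tensor is bit-identical even in bfloat16.

```python
import torch


def shard_and_gather(weight: torch.Tensor, tp_size: int, dim: int = 0) -> torch.Tensor:
    """Split a weight into TP shards and concatenate them back.

    Hypothetical helper: assumes an even split along `dim`. No arithmetic
    is applied to the values, only slicing, copying, and concatenation.
    """
    shards = torch.chunk(weight, tp_size, dim=dim)
    # Clone each shard to mimic handing it to a separate rank.
    shards = [s.clone() for s in shards]
    return torch.cat(shards, dim=dim)


torch.manual_seed(0)
w = torch.randn(8, 4, dtype=torch.bfloat16)
restored = shard_and_gather(w, tp_size=2)
max_diff = (w.float() - restored.float()).abs().max().item()
print(torch.equal(w, restored), max_diff)  # True 0.0
```

If a real round-trip reports a nonzero `max_diff`, some values were reordered or taken from the wrong source tensor. That points at the mapping, not at numerical noise.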

For programmatic verification inside scripts, use the built-in verifier:

```python
from megatron.bridge.models.conversion.utils import weights_verification_table

# `bridge`, `hf_pretrained`, and `megatron_model` are the objects your
# conversion script has already created; they are not defined here.
weights_verification_table(bridge, hf_pretrained, megatron_model)
```

Level 2: Forward-Pass Parity (GPU / bfloat16)

After round-trip passes, verify that the converted weights produce the same forward-pass output, within bfloat16 tolerance.

```bash
# Compare logits (loads both HF and Megatron models)
uv run python -m torch.distributed.run --nproc_per_node=2 \
    examples/conversion/compare_hf_and_megatron/compare.py \
    --hf_model_path <org>/<model> --tp 2 \
    --prompt "The capital of France is"
```

**Expected:** Cosine similarity > 99.99%, matching next-token predictions.
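
Here is a minimal sketch of the check behind these expectations. It assumes you already have the next-token logits from both models as tensors of shape `[vocab_size]`. The helper name `check_logit_parity` and the default threshold (taken from the bfloat16 row of the tolerance table) are illustrative assumptions. This is not the interface of `compare.py`.

```python
import torch
import torch.nn.functional as F


def check_logit_parity(hf_logits, mg_logits, min_cos_sim=0.9999):
    """Compare next-token logits from the HF and Megatron models.

    Returns (cos_sim, same_next_token, passed). Inputs are upcast to
    float32 so the metric itself is not computed in reduced precision.
    """
    a, b = hf_logits.float().flatten(), mg_logits.float().flatten()
    cos_sim = F.cosine_similarity(a, b, dim=0).item()
    same_next_token = a.argmax().item() == b.argmax().item()
    return cos_sim, same_next_token, (cos_sim > min_cos_sim and same_next_token)


# Simulated logits: the "Megatron" side is the HF side plus small noise, cast to bf16.
torch.manual_seed(0)
hf = torch.randn(32000)
hf[123] = 10.0  # plant a clear top token so the argmax is unambiguous
mg = (hf + 1e-3 * torch.randn(32000)).to(torch.bfloat16)
cos_sim, same_next_token, passed = check_logit_parity(hf, mg)
print(f"cosine_sim={cos_sim:.6f} same_next_token={same_next_token} passed={passed}")
```

Checking the argmax alongside cosine similarity matters. Two logit vectors can be nearly parallel overall and still disagree on the top token when the leading candidates are close.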

For large models where `compare.py` runs out of memory (it loads both models at once), check text generation instead:

```bash
uv run python -m torch.distributed.run --nproc_per_node=2 \
    examples/conversion/hf_to_megatron_generate_text.py \
    --hf_model_path <org>/<model> --tp 2 \
    --prompt "The capital of France is" --max_new_tokens 50
```

Level 3: Training Parity (optional)

Verify that a few training steps produce decreasing loss. This catches gradient-computation issues that forward-pass tests miss. Use a toy model with 2 layers and small dimensions. See the functional test pattern in the `add-model-support` skill (Milestone 3, Phase 6).
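
The core assertion of that test can be sketched in plain PyTorch. Run a handful of optimizer steps on a tiny 2-layer model with fixed inputs, then check that the loss went down. The model here is a generic `nn.Sequential` stand-in, not a Megatron model. `loss_decreases`, the step count, and the learning rate are arbitrary assumptions. The functional test in `add-model-support` remains the authoritative pattern.

```python
import torch
import torch.nn as nn


def loss_decreases(model, inputs, targets, steps=10, lr=1e-2):
    """Run a few Adam steps on fixed data; return (went_down, per-step losses)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    losses = []
    for _ in range(steps):
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())  # loss measured before this step's update
    # If gradients are missing or zero, the parameters never move and the loss stays flat.
    return losses[-1] < losses[0], losses


torch.manual_seed(0)
toy = nn.Sequential(nn.Linear(16, 32), nn.GELU(), nn.Linear(32, 16))  # two linear layers
x, y = torch.randn(64, 16), torch.randn(64, 16)
ok, losses = loss_decreases(toy, x, y)
print(ok, [round(v, 4) for v in losses])
```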

Tolerance Table

| Test Level | Dtype | Device | Max Diff | Cosine Sim |
|---|---|---|---|---|
| Round-trip | float32 | CPU | 0.0 (exact) | 1.0 (exact) |
| Forward pass | bfloat16 | GPU | < 1e-2 | > 0.9999 |
| Forward pass | float16 | GPU | < 1e-3 | > 0.99999 |
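
If you script these checks, you can encode the table as data so that each metric is judged against the right bar. This is a minimal sketch. The `TOLERANCES` dict and the `within_tolerance` helper are hypothetical names, and the thresholds are copied from the tolerance table.

```python
# (max_diff bound, cosine-similarity bound) per (test level, dtype),
# copied from the tolerance table. Keys and helper name are hypothetical.
TOLERANCES = {
    ("round_trip", "float32"): (0.0, 1.0),
    ("forward", "bfloat16"): (1e-2, 0.9999),
    ("forward", "float16"): (1e-3, 0.99999),
}


def within_tolerance(level, dtype, max_diff, cos_sim=None):
    """Return True if measured metrics meet the thresholds for (level, dtype)."""
    diff_bound, cos_bound = TOLERANCES[(level, dtype)]
    if level == "round_trip":
        # Exact match only: any nonzero difference is a mapping bug. Cosine
        # similarity is skipped because its own float math may not give exactly 1.0.
        return max_diff == diff_bound
    if cos_sim is None:
        raise ValueError("forward-pass checks need cos_sim")
    return max_diff < diff_bound and cos_sim > cos_bound


print(within_tolerance("forward", "bfloat16", max_diff=4e-3, cos_sim=0.99995))  # True
print(within_tolerance("forward", "float16", max_diff=4e-3, cos_sim=0.99995))   # False
```

The same measured difference can pass in bfloat16 and fail in float16. This is why the dtype is part of the lookup key.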

Comparison Utilities

These functions are useful when writing custom verification scripts or debugging failures. They are not part of the Bridge library — copy them into your script as needed.
```python
import torch


def compare_tensors(a, b, name=""):
    """Compare two tensors and report similarity metrics."""
    if a.shape != b.shape:
        raise ValueError(f"{name}: shape mismatch {tuple(a.shape)} vs {tuple(b.shape)}")
    # Upcast to float32 on one device so bf16/fp16 and mixed-device inputs compare cleanly.
    a, b = a.float(), b.float().to(a.device)
    diff = (a - b).abs()
    max_diff = diff.max().item()
    mean_diff = diff.mean().item()
    cos_sim = torch.nn.functional.cosine_similarity(a.flatten(), b.flatten(), dim=0).item()
    print(f"{name}: max_diff={max_diff:.6e}, mean_diff={mean_diff:.6e}, cosine_sim={cos_sim:.8f}")
    return max_diff, mean_diff, cos_sim


def compare_state_dicts(sd_a, sd_b, prefix=""):
    """Compare two state dicts key-by-key, reporting per-parameter differences."""
    keys_a, keys_b = set(sd_a.keys()), set(sd_b.keys())
    missing, extra = keys_a - keys_b, keys_b - keys_a
    if missing:
        print(f"{prefix}Missing keys: {sorted(missing)}")
    if extra:
        print(f"{prefix}Extra keys: {sorted(extra)}")
    max_diffs = {}
    for key in sorted(keys_a & keys_b):
        a, b = sd_a[key], sd_b[key]
        if a.shape != b.shape:
            # Broadcasting could hide a layout bug, so flag shape mismatches explicitly.
            max_diffs[key] = float("inf")
            print(f"{prefix}{key}: shape mismatch {tuple(a.shape)} vs {tuple(b.shape)}")
            continue
        diff = (a.float() - b.float().to(a.device)).abs().max().item()
        if diff > 0:
            max_diffs[key] = diff
            print(f"{prefix}{key}: max_diff={diff:.6e}")
    if not max_diffs and not missing and not extra:
        print(f"{prefix}All {len(keys_a & keys_b)} parameters match exactly.")
    return missing, extra, max_diffs
```

Debugging Workflow

When a parity test fails, follow this sequence:

1. Run the single-GPU round-trip. If it fails, the mapping itself is wrong; check `mapping_registry()` in the bridge file.
2. If single-GPU passes but multi-GPU fails, the TP/PP scatter/gather logic is wrong. Compare the TP=1 result against each TP shard. See the `nccl-contiguous-tensors` skill for NCCL-specific issues.
3. If the round-trip passes but the forward pass fails, the weights loaded correctly but the model architecture differs. Check the `provider_bridge()` config mapping (normalization, activation, RoPE, etc.).
4. Use the debugging script template from the `add-model-support` skill to compare runtime parameter names against safetensors key names and to inspect the bridge config mapping.

For the full catalog of pitfalls (QKV interleaving, MoE fused exports, tied embeddings, FP8 dequantization, TE LayerNorm aliases, etc.), see the Pitfalls section of the `add-model-support` skill.
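
For step 2 of the debugging sequence, you can compare a TP=1 weight against its shards as sketched below. The sketch assumes an even split and follows the usual Megatron convention: column-parallel weights (e.g. QKV, FC1) are partitioned along dim 0, and row-parallel weights (e.g. attention output projection, FC2) along dim 1. `check_tp_shards` is a hypothetical helper. Fused or interleaved layouts such as QKV need the bridge's own reordering applied before this comparison means anything.

```python
import torch


def check_tp_shards(full_weight, shards, partition_dim):
    """Check each TP shard against the matching slice of the TP=1 weight.

    Returns a list of (rank, max_diff). A nonzero max_diff, or inf for a
    shape mismatch, points at the scatter/gather logic for that rank.
    """
    expected = torch.chunk(full_weight, len(shards), dim=partition_dim)
    results = []
    for rank, (exp, shard) in enumerate(zip(expected, shards)):
        if exp.shape != shard.shape:
            results.append((rank, float("inf")))
            continue
        diff = (exp.float() - shard.float()).abs().max().item()
        results.append((rank, diff))
    return results


torch.manual_seed(0)
full = torch.randn(8, 6)
good = list(torch.chunk(full, 2, dim=0))  # correct column-parallel split
bad = [good[1], good[0]]                  # shards assigned to the wrong ranks
print(check_tp_shards(full, good, partition_dim=0))  # [(0, 0.0), (1, 0.0)]
print(check_tp_shards(full, bad, partition_dim=0))
```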

Code Anchors

| Component | Path |
|---|---|
| Single-GPU round-trip | `examples/conversion/hf_megatron_roundtrip.py` |
| Multi-GPU round-trip | `examples/conversion/hf_megatron_roundtrip_multi_gpu.py` |
| Forward-pass comparison | `examples/conversion/compare_hf_and_megatron/compare.py` |
| Text generation | `examples/conversion/hf_to_megatron_generate_text.py` |
| VLM generation | `examples/conversion/hf_to_megatron_generate_vlm.py` |
| Checkpoint CLI | `examples/conversion/convert_checkpoints.py` |
| Toy model creator | `examples/conversion/create_hf_toy_model.py` |
| Verification utility | `src/megatron/bridge/models/conversion/utils.py` |
| Adapter verification | `examples/conversion/adapter/verify_adapter.py` |