parity-testing

Parity Testing for Megatron Bridge

This skill provides the decision framework for choosing the right verification tool and interpreting results. For the full model onboarding workflow (which includes parity testing as milestones 1 and 2), see the `add-model-support` skill.

Quick Decision: Which Tool to Run

| What you want to verify | Tool | GPU? | When to use |
|---|---|---|---|
| All weights round-trip exactly (single GPU) | `hf_megatron_roundtrip.py` | No | First check after writing a bridge |
| Weights round-trip with TP/PP/EP | `hf_megatron_roundtrip_multi_gpu.py` | Yes | After single-GPU passes |
| Forward-pass logit equivalence | `compare_hf_and_megatron/compare.py` | Yes | After round-trip passes |
| Text generation sanity | `hf_to_megatron_generate_text.py` | Yes | Large models that OOM `compare.py` |
| Programmatic weight check | `weights_verification_table()` | Yes | Inside Python scripts |
| VLM generation sanity | `hf_to_megatron_generate_vlm.py` | Yes | VLM models |

All tools live under `examples/conversion/`.

3-Level Test Strategy

Level 1: State Dict Round-Trip (exact match)

The fastest and most fundamental check. If mappings can't perfectly round-trip weights, nothing else will work.

```bash
# Single-GPU round-trip
uv run python examples/conversion/hf_megatron_roundtrip.py \
    --hf-model-id <org>/<model>

# Multi-GPU with TP=2
uv run python -m torch.distributed.run --nproc_per_node=2 \
    examples/conversion/hf_megatron_roundtrip_multi_gpu.py \
    --hf-model-id <org>/<model> --tp 2

# Multi-GPU with PP=2
uv run python -m torch.distributed.run --nproc_per_node=2 \
    examples/conversion/hf_megatron_roundtrip_multi_gpu.py \
    --hf-model-id <org>/<model> --pp 2
```

**Expected:** Every weight shows "Matches Original: checkmark". Any "X"
means the parameter mapping has an error.

**Tolerance:** Exact match (`max_diff == 0.0`). Round-trip conversions are
pure tensor reshaping with no floating-point arithmetic, so any nonzero
difference points to a mapping bug.
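
To illustrate why exact equality is the right bar, here is a minimal dependency-free sketch. It fuses per-head Q, K, V rows into one interleaved list and splits them back. The layout, the helper names `fuse_qkv` / `split_qkv`, and the toy values are hypothetical stand-ins for illustration, not Bridge's actual mapping code, and the sketch assumes Q, K, and V have the same number of rows. Because values are only moved and never recomputed, the round trip should be bit-exact.

```python
def fuse_qkv(q, k, v, num_heads):
    """Interleave per-head rows as [q_h0, k_h0, v_h0, q_h1, k_h1, v_h1, ...]."""
    rows_per_head = len(q) // num_heads
    fused = []
    for h in range(num_heads):
        start, end = h * rows_per_head, (h + 1) * rows_per_head
        fused.extend(q[start:end])
        fused.extend(k[start:end])
        fused.extend(v[start:end])
    return fused


def split_qkv(fused, num_heads):
    """Inverse of fuse_qkv: recover separate q, k, v row lists."""
    rows_per_head = len(fused) // (3 * num_heads)
    q, k, v = [], [], []
    for h in range(num_heads):
        base = h * 3 * rows_per_head
        q.extend(fused[base:base + rows_per_head])
        k.extend(fused[base + rows_per_head:base + 2 * rows_per_head])
        v.extend(fused[base + 2 * rows_per_head:base + 3 * rows_per_head])
    return q, k, v


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two lists of rows."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Hypothetical toy weights: 4 rows (2 heads x 2 rows per head), 3 columns each.
q = [[0.1 * (r + c) for c in range(3)] for r in range(4)]
k = [[0.2 * (r - c) for c in range(3)] for r in range(4)]
v = [[0.3 * (r * c) for c in range(3)] for r in range(4)]

q2, k2, v2 = split_qkv(fuse_qkv(q, k, v, num_heads=2), num_heads=2)
roundtrip_max_diff = max(max_abs_diff(q, q2), max_abs_diff(k, k2), max_abs_diff(v, v2))
print(f"max_diff={roundtrip_max_diff}")  # prints max_diff=0.0
```

If a mapping like this ever reports a nonzero difference, the fuse and split steps disagree about the layout, which is the kind of error the round-trip tools are meant to surface.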

For programmatic verification inside scripts, use the built-in verifier:

```python
from megatron.bridge.models.conversion.utils import weights_verification_table

weights_verification_table(bridge, hf_pretrained, megatron_model)
```

Level 2: Forward-Pass Parity (GPU / bfloat16)

After the round-trip passes, verify that the converted weights produce equivalent forward-pass output, within the tolerances listed in the Tolerance Table.

```bash
# Compare logits (loads both HF and Megatron models)
uv run python -m torch.distributed.run --nproc_per_node=2 \
    examples/conversion/compare_hf_and_megatron/compare.py \
    --hf_model_path <org>/<model> --tp 2 \
    --prompt "The capital of France is"
```


**Expected:** Cosine similarity > 99.99%, matching next-token predictions.
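
The pass criterion above can be sketched in plain Python as follows. This is an illustrative stand-in, not what `compare.py` does internally: the function names and the toy logit values are assumptions, and a real script would operate on `torch` tensors rather than lists.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def argmax(values):
    """Index of the largest value, i.e. the predicted next-token id."""
    return max(range(len(values)), key=values.__getitem__)


def logits_match(hf_logits, megatron_logits, min_cos_sim=0.9999):
    """Pass only if cosine similarity exceeds the threshold AND the top token agrees."""
    cos_sim = cosine_similarity(hf_logits, megatron_logits)
    same_token = argmax(hf_logits) == argmax(megatron_logits)
    return (cos_sim > min_cos_sim and same_token), cos_sim, same_token


# Hypothetical last-position logits over a 4-token vocabulary.
hf_logits = [2.0, -1.0, 0.5, 7.25]
close_logits = [2.001, -1.002, 0.499, 7.248]  # small low-precision noise
swapped_logits = [7.25, -1.0, 0.5, 2.0]       # e.g. two vocabulary rows swapped

ok_close, cos_close, _ = logits_match(hf_logits, close_logits)
ok_swapped, cos_swapped, _ = logits_match(hf_logits, swapped_logits)
print(ok_close, ok_swapped)  # prints True False
```

Checking the predicted token alongside cosine similarity guards against the case where the logit vectors look similar overall but the top prediction differs.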

For large models that OOM `compare.py` (which loads both models), use text
generation instead:

```bash
uv run python -m torch.distributed.run --nproc_per_node=2 \
    examples/conversion/hf_to_megatron_generate_text.py \
    --hf_model_path <org>/<model> --tp 2 \
    --prompt "The capital of France is" --max_new_tokens 50
```

Level 3: Training Parity (optional)

Verify that a few training steps produce decreasing loss. This catches gradient computation issues that forward-pass tests miss. Use a toy model with 2 layers and small dimensions. See the functional test pattern in the `add-model-support` skill (Milestone 3, Phase 6).
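
One way to turn "decreasing loss" into an assertion is sketched below. The helper name `loss_is_decreasing`, the averaging window, and the loss values are all hypothetical; the functional test pattern in `add-model-support` may use a different criterion.

```python
import math


def loss_is_decreasing(losses, window=2):
    """Return True if the mean of the last `window` losses is below the mean of the first `window`.

    Averaging over a small window makes the check less sensitive to
    step-to-step noise than comparing only the first and last values.
    """
    if len(losses) < 2 * window:
        raise ValueError("need at least 2 * window loss values")
    # A NaN or inf anywhere in the run is treated as a failure.
    if not all(math.isfinite(x) for x in losses):
        return False
    head = sum(losses[:window]) / window
    tail = sum(losses[-window:]) / window
    return tail < head


# Hypothetical per-step losses from a 2-layer toy model.
healthy = [10.8, 10.5, 9.9, 9.4, 9.1]
diverging = [10.8, 10.9, 11.2, float("nan"), 11.5]
print(loss_is_decreasing(healthy), loss_is_decreasing(diverging))  # prints True False
```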

Tolerance Table

| Test Level | Dtype | Device | Max Diff | Cosine Sim |
|---|---|---|---|---|
| Round-trip | float32 | CPU | 0.0 (exact) | 1.0 (exact) |
| Forward pass | bfloat16 | GPU | < 1e-2 | > 0.9999 |
| Forward pass | float16 | GPU | < 1e-3 | > 0.99999 |
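
The table can be encoded as a lookup so that custom scripts apply the same thresholds consistently. This is a sketch only: `TOLERANCES` and `within_tolerance` are hypothetical names, not part of the Bridge library. For the exact-match row, only `max_diff` is checked, since a cosine similarity computed in floating point can land slightly off 1.0 even for identical tensors.

```python
# (test level, dtype) -> thresholds copied from the tolerance table.
TOLERANCES = {
    ("round_trip", "float32"): {"max_diff": 0.0, "cos_sim": 1.0, "exact": True},
    ("forward", "bfloat16"): {"max_diff": 1e-2, "cos_sim": 0.9999, "exact": False},
    ("forward", "float16"): {"max_diff": 1e-3, "cos_sim": 0.99999, "exact": False},
}


def within_tolerance(level, dtype, max_diff, cos_sim):
    """Return True if the measured metrics satisfy the table row for (level, dtype)."""
    tol = TOLERANCES[(level, dtype)]
    if tol["exact"]:
        # Exact rows: any nonzero difference is a failure.
        return max_diff == 0.0
    return max_diff < tol["max_diff"] and cos_sim > tol["cos_sim"]


print(within_tolerance("round_trip", "float32", 0.0, 1.0))       # prints True
print(within_tolerance("forward", "bfloat16", 5e-3, 0.99995))    # prints True
print(within_tolerance("forward", "float16", 5e-3, 0.999995))    # prints False
```

The metrics returned by a helper such as `compare_tensors` can be passed straight into `within_tolerance`.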

Comparison Utilities

These functions are useful when writing custom verification scripts or debugging failures. They are not part of the Bridge library — copy them into your script as needed.

```python
import torch


def compare_tensors(a, b, name=""):
    """Compare two tensors and report similarity metrics."""
    # Upcast so bfloat16/float16 inputs are compared in float32.
    a, b = a.float(), b.float()
    max_diff = (a - b).abs().max().item()
    mean_diff = (a - b).abs().mean().item()
    cos_sim = torch.nn.functional.cosine_similarity(
        a.flatten(), b.flatten(), dim=0,
    ).item()
    print(f"{name}: max_diff={max_diff:.6e}, mean_diff={mean_diff:.6e}, cosine_sim={cos_sim:.8f}")
    return max_diff, mean_diff, cos_sim


def compare_state_dicts(sd_a, sd_b, prefix=""):
    """Compare two state dicts key-by-key, reporting per-parameter differences."""
    keys_a, keys_b = set(sd_a.keys()), set(sd_b.keys())
    # "missing" = only in sd_a, "extra" = only in sd_b.
    missing, extra = keys_a - keys_b, keys_b - keys_a
    if missing:
        print(f"{prefix}Missing keys: {sorted(missing)}")
    if extra:
        print(f"{prefix}Extra keys: {sorted(extra)}")
    max_diffs = {}
    for key in sorted(keys_a & keys_b):
        # Report shape mismatches instead of raising or silently broadcasting.
        if sd_a[key].shape != sd_b[key].shape:
            max_diffs[key] = float("inf")
            print(f"{prefix}{key}: shape mismatch {tuple(sd_a[key].shape)} vs {tuple(sd_b[key].shape)}")
            continue
        diff = (sd_a[key].float() - sd_b[key].float()).abs().max().item()
        if diff > 0:
            max_diffs[key] = diff
            print(f"{prefix}{key}: max_diff={diff:.6e}")
    if not max_diffs and not missing and not extra:
        print(f"{prefix}All {len(keys_a & keys_b)} parameters match exactly.")
    return missing, extra, max_diffs
```

Debugging Workflow

When a parity test fails, follow this sequence:

1. Run the single-GPU round-trip. If this fails, the mapping itself is wrong. Check the `mapping_registry()` in the bridge file.
2. If single-GPU passes but multi-GPU fails, the TP/PP scatter/gather is wrong. Compare the TP=1 result against each TP shard. See the `nccl-contiguous-tensors` skill for NCCL-specific issues.
3. If round-trip passes but forward pass fails, the weights loaded correctly but the model architecture differs. Check the `provider_bridge()` config mapping (normalization, activation, RoPE, etc.).
4. Use the debugging script template from the `add-model-support` skill to inspect runtime vs safetensors key naming and bridge config mapping.
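
The sequence above can be condensed into a small triage helper for use in a test harness. The function name `next_debug_step` and its boolean inputs are hypothetical; the returned strings paraphrase the steps listed here.

```python
def next_debug_step(single_gpu_roundtrip_ok, multi_gpu_roundtrip_ok, forward_pass_ok):
    """Map parity-test outcomes to the first place worth investigating."""
    # Checks are ordered from most fundamental to least, so an early
    # failure masks anything that comes after it.
    if not single_gpu_roundtrip_ok:
        return "Mapping is wrong: check mapping_registry() in the bridge file."
    if not multi_gpu_roundtrip_ok:
        return "TP/PP scatter/gather is wrong: compare the TP=1 result against each TP shard."
    if not forward_pass_ok:
        return "Architecture differs: check the provider_bridge() config mapping."
    return "All parity checks pass."


print(next_debug_step(True, False, False))
# prints: TP/PP scatter/gather is wrong: compare the TP=1 result against each TP shard.
```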

For the full catalog of pitfalls (QKV interleaving, MoE fused exports, tied embeddings, FP8 dequantization, TE LayerNorm aliases, etc.), see the Pitfalls section of the `add-model-support` skill.
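
For the multi-GPU failure case in the debugging sequence, comparing the TP=1 weight against each shard might look like the sketch below. It assumes the weight is split evenly along its first dimension across ranks, which is an assumption for illustration and does not hold for every layer type. `find_mismatched_shards` is a hypothetical helper, and nested lists stand in for tensors.

```python
def find_mismatched_shards(full_rows, shards):
    """Split `full_rows` evenly along dim 0 and compare each chunk with the matching rank's shard.

    Returns the list of TP ranks whose shard differs from the expected chunk.
    """
    tp_size = len(shards)
    if len(full_rows) % tp_size != 0:
        raise ValueError("row count must be divisible by the TP size")
    chunk = len(full_rows) // tp_size
    bad_ranks = []
    for rank, shard in enumerate(shards):
        expected = full_rows[rank * chunk:(rank + 1) * chunk]
        if shard != expected:
            bad_ranks.append(rank)
    return bad_ranks


# Hypothetical TP=1 weight with 4 rows, sharded across 2 ranks.
full = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
good_shards = [full[0:2], full[2:4]]
swapped_shards = [full[2:4], full[0:2]]  # ranks received each other's rows

print(find_mismatched_shards(full, good_shards))     # prints []
print(find_mismatched_shards(full, swapped_shards))  # prints [0, 1]
```

Knowing which ranks disagree narrows the search to the scatter or gather logic for that parameter.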

Code Anchors

| Component | Path |
|---|---|
| Single-GPU round-trip | `examples/conversion/hf_megatron_roundtrip.py` |
| Multi-GPU round-trip | `examples/conversion/hf_megatron_roundtrip_multi_gpu.py` |
| Forward-pass comparison | `examples/conversion/compare_hf_and_megatron/compare.py` |
| Text generation | `examples/conversion/hf_to_megatron_generate_text.py` |
| VLM generation | `examples/conversion/hf_to_megatron_generate_vlm.py` |
| Checkpoint CLI | `examples/conversion/convert_checkpoints.py` |
| Toy model creator | `examples/conversion/create_hf_toy_model.py` |
| Verification utility | `src/megatron/bridge/models/conversion/utils.py` |
| Adapter verification | `examples/conversion/adapter/verify_adapter.py` |
