parity-testing
Parity Testing for Megatron Bridge
This skill provides the decision framework for choosing the right
verification tool and interpreting results. For the full model onboarding
workflow (which includes parity testing as milestones 1 and 2), see the
`add-model-support` skill.

Quick Decision: Which Tool to Run
| What you want to verify | Tool | GPU? | When to use |
|---|---|---|---|
| All weights round-trip exactly (single GPU) | `hf_megatron_roundtrip.py` | No | First check after writing a bridge |
| Weights round-trip with TP/PP/EP | `hf_megatron_roundtrip_multi_gpu.py` | Yes | After single-GPU passes |
| Forward-pass logit equivalence | `compare_hf_and_megatron/compare.py` | Yes | After round-trip passes |
| Text generation sanity | `hf_to_megatron_generate_text.py` | Yes | Large models that OOM `compare.py` |
| Programmatic weight check | `weights_verification_table()` | Yes | Inside Python scripts |
| VLM generation sanity | | Yes | VLM models |
All tools live under `examples/conversion/`.
3-Level Test Strategy
Level 1: State Dict Round-Trip (exact match)
The fastest and most fundamental check. If mappings can't perfectly
round-trip weights, nothing else will work.
```bash
# Single-GPU round-trip
uv run python examples/conversion/hf_megatron_roundtrip.py \
    --hf-model-id <org>/<model>

# Multi-GPU with TP=2
uv run python -m torch.distributed.run --nproc_per_node=2 \
    examples/conversion/hf_megatron_roundtrip_multi_gpu.py \
    --hf-model-id <org>/<model> --tp 2

# Multi-GPU with PP=2
uv run python -m torch.distributed.run --nproc_per_node=2 \
    examples/conversion/hf_megatron_roundtrip_multi_gpu.py \
    --hf-model-id <org>/<model> --pp 2
```
**Expected:** Every weight shows "Matches Original: checkmark". Any "X"
means the param mapping has an error.
**Tolerance:** Exact match (`max_diff == 0.0`). Round-trip conversions are
pure tensor reshaping — no floating-point arithmetic is involved.
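To see why exact equality is the right bar, here is a minimal, dependency-free sketch. Plain Python lists stand in for tensors, and `split_rows`, `merge_rows`, and `max_abs_diff` are hypothetical helpers, not Bridge APIs. It splits a weight into tensor-parallel row shards and concatenates them back; because slicing and concatenation only copy values, the round-tripped weight equals the original element for element.

```python
def split_rows(weight, tp):
    """Split a 2-D weight (list of rows) into `tp` contiguous row shards."""
    if len(weight) % tp != 0:
        raise ValueError("row count must be divisible by tp")
    step = len(weight) // tp
    return [weight[i * step:(i + 1) * step] for i in range(tp)]


def merge_rows(shards):
    """Concatenate row shards back into one weight, in rank order."""
    return [row for shard in shards for row in shard]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two same-shaped 2-D lists."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Values that are not exactly representable in floating point, on purpose.
weight = [[0.1 * (r + 1) + 0.001 * c for c in range(3)] for r in range(4)]
round_tripped = merge_rows(split_rows(weight, tp=2))
print(max_abs_diff(weight, round_tripped))
```

Any nonzero difference in a check like this points at a wrong slice boundary or shard order, not at numerical noise.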
For programmatic verification inside scripts, use the built-in verifier:
```python
from megatron.bridge.models.conversion.utils import weights_verification_table

weights_verification_table(bridge, hf_pretrained, megatron_model)
```

Level 2: Forward-Pass Parity (GPU / bfloat16)
After the round-trip passes, verify that the converted weights produce
equivalent forward-pass output.
```bash
# Compare logits (loads both HF and Megatron models)
uv run python -m torch.distributed.run --nproc_per_node=2 \
    examples/conversion/compare_hf_and_megatron/compare.py \
    --hf_model_path <org>/<model> --tp 2 \
    --prompt "The capital of France is"
```
**Expected:** Cosine similarity > 99.99%, matching next-token predictions.
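The two pass criteria can be written down explicitly. The following is a minimal sketch using plain Python lists in place of logit tensors; `cosine_similarity`, `argmax`, and `forward_pass_matches` are hypothetical names, the logit values are made up for illustration, and this is not how `compare.py` itself is implemented.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length logit vectors."""
    if len(a) != len(b):
        raise ValueError("logit vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def argmax(values):
    """Index of the largest value, i.e. the predicted next token id."""
    return max(range(len(values)), key=values.__getitem__)


def forward_pass_matches(hf_logits, megatron_logits, min_cos=0.9999):
    """Require both a high cosine similarity and the same next-token prediction."""
    cos = cosine_similarity(hf_logits, megatron_logits)
    return cos > min_cos and argmax(hf_logits) == argmax(megatron_logits)


# Hypothetical last-position logits over a 4-token vocabulary.
hf = [2.0, -1.0, 0.5, 7.25]
close = [2.01, -0.99, 0.49, 7.24]   # small low-precision noise
swapped = [7.25, -1.0, 0.5, 2.0]    # top token differs

print(forward_pass_matches(hf, close))
print(forward_pass_matches(hf, swapped))
```

Checking the argmax in addition to the cosine similarity matters because two logit vectors can be nearly parallel and still disagree on the top token when the leading candidates are close.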
For large models that OOM `compare.py` (which loads both models), use text
generation instead:
```bash
uv run python -m torch.distributed.run --nproc_per_node=2 \
    examples/conversion/hf_to_megatron_generate_text.py \
    --hf_model_path <org>/<model> --tp 2 \
    --prompt "The capital of France is" --max_new_tokens 50
```

Level 3: Training Parity (optional)
Verify that a few training steps produce decreasing loss. This catches
gradient computation issues that forward-pass tests miss. Use a toy model
with 2 layers and small dimensions. See the functional test pattern in the
`add-model-support` skill (Milestone 3, Phase 6).
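A "decreasing loss" check can be made explicit. The sketch below is one assumed way to phrase it, not the functional test from the `add-model-support` skill: `loss_is_decreasing` is a hypothetical helper and the loss values are invented. It requires every logged loss to be finite and the final loss to be lower than the first.

```python
import math


def loss_is_decreasing(losses, min_drop=0.0):
    """True if all losses are finite and the last is below the first by more than min_drop."""
    if len(losses) < 2:
        raise ValueError("need at least two training steps")
    if not all(math.isfinite(x) for x in losses):
        return False  # NaN or inf usually indicates a broken backward pass
    return (losses[0] - losses[-1]) > min_drop


healthy = [10.82, 10.41, 10.47, 9.95, 9.63]   # noisy but trending down
stuck = [10.82, 10.82, 10.82, 10.82, 10.82]   # weights are not being updated
diverged = [10.82, 11.90, float("nan")]

print(loss_is_decreasing(healthy), loss_is_decreasing(stuck), loss_is_decreasing(diverged))
```

Comparing the endpoints instead of demanding a strictly monotonic decrease is a deliberate choice here, since individual steps on a toy model can be noisy.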
Tolerance Table
| Test Level | Dtype | Device | Max Diff | Cosine Sim |
|---|---|---|---|---|
| Round-trip | float32 | CPU | 0.0 (exact) | 1.0 (exact) |
| Forward pass | bfloat16 | GPU | < 1e-2 | > 0.9999 |
| Forward pass | float16 | GPU | < 1e-3 | > 0.99999 |
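For custom scripts, the table can be encoded as data and applied mechanically. This is a minimal sketch; `TOLERANCES` and `within_tolerance` are hypothetical names, and the thresholds are copied from the table above.

```python
# (test level, dtype) -> (max allowed abs diff, min cosine similarity)
TOLERANCES = {
    ("round-trip", "float32"): (0.0, 1.0),
    ("forward", "bfloat16"): (1e-2, 0.9999),
    ("forward", "float16"): (1e-3, 0.99999),
}


def within_tolerance(level, dtype, max_diff, cos_sim=None):
    """Check measured metrics against the tolerance for this level and dtype."""
    max_allowed, min_cos = TOLERANCES[(level, dtype)]
    if level == "round-trip":
        # Exact match: a zero max diff already implies identical tensors, and a
        # computed cosine similarity can round slightly away from 1.0.
        return max_diff == 0.0
    return max_diff < max_allowed and cos_sim > min_cos


print(within_tolerance("round-trip", "float32", 0.0))
print(within_tolerance("forward", "bfloat16", 5e-3, 0.99995))
print(within_tolerance("forward", "float16", 5e-3, 0.99995))
```

The same measured difference can pass in bfloat16 and fail in float16, which is why the dtype is part of the lookup key.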
Comparison Utilities
These functions are useful when writing custom verification scripts or
debugging failures. They are not part of the Bridge library — copy them
into your script as needed.
```python
import torch


def compare_tensors(a, b, name=""):
    """Compare two tensors and report similarity metrics."""
    max_diff = (a - b).abs().max().item()
    mean_diff = (a - b).abs().mean().item()
    cos_sim = torch.nn.functional.cosine_similarity(
        a.flatten().float(), b.flatten().float(), dim=0,
    ).item()
    print(f"{name}: max_diff={max_diff:.6e}, mean_diff={mean_diff:.6e}, cosine_sim={cos_sim:.8f}")
    return max_diff, mean_diff, cos_sim


def compare_state_dicts(sd_a, sd_b, prefix=""):
    """Compare two state dicts key-by-key, reporting per-parameter differences."""
    keys_a, keys_b = set(sd_a.keys()), set(sd_b.keys())
    missing, extra = keys_a - keys_b, keys_b - keys_a
    if missing:
        print(f"{prefix}Missing keys: {sorted(missing)}")
    if extra:
        print(f"{prefix}Extra keys: {sorted(extra)}")

    max_diffs = {}
    for key in sorted(keys_a & keys_b):
        diff = (sd_a[key].float() - sd_b[key].float()).abs().max().item()
        if diff > 0:
            max_diffs[key] = diff
            print(f"{prefix}{key}: max_diff={diff:.6e}")

    if not max_diffs and not missing and not extra:
        print(f"{prefix}All {len(keys_a & keys_b)} parameters match exactly.")
    return missing, extra, max_diffs
```

Debugging Workflow
When a parity test fails, follow this sequence:
- Run the single-GPU round-trip. If this fails, the mapping itself is wrong. Check `mapping_registry()` in the bridge file.
- If single-GPU passes but multi-GPU fails, the TP/PP scatter/gather is wrong. Compare the TP=1 result against each TP shard. See the `nccl-contiguous-tensors` skill for NCCL-specific issues.
- If the round-trip passes but the forward pass fails, the weights loaded correctly but the model architecture differs. Check the `provider_bridge()` config mapping (normalization, activation, RoPE, etc.).
- Use the debugging script template from the `add-model-support` skill to inspect runtime vs safetensors key naming and bridge config mapping.
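The sequence above amounts to a short decision procedure. A minimal sketch follows, with a hypothetical `next_debug_step` helper that maps the pass/fail result of each test level to the first thing to inspect; the returned strings paraphrase the steps above.

```python
def next_debug_step(single_gpu_roundtrip_ok, multi_gpu_roundtrip_ok, forward_pass_ok):
    """Return the first place to look, checking the cheapest test level first."""
    if not single_gpu_roundtrip_ok:
        return "Check mapping_registry() in the bridge file"
    if not multi_gpu_roundtrip_ok:
        return "Check TP/PP scatter/gather: compare TP=1 against each TP shard"
    if not forward_pass_ok:
        return "Check provider_bridge() config mapping (normalization, activation, RoPE)"
    return "All parity checks pass"


# A forward-pass failure is only meaningful once both round-trips pass,
# so an earlier failure always takes priority.
print(next_debug_step(False, False, False))
print(next_debug_step(True, True, False))
```

Ordering the checks this way keeps attention on the earliest failing level, since a broken mapping will also break everything downstream of it.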
For the full catalog of pitfalls (QKV interleaving, MoE fused exports, tied
embeddings, FP8 dequantization, TE LayerNorm aliases, etc.), see the
Pitfalls section of the `add-model-support` skill.
Code Anchors
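| Component | Path |
|---|---|
| Single-GPU round-trip | `examples/conversion/hf_megatron_roundtrip.py` |
| Multi-GPU round-trip | `examples/conversion/hf_megatron_roundtrip_multi_gpu.py` |
| Forward-pass comparison | `examples/conversion/compare_hf_and_megatron/compare.py` |
| Text generation | `examples/conversion/hf_to_megatron_generate_text.py` |
| VLM generation | |
| Checkpoint CLI | |
| Toy model creator | |
| Verification utility | |
| Adapter verification | |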