nemo-rl-e2e-testing

NeMo-RL E2E Testing

Validate a Megatron-Bridge model or training API change through NeMo-RL's Megatron backend. This catches integration issues that Bridge-only tests miss: NeMo-RL-owned rollout scheduling, reward handling, policy/reference setup, HF import/export through Bridge, optimizer setup, checkpoint ownership, and policy-to-generation weight transfer.

Use this as an external compatibility smoke test after the focused Bridge tests for the model/provider change pass.

This is not a replacement for Bridge model parity tests. A NeMo-RL GRPO or SFT run proves that Bridge can survive an external RL training loop; architecture correctness still comes from Bridge import/export, logits, roundtrip, and model-specific inference tests.

Scope

Think in coverage levels. Start with Level 0 and add only the levels justified by the change.

| Level | Required when | What it proves |
|---|---|---|
| 0: Megatron policy GRPO smoke | Any new provider or provider config change that claims NeMo-RL compatibility | NeMo-RL can import the local Bridge provider, build a Megatron policy, initialize optimizer/scheduler state, run rollout/ref/logprob wiring, and finish a short GRPO job |
| 1: LoRA/checkpoint variant | Checkpointing, HF export, optimizer state, resume behavior, or a NeMo-RL-supported PEFT path changed | NeMo-RL can save through its checkpoint schedule, resume without losing training state, and, when PEFT is enabled in that NeMo-RL checkout, apply Bridge LoRA hooks |
| 2: Non-colocated vLLM refit | HF export, weight mapping, policy-to-generation refit, delta compression, packed transfer, or vLLM update behavior changed | Bridge-exported weights can be transferred from the Megatron policy worker into separate vLLM generation workers |
| 3: Optional Megatron generation backend | Only when the NeMo-RL checkout still supports `policy.generation.backend=megatron` and the change explicitly targets that path | NeMo-RL can use Megatron for both policy and generation rather than only vLLM generation |
| 4: Parallelism stress | TP/PP/CP/EP, sequence parallel, MoE dispatch, pipeline stage layout, or distributed optimizer behavior changed | Provider settings remain correct under non-trivial Megatron parallel state |
| 5: Architecture-specific e2e | VLM, audio, MoE, MTP/draft models, FP8/QAT/ModelOpt, quantized weights, or custom layers are involved | The architecture-specific runtime path is exercised, not just a text-only dense GRPO smoke |
| 6: Learning signal | Optimizer, scheduler, loss, reward, PEFT trainability, gradient flow, or training stability changed | Metrics move in the expected direction over a short run and do not silently produce zero/NaN/unstable updates |

The default Level 0 target is NeMo-RL's maintained Megatron GRPO functional:

bash

uv run bash tests/functional/grpo_megatron.sh

This is intentionally small. It exercises NeMo-RL's external RL loop without making Megatron-Bridge own rollout scheduling, rewards, checkpoint cadence, or trainer state.

Level 0 is not a convergence test. It only proves the job can complete a small number of updates. Use Level 6 when the question is whether the model actually learns under NeMo-RL.
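The level table can be read as a lookup from "what changed" to "which extra runs to add". The Python sketch below encodes that reading. The area tags (`hf_export`, `refit`, and so on) and the `select_levels` helper are hypothetical names chosen for illustration; they are not part of Bridge or NeMo-RL. Level 3 still requires that the checkout supports the Megatron generation backend, which this lookup cannot know.

```python
# Hypothetical area tags, one set per optional level from the table above.
LEVEL_TRIGGERS = {
    1: {"checkpoint", "hf_export", "optimizer_state", "resume", "peft"},
    2: {"hf_export", "weight_mapping", "refit", "delta_compression",
        "packed_transfer", "vllm_update"},
    3: {"megatron_generation"},
    4: {"parallelism", "sequence_parallel", "moe_dispatch",
        "pipeline_layout", "distributed_optimizer"},
    5: {"vlm", "audio", "moe", "mtp", "quantization", "custom_layers"},
    6: {"optimizer", "scheduler", "loss", "reward", "peft",
        "gradient_flow", "training_stability"},
}


def select_levels(changed_areas):
    """Return Level 0 plus every level whose triggers overlap the change."""
    areas = set(changed_areas)
    extra = [level for level, triggers in sorted(LEVEL_TRIGGERS.items())
             if areas & triggers]
    return [0] + extra


print(select_levels(["hf_export"]))             # [0, 1, 2]
print(select_levels(["moe", "moe_dispatch"]))   # [0, 4, 5]
```

An empty change list still returns `[0]`, matching the rule that Level 0 is always the starting point.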

Repos

Use explicit repo variables. Do not rely on an installed `megatron-bridge` wheel; the purpose is to test the current Bridge checkout.

Use the upstream NeMo-RL repository as the default source:

text

https://github.com/NVIDIA-NeMo/RL

If a checkout is not already available, clone it next to the Bridge checkout or into the site's standard workspace:

bash

git clone https://github.com/NVIDIA-NeMo/RL.git /path/to/nemo-rl

bash

export BRIDGE_REPO=${BRIDGE_REPO:-/path/to/Megatron-Bridge}
export NEMO_RL_REPO=${NEMO_RL_REPO:-/path/to/nemo-rl}
export PYTHONPATH="${BRIDGE_REPO}/src:${BRIDGE_REPO}/3rdparty/Megatron-LM:${NEMO_RL_REPO}:${PYTHONPATH:-}"

NeMo-RL checkouts often also contain a vendored Bridge tree under:

text

3rdparty/Megatron-Bridge-workspace/Megatron-Bridge

When testing a local Bridge change, either put the local Bridge checkout ahead of everything else in `PYTHONPATH`, or sync the exact local Bridge changes into that vendored checkout. Do not assume the vendored tree matches the Bridge PR under test.

Before running, record both states:

bash

git -C "$BRIDGE_REPO" status --short
git -C "$NEMO_RL_REPO" status --short
git -C "$BRIDGE_REPO" rev-parse --short HEAD
git -C "$NEMO_RL_REPO" rev-parse --short HEAD

If testing on a remote GPU machine, sync the exact local changes first. Do not reset or overwrite unrelated changes in either tree.

Verify that Python imports the checkout under test:

bash

python - <<'PY'
import megatron.bridge
print(megatron.bridge.__file__)
PY

The printed path must live under `$BRIDGE_REPO/src`, or under the NeMo-RL vendored Bridge checkout only if that vendored checkout was intentionally synced to the Bridge change. If it points at site-packages or an unexpected 3rdparty path, fix `PYTHONPATH` before trusting any result.
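The "must live under" check can be automated so that it fails loudly instead of relying on eyeballing a printed path. The sketch below is one way to do it in Python; `is_under` is a hypothetical helper, and the example paths are made up.

```python
from pathlib import Path


def is_under(module_file, root):
    """Return True when module_file resolves to a location inside root."""
    resolved_file = Path(module_file).resolve()
    resolved_root = Path(root).resolve()
    try:
        resolved_file.relative_to(resolved_root)
    except ValueError:
        return False
    return True


# Made-up paths for illustration. In a real check, pass
# megatron.bridge.__file__ and os.environ["BRIDGE_REPO"] + "/src".
print(is_under("/work/Megatron-Bridge/src/megatron/bridge/__init__.py",
               "/work/Megatron-Bridge/src"))
print(is_under("/usr/lib/python3/site-packages/megatron/bridge/__init__.py",
               "/work/Megatron-Bridge/src"))
```

Resolving both sides first means a symlinked checkout is compared by its real location, which is usually what matters when deciding which tree was imported.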

Bridge Checks First

Run focused Bridge tests before the external NeMo-RL e2e. Include any model-specific tests added by the change.

bash

cd "$BRIDGE_REPO"
uv run python -m pytest -q \
  tests/unit_tests/models/test_model_provider_mixin.py \
  tests/unit_tests/models/test_param_mapping.py \
  tests/unit_tests/training/test_integration.py \
  <model-specific-test-paths>

For a new model family, also run the relevant conversion or roundtrip test from the model's PR. See @skills/adding-model-support/tests-and-examples.md for model-test patterns.

Minimum Bridge-side evidence for a new model/provider:

provider/config unit tests
parameter mapping tests
HF to Megatron import or roundtrip on a small model
model-specific generation or logits comparison when available
this NeMo-RL external-loop smoke after the above pass

NeMo-RL Unit Checks

Run the NeMo-RL unit checks that match the surface being exercised. Keep them focused; the e2e is the expensive signal.

bash

cd "$NEMO_RL_REPO"
uv run pytest -q \
  tests/unit/models/megatron/test_megatron_setup.py \
  tests/unit/models/policy/test_megatron_worker.py \
  tests/unit/utils/test_weight_transfer.py

For checkpoint changes, add:

bash

uv run pytest -q \
  tests/unit/utils/test_checkpoint.py \
  tests/unit/utils/test_native_checkpoint.py

For vLLM refit or generation-worker changes, add the relevant vLLM unit tests:

bash

uv run pytest -q \
  tests/unit/models/generation/test_vllm_generation.py \
  tests/unit/models/generation/test_vllm_utils.py

Model Choice

Prefer the smallest public HF checkpoint that uses the changed provider family. The maintained Megatron GRPO functional uses `Qwen/Qwen2.5-0.5B` because it is small enough for a 2-GPU smoke and is supported by NeMo-RL's Megatron path.

If there is no small public checkpoint for the new architecture, use the closest NeMo-RL recipe that constructs the model with a minimal config or small local checkpoint, and report that the run validates construction/training mechanics rather than pretrained weight compatibility.

For VLM or audio models, a text-only GRPO smoke is not enough. Pair the Level 0 policy smoke with the relevant NeMo-RL VLM/audio functional, for example:

bash

uv run bash tests/functional/vlm_grpo.sh
uv run bash tests/functional/audio_grpo_megatron.sh

For MoE models, Level 0 with trivial expert parallelism catches many provider issues, but it does not stress expert routing. Add a Level 4 run with expert parallelism when the change touches expert layout, dispatcher config, router behavior, or expert tensor parallelism.

For MTP/draft models, use an Eagle/MTP-specific functional:

bash

uv run bash tests/functional/grpo_megatron_eagle3_online.sh

For FP8/QAT/ModelOpt or quantized checkpoint support, use the closest recipe or functional that explicitly enables the feature. Do not claim the generic GRPO smoke validated quantization unless the config turns it on.

Environment Setup

Use the NeMo-RL development environment or the site-approved NeMo-RL container. Make caches explicit on shared clusters:

bash

export HF_HOME=${HF_HOME:-/scratch/$USER/nemo_rl_hf}
export HF_HUB_CACHE=$HF_HOME/hub
export NEMO_RL_HOME=${NEMO_RL_HOME:-$NEMO_RL_REPO}
export PYTHONPATH="${BRIDGE_REPO}/src:${BRIDGE_REPO}/3rdparty/Megatron-LM:${NEMO_RL_REPO}:${PYTHONPATH:-}"

If the container has a dependency fingerprint mismatch, note it in the report. Prefer rebuilding the container or virtualenv when possible; use environment overrides only as test-environment evidence, not repository changes.

If model downloads fail with `No space left on device`, move `HF_HOME`, `HF_HUB_CACHE`, and any local `MODEL_PATH` to a larger shared or node-local path.

If Hugging Face API calls fail with rate limits after the model is already cached, point both the model and tokenizer at the local snapshot and run offline:

bash

export MODEL_PATH=/scratch/$USER/hf/hub/models--<org>--<model>/snapshots/<snapshot-sha>
export HF_HOME=/scratch/$USER/hf
export HF_HUB_CACHE=$HF_HOME/hub
export HF_HUB_OFFLINE=1
export TRANSFORMERS_OFFLINE=1

Then pass both overrides to NeMo-RL:

bash

policy.model_name="$MODEL_PATH" \
policy.tokenizer.name="$MODEL_PATH"

Before trusting the snapshot, verify it loads locally:

bash

uv run python - <<'PY'
import os

from transformers import AutoConfig, AutoTokenizer

# Reuse the MODEL_PATH exported above instead of editing a placeholder by hand.
path = os.environ["MODEL_PATH"]
config = AutoConfig.from_pretrained(path, trust_remote_code=True, local_files_only=True)
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, local_files_only=True)
print(type(config).__name__, getattr(config, "model_type", None), type(tokenizer).__name__, tokenizer.vocab_size)
PY
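The `MODEL_PATH` used above follows the Hugging Face hub cache layout, `models--<org>--<model>/snapshots/<snapshot-sha>`. As a minimal sketch, assuming the standard layout in which `refs/<revision>` holds the cached commit hash, the path can be derived rather than typed by hand. `snapshot_dir` and `resolve_snapshot` are hypothetical helpers, not part of `huggingface_hub`.

```python
from pathlib import Path


def snapshot_dir(hub_cache, repo_id, commit):
    """Build the snapshot path for a repo id such as 'Qwen/Qwen2.5-0.5B'."""
    folder = "models--" + repo_id.replace("/", "--")
    return Path(hub_cache) / folder / "snapshots" / commit


def resolve_snapshot(hub_cache, repo_id, revision="main"):
    """Read refs/<revision> to find the cached commit, then return its snapshot."""
    folder = "models--" + repo_id.replace("/", "--")
    ref_file = Path(hub_cache) / folder / "refs" / revision
    commit = ref_file.read_text().strip()
    path = snapshot_dir(hub_cache, repo_id, commit)
    if not path.is_dir():
        # The ref exists but the files were never downloaded or were evicted.
        raise FileNotFoundError(f"no cached snapshot at {path}")
    return path
```

Calling `resolve_snapshot(os.environ["HF_HUB_CACHE"], "Qwen/Qwen2.5-0.5B")` before going offline gives a value suitable for `MODEL_PATH`; the loader check above remains the real proof that the snapshot is complete.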

Minimal NeMo-RL Run

Use NeMo-RL's maintained functional wrapper for the default smoke:

bash

cd "$NEMO_RL_REPO"
ray stop --force || true

export PYTHONPATH="${BRIDGE_REPO}/src:${BRIDGE_REPO}/3rdparty/Megatron-LM:${NEMO_RL_REPO}:${PYTHONPATH:-}"

uv run bash tests/functional/grpo_megatron.sh

The wrapper writes:

text

tests/functional/grpo_megatron/run.log
tests/functional/grpo_megatron/metrics.json

Capture the exact command and keep the log path. Prefer a saved log over a pasted terminal excerpt in PR descriptions.

If the test needs a different provider or model, pass Hydra overrides through the wrapper:

bash

uv run bash tests/functional/grpo_megatron.sh \
  policy.model_name=<small-compatible-hf-model> \
  policy.megatron_cfg.converter_type=<BridgeConverterType>

Keep the first smoke small. Increase model size or parallelism only after a small run proves the basic path works.

LoRA And Checkpoint Coverage

Use Level 1 when the change touches checkpoint save/load, HF export, optimizer state, resume behavior, or a NeMo-RL PEFT path that is known to work in the checkout being tested.

NeMo-RL PEFT support is backend- and revision-dependent. Do not block a provider-only compatibility smoke solely on a known-broken or unsupported NeMo-RL PEFT path. In that case, record Level 1 PEFT as not applicable or blocked by NeMo-RL, keep the Level 0 GRPO smoke as the required downstream signal, and cover Bridge PEFT behavior with focused Bridge tests.

LoRA + checkpoint save smoke, when the NeMo-RL checkout supports this path:

bash

uv run bash tests/functional/grpo_megatron_lora.sh

SFT resume parity across dtensor and Megatron policy paths:

bash

uv run bash tests/functional/sft_resume_diamond.sh

The LoRA functional intentionally saves checkpoints. Remove stale checkpoint outputs between unrelated experiments, but keep them while validating resume behavior.

Do not claim PEFT coverage from `grpo_megatron.sh`; use the LoRA functional or an equivalent Hydra override with `policy.megatron_cfg.peft.enabled=true`.

Non-Colocated vLLM Refit

Use Level 2 when the change touches Bridge HF export, parameter mapping, NeMo-RL weight refit, packed tensor transfer, vLLM loading, delta compression, or policy/generation worker synchronization.

Small 2-GPU non-colocated smoke with the Megatron policy backend:

bash

cd "$NEMO_RL_REPO"
uv run coverage run -a --data-file=tests/.coverage --source=nemo_rl \
  examples/run_grpo.py \
  --config examples/configs/grpo_math_1B_megatron.yaml \
  policy.model_name=Qwen/Qwen2.5-0.5B \
  grpo.num_prompts_per_step=2 \
  grpo.num_generations_per_prompt=4 \
  policy.train_global_batch_size=4 \
  policy.train_micro_batch_size=1 \
  policy.logprob_batch_size=4 \
  policy.generation.colocated.enabled=false \
  policy.generation.colocated.resources.gpus_per_node=1 \
  policy.generation.vllm_cfg.async_engine=true \
  cluster.gpus_per_node=2 \
  grpo.max_num_steps=2 \
  logger.tensorboard_enabled=true \
  logger.log_dir=tests/functional/grpo_megatron_non_colocated/logs \
  logger.wandb_enabled=false \
  checkpointing.enabled=false

After the run, dump metrics:

bash

uv run tests/json_dump_tb_logs.py \
  tests/functional/grpo_megatron_non_colocated/logs \
  --output_path tests/functional/grpo_megatron_non_colocated/metrics.json

Metric assertion helpers differ across NeMo-RL revisions. Inspect `tests/check_metrics.py` or the maintained functional wrapper before assuming an interface. Some checkouts expect positional expressions:

bash

uv run tests/check_metrics.py tests/functional/grpo_megatron_non_colocated/metrics.json \
  'max(data["train/token_mult_prob_error"]) < 1.05' \
  'min(data["train/probs_ratio_clamped_min"]) > 0.79' \
  'max(data["train/probs_ratio_clamped_max"]) < 1.21'
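To make the positional-expression form concrete, here is a hypothetical stand-in for that kind of checker in Python. It is not the implementation of `tests/check_metrics.py`. It assumes each metric in `metrics.json` is either a `{step: value}` mapping or a plain list, and that `min` and `max` should range over the values; confirm both against the checkout under test.

```python
def _values(series):
    """Accept either a {step: value} mapping or a plain list of values."""
    return list(series.values()) if isinstance(series, dict) else list(series)


def check(expression, data):
    """Evaluate one assertion string against data with series-aware min/max."""
    scope = {
        "data": data,
        "max": lambda series: max(_values(series)),
        "min": lambda series: min(_values(series)),
    }
    # eval is tolerable here only because the expressions are written by the
    # person running the test, never taken from untrusted input.
    return bool(eval(expression, {"__builtins__": {}}, scope))


# Made-up numbers for illustration, not results from a real run.
metrics = {"train/token_mult_prob_error": {"1": 1.01, "2": 1.03}}
print(check('max(data["train/token_mult_prob_error"]) < 1.05', metrics))
```

The series-aware `max` matters: Python's built-in `max` applied to a `{step: value}` dict would compare the step keys rather than the metric values.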

For delta-compression testing, add these overrides:

bash

policy.generation.delta_compression.enabled=true \
policy.generation.delta_compression.dtype=bfloat16 \
policy.generation.delta_compression.transport=sparse_indices \
policy.generation.delta_compression.full_sync_interval=20 \
policy.generation.delta_compression.sparse_bucket_size_bytes=5368709120 \
policy.generation.delta_compression.delta_load_batch_size_bytes=536870912
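The two byte-valued overrides are easier to review when written as binary sizes. This small Python sketch shows the arithmetic behind them; `gib` and `mib` are illustrative helpers.

```python
def gib(n):
    """Bytes in n gibibytes (1 GiB = 1024**3 bytes)."""
    return n * 1024 ** 3


def mib(n):
    """Bytes in n mebibytes (1 MiB = 1024**2 bytes)."""
    return n * 1024 ** 2


print(gib(5))    # 5368709120, the sparse_bucket_size_bytes value above
print(mib(512))  # 536870912, the delta_load_batch_size_bytes value above
```

So the example overrides use a 5 GiB sparse bucket and a 512 MiB delta load batch.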

Report weight-transfer timing metrics when available, especially:

timing/train/prepare_for_generation/total

timing/train/prepare_for_generation/transfer_and_update_weights

timing/train/prepare_for_generation/weight_transfer/producer/collect_tensors

timing/train/prepare_for_generation/weight_transfer/producer/sparse_encode

timing/train/prepare_for_generation/weight_transfer/producer/sparse_nonzero

timing/train/prepare_for_generation/weight_transfer/consumer/decode_sparse

timing/train/prepare_for_generation/weight_transfer/consumer/load_delta

If the payload broadcast time is tiny but sparse encode/decode dominates, report that boundary clearly. It is a weight-preparation bottleneck, not an NCCL broadcast bottleneck.
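One way to report that boundary is as a share of the total refit time. The Python sketch below sums the producer and consumer timers listed above and divides by `prepare_for_generation/total`. It assumes the timers use the same unit and do not overlap; if `sparse_nonzero` is nested inside `sparse_encode` in your checkout, drop one of them from `PREP_KEYS` to avoid double counting. The example values are made up.

```python
PREFIX = "timing/train/prepare_for_generation/"
PREP_KEYS = [
    "weight_transfer/producer/collect_tensors",
    "weight_transfer/producer/sparse_encode",
    "weight_transfer/producer/sparse_nonzero",
    "weight_transfer/consumer/decode_sparse",
    "weight_transfer/consumer/load_delta",
]


def prep_share(timings):
    """Fraction of prepare_for_generation/total spent in the prep timers."""
    total = timings[PREFIX + "total"]
    if total <= 0:
        raise ValueError("total must be positive")
    # Missing timers count as zero so partial logs still produce a number.
    prep = sum(timings.get(PREFIX + key, 0.0) for key in PREP_KEYS)
    return prep / total


# Made-up timings in seconds, for illustration only.
example = {
    PREFIX + "total": 10.0,
    PREFIX + "weight_transfer/producer/sparse_encode": 4.0,
    PREFIX + "weight_transfer/consumer/decode_sparse": 3.5,
}
print(f"{prep_share(example):.0%} of refit time is weight preparation")
```

A high share alongside a small transfer time supports the "weight preparation, not broadcast" reading; include the raw timers in the report either way.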

Megatron Generation Backend

Use Level 3 only when the NeMo-RL checkout under test supports the Megatron generation backend and the Bridge change explicitly affects that downstream path. Do not require this for normal provider compatibility, HF import/export, vLLM-backed generation, or generic Bridge inference tests.

bash

uv run bash tests/functional/grpo_megatron_generation.sh

This exercises `policy.generation.backend=megatron`, so it validates NeMo-RL's Megatron generation construction and runtime behavior more directly than the default vLLM-backed GRPO functional.

Some NeMo-RL revisions declare `mcore` and `vllm` extras as mutually incompatible. In that environment, a vLLM-backed Level 0 run may be blocked even though the Megatron policy path is testable. Use `policy.generation.backend=megatron` for a Megatron-only smoke, record vLLM as skipped or blocked, and do not claim non-colocated vLLM refit coverage.

Parallelism Stress

Use Level 4 when provider finalization, model-parallel settings, sequence parallel, context parallel, MoE dispatch, pipeline layout, or distributed optimizer behavior changed.

Start from a maintained recipe that already matches the intended GPU count. For example, use one of the recipe configs under:

text

examples/configs/recipes/llm/*megatron*.yaml
examples/configs/recipes/llm/performance/*megatron*.yaml
examples/configs/recipes/vlm/*megatron*.yaml

For a small manual stress variant, override the Megatron sizes explicitly:

bash

uv run bash tests/functional/grpo_megatron.sh \
  policy.megatron_cfg.tensor_model_parallel_size=2 \
  policy.megatron_cfg.pipeline_model_parallel_size=1 \
  policy.megatron_cfg.context_parallel_size=1 \
  policy.megatron_cfg.sequence_parallel=false \
  cluster.gpus_per_node=2
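Before launching, it helps to confirm that the chosen sizes fit the GPU count. In Megatron the world size must be divisible by the product of the tensor, pipeline, and context parallel sizes, and the quotient is the data-parallel size. The Python sketch below checks that arithmetic; `data_parallel_size` is an illustrative helper, and it assumes the training world size equals the number of GPUs given to the policy.

```python
def data_parallel_size(world_size, tp=1, pp=1, cp=1):
    """Return the data-parallel size, or raise if the layout does not fit."""
    model_parallel = tp * pp * cp
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by tp*pp*cp={model_parallel}"
        )
    return world_size // model_parallel


print(data_parallel_size(2, tp=2))          # 1: the 2-GPU override above
print(data_parallel_size(16, tp=2, pp=2))   # 4
```

The 2-GPU override above therefore runs with a data-parallel size of 1, so it exercises tensor parallelism but not gradient reduction across data-parallel ranks.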

For MoE, use a MoE recipe and set expert parallelism only when the model and GPU count support it:

bash

policy.megatron_cfg.expert_model_parallel_size=2 \
policy.megatron_cfg.expert_tensor_parallel_size=1

Keep these as follow-up runs. Do not make them the first debugging surface for a new provider.

Learning Signal

Use Level 6 only when the change affects trainability or when downstream validation explicitly asks for learning behavior. Do not require it for every provider-only PR; RL learning is slower, noisier, and more environment-dependent than compatibility smoke tests.

The goal is a short learning-signal run, not a benchmark. Prefer a small model, fixed data, fixed seed when available, and enough steps to observe non-random metric movement:

bash

uv run bash tests/functional/grpo_megatron_lora.sh \
  grpo.max_num_steps=20 \
  data.shuffle=false \
  checkpointing.enabled=false

Acceptable learning-signal evidence depends on the task, but the report should include at least:

no NaNs or infs in loss, reward, KL, entropy, grad norm, or logprob metrics
nonzero trainable parameter count when PEFT is enabled
actor losses and reward-related metrics logged for multiple steps
validation or reward trend compared against the starting point or a known-good baseline
no repeated zero gradients, frozen LoRA adapters, or constant logprobs unless expected

Do not call a 20-step run "converged" in the benchmark sense. Call it "learning-signal passed" unless it reaches a pre-agreed metric threshold.
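Two of the checks above, no NaNs or infs and a trend against the starting point, are mechanical enough to script. The Python sketch below is one hedged way to do so. It assumes the dumped metrics are a mapping from metric name to either a `{step: value}` dict or a list; that layout and the helper names are assumptions, so inspect a real `metrics.json` first.

```python
import math


def _values(series):
    """Accept either a {step: value} mapping or a plain list of values."""
    if isinstance(series, dict):
        # Sort by numeric step so the trend follows training order.
        return [series[k] for k in sorted(series, key=lambda s: int(s))]
    return list(series)


def nonfinite_metrics(metrics):
    """Names of metrics that contain at least one NaN or inf value."""
    return sorted(
        name for name, series in metrics.items()
        if any(not math.isfinite(v) for v in _values(series))
    )


def trend(series, window=3):
    """Mean of the last `window` values minus mean of the first `window`."""
    values = _values(series)
    if len(values) < 2 * window:
        raise ValueError("not enough steps for a trend")
    head = sum(values[:window]) / window
    tail = sum(values[-window:]) / window
    return tail - head
```

A positive `trend` on a reward metric or a negative one on a loss is supporting evidence only. On a noisy 20-step run it does not replace a comparison against a known-good baseline.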

Slurm Or Container Runs

Use the cluster's standard NeMo-RL container and mount both checkouts into the container. Keep setup and the actual run in the same container step when using node-local paths such as `/tmp`; node-local model caches and ad-hoc installs disappear when a fresh container step starts.

If the home filesystem is full or Megatron-Core tries to build helper extensions into a read-only/full checkout, copy the MCore submodule to node-local storage and put that copy on `PYTHONPATH` instead of editing `3rdparty/Megatron-LM/`:

bash

export MCORE_REPO=${MCORE_REPO:-/tmp/$USER/Megatron-LM}
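# A submodule checkout stores .git as a file, so test for existence, not a directory.
if [[ ! -e "$MCORE_REPO/.git" ]]; then
  mkdir -p "$(dirname "$MCORE_REPO")"
  cp -a "$BRIDGE_REPO/3rdparty/Megatron-LM" "$MCORE_REPO"
fi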

EXT_SUFFIX=$(uv run python - <<'PY'
import sysconfig

print(sysconfig.get_config_var("EXT_SUFFIX") or ".so")
PY
)
make -C "$MCORE_REPO/megatron/core/datasets" LIBEXT="$EXT_SUFFIX"
export PYTHONPATH="${BRIDGE_REPO}/src:${MCORE_REPO}:${NEMO_RL_REPO}:${PYTHONPATH:-}"

Overriding `LIBEXT` avoids a suffixless `helpers_cpp` binary on containers where `python3-config` is absent from `PATH`. Verify the built file is named like `helpers_cpp.cpython-<ver>-<platform>.so` before launching a long run.
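The file-name check can be scripted with the same `sysconfig` lookup used for `EXT_SUFFIX`. The Python sketch below is a minimal version; `expected_helper_name` and `find_built_helper` are hypothetical helpers, and it assumes the built library sits directly in `megatron/core/datasets`.

```python
import sysconfig
from pathlib import Path


def expected_helper_name():
    """File name the datasets helper should have for this interpreter."""
    suffix = sysconfig.get_config_var("EXT_SUFFIX") or ".so"
    return "helpers_cpp" + suffix


def find_built_helper(datasets_dir):
    """Return the built helper path, or None if it is missing or suffixless."""
    candidate = Path(datasets_dir) / expected_helper_name()
    return candidate if candidate.is_file() else None
```

Run it with the same interpreter that will launch training, for example through `uv run python`, and pass `$MCORE_REPO/megatron/core/datasets`. A `None` result means the build either failed or produced a file this interpreter will not import.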

For NeMo-RL multi-node jobs, prefer NeMo-RL's own `ray.sub` launcher when it is available. It starts the Ray head and worker nodes under Slurm, mounts the requested container/filesystems, and executes `COMMAND` from the NeMo-RL root. Launch it from `$NEMO_RL_REPO`, not from the Bridge checkout:

bash

cd "$NEMO_RL_REPO"

COMMAND="uv run ./examples/run_grpo.py \
  --config examples/configs/grpo_math_1B_megatron.yaml \
  cluster.num_nodes=2 \
  cluster.gpus_per_node=8 \
  logger.log_dir=results/grpo_megatron_2n \
  logger.wandb_enabled=false" \
CONTAINER="$NEMO_RL_IMAGE" \
MOUNTS="$BRIDGE_REPO:$BRIDGE_REPO,$NEMO_RL_REPO:$NEMO_RL_REPO,$HF_HOME:$HF_HOME" \
sbatch \
  --nodes=2 \
  --account=<account> \
  --partition=<partition> \
  --job-name=nemo-rl-bridge-e2e \
  --time=4:00:00 \
  --gres=gpu:8 \
  ray.sub

Include the local Bridge checkout in `MOUNTS` and in `PYTHONPATH` inside `COMMAND` when the container does not already see the same path. If using a vendored Bridge under `3rdparty/Megatron-Bridge-workspace/Megatron-Bridge`, sync the exact Bridge changes there instead and report that path.

Use a direct `srun` only when `ray.sub` is unavailable, stale for the target cluster, or when debugging the container/Slurm layer itself. Keep paths generic in scripts committed to Megatron-Bridge:

bash

srun <site-specific-slurm-options> \
  --container-image="${NEMO_RL_IMAGE}" \
  --container-mounts="${BRIDGE_REPO}:/workspace/Megatron-Bridge,${NEMO_RL_REPO}:/workspace/nemo-rl,<data-root>:<data-root>" \
  --container-workdir=/workspace/nemo-rl \
  bash -lc '
    export BRIDGE_REPO=/workspace/Megatron-Bridge
    export NEMO_RL_REPO=/workspace/nemo-rl
    export PYTHONPATH=$BRIDGE_REPO/src:$BRIDGE_REPO/3rdparty/Megatron-LM:$NEMO_RL_REPO
    ray stop --force || true
    uv run bash tests/functional/grpo_megatron.sh
  '

If an attach helper enters a container that no longer sees the expected checkouts or log directory, treat that helper as stale. Start a fresh

srun

step against the existing allocation with explicit

--container-image

,

--container-mounts

, and

--container-workdir

.

Attach helpers that use

--no-container-mount-home

can enter a minimal

/home/$USER

in follow-up steps even when the original run saw the real checkout. Keep metric dumping and assertions in the same container step as the run when possible. If a follow-up step must inspect compute-local artifacts, use paths under the node-local run directory and do not assume

$NEMO_RL_REPO

is visible.
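A small preflight check can make a stale attach helper obvious before any metric inspection starts. The sketch below is only an illustration: check_visible is a hypothetical helper name, and the directories passed to it are whatever checkouts and log paths the follow-up step expects to see.

```shell
# Hypothetical preflight: report which expected directories are visible
# inside the current container step; return non-zero if any is missing.
check_visible() {
  missing=0
  for d in "$@"; do
    if [ -d "$d" ]; then
      echo "ok: $d"
    else
      echo "MISSING: $d" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example call inside a follow-up step (paths are placeholders):
#   check_visible "$BRIDGE_REPO" "$NEMO_RL_REPO" "$RUN_DIR" || echo "stale helper; start a fresh srun step"
```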

For general Slurm debugging and multi-node patterns, read @skills/multi-node-slurm/SKILL.md.

使用集群标准的NeMo-RL容器，并将两个代码库挂载到容器中。当使用节点本地路径（如

/tmp

）时，将设置和实际运行放在同一容器步骤中；节点本地模型缓存和临时安装会在新容器步骤启动时消失。

如果主文件系统已满，或Megatron-Core尝试将辅助扩展构建到只读/已满的代码库中，将MCore子模块复制到节点本地存储，并将该副本放在

PYTHONPATH

上，而非编辑

3rdparty/Megatron-LM/

：

bash

export MCORE_REPO=${MCORE_REPO:-/tmp/$USER/Megatron-LM}
if [[ ! -d "$MCORE_REPO/.git" ]]; then
  cp -a "$BRIDGE_REPO/3rdparty/Megatron-LM" "$MCORE_REPO"
fi

EXT_SUFFIX=$(uv run python - <<'PY'
import sysconfig

print(sysconfig.get_config_var("EXT_SUFFIX") or ".so")
PY
)
make -C "$MCORE_REPO/megatron/core/datasets" LIBEXT="$EXT_SUFFIX"
export PYTHONPATH="${BRIDGE_REPO}/src:${MCORE_REPO}:${NEMO_RL_REPO}:${PYTHONPATH:-}"

覆盖

LIBEXT

可避免在

PATH

中缺少

python3-config

的容器上生成无后缀的

helpers_cpp

二进制文件。在启动长时间运行前，验证构建的文件命名为

helpers_cpp.cpython-<ver>-<platform>.so

。

对于NeMo-RL多节点任务，当可用时优先使用NeMo-RL自身的

ray.sub

启动器。它会在Slurm下启动Ray头节点和工作节点，挂载请求的容器/文件系统，并从NeMo-RL根目录执行

COMMAND

。从

$NEMO_RL_REPO

启动，而非Bridge代码库：

bash

cd "$NEMO_RL_REPO"

COMMAND="uv run ./examples/run_grpo.py \
  --config examples/configs/grpo_math_1B_megatron.yaml \
  cluster.num_nodes=2 \
  cluster.gpus_per_node=8 \
  logger.log_dir=results/grpo_megatron_2n \
  logger.wandb_enabled=false" \
CONTAINER="$NEMO_RL_IMAGE" \
MOUNTS="$BRIDGE_REPO:$BRIDGE_REPO,$NEMO_RL_REPO:$NEMO_RL_REPO,$HF_HOME:$HF_HOME" \
sbatch \
  --nodes=2 \
  --account=<account> \
  --partition=<partition> \
  --job-name=nemo-rl-bridge-e2e \
  --time=4:00:00 \
  --gres=gpu:8 \
  ray.sub

当容器无法自动识别相同路径时，将本地Bridge代码库包含在

MOUNTS

和

COMMAND

内的

PYTHONPATH

中。如果使用

3rdparty/Megatron-Bridge-workspace/Megatron-Bridge

下的vendored Bridge，需将Bridge的精确变更同步到该路径并报告此路径。

仅当

ray.sub

不可用、针对目标集群过时，或调试容器/Slurm层本身时，才使用直接

srun

。在提交到Megatron-Bridge的脚本中保持路径通用：

bash

srun <site-specific-slurm-options> \
  --container-image="${NEMO_RL_IMAGE}" \
  --container-mounts="${BRIDGE_REPO}:/workspace/Megatron-Bridge,${NEMO_RL_REPO}:/workspace/nemo-rl,<data-root>:<data-root>" \
  --container-workdir=/workspace/nemo-rl \
  bash -lc '
    export BRIDGE_REPO=/workspace/Megatron-Bridge
    export NEMO_RL_REPO=/workspace/nemo-rl
    export PYTHONPATH=$BRIDGE_REPO/src:$BRIDGE_REPO/3rdparty/Megatron-LM:$NEMO_RL_REPO
    ray stop --force || true
    uv run bash tests/functional/grpo_megatron.sh
  '

如果附加工具进入容器后无法看到预期的代码库或日志目录，说明该工具已过时。针对现有分配启动新的

srun

步骤，显式指定

--container-image

、

--container-mounts

和

--container-workdir

。

使用

--no-container-mount-home

的附加工具可能在后续步骤中进入最小化的

/home/$USER

，即使原始运行能看到真实代码库。尽可能将指标导出和断言放在与运行相同的容器步骤中。如果后续步骤必须检查计算本地 artifacts，使用节点本地运行目录下的路径，不要假设

$NEMO_RL_REPO

可见。

有关Slurm调试和多节点模式的一般信息，请阅读@skills/multi-node-slurm/SKILL.md。

Pass Criteria

通过标准

A useful pass has all of the following:

Focused Bridge tests pass for provider/config/mapping behavior.
NeMo-RL imports the intended Bridge checkout, verified by `megatron.bridge.__file__`.
The NeMo-RL config has `policy.megatron_cfg.enabled=true` for Megatron policy validation.
The run reaches the requested step count and writes `metrics.json`.
`tests/check_metrics.py` passes when the maintained functional test includes metric assertions.
No exception occurs during Bridge provider setup, HF import/export, enabled PEFT/LoRA wrapping, Megatron initialization, optimizer setup, checkpoint manager setup, weight transfer, or the training step.
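
Two of the criteria above lend themselves to a scripted check: the import path of megatron.bridge and the presence of metrics.json. The following is a sketch under assumptions, not NeMo-RL tooling: the function names are hypothetical, and the directory passed to check_metrics_written must be wherever your run actually writes metrics.json.

```shell
# is_under PATH ROOT: succeed only when PATH lies inside directory ROOT.
is_under() {
  case "$1" in
    "$2"/*) return 0 ;;
    *) return 1 ;;
  esac
}

# check_bridge_import ROOT: hypothetical wrapper that asks the NeMo-RL
# environment which megatron.bridge it imports and compares it with ROOT.
check_bridge_import() {
  bridge_file=$(uv run python -c 'import megatron.bridge; print(megatron.bridge.__file__)') || return 1
  echo "megatron.bridge.__file__ = $bridge_file"
  is_under "$bridge_file" "$1"
}

# check_metrics_written DIR: succeed when DIR holds a non-empty metrics.json.
check_metrics_written() {
  [ -s "$1/metrics.json" ]
}
```

Run check_bridge_import "$BRIDGE_REPO" from the NeMo-RL root in the same container step as the training run, so the import resolves against the same PYTHONPATH.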

Ray shutdown warnings, Python resource-tracker warnings, or post-completion process-group warnings can be acceptable if the training step completed, metrics were written, and the process exits successfully. Mention them as residual log noise.

Do not claim full model e2e if the run used a dummy config, text-only data for a VLM/audio model, trivial expert parallelism for an expert-parallel change, or disabled save/resume for a checkpointing change. Call it the exact level that passed.

Do not claim convergence from Level 0. Claim learning signal only from Level 6, and distinguish "learning signal" from benchmark convergence in the report.

有效的测试通过需满足以下所有条件：

针对提供者/配置/映射行为的聚焦Bridge测试通过。
NeMo-RL导入的是预期的Bridge代码库，通过
```
megatron.bridge.__file__
```
验证。
NeMo-RL配置中
```
policy.megatron_cfg.enabled=true
```
，用于Megatron策略验证。
运行达到请求的步骤数，并生成
```
metrics.json
```
。
当维护的功能测试包含指标断言时，
```
tests/check_metrics.py
```
通过。
在Bridge提供者设置、HF导入/导出、启用的PEFT/LoRA包装、Megatron初始化、优化器设置、checkpoint管理器设置、权重传输或训练步骤期间无异常发生。

如果训练步骤完成、指标已生成且进程成功退出，Ray关闭警告、Python资源跟踪器警告或完成后的进程组警告可视为可接受。需在报告中提及这些残留日志噪音。

如果运行使用了虚拟配置、VLM/音频模型的仅文本数据、专家并行变更的基础专家并行，或针对checkpoint变更禁用了保存/恢复，不要声称完整的模型端到端测试。仅报告实际通过的层级。

不要从Level 0测试声称收敛。仅从Level 6测试声称学习信号，并在报告中区分“学习信号”与基准收敛。

Failure Triage

失败排查

If model construction fails, verify that NeMo-RL is importing the Bridge checkout under test and that

policy.megatron_cfg.converter_type

matches the provider.

If the config silently uses dtensor instead of Megatron, set

policy.dtensor_cfg.enabled=false

and

policy.megatron_cfg.enabled=true

, or use

grpo_megatron.sh

.

If LoRA fails, check NeMo-RL PEFT config names and Bridge target module names. Reproduce with

grpo_megatron_lora.sh

before adding larger model or parallelism changes.

If checkpoint save/load fails, first rerun with

checkpointing.enabled=false

to separate model construction from checkpoint behavior, then use

sft_resume_diamond.sh

for resume parity.

If non-colocated refit fails, separate the boundary:

producer export and metadata preparation on the policy worker
payload packing/broadcast
consumer decode and model loading on the generation worker
vLLM-specific weight-loader behavior

If NeMo-RL rejects TP >= 4 with the batch-variant accuracy guard, prefer TP 1 or 2 for the smoke, or set

policy.train_micro_batch_size

and

policy.logprob_batch_size

equal. Do not bypass with

NRL_IGNORE_TP_ACCURACY_CHECK=1

for pass/fail evidence unless the user explicitly wants an unsupported diagnostic run.
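
One way to keep the two batch sizes in lockstep is to generate both overrides from a single value. This is a minimal sketch; equal_batch_overrides is a hypothetical helper, and the value 2 is only an example.

```shell
# Print matching overrides so the training micro-batch size and the
# logprob batch size are always set to the same value.
equal_batch_overrides() {
  printf 'policy.train_micro_batch_size=%s policy.logprob_batch_size=%s\n' "$1" "$1"
}

equal_batch_overrides 2
# prints: policy.train_micro_batch_size=2 policy.logprob_batch_size=2
```

The output can be appended to the run command, for example as $(equal_batch_overrides 2) after the other overrides.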

If Megatron generation fails during

cuda graph warmup

with

CUDA error: an illegal memory access was encountered

, rerun the same config with:

bash

policy.generation.mcore_generation_config.num_cuda_graphs=null \
policy.generation.mcore_generation_config.use_cuda_graphs_for_non_decode_steps=false

If the no-graph run passes, report the original result as a Megatron generation CUDA-graph failure and the no-graph run as a reduced-optimization pass. Keep both logs.

If the run reaches the requested step count but

tests/check_metrics.py

fails on

train/token_mult_prob_error

, treat it as a real metric failure, not a harness failure. NeMo-RL computes this metric from

exp(abs(generation_logprobs - prev_logprobs))

; huge values mean the generation backend logprobs disagree with the policy logprobs recomputed for training. Isolate by retrying with simpler parallelism or kernels such as

policy.megatron_cfg.sequence_parallel=false

,

policy.megatron_cfg.apply_rope_fusion=false

, shorter sequence lengths, or vLLM generation when available. Do not relax the metric threshold or use sequence masking to claim a pass; run Bridge logits/import/export parity to localize whether the mismatch is in Bridge conversion, Megatron generation logprob collection, or NeMo-RL recomputation.
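
To build intuition for the size of this metric, the sketch below evaluates exp(abs(generation_logprob - prev_logprob)) per token with awk and averages the result. The averaging over tokens and the sample log-probabilities are assumptions for illustration; NeMo-RL's exact reduction and masking may differ.

```shell
# Read "generation_logprob prev_logprob" pairs, one token per line, and
# print the mean of exp(|difference|). A value of 1.0000 means the two
# backends agree exactly on every token.
token_mult_prob_error() {
  awk '{ d = $1 - $2; if (d < 0) d = -d; s += exp(d); n++ }
       END { if (n) printf "%.4f\n", s / n }'
}

# Three made-up tokens: exact match, small drift, and a one-nat disagreement.
printf '%s\n' "-1.20 -1.20" "-0.50 -0.55" "-2.00 -1.00" | token_mult_prob_error
# prints: 1.5899
```

A single badly mismatched token dominates the mean, which is why large values point at a real logprob disagreement rather than noise.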

If model download fails, move HF caches to a larger path and rerun with explicit cache settings.
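
A sketch of explicit cache settings follows. HF_HOME, HF_HUB_CACHE, and HF_HUB_OFFLINE are standard Hugging Face environment variables, and the models--<org>--<name>/snapshots/<revision> layout is the hub cache convention; the helper name hf_snapshot_dir and the example cache root are hypothetical. The same lookup is useful for confirming that a snapshot is already cached before switching a run to a local path.

```shell
# Hypothetical helper: print the first cached snapshot directory for a
# model id such as "org/name", or return non-zero when none is cached.
hf_snapshot_dir() {
  cache=${HF_HUB_CACHE:-${HF_HOME:-$HOME/.cache/huggingface}/hub}
  repo_dir="$cache/models--$(printf '%s' "$1" | sed 's|/|--|g')"
  for snap in "$repo_dir"/snapshots/*/; do
    [ -d "$snap" ] || continue
    printf '%s\n' "${snap%/}"
    return 0
  done
  return 1
}

# Example cache relocation (the path is a placeholder for a larger filesystem):
#   export HF_HOME=/large/scratch/$USER/hf
#   export HF_HUB_CACHE=$HF_HOME/hub
# Example offline switch once a snapshot is confirmed:
#   export HF_HUB_OFFLINE=1
#   hf_snapshot_dir "<org>/<model>"
```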

If Hugging Face returns

429 Too Many Requests

during tokenizer/config setup, first check whether the snapshot already exists under

$HF_HUB_CACHE

. If it does, switch

policy.model_name

and

policy.tokenizer.name

to the local snapshot path and enable offline mode. This is an environment failure unless the local snapshot cannot load with

local_files_only=True

.

If

helpers_cpp

fails to link with

No space left on device

, or if logs show

make: python3-config: No such file or directory

, rebuild the helper in a node-local copy of Megatron-LM with

LIBEXT

set from

sysconfig.get_config_var("EXT_SUFFIX")

. Do not patch files under

3rdparty/Megatron-LM/

in the Bridge checkout.

If a baseline fails before model build because of data, Ray, vLLM, package setup, or container mismatch, fix the environment first and do not report it as a Bridge provider failure.

如果模型构建失败，验证NeMo-RL导入的是待测试的Bridge代码库，且

policy.megatron_cfg.converter_type

与提供者匹配。

如果配置默认使用dtensor而非Megatron，设置

policy.dtensor_cfg.enabled=false

和

policy.megatron_cfg.enabled=true

，或使用

grpo_megatron.sh

。

如果LoRA失败，检查NeMo-RL PEFT配置名称和Bridge目标模块名称。在添加更大模型或并行性变更前，先用

grpo_megatron_lora.sh

重现问题。

如果checkpoint保存/加载失败，先使用

checkpointing.enabled=false

重新运行，以分离模型构建与checkpoint行为，然后使用

sft_resume_diamond.sh

测试恢复一致性。

如果非协同部署重构失败，拆分边界排查：

策略工作节点上的生产者导出和元数据准备
负载打包/广播
生成工作节点上的消费者解码和模型加载
vLLM特定的权重加载器行为

如果NeMo-RL因批量变体精度防护拒绝TP >=4，优先选择TP 1或2进行冒烟测试，或设置

policy.train_micro_batch_size

和

policy.logprob_batch_size

相等。除非用户明确要求不支持的诊断运行，否则不要使用

NRL_IGNORE_TP_ACCURACY_CHECK=1

来声称测试通过。

如果Megatron生成在

cuda graph warmup

期间因

CUDA error: an illegal memory access was encountered

失败，使用以下参数重新运行相同配置：

bash

policy.generation.mcore_generation_config.num_cuda_graphs=null \
policy.generation.mcore_generation_config.use_cuda_graphs_for_non_decode_steps=false

如果禁用图的运行通过，报告原始结果为Megatron生成CUDA图失败，禁用图的运行为优化降级后的通过。保留两个日志。

如果运行达到请求的步骤数，但

tests/check_metrics.py

在

train/token_mult_prob_error

上失败，将其视为真实指标失败，而非测试工具失败。NeMo-RL通过

exp(abs(generation_logprobs - prev_logprobs))

计算该指标；数值过大意味着生成后端的logprobs与训练时重新计算的策略logprobs不一致。通过重试更简单的并行性或内核（如

policy.megatron_cfg.sequence_parallel=false

、

policy.megatron_cfg.apply_rope_fusion=false

、更短序列长度，或可用的vLLM生成）来隔离问题。不要放宽指标阈值或使用序列掩码来声称通过；运行Bridge logits/导入/导出一致性测试，以定位不匹配是在Bridge转换、Megatron生成logprob收集还是NeMo-RL重新计算中。

如果模型下载失败，将HF缓存移至更大路径并使用明确的缓存设置重新运行。

如果Hugging Face在分词器/配置设置期间返回

429 Too Many Requests

，先检查

$HF_HUB_CACHE

下是否已存在快照。如果存在，将

policy.model_name

和

policy.tokenizer.name

切换到本地快照路径并启用离线模式。这是环境失败，除非本地快照无法通过

local_files_only=True

加载。

如果

helpers_cpp

因

No space left on device

链接失败，或日志显示

make: python3-config: No such file or directory

，在Megatron-LM的节点本地副本中重新构建辅助工具，并从

sysconfig.get_config_var("EXT_SUFFIX")

设置

LIBEXT

。不要修改Bridge代码库中

3rdparty/Megatron-LM/

下的文件。

如果基线在模型构建前因数据、Ray、vLLM、包设置或容器不匹配而失败，先修复环境，不要将其报告为Bridge提供者失败。

Summary Format

摘要格式

End every run with a short user-facing summary that answers "Did the requested deliverables pass?" before adding details. Use

Pass

,

Fail

,

Skipped

, or

Blocked

for each deliverable, and do not report an overall

Pass

unless the pass criteria for the requested coverage level were met.

text

Result: <Pass/Fail/Blocked> - <one sentence stating what was validated>
Requested coverage: <Level 0/1/2/3/4/5/6 and requested variants>
Model: <policy.model_name or local model path>

Deliverables:
- Bridge-side checks: <Pass/Fail/Skipped> - <test command or skipped reason>
- Local Bridge import in NeMo-RL: <Pass/Fail> - <megatron.bridge.__file__ path>
- NeMo-RL Megatron policy run: <Pass/Fail/Skipped> - <GRPO Megatron or requested variant>
- Requested variants: <Pass/Fail/Skipped/Not requested> - <LoRA/checkpoint, non-colocated vLLM refit, Megatron generation, parallelism stress, architecture-specific, or learning-signal>
- Metrics/log capture: <Pass/Fail> - <log path, metrics path, and metric assertion status>

Evidence:
- Bridge repo: <commit> plus dirty files
- NeMo-RL repo: <commit> plus dirty files
- Command: <exact command or script path>
- Key lines: <policy.megatron_cfg.enabled=true, step completion, metrics.json creation, tests/check_metrics.py result, or the first relevant error>

Limitations:
- <dummy model, skipped save/resume, text-only VLM/audio smoke, trivial EP, no learning-signal claim, known shutdown warnings, etc.>

Follow-ups:
- <needed rerun, environment fix, provider fix, NeMo-RL issue, or "none">

If the job is blocked before Bridge model/provider construction by data, Ray, vLLM, dependency, disk, container, or cluster setup, mark the overall result as

Blocked

, not

Fail

, and state that it is not evidence against the Bridge provider.

If any requested deliverable was not run, mark it

Skipped

or

Not requested

with the reason. Do not leave it implicit in the limitations.
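
The commit and dirty-file entries in the Evidence block can be collected with plain git commands. The helper below is a sketch; repo_evidence is a hypothetical name, and it assumes git is available in the step where it runs.

```shell
# Print the current commit and any uncommitted files for the repo at $1.
# Returns non-zero when $1 is not a git checkout with at least one commit.
repo_evidence() {
  commit=$(git -C "$1" rev-parse HEAD 2>/dev/null) || return 1
  echo "commit: $commit"
  dirty=$(git -C "$1" status --porcelain)
  if [ -n "$dirty" ]; then
    echo "dirty files:"
    printf '%s\n' "$dirty"
  else
    echo "dirty files: none"
  fi
}

# Example calls for the summary (paths come from the run environment):
#   repo_evidence "$BRIDGE_REPO"
#   repo_evidence "$NEMO_RL_REPO"
```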

每次运行结束后，先给出面向用户的简短摘要，回答“请求的交付项是否通过？”，再添加详细信息。对每个交付项使用

Pass

、

Fail

、

Skipped

或

Blocked

，只有当请求覆盖层级的通过标准满足时，才报告整体

Pass

。

text

结果: <Pass/Fail/Blocked> - <一句话说明验证内容>
请求覆盖: <Level 0/1/2/3/4/5/6及请求变体>
模型: <policy.model_name或本地模型路径>

交付项:
- Bridge端检查: <Pass/Fail/Skipped> - <测试命令或跳过原因>
- NeMo-RL中导入本地Bridge: <Pass/Fail> - <megatron.bridge.__file__路径>
- NeMo-RL Megatron策略运行: <Pass/Fail/Skipped> - <GRPO Megatron或请求变体>
- 请求变体: <Pass/Fail/Skipped/Not requested> - <LoRA/checkpoint、非协同部署vLLM重构、Megatron生成、并行性压力、架构特定或学习信号>
- 指标/日志捕获: <Pass/Fail> - <日志路径、指标路径及指标断言状态>

证据:
- Bridge代码库: <commit>加未提交文件
- NeMo-RL代码库: <commit>加未提交文件
- 命令: <精确命令或脚本路径>
- 关键行: <policy.megatron_cfg.enabled=true、步骤完成、metrics.json生成、tests/check_metrics.py结果或首个相关错误>

局限性:
- <虚拟模型、跳过保存/恢复、仅文本VLM/音频冒烟测试、基础EP、无学习信号声明、已知关闭警告等>

后续动作:
- <需要重新运行、环境修复、提供者修复、NeMo-RL问题或“无”>

如果在Bridge模型/提供者构建前因数据、Ray、vLLM、依赖、磁盘、容器或集群设置而阻塞，将整体结果标记为

Blocked

而非

Fail

，并说明这不是Bridge提供者的负面证据。

如果任何请求的交付项未运行，标记为

Skipped

或

Not requested

并说明原因。不要在局限性中隐含未运行的情况。