nemo-rl-e2e-testing
NeMo-RL E2E Testing
Validate a Megatron-Bridge model or training API change through NeMo-RL's Megatron backend. This catches integration issues that Bridge-only tests miss: NeMo-RL-owned rollout scheduling, reward handling, policy/reference setup, HF import/export through Bridge, optimizer setup, checkpoint ownership, and policy-to-generation weight transfer.
Use this as an external compatibility smoke test after the focused Bridge tests for the model/provider change pass.
This is not a replacement for Bridge model parity tests. A NeMo-RL GRPO or SFT run proves that Bridge can survive an external RL training loop; architecture correctness still comes from Bridge import/export, logits, roundtrip, and model-specific inference tests.
Scope
Think in coverage levels. Start with Level 0 and add only the levels justified by the change.
| Level | Required when | What it proves |
|---|---|---|
| 0: Megatron policy GRPO smoke | Any new provider or provider config change that claims NeMo-RL compatibility | NeMo-RL can import the local Bridge provider, build a Megatron policy, initialize optimizer/scheduler state, run rollout/ref/logprob wiring, and finish a short GRPO job |
| 1: LoRA/checkpoint variant | Checkpointing, HF export, optimizer state, resume behavior, or a NeMo-RL-supported PEFT path changed | NeMo-RL can save through its checkpoint schedule, resume without losing training state, and, when PEFT is enabled in that NeMo-RL checkout, apply Bridge LoRA hooks |
| 2: Non-colocated vLLM refit | HF export, weight mapping, policy-to-generation refit, delta compression, packed transfer, or vLLM update behavior changed | Bridge-exported weights can be transferred from the Megatron policy worker into separate vLLM generation workers |
| 3: Optional Megatron generation backend | Only when the NeMo-RL checkout still supports the Megatron generation backend | NeMo-RL can use Megatron for both policy and generation rather than only vLLM generation |
| 4: Parallelism stress | TP/PP/CP/EP, sequence parallel, MoE dispatch, pipeline stage layout, or distributed optimizer behavior changed | Provider settings remain correct under non-trivial Megatron parallel state |
| 5: Architecture-specific e2e | VLM, audio, MoE, MTP/draft models, FP8/QAT/ModelOpt, quantized weights, or custom layers are involved | The architecture-specific runtime path is exercised, not just a text-only dense GRPO smoke |
| 6: Learning signal | Optimizer, scheduler, loss, reward, PEFT trainability, gradient flow, or training stability changed | Metrics move in the expected direction over a short run and do not silently produce zero/NaN/unstable updates |
The default Level 0 target is NeMo-RL's maintained Megatron GRPO functional:
bash
uv run bash tests/functional/grpo_megatron.sh
This is intentionally small. It exercises NeMo-RL's external RL loop without making Megatron-Bridge own rollout scheduling, rewards, checkpoint cadence, or trainer state.
Level 0 is not a convergence test. It only proves the job can complete a small number of updates. Use Level 6 when the question is whether the model actually learns under NeMo-RL.
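As a rough illustration of the "start with Level 0 and add only the levels justified by the change" rule, the sketch below maps change areas to coverage levels. The tag names and the `select_levels` helper are hypothetical and exist only for this example; they are not part of NeMo-RL or Megatron-Bridge.

```python
# Hypothetical change-area tags; level numbers follow the coverage table.
LEVEL_TRIGGERS = {
    1: {"checkpoint", "hf_export", "optimizer_state", "resume", "peft"},
    2: {"hf_export", "weight_mapping", "refit", "delta_compression",
        "packed_transfer", "vllm_update"},
    3: {"megatron_generation"},
    4: {"parallelism", "sequence_parallel", "moe_dispatch",
        "pipeline_layout", "distributed_optimizer"},
    5: {"vlm", "audio", "moe", "mtp", "quantization", "custom_layers"},
    6: {"optimizer", "scheduler", "loss", "reward", "gradient_flow",
        "stability"},
}


def select_levels(changed_areas):
    """Return the sorted coverage levels to run; Level 0 is always included."""
    changed = set(changed_areas)
    levels = [0]
    for level, triggers in sorted(LEVEL_TRIGGERS.items()):
        if changed & triggers:  # any overlap justifies the level
            levels.append(level)
    return levels
```

A change tagged only `hf_export` would select Levels 0, 1, and 2, which matches the table: HF export appears in the "Required when" column of both Level 1 and Level 2.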
Repos
Use explicit repo variables. Do not rely on an installed `megatron-bridge` wheel; the purpose is to test the current Bridge checkout.
Use the upstream NeMo-RL repository as the default source:
text
https://github.com/NVIDIA-NeMo/RL
If a checkout is not already available, clone it next to the Bridge checkout or into the site's standard workspace:
bash
git clone https://github.com/NVIDIA-NeMo/RL.git /path/to/nemo-rl
Then export both repo variables and put the checkouts on `PYTHONPATH`:
bash
export BRIDGE_REPO=${BRIDGE_REPO:-/path/to/Megatron-Bridge}
export NEMO_RL_REPO=${NEMO_RL_REPO:-/path/to/nemo-rl}
export PYTHONPATH="${BRIDGE_REPO}/src:${BRIDGE_REPO}/3rdparty/Megatron-LM:${NEMO_RL_REPO}:${PYTHONPATH:-}"
NeMo-RL checkouts often also contain a vendored Bridge tree under:
text
3rdparty/Megatron-Bridge-workspace/Megatron-Bridge
When testing a local Bridge change, either put the local Bridge checkout ahead of everything else in `PYTHONPATH`, or sync the exact local Bridge changes into that vendored checkout. Do not assume the vendored tree matches the Bridge PR under test.
Before running, record both states:
bash
git -C "$BRIDGE_REPO" status --short
git -C "$NEMO_RL_REPO" status --short
git -C "$BRIDGE_REPO" rev-parse --short HEAD
git -C "$NEMO_RL_REPO" rev-parse --short HEAD
If testing on a remote GPU machine, sync the exact local changes first. Do not reset or overwrite unrelated changes in either tree.
Verify that Python imports the checkout under test:
bash
python - <<'PY'
import megatron.bridge
print(megatron.bridge.__file__)
PY
The printed path must live under `$BRIDGE_REPO/src`, or under the NeMo-RL vendored Bridge checkout only if that vendored checkout was intentionally synced to the Bridge change. If it points at site-packages or an unexpected 3rdparty path, fix `PYTHONPATH` before trusting any result.
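The path check above can also be automated rather than eyeballed. The following is a minimal sketch, assuming the caller passes in the module file and the expected root; `is_under` is a hypothetical helper name, not something provided by either repository.

```python
from pathlib import Path


def is_under(module_file, root):
    """Return True if module_file resolves to a location inside root."""
    module_path = Path(module_file).resolve()
    root_path = Path(root).resolve()
    # Compare whole path components so that /x/src2 is not mistaken for /x/src.
    return module_path == root_path or root_path in module_path.parents
```

In a real session this would be called as `is_under(megatron.bridge.__file__, os.path.join(os.environ["BRIDGE_REPO"], "src"))`, failing the run early when the result is false.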
Bridge Checks First
Run focused Bridge tests before the external NeMo-RL e2e. Include any model-specific tests added by the change.
bash
cd "$BRIDGE_REPO"
uv run python -m pytest -q \
tests/unit_tests/models/test_model_provider_mixin.py \
tests/unit_tests/models/test_param_mapping.py \
tests/unit_tests/training/test_integration.py \
<model-specific-test-paths>
For a new model family, also run the relevant conversion or roundtrip test from the model's PR. See @skills/adding-model-support/tests-and-examples.md for model-test patterns.
Minimum Bridge-side evidence for a new model/provider:
- provider/config unit tests
- parameter mapping tests
- HF to Megatron import or roundtrip on a small model
- model-specific generation or logits comparison when available
- this NeMo-RL external-loop smoke after the above pass
NeMo-RL Unit Checks
Run the NeMo-RL unit checks that match the surface being exercised. Keep them focused; the e2e is the expensive signal.
bash
cd "$NEMO_RL_REPO"
uv run pytest -q \
tests/unit/models/megatron/test_megatron_setup.py \
tests/unit/models/policy/test_megatron_worker.py \
tests/unit/utils/test_weight_transfer.py
For checkpoint changes, add:
bash
uv run pytest -q \
tests/unit/utils/test_checkpoint.py \
tests/unit/utils/test_native_checkpoint.py
For vLLM refit or generation-worker changes, add the relevant vLLM unit tests:
bash
uv run pytest -q \
tests/unit/models/generation/test_vllm_generation.py \
tests/unit/models/generation/test_vllm_utils.py
Model Choice
Prefer the smallest public HF checkpoint that uses the changed provider family. The maintained Megatron GRPO functional uses `Qwen/Qwen2.5-0.5B` because it is small enough for a 2-GPU smoke and is supported by NeMo-RL's Megatron path.
If there is no small public checkpoint for the new architecture, use the closest NeMo-RL recipe that constructs the model with a minimal config or small local checkpoint, and report that the run validates construction/training mechanics rather than pretrained weight compatibility.
For VLM or audio models, a text-only GRPO smoke is not enough. Pair the Level 0 policy smoke with the relevant NeMo-RL VLM/audio functional, for example:
bash
uv run bash tests/functional/vlm_grpo.sh
uv run bash tests/functional/audio_grpo_megatron.sh
For MoE models, Level 0 with trivial expert parallelism catches many provider issues, but it does not stress expert routing. Add a Level 4 run with expert parallelism when the change touches expert layout, dispatcher config, router behavior, or expert tensor parallelism.
For MTP/draft models, use an Eagle/MTP-specific functional:
bash
uv run bash tests/functional/grpo_megatron_eagle3_online.sh
For FP8/QAT/ModelOpt or quantized checkpoint support, use the closest recipe or functional that explicitly enables the feature. Do not claim the generic GRPO smoke validated quantization unless the config turns it on.
Environment Setup
Use the NeMo-RL development environment or the site-approved NeMo-RL container. Make caches explicit on shared clusters:
bash
export HF_HOME=${HF_HOME:-/scratch/$USER/nemo_rl_hf}
export HF_HUB_CACHE=$HF_HOME/hub
export NEMO_RL_HOME=${NEMO_RL_HOME:-$NEMO_RL_REPO}
export PYTHONPATH="${BRIDGE_REPO}/src:${BRIDGE_REPO}/3rdparty/Megatron-LM:${NEMO_RL_REPO}:${PYTHONPATH:-}"
If the container has a dependency fingerprint mismatch, note it in the report. Prefer rebuilding the container or virtualenv when possible; use environment overrides only as test-environment evidence, not repository changes.
If model downloads fail with `No space left on device`, move `HF_HOME`, `HF_HUB_CACHE`, and any local `MODEL_PATH` to a larger shared or node-local path.
If Hugging Face API calls fail with rate limits after the model is already cached, point both the model and tokenizer at the local snapshot and run offline:
bash
export MODEL_PATH=/scratch/$USER/hf/hub/models--<org>--<model>/snapshots/<snapshot-sha>
export HF_HOME=/scratch/$USER/hf
export HF_HUB_CACHE=$HF_HOME/hub
export HF_HUB_OFFLINE=1
export TRANSFORMERS_OFFLINE=1
Then pass both overrides to NeMo-RL:
bash
policy.model_name="$MODEL_PATH" \
policy.tokenizer.name="$MODEL_PATH"
Before trusting the snapshot, verify it loads locally:
bash
uv run python - <<'PY'
from transformers import AutoConfig, AutoTokenizer
path = "<local-snapshot-path>"
config = AutoConfig.from_pretrained(path, trust_remote_code=True, local_files_only=True)
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, local_files_only=True)
print(type(config).__name__, getattr(config, "model_type", None), type(tokenizer).__name__, tokenizer.vocab_size)
PY
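The `MODEL_PATH` pattern used above (`models--<org>--<model>/snapshots/<snapshot-sha>`) follows the Hugging Face hub cache layout. As a small sketch, the path can be assembled from a repo id instead of typed by hand; `snapshot_dir` is a hypothetical helper, and the snapshot SHA still has to be read from the cache directory itself.

```python
from pathlib import Path


def snapshot_dir(hub_cache, repo_id, snapshot_sha):
    """Build <hub_cache>/models--<org>--<model>/snapshots/<sha> for a model repo."""
    # The hub cache flattens "org/model" into "models--org--model".
    folder = "models--" + repo_id.replace("/", "--")
    return Path(hub_cache) / folder / "snapshots" / snapshot_sha
```

The result is only a candidate path; the local load check above remains the actual proof that the snapshot is complete.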
Minimal NeMo-RL Run
Use NeMo-RL's maintained functional wrapper for the default smoke:
bash
cd "$NEMO_RL_REPO"
ray stop --force || true
export PYTHONPATH="${BRIDGE_REPO}/src:${BRIDGE_REPO}/3rdparty/Megatron-LM:${NEMO_RL_REPO}:${PYTHONPATH:-}"
uv run bash tests/functional/grpo_megatron.sh
The wrapper writes:
text
tests/functional/grpo_megatron/run.log
tests/functional/grpo_megatron/metrics.json
Capture the exact command and keep the log path. Prefer a saved log over a pasted terminal excerpt in PR descriptions.
If the test needs a different provider or model, pass Hydra overrides through the wrapper:
bash
uv run bash tests/functional/grpo_megatron.sh \
policy.model_name=<small-compatible-hf-model> \
policy.megatron_cfg.converter_type=<BridgeConverterType>
Keep the first smoke small. Increase model size or parallelism only after a small run proves the basic path works.
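For the report, a compact summary of `metrics.json` is often easier to read than the raw file. The sketch below assumes the file is a JSON object mapping each metric name to either a `{step: value}` mapping or a plain list of values; that layout is an assumption and should be checked against the NeMo-RL checkout in use. The function names are hypothetical.

```python
import json


def series_values(series):
    """Accept {step: value} mappings or plain lists (layout is an assumption)."""
    raw = series.values() if isinstance(series, dict) else series
    return [float(v) for v in raw]


def summarize_metrics(data):
    """Return {metric: (count, min, max)} for every non-empty series."""
    summary = {}
    for name, series in sorted(data.items()):
        values = series_values(series)
        if values:
            summary[name] = (len(values), min(values), max(values))
    return summary


def load_and_summarize(path):
    """Read a metrics JSON file from disk and summarize it."""
    with open(path) as handle:
        return summarize_metrics(json.load(handle))
```

Calling `load_and_summarize("tests/functional/grpo_megatron/metrics.json")` after the wrapper finishes gives one line per metric that can be attached next to the saved log path.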
LoRA And Checkpoint Coverage
Use Level 1 when the change touches checkpoint save/load, HF export, optimizer state, resume behavior, or a NeMo-RL PEFT path that is known to work in the checkout being tested.
NeMo-RL PEFT support is backend- and revision-dependent. Do not block a provider-only compatibility smoke solely on a known-broken or unsupported NeMo-RL PEFT path. In that case, record Level 1 PEFT as not applicable or blocked by NeMo-RL, keep the Level 0 GRPO smoke as the required downstream signal, and cover Bridge PEFT behavior with focused Bridge tests.
LoRA + checkpoint save smoke, when the NeMo-RL checkout supports this path:
bash
uv run bash tests/functional/grpo_megatron_lora.sh
SFT resume parity across dtensor and Megatron policy paths:
bash
uv run bash tests/functional/sft_resume_diamond.sh
The LoRA functional intentionally saves checkpoints. Remove stale checkpoint outputs between unrelated experiments, but keep them while validating resume behavior.
Do not claim PEFT coverage from `grpo_megatron.sh`; use the LoRA functional or an equivalent Hydra override with `policy.megatron_cfg.peft.enabled=true`.
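One way to keep a report honest about PEFT coverage is to check the recorded command for the override before making the claim. This is a minimal sketch with hypothetical helper names; it only inspects command-line `key=value` overrides and cannot see defaults set inside a config file or a functional wrapper.

```python
def parse_overrides(args):
    """Split 'key=value' strings into a dict; later entries win."""
    overrides = {}
    for arg in args:
        key, sep, value = arg.partition("=")
        if sep:  # ignore arguments that are not key=value pairs
            overrides[key.strip()] = value.strip()
    return overrides


def peft_enabled(args):
    """True only when the PEFT override is explicitly set to true."""
    value = parse_overrides(args).get("policy.megatron_cfg.peft.enabled", "false")
    return value.lower() == "true"
```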
Non-Colocated vLLM Refit
Use Level 2 when the change touches Bridge HF export, parameter mapping, NeMo-RL weight refit, packed tensor transfer, vLLM loading, delta compression, or policy/generation worker synchronization.
Small 2-GPU non-colocated smoke with the Megatron policy backend:
bash
cd "$NEMO_RL_REPO"
uv run coverage run -a --data-file=tests/.coverage --source=nemo_rl \
examples/run_grpo.py \
--config examples/configs/grpo_math_1B_megatron.yaml \
policy.model_name=Qwen/Qwen2.5-0.5B \
grpo.num_prompts_per_step=2 \
grpo.num_generations_per_prompt=4 \
policy.train_global_batch_size=4 \
policy.train_micro_batch_size=1 \
policy.logprob_batch_size=4 \
policy.generation.colocated.enabled=false \
policy.generation.colocated.resources.gpus_per_node=1 \
policy.generation.vllm_cfg.async_engine=true \
cluster.gpus_per_node=2 \
grpo.max_num_steps=2 \
logger.tensorboard_enabled=true \
logger.log_dir=tests/functional/grpo_megatron_non_colocated/logs \
logger.wandb_enabled=false \
checkpointing.enabled=false
After the run, dump metrics:
bash
uv run tests/json_dump_tb_logs.py \
tests/functional/grpo_megatron_non_colocated/logs \
--output_path tests/functional/grpo_megatron_non_colocated/metrics.json
Metric assertion helpers differ across NeMo-RL revisions. Inspect `tests/check_metrics.py` or the maintained functional wrapper before assuming an interface. Some checkouts expect positional expressions:
bash
uv run tests/check_metrics.py tests/functional/grpo_megatron_non_colocated/metrics.json \
'max(data["train/token_mult_prob_error"]) < 1.05' \
'min(data["train/probs_ratio_clamped_min"]) > 0.79' \
'max(data["train/probs_ratio_clamped_max"]) < 1.21'
For delta-compression testing, add these overrides:
bash
policy.generation.delta_compression.enabled=true \
policy.generation.delta_compression.dtype=bfloat16 \
policy.generation.delta_compression.transport=sparse_indices \
policy.generation.delta_compression.full_sync_interval=20 \
policy.generation.delta_compression.sparse_bucket_size_bytes=5368709120 \
policy.generation.delta_compression.delta_load_batch_size_bytes=536870912
Report weight-transfer timing metrics when available, especially:
- timing/train/prepare_for_generation/total
- timing/train/prepare_for_generation/transfer_and_update_weights
- timing/train/prepare_for_generation/weight_transfer/producer/collect_tensors
- timing/train/prepare_for_generation/weight_transfer/producer/sparse_encode
- timing/train/prepare_for_generation/weight_transfer/producer/sparse_nonzero
- timing/train/prepare_for_generation/weight_transfer/consumer/decode_sparse
- timing/train/prepare_for_generation/weight_transfer/consumer/load_delta
If the payload broadcast time is tiny but sparse encode/decode dominates, report that boundary clearly. It is a weight-preparation bottleneck, not a NCCL broadcast bottleneck.
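To make the "sparse encode/decode dominates" judgment concrete, the share of refit time spent in the sparse steps can be computed from the timing metrics. The sketch below is illustrative only: it assumes one scalar value per metric for a given step, measured in the same unit, and `sparse_share` is a hypothetical helper. Producer and consumer timers may overlap in wall-clock time, so the ratio is a rough indicator rather than an exact breakdown.

```python
PREFIX = "timing/train/prepare_for_generation/"
SPARSE_KEYS = (
    "weight_transfer/producer/sparse_encode",
    "weight_transfer/producer/sparse_nonzero",
    "weight_transfer/consumer/decode_sparse",
)


def sparse_share(timings):
    """Fraction of total prepare_for_generation time spent in sparse steps.

    Returns None when the total is missing or not positive.
    """
    total = timings.get(PREFIX + "total", 0.0)
    if total <= 0:
        return None
    sparse = sum(timings.get(PREFIX + key, 0.0) for key in SPARSE_KEYS)
    return sparse / total
```

A high share alongside a small transfer time supports reporting the result as a weight-preparation bottleneck.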
Megatron Generation Backend
Use Level 3 only when the NeMo-RL checkout under test supports the Megatron generation backend and the Bridge change explicitly affects that downstream path. Do not require this for normal provider compatibility, HF import/export, vLLM-backed generation, or generic Bridge inference tests.
bash
uv run bash tests/functional/grpo_megatron_generation.sh
This exercises `policy.generation.backend=megatron`, so it validates NeMo-RL's Megatron generation construction and runtime behavior more directly than the default vLLM-backed GRPO functional.
Some NeMo-RL revisions declare the `mcore` and `vllm` extras as mutually incompatible. In that environment, a vLLM-backed Level 0 run may be blocked even though the Megatron policy path is testable. Use `policy.generation.backend=megatron` for a Megatron-only smoke, record vLLM as skipped or blocked, and do not claim non-colocated vLLM refit coverage.
Parallelism Stress
Use Level 4 when provider finalization, model-parallel settings, sequence parallel, context parallel, MoE dispatch, pipeline layout, or distributed optimizer behavior changed.
Start from a maintained recipe that already matches the intended GPU count. For example, use one of the recipe configs under:
text
examples/configs/recipes/llm/*megatron*.yaml
examples/configs/recipes/llm/performance/*megatron*.yaml
examples/configs/recipes/vlm/*megatron*.yaml
For a small manual stress variant, override the Megatron sizes explicitly:
bash
uv run bash tests/functional/grpo_megatron.sh \
policy.megatron_cfg.tensor_model_parallel_size=2 \
policy.megatron_cfg.pipeline_model_parallel_size=1 \
policy.megatron_cfg.context_parallel_size=1 \
policy.megatron_cfg.sequence_parallel=false \
cluster.gpus_per_node=2
For MoE, use a MoE recipe and set expert parallelism only when the model and GPU count support it:
bash
policy.megatron_cfg.expert_model_parallel_size=2 \
policy.megatron_cfg.expert_tensor_parallel_size=1
Keep these as follow-up runs. Do not make them the first debugging surface for a new provider.
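Before launching a stress variant, it can help to confirm that the GPU count is consistent with the chosen sizes. In Megatron-style layouts the tensor, pipeline, and context parallel sizes must multiply to a divisor of the world size, and the quotient is the data-parallel size. The helper below is a hypothetical sketch that deliberately leaves out expert parallelism.

```python
def data_parallel_size(world_size, tp=1, pp=1, cp=1):
    """Return the data-parallel size implied by world_size and TP/PP/CP."""
    if min(world_size, tp, pp, cp) < 1:
        raise ValueError("all sizes must be >= 1")
    model_parallel = tp * pp * cp
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by tp*pp*cp={model_parallel}"
        )
    return world_size // model_parallel
```

For the 2-GPU override shown above (TP=2, PP=1, CP=1) this gives a data-parallel size of 1, so the run exercises tensor parallelism only.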
Learning Signal
Use Level 6 only when the change affects trainability or when downstream validation explicitly asks for learning behavior. Do not require it for every provider-only PR; RL learning is slower, noisier, and more environment-dependent than compatibility smoke tests.
The goal is a short learning-signal run, not a benchmark. Prefer a small model, fixed data, fixed seed when available, and enough steps to observe non-random metric movement:
bash
uv run bash tests/functional/grpo_megatron_lora.sh \
grpo.max_num_steps=20 \
data.shuffle=false \
checkpointing.enabled=false
Acceptable learning-signal evidence depends on the task, but the report should include at least:
- no NaNs or infs in loss, reward, KL, entropy, grad norm, or logprob metrics
- nonzero trainable parameter count when PEFT is enabled
- actor losses and reward-related metrics logged for multiple steps
- validation or reward trend compared against the starting point or a known-good baseline
- no repeated zero gradients, frozen LoRA adapters, or constant logprobs unless expected
Do not call a 20-step run "converged" in the benchmark sense. Call it "learning-signal passed" unless it reaches a pre-agreed metric threshold.
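Two of the checks in the list above, no NaNs or infs and no unexpectedly constant series, are mechanical enough to script. The following sketch takes a metric name and its per-step values as a plain list; `signal_problems` and the `tol` parameter are hypothetical, and whether a constant series is a real problem still depends on the metric.

```python
import math


def signal_problems(name, values, tol=0.0):
    """Return a list of human-readable problems found in one metric series."""
    if not values:
        return [f"{name}: no values logged"]
    problems = []
    if any(not math.isfinite(v) for v in values):
        problems.append(f"{name}: contains NaN or inf")
    finite = [v for v in values if math.isfinite(v)]
    # A series that never moves suggests zero gradients or frozen adapters.
    if len(finite) > 1 and max(finite) - min(finite) <= tol:
        problems.append(f"{name}: constant across {len(finite)} steps")
    return problems
```

An empty result for every tracked metric is necessary but not sufficient; the trend comparison against a starting point or baseline still needs a human read.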
Slurm Or Container Runs
Use the cluster's standard NeMo-RL container and mount both checkouts into the container. Keep setup and the actual run in the same container step when using node-local paths such as `/tmp`; node-local model caches and ad-hoc installs disappear when a fresh container step starts.
If the home filesystem is full or Megatron-Core tries to build helper extensions into a read-only/full checkout, copy the MCore submodule to node-local storage and put that copy on `PYTHONPATH` instead of editing `3rdparty/Megatron-LM/`:
bash
export MCORE_REPO=${MCORE_REPO:-/tmp/$USER/Megatron-LM}
if [[ ! -d "$MCORE_REPO/.git" ]]; then
cp -a "$BRIDGE_REPO/3rdparty/Megatron-LM" "$MCORE_REPO"
fi
EXT_SUFFIX=$(uv run python - <<'PY'
import sysconfig
print(sysconfig.get_config_var("EXT_SUFFIX") or ".so")
PY
)
make -C "$MCORE_REPO/megatron/core/datasets" LIBEXT="$EXT_SUFFIX"
export PYTHONPATH="${BRIDGE_REPO}/src:${MCORE_REPO}:${NEMO_RL_REPO}:${PYTHONPATH:-}"
Overriding `LIBEXT` avoids a suffixless `helpers_cpp` binary on containers where `python3-config` is absent from `PATH`. Verify the built file is named like `helpers_cpp.cpython-<ver>-<platform>.so` before launching a long run.
For NeMo-RL multi-node jobs, prefer NeMo-RL's own `ray.sub` launcher when it is available. It starts the Ray head and worker nodes under Slurm, mounts the requested container/filesystems, and executes `COMMAND` from the NeMo-RL root. Launch it from `$NEMO_RL_REPO`, not from the Bridge checkout:
bash
cd "$NEMO_RL_REPO"
COMMAND="uv run ./examples/run_grpo.py \
--config examples/configs/grpo_math_1B_megatron.yaml \
cluster.num_nodes=2 \
cluster.gpus_per_node=8 \
logger.log_dir=results/grpo_megatron_2n \
logger.wandb_enabled=false" \
CONTAINER="$NEMO_RL_IMAGE" \
MOUNTS="$BRIDGE_REPO:$BRIDGE_REPO,$NEMO_RL_REPO:$NEMO_RL_REPO,$HF_HOME:$HF_HOME" \
sbatch \
--nodes=2 \
--account=<account> \
--partition=<partition> \
--job-name=nemo-rl-bridge-e2e \
--time=4:00:00 \
--gres=gpu:8 \
ray.sub
Include the local Bridge checkout in `MOUNTS` and in `PYTHONPATH` inside `COMMAND` when the container does not already see the same path. If using a vendored Bridge under `3rdparty/Megatron-Bridge-workspace/Megatron-Bridge`, sync the exact Bridge changes there instead and report that path.
Use a direct `srun` only when `ray.sub` is unavailable, stale for the target cluster, or when debugging the container/Slurm layer itself. Keep paths generic in scripts committed to Megatron-Bridge:
bash
srun <site-specific-slurm-options> \
--container-image="${NEMO_RL_IMAGE}" \
--container-mounts="${BRIDGE_REPO}:/workspace/Megatron-Bridge,${NEMO_RL_REPO}:/workspace/nemo-rl,<data-root>:<data-root>" \
--container-workdir=/workspace/nemo-rl \
bash -lc '
export BRIDGE_REPO=/workspace/Megatron-Bridge
export NEMO_RL_REPO=/workspace/nemo-rl
export PYTHONPATH=$BRIDGE_REPO/src:$BRIDGE_REPO/3rdparty/Megatron-LM:$NEMO_RL_REPO
ray stop --force || true
uv run bash tests/functional/grpo_megatron.sh
'
If an attach helper enters a container that no longer sees the expected checkouts or log directory, treat that helper as stale. Start a fresh `srun` step against the existing allocation with explicit `--container-image`, `--container-mounts`, and `--container-workdir`.
Attach helpers that use `--no-container-mount-home` can enter a minimal `/home/$USER` in follow-up steps even when the original run saw the real checkout. Keep metric dumping and assertions in the same container step as the run when possible. If a follow-up step must inspect compute-local artifacts, use paths under the node-local run directory and do not assume `$NEMO_RL_REPO` is visible.
For general Slurm debugging and multi-node patterns, read @skills/multi-node-slurm/SKILL.md.
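The helper-extension naming check described earlier in this section can be scripted so that a suffixless build is caught before a long job starts. This is a sketch under the assumption that the built file lands directly in the datasets directory passed in; `check_helper` is a hypothetical name.

```python
import sysconfig
from pathlib import Path


def check_helper(datasets_dir, stem="helpers_cpp"):
    """Return (expected_path, exists) for the compiled helper extension."""
    # Same suffix lookup as the shell snippet: fall back to ".so" if unset.
    suffix = sysconfig.get_config_var("EXT_SUFFIX") or ".so"
    expected = Path(datasets_dir) / (stem + suffix)
    return expected, expected.is_file()
```

Running it against `$MCORE_REPO/megatron/core/datasets` with the same interpreter that will run the job matters, because the expected suffix depends on that interpreter's version and platform.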
Pass Criteria
A useful pass has all of the following:
- Focused Bridge tests pass for provider/config/mapping behavior.
- NeMo-RL imports the intended Bridge checkout, verified by `megatron.bridge.__file__`.
- The NeMo-RL config has `policy.megatron_cfg.enabled=true` for Megatron policy validation.
- The run reaches the requested step count and writes `metrics.json`.
- `tests/check_metrics.py` passes when the maintained functional test includes metric assertions.
- No exception occurs during Bridge provider setup, HF import/export, enabled PEFT/LoRA wrapping, Megatron initialization, optimizer setup, checkpoint manager setup, weight transfer, or the training step.
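The import check in the list above can be scripted. This is a minimal sketch, assuming the check runs in the same environment as the NeMo-RL job; `module_origin` and `is_within` are hypothetical helper names, and `/workspace/Megatron-Bridge` is a placeholder for the real checkout path.

```python
import importlib.util
from pathlib import Path


def module_origin(name):
    """Return the file a module would be imported from, or None if unavailable."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # Raised when a parent package (for example `megatron`) is missing.
        return None
    return None if spec is None else spec.origin


def is_within(path, root):
    """True if `path` lies inside the directory tree rooted at `root`."""
    try:
        Path(path).resolve().relative_to(Path(root).resolve())
        return True
    except ValueError:
        return False


# Placeholder checkout location; replace with the value of $BRIDGE_REPO.
bridge_repo = "/workspace/Megatron-Bridge"
origin = module_origin("megatron.bridge")
if origin is None:
    print("megatron.bridge is not importable in this environment")
else:
    where = "local checkout" if is_within(origin, bridge_repo) else "other install"
    print(origin, "->", where)
```

The comparison is done on path components, so a sibling directory such as `Megatron-Bridge-old` is not mistaken for the checkout under test.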
Ray shutdown warnings, Python resource-tracker warnings, or post-completion process-group warnings can be acceptable if the training step completed, metrics were written, and the process exits successfully. Mention them as residual log noise.
Do not claim full model e2e if the run used a dummy config, text-only data for a VLM/audio model, trivial expert parallelism for an expert-parallel change, or disabled save/resume for a checkpointing change. Call it the exact level that passed.
Do not claim convergence from Level 0. Claim learning signal only from Level 6, and distinguish "learning signal" from benchmark convergence in the report.
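The step-count criterion can also be checked mechanically once `metrics.json` exists. The sketch below ASSUMES the file maps each metric name to a `{step: value}` mapping; that layout is an assumption, so confirm it against the file your NeMo-RL checkout actually writes. The helper names are hypothetical.

```python
import json


def last_logged_step(metrics, name):
    """Largest step recorded for metric `name`, or None if it is absent."""
    steps = metrics.get(name, {})
    return max((int(s) for s in steps), default=None)


def reached_step(metrics, name, requested):
    """True when metric `name` was logged at or beyond the requested step."""
    last = last_logged_step(metrics, name)
    return last is not None and last >= requested


# Inline sample in the ASSUMED layout: {"metric": {"step": value}}.
sample = json.loads('{"train/loss": {"1": 0.9, "2": 0.7, "3": 0.6}}')
print(reached_step(sample, "train/loss", 3))  # True
print(reached_step(sample, "train/loss", 4))  # False
```

A check like this only shows that the run got far enough to log the step; it says nothing about convergence or learning signal.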
Failure Triage
If model construction fails, verify that NeMo-RL is importing the Bridge checkout under test and that `policy.megatron_cfg.converter_type` matches the provider.
If the config silently uses dtensor instead of Megatron, set `policy.dtensor_cfg.enabled=false` and `policy.megatron_cfg.enabled=true`, or use `grpo_megatron.sh`.
If LoRA fails, check NeMo-RL PEFT config names and Bridge target module names. Reproduce with `grpo_megatron_lora.sh` before adding larger model or parallelism changes.
If checkpoint save/load fails, first rerun with `checkpointing.enabled=false` to separate model construction from checkpoint behavior, then use `sft_resume_diamond.sh` for resume parity.
If non-colocated refit fails, separate the boundary:
- producer export and metadata preparation on the policy worker
- payload packing/broadcast
- consumer decode and model loading on the generation worker
- vLLM-specific weight-loader behavior
If NeMo-RL rejects TP >= 4 with the batch-variant accuracy guard, prefer TP 1 or 2 for the smoke test, or set `policy.train_micro_batch_size` and `policy.logprob_batch_size` equal. Do not bypass with `NRL_IGNORE_TP_ACCURACY_CHECK=1` for pass/fail evidence unless the user explicitly wants an unsupported diagnostic run.
If Megatron generation fails during `cuda graph warmup` with `CUDA error: an illegal memory access was encountered`, rerun the same config with:
policy.generation.mcore_generation_config.num_cuda_graphs=null \
policy.generation.mcore_generation_config.use_cuda_graphs_for_non_decode_steps=false
If the no-graph run passes, report the original result as a Megatron generation CUDA-graph failure and the no-graph run as a reduced-optimization pass. Keep both logs.
If the run reaches the requested step count but fails `tests/check_metrics.py` on `train/token_mult_prob_error`, treat it as a real metric failure, not a harness failure. NeMo-RL computes this metric from `exp(abs(generation_logprobs - prev_logprobs))`; huge values mean the generation backend logprobs disagree with the policy logprobs recomputed for training. Isolate by retrying with simpler parallelism or kernels such as `policy.megatron_cfg.sequence_parallel=false`, `policy.megatron_cfg.apply_rope_fusion=false`, shorter sequence lengths, or vLLM generation when available. Do not relax the metric threshold or use sequence masking to claim a pass; run Bridge logits/import/export parity to localize whether the mismatch is in Bridge conversion, Megatron generation logprob collection, or NeMo-RL recomputation.
If model download fails, move HF caches to a larger path and rerun with explicit cache settings.
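A few numbers make the `train/token_mult_prob_error` expression concrete. This sketch averages the per-token values; the mean over tokens and the absence of masking are assumptions, since the text above gives only the per-token expression.

```python
import math


def token_mult_prob_error(generation_logprobs, prev_logprobs):
    """Mean over tokens of exp(|generation_logprob - prev_logprob|)."""
    if not generation_logprobs or len(generation_logprobs) != len(prev_logprobs):
        raise ValueError("need two non-empty logprob sequences of equal length")
    errors = [math.exp(abs(g - p)) for g, p in zip(generation_logprobs, prev_logprobs)]
    return sum(errors) / len(errors)


# Identical logprobs give exactly 1.0, the smallest possible value.
agree = token_mult_prob_error([-0.5, -1.2, -2.0], [-0.5, -1.2, -2.0])

# One token that differs by 3 nats dominates the mean: (1 + 1 + e^3) / 3.
disagree = token_mult_prob_error([-0.5, -1.2, -2.0], [-0.5, -1.2, -5.0])

print(agree, disagree)
```

Because the error is exponential in the logprob gap, a handful of badly mismatched tokens is enough to push the metric far above 1, which is why a large value points at a real generation/training disagreement.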
If Hugging Face returns `429 Too Many Requests` during tokenizer/config setup, first check whether the snapshot already exists under `$HF_HUB_CACHE`. If it does, switch `policy.model_name` and `policy.tokenizer.name` to the local snapshot path and enable offline mode. This is an environment failure unless the local snapshot cannot load with `local_files_only=True`.
If `helpers_cpp` fails to link with `No space left on device`, or if logs show `make: python3-config: No such file or directory`, rebuild the helper in a node-local copy of Megatron-LM with `LIBEXT` set from `sysconfig.get_config_var("EXT_SUFFIX")`. Do not patch files under `3rdparty/Megatron-LM/` in the Bridge checkout.
If a baseline fails before model build because of data, Ray, vLLM, package setup, or container mismatch, fix the environment first and do not report it as a Bridge provider failure.
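For the `429 Too Many Requests` case above, the first step is finding out whether a snapshot is already cached. The sketch below relies on the standard Hugging Face hub cache layout (`models--<org>--<name>/snapshots/<revision>/`); the helper name `local_snapshots` and the repo id `example-org/example-model` are made up for illustration.

```python
import tempfile
from pathlib import Path


def local_snapshots(hub_cache, repo_id):
    """List snapshot directories cached for a model repo such as 'org/name'."""
    repo_dir = Path(hub_cache) / ("models--" + repo_id.replace("/", "--"))
    snapshots = repo_dir / "snapshots"
    if not snapshots.is_dir():
        return []
    return sorted(p for p in snapshots.iterdir() if p.is_dir())


# Demo against a throwaway cache directory instead of the real $HF_HUB_CACHE.
with tempfile.TemporaryDirectory() as cache:
    rev = Path(cache) / "models--example-org--example-model" / "snapshots" / "abc123"
    rev.mkdir(parents=True)
    found = [p.name for p in local_snapshots(cache, "example-org/example-model")]
    absent = local_snapshots(cache, "example-org/other-model")

print(found, absent)  # ['abc123'] []
```

When a snapshot directory is found, its path is the value to use for `policy.model_name` and `policy.tokenizer.name`, and setting `HF_HUB_OFFLINE=1` keeps the Hugging Face libraries from contacting the hub.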
Summary Format
End every run with a short user-facing summary that answers "Did the requested deliverables pass?" before adding details. Use `Pass`, `Fail`, `Skipped`, or `Blocked` for each deliverable, and do not report an overall `Pass` unless the pass criteria for the requested coverage level were met.
Result: <Pass/Fail/Blocked> - <one sentence stating what was validated>
Requested coverage: <Level 0/1/2/3/4/5/6 and requested variants>
Model: <policy.model_name or local model path>
Deliverables:
- Bridge-side checks: <Pass/Fail/Skipped> - <test command or skipped reason>
- Local Bridge import in NeMo-RL: <Pass/Fail> - <megatron.bridge.__file__ path>
- NeMo-RL Megatron policy run: <Pass/Fail/Skipped> - <GRPO Megatron or requested variant>
- Requested variants: <Pass/Fail/Skipped/Not requested> - <LoRA/checkpoint, non-colocated vLLM refit, Megatron generation, parallelism stress, architecture-specific, or learning-signal>
- Metrics/log capture: <Pass/Fail> - <log path, metrics path, and metric assertion status>
Evidence:
- Bridge repo: <commit> plus dirty files
- NeMo-RL repo: <commit> plus dirty files
- Command: <exact command or script path>
- Key lines: <policy.megatron_cfg.enabled=true, step completion, metrics.json creation, tests/check_metrics.py result, or the first relevant error>
Limitations:
- <dummy model, skipped save/resume, text-only VLM/audio smoke, trivial EP, no learning-signal claim, known shutdown warnings, etc.>
Follow-ups:
- <needed rerun, environment fix, provider fix, NeMo-RL issue, or "none">
If the job is blocked before Bridge model/provider construction by data, Ray, vLLM, dependency, disk, container, or cluster setup, mark the overall result as `Blocked`, not `Fail`, and state that it is not evidence against the Bridge provider.
If any requested deliverable was not run, mark it `Skipped` or `Not requested` with the reason. Do not leave it implicit in the limitations.
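The rules for the overall result line can be written down as a small decision function. This is a sketch of one reading of those rules, not a tool that ships with either repo; `overall_result` is a hypothetical name, and treating a requested deliverable left as `Skipped` as an overall `Fail` is an assumption the text does not spell out.

```python
def overall_result(deliverables, blocked_before_provider=False):
    """Derive the overall Result value from per-deliverable statuses.

    deliverables: mapping of deliverable name -> "Pass", "Fail", "Skipped",
    or "Not requested".
    """
    if blocked_before_provider:
        # Environment problem before Bridge model/provider construction.
        return "Blocked"
    statuses = list(deliverables.values())
    if statuses and all(s in ("Pass", "Not requested") for s in statuses):
        return "Pass"
    # ASSUMPTION: anything else (a Fail, or a requested item left Skipped)
    # is reported as Fail rather than Pass.
    return "Fail"


run = {
    "Bridge-side checks": "Pass",
    "Local Bridge import in NeMo-RL": "Pass",
    "NeMo-RL Megatron policy run": "Pass",
    "Requested variants": "Not requested",
    "Metrics/log capture": "Pass",
}
print(overall_result(run))                                        # Pass
print(overall_result({**run, "Metrics/log capture": "Skipped"}))  # Fail
print(overall_result(run, blocked_before_provider=True))          # Blocked
```

Checking the blocked case first mirrors the rule that an environment failure before provider construction is not evidence against the Bridge provider, whatever the individual deliverables say.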