verl-e2e-testing


verl E2E Testing


Validate a Megatron-Bridge model addition through verl's Megatron backend. This catches integration issues that Bridge-only conversion tests miss: provider configuration, HF import through Bridge, PEFT wrapping, DDP wrapping, optimizer setup, rollout/ref wiring, and checkpoint ownership by an external RL loop.

Use this as an external compatibility smoke test after the Bridge unit and functional tests for a new model provider are green.

This is not a replacement for Bridge model parity tests. The default verl PPO run proves that the provider can survive an external RL training loop; architecture-specific correctness still comes from Bridge import/export, logits/roundtrip, and model-specific inference tests.


Scope


Think in coverage levels. Start with Level 0 and add only the levels justified by the change.

| Level | Required when | What it proves |
| --- | --- | --- |
| 0: LoRA + DDP smoke | Any new provider or provider config change that claims verl compatibility | verl can import the local Bridge provider, apply PEFT, wrap with Megatron DDP, build optimizer state, run rollout/ref/critic wiring, and finish one PPO step |
| 1: Save/resume | PEFT, checkpointing, HF export, adapter export, optimizer state, or resume behavior changed | verl-owned checkpoint scheduling can save and reload Bridge-built model state |
| 2: Parallelism stress | Provider finalization, mpu-derived settings, TP/PP/CP/EP, sequence parallel, or dispatcher behavior changed | Provider settings remain correct under non-trivial Megatron parallel state |
| 3: Optional Megatron-FSDP | Only when downstream explicitly asks for verl Megatron-FSDP coverage or the change directly touches that integration path | The same provider works when verl selects Megatron-FSDP instead of DDP |
| 4: Architecture-specific e2e | VLM, MoE, MTP, QAT/ModelOpt, quantized weights, or custom layer behavior is involved | The parts of the architecture not exercised by text-only GSM8K also get a targeted runtime check |
| 5: Convergence / learning signal | Optimizer, scheduler, loss, reward, PEFT trainability, gradient flow, or model-specific training stability changed | Metrics move in the expected direction over a short run and do not silently produce zero/NaN/unstable updates |
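The table can also be encoded as a lookup from changed areas to levels, which is useful as a rough planning aid. This is a minimal sketch. `LEVEL_TRIGGERS`, `required_levels`, and the area labels are hypothetical names chosen for illustration. They are not part of verl or Megatron-Bridge.

```python
# Hypothetical mapping from changed areas to coverage levels. The area labels are
# illustrative, not a verl or Megatron-Bridge API; extend them to match your change.
LEVEL_TRIGGERS = {
    1: {"peft", "checkpointing", "hf_export", "adapter_export", "optimizer_state", "resume"},
    2: {"provider_finalization", "mpu", "tp", "pp", "cp", "ep", "sequence_parallel", "dispatcher"},
    3: {"megatron_fsdp"},
    4: {"vlm", "moe", "mtp", "qat", "modelopt", "quantized_weights", "custom_layer"},
    5: {"optimizer", "scheduler", "loss", "reward", "peft_trainability",
        "gradient_flow", "training_stability"},
}


def required_levels(changed_areas):
    """Return the sorted coverage levels justified by the changed areas.

    Level 0 is always included because it is the baseline smoke for any
    provider change that claims verl compatibility.
    """
    areas = {area.lower() for area in changed_areas}
    levels = {0}
    for level, triggers in LEVEL_TRIGGERS.items():
        if areas & triggers:
            levels.add(level)
    return sorted(levels)


print(required_levels({"peft", "moe"}))  # [0, 1, 4]
```

Level 3 remains opt-in: add `megatron_fsdp` to the changed areas only when FSDP is explicitly in scope.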

The default Level 0 target is a short, non-vanilla Bridge run in verl with LoRA enabled and Megatron DDP selected:

```bash
USE_MBRIDGE=True
VANILLA_MBRIDGE=False
VALUE_VANILLA_MBRIDGE=False
LORA_RANK=4
USE_MEGATRON_FSDP=False
TOTAL_TRAIN_STEPS=1
```

This is intentionally small. It exercises the Bridge-facing path in verl without making Megatron-Bridge own rollout scheduling, reward handling, optimizer scheduling, or checkpoint orchestration.

Level 0 is not a convergence test. It only proves the training loop can complete one update. Use Level 5 when the question is whether the model actually learns under verl.

Megatron-FSDP is not part of the default validation expected for current provider compatibility work. Run it only for Level 3 coverage when FSDP is explicitly in scope:

```bash
USE_MEGATRON_FSDP=True
ALL_OFFLOAD=False
COMMON_PP=1
COMMON_VPP=null
COMMON_CP=1
COMMON_TP=1
INFER_TP=1
```


Repos


Use explicit repo variables. Do not rely on an installed `megatron-bridge` wheel; the purpose is to test the current Bridge checkout.

Use the upstream verl repository as the default source:

```text
https://github.com/verl-project/verl
```

If a checkout is not already available, clone it next to the Bridge checkout or into the site's standard workspace:

```bash
git clone https://github.com/verl-project/verl.git /path/to/verl
```

```bash
export BRIDGE_REPO=${BRIDGE_REPO:-/path/to/Megatron-Bridge}
export VERL_REPO=${VERL_REPO:-/path/to/verl}
export PYTHONPATH="${BRIDGE_REPO}/src:${BRIDGE_REPO}/3rdparty/Megatron-LM:${VERL_REPO}:${PYTHONPATH:-}"
```
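The ordering of `PYTHONPATH` is what lets the checkout win over an installed wheel. Earlier entries shadow later ones, and `PYTHONPATH` entries come before site-packages. The sketch below rebuilds the same string in Python, for example when a launcher script needs to pass it into a Ray `runtime_env`. `bridge_pythonpath` is a hypothetical helper. Unlike the shell form, it drops empty entries instead of leaving a trailing separator.

```python
import os


def bridge_pythonpath(bridge_repo, verl_repo, existing=""):
    """Build a PYTHONPATH with the Bridge checkout ahead of any installed wheel.

    Order matches the shell export: Bridge src, vendored Megatron-LM, verl,
    then the previous PYTHONPATH. Empty entries from `existing` are dropped.
    """
    parts = [
        os.path.join(bridge_repo, "src"),
        os.path.join(bridge_repo, "3rdparty", "Megatron-LM"),
        verl_repo,
    ]
    parts.extend(p for p in existing.split(os.pathsep) if p)
    return os.pathsep.join(parts)


print(bridge_pythonpath("/work/Megatron-Bridge", "/work/verl", os.environ.get("PYTHONPATH", "")))
```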

Before running, record both states:

```bash
git -C "$BRIDGE_REPO" status --short
git -C "$VERL_REPO" status --short
git -C "$BRIDGE_REPO" rev-parse --short HEAD
git -C "$VERL_REPO" rev-parse --short HEAD
```
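When the report needs a one-line record of each tree, you can summarize the captured output instead of pasting it raw. This is a minimal sketch. `summarize_repo_state` is a hypothetical helper that only parses text you already collected with the commands above.

```python
def summarize_repo_state(name, short_sha, status_short):
    """Turn `rev-parse --short HEAD` and `status --short` output into one report line.

    Hypothetical helper: lines starting with "??" count as untracked, and every
    other non-empty status line counts as modified or staged.
    """
    changed = [line for line in status_short.splitlines() if line.strip()]
    untracked = sum(1 for line in changed if line.startswith("??"))
    modified = len(changed) - untracked
    if not changed:
        state = "clean"
    else:
        state = f"dirty ({modified} modified/staged, {untracked} untracked)"
    return f"{name} @ {short_sha.strip()}: {state}"


print(summarize_repo_state("Megatron-Bridge", "abc1234\n", " M src/x.py\n?? notes.txt\n"))
```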

If testing on a remote GPU machine, sync the exact local changes first. Do not reset or overwrite unrelated changes in either tree.

Verify that Python imports the checkout under test:

```bash
python - <<'PY'
import megatron.bridge
print(megatron.bridge.__file__)
PY
```

The printed path must live under `$BRIDGE_REPO/src`. If it points at `site-packages`, fix `PYTHONPATH` before trusting any result.
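You can automate the same check in a preflight script. The sketch below compares resolved path components instead of string prefixes, so a sibling directory such as `src2` is not mistaken for `src`. `imported_from_checkout` is a hypothetical name.

```python
from pathlib import Path


def imported_from_checkout(module_file, bridge_repo):
    """Return True if module_file (e.g. megatron.bridge.__file__) is under <bridge_repo>/src.

    Resolved paths are compared component-wise, so a sibling such as
    <bridge_repo>/src2 or a site-packages copy does not count.
    """
    src = Path(bridge_repo, "src").resolve()
    target = Path(module_file).resolve()
    return target == src or src in target.parents
```

Typical use, once `PYTHONPATH` is set: `imported_from_checkout(megatron.bridge.__file__, os.environ["BRIDGE_REPO"])`.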

When running against an existing Ray cluster, the driver import is not enough. Ray workers may not inherit the shell `PYTHONPATH`, so the run can fail later with `ModuleNotFoundError: No module named 'megatron.bridge'` even though the driver import passed. Verify a worker import:

```bash
python - <<'PY'
import os
import ray

ray.init(address="auto", runtime_env={"env_vars": {"PYTHONPATH": os.environ["PYTHONPATH"]}})

@ray.remote
def bridge_path():
    import megatron.bridge
    return megatron.bridge.__file__

print(ray.get(bridge_path.remote()))
PY
```

If the worker import fails, pass the checkout path through Ray's runtime environment when launching the wrapper:

```bash
bash tests/special_e2e/run_ppo_trainer_megatron.sh \
  "++ray_kwargs.ray_init.runtime_env.env_vars.PYTHONPATH=${PYTHONPATH}"
```

If this import fails before model construction, fix the runtime environment first. The official verl image may not contain every Bridge dependency. For example, Bridge imports `modelopt` through `AutoBridge`, so a missing `nvidia-modelopt` package can fail the smoke test before verl exercises the provider:

```bash
python -m pip show nvidia-modelopt || \
  python -m pip install --extra-index-url https://pypi.nvidia.com nvidia-modelopt
```
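Before launching the full wrapper, a quick scan for missing dependencies separates image problems from provider problems. This is a minimal sketch. It assumes `megatron.bridge`, `modelopt`, `ray`, and `vllm` are the import names worth checking in your image. `missing_modules` is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the names in `names` that cannot be located in this environment.

    find_spec on a dotted name imports the parent package first, so a missing
    parent (for example no `megatron` at all) raises; that counts as missing.
    """
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(name)
    return missing
```

Suppose `missing_modules(["megatron.bridge", "modelopt", "ray", "vllm"])` returns `["modelopt"]`. That points at the `nvidia-modelopt` install above rather than at the provider.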

Treat ad-hoc installs as container setup evidence, not repository changes. If the image lacks `uv`, run the focused Bridge unit tests in a Bridge development environment instead of forcing them through the verl container.


Model Choice


Prefer the smallest public HF checkpoint that uses the changed provider family. For example, use a 0.5B or 0.6B dense checkpoint for dense provider changes before testing larger variants.

If there is no small public checkpoint for the new architecture, use verl's dummy-model path with a minimal HF config from that architecture:

```bash
USE_DUMMY_MODEL=True
DUMMY_MODEL_CONFIG_PATH=/path/to/minimal_config.json
MODEL_ID=<org>/<representative-model-name>
```

Report dummy-model results carefully: they validate model construction and training mechanics, not pretrained weight compatibility.
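One way to produce `minimal_config.json` is to start from the real architecture's `config.json` and cut only its depth, so that architecture-specific fields survive. This sketch works under that assumption. `minimal_config` and `save_config` are hypothetical helpers. The sketch also trims a nested `text_config` if present. Per-layer lists, such as layer-type arrays, still need to be trimmed by hand.

```python
import copy
import json
from pathlib import Path


def minimal_config(full_config, num_layers=2):
    """Return a deep copy of an HF config dict with num_hidden_layers capped.

    Width fields (hidden size, heads) are left alone because they are coupled;
    only depth is reduced, at the top level and in a nested text_config.
    """
    cfg = copy.deepcopy(full_config)
    for section in (cfg, cfg.get("text_config")):
        if isinstance(section, dict) and isinstance(section.get("num_hidden_layers"), int):
            section["num_hidden_layers"] = min(section["num_hidden_layers"], num_layers)
    return cfg


def save_config(cfg, path):
    """Write the config as JSON for DUMMY_MODEL_CONFIG_PATH."""
    Path(path).write_text(json.dumps(cfg, indent=2) + "\n")
```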

For VLMs, the generic GSM8K PPO run is text-only. It can validate the language-side Bridge provider and external-loop wrapping, but it does not prove image/video preprocessing or vision encoder execution. Pair it with the VLM conversion/inference tests from @skills/adding-model-support/tests-and-examples.md, or use a verl multimodal training command if one exists for the model family.

For MoE models, Level 0 with `COMMON_EP=1` still catches many provider and PEFT issues, but it does not stress expert-parallel routing. Add a Level 2 run with expert parallelism when the change touches expert layout, dispatcher config, router replay, or expert tensor parallelism.

For MTP, QAT/ModelOpt, or quantized checkpoint support, the generic wrapper may not activate the feature. Use the closest verl example or model-family script that turns the feature on, and record the extra Hydra overrides in the report.


Bridge Checks First


Run focused Bridge tests before the external verl e2e. Include any model-specific tests added by the change.

```bash
cd "$BRIDGE_REPO"
uv run python -m pytest -q \
  tests/unit_tests/models/test_model_provider_mixin.py \
  tests/unit_tests/models/test_param_mapping.py \
  tests/unit_tests/training/test_integration.py \
  <model-specific-test-paths>
```

For a new model family, also run the relevant conversion or roundtrip test from the model's PR. See @skills/adding-model-support/tests-and-examples.md for model-test patterns.

Minimum Bridge-side evidence for a new model/provider:

- provider/config unit tests
- parameter mapping tests
- HF to Megatron import or roundtrip on a small model
- model-specific generation or logits comparison when available
- this verl external-loop smoke after the above pass
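If the handoff between Bridge-side evidence and the verl smoke is scripted, a small gate keeps the external run from starting early. The keys below are hypothetical labels that mirror the list. The generation/logits comparison is treated as optional because the list only asks for it when available.

```python
# Hypothetical evidence keys mirroring the checklist above.
REQUIRED_BRIDGE_EVIDENCE = (
    "provider_config_unit_tests",
    "param_mapping_tests",
    "hf_import_or_roundtrip_small_model",
)
OPTIONAL_BRIDGE_EVIDENCE = ("generation_or_logits_comparison",)


def ready_for_verl_smoke(evidence):
    """Return (ready, missing) given a dict of evidence key -> passed flag."""
    missing = [key for key in REQUIRED_BRIDGE_EVIDENCE if not evidence.get(key)]
    return (not missing, missing)
```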


verl Data Setup


verl's Megatron PPO smoke wrapper expects GSM8K parquet files by default. Prepare them once from the verl checkout if they are missing:

```bash
cd "$VERL_REPO"
export PYTHONPATH="$VERL_REPO:${PYTHONPATH:-}"
python3 examples/data_preprocess/gsm8k.py \
  --local_save_dir "${GSM8K_DIR:-$HOME/data/gsm8k}"
```

Use `--local_dataset_path "$GSM8K_SOURCE_DIR"` only when that raw local dataset path actually exists. Otherwise, let `datasets` fetch `openai/gsm8k`.
Set explicit paths when running in a container or shared filesystem:

```bash
export TRAIN_FILES=/path/to/gsm8k/train.parquet
export VAL_FILES=/path/to/gsm8k/test.parquet
```

The wrapper also enables a reward model by default. Ensure the default reward model path exists, or set:

```bash
export RM_MODEL_PATH=/path/to/local/reward/model
```
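A preflight check on these paths fails fast instead of failing after Ray has started. This is a sketch. `preflight_data` is a hypothetical helper, and it only checks that the paths exist, not that the parquet files are valid.

```python
import os


def preflight_data(train_files, val_files, rm_model_path=None):
    """Check the inputs the Megatron PPO wrapper expects before launching it.

    Returns a list of human-readable problems; an empty list means the paths exist.
    A missing reward model is reported with the documented fallback.
    """
    problems = []
    for label, path in (("TRAIN_FILES", train_files), ("VAL_FILES", val_files)):
        if not path or not os.path.isfile(path):
            problems.append(f"{label} not found: {path!r}")
    if rm_model_path is not None and not os.path.exists(rm_model_path):
        problems.append(
            f"RM_MODEL_PATH not found: {rm_model_path!r}; fix it or pass "
            "reward.reward_model.enable=False and report the limitation"
        )
    return problems
```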

For a Level 0 rule-reward smoke, it is acceptable to disable the reward-model rollout when no local reward model is available:

```bash
bash tests/special_e2e/run_ppo_trainer_megatron.sh \
  reward.reward_model.enable=False
```

Report this as a limitation; it still tests Bridge actor/ref/critic construction, LoRA, DDP wrapping, rollout, and one PPO update, but not reward-model serving.


Minimal verl Run


Use verl's maintained wrapper rather than constructing a long Hydra command manually:

```bash
cd "$VERL_REPO"
ray stop --force || true

export MODEL_ID=<small-compatible-hf-model>
export TRAIN_FILES=${TRAIN_FILES:-/path/to/gsm8k/train.parquet}
export VAL_FILES=${VAL_FILES:-/path/to/gsm8k/test.parquet}
export RM_MODEL_PATH=${RM_MODEL_PATH:-/path/to/local/reward/model}
export ENGINE=vllm
export USE_MBRIDGE=True
export VANILLA_MBRIDGE=False
export VALUE_VANILLA_MBRIDGE=False
export LORA_RANK=4
export USE_MEGATRON_FSDP=False
export COMMON_PP=1
export COMMON_VPP=null
export COMMON_CP=1
export COMMON_TP=1
export INFER_TP=1
export ALL_OFFLOAD=False
export TOTAL_TRAIN_STEPS=1
export SAVE_FREQ=-1
export VAL_BEFORE_TRAIN=False
export TEST_FREQ=-1

bash tests/special_e2e/run_ppo_trainer_megatron.sh
```
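When the run is driven from Python, for example a sweep over Level 2 variants, you can keep the same Level 0 settings in one dictionary and override them per run. This is a minimal sketch. It assumes the wrapper reads these variables from its environment, as the shell form implies. `LEVEL0_ENV` and `wrapper_env` are hypothetical names.

```python
import os

# Level 0 settings copied from the wrapper invocation above. MODEL_ID, data
# paths, and RM_MODEL_PATH are expected to come from the calling environment.
LEVEL0_ENV = {
    "ENGINE": "vllm",
    "USE_MBRIDGE": "True",
    "VANILLA_MBRIDGE": "False",
    "VALUE_VANILLA_MBRIDGE": "False",
    "LORA_RANK": "4",
    "USE_MEGATRON_FSDP": "False",
    "COMMON_PP": "1",
    "COMMON_VPP": "null",
    "COMMON_CP": "1",
    "COMMON_TP": "1",
    "INFER_TP": "1",
    "ALL_OFFLOAD": "False",
    "TOTAL_TRAIN_STEPS": "1",
    "SAVE_FREQ": "-1",
    "VAL_BEFORE_TRAIN": "False",
    "TEST_FREQ": "-1",
}


def wrapper_env(overrides=None, base_env=None):
    """Return an environment for the wrapper: base env, then Level 0, then overrides.

    Values are converted to str because subprocess environments map str to str.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.update(LEVEL0_ENV)
    env.update({key: str(value) for key, value in (overrides or {}).items()})
    return env


# Example (not executed here): the dense Level 2 variant.
# subprocess.run(
#     ["bash", "tests/special_e2e/run_ppo_trainer_megatron.sh"],
#     cwd=os.environ["VERL_REPO"],
#     env=wrapper_env({"COMMON_TP": 2, "COMMON_PP": 2, "INFER_TP": 2}),
#     check=True,
# )
```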

Use `MODEL_ID` when the checkpoint is available through the wrapper's default cache layout. Add `MODEL_PATH=/path/to/local/hf/model` only when testing a local or converted checkpoint.

When `$HOME` is small or on slow shared storage, put HF caches and downloaded checkpoints on a larger shared or node-local scratch path and pass `MODEL_PATH` explicitly. Pre-download large models once in the same container environment so that Ray workers do not race on the cache:

```bash
export HF_HOME=${HF_HOME:-/scratch/$USER/verl_hf}
export HF_HUB_CACHE=$HF_HOME/hub
export MODEL_PATH=${MODEL_PATH:-/scratch/$USER/models/<org>/<model>}
hf download <org>/<model> --local-dir "$MODEL_PATH"
```

Capture logs to a file for review:

```bash
mkdir -p "${LOG_DIR:-$PWD/verl_e2e_logs}"
LOG_FILE="${LOG_DIR:-$PWD/verl_e2e_logs}/verl_lora_ddp_$(date +%Y%m%d_%H%M%S).log"
bash tests/special_e2e/run_ppo_trainer_megatron.sh \
  "++ray_kwargs.ray_init.runtime_env.env_vars.PYTHONPATH=${PYTHONPATH}" \
  2>&1 | tee "$LOG_FILE"
grep -E "Training Progress|VANILLA_MBRIDGE|Traceback|RuntimeError|KeyError|ValueError" "$LOG_FILE"
```

Prefer a saved log over a pasted terminal excerpt in PR descriptions.

For time-limited smoke runs where vLLM CUDA graph capture dominates setup and the goal is Bridge provider validation, it is acceptable to add:

```bash
bash tests/special_e2e/run_ppo_trainer_megatron.sh \
  actor_rollout_ref.rollout.enforce_eager=True
```

Report this override as a limitation. It still validates Bridge import, HF import, LoRA, Megatron DDP, rollout wiring, and one PPO update, but not vLLM CUDA graph capture.
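To keep the report honest, you can map each limitation-bearing override to the coverage it drops. This is a hypothetical sketch that covers the two overrides this document mentions. Extend the table if you add others.

```python
# Hypothetical table: override -> coverage it removes from the run.
KNOWN_LIMITATIONS = {
    "actor_rollout_ref.rollout.enforce_eager=True": "vLLM CUDA graph capture not exercised",
    "reward.reward_model.enable=False": "reward-model serving not exercised",
}


def report_limitations(overrides):
    """Return a limitation note for each known override passed to the wrapper.

    Leading "+" / "++" Hydra prefixes are ignored when matching.
    """
    notes = []
    for item in overrides:
        key = item.lstrip("+")
        if key in KNOWN_LIMITATIONS:
            notes.append(f"{key}: {KNOWN_LIMITATIONS[key]}")
    return notes
```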


Save/Resume Coverage


After the minimal run passes, add checkpoint coverage if the change touches PEFT, checkpointing, export, or optimizer state:

```bash
# Save once.
SAVE_FREQ=1 TOTAL_TRAIN_STEPS=1 \
bash tests/special_e2e/run_ppo_trainer_megatron.sh

# Resume and train one more step.
RESUME_MODE=auto SAVE_FREQ=1 TOTAL_TRAIN_STEPS=2 \
bash tests/special_e2e/run_ppo_trainer_megatron.sh
```

Remove stale verl `checkpoints/` output between unrelated experiments. Keep it for resume validation.
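After the resume run, check that a newer step exists. This helps confirm that the second run saved on top of the first. The sketch assumes verl names step checkpoints `global_step_<N>`; confirm that against the verl checkout under test. `latest_step_from_names` and `latest_global_step` are hypothetical helpers.

```python
import re
from pathlib import Path

# Assumption: verl names step checkpoints global_step_<N>.
STEP_DIR = re.compile(r"^global_step_(\d+)$")


def latest_step_from_names(names):
    """Return the highest N among names shaped like global_step_<N>, or None."""
    steps = [int(m.group(1)) for m in map(STEP_DIR.match, names) if m]
    return max(steps, default=None)


def latest_global_step(checkpoint_root):
    """Scan checkpoint_root recursively for global_step_<N> directories."""
    root = Path(checkpoint_root)
    if not root.is_dir():
        return None
    return latest_step_from_names(p.name for p in root.rglob("global_step_*") if p.is_dir())
```

After the save-then-resume pair above, `latest_global_step("checkpoints")` returning `2` would be consistent with the resumed run having saved its extra step.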


Parallelism Stress


Use Level 2 when the provider reads or mutates parallelism-related fields, or when the change touches `provider.configure(...)`, Megatron `mpu`, sequence parallel, context parallel, MoE dispatcher behavior, or tensor/expert tensor parallel settings.

The variants below assume the Level 0 exports above are still in the shell; each command overrides only the values being tested.

Example dense stress variant:

```bash
COMMON_TP=2 \
COMMON_PP=2 \
COMMON_VPP=null \
COMMON_CP=1 \
INFER_TP=2 \
USE_MEGATRON_FSDP=False \
bash tests/special_e2e/run_ppo_trainer_megatron.sh
```

Example MoE stress variant, only for compatible MoE checkpoints:

```bash
COMMON_EP=2 \
COMMON_ETP=1 \
ROUTING_REPLAY_MODE=disabled \
bash tests/special_e2e/run_ppo_trainer_megatron.sh
```

Keep these as follow-up runs. Do not make them the first debugging surface for a new provider.
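Before launching a stress variant, check that the parallel sizes fit the GPU count. Megatron factors the world size as TP * PP * CP * DP. On a hypothetical 8-GPU node, the dense example above (TP=2, PP=2, CP=1) leaves DP=2. This is a minimal sketch. `data_parallel_size` is a hypothetical helper that covers only the dense factors, not expert parallelism.

```python
def data_parallel_size(world_size, tp, pp, cp=1):
    """Derive the data-parallel size from the model-parallel sizes.

    world_size = TP * PP * CP * DP, so TP * PP * CP must divide the GPU count.
    """
    model_parallel = tp * pp * cp
    if world_size % model_parallel:
        raise ValueError(
            f"world_size={world_size} is not divisible by TP*PP*CP={model_parallel}"
        )
    return world_size // model_parallel


print(data_parallel_size(8, tp=2, pp=2))  # 2
```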


Optional Megatron-FSDP Variant


Use Level 3 after Level 0 passes only when downstream explicitly requests Megatron-FSDP coverage or the Bridge change directly touches FSDP wrapping, sharding, checkpoint format, or distributed optimizer behavior:

```bash
USE_MEGATRON_FSDP=True \
ALL_OFFLOAD=False \
COMMON_PP=1 \
COMMON_VPP=null \
COMMON_CP=1 \
COMMON_TP=1 \
INFER_TP=1 \
bash tests/special_e2e/run_ppo_trainer_megatron.sh \
  ++actor_rollout_ref.actor.megatron.override_transformer_config.gradient_accumulation_fusion=False \
  ++actor_rollout_ref.ref.megatron.override_transformer_config.gradient_accumulation_fusion=False \
  ++critic.megatron.override_transformer_config.gradient_accumulation_fusion=False
```

For Bridge-native FSDP behavior and constraints, also read @skills/perf-megatron-fsdp/SKILL.md.
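The three `gradient_accumulation_fusion` overrides differ only in the role prefix, so they can be generated. This is a small sketch. `disable_grad_accum_fusion_overrides` is a hypothetical helper, and the role prefixes are copied from the command above.

```python
# Role prefixes copied from the Megatron-FSDP command above.
ROLES = ("actor_rollout_ref.actor", "actor_rollout_ref.ref", "critic")


def disable_grad_accum_fusion_overrides(roles=ROLES):
    """Build the ++<role>.megatron...gradient_accumulation_fusion=False Hydra overrides."""
    return [
        f"++{role}.megatron.override_transformer_config.gradient_accumulation_fusion=False"
        for role in roles
    ]
```

The result can be appended to the wrapper's argument list, for example `["bash", "tests/special_e2e/run_ppo_trainer_megatron.sh", *disable_grad_accum_fusion_overrides()]`.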


Convergence / Learning Signal


Use Level 5 only when the change affects trainability or when downstream validation explicitly asks for convergence. Do not require it for every provider-only PR; RL convergence is slower, noisier, and more environment-dependent than the compatibility smoke.

The goal is a short learning-signal run, not a full benchmark. Prefer a small model, fixed data, fixed seed when available, and enough steps to observe non-random metric movement:

```bash
TOTAL_TRAIN_STEPS=20 \
SAVE_FREQ=-1 \
VAL_BEFORE_TRAIN=True \
TEST_FREQ=10 \
LORA_RANK=4 \
USE_MBRIDGE=True \
VANILLA_MBRIDGE=False \
VALUE_VANILLA_MBRIDGE=False \
USE_MEGATRON_FSDP=False \
ENGINE=vllm \
bash tests/special_e2e/run_ppo_trainer_megatron.sh
```

For a stronger signal, run 50-100 steps if GPU time allows. Keep rollout, reward model, dataset, batch sizes, and model checkpoint fixed between baseline and candidate runs.

Acceptable convergence evidence depends on the task, but the report should include at least:

- no NaNs or infs in loss, reward, KL, entropy, grad norm, or logprob metrics
- nonzero trainable parameter count when PEFT is enabled
- actor/critic losses and reward-related metrics logged for multiple steps
- validation or reward trend compared against the starting point or a known-good baseline
- no repeated zero gradients, frozen LoRA adapters, or constant logprobs unless expected

Do not call a 20-step run "converged" in the benchmark sense. Call it "learning-signal passed" unless it reaches a pre-agreed metric threshold.
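The checklist above can be applied mechanically to per-step metric series exported from the run. This sketch assumes each metric has already been pulled into a list in step order. It also assumes the trend metric is one where higher is better, such as a reward. The function and metric names are hypothetical. An empty result supports a "learning-signal passed" claim, not benchmark convergence.

```python
import math


def learning_signal_problems(metrics, trend_metric, allow_constant=(), min_steps=2):
    """Check per-step metric series against the minimum Level 5 evidence.

    metrics maps a metric name to its values in step order. trend_metric must be
    higher-is-better (negate a loss series before passing it). Returns a list of
    problems; an empty list supports "learning-signal passed", not convergence.
    """
    problems = []
    for name, values in metrics.items():
        if len(values) < min_steps:
            problems.append(f"{name}: only {len(values)} logged steps")
        elif not all(math.isfinite(v) for v in values):
            problems.append(f"{name}: NaN/inf present")
        elif name not in allow_constant and len(set(values)) == 1:
            problems.append(f"{name}: constant across steps")
    trend = metrics.get(trend_metric)
    if trend is None:
        problems.append(f"{trend_metric}: not logged")
    elif len(trend) >= 2 and all(math.isfinite(v) for v in trend) and trend[-1] <= trend[0]:
        problems.append(f"{trend_metric}: no improvement ({trend[0]} -> {trend[-1]})")
    return problems
```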


Container Image


Use the official verl Docker images as the default source:

```text
https://hub.docker.com/r/verlai/verl
```

For this skill's default PPO smoke path, pick a vLLM-flavored `verlai/verl` image tag unless the test intentionally changes the rollout engine. The maintained wrapper defaults to vLLM, and the command should make that explicit with:

```bash
ENGINE=vllm
```

Avoid using sglang, TRT-LLM, or generic images for the default Level 0 run unless the point of the test is to validate that rollout backend. A backend-specific image can fail before Bridge model construction, which makes the result a poor signal for a Megatron-Bridge provider change.

Pin the exact image tag in the test log or PR description:

```bash
export VERL_IMAGE=${VERL_IMAGE:-verlai/verl:<vllm-compatible-tag>}
```

If the cluster requires Enroot/SquashFS images, convert or mirror the selected `verlai/verl` tag through the site's normal process, but keep the source tag visible in the report.
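A tag that is still a placeholder or `latest` is not a pin. The sketch below flags such references before they reach a job script. `pinned_image` is a hypothetical helper. It uses a simplified reading of image references: an optional registry host, an optional `:tag`, and an optional `@sha256:` digest.

```python
def pinned_image(image):
    """Return True if an image reference names an explicit tag or a digest.

    Placeholders such as <vllm-compatible-tag>, a missing tag, and `latest`
    are treated as unpinned. A colon before the last "/" is a registry port.
    """
    if "@sha256:" in image:
        return True
    _name, sep, tag = image.rpartition(":")
    if not sep or "/" in tag:
        return False
    return tag not in ("", "latest") and not tag.startswith("<")
```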


Slurm or Container Runs


Use the cluster's standard container and mount both checkouts into it. When using node-local paths such as `/tmp`, keep setup and the actual PPO run in the same container step. Node-local model caches and ad-hoc pip installs disappear when a fresh container step starts. Keep paths generic in scripts committed to Megatron-Bridge:

```bash
export VERL_IMAGE=${VERL_IMAGE:-verlai/verl:<vllm-compatible-tag>}

srun <site-specific-slurm-options> \
  --container-image="${VERL_IMAGE}" \
  --container-mounts="${BRIDGE_REPO}:/workspace/Megatron-Bridge,${VERL_REPO}:/workspace/verl,<data-root>:<data-root>" \
  --container-workdir=/workspace/verl \
  bash -lc '
    export BRIDGE_REPO=/workspace/Megatron-Bridge
    export VERL_REPO=/workspace/verl
    export PYTHONPATH=$BRIDGE_REPO/src:$BRIDGE_REPO/3rdparty/Megatron-LM:$VERL_REPO
    ray stop --force || true
    MODEL_ID=<small-compatible-hf-model> \
    ENGINE=vllm \
    USE_MBRIDGE=True VANILLA_MBRIDGE=False VALUE_VANILLA_MBRIDGE=False \
    LORA_RANK=4 USE_MEGATRON_FSDP=False TOTAL_TRAIN_STEPS=1 SAVE_FREQ=-1 \
    bash tests/special_e2e/run_ppo_trainer_megatron.sh
  '
```
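If a launcher script assembles the mount list, building it from pairs avoids malformed entries. This sketch assumes the comma-separated `SRC:DST` syntax used above. `container_mounts` is a hypothetical helper. It rejects commas and colons inside paths, which that syntax cannot represent.

```python
def container_mounts(pairs):
    """Join (host_path, container_path) pairs into a --container-mounts value.

    Commas separate mounts and colons separate SRC from DST, so neither may
    appear inside a path.
    """
    entries = []
    for host, dest in pairs:
        for path in (host, dest):
            if "," in path or ":" in path:
                raise ValueError(f"mount path cannot contain ',' or ':': {path!r}")
        entries.append(f"{host}:{dest}")
    return ",".join(entries)


print(container_mounts([("/home/u/verl", "/workspace/verl"), ("/data", "/data")]))
```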

Use a persistent log directory on a shared filesystem or `$HOME` for long Slurm jobs. Logs written only to node-local `/tmp` can disappear when the allocation expires or is canceled. If a cluster attach helper runs the command through `srun --pty`, do not background the workload inside that attached shell, because the step cleanup can terminate it immediately. To detach safely, background the attach helper itself from the login node:

```bash
mkdir -p "$HOME/verl_e2e_logs"
nohup env COMMAND='bash /path/to/run_verl_e2e.sh' \
  bash /path/to/<jobid>-attach.sh \
  > "$HOME/verl_e2e_logs/attach_driver_$(date +%Y%m%d_%H%M%S).out" 2>&1 &
```

If an attach helper enters a container that no longer sees the expected checkouts or log directory, treat that helper as stale. Start a fresh `srun` step against the existing allocation with explicit `--container-image`, `--container-mounts`, and `--container-workdir`.

On CUDA/H100 clusters, some launchers set both `CUDA_VISIBLE_DEVICES` and ROCm variables such as `ROCR_VISIBLE_DEVICES`. Verl workers may fail before model construction with `Please don't set ROCR_VISIBLE_DEVICES when HIP/CUDA_VISIBLE_DEVICES is set`. If that happens, fix the launcher/container environment, or apply a temporary local verl workaround that drops `ROCR_VISIBLE_DEVICES` when CUDA is already set. Do not report this as a Bridge provider failure.
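A local workaround could look like the sketch below. It copies the environment and drops `ROCR_VISIBLE_DEVICES` only when `CUDA_VISIBLE_DEVICES` is already set. `drop_rocr_if_cuda` is a hypothetical helper, not verl code. It only affects processes started with the returned environment, so for an existing Ray cluster the launcher fix is still the real one.

```python
import os


def drop_rocr_if_cuda(env=None):
    """Return a copy of env without ROCR_VISIBLE_DEVICES when CUDA_VISIBLE_DEVICES is set.

    The input mapping is never mutated; os.environ is used when env is None.
    """
    env = dict(os.environ if env is None else env)
    if env.get("CUDA_VISIBLE_DEVICES") and "ROCR_VISIBLE_DEVICES" in env:
        env.pop("ROCR_VISIBLE_DEVICES")
    return env
```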

For general Slurm debugging and multi-node patterns, read @skills/multi-node-slurm/SKILL.md.


Pass Criteria


A useful pass has all of the following:

- Focused Bridge tests pass for provider/config/mapping behavior.
- verl uses the local Bridge checkout through `PYTHONPATH`.
- The verl log shows `VANILLA_MBRIDGE=False`.
- One training step reaches completion, for example `Training Progress: 100%|1/1|`.
- No exception occurs during Bridge provider setup, HF import, LoRA wrapping, DDP wrapping, optional FSDP wrapping when enabled, optimizer setup, checkpoint manager setup, or the training step.

Ray shutdown, Python resource-tracker warnings, or post-completion DataLoader worker termination can be acceptable if the training step completed, metrics for `training/global_step:1` were logged, and the process exits successfully. Mention them as residual log noise.
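Before writing the summary, you can check a saved log against these criteria. The sketch assumes the log contains the literal strings quoted above. `evaluate_smoke_log` is hypothetical. It counts tracebacks for manual review rather than failing on them, because post-completion noise can be acceptable.

```python
import re

# Match VANILLA_MBRIDGE=False but not VALUE_VANILLA_MBRIDGE=False.
_NON_VANILLA = re.compile(r"(?<!VALUE_)VANILLA_MBRIDGE=False")
_PROGRESS_DONE = re.compile(r"Training Progress:\s*100%.*?\b(\d+)/(\d+)")


def evaluate_smoke_log(text, total_steps=1):
    """Summarize a Level 0 log against the pass criteria.

    Returns booleans for each criterion plus a traceback count to review by hand.
    """
    progress = [(int(a), int(b)) for a, b in _PROGRESS_DONE.findall(text)]
    return {
        "non_vanilla_bridge": _NON_VANILLA.search(text) is not None,
        "training_completed": (total_steps, total_steps) in progress,
        "global_step_logged": re.search(rf"training/global_step:{total_steps}\b", text) is not None,
        "tracebacks": text.count("Traceback"),
    }
```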

Do not claim full model e2e if the run used a dummy config, text-only data for a VLM, `COMMON_EP=1` for an expert-parallel change, or disabled save/resume for a checkpointing change. Report the exact level that passed.

Do not claim convergence from Level 0. Claim convergence only from Level 5, and distinguish "learning signal" from "benchmark convergence" in the report.


Failure Triage


If model construction fails, check whether the Bridge provider is finalized with the same parallel sizes that verl initialized through Megatron `mpu`.

If LoRA fails, check target module names and whether the provider path uses the non-vanilla Bridge PEFT helpers expected by verl.

If checkpoint save/load fails, first rerun without save/resume (`SAVE_FREQ=-1`) to separate model construction from checkpoint behavior.

If rollout fails before actor construction, this may be a verl rollout engine issue rather than a Bridge provider issue. Report the boundary clearly.

If the log shows the wrong Bridge path, stop. Any later failure or pass is not evidence for the local Bridge change.
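One way to confirm which Bridge checkout a Python environment resolves is sketched below. It assumes Megatron-Bridge is importable as `megatron.bridge`. The argument is a placeholder for the `src` directory of your local checkout that you put on `PYTHONPATH`, and `check_bridge_path` is a hypothetical helper. Run it in the same container and environment as the verl workers.

```bash
# Hypothetical helper: verify that `import megatron.bridge` resolves under the expected checkout.
# Usage: check_bridge_path /path/to/Megatron-Bridge/src
check_bridge_path() {
  local expected actual py="${PYTHON:-python3}"
  expected="$("$py" -c 'import os, sys; print(os.path.realpath(sys.argv[1]))' "$1")" || return 2
  actual="$("$py" -c 'import os, megatron.bridge as b; print(os.path.realpath(os.path.dirname(b.__file__)))')" || {
    echo "megatron.bridge is not importable" >&2; return 2; }
  # Trailing "/" prevents /x/src2 from matching an expected prefix of /x/src.
  case "$actual/" in
    "$expected"/*) echo "OK: megatron.bridge from $actual" ;;
    *) echo "WRONG BRIDGE PATH: $actual (expected under $expected)"; return 1 ;;
  esac
}
```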

If the baseline fails before model build because of data, reward model, Ray, vLLM, or package setup, fix the environment first and do not report it as a provider failure.

If model download fails with `No space left on device`, move `HF_HOME`, `HF_HUB_CACHE`, and `MODEL_PATH` to a larger shared or node-local path. Then rerun with the explicit local `MODEL_PATH`.

Summary Format


End every run with a short user-facing summary that answers "Did the requested deliverables pass?" before adding details. Use `Pass`, `Fail`, `Skipped`, or `Blocked` for each deliverable. Do not report an overall `Pass` unless the pass criteria for the requested coverage level were met.

```text
Result: <Pass/Fail/Blocked> - <one sentence stating what was validated>
Requested coverage: <Level 0/1/2/3/4/5 and requested variants>
Model: <MODEL_ID or MODEL_PATH>

Deliverables:
- Bridge-side checks: <Pass/Fail/Skipped> - <test command or skipped reason>
- Local Bridge import in verl: <Pass/Fail> - <PYTHONPATH or imported Bridge path>
- verl Megatron backend run: <Pass/Fail/Skipped> - <LoRA + DDP or requested variant>
- Requested variants: <Pass/Fail/Skipped/Not requested> - <save/resume, parallelism stress, Megatron-FSDP, architecture-specific run, or learning-signal/convergence>
- Log capture: <Pass/Fail> - <log path>

Evidence:
- Bridge repo: <commit> plus dirty files
- verl repo: <commit> plus dirty files
- Command: <exact command or script path>
- Key lines: <VANILLA_MBRIDGE=False, Training Progress completion, training/global_step:1, or the first relevant error>

Limitations:
- <dummy model, skipped save/resume, COMMON_EP=1, text-only data for VLM, no convergence claim, known shutdown warnings, etc.>

Follow-ups:
- <needed rerun, environment fix, provider fix, or "none">
```
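The two repo lines under Evidence can be collected with plain `git`, as in this sketch. `repo_evidence` is a hypothetical helper, and the repository paths passed to it are placeholders for your Bridge and verl checkouts.

```bash
# Hypothetical helper: print a "<commit> plus dirty files" evidence line for one repository.
# Usage: repo_evidence "Bridge repo" /path/to/Megatron-Bridge
repo_evidence() {
  local label="$1" repo="$2" commit dirty
  commit="$(git -C "$repo" rev-parse --short=12 HEAD)" || return 1
  # --porcelain prints stable "XY path" lines; cut drops the 3-char status prefix.
  dirty="$(git -C "$repo" status --porcelain | cut -c4- | paste -sd, -)"
  echo "- ${label}: ${commit} plus dirty files: ${dirty:-none}"
}
```

Running `repo_evidence "Bridge repo" /path/to/Megatron-Bridge` and `repo_evidence "verl repo" /path/to/verl` yields the two lines in the template's format.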

Sometimes the job is blocked before Bridge model/provider construction by data, reward model, Ray, vLLM, dependency, disk, or cluster setup. In that case, mark the overall result as `Blocked`, not `Fail`, and state that it is not evidence against the Bridge provider.

If any requested deliverable was not run, mark it `Skipped` or `Not requested` and give the reason. Do not leave it implicit in the limitations.
