Long-context MoE training guidance for Megatron Bridge. Covers CP sizing, selective recompute, dispatcher choices, and practical patterns from DSV3, Qwen3, and Qwen3-Next long-context experiments.
npx skill4agent add nvidia/skills perf-moe-long-context

CP sizing rule of thumb: CP ~= seq_len / 4096

Selective recompute modules: up_proj, norm, moe, moe_act, mlp

TP=1 CP=32 EP=32 PP=8 VPP=4
Precision: FP8-class
Dispatcher: DeepEP
Recompute: up_proj, norm, moe, mlp
Extra memory help: optimizer CPU offload

TP=1 CP=64 EP=32 PP=8 EDP=2 VPP=4
Precision: FP8-class
Dispatcher: DeepEP
Recompute: up_proj, norm, moe, mlp
Extra memory help: optimizer CPU offload

TP=4 CP=4 EP=32 PP=4 VPP=12
Precision: BF16 or MXFP8
Dispatcher: HybridEP
Recompute: moe_act, norm
CUDA Graph: attn + moe_router + moe_preprocess
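
The CP sizing rule of thumb (CP ~= seq_len / 4096) can be sketched as a small helper. This is a minimal illustration, not part of Megatron Bridge: the function name `estimate_cp` and the `tokens_per_rank` parameter are hypothetical, and rounding up to a power of two is an assumption made here so the result is easy to combine with the other parallel sizes. Treat the result as a starting point to tune, not a fixed setting.

```python
def estimate_cp(seq_len: int, tokens_per_rank: int = 4096) -> int:
    """Estimate a context-parallel size from CP ~= seq_len / tokens_per_rank.

    Hypothetical helper for illustration only. The 4096 default comes from
    the rule of thumb; the power-of-two rounding is an assumption.
    """
    if seq_len <= 0 or tokens_per_rank <= 0:
        raise ValueError("seq_len and tokens_per_rank must be positive")

    # Ceiling division: number of ~tokens_per_rank-sized chunks in the sequence.
    raw = -(-seq_len // tokens_per_rank)

    # Assumption: round up to the next power of two.
    cp = 1
    while cp < raw:
        cp *= 2
    return cp


if __name__ == "__main__":
    for seq_len in (16_384, 131_072, 262_144):
        print(f"seq_len={seq_len:>7} -> CP ~= {estimate_cp(seq_len)}")
```

Under this rule a 131072-token sequence gives CP=32 and a 262144-token sequence gives CP=64. The rule describes only the sequence split, so settings such as TP, recompute, and precision can still move the best CP value away from this estimate.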