Search Results: megatron-bridge

Found 32 Skills

perf-cpu-offloading

Validate and use CPU offloading in Megatron Bridge, including layer-level activation offloading and fractional optimizer state offloading with HybridDeviceOptimizer.

🇺🇸|EnglishTranslated

AI & Machine Learningnvidia/skills

perf-activation-recompute

Validate and use selective and full activation recompute in Megatron Bridge to reduce GPU memory usage at the cost of extra compute.

🇺🇸|EnglishTranslated

AI & Machine Learningnvidia/skills

perf-cuda-graphs

Validate and use CUDA graph capture in Megatron Bridge, including local full-iteration graphs and Transformer Engine scoped graphs for attention, MLP, and MoE modules.

🇺🇸|EnglishTranslated

AI & Machine Learningnvidia/skills

perf-memory-tuning

Techniques for reducing peak GPU memory in Megatron Bridge — expandable segments, parallelism resizing, activation recompute, CPU offloading constraints, and common OOM fixes.

🇺🇸|EnglishTranslated

DevOps & Cloud Servicesnvidia/skills

bump-dependency

Bump a pinned dependency (TransformerEngine, Megatron-LM, NRX, etc.), regenerate the lockfile, open a PR, and drive it to green by attaching a watchdog to the "CICD NeMo" workflow and quarantining failing functional tests as flaky until the run is green.

🇺🇸|EnglishTranslated

AI & Machine Learningnvidia/skills

perf-moe-vlm-training

Practical guidance for training MoE VLMs in Megatron Bridge. Compares FSDP and 3D-parallel approaches, using rounded lessons from Qwen3-VL, Qwen3-Next, and other multimodal experiments.

🇺🇸|EnglishTranslated

AI & Machine Learningnvidia/skills

perf-moe-comm-overlap

MoE expert-parallel communication overlap in Megatron Bridge. Covers dispatch/combine overlap, flex dispatcher backends, and expert wgrad scheduling.

🇺🇸|EnglishTranslated

AI & Machine Learningnvidia/skills

perf-moe-long-context

Long-context MoE training guidance for Megatron Bridge. Covers CP sizing, selective recompute, dispatcher choices, and practical patterns from DSV3, Qwen3, and Qwen3-Next long-context experiments.

🇺🇸|EnglishTranslated