Search Results: megatron-bridge

Found 32 Skills

adding-model-support

Guide for adding support for new LLM or VLM models in Megatron-Bridge. Covers bridge, provider, recipe, tests, docs, and examples.

AI & Machine Learning | promptingcompany/nv-skill...

nemotron-customize

Plan, configure, and chain repo-native Nemotron customization steps into single-step or multi-step pipelines: curation, translation, SFT/PEFT (AutoModel or Megatron-Bridge), pretraining/CPT, RL alignment (DPO/RLVR/GRPO/RLHF), BYOB/MCQ benchmarks, checkpoint conversion, ModelOpt optimization, env profiles, and evaluation of trained checkpoints or existing/hosted endpoints. Use when a request names a Nemotron step or workflow, or asks to clean, translate, train, fine-tune, align, convert, optimize, evaluate, or compose these into a pipeline. Do NOT use for frontend/dashboard/visualization work, generic ML advice, billing/access, or non-Nemotron coding tasks.

AI & Machine Learning | nvidia/skills

nemotron-customize

Plan Nemotron customization pipelines from repo steps: SFT, PEFT/LoRA, AutoModel vs Megatron-Bridge, DPO/RLVR/GRPO/RLHF, curate-then-translate, BYOB/MCQ benchmark prep or translation, checkpoint conversion, ModelOpt optimization, and endpoint or checkpoint evaluation.

AI & Machine Learning | nvidia/skills

nemo-mbridge-perf-moe-long-context

Long-context MoE training guidance for Megatron Bridge. Covers CP sizing, selective recompute, dispatcher choices, and practical patterns from DSV3, Qwen3, and Qwen3-Next long-context experiments.

AI & Machine Learning | nvidia/skills

nemo-mbridge-mlm-bridge-training

Run Megatron-LM (MLM) and Megatron Bridge training with mock or real data. Covers correlation testing, available recipes, and multi-GPU examples.

AI & Machine Learning | nvidia/skills

nemo-mbridge-perf-memory-tuning

Techniques for reducing peak GPU memory in Megatron Bridge — expandable segments, parallelism resizing, activation recompute, CPU offloading constraints, and common OOM fixes.

AI & Machine Learning | nvidia/skills

mlm-bridge-training

Run Megatron-LM (MLM) and Megatron Bridge training with mock or real data. Covers correlation testing, available recipes, and multi-GPU examples.

AI & Machine Learning | nvidia/skills

nemo-mbridge-perf-cuda-graphs

Validate and use CUDA graph capture in Megatron Bridge, including local full-iteration graphs and Transformer Engine scoped graphs for attention, MLP, and MoE modules.

AI & Machine Learning | promptingcompany/nv-skill...

nemo-mbridge-perf-moe-vlm-training

Practical guidance for training MoE VLMs in Megatron Bridge. Compares FSDP and 3D-parallel approaches, using rounded lessons from Qwen3-VL, Qwen3-Next, and other multimodal experiments.

DevOps & Cloud Services | nvidia/skills

cicd

CI/CD reference for Megatron Bridge — pipeline structure, commit and PR workflow, CI failure investigation, and common failure patterns.

AI & Machine Learning | nvidia/skills

nemo-mbridge-perf-cpu-offloading

Validate and use CPU offloading in Megatron Bridge, including layer-level activation offloading and fractional optimizer state offloading with HybridDeviceOptimizer.

AI & Machine Learning | nvidia/skills

nemo-mbridge-perf-moe-comm-overlap

MoE expert-parallel communication overlap in Megatron Bridge. Covers dispatch/combine overlap, flex dispatcher backends, and expert wgrad scheduling.