
AI Agent Skills Directory with categorization, English/Chinese translation, and script security checks.


© 2026 Skill4Agent. All rights reserved.

Search Results: moe-training

Found 6 Skills

AI & Machine Learning | davila7/claude-code-templ...

moe-training

Train Mixture of Experts (MoE) models using DeepSpeed or HuggingFace. Use when training large-scale models with limited compute (the skill cites a 5× cost reduction vs. dense models), implementing sparse architectures like Mixtral 8x7B or DeepSeek-V3, or scaling model capacity without a proportional increase in compute. Covers MoE architectures, routing mechanisms, load balancing, expert parallelism, and inference optimization.

AI & Machine Learning | nvidia/skills

nemo-mbridge-perf-moe-hardware-configs

Representative MoE training playbooks by hardware platform and model family. Summarizes rounded throughput bands, parallelism patterns, and common tuning stacks.

AI & Machine Learning | nvidia/skills

nemo-mbridge-perf-moe-optimization-workflow

Systematic workflow for MoE training optimization in Megatron Bridge, based on the Megatron-Core MoE paper. Covers the Three Walls framework, parallel folding, recompute strategy, dispatcher choice, and CUDA-graph bring-up.

AI & Machine Learning | nvidia/skills

perf-moe-hardware-configs

Representative MoE training playbooks by hardware platform and model family. Summarizes rounded throughput bands, parallelism patterns, and common tuning stacks.

AI & Machine Learning | nvidia/skills

perf-moe-optimization-workflow

Systematic workflow for MoE training optimization in Megatron Bridge, based on the Megatron-Core MoE paper. Covers the Three Walls framework, parallel folding, recompute strategy, dispatcher choice, and CUDA-graph bring-up.

AI & Machine Learning | nvidia/skills

perf-moe-long-context

Long-context MoE training guidance for Megatron Bridge. Covers CP sizing, selective recompute, dispatcher choices, and practical patterns from DSV3, Qwen3, and Qwen3-Next long-context experiments.