Search Results: large-language-models

Found 5 Skills

AI & Machine Learning | itsmostafa/llm-engineerin...

qlora

Memory-efficient fine-tuning with 4-bit quantization and LoRA adapters. Use when fine-tuning large models (7B+) on consumer GPUs, when VRAM is limited, or when standard LoRA still exceeds memory. Builds on the lora skill.

AI & Machine Learning | nvidia/skills

perf-activation-recompute

Validate and use selective and full activation recompute in Megatron Bridge to reduce GPU memory usage at the cost of extra compute.

AI & Machine Learning | nvidia/skills

perf-moe-dispatcher-selection

Choose the right MoE token dispatcher (`alltoall`, DeepEP, or HybridEP) for the hardware, EP degree, and optimization stage. Summarizes patterns from DSV3, Qwen3, Qwen3-Next, and VLM bring-up work.

AI & Machine Learning | mindrally/skills

deep-learning-python

Guidelines for deep learning development with PyTorch, Transformers, Diffusers, and Gradio for LLM and diffusion model work.

AI & Machine Learning | pepperu96/hyper-mla

mla-analysis

MLA (Multi-head Latent Attention) cost models, regime analysis, and kernel selection guide. Use when: (1) reasoning about which kernel approach to use for a given regime, (2) understanding cost model tradeoffs between FlashMLA, FlashAttention, and MLAvar6+, (3) analyzing roofline behavior across decode, speculative, and prefill regimes, (4) setting optimization targets, (5) understanding the MLA math and the absorption trick.
