Found 95 Skills
Battle-tested PyTorch training recipes for all domains — LLMs, vision, diffusion, medical imaging, protein/drug discovery, spatial omics, genomics. Covers training loops, optimizer selection (AdamW, Muon), LR scheduling, mixed precision, debugging, and systematic experimentation. Use when training or fine-tuning neural networks, debugging loss spikes or OOM, choosing architectures, or optimizing GPU throughput.
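As a rough illustration of the territory these recipes cover, the sketch below wires together AdamW, a cosine LR schedule, and fp16 mixed precision in a bare PyTorch training loop; the model, data, and hyperparameters are placeholders, not anything prescribed by the skill, and a CUDA device is assumed.

```python
# Minimal mixed-precision training-loop sketch with AdamW and a cosine LR
# schedule; model, data, and hyperparameters are illustrative placeholders.
import torch
from torch import nn

model = nn.Linear(128, 10).cuda()  # stand-in model
loader = [(torch.randn(32, 128), torch.randint(0, 10, (32,))) for _ in range(100)]

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=len(loader))
scaler = torch.cuda.amp.GradScaler()  # loss scaling for fp16

for inputs, targets in loader:
    inputs, targets = inputs.cuda(), targets.cuda()
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = nn.functional.cross_entropy(model(inputs), targets)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    scheduler.step()
```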
PyTorch, TensorFlow, neural networks, CNNs, transformers, and deep learning for production
Guidelines for deep learning development with PyTorch, Transformers, Diffusers, and Gradio for LLM and diffusion model work.
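For orientation on the Diffusers side of that entry, a minimal text-to-image call might look like the sketch below; the checkpoint id and fp16 dtype are assumptions for illustration, not choices the guidelines mandate.

```python
# Minimal Diffusers text-to-image sketch; the model id is an illustrative
# assumption and fp16 assumes a CUDA-capable GPU.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # hypothetical checkpoint choice
    torch_dtype=torch.float16,
)
pipe.to("cuda")

image = pipe("a watercolor painting of a lighthouse at dusk").images[0]
image.save("lighthouse.png")
```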
Fine-tune and serve Physical Intelligence OpenPI models (pi0, pi0-fast, pi0.5) using JAX or PyTorch backends for robot policy inference across ALOHA, DROID, and LIBERO environments. Use when adapting pi0 models to custom datasets, converting JAX checkpoints to PyTorch, running policy inference servers, or debugging norm stats and GPU memory issues.
Provides guidance for PyTorch-native agentic RL using torchforge, Meta's library that separates infrastructure from algorithms. Use when you want clean RL abstractions, easy algorithm experimentation, or scalable training with Monarch and TorchTitan.
Distributed training orchestration across clusters. Scales PyTorch/TensorFlow/HuggingFace from laptop to 1000s of nodes. Built-in hyperparameter tuning with Ray Tune, fault tolerance, elastic scaling. Use when training massive models across multiple machines or running distributed hyperparameter sweeps.
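As a sketch of the hyperparameter-sweep side of that entry, the snippet below uses Ray Tune's function-trainable API; the objective is a toy stand-in and the search-space values are illustrative assumptions, with exact reporting conventions varying somewhat across Ray versions.

```python
# Minimal Ray Tune sweep sketch; the objective is a toy placeholder and the
# search-space values are illustrative assumptions.
from ray import tune

def objective(config):
    # Stand-in for a real training run; return the final metric as a dict.
    score = (config["lr"] - 0.01) ** 2 + config["batch_size"] / 1000
    return {"score": score}

tuner = tune.Tuner(
    objective,
    param_space={
        "lr": tune.loguniform(1e-4, 1e-1),
        "batch_size": tune.choice([32, 64, 128]),
    },
    tune_config=tune.TuneConfig(metric="score", mode="min", num_samples=20),
)
results = tuner.fit()
print(results.get_best_result().config)
```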
Provides PyTorch-native distributed LLM pretraining using torchtitan with 4D parallelism (FSDP2, TP, PP, CP). Use when pretraining Llama 3.1, DeepSeek V3, or custom models at scale from 8 to 512+ GPUs with Float8, torch.compile, and distributed checkpointing.
Guides systematic PyTorch recommender-system model development using compact data facts, existing source code, configs, focused tests, and training loops, without overloading context with broad research archives. Use when building, debugging, or refactoring torch/nn.Module RecSys models with Transformer/HSTU/attention blocks, sparse/dense/list feature fusion, pCVR/CTR heads, ablation axes, or competition codebases where many model ideas exist but bugs and interface drift must be controlled. Also covers recommender-system PyTorch modeling with Transformer/HSTU, key data facts, feature interaction, shape debugging, closing the training loop, and systematic iteration on existing model structures. A small illustrative block follows below.
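To ground the "torch/nn.Module with Transformer/attention blocks" part, here is a minimal sequence-recommendation block in plain PyTorch; the class name, dimensions, and pointwise CTR head are illustrative assumptions rather than anything drawn from a specific codebase.

```python
# Minimal Transformer-style sequence block with a CTR head; all names and
# dimensions are illustrative assumptions.
import torch
from torch import nn

class SeqRecBlock(nn.Module):
    def __init__(self, num_items: int, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.ctr_head = nn.Linear(d_model, 1)  # pointwise click-probability head

    def forward(self, item_ids: torch.Tensor) -> torch.Tensor:
        # item_ids: (batch, seq_len) integer item-interaction history
        h = self.encoder(self.item_emb(item_ids))
        return torch.sigmoid(self.ctr_head(h[:, -1]))  # score from last position

model = SeqRecBlock(num_items=1000)
print(model(torch.randint(0, 1000, (8, 20))).shape)  # -> torch.Size([8, 1])
```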
Debug distributed training failures (NeMo, Megatron, PyTorch) from worker stderr logs and optional AIStore daemon logs. Finds root cause across NCCL timeouts, data loading errors, and storage failures.
Query the CZ CELLxGENE Census (61M+ cells). Filter by cell type, tissue, or disease, retrieve expression data, and integrate with scanpy/PyTorch for population-scale single-cell analysis.
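A minimal Census query might look like the sketch below; the filter strings are illustrative assumptions, and the cellxgene_census calls follow its documented quickstart pattern.

```python
# Minimal CELLxGENE Census query sketch; filter values are illustrative
# assumptions, and the result is an AnnData object usable with scanpy.
import cellxgene_census

with cellxgene_census.open_soma() as census:
    adata = cellxgene_census.get_anndata(
        census,
        organism="Homo sapiens",
        obs_value_filter='tissue_general == "lung" and is_primary_data == True',
        var_value_filter='feature_name in ["CD4", "CD8A"]',
    )

print(adata)  # AnnData with expression matrix plus cell/gene metadata
```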
Universal Runtime best practices for PyTorch inference, Transformers models, and FastAPI serving. Covers device management, model loading, memory optimization, and performance tuning.
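For orientation, a minimal serving setup in that stack might look like the sketch below; the model id, route name, and request schema are assumptions for illustration, and the model is loaded once at import so requests reuse it.

```python
# Minimal FastAPI + Transformers serving sketch; model id, route, and request
# schema are illustrative assumptions. Run with: uvicorn app:app
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
# Load once at startup so every request reuses the same model and device.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

class ClassifyRequest(BaseModel):
    text: str

@app.post("/classify")
def classify(req: ClassifyRequest):
    return classifier(req.text)[0]  # e.g. {"label": "POSITIVE", "score": 0.99}
```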
Expert GPU optimization for modern consumer GPUs (8-24GB VRAM). Use this skill when you need to optimize GPU training, speed up CUDA code, reduce OOM errors, tune XGBoost for GPU, migrate NumPy to CuPy, make a model faster, manage GPU memory, optimize VRAM usage, or benchmark PyTorch. Covers mixed precision, gradient checkpointing, XGBoost GPU acceleration, CuPy/cuDF migration, vectorization, torch.compile, and diagnostics. NVIDIA GPUs only. PyTorch, XGBoost, and RAPIDS frameworks.
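As a sketch of a few of those techniques combined in plain PyTorch, here is autocast mixed precision, activation (gradient) checkpointing, and torch.compile on a toy model; the model, sizes, and dtype are placeholders, and a CUDA device with bf16 support is assumed.

```python
# Sketch combining mixed precision, gradient checkpointing, and torch.compile
# on a toy model; sizes are placeholders and a CUDA device is assumed.
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

class Block(nn.Module):
    def __init__(self, dim: int = 1024):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        # Recompute activations in backward instead of storing them (saves VRAM).
        return checkpoint(self.net, x, use_reentrant=False)

model = torch.compile(nn.Sequential(*[Block() for _ in range(8)]).cuda())
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(16, 1024, device="cuda")
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    loss = model(x).square().mean()
loss.backward()
opt.step()
```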