Loading...
Loading...
Found 115 Skills
Guidance for creating standalone CLI tools that perform neural network inference by extracting PyTorch model weights and reimplementing inference in C/C++. This skill applies when tasks involve converting PyTorch models to standalone executables, extracting model weights to portable formats (JSON), implementing neural network forward passes in C/C++, or creating CLI tools that load images and run inference without Python dependencies.
PyTorch implementation of TurboQuant for LLM KV cache compression using two-stage vector quantization (random rotation + Lloyd-Max + QJL residual correction).
Write docstrings for PyTorch functions and methods following PyTorch conventions. Use when writing or updating docstrings in PyTorch code.
Building and training neural networks with PyTorch. Use when implementing deep learning models, training loops, data pipelines, model optimization with torch.compile, distributed training, or deploying PyTorch models.
Debug PyTorch 2 compiler stack failures including Dynamo graph breaks, Inductor codegen errors, AOTAutograd crashes, and accuracy mismatches. Use when encountering torch.compile errors, BackendCompilerFailed exceptions, recompilation issues, Triton kernel failures, FX graph problems, or when the user mentions debugging PT2, Dynamo, Inductor, or compiled model issues.
LeetCode-style PyTorch interview practice environment with auto-grading for implementing softmax, attention, GPT-2 and more from scratch.
Review PyTorch pull requests for code quality, test coverage, security, and backward compatibility. Use when reviewing PRs, when asked to review code changes, or when the user mentions "review PR", "code review", or "check this PR".
Document undocumented public APIs in PyTorch by removing functions from coverage_ignore_functions and coverage_ignore_classes in docs/source/conf.py, running Sphinx coverage, and adding the appropriate autodoc directives to the correct .md or .rst doc files. Use when a user asks to remove functions from conf.py ignore lists.
High-level PyTorch framework with Trainer class, automatic distributed training (DDP/FSDP/DeepSpeed), callbacks system, and minimal boilerplate. Scales from laptop to supercomputer with same code. Use when you want clean training loops with built-in best practices.
Refactor PyTorch code to improve maintainability, readability, and adherence to best practices. Identifies and fixes DRY violations, long functions, deep nesting, SRP violations, and opportunities for modular components. Applies PyTorch 2.x patterns including torch.compile optimization, Automatic Mixed Precision (AMP), optimized DataLoader configuration, modular nn.Module design, gradient checkpointing, CUDA memory management, PyTorch Lightning integration, custom Dataset classes, model factory patterns, weight initialization, and reproducibility patterns.
Debug AOTInductor (AOTI) errors and crashes. Use when encountering AOTI segfaults, device mismatch errors, constant loading failures, or runtime errors from aot_compile, aot_load, aoti_compile_and_package, or aoti_load_package.
Expert guidance for computer vision development using OpenCV, PyTorch, and modern deep learning techniques for image and video processing.