Found 39 Skills
Generate visual diagrams in standard Draw.io (.drawio) format for deep learning models, network architectures, algorithm workflows, and similar subjects. Two modes are supported: generation from scratch (model architecture diagrams, flowcharts, receptive field schematics, etc.) and style migration (reference image + content description/project → a new diagram that follows the style of the reference image). Ensure the XML is well-formed and can be opened and edited directly in Draw.io.
Agent that simulates Andrej Karpathy: former Director of AI at Tesla, co-founder of OpenAI, founder of Eureka Labs, and the world's foremost deep learning educator.
GPU Code to Ascend NPU Adaptation Review Expert. When users need to migrate GPU-based code (especially deep learning and model inference-related code) to Huawei Ascend NPU, this skill must be used for comprehensive review. This skill can identify bottlenecks in GPU-to-NPU migration, write adaptation scripts, generate verification plans, and output a complete Markdown review report. Trigger scenarios include: users mentioning keywords such as "NPU adaptation", "Ascend migration", "GPU to NPU", "Ascend", "CANN", "model migration", "operator adaptation", or users requesting to review GPU code repositories and migrate to the NPU platform.
Deep learning framework development with tinygrad - a minimal tensor library with autograd, JIT compilation, and multi-device support. Use when writing neural networks, training models, implementing tensor operations, working with UOps/PatternMatcher for graph transformations, or contributing to tinygrad internals. Triggers on tinygrad imports, Tensor operations, nn modules, optimizer usage, schedule/codegen work, or device backends.
Building and training neural networks with PyTorch. Use when implementing deep learning models, training loops, data pipelines, model optimization with torch.compile, distributed training, or deploying PyTorch models.
Graph Neural Networks (PyG) for geometric deep learning: node/graph classification, link prediction, GCN, GAT, GraphSAGE, heterogeneous graphs, and molecular property prediction.
LeetCode-style PyTorch interview practice environment with auto-grading for implementing softmax, attention, GPT-2 and more from scratch.
Transform raw content (text/URL) into structured learning documents with 6-phase framework combining AI analysis + reflection prompts. Use when the user wants to deeply understand content, create study notes, or learn from articles/books/documents. Triggers on "learn this", "deep learn", "study this", "tạo tài liệu học", "phân tích nội dung", "hiểu sâu", or "deep learner".
Explore-lane experimental execution skill for deep learning research repositories. Use when the researcher explicitly authorizes exploratory runs such as small-subset validation, short-cycle guess-and-check, batch sweeps, idle-GPU search, or quick transfer-learning trials, with results summarized in `explore_outputs/`. Do not use for end-to-end exploration orchestration on top of `current_research`, trusted baseline execution, conservative training verification, default routing, or implicit experimentation.
High-level PyTorch framework with a Trainer class, automatic distributed training (DDP/FSDP/DeepSpeed), a callbacks system, and minimal boilerplate. Scales from laptop to supercomputer with the same code. Use when you want clean training loops with built-in best practices.
Guidance for building Caffe from source and training CIFAR-10 models. This skill applies when tasks involve compiling Caffe deep learning framework, configuring Makefile.config, preparing CIFAR-10 dataset, or training CNN models with Caffe solvers. Use for legacy ML framework installation, LMDB dataset preparation, and CPU-only deep learning training tasks.
Nsight Systems (nsys) CLI for system-level timeline profiling. Use when the user wants to run nsys profile, analyze .nsys-rep reports, use nsys stats/analyze/recipe commands, diagnose GPU idle time from timeline traces, or profile distributed training with NCCL overlap analysis. NOT for kernel-level metrics like SOL%, occupancy, or roofline (use perf-nsight-compute-analysis for ncu). NOT for writing or generating kernels. NOT for applying optimizations like CUDA Graphs.