Found 4 Skills
Autonomous ML experimentation framework by Andrej Karpathy. An AI agent modifies train.py, runs 5-minute GPU experiments, evaluates each one with val_bpb, and commits only improvements via git ratcheting, so you wake up to 100+ experiments and a better model. Use when setting up autoresearch, writing program.md directives, interpreting results, configuring hardware, or running overnight autonomous ML experiments. Triggers on: autoresearch, autonomous ml experiments, overnight gpu experiments, karpathy autoresearch, train.py experiments, val_bpb, program.md research directives, ai runs experiments.
Autonomous LLM training optimization with GPU support. Runs 5-minute training experiments, measures val_bpb, keeps improvements or reverts — repeat forever. Use this skill when the user asks to "train a model autonomously", "optimize LLM training", "run ML experiments", "autoresearch with GPU", "optimize val_bpb", "autonomous ML training", "LLM pretraining loop", "setup ML autoresearch", "GPU training experiments", "pretrain from scratch", "speed up training", "lower my loss", "GPU optimization", "CUDA training", or mentions "train.py", "prepare.py", "bits per byte", "val_bpb", "NVIDIA GPU training", "RTX training", "H100 training", "autonomous model training", "consumer GPU training", "low VRAM training". Always use this skill when the user wants to autonomously optimize any ML training metric.
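The keep-or-revert loop these descriptions mention can be sketched roughly as follows. This is a minimal illustration, not the actual implementation of either skill: the function `ratchet`, the `run_experiment` callback, the candidate names, and the toy scores are all hypothetical, and the git commit/revert steps are represented only by comments.

```python
from typing import Callable, Iterable, List, Tuple


def ratchet(
    baseline_bpb: float,
    candidates: Iterable[str],
    run_experiment: Callable[[str], float],
) -> Tuple[float, List[str]]:
    """Keep a candidate change only if it lowers val_bpb (lower is better)."""
    best = baseline_bpb
    kept: List[str] = []
    for name in candidates:
        # Hypothetical callback: applies the change, trains briefly,
        # and returns the resulting validation bits per byte.
        score = run_experiment(name)
        if score < best:
            best = score
            kept.append(name)  # a real setup would commit the change here
        # otherwise a real setup would revert the change here
    return best, kept


# Toy scores invented purely for illustration; not real results.
toy_scores = {"wider_mlp": 1.02, "higher_lr": 1.10, "warmup": 0.99}
best, kept = ratchet(1.05, toy_scores, toy_scores.__getitem__)
print(best, kept)
```

Because the best score only ever moves downward, a failed or regressing experiment cannot make the tracked model worse, which is the sense in which the process is a ratchet.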
Submit or run an ML experiment on a compute environment (local, SLURM HPC, RunAI/Kubernetes). Use when the user wants to launch a training run, submit a job, run ablations, or execute an experiment script on any compute cluster.
Runs ML experiments reproducibly — single runs or autonomous BFS batches. Single mode: isolated venv, time-budgeted, failure-handled, logs to RESEARCH.md. BFS mode (opt-in): designs N hypotheses, runs each for a fixed budget, compares via a single verifiable metric, keeps improvements and git-resets failures — fully autonomous until done. Respects the RESEARCH.md supervision policy for notifications, approvals, and stop limits. Trigger phrases: "run experiment", "train model", "explore design space", "find best config", "autoresearch".
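A time-budgeted, failure-handled run like the one described above might look like this sketch. It assumes the experiment is an ordinary command-line process; the helper name `run_with_budget` and the example commands are hypothetical and are not taken from the skill itself.

```python
import subprocess
import sys
from typing import List, Optional


def run_with_budget(cmd: List[str], budget_seconds: float) -> Optional[str]:
    """Run cmd; return its stdout on success, or None on timeout or failure."""
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=budget_seconds
        )
    except subprocess.TimeoutExpired:
        # The child process is killed once the budget is exceeded.
        return None
    if result.returncode != 0:
        # Treat a crashed experiment as a failed run, not as a result.
        return None
    return result.stdout


# Stand-in commands for illustration: one finishes, one exceeds its budget.
ok = run_with_budget([sys.executable, "-c", "print('val_bpb 1.0')"], 30.0)
slow = run_with_budget([sys.executable, "-c", "import time; time.sleep(5)"], 0.5)
print(ok, slow)
```

Returning `None` for both timeouts and crashes lets the caller handle every failed run the same way, for example by discarding the change and moving on to the next hypothesis.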