vast-gpu

# Vast.ai GPU Management

Manage vast.ai GPU instance: $ARGUMENTS

## Overview

Rent cheap, capable GPUs from vast.ai on demand. This skill analyzes the training task to determine GPU requirements, searches for the best-value offers, presents options with estimated total cost, and handles the full lifecycle: rent → setup → run → destroy.
Users do NOT specify GPU models or hardware. They describe the task — the skill figures out what to rent.
**Prerequisites:** The `vastai` CLI must be installed (requires Python ≥ 3.10) and authenticated:

```bash
pip install vastai
vastai set api-key YOUR_API_KEY
```

If your system Python is < 3.10, create a virtual environment with Python ≥ 3.10 (e.g., `conda create`, `pyenv`, `uv venv`) and install `vastai` there.
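
For example, a minimal sketch using conda (the environment name `vast` is an arbitrary choice):

```bash
# Create an isolated Python 3.10 environment for the vastai CLI
conda create -n vast python=3.10 -y
conda activate vast
pip install vastai
vastai set api-key YOUR_API_KEY   # paste the key from your vast.ai account page
```
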
SSH public key must be uploaded at https://cloud.vast.ai/manage-keys/ BEFORE creating any instance. Keys are baked into instances at creation time — if you add a key after renting, you must destroy and re-create the instance.

## State File

All active vast.ai instances are tracked in `vast-instances.json` at the project root:

```json
[
  {
    "instance_id": 33799165,
    "offer_id": 25831376,
    "gpu_name": "RTX_3060",
    "num_gpus": 1,
    "dph": 0.0414,
    "ssh_url": "ssh://root@1.208.108.242:58955",
    "ssh_host": "1.208.108.242",
    "ssh_port": 58955,
    "created_at": "2026-03-29T21:12:00Z",
    "status": "running",
    "experiment": "exp01_baseline",
    "estimated_hours": 4.0,
    "estimated_cost": 0.17
  }
]
```

This file is the source of truth for `/run-experiment` and `/monitor-experiment` to connect to vast.ai instances.
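
As an illustration, a minimal sketch (assuming `python3` is available locally) that prints the SSH command for the first running instance in the state file:

```bash
# Extract the SSH endpoint of the first running instance
python3 - <<'EOF'
import json

with open("vast-instances.json") as f:
    instances = json.load(f)

for inst in instances:
    if inst["status"] == "running":
        print(f"ssh -p {inst['ssh_port']} root@{inst['ssh_host']}")
        break
EOF
```
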

## Workflow

### Action: Provision (default)

Analyze the task, find the best GPU, and present cost-optimized options. This is the main entry point — called directly, or automatically by `/run-experiment` when `gpu: vast` is set.

**Step 1: Analyze Task Requirements**

Read available context to determine what the task needs:

1. From the experiment plan (`refine-logs/EXPERIMENT_PLAN.md`):
   - Compute budget (total GPU-hours)
   - Hardware hints (e.g., "4x RTX 3090")
   - Model architecture and dataset size
   - Run order and per-milestone cost estimates
2. From experiment scripts (if already written):
   - Model size — scan for the model class, `num_parameters`, config files
   - Batch size, sequence length — estimate VRAM from these
   - Dataset — estimate training time from dataset size + epochs
   - Multi-GPU — check for `DataParallel`, `DistributedDataParallel`, `accelerate`, `deepspeed`
3. From user description (if no plan/scripts exist):
   - Model name/size (e.g., "fine-tune LLaMA-7B", "train ResNet-50")
   - Dataset scale (e.g., "ImageNet", "10k samples")
   - Estimated duration (e.g., "about 2 hours")

**Step 2: Determine GPU Requirements**

Based on the task analysis, determine:

| Factor | How to estimate |
|--------|-----------------|
| Min VRAM | Model params × 4 bytes (fp32) or × 2 (fp16/bf16) + optimizer states + activations. Rules of thumb: 7B model ≈ 16 GB (fp16), 13B ≈ 28 GB, 70B ≈ 140 GB (needs multi-GPU). ResNet/ViT ≈ 4-8 GB. Add 20% headroom. |
| Num GPUs | 1, unless the model doesn't fit in single-GPU VRAM, the scripts use DDP/FSDP/DeepSpeed, or the plan specifies multi-GPU |
| Est. hours | From the experiment plan's cost column, or: (dataset_size × epochs) / (throughput × batch_size). Default to the user's estimate if available. Add a 30% buffer for setup + unexpected slowdowns |
| Min disk | 20 GB base + model checkpoint size + dataset size. Default: 50 GB |
| CUDA version | Match the PyTorch version. PyTorch 2.x needs CUDA ≥ 11.8. Default: 12.1 |
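
As a worked example, here is the arithmetic behind the "7B ≈ 16 GB (fp16)" rule of thumb (a sketch; full fine-tuning with Adam adds optimizer states on top of this):

```bash
# Rough minimum-VRAM estimate for a 7B model in fp16
python3 - <<'EOF'
params = 7e9                     # 7B parameters
weights_gb = params * 2 / 1e9    # fp16: 2 bytes per parameter -> 14 GB
min_vram = weights_gb * 1.2      # add 20% headroom -> ~17 GB
print(f"weights: {weights_gb:.0f} GB, with 20% headroom: ~{min_vram:.0f} GB")
EOF
```
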
**Step 3: Search Offers**

Search across multiple GPU tiers to find the best value. Always search broadly — do NOT limit to one GPU model:

```bash
# Tier 1: Budget GPUs (good for small models, fine-tuning, ablations)
vastai search offers "gpu_ram>=<MIN_VRAM> num_gpus>=<N> reliability>0.95 inet_down>100" -o 'dph+' --storage <DISK> --limit 10

# Tier 2: If VRAM > 24 GB, also search high-VRAM cards specifically
vastai search offers "gpu_ram>=48 num_gpus>=<N> reliability>0.95" -o 'dph+' --storage <DISK> --limit 5
```

The output is a table with columns: `ID`, `CUDA`, `N` (GPU count), `Model`, `PCIE`, `cpu_ghz`, `vCPUs`, `RAM`, `Disk`, `$/hr`, `DLP` (deep learning perf), `score`, `NV Driver`, `Net_up`, `Net_down`, `R` (reliability %), `Max_Days`, `mach_id`, `status`, `host_id`, `ports`, `country`.

The **first column (`ID`)** is the offer ID needed for `vastai create instance`.

**Step 4: Present Cost-Optimized Options**

Present **3 options** to the user, ranked by estimated total cost:
Task analysis:
- Model: [model name/size] → estimated VRAM: ~[X] GB
- Training: ~[Y] hours estimated
- Requirements: [N] GPU(s), ≥[X] GB VRAM, ~[Z] GB disk

Recommended options (sorted by estimated total cost):

| # | GPU | VRAM | $/hr | Est. Hours | Est. Total | Reliability | Offer ID |
|---|-----|------|------|------------|------------|-------------|----------|
| 1 | RTX 3060 | 12 GB | $0.04 | ~6h | ~$0.25 | 99.4% | 25831376 |
| 2 | RTX 4090 | 24 GB | $0.28 | ~4h | ~$1.12 | 99.2% | 6995713 |
| 3 | A100 SXM | 80 GB | $0.95 | ~2h | ~$1.90 | 99.5% | 7023456 |

Option 1 is cheapest overall. Option 3 finishes fastest. Pick a number (or type a different offer ID):

**Key presentation rules:**
- Always show **estimated total cost** ($/hr × estimated hours), not just $/hr
- Faster GPUs have shorter estimated hours (scale by relative FLOPS)
- Flag if a cheap option has reliability < 0.97 ("budget pick — 3% chance of interruption")
- If task is small (<1 hour), recommend interruptible pricing for even lower cost
- If no offers meet VRAM requirements, explain why and suggest alternatives (e.g., multi-GPU, quantization)

**Relative speed scaling (approximate, for estimating hours across GPU tiers):**

| GPU | Relative Speed (FP16) |
|-----|-----------------------:|
| RTX 3060 | 0.5× |
| RTX 3090 | 1.0× |
| RTX 4090 | 1.6× |
| A5000 | 0.9× |
| A6000 | 1.1× |
| L40S | 1.5× |
| A100 SXM | 2.0× |
| H100 SXM | 3.3× |

Use these to scale the base estimated hours across offers.
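
A sketch of that scaling, assuming a 3-hour baseline on an RTX 3090 (1.0×) and the example $/hr figures from the options table above:

```bash
# Scale estimated hours across GPU tiers and compute estimated total cost
python3 - <<'EOF'
BASE_HOURS_3090 = 3.0  # assumed baseline: estimated hours on an RTX 3090
speed = {"RTX 3060": 0.5, "RTX 4090": 1.6, "A100 SXM": 2.0}    # relative FP16 speed
dph = {"RTX 3060": 0.04, "RTX 4090": 0.28, "A100 SXM": 0.95}   # example offer prices

for gpu, s in speed.items():
    hours = BASE_HOURS_3090 / s  # faster GPU -> fewer hours
    print(f"{gpu}: ~{hours:.1f}h, est. total ~${hours * dph[gpu]:.2f}")
EOF
```
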

### Action: Rent

Create an instance from a user-selected offer.

**Step 1: Create Instance**

```bash
vastai create instance <OFFER_ID> \
  --image <DOCKER_IMAGE> \
  --disk <DISK_GB> \
  --ssh \
  --direct \
  --onstart-cmd "apt-get update && apt-get install -y git screen rsync"
```

Default Docker image: `pytorch/pytorch:2.1.0-cuda12.1-cudnn8-devel` (override via the `CLAUDE.md` `image:` field if set).

The output looks like:

```
Started. {'success': True, 'new_contract': 33799165, 'instance_api_key': '...'}
```

The `new_contract` value is the **instance ID** — save this for all subsequent commands.
**Step 2: Wait for Instance Ready**

Poll instance status every 20 seconds until it's running (typically 30-60 seconds, max ~5 minutes):

```bash
vastai show instances --raw | python3 -c "
import sys, json
instances = json.load(sys.stdin)
for inst in instances:
    if inst['id'] == <INSTANCE_ID>:
        print(inst['actual_status'])
"
```

Wait states: `loading` → `running`. If stuck in `loading` for >5 minutes, warn the user — the host may be slow or the image may be large.
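
A minimal polling-loop sketch (20-second interval, ~5-minute timeout, as above; `INSTANCE_ID` is a placeholder):

```bash
INSTANCE_ID=33799165       # placeholder
for i in $(seq 1 15); do   # 15 polls x 20 s = 5 minutes
  STATUS=$(vastai show instances --raw | python3 -c "
import sys, json
for inst in json.load(sys.stdin):
    if inst['id'] == int('$INSTANCE_ID'):
        print(inst.get('actual_status') or 'loading')
")
  if [ "$STATUS" = "running" ]; then echo "Instance ready"; break; fi
  echo "Status: ${STATUS:-unknown}, waiting 20s..."
  sleep 20
done
```
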
**Step 3: Get SSH Connection Details**

```bash
vastai ssh-url <INSTANCE_ID>
```

This returns a URL in the format `ssh://root@<HOST>:<PORT>`. Parse out the host and port. Example:

- Input: `ssh://root@1.208.108.242:58955`
- Host: `1.208.108.242`, Port: `58955`

**Important:** Always use `vastai ssh-url` to get connection details — do NOT rely on `ssh_host`/`ssh_port` from `vastai show instances`, as those may point to proxy servers that differ from the direct connection endpoint.
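
Parsing the URL needs nothing beyond bash parameter expansion, for example:

```bash
# Split ssh://root@<HOST>:<PORT> into host and port
SSH_URL=$(vastai ssh-url "$INSTANCE_ID")   # e.g. ssh://root@1.208.108.242:58955
HOSTPORT=${SSH_URL#ssh://root@}            # strip scheme and user
HOST=${HOSTPORT%:*}                        # everything before the last colon
PORT=${HOSTPORT##*:}                       # everything after the last colon
echo "host=$HOST port=$PORT"
```
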
**Step 4: Verify SSH Connectivity**

```bash
ssh -o StrictHostKeyChecking=no -o ConnectTimeout=15 -p <PORT> root@<HOST> "nvidia-smi && echo 'CONNECTION_OK'"
```

If SSH fails with "Permission denied (publickey)":

- The user's SSH key was not uploaded to https://cloud.vast.ai/manage-keys/ before the instance was created.
- Fix: Destroy this instance, have the user upload their key, then create a new instance. Keys are baked in at creation time — there is no way to add keys to a running instance.

If SSH fails with "Connection refused":

- The instance may still be initializing. Retry up to 3 times with 15-second intervals.
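
A retry-loop sketch for the "Connection refused" case (reusing `HOST`/`PORT` from Step 3):

```bash
# Retry the connectivity check up to 3 times, 15 seconds apart
for attempt in 1 2 3; do
  if ssh -o StrictHostKeyChecking=no -o ConnectTimeout=15 -p "$PORT" root@"$HOST" \
       "nvidia-smi && echo CONNECTION_OK"; then
    break
  fi
  echo "Attempt $attempt failed, retrying in 15s..."
  sleep 15
done
```
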
**Step 5: Update State File**

Write/update `vast-instances.json` with the new instance details, including the `ssh_url` from Step 3 and the estimated hours and cost.
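
A sketch of the update (the field values are the examples from the State File section):

```bash
# Append the new instance record to vast-instances.json
python3 - <<'EOF'
import json, os

path = "vast-instances.json"
instances = json.load(open(path)) if os.path.exists(path) else []
instances.append({
    "instance_id": 33799165, "offer_id": 25831376,
    "gpu_name": "RTX_3060", "num_gpus": 1, "dph": 0.0414,
    "ssh_url": "ssh://root@1.208.108.242:58955",
    "ssh_host": "1.208.108.242", "ssh_port": 58955,
    "created_at": "2026-03-29T21:12:00Z", "status": "running",
    "experiment": "exp01_baseline",
    "estimated_hours": 4.0, "estimated_cost": 0.17,
})
with open(path, "w") as f:
    json.dump(instances, f, indent=2)
EOF
```
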
**Step 6: Report**

```
Vast.ai instance ready:
- Instance ID: <ID>
- GPU: <GPU_NAME> x <NUM_GPUS>
- Cost: $<DPH>/hr (estimated total: ~$<TOTAL>)
- SSH: ssh -p <PORT> root@<HOST>
- Docker: <IMAGE>

To deploy: /run-experiment (will auto-detect this instance)
To destroy when done: /vast-gpu destroy <ID>
```

### Action: Setup

Set up the rented instance for a specific experiment. Called automatically by `/run-experiment` when targeting a vast.ai instance.

**Step 1: Install Dependencies**

```bash
ssh -p <PORT> root@<HOST> "pip install -q wandb tensorboard scipy scikit-learn pandas"
```

If a `requirements.txt` exists in the project, install that instead:

```bash
scp -P <PORT> requirements.txt root@<HOST>:/workspace/
ssh -p <PORT> root@<HOST> "pip install -q -r /workspace/requirements.txt"
```

Note: `scp` uses uppercase `-P` for the port, while `ssh` uses lowercase `-p`.
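
A sketch that picks the right install path automatically (`HOST`/`PORT` come from the Rent action):

```bash
# Install from requirements.txt when present, otherwise install the defaults
if [ -f requirements.txt ]; then
  scp -P "$PORT" requirements.txt root@"$HOST":/workspace/
  ssh -p "$PORT" root@"$HOST" "pip install -q -r /workspace/requirements.txt"
else
  ssh -p "$PORT" root@"$HOST" "pip install -q wandb tensorboard scipy scikit-learn pandas"
fi
```
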
**Step 2: Sync Code**

Sync only source and config files, skipping checkpoints, data, and run artifacts (excludes first, then includes, then a catch-all exclude, so the filters apply in the intended order):

```bash
rsync -avz -e "ssh -p <PORT>" \
  --exclude='__pycache__' --exclude='.git' --exclude='data/' \
  --exclude='wandb/' --exclude='outputs/' \
  --exclude='*.pt' --exclude='*.pth' --exclude='*.ckpt' \
  --include='*/' --include='*.py' --include='*.yaml' --include='*.yml' \
  --include='*.json' --include='*.txt' --include='*.sh' \
  --exclude='*' \
  ./ root@<HOST>:/workspace/project/
```

**Step 3: Verify Setup**

```bash
ssh -p <PORT> root@<HOST> "cd /workspace/project && python -c 'import torch; print(f\"PyTorch {torch.__version__}, CUDA: {torch.cuda.is_available()}, GPUs: {torch.cuda.device_count()}\")'"
```

Expected output: `PyTorch 2.1.0, CUDA: True, GPUs: 1` (or more GPUs on a multi-GPU instance).

### Action: Destroy

Tear down a vast.ai instance to stop billing.

**Step 1: Confirm Results Collected**

Before destroying, check whether there are experiment results to download:

```bash
ssh -p <PORT> root@<HOST> "ls /workspace/project/results/ 2>/dev/null || echo 'NO_RESULTS_DIR'"
```

If results exist, download them first:

```bash
rsync -avz -e "ssh -p <PORT>" root@<HOST>:/workspace/project/results/ ./results/
```

Also download logs:

```bash
scp -P <PORT> root@<HOST>:/workspace/*.log ./logs/ 2>/dev/null
```

**Step 2: Destroy Instance**

```bash
vastai destroy instance <INSTANCE_ID>
```

Output: `destroying instance <INSTANCE_ID>.`

Destruction is irreversible — all data on the instance is permanently deleted.

**Step 3: Update State File**

Remove the instance from `vast-instances.json` or mark its status as `destroyed`.

**Step 4: Report Cost**

Calculate the actual cost from the creation time and $/hr:

```
Instance <ID> destroyed.
- Duration: ~X.X hours
- Actual cost: ~$X.XX (estimated was $Y.YY)
- Results downloaded to: ./results/
```
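
A sketch of the cost math, reading `created_at` and `dph` from the state file:

```bash
# Actual cost = elapsed hours since created_at x $/hr
python3 - <<'EOF'
import json
from datetime import datetime, timezone

inst = json.load(open("vast-instances.json"))[0]  # pick the relevant entry
created = datetime.fromisoformat(inst["created_at"].replace("Z", "+00:00"))
hours = (datetime.now(timezone.utc) - created).total_seconds() / 3600
print(f"Duration: ~{hours:.1f} hours, actual cost: ~${hours * inst['dph']:.2f}")
EOF
```
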

### Action: List

Show all active vast.ai instances:

```bash
vastai show instances
```

Cross-reference with `vast-instances.json` for experiment associations.
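
For example, a sketch that prints each tracked instance's experiment association next to the live list:

```bash
vastai show instances
python3 -c "
import json
for inst in json.load(open('vast-instances.json')):
    print(inst['instance_id'], inst.get('experiment', '-'), inst['status'])
"
```
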

### Action: Destroy All

Tear down all active instances (use after all experiments complete; see the sketch after this list):

1. Download results from each instance
2. Destroy all instances
3. Clear `vast-instances.json`
4. Report total cost
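
A minimal destroy-all loop (download results first, as in the Destroy action; assumes the IDs in `vast-instances.json` are current):

```bash
# Destroy every tracked instance, then clear the state file
for ID in $(python3 -c "
import json
print(' '.join(str(i['instance_id']) for i in json.load(open('vast-instances.json'))))
"); do
  vastai destroy instance "$ID"
done
echo '[]' > vast-instances.json
```
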

## Key Rules

- **Task-driven selection** — NEVER ask users to pick GPU models. Analyze the task, estimate requirements, present cost-optimized options with total price
- **ALWAYS destroy instances when experiments are done** — vast.ai bills per second; leaving instances running wastes money
- **Download results before destroying** — data is lost permanently on destroy
- Prefer on-demand pricing for short experiments (<2 hours). Suggest interruptible/bid pricing for long runs (>4 hours) with checkpointing
- Check reliability > 0.95 — unreliable hosts may crash mid-training
- Use **`--direct` SSH** when creating instances — faster than proxy SSH
- Always use `vastai ssh-url <ID>` to get connection details — the host/port from `show instances` may differ
- SSH keys must be uploaded BEFORE creating instances — keys are baked in at creation time. If SSH fails with "Permission denied", destroy and recreate after adding the key
- Default Docker image: `pytorch/pytorch:2.1.0-cuda12.1-cudnn8-devel` unless the user specifies otherwise
- Working directory on the instance: `/workspace/` (Docker default). Code syncs to `/workspace/project/`
- State file `vast-instances.json` **must stay up to date** — other skills depend on it
- Show estimated total cost, not just $/hr — a $0.90/hr GPU that finishes in 2h ($1.80) beats a $0.30/hr GPU that takes 8h ($2.40)
- `vastai` CLI **requires Python ≥ 3.10** — if system Python is older, use a conda env

## CLAUDE.md Example

Users only need to set `gpu: vast` — no hardware preferences required:

```markdown
## Vast.ai
- gpu: vast  # tells run-experiment to use vast.ai
- auto_destroy: true  # auto-destroy after experiment completes (default: true)
- max_budget: 5.00  # optional: max total $ to spend (skill warns if estimate exceeds this)
- image: pytorch/pytorch:2.1.0-cuda12.1-cudnn8-devel  # optional: override Docker image
```

The skill analyzes experiment scripts and plans to determine what GPU to rent. No need to specify GPU model, VRAM, or instance count.

## Composing with Other Skills

```
/run-experiment "train model"       ← detects gpu: vast, calls /vast-gpu provision
  ↳ /vast-gpu provision             ← analyzes task, presents options with cost
  ↳ user picks option               ← rent + setup + deploy
  ↳ /vast-gpu destroy               ← auto-destroy when done (if auto_destroy: true)

/vast-gpu provision                 ← manual: analyze task + show options
/vast-gpu rent <offer_id>           ← manual: rent a specific offer
/vast-gpu list                      ← show active instances
/vast-gpu destroy <instance_id>     ← tear down, stop billing
/vast-gpu destroy-all               ← tear down everything
```