dstack


Overview

`dstack` provisions and orchestrates workloads across GPU clouds, Kubernetes, and on-prem via fleets.
When to use this skill:
  • Running or managing dev environments, tasks, or services on dstack
  • Creating, editing, or applying `*.dstack.yml` configurations
  • Managing fleets, volumes, and resource availability

How it works

`dstack` operates through three core components:
  1. `dstack` server - Can run locally, remotely, or via dstack Sky (managed)
  2. `dstack` CLI - Applies configurations and manages resources; the CLI can be pointed to a server and default project (`~/.dstack/config.yml` or via `dstack project`)
  3. `dstack` configuration files - YAML files ending with `.dstack.yml`
`dstack apply` plans, provisions cloud resources, and schedules containers/runners. By default it attaches when the run reaches `running` (opens SSH tunnel, forwards ports, streams logs). With `-d`, it submits and exits.

Quick agent flow (detached runs)

  1. Show plan: `echo "n" | dstack apply -f <config>`
  2. If plan is OK and user confirms, apply detached: `dstack apply -f <config> -y -d`
  3. Check status once: `dstack ps -v`
  4. If the run is a dev environment or a task with ports and it is running: attach to surface the IDE link/ports/SSH alias (the agent runs attach in the background); ask whether to open the link
  5. If attach fails in sandbox: request escalation; if not approved, ask the user to run `dstack attach` locally and share the output
CRITICAL: Never propose `dstack` CLI commands or YAML syntax that doesn't exist.
  • Only use CLI commands and YAML syntax documented here or verified via `--help`
  • If uncertain about a command or its syntax, check the links or use `--help`
NEVER do the following:
  • Invent CLI flags not documented here or shown in `--help`
  • Guess YAML property names - verify in configuration reference links
  • Run `dstack apply` for runs without `-d` in automated contexts (blocks indefinitely)
  • Retry failed commands without addressing the underlying error
  • Summarize or reformat tabular CLI output - show it as-is
  • Use `echo "y" |` when `-y` flag is available
  • Assume a command succeeded without checking output for errors
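The first two steps of the flow above can be sketched as two small Bash helpers. This is a minimal sketch, not part of dstack: the function names `show_plan` and `apply_detached` are hypothetical, and only the commands and flags quoted in this section are used.

```bash
#!/usr/bin/env bash
# Hypothetical helpers wrapping the documented commands; names are illustrative.

# Print the plan without executing it: answering "n" declines the prompt.
show_plan() {
  local config="$1"
  echo "n" | dstack apply -f "$config"
}

# Submit the run once the user has confirmed: -y auto-confirms, -d detaches.
apply_detached() {
  local config="$1"
  dstack apply -f "$config" -y -d
}

# Example usage (not executed here):
#   show_plan train.dstack.yml
#   apply_detached train.dstack.yml && dstack ps -v
```

Keeping the plan and apply steps in separate functions leaves room for the user confirmation that must happen between them.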

Agent execution guidelines

Output accuracy

  • NEVER reformat, summarize, or paraphrase CLI output. Display tables, status output, and error messages exactly as returned.
  • When showing command results, use code blocks to preserve formatting.
  • If output is truncated due to length, indicate this clearly (e.g., "Output truncated. Full output shows X entries.").

Verification before execution

  • When uncertain about any CLI flag or YAML property, run `dstack <command> --help` first.
  • Never guess or invent flags. Example verification commands:
    bash
    dstack --help                               # List all commands
    dstack apply --help <configuration type>    # Flags for apply per configuration type (dev-environment, task, service, fleet, etc)
    dstack fleet --help                         # Fleet subcommands
    dstack ps --help                            # Flags for ps
  • If a command or flag isn't documented, it doesn't exist.

Command timing and confirmation handling

Commands that stream indefinitely in the foreground:
  • `dstack attach`
  • `dstack apply` without `-d` for runs
  • `dstack ps -w`
Agents should avoid blocking: use `-d`, timeouts, or background attach. When attach is needed, run it in the background by default (`nohup ...`), but describe it to the user simply as "attach" unless they ask for a live foreground session. Prefer `dstack ps -v` and poll in a loop if the user wants to watch status.
All other commands: Use 10-60s timeout. Most complete within this range. While waiting, monitor the output - it may contain errors, warnings, or prompts requiring attention.
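One way to apply such a time limit is a thin wrapper around the `timeout` utility. This is a sketch under assumptions: it relies on GNU coreutils `timeout` (not present on stock macOS), and the function name `run_with_timeout` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run a command with a time limit in seconds.
# Assumes GNU coreutils `timeout`, which exits with status 124 on expiry.
run_with_timeout() {
  local seconds="$1"
  shift
  local rc=0
  timeout "$seconds" "$@" || rc=$?
  if [ "$rc" -eq 124 ]; then
    echo "command timed out after ${seconds}s: $*" >&2
  fi
  return "$rc"
}

# Example usage (not executed here):
#   run_with_timeout 60 dstack ps -v
```

The wrapper passes the command's own exit status through, so errors other than a timeout remain visible to the caller.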
Confirmation handling:
  • `dstack apply`, `dstack stop`, `dstack fleet delete` require confirmation
  • Use `-y` flag to auto-confirm when user has already approved
  • For `dstack stop`, always use `-y` after the user confirms to avoid interactive prompts
  • Use `echo "n" |` to preview `dstack apply` plan without executing (avoid `echo "y" |`, prefer `-y`)
Best practices:
  • Prefer modifying configuration files over passing parameters to `dstack apply` (unless it's an exception)
  • When user confirms deletion/stop operations, use `-y` flag to skip confirmation prompts

Detached run follow-up (after `-d`)

After submitting a run with `-d` (dev-environment, task, service), first determine whether submission failed. If the apply output shows errors (validation, no offers, etc.), stop and surface the error.
If the run was submitted, do a quick status check with `dstack ps -v`, then guide the user through relevant next steps. If you need to prompt for next actions, be explicit about the dstack step and command (avoid vague questions). When speaking to the user, refer to the action as "attach" (not "background attach").
  • Monitor status: Report the current status (provisioning/pulling/running/finished) and offer to keep watching. Poll `dstack ps -v` every 10-20s if the user wants updates.
  • Attach when running: For agents, run attach in the background by default so the session does not block. Use it to capture IDE links/SSH alias or enable port forwarding; when describing the action to the user, just say "attach".
  • Dev environments or tasks with ports: Once `running`, attach to surface the IDE link/port forwarding/SSH alias, then ask whether to open the IDE link. Never open links without explicit approval.
  • Services: Prefer using service endpoints. Attach only if the user explicitly needs port forwarding or full log replay.
  • Tasks without ports: Default to `dstack logs` for progress; attach only if full log replay is required.
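The status polling described above can be sketched as a bounded loop, so the agent never waits indefinitely. This is a minimal sketch: the function name `poll_status` and its defaults (15 seconds, 20 polls) are assumptions, not dstack behavior.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print `dstack ps -v` a bounded number of times.
# Arguments: interval in seconds (default 15), maximum polls (default 20).
poll_status() {
  local interval="${1:-15}" max_polls="${2:-20}" i
  for ((i = 1; i <= max_polls; i++)); do
    dstack ps -v
    # Sleep between polls, but not after the last one.
    if [ "$i" -lt "$max_polls" ]; then
      sleep "$interval"
    fi
  done
}

# Example usage (not executed here):
#   poll_status 15 8    # roughly two minutes of updates
```

Each poll prints the CLI output unchanged, which keeps the table formatting intact for the user.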

Attaching behavior (blocking vs non-blocking)

`dstack attach` runs until interrupted and blocks the terminal. Agents must avoid indefinite blocking. If a brief attach is needed, use a timeout to capture initial output (IDE link, SSH alias) and then detach.
Note: `dstack attach` writes SSH alias info under `~/.dstack/ssh/config` (and may update `~/.ssh/config`) to enable `ssh <run name>`, IDE connections, port forwarding, and real-time logs (`dstack attach --logs`). If the sandbox cannot write there, the alias will not be created.
Permissions guardrail: If `dstack attach` fails due to sandbox permissions, request permission escalation to run it outside the sandbox. If escalation isn't approved or attach still fails, ask the user to run `dstack attach` locally and share the IDE link/SSH alias output.
Background attach (non-blocking default for agents):
bash
nohup dstack attach <run name> --logs > /tmp/<run name>.attach.log 2>&1 & echo $! > /tmp/<run name>.attach.pid
Then read the output:
bash
tail -n 50 /tmp/<run name>.attach.log
Offer live follow only if asked:
bash
tail -f /tmp/<run name>.attach.log
Stop the background attach (preferred):
bash
kill "$(cat /tmp/<run name>.attach.pid)"
If the PID file is missing, fall back to a specific match (avoid killing all attaches):
bash
pkill -f "dstack attach <run name>"
Why this helps: it keeps the attach session alive (including port forwarding) while the agent remains usable. IDE links and SSH instructions appear in the log file; surface them and ask whether to open the link. Open it (`open "<link>"` on macOS, `xdg-open "<link>"` on Linux) only after explicit approval.
If background attach fails in the sandbox (permissions writing `~/.dstack` or `~/.ssh`, timeouts), request escalation to run attach outside the sandbox. If not approved, ask the user to run attach locally and share the IDE link/SSH alias.
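Pulling the IDE link out of the attach log could look like the sketch below. It assumes that links use the `vscode://` or `cursor://` schemes and contain no whitespace; the exact log layout is not specified here, and the function name `extract_ide_links` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the unique IDE links found in an attach log file.
# Assumes links start with vscode:// or cursor:// and contain no whitespace.
extract_ide_links() {
  local log_file="$1"
  grep -Eo '(vscode|cursor)://[^[:space:]]+' "$log_file" | sort -u
}

# Example usage (not executed here; the path follows the pattern used above):
#   extract_ide_links "/tmp/my-run-name.attach.log"
```

The helper only prints links. Opening one still requires explicit approval from the user.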

Interpreting user requests

"Run something": When the user asks to run a workload (dev environment, task, service), use `dstack apply` with the appropriate configuration. Note: `dstack run` only supports `dstack run get --json` for retrieving run details; it cannot start workloads.
"Connect to" or "open" a dev environment: If a dev environment is already running, use `dstack attach <run name> --logs` (agent runs it in the background by default) to surface the IDE URL (`cursor://`, `vscode://`, etc.) and SSH alias. If sandboxed attach fails, request escalation or ask the user to run attach locally and share the link.

Configuration types

`dstack` supports five main configuration types. Configuration files can be named `<name>.dstack.yml` or simply `.dstack.yml`.
Common parameters: All run configurations (dev environments, tasks, services) support many parameters including:
  • Git integration: Clone repos automatically (`repo`), mount existing repos (`repos`), upload local files (`working_dir`)
  • File upload: `files` (see concept docs for examples)
  • Docker support: Use custom Docker images (`image`); use `docker: true` if you want to use Docker from inside the container (VM-based backends only)
  • Environment: Set environment variables (`env`), often via `.envrc`. Secrets are supported but less common.
  • Storage: Persistent network volumes (`volumes`), specify disk size
  • Resources: Define GPU, CPU, memory, and disk requirements
Best practices:
  • Prefer giving configurations a `name` property for easier management
  • When configurations need credentials (API keys, tokens), list only env var names in the `env` section (e.g., `- HF_TOKEN`), not values. Recommend storing actual values in a `.envrc` file alongside the configuration, applied via `source .envrc && dstack apply`.
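Before applying a configuration that lists variable names under `env`, it can help to confirm those variables are actually exported. The sketch below is one possible pre-flight check; the function name `require_env` is hypothetical, and it assumes `.envrc` uses `export` so the values reach child processes.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: verify that each named variable is exported
# and non-empty. Prints the missing names to stderr and returns 1 if any.
require_env() {
  local name missing=0
  for name in "$@"; do
    if [ -z "$(printenv "$name")" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage (not executed here):
#   source .envrc && require_env HF_TOKEN && dstack apply -f train.dstack.yml
```

Checking with `printenv` rather than a plain shell expansion catches variables that were set but never exported.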

1. Dev environments

Use for: Interactive development with IDE integration (VS Code, Cursor, etc.).
yaml
type: dev-environment
name: cursor

python: "3.12"
ide: vscode

resources:
  gpu: 80GB

2. Tasks

Use for: Batch jobs, training runs, fine-tuning, web applications, any executable workload.
Key features: Distributed training (multi-node) and port forwarding for web apps.
yaml
type: task
name: train

python: "3.12"
env:
  - HUGGING_FACE_HUB_TOKEN
commands:
  - uv pip install -r requirements.txt
  - uv run python train.py
ports:
  - 8501  # Optional: expose ports for web apps

resources:
  gpu: A100:40GB:2
Port forwarding: When you specify `ports`, `dstack apply` forwards them to `localhost` while attached. Use `dstack attach <run name>` to reconnect and restore port forwarding. The run name becomes an SSH alias (e.g., `ssh <run name>`) for direct access.
Distributed training: Multi-node tasks are supported (e.g., via `nodes`) and require fleets that support inter-node communication (see `placement: cluster` in fleets).

3. Services

Use for: Deploying models or web applications as production endpoints.
Key features: OpenAI-compatible model serving, auto-scaling (RPS/queue), custom gateways with HTTPS.
yaml
type: service
name: llama31

python: "3.12"
env:
  - HF_TOKEN
commands:
  - uv pip install vllm
  - uv run vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct
port: 8000
model: meta-llama/Meta-Llama-3.1-8B-Instruct

resources:
  gpu: 80GB
  disk: 200GB
Service endpoints:
  • Without gateway: `<dstack server URL>/proxy/services/f/<run name>/`
  • With gateway: `https://<run name>.<gateway domain>/`
  • Authentication: Unless `auth` is `false`, include `Authorization: Bearer <DSTACK_TOKEN>` on all service requests.
  • OpenAI-compatible models: Use `service.url` from `dstack run get <run name> --json` and append `/v1` as the base URL; do not use deprecated `service.model.base_url` for requests.
  • Example (with gateway):
    bash
    curl -sS -X POST "https://<run name>.<gateway domain>/v1/chat/completions" \
      -H "Authorization: Bearer <dstack token>" \
      -H "Content-Type: application/json" \
      -d '{"model":"<model name>","messages":[{"role":"user","content":"Hello"}],"max_tokens":64}'
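The "append `/v1`" rule for OpenAI-compatible models can be sketched as a tiny helper. This is illustrative only: the function name `openai_base_url` is hypothetical, and the service URL is assumed to have been read already from `service.url` in the `dstack run get <run name> --json` output.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn a service URL into an OpenAI-compatible base URL
# by appending /v1 (one trailing slash on the input is dropped first).
openai_base_url() {
  local service_url="${1%/}"
  printf '%s/v1\n' "$service_url"
}

# Example usage (not executed here; the domain is a placeholder):
#   base_url="$(openai_base_url "https://llama31.example.com/")"
#   curl -sS "$base_url/chat/completions" -H "Authorization: Bearer $DSTACK_TOKEN"
```

Stripping the trailing slash first avoids a double slash in the resulting URL.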

[Concept documentation](https://dstack.ai/docs/concepts/services.md) | [Configuration reference](https://dstack.ai/docs/reference/dstack.yml/service.md)

4. Fleets

Use for: Pre-provisioning infrastructure for workloads, managing on-prem GPU servers, creating auto-scaling instance pools.
yaml
type: fleet
name: my-fleet
nodes: 0..2

resources:
  gpu: 24GB..
  disk: 200GB

spot_policy: auto # other values: spot, on-demand
idle_duration: 5m
On-demand provisioning: When `nodes` is a range (e.g., `0..2`), dstack creates a template and provisions instances on demand within the min/max. Use `idle_duration` to terminate idle instances.
Distributed workloads: Use `placement: cluster` for fleets intended for multi-node tasks that require inter-node networking.
SSH fleet (on-prem or pre-provisioned):
yaml
type: fleet
name: on-prem-fleet

ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/id_rsa
  hosts:
    - 192.168.1.10
    - 192.168.1.11

5. Volumes

Use for: Persistent storage for datasets, model checkpoints, training artifacts.
yaml
type: volume
name: my-volume

backend: aws
region: us-east-1

resources:
  disk: 500GB
Instance volumes (local, ephemeral, often optional):
yaml
type: dev-environment
# ... other config

volumes:
  - instance_path: /dstack-cache/pip
    path: /root/.cache/pip
    optional: true
  - instance_path: /dstack-cache/huggingface
    path: /root/.cache/huggingface
    optional: true

**Attach to runs:** Use `volumes` in dev environments, tasks, and services. Network volumes persist independently; instance volumes are tied to the instance lifecycle.

[Concept documentation](https://dstack.ai/docs/concepts/volumes.md) | [Configuration reference](https://dstack.ai/docs/reference/dstack.yml/volume.md)

Essential CLI commands

Apply configurations

Important behavior:
  • `dstack apply` shows a plan with estimated costs and may ask for confirmation
  • In attached mode (default), the terminal blocks and shows output
  • In detached mode (`-d`), runs in background without blocking the terminal
Workflow for applying run configurations (dev-environment, task, service):
  1. Show plan:
    bash
    echo "n" | dstack apply -f config.dstack.yml
    Display the FULL output including the offers table and cost estimate. Do NOT summarize or reformat.
  2. Wait for user confirmation. Do NOT proceed if:
    • Output shows "No offers found" or similar errors
    • Output shows validation errors
    • User has not explicitly confirmed
  3. Execute (only after user confirms):
    bash
    dstack apply -f config.dstack.yml -y -d
  4. Verify apply status:
    bash
    dstack ps -v
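The "do not proceed" check in step 2 can be partly automated. The sketch below only looks for the "No offers found" text quoted above; the function name `plan_is_clear` is hypothetical, other error wording is not detected, and explicit user confirmation is still required.

```bash
#!/usr/bin/env bash
# Hypothetical gate: fail if the captured plan output reports no offers.
# Only the "No offers found" wording is checked (case-insensitive).
plan_is_clear() {
  local plan_output="$1"
  if printf '%s\n' "$plan_output" | grep -qi 'no offers found'; then
    return 1
  fi
  return 0
}

# Example usage (not executed here):
#   plan="$(echo "n" | dstack apply -f config.dstack.yml 2>&1)"
#   printf '%s\n' "$plan"      # show the full plan unchanged
#   plan_is_clear "$plan" || echo "Plan reports no offers; do not apply."
```

The plan is printed as captured before any check runs, so the user always sees the full, unmodified output.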
Workflow for infrastructure (fleet, volume, gateway):
  1. Show plan:
    bash
    echo "n" | dstack apply -f fleet.dstack.yml
    Display the FULL output. Do NOT summarize or reformat.
  2. Wait for user confirmation.
  3. Execute:
    bash
    dstack apply -f fleet.dstack.yml -y
  4. Verify: Use `dstack fleet`, `dstack volume`, or `dstack gateway` respectively.

Fleet management

bash
# Create/update fleet
dstack apply -f fleet.dstack.yml

# List fleets
dstack fleet

# Get fleet details
dstack fleet get my-fleet

# Get fleet details as JSON (for troubleshooting)
dstack fleet get my-fleet --json

# Delete entire fleet (use -y when user already confirmed)
dstack fleet delete my-fleet -y

# Delete specific instance from fleet (use -y when user already confirmed)
dstack fleet delete my-fleet -i <instance num> -y

Monitor runs

bash
# List all runs
dstack ps

# Verbose output with full details
dstack ps -v

# JSON output (for troubleshooting/scripting)
dstack ps --json

# Get specific run details as JSON
dstack run get my-run-name --json

Attach to runs

bash
# Attach and replay logs from start (preferred, unless asked otherwise)
dstack attach my-run-name --logs

# Attach without replaying logs (restores port forwarding + SSH only)
dstack attach my-run-name

View logs

bash
# Stream logs (tail mode)
dstack logs my-run-name

# Debug mode (includes additional runner logs)
dstack logs my-run-name -d

# Fetch logs from specific replica (multi-node runs)
dstack logs my-run-name --replica 1

# Fetch logs from specific job
dstack logs my-run-name --job 0

Stop runs

bash
# Stop specific run (use -y after user confirms)
dstack stop my-run-name -y

# Abort (force stop)
dstack stop my-run-name --abort

List offers

Offers represent instance configurations available for provisioning across backends. `dstack offer` lists offers regardless of configured fleets.
bash
# Filter by specific backend
dstack offer --backend aws

# Filter by GPU type
dstack offer --gpu A100

# Filter by GPU memory
dstack offer --gpu 24GB..80GB

# Combine filters
dstack offer --backend aws --gpu A100:80GB

# JSON output (for troubleshooting/scripting)
dstack offer --json

**Max offers:** By default, `dstack offer` returns the first N offers (output also includes the total number). Use `--max-offers N` to increase the limit.
**Grouping:** Prefer `--group-by gpu` (other supported values: `gpu,backend`, `gpu,backend,region`) for aggregated output across all offers, not `--max-offers`.

Troubleshooting

When diagnosing issues with dstack workloads or infrastructure:
  1. Use JSON output for detailed inspection:
    bash
    dstack fleet get my-fleet --json
    dstack run get my-run --json
    dstack ps -n 10 --json
    dstack offer --json
  2. Check verbose run status:
    bash
    dstack ps -v
  3. Examine logs with debug output:
    bash
    dstack logs my-run -d
  4. Attach with log replay:
    bash
    dstack attach my-run --logs
Common issues:
  • No offers: Check `dstack offer` and ensure that at least one fleet matches requirements
  • No fleet: Ensure at least one fleet is created
  • Configuration errors: Validate YAML syntax; check `dstack apply` output for specific errors
  • Provisioning timeouts: Use `dstack ps -v` to see provisioning status; consider spot vs on-demand
  • Connection issues: Verify server status, check authentication, ensure network access to backends
When errors occur:
  1. Display the full error message unchanged
  2. Do NOT retry the same command without addressing the error
  3. Refer to the Troubleshooting guide for guidance
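The inspection commands above can be gathered in one pass so their output can be shown to the user unchanged. This is a minimal sketch: the function name `collect_diagnostics` and the output file names are hypothetical, and only commands listed in this section are used.

```bash
#!/usr/bin/env bash
# Hypothetical helper: save diagnostic output for one run into a directory.
# stdout and stderr go to separate files so the JSON files stay parseable.
collect_diagnostics() {
  local run_name="$1" out_dir="$2" rc=0
  mkdir -p "$out_dir"
  dstack ps -v > "$out_dir/ps.txt" 2> "$out_dir/ps.err" || rc=1
  dstack run get "$run_name" --json > "$out_dir/run.json" 2> "$out_dir/run.err" || rc=1
  dstack offer --json > "$out_dir/offers.json" 2> "$out_dir/offers.err" || rc=1
  return "$rc"
}

# Example usage (not executed here):
#   collect_diagnostics my-run /tmp/my-run-diagnostics
```

A failure in one command does not stop the others, and a non-zero return signals that at least one `.err` file needs attention.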

Additional resources

Core documentation:
Additional concepts:
Guides:
Accelerator-specific examples: