dstack
Overview
When to use this skill:
- Running or managing dev environments, tasks, or services on dstack
- Creating, editing, or applying `*.dstack.yml` configurations
- Managing fleets, volumes, and resource availability
How it works
- `dstack` server - Can run locally, remotely, or via dstack Sky (managed)
- `dstack` CLI - Applies configurations and manages resources; the CLI can be pointed to a server and default project (`~/.dstack/config.yml` or via `dstack project`)
- `dstack` configuration files - YAML files ending with `.dstack.yml`

Quick agent flow (detached runs)
- Show plan: `echo "n" | dstack apply -f <config>`
- If plan is OK and user confirms, apply detached: `dstack apply -f <config> -y -d`
- Check status once: `dstack ps -v`
- If dev-environment or task with ports and running: attach to surface IDE link/ports/SSH alias (agent runs attach in background); ask to open link
- If attach fails in sandbox: request escalation; if not approved, ask the user to run `dstack attach` locally and share the output

CRITICAL: Never propose `dstack` CLI commands or YAML syntax that don't exist.
- Only use CLI commands and YAML syntax documented here or verified via `--help`
- If uncertain about a command or its syntax, check the links or use `--help`

NEVER do the following:
- Invent CLI flags not documented here or shown in `--help`
- Guess YAML property names - verify in configuration reference links
- Run `dstack apply` for runs without `-d` in automated contexts (blocks indefinitely)
- Retry failed commands without addressing the underlying error
- Summarize or reformat tabular CLI output - show it as-is
- Use `echo "y" |` when the `-y` flag is available
- Assume a command succeeded without checking output for errors
Agent execution guidelines
Output accuracy
- NEVER reformat, summarize, or paraphrase CLI output. Display tables, status output, and error messages exactly as returned.
- When showing command results, use code blocks to preserve formatting.
- If output is truncated due to length, indicate this clearly (e.g., "Output truncated. Full output shows X entries.").
Verification before execution
- When uncertain about any CLI flag or YAML property, run `dstack <command> --help` first.
- Never guess or invent flags. Example verification commands:

```bash
dstack --help                             # List all commands
dstack apply --help <configuration type>  # Flags for apply per configuration type (dev-environment, task, service, fleet, etc)
dstack fleet --help                       # Fleet subcommands
dstack ps --help                          # Flags for ps
```

- If a command or flag isn't documented, it doesn't exist.
Command timing and confirmation handling
Commands that stream indefinitely in the foreground:
- `dstack attach`
- `dstack apply` without `-d` for runs
- `dstack ps -w`

Agents should avoid blocking: use `-d`, timeouts, or background attach. When attach is needed, run it in the background by default (`nohup ...`), but describe it to the user simply as "attach" unless they ask for a live foreground session. Prefer `dstack ps -v` and poll in a loop if the user wants to watch status.

All other commands: Use 10-60s timeout. Most complete within this range. While waiting, monitor the output - it may contain errors, warnings, or prompts requiring attention.

Confirmation handling:
- `dstack apply`, `dstack stop`, `dstack fleet delete` require confirmation
- Use the `-y` flag to auto-confirm when user has already approved
- For `dstack stop`, always use `-y` after the user confirms to avoid interactive prompts
- Use `echo "n" |` to preview the `dstack apply` plan without executing (avoid `echo "y" |`, prefer `-y`)

Best practices:
- Prefer modifying configuration files over passing parameters to `dstack apply` (unless it's an exception)
- When user confirms deletion/stop operations, use the `-y` flag to skip confirmation prompts
Detached run follow-up (after `-d`)

After submitting a run with `-d` (dev-environment, task, service), first determine whether submission failed. If the apply output shows errors (validation, no offers, etc.), stop and surface the error.

If the run was submitted, do a quick status check with `dstack ps -v`, then guide the user through relevant next steps:

If you need to prompt for next actions, be explicit about the dstack step and command (avoid vague questions). When speaking to the user, refer to the action as "attach" (not "background attach").

- Monitor status: Report the current status (provisioning/pulling/running/finished) and offer to keep watching. Poll `dstack ps -v` every 10-20s if the user wants updates.
- Attach when running: For agents, run attach in the background by default so the session does not block. Use it to capture IDE links/SSH alias or enable port forwarding; when describing the action to the user, just say "attach".
- Dev environments or tasks with ports: Once `running`, attach to surface the IDE link/port forwarding/SSH alias, then ask whether to open the IDE link. Never open links without explicit approval.
- Services: Prefer using service endpoints. Attach only if the user explicitly needs port forwarding or full log replay.
- Tasks without ports: Default to `dstack logs` for progress; attach only if full log replay is required.
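
The polling guidance above (check `dstack ps -v` every 10-20s, with a bounded number of attempts) could be sketched as a small shell helper. This is a minimal sketch, not part of dstack: `poll_status` and the `DSTACK_BIN` override are hypothetical names, and the match assumes the run name and its status appear on the same line of the `dstack ps -v` table, which may not hold for every output layout.

```bash
# Poll `dstack ps -v` until the line for a run mentions the target status,
# or until the attempts run out. Hypothetical helper, not a dstack command.
# DSTACK_BIN is an assumed override so the loop can be dry-run with a stub.
poll_status() {
  run="$1"               # run name to look for
  target="$2"            # status word to wait for, e.g. "running"
  attempts="${3:-30}"    # maximum number of polls
  interval="${4:-15}"    # seconds between polls
  i=0
  while [ "$i" -lt "$attempts" ]; do
    out="$("${DSTACK_BIN:-dstack}" ps -v 2>&1)"
    printf '%s\n' "$out"   # show the CLI output as-is, without reformatting
    # Assumption: run name and status share one line of the table.
    if printf '%s\n' "$out" | grep -F -- "$run" | grep -qiF -- "$target"; then
      return 0
    fi
    i=$((i + 1))
    if [ "$i" -lt "$attempts" ]; then
      sleep "$interval"
    fi
  done
  return 1
}
```

A call such as `poll_status <run name> running 20 15` would stop after at most 20 checks, so the agent never blocks indefinitely. A non-zero return only means the status was not seen in time; the printed output should still be checked for errors.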
Attaching behavior (blocking vs non-blocking)
Note: `dstack attach` writes SSH alias info under `~/.dstack/ssh/config` (and may update `~/.ssh/config`) to enable `ssh <run name>`, IDE connections, port forwarding, and real-time logs (`dstack attach --logs`). If the sandbox cannot write there, the alias will not be created.

Permissions guardrail: If `dstack attach` fails due to sandbox permissions, request permission escalation to run it outside the sandbox. If escalation isn't approved or attach still fails, ask the user to run `dstack attach` locally and share the IDE link/SSH alias output.

Background attach (non-blocking default for agents):

```bash
nohup dstack attach <run name> --logs > /tmp/<run name>.attach.log 2>&1 &
echo $! > /tmp/<run name>.attach.pid
```

Then read the output:

```bash
tail -n 50 /tmp/<run name>.attach.log
```

Offer live follow only if asked:

```bash
tail -f /tmp/<run name>.attach.log
```

Stop the background attach (preferred):

```bash
kill "$(cat /tmp/<run name>.attach.pid)"
```

If the PID file is missing, fall back to a specific match (avoid killing all attaches):

```bash
pkill -f "dstack attach <run name>"
```

Why this helps: it keeps the attach session alive (including port forwarding) while the agent remains usable. IDE links and SSH instructions appear in the log file -- surface them and ask whether to open the link (`open "<link>"` on macOS, `xdg-open "<link>"` on Linux) only after explicit approval.

If background attach fails in the sandbox (permissions writing `~/.dstack` or `~/.ssh`, timeouts), request escalation to run attach outside the sandbox. If not approved, ask the user to run attach locally and share the IDE link/SSH alias.
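
The rule about opening links (`open` on macOS, `xdg-open` on Linux, and only after explicit approval) could be sketched roughly as below. Both function names are hypothetical, and passing the OS name as an argument is an assumption made only so the choice can be checked without changing machines.

```bash
# Print the opener command for the given OS name.
# The OS name defaults to `uname -s`; passing it explicitly is for dry runs only.
link_opener() {
  case "${1:-$(uname -s)}" in
    Darwin) echo "open" ;;
    Linux)  echo "xdg-open" ;;
    *)      return 1 ;;      # unknown OS: let the caller decide what to do
  esac
}

# Open a link only when the caller passes an explicit approval of "yes".
open_link_if_approved() {
  link="$1"
  approved="$2"
  if [ "$approved" != "yes" ]; then
    echo "not opened: approval required"
    return 0
  fi
  opener="$(link_opener)" || { echo "no known opener for this OS"; return 1; }
  "$opener" "$link"
}
```

Keeping the approval check in front of the opener lookup means that a missing approval never launches anything, whatever the platform.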
Interpreting user requests

"Run something": When the user asks to run a workload (dev environment, task, service), use `dstack apply` with the appropriate configuration. Note: `dstack run` only supports `dstack run get --json` for retrieving run details -- it cannot start workloads.

"Connect to" or "open" a dev environment: If a dev environment is already running, use `dstack attach <run name> --logs` (agent runs it in the background by default) to surface the IDE URL (`cursor://`, `vscode://`, etc.) and SSH alias. If sandboxed attach fails, request escalation or ask the user to run attach locally and share the link.

Configuration types
Common parameters: All run configurations (dev environments, tasks, services) support many parameters including:
- Git integration: Clone repos automatically (`repo`), mount existing repos (`repos`), upload local files (`working_dir`)
- File upload: `files` (see concept docs for examples)
- Docker support: Use custom Docker images (`image`); use `docker: true` if you want to use Docker from inside the container (VM-based backends only)
- Environment: Set environment variables (`env`), often via `.envrc`. Secrets are supported but less common.
- Storage: Persistent network volumes (`volumes`), specify disk size
- Resources: Define GPU, CPU, memory, and disk requirements

Best practices:
- Prefer giving configurations a `name` property for easier management
- When configurations need credentials (API keys, tokens), list only env var names in the `env` section (e.g., `- HF_TOKEN`), not values. Recommend storing actual values in a `.envrc` file alongside the configuration, applied via `source .envrc && dstack apply`.
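
Because the configuration lists only variable names, a run can be submitted with a credential that was never set locally. One way to guard against that, sketched under the assumption that the names passed in are plain identifiers, is a small pre-check run between `source .envrc` and `dstack apply`. `require_env` is a hypothetical helper, not a dstack feature.

```bash
# Return non-zero if any of the named environment variables is unset or empty.
# Assumption: every argument is a plain identifier such as HF_TOKEN.
require_env() {
  missing=0
  for name in "$@"; do
    eval "value=\${$name-}"          # read the variable whose name is in $name
    if [ -z "$value" ]; then
      echo "missing: $name" >&2      # report the name only, never a value
      missing=1
    fi
  done
  return "$missing"
}
```

It might be used as `source .envrc && require_env HF_TOKEN && dstack apply -f <config>`, so a missing value stops the chain before anything is submitted.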
1. Dev environments
Use for: Interactive development with IDE integration (VS Code, Cursor, etc.).

```yaml
type: dev-environment
name: cursor
python: "3.12"
ide: vscode
resources:
  gpu: 80GB
```

2. Tasks
Use for: Batch jobs, training runs, fine-tuning, web applications, any executable workload.
Key features: Distributed training (multi-node) and port forwarding for web apps.
```yaml
type: task
name: train
python: "3.12"
env:
  - HUGGING_FACE_HUB_TOKEN
commands:
  - uv pip install -r requirements.txt
  - uv run python train.py
ports:
  - 8501 # Optional: expose ports for web apps
resources:
  gpu: A100:40GB:2
```

Port forwarding: When you specify `ports`, `dstack apply` forwards them to `localhost` while attached. Use `dstack attach <run name>` to reconnect and restore port forwarding. The run name becomes an SSH alias (e.g., `ssh <run name>`) for direct access.

Distributed training: Multi-node tasks are supported (e.g., via `nodes`) and require fleets that support inter-node communication (see `placement: cluster` in fleets).

3. Services
Use for: Deploying models or web applications as production endpoints.
Key features: OpenAI-compatible model serving, auto-scaling (RPS/queue), custom gateways with HTTPS.
```yaml
type: service
name: llama31
python: "3.12"
env:
  - HF_TOKEN
commands:
  - uv pip install vllm
  - uv run vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct
port: 8000
model: meta-llama/Meta-Llama-3.1-8B-Instruct
resources:
  gpu: 80GB
  disk: 200GB
```

Service endpoints:
- Without gateway: `<dstack server URL>/proxy/services/f/<run name>/`
- With gateway: `https://<run name>.<gateway domain>/`
- Authentication: Unless `auth` is `false`, include `Authorization: Bearer <DSTACK_TOKEN>` on all service requests.
- OpenAI-compatible models: Use `service.url` from `dstack run get <run name> --json` and append `/v1` as the base URL; do not use deprecated `service.model.base_url` for requests.
- Example (with gateway):

```bash
curl -sS -X POST "https://<run name>.<gateway domain>/v1/chat/completions" \
  -H "Authorization: Bearer <dstack token>" \
  -H "Content-Type: application/json" \
  -d '{"model":"<model name>","messages":[{"role":"user","content":"Hello"}],"max_tokens":64}'
```
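
The "append `/v1`" rule for OpenAI-compatible models is easy to get wrong when the service URL already ends in a slash. A minimal sketch of the string handling follows; `openai_base_url` is a hypothetical helper, and how `service.url` is pulled out of the JSON (for example with a JSON tool of your choice) is left out on purpose.

```bash
# Build the OpenAI-compatible base URL from a service URL by appending /v1.
# Hypothetical helper; the argument is the value of service.url.
openai_base_url() {
  url="${1%/}"               # drop one trailing slash, if present
  printf '%s/v1\n' "$url"
}
```

The result is the base URL only; request paths such as `/chat/completions` are appended by the client, and the `Authorization` header is still required unless `auth` is `false`.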
[Concept documentation](https://dstack.ai/docs/concepts/services.md) | [Configuration reference](https://dstack.ai/docs/reference/dstack.yml/service.md)

4. Fleets
Use for: Pre-provisioning infrastructure for workloads, managing on-prem GPU servers, creating auto-scaling instance pools.
```yaml
type: fleet
name: my-fleet
nodes: 0..2
resources:
  gpu: 24GB..
  disk: 200GB
spot_policy: auto # other values: spot, on-demand
idle_duration: 5m
```

On-demand provisioning: When `nodes` is a range (e.g., `0..2`), dstack creates a template and provisions instances on demand within the min/max. Use `idle_duration` to terminate idle instances.

Distributed workloads: Use `placement: cluster` for fleets intended for multi-node tasks that require inter-node networking.

SSH fleet (on-prem or pre-provisioned):

```yaml
type: fleet
name: on-prem-fleet
ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/id_rsa
  hosts:
    - 192.168.1.10
    - 192.168.1.11
```

5. Volumes
Use for: Persistent storage for datasets, model checkpoints, training artifacts.
```yaml
type: volume
name: my-volume
backend: aws
region: us-east-1
resources:
  disk: 500GB
```

Instance volumes (local, ephemeral, often optional):

```yaml
type: dev-environment
# ... other config
volumes:
  - instance_path: /dstack-cache/pip
    path: /root/.cache/pip
    optional: true
  - instance_path: /dstack-cache/huggingface
    path: /root/.cache/huggingface
    optional: true
```

**Attach to runs:** Use `volumes` in dev environments, tasks, and services. Network volumes persist independently; instance volumes are tied to the instance lifecycle.

[Concept documentation](https://dstack.ai/docs/concepts/volumes.md) | [Configuration reference](https://dstack.ai/docs/reference/dstack.yml/volume.md)

Essential CLI commands
Apply configurations
Important behavior:
- `dstack apply` shows a plan with estimated costs and may ask for confirmation
- In attached mode (default), the terminal blocks and shows output
- In detached mode (`-d`), runs in background without blocking the terminal

Workflow for applying run configurations (dev-environment, task, service):

1. Show plan:

   ```bash
   echo "n" | dstack apply -f config.dstack.yml
   ```

   Display the FULL output including the offers table and cost estimate. Do NOT summarize or reformat.

2. Wait for user confirmation. Do NOT proceed if:
   - Output shows "No offers found" or similar errors
   - Output shows validation errors
   - User has not explicitly confirmed

3. Execute (only after user confirms):

   ```bash
   dstack apply -f config.dstack.yml -y -d
   ```

4. Verify apply status:

   ```bash
   dstack ps -v
   ```

Workflow for infrastructure (fleet, volume, gateway):

1. Show plan:

   ```bash
   echo "n" | dstack apply -f fleet.dstack.yml
   ```

   Display the FULL output. Do NOT summarize or reformat.

2. Wait for user confirmation.

3. Execute:

   ```bash
   dstack apply -f fleet.dstack.yml -y
   ```

4. Verify: Use `dstack fleet`, `dstack volume`, or `dstack gateway` respectively.
Fleet management
```bash
# Create/update fleet
dstack apply -f fleet.dstack.yml

# List fleets
dstack fleet

# Get fleet details
dstack fleet get my-fleet

# Get fleet details as JSON (for troubleshooting)
dstack fleet get my-fleet --json

# Delete entire fleet (use -y when user already confirmed)
dstack fleet delete my-fleet -y

# Delete specific instance from fleet (use -y when user already confirmed)
dstack fleet delete my-fleet -i <instance num> -y
```

Monitor runs
```bash
# List all runs
dstack ps

# Verbose output with full details
dstack ps -v

# JSON output (for troubleshooting/scripting)
dstack ps --json

# Get specific run details as JSON
dstack run get my-run-name --json
```

Attach to runs
```bash
# Attach and replay logs from start (preferred, unless asked otherwise)
dstack attach my-run-name --logs

# Attach without replaying logs (restores port forwarding + SSH only)
dstack attach my-run-name
```

View logs
```bash
# Stream logs (tail mode)
dstack logs my-run-name

# Debug mode (includes additional runner logs)
dstack logs my-run-name -d

# Fetch logs from specific replica (multi-node runs)
dstack logs my-run-name --replica 1

# Fetch logs from specific job
dstack logs my-run-name --job 0
```

Stop runs
```bash
# Stop specific run (use -y after user confirms)
dstack stop my-run-name -y

# Abort (force stop)
dstack stop my-run-name --abort
```

List offers
Offers represent instance configurations available for provisioning across backends. `dstack offer` lists offers regardless of configured fleets.

```bash
# Filter by specific backend
dstack offer --backend aws

# Filter by GPU type
dstack offer --gpu A100

# Filter by GPU memory
dstack offer --gpu 24GB..80GB

# Combine filters
dstack offer --backend aws --gpu A100:80GB

# JSON output (for troubleshooting/scripting)
dstack offer --json
```

**Max offers:** By default, `dstack offer` returns the first N offers (output also includes the total number). Use `--max-offers N` to increase the limit.

**Grouping:** Prefer `--group-by gpu` (other supported values: `gpu,backend`, `gpu,backend,region`) for aggregated output across all offers, not `--max-offers`.

Troubleshooting
When diagnosing issues with dstack workloads or infrastructure:
1. Use JSON output for detailed inspection:

   ```bash
   dstack fleet get my-fleet --json
   dstack run get my-run --json
   dstack ps -n 10 --json
   dstack offer --json
   ```

2. Check verbose run status:

   ```bash
   dstack ps -v
   ```

3. Examine logs with debug output:

   ```bash
   dstack logs my-run -d
   ```

4. Attach with log replay:

   ```bash
   dstack attach my-run --logs
   ```

Common issues:
- No offers: Check `dstack offer` and ensure that at least one fleet matches requirements
- No fleet: Ensure at least one fleet is created
- Configuration errors: Validate YAML syntax; check `dstack apply` output for specific errors
- Provisioning timeouts: Use `dstack ps -v` to see provisioning status; consider spot vs on-demand
- Connection issues: Verify server status, check authentication, ensure network access to backends

When errors occur:
- Display the full error message unchanged
- Do NOT retry the same command without addressing the error
- Refer to the Troubleshooting guide for guidance
Additional resources
Core documentation:
Additional concepts:
Guides:
Accelerator-specific examples:
Full documentation: https://dstack.ai/llms-full.txt