truefoundry-volumes

Routing note: For ambiguous user intents, use the shared clarification templates in references/intent-clarification.md.
<objective>
<objective>

Volumes

Create and manage persistent volumes on TrueFoundry. Volumes provide shared, low-latency disk storage that persists across container restarts and can be mounted by multiple pods.

When to Use

Create, list, or mount persistent volumes on TrueFoundry, including dynamic provisioning, static PV attachment, storage class selection, and Volume Browser setup.

When NOT to Use

  • User needs large archival storage or global access -> suggest blob storage (S3/GCS) instead
  • User wants ephemeral scratch space -> use `ephemeral_storage` in the resource config
  • User wants to deploy an app -> prefer the `deploy` skill; ask if the user wants another valid path
  • User wants to manage secrets -> prefer the `secrets` skill; ask if the user wants another valid path
</objective> <context>
</objective> <context>

Volumes vs Blob Storage

Help the user choose the right storage type:

| Aspect | Volumes | Blob Storage (S3/GCS) |
|---|---|---|
| Access method | Standard file system APIs (open, read, write) | SDK clients (boto3, gcsfs) |
| Speed | Faster (local-disk latency) | Slower (network round-trips) |
| Durability | High | Extremely high (11 nines) |
| Cost | Higher per GB | Lower per GB |
| Scope | Region/cluster limited | Global access |
| Best for | Shared model weights, training checkpoints, low-latency reads | Large archives, infrequently accessed datasets, cross-region data |

Choose volumes when:
  • Multiple pods need concurrent file system access to the same data
  • You need file system semantics (locking, renaming, directory listing)
  • Low-latency access to frequently-read data (model weights, config files)
  • ML training checkpointing where write speed matters
Choose blob storage when:
  • Data is larger than a few hundred GB
  • Global or cross-region access is needed
  • Data is written once and read occasionally
  • Cost is the primary concern
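The two checklists above can be folded into a rough triage helper for the agent. This is a hypothetical sketch, not a TrueFoundry API: `StorageNeeds`, `suggest_storage`, and the 300 GB cutoff standing in for "a few hundred GB" are all assumptions.

```python
from dataclasses import dataclass

# Assumption: "larger than a few hundred GB" is approximated as 300 GB.
LARGE_DATA_GB = 300


@dataclass
class StorageNeeds:
    """Hypothetical summary of what a workload needs from storage."""
    data_size_gb: int
    concurrent_pod_access: bool = False   # several pods share the same files
    needs_fs_semantics: bool = False      # locking, renaming, directory listing
    low_latency_reads: bool = False       # e.g. model weights, config files
    fast_checkpoint_writes: bool = False  # ML training checkpointing
    cross_region: bool = False
    write_once_read_rarely: bool = False
    cost_is_primary: bool = False


def suggest_storage(needs: StorageNeeds) -> str:
    """Return "volume" or "blob" by tallying the checklist items above."""
    # Size and global reach rule out a cluster-scoped volume outright.
    if needs.cross_region or needs.data_size_gb > LARGE_DATA_GB:
        return "blob"
    volume_votes = sum([
        needs.concurrent_pod_access,
        needs.needs_fs_semantics,
        needs.low_latency_reads,
        needs.fast_checkpoint_writes,
    ])
    blob_votes = sum([needs.write_once_read_rarely, needs.cost_is_primary])
    # Ties go to blob storage, the cheaper and more durable default.
    return "volume" if volume_votes > blob_votes else "blob"
```

Treat the result as a starting point for the conversation with the user, not as a final decision.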
Warning: Do not write to the same file path from multiple pods simultaneously -- this can cause data corruption. Coordinate writes across pods or use separate paths.
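One way to follow this warning is to give each pod its own sub-directory on the shared mount and to write files atomically. The sketch below assumes the volume is mounted at `/data` and that the pod name is available in a `POD_NAME` environment variable you inject yourself (for example via the Kubernetes Downward API), falling back to the hostname, which Kubernetes sets to the pod name by default. Both helper names are hypothetical.

```python
import os
import socket
import tempfile
from pathlib import Path


def pod_scoped_dir(mount_path: str = "/data") -> Path:
    """Create (if needed) and return a sub-directory of the shared mount unique to this pod.

    Assumes POD_NAME is injected into the container; otherwise falls back to the
    hostname, which Kubernetes sets to the pod name by default.
    """
    pod_name = os.environ.get("POD_NAME") or socket.gethostname()
    target = Path(mount_path) / f"pod-{pod_name}"
    target.mkdir(parents=True, exist_ok=True)
    return target


def atomic_write_bytes(path: Path, data: bytes) -> None:
    """Write to a temp file in the same directory, then rename it over `path`.

    On POSIX file systems os.replace() swaps the file in a single step, so readers
    see either the old contents or the new ones, never a partial write.
    """
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), prefix=f".{path.name}.")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
```

For example, `atomic_write_bytes(pod_scoped_dir() / "checkpoint.bin", payload)`. Rename atomicity is a property of the underlying file system: on FUSE-backed object-store mounts (S3, GCS) renames may be slow, non-atomic, or unsupported, so treat the second helper as applying to file-system-backed volumes such as EFS.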

Prerequisites

Always verify before deploying:
  1. Credentials -- `TFY_BASE_URL` and `TFY_API_KEY` must be set (in the environment or in `.env`)
  2. Workspace -- `TFY_WORKSPACE_FQN` is required. Never auto-pick; ask the user if it is missing. Volumes are workspace-scoped: a volume created in one workspace can only be used by applications in that same workspace.
  3. Cluster storage class -- The target cluster must have a storage provisioner configured for the desired storage class.
For credential check commands and `.env` setup, see `references/prerequisites.md`.
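The first two checks can be sketched in a few lines. This is a hypothetical helper, not the commands in `references/prerequisites.md`: it assumes a simple `KEY=VALUE` `.env` format (no `export`, no variable interpolation) and only reports what is missing. If `TFY_WORKSPACE_FQN` is reported, ask the user rather than picking a workspace.

```python
from __future__ import annotations

import os
from pathlib import Path

REQUIRED_VARS = ("TFY_BASE_URL", "TFY_API_KEY", "TFY_WORKSPACE_FQN")


def load_dotenv_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file; returns {} if the file is absent."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw in env_path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_prerequisites(dotenv_path: str = ".env") -> list[str]:
    """Return required variables that are unset in both the environment and .env."""
    from_file = load_dotenv_file(dotenv_path)
    return [
        name for name in REQUIRED_VARS
        if not (os.environ.get(name) or from_file.get(name))
    ]
```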

Volume Types

Dynamic Volumes (Create New)

TrueFoundry provisions a new Kubernetes PersistentVolumeClaim (PVC). You specify the size and storage class; the cluster's storage provisioner handles the rest.
Key properties:
  • Size is expandable after creation but cannot be reduced
  • Access mode is `ReadWriteMany` (multiple pods can mount simultaneously)
  • Reclaim policy is `Retain` (data persists even if the volume resource is deleted from TrueFoundry)
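Because a dynamic volume can grow but never shrink, a resize request can be checked before it is submitted. A minimal sketch: `validate_resize` is a hypothetical helper, and sizes are whole GB to match the integer `size` field used for dynamic volumes in this skill.

```python
def validate_resize(current_gb: int, requested_gb: int) -> int:
    """Return the new size if the change is allowed, otherwise raise ValueError.

    Dynamic volumes can be expanded but never reduced, so any request smaller
    than the current size is rejected before it reaches the API.
    """
    if not isinstance(requested_gb, int) or requested_gb <= 0:
        raise ValueError("size must be a positive integer number of GB")
    if requested_gb < current_gb:
        raise ValueError(
            f"cannot shrink volume from {current_gb} GB to {requested_gb} GB; "
            "create a new, smaller volume and migrate the data instead"
        )
    return requested_gb
```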

Static Volumes (Use Existing)

Mount a pre-existing Kubernetes PersistentVolume by name. Use this for:
  • AWS EFS -- Elastic File System via the `efs.csi.aws.com` driver
  • AWS S3 -- S3 buckets via the `s3.csi.aws.com` driver
  • GCP Filestore / GCS -- via the `filestore.csi.storage.gke.io` (Filestore) or `gcsfuse.csi.storage.gke.io` (GCS) driver
  • Azure Files / Azure Blob -- via the `file.csi.azure.com` or `blob.csi.azure.com` drivers
Static volumes require the PersistentVolume to already exist in the Kubernetes cluster. See the "Static Volume Setup" section below.
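Before attaching a static volume, you can confirm that the PersistentVolume exists and uses one of the CSI drivers named in this section by inspecting the output of `kubectl get pv <pv-name> -o json`. The helper below is a hypothetical sketch: it reads the standard `spec.csi.driver` and `status.phase` fields of a Kubernetes PersistentVolume object and does not contact the cluster itself.

```python
from __future__ import annotations

import json

# CSI drivers for the static-volume backends named in this section.
KNOWN_CSI_DRIVERS = {
    "efs.csi.aws.com",               # AWS EFS
    "s3.csi.aws.com",                # AWS S3 (Mountpoint)
    "gcsfuse.csi.storage.gke.io",    # GCS via Cloud Storage FUSE
    "filestore.csi.storage.gke.io",  # GCP Filestore
    "file.csi.azure.com",            # Azure Files
    "blob.csi.azure.com",            # Azure Blob
}


def check_static_pv(pv_json: str) -> tuple[str, str]:
    """Return (pv_name, csi_driver), or raise ValueError if the PV looks unusable.

    `pv_json` is the text printed by `kubectl get pv <pv-name> -o json`.
    """
    pv = json.loads(pv_json)
    if pv.get("kind") != "PersistentVolume":
        raise ValueError(f"expected a PersistentVolume, got {pv.get('kind')!r}")
    name = pv["metadata"]["name"]
    driver = pv.get("spec", {}).get("csi", {}).get("driver")
    if driver not in KNOWN_CSI_DRIVERS:
        raise ValueError(f"PV {name!r} uses an unexpected or non-CSI driver: {driver!r}")
    # A Released PV keeps its old claimRef and will not bind to a new claim;
    # a Failed PV needs manual repair first.
    phase = pv.get("status", {}).get("phase")
    if phase in ("Released", "Failed"):
        raise ValueError(f"PV {name!r} is in phase {phase!r} and cannot be bound")
    return name, driver
```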

Storage Classes by Cloud Provider

For storage class tables by cloud provider (AWS, GCP, Azure) and discovery commands, see `references/volume-storage-classes.md`.
</context> <instructions>
</context> <instructions>

Creating a Volume

When using the direct API, set `TFY_API_SH` to the full path of this skill's `scripts/tfy-api.sh`. See `references/tfy-api-setup.md` for paths per agent.

Before Creating

ALWAYS ask the user these questions in order:
  1. Volume type -- "Do you want to create a new volume or use an existing Kubernetes PersistentVolume?"
    • Create new → proceed with the dynamic volume questions below
    • Use existing → ask for the PersistentVolume name in Kubernetes and the volume name (step 2), then skip to workspace (step 5)
  2. Volume name -- What should the volume be called?
  3. Size -- How much storage? (integer in GB, e.g. `50`). Cannot be reduced later.
  4. Storage class -- Which storage class? Present the available options from the cluster.
  5. Workspace -- Which workspace? Volumes are workspace-scoped.
  6. Volume Browser -- "Do you want to enable Volume Browser? It provides a web UI to browse and manage files in your volume without SSH." (Optional)
    • If yes, ask for:
      • Endpoint host -- The hostname where the browser will be accessible (e.g. `my-cluster.example.truefoundry.com`). Present available hosts from the cluster's base domain.
      • Endpoint path -- URL path prefix (optional, defaults to `/`)
      • Username -- Login username for the browser (optional, defaults to `admin`)
      • Password secret -- FQN of a TrueFoundry secret containing the browser password. If the user doesn't have one, help them create it using the `secrets` skill first.
Present a summary and ask for confirmation:
```
Volume to create:
  Type:          Create new (dynamic)
  Name:          training-data
  Size:          100 GB
  Storage class: efs-sc
  Workspace:     my-cluster:my-workspace
  Volume Browser: Enabled
    Endpoint:    https://my-cluster.example.truefoundry.com/training-data/
    Username:    admin
    Password:    (secret: my-cluster:my-workspace:vol-browser-pw)

Note: Size can be expanded later but not reduced.
Proceed?
```
For volumes without Volume Browser:
```
Volume to create:
  Type:          Create new (dynamic)
  Name:          training-data
  Size:          100 GB
  Storage class: efs-sc
  Workspace:     my-cluster:my-workspace
  Volume Browser: Disabled

Note: Size can be expanded later but not reduced.
Proceed?
```
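If the agent assembles the confirmation text programmatically, a small formatter keeps it identical to the two templates above. This is a hypothetical helper with assumed parameter names; it refuses to enable Volume Browser without a password secret because that field is required.

```python
from typing import Optional


def format_volume_summary(
    name: str,
    size_gb: int,
    storage_class: str,
    workspace_fqn: str,
    browser_host: Optional[str] = None,
    browser_path: str = "/",
    browser_username: str = "admin",
    password_secret_fqn: Optional[str] = None,
) -> str:
    """Render the confirmation summary for a new dynamic volume."""
    lines = [
        "Volume to create:",
        "  Type:          Create new (dynamic)",
        f"  Name:          {name}",
        f"  Size:          {size_gb} GB",
        f"  Storage class: {storage_class}",
        f"  Workspace:     {workspace_fqn}",
    ]
    if browser_host:
        if not password_secret_fqn:
            raise ValueError("Volume Browser needs a password secret FQN")
        if not browser_path.startswith("/"):
            browser_path = "/" + browser_path  # endpoint paths are absolute
        lines += [
            "  Volume Browser: Enabled",
            f"    Endpoint:    https://{browser_host}{browser_path}",
            f"    Username:    {browser_username}",
            f"    Password:    (secret: {password_secret_fqn})",
        ]
    else:
        lines.append("  Volume Browser: Disabled")
    lines += ["", "Note: Size can be expanded later but not reduced.", "Proceed?"]
    return "\n".join(lines)
```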

Via Tool Call

```
tfy_applications_create_deployment(
    manifest={"type": "volume", "name": "my-volume", "config": {"type": "dynamic", "size": 100, "storage_class": "efs-sc"}},
    options={"workspace_id": "ws-id-here"}
)
```
For Volume Browser and static volumes, pass the same manifest fields shown in the Direct API examples below.

Via Direct API

Create new volume (without Volume Browser):
```bash
$TFY_API_SH PUT /api/svc/v1/apps '{
  "manifest": {
    "type": "volume",
    "name": "my-volume",
    "config": {
      "type": "dynamic",
      "size": 100,
      "storage_class": "efs-sc"
    }
  },
  "workspaceId": "ws-id-here"
}'
```
Create new volume (with Volume Browser):
```bash
$TFY_API_SH PUT /api/svc/v1/apps '{
  "manifest": {
    "type": "volume",
    "name": "my-volume",
    "config": {
      "type": "dynamic",
      "size": 100,
      "storage_class": "efs-sc"
    },
    "volume_browser": {
      "username": "admin",
      "password_secret_fqn": "my-cluster:my-workspace:vol-browser-pw",
      "endpoint": {
        "host": "my-cluster.example.truefoundry.com",
        "path": "/my-volume/"
      }
    }
  },
  "workspaceId": "ws-id-here"
}'
```
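The two request bodies above differ only in the optional `volume_browser` block, so they can be generated from one function. A sketch that assumes the field names in the examples above are the complete set; `build_volume_payload` is a hypothetical name, and the defaults (`admin`, `/`) follow the defaults listed under "Before Creating".

```python
import json
from typing import Optional


def build_volume_payload(
    name: str,
    workspace_id: str,
    size_gb: int,
    storage_class: str,
    browser: Optional[dict] = None,
) -> dict:
    """Build the request body for PUT /api/svc/v1/apps for a dynamic volume.

    `browser`, if given, must contain `host` and `password_secret_fqn`, and may
    contain `path` and `username`.
    """
    manifest = {
        "type": "volume",
        "name": name,
        "config": {"type": "dynamic", "size": size_gb, "storage_class": storage_class},
    }
    if browser is not None:
        manifest["volume_browser"] = {
            "username": browser.get("username", "admin"),
            "password_secret_fqn": browser["password_secret_fqn"],
            "endpoint": {"host": browser["host"], "path": browser.get("path", "/")},
        }
    return {"manifest": manifest, "workspaceId": workspace_id}


if __name__ == "__main__":
    # Print a body that can be passed as the third argument to $TFY_API_SH.
    print(json.dumps(build_volume_payload("my-volume", "ws-id-here", 100, "efs-sc")))
```

If the block is saved as `build_volume_payload.py`, its output can be used as the request body, e.g. `$TFY_API_SH PUT /api/svc/v1/apps "$(python build_volume_payload.py)"`.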

Using an Existing Kubernetes PersistentVolume

```bash
$TFY_API_SH PUT /api/svc/v1/apps '{
  "manifest": {
    "type": "volume",
    "name": "my-existing-vol",
    "config": {
      "type": "static",
      "persistent_volume_name": "pv-name-in-k8s"
    }
  },
  "workspaceId": "ws-id-here"
}'
```
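If you call the API from Python instead of through `tfy-api.sh`, the same static-volume request can be assembled with the standard library. This is a sketch under assumptions: it assumes the API accepts the key as a bearer token in the `Authorization` header (check `scripts/tfy-api.sh` for what your setup actually sends), and `make_static_volume_request` is a hypothetical name. It only builds the request; sending it is left to `urllib.request.urlopen`.

```python
import json
import os
import urllib.request


def make_static_volume_request(
    name: str, pv_name: str, workspace_id: str
) -> urllib.request.Request:
    """Build (but do not send) a PUT request that registers an existing PV."""
    base_url = os.environ["TFY_BASE_URL"].rstrip("/")
    body = {
        "manifest": {
            "type": "volume",
            "name": name,
            "config": {"type": "static", "persistent_volume_name": pv_name},
        },
        "workspaceId": workspace_id,
    }
    return urllib.request.Request(
        f"{base_url}/api/svc/v1/apps",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            # Assumption: bearer-token auth; confirm against scripts/tfy-api.sh.
            "Authorization": f"Bearer {os.environ['TFY_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
```

To send it: `with urllib.request.urlopen(req) as resp: print(resp.status, json.load(resp))`.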

Listing Volumes

Via Tool Call

```
tfy_applications_list(filters={"workspace_fqn": "my-cluster:my-workspace", "application_type": "volume"})
```

Via Direct API

```bash
# List volumes in a workspace
$TFY_API_SH GET '/api/svc/v1/apps?workspaceFqn=my-cluster:my-workspace&applicationType=volume'

# Get a specific volume by ID
$TFY_API_SH GET /api/svc/v1/apps/VOLUME_APP_ID
```
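The listing call hard-codes the workspace FQN in the query string. When the FQN or app ID comes from user input, it is safer to URL-encode it. A hypothetical Python sketch: `urlencode` turns each colon into `%3A`, which HTTP servers typically decode back to `:` (a raw colon is also legal in a query string, so both forms should reach the API unchanged).

```python
from urllib.parse import quote, urlencode


def list_volumes_path(workspace_fqn: str) -> str:
    """API path that lists volume applications in a single workspace."""
    query = urlencode({"workspaceFqn": workspace_fqn, "applicationType": "volume"})
    return f"/api/svc/v1/apps?{query}"


def get_volume_path(app_id: str) -> str:
    """API path for one volume application, looked up by its ID."""
    return f"/api/svc/v1/apps/{quote(app_id, safe='')}"


if __name__ == "__main__":
    # Print a path that can be passed straight to $TFY_API_SH GET.
    print(list_volumes_path("my-cluster:my-workspace"))
```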

Presenting Volumes

Volumes in my-cluster:my-workspace:
| Name           | Size   | Storage Class | Status   | Created            |
|----------------|--------|---------------|----------|--------------------|
| training-data  | 100Gi  | efs-sc        | RUNNING  | 2026-02-10 14:30   |
| model-cache    | 50Gi   | premium-rwx   | RUNNING  | 2026-02-08 09:15   |
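A small renderer can produce this layout from whatever the list call returns. This is a hypothetical sketch: the dict keys (`name`, `size`, `storage_class`, `status`, `created`) are illustrative and must be mapped from the fields the API actually returns; columns are padded to their widest cell rather than to the fixed widths shown above.

```python
from __future__ import annotations


def render_volume_table(workspace_fqn: str, volumes: list[dict]) -> str:
    """Render volumes as a Markdown table headed by the workspace FQN."""
    headers = ["Name", "Size", "Storage Class", "Status", "Created"]
    keys = ["name", "size", "storage_class", "status", "created"]
    rows = [[str(vol.get(key, "")) for key in keys] for vol in volumes]
    # Each column is as wide as its longest cell, header included.
    widths = [max([len(h)] + [len(row[i]) for row in rows]) for i, h in enumerate(headers)]

    def fmt(cells: list[str]) -> str:
        return "| " + " | ".join(c.ljust(w) for c, w in zip(cells, widths)) + " |"

    separator = "|" + "|".join("-" * (w + 2) for w in widths) + "|"
    lines = [f"Volumes in {workspace_fqn}:", fmt(headers), separator]
    lines.extend(fmt(row) for row in rows)
    return "\n".join(lines)
```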

Attaching Volumes to Services and Jobs

Volumes are mounted into containers at a specified path. The volume must be in the same workspace as the application.

SDK (in deploy.py)

```python
from truefoundry.deploy import Service, VolumeMount

service = Service(
    name="my-service",
    # ... image, ports, resources ...
    mounts=[
        VolumeMount(
            mount_path="/data",
            volume_fqn="my-cluster:my-workspace:my-volume",
        ),
    ],
)
```

API Manifest (Service)

```json
{
  "manifest": {
    "type": "service",
    "name": "my-service",
    "image": {"type": "image", "image_uri": "my-image:latest"},
    "mounts": [
      {
        "type": "volume",
        "mount_path": "/data",
        "volume_fqn": "my-cluster:my-workspace:my-volume"
      }
    ],
    "resources": {
      "cpu_request": 0.5, "cpu_limit": 1.0,
      "memory_request": 512, "memory_limit": 1024
    }
  },
  "workspaceId": "ws-id-here"
}
```

API Manifest (Job)

```json
{
  "manifest": {
    "type": "job",
    "name": "my-training-job",
    "image": {"type": "image", "image_uri": "my-training:latest"},
    "mounts": [
      {
        "type": "volume",
        "mount_path": "/data",
        "volume_fqn": "my-cluster:my-workspace:training-data"
      },
      {
        "type": "volume",
        "mount_path": "/checkpoints",
        "volume_fqn": "my-cluster:my-workspace:checkpoint-vol"
      }
    ],
    "resources": {
      "cpu_request": 4.0, "cpu_limit": 8.0,
      "memory_request": 16384, "memory_limit": 32768
    }
  },
  "workspaceId": "ws-id-here"
}
```

Volume FQN Format

The volume FQN follows the pattern:
{cluster}:{workspace}:{volume-name}
Example:
my-cluster:my-workspace:training-data
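Because volumes are workspace-scoped, checking that every mounted volume shares the application's workspace catches a workspace mismatch before deploying. A hypothetical sketch that assumes cluster, workspace, and volume names never contain a colon; the `mounts` list has the same shape as in the Service and Job manifests above.

```python
from __future__ import annotations

from typing import NamedTuple


class VolumeFQN(NamedTuple):
    cluster: str
    workspace: str
    volume: str

    @property
    def workspace_fqn(self) -> str:
        return f"{self.cluster}:{self.workspace}"


def parse_volume_fqn(fqn: str) -> VolumeFQN:
    """Split '{cluster}:{workspace}:{volume-name}' into its three parts."""
    parts = fqn.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected cluster:workspace:volume-name, got {fqn!r}")
    return VolumeFQN(*parts)


def mounts_outside_workspace(mounts: list[dict], app_workspace_fqn: str) -> list[str]:
    """Return volume FQNs from a manifest's `mounts` that live in another workspace."""
    return [
        m["volume_fqn"]
        for m in mounts
        if m.get("type") == "volume"
        and parse_volume_fqn(m["volume_fqn"]).workspace_fqn != app_workspace_fqn
    ]
```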

LLM Cache Volumes

For LLM deployments, TrueFoundry supports a `cache_volume` shorthand that creates a volume for model weight caching. This avoids re-downloading large models on every pod restart. See the `llm-deploy` skill for details.
```yaml
# In LLM deployment manifest
cache_volume:
  cache_size: 50
  storage_class: efs-sc
```

Volume Sizing Guidelines

| Use Case | Recommended Size | Notes |
|---|---|---|
| Small model cache (< 7B params) | 20-50 Gi | 2x the model size in FP16 |
| Large model cache (7B-70B params) | 50-200 Gi | 2x the model size; account for multiple formats |
| Shared training dataset | 50-500 Gi | Depends on dataset size; leave 20% headroom |
| Checkpointing | 20-100 Gi | Depends on checkpoint frequency and model size |
| General shared storage | 10-50 Gi | Start small, expand as needed |

Sizing tips:
  • Always add 20% headroom above your expected data size
  • Volume size can be expanded later but never reduced -- start conservatively if unsure
  • For model caching, use 2x the model's disk size to account for download + extraction
  • Monitor volume usage after deployment and expand proactively before hitting limits
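The sizing tips reduce to simple arithmetic. A minimal sketch with hypothetical helper names: it uses decimal GB (1 GB = 10^9 bytes) while the table quotes Gi, assumes FP16 weights at 2 bytes per parameter, and treats 80% usage as the expansion trigger, an assumption chosen to mirror the 20% headroom rule.

```python
import math


def with_headroom_gb(expected_gb: int) -> int:
    """Expected data size plus 20% headroom, rounded up to a whole GB."""
    return (expected_gb * 6 + 4) // 5  # integer ceiling of expected_gb * 1.2


def model_cache_gb(params_billion: float, bytes_per_param: int = 2) -> int:
    """2x the model's on-disk size (download + extraction); FP16 = 2 bytes/param.

    Billions of parameters times bytes per parameter gives decimal GB directly.
    Tokenizer and config files are ignored.
    """
    disk_gb = params_billion * bytes_per_param
    return math.ceil(2 * disk_gb)


def needs_expansion(used_gb: float, size_gb: int, threshold: float = 0.8) -> bool:
    """Flag a volume for proactive expansion once usage crosses the threshold."""
    return used_gb >= threshold * size_gb
```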

Static Volume Setup

For detailed setup instructions for AWS EFS, AWS S3, GCP GCS Fuse, and Azure Files/Blob, see `references/static-volume-setup.md`.

Volume Browser

For Volume Browser configuration fields, setup steps, and access instructions, see `references/volume-browser-setup.md`.
</instructions>
<success_criteria>
  • The agent asked "create new or use existing?" before proceeding
  • The agent has confirmed volume name, size, storage class, and workspace with the user before creating
  • The agent asked whether to enable Volume Browser and collected endpoint/password details if yes
  • The volume was successfully created and is in RUNNING status
  • The user can list all volumes in their target workspace
  • The user can attach the volume to a service or job using the correct volume FQN
  • The agent has advised on appropriate sizing based on the user's use case
  • The user understands the difference between volumes and blob storage for their scenario
</success_criteria>
<references>
</instructions>
<references>

Composability

  • Before deploying with volumes: Use the `workspaces` skill to get the workspace FQN, then create the volume in the same workspace
  • With secrets skill: Create a password secret before enabling Volume Browser (`password_secret_fqn` is required)
  • With deploy skill: After creating a volume, add a `VolumeMount` to the service's deploy.py to attach it
  • With llm-deploy skill: Use `cache_volume` in LLM deployment manifests for model weight caching
  • With jobs skill: Mount volumes to training jobs for checkpointing and shared data access
  • With applications skill: List volumes alongside other application types to see what storage exists
  • After creating: Use the `applications` skill to verify the volume was created successfully
</references> <troubleshooting>
</references> <troubleshooting>

Error Handling

| Error | Cause | Fix |
|---|---|---|
| Volume not found | Wrong name or workspace | Verify the FQN; volumes are workspace-scoped |
| Storage class not available | Cluster missing provisioner | Check `GET /api/svc/v1/clusters/CLUSTER_ID` for available classes |
| Size cannot be reduced | PVC limitation | Create a new, smaller volume and migrate the data |
| Workspace mismatch | Volume in a different workspace | Create the volume in the same workspace as the app |
| Permission denied | API key lacks access | Check the API key's permissions for this workspace |
| PV not found (static) | K8s PV doesn't exist | Verify with `kubectl get pv <pv-name>` |
| Data corruption | Multiple pods writing the same path | Use per-pod sub-directories (e.g., `/data/pod-{POD_NAME}/`) |

</troubleshooting>
</troubleshooting>