truefoundry-notebooks
<objective>Routing note: For ambiguous user intents, use the shared clarification templates in references/intent-clarification.md.
Jupyter Notebooks
Launch Jupyter Notebooks on TrueFoundry with GPU support, persistent storage, auto-shutdown, and VS Code integration. Write a YAML manifest and apply it with `tfy apply`. Fall back to the REST API when the CLI is unavailable.
When to Use
- User asks "launch a notebook", "start jupyter", "create notebook"
- User needs a development environment with GPU access
- User wants to explore data or prototype ML models
- User asks about notebook images, auto-shutdown, or persistent storage
When NOT to Use
- User wants to deploy a production service → prefer the `deploy` skill; ask if the user wants another valid path
- User wants to deploy a model → prefer the `llm-deploy` skill; ask if the user wants another valid path
- User wants an SSH server → prefer the `ssh-server` skill; ask if the user wants another valid path
Prerequisites
Always verify before launching a notebook:
- Credentials: `TFY_BASE_URL` and `TFY_API_KEY` must be set (env or `.env`)
- Workspace: `TFY_WORKSPACE_FQN` is required. Never auto-pick. Ask the user if missing.
- CLI: Check `tfy --version`. Install if missing: `pip install 'truefoundry==0.5.0' && tfy login --host "$TFY_BASE_URL"`
For credential check commands and .env setup, see `references/prerequisites.md`.
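The credential and workspace checks listed under Prerequisites can be sketched as a small shell function. The name `check_tfy_env` is hypothetical, and the sketch assumes the variables are already exported in the current shell; it does not read a `.env` file, and the skill's own check commands live in `references/prerequisites.md`.

```bash
# Hypothetical helper: report which required TrueFoundry variables are
# unset or empty. Returns 0 when all are present, 1 otherwise.
check_tfy_env() {
  missing=0
  for var in TFY_BASE_URL TFY_API_KEY TFY_WORKSPACE_FQN; do
    # Indirect lookup that also works in plain POSIX sh.
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "missing: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A caller might run `check_tfy_env || exit 1` before generating a manifest. When `TFY_WORKSPACE_FQN` is reported missing, the right response is still to ask the user rather than pick a workspace.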
CLI Detection
bash
tfy --version
| CLI Output | Status | Action |
|---|---|---|
| | Current | Use as documented below |
| | Outdated | Upgrade: install a pinned version |
| Command not found | Not installed | Install (see Prerequisites) |
| CLI unavailable (no pip/Python) | Fallback | Use REST API via `tfy-api.sh` |
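The first three rows of the detection table can be approximated in shell. This is a sketch under assumptions: the helper names are hypothetical, the minimum version is taken to be the `0.5.0` pin used elsewhere in this document, and the version is assumed to be the first dotted number in the `tfy --version` output. The "no pip/Python" fallback row is not detected here.

```bash
# Assumption: 0.5.0 (the version pinned in this document) is the floor.
MIN_TFY_VERSION="0.5.0"

# version_ge A B: succeed when dotted numeric version A is >= B.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, "."); m = split(b, y, ".")
    len = (n > m) ? n : m
    for (i = 1; i <= len; i++) {
      xi = (i <= n) ? x[i] + 0 : 0
      yi = (i <= m) ? y[i] + 0 : 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0
  }'
}

# cli_status: print one of current, outdated, not-installed.
cli_status() {
  if ! command -v tfy >/dev/null 2>&1; then
    echo "not-installed"
    return 0
  fi
  # Assumption: the version is the first dotted number printed.
  ver=$(tfy --version 2>/dev/null | grep -Eo '[0-9]+(\.[0-9]+)+' | head -n 1)
  if [ -n "$ver" ] && version_ge "$ver" "$MIN_TFY_VERSION"; then
    echo "current"
  else
    echo "outdated"
  fi
}
```

Comparing field by field as numbers avoids the string-comparison trap where `0.10.1` would sort below `0.5.0`.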
Launch Notebook via UI
The fastest way is through the TrueFoundry dashboard:
- Go to Deployments → New Deployment → Jupyter Notebook
- Select workspace and configure resources
- Click Deploy
Launch Notebook via `tfy apply` (CLI, Recommended)
Configuration Questions
Before generating the manifest, ask the user:
- Name: What to call the notebook
- GPU needed? CPU notebook (default) or GPU notebook (for ML/training)
- Home directory size: How much persistent storage in GB (default: 20)
- Auto-shutdown: Enable auto-shutdown after inactivity? If yes, after how many minutes? (default: 30). Set `cull_timeout: 0` to disable.
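Once the answers are collected, they can be turned into a manifest mechanically. The sketch below assumes a hypothetical helper named `render_notebook_manifest`. It takes the field names, image, and CPU resource values from this document's CPU notebook manifest, and reconstructs the YAML nesting to match the JSON payload in the REST section. The workspace is a required argument, so it is never picked automatically.

```bash
# Hypothetical helper:
#   render_notebook_manifest NAME WORKSPACE_FQN [HOME_GB] [CULL_MINUTES]
# Prints a CPU notebook manifest to stdout; defaults are 20 GB and 30 minutes.
render_notebook_manifest() {
  name=${1:-}
  workspace=${2:-}
  home_gb=${3:-20}
  cull_minutes=${4:-30}
  if [ -z "$name" ] || [ -z "$workspace" ]; then
    echo "usage: render_notebook_manifest NAME WORKSPACE_FQN [HOME_GB] [CULL_MINUTES]" >&2
    return 2
  fi
  cat <<EOF
name: ${name}
type: notebook
image:
  image_uri: public.ecr.aws/truefoundrycloud/jupyter:0.4.5-py3.12.12-sudo
home_directory_size: ${home_gb}
cull_timeout: ${cull_minutes}
resources:
  node:
    type: node_selector
    capacity_type: on_demand
  cpu_request: 1
  cpu_limit: 3
  memory_request: 4000
  memory_limit: 6000
  ephemeral_storage_request: 5000
  ephemeral_storage_limit: 10000
workspace_fqn: "${workspace}"
EOF
}
```

For example, `render_notebook_manifest my-notebook "$TFY_WORKSPACE_FQN" 50 0 > tfy-manifest.yaml` would write a manifest with 50 GB of home storage and auto-shutdown disabled.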
CPU Notebook
1. Generate the manifest:
yaml
# tfy-manifest.yaml: Jupyter Notebook
name: my-notebook
type: notebook
image:
  image_uri: public.ecr.aws/truefoundrycloud/jupyter:0.4.5-py3.12.12-sudo
home_directory_size: 20
cull_timeout: 30
resources:
  node:
    type: node_selector
    capacity_type: on_demand
  cpu_request: 1
  cpu_limit: 3
  memory_request: 4000
  memory_limit: 6000
  ephemeral_storage_request: 5000
  ephemeral_storage_limit: 10000
workspace_fqn: "YOUR_WORKSPACE_FQN"
2. Preview:
bash
tfy apply -f tfy-manifest.yaml --dry-run --show-diff
3. Apply:
bash
tfy apply -f tfy-manifest.yaml
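The preview and apply steps can be chained so that nothing is applied without a look at the diff first. This is a minimal sketch; the wrapper name `preview_then_apply` and the confirmation prompt are assumptions, and only the `tfy apply` flags shown in this document are used.

```bash
# Hypothetical wrapper: show the dry-run diff, ask for confirmation, then apply.
preview_then_apply() {
  manifest=${1:-tfy-manifest.yaml}
  if [ ! -f "$manifest" ]; then
    echo "manifest not found: $manifest" >&2
    return 1
  fi
  # Stop here if the preview itself fails (for example, on a validation error).
  tfy apply -f "$manifest" --dry-run --show-diff || return 1
  printf 'Apply %s? [y/N] ' "$manifest"
  read -r answer
  case "$answer" in
    y|Y) tfy apply -f "$manifest" ;;
    *)   echo "skipped"; return 0 ;;
  esac
}
```

Any answer other than `y` or `Y` leaves the cluster untouched.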
GPU Notebook
yaml
# tfy-manifest.yaml: GPU Jupyter Notebook
name: gpu-notebook
type: notebook
image:
  image_uri: public.ecr.aws/truefoundrycloud/jupyter:0.4.5-py3.12.12-sudo
home_directory_size: 20
cull_timeout: 30
resources:
  node:
    type: node_selector
    capacity_type: on_demand
  cpu_request: 4
  cpu_limit: 8
  memory_request: 16000
  memory_limit: 32000
  ephemeral_storage_request: 10000
  ephemeral_storage_limit: 20000
  devices:
    - type: nvidia_gpu
      name: T4
      count: 1
workspace_fqn: "YOUR_WORKSPACE_FQN"
This example reuses the default image. For GPU workloads, swap in a CUDA image as described under Choosing an Image. Preview and apply with the same `tfy apply` commands as for the CPU notebook.
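After a GPU notebook starts, the allocation can be checked from a terminal inside the notebook. The sketch below assumes `nvidia-smi` is on the `PATH` of the image; the helper names `gpu_count` and `require_gpus` are hypothetical. `nvidia-smi -L` prints one line per GPU, so counting lines gives the device count.

```bash
# Hypothetical helper: print the number of GPUs listed by `nvidia-smi -L`
# (prints 0 when nvidia-smi is not installed).
gpu_count() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo 0
    return 0
  fi
  # grep -c prints the count itself; `|| true` hides its non-zero exit on 0 matches.
  nvidia-smi -L 2>/dev/null | grep -c '^GPU ' || true
}

# require_gpus N: succeed when at least N GPUs are visible.
require_gpus() {
  want=${1:-1}
  have=$(gpu_count)
  if [ "$have" -ge "$want" ]; then
    echo "ok: $have GPU(s) visible"
  else
    echo "expected at least $want GPU(s), found $have" >&2
    return 1
  fi
}
```

For the manifest above, `require_gpus 1` should succeed once the T4 is attached.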
Launch Notebook via REST API (Fallback)
When the CLI is not available, use `tfy-api.sh`. Set `TFY_API_SH` to the full path of this skill's `scripts/tfy-api.sh`. See `references/tfy-api-setup.md` for paths per agent.
Create Notebook
bash
TFY_API_SH=~/.claude/skills/truefoundry-notebooks/scripts/tfy-api.sh
$TFY_API_SH PUT /api/svc/v1/apps -d '{
  "name": "my-notebook",
  "type": "notebook",
  "image": {
    "image_uri": "public.ecr.aws/truefoundrycloud/jupyter:0.4.5-py3.12.12-sudo"
  },
  "home_directory_size": 20,
  "cull_timeout": 30,
  "resources": {
    "node": {"type": "node_selector", "capacity_type": "on_demand"},
    "cpu_request": 1,
    "cpu_limit": 3,
    "memory_request": 4000,
    "memory_limit": 6000,
    "ephemeral_storage_request": 5000,
    "ephemeral_storage_limit": 10000
  },
  "workspace_fqn": "WORKSPACE_FQN"
}'
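To avoid hand-editing the JSON for each notebook, the payload can be generated from a name and a workspace. This is a sketch: the function name `notebook_payload` is hypothetical, the body mirrors the CPU example in this section, and both arguments are assumed to contain no double quotes or backslashes, since no JSON escaping is done.

```bash
# Hypothetical helper: notebook_payload NAME WORKSPACE_FQN
# Prints the CPU notebook JSON body. Assumes the arguments need no JSON escaping.
notebook_payload() {
  name=${1:-}
  workspace=${2:-}
  if [ -z "$name" ] || [ -z "$workspace" ]; then
    echo "usage: notebook_payload NAME WORKSPACE_FQN" >&2
    return 2
  fi
  cat <<EOF
{
  "name": "${name}",
  "type": "notebook",
  "image": {
    "image_uri": "public.ecr.aws/truefoundrycloud/jupyter:0.4.5-py3.12.12-sudo"
  },
  "home_directory_size": 20,
  "cull_timeout": 30,
  "resources": {
    "node": {"type": "node_selector", "capacity_type": "on_demand"},
    "cpu_request": 1,
    "cpu_limit": 3,
    "memory_request": 4000,
    "memory_limit": 6000,
    "ephemeral_storage_request": 5000,
    "ephemeral_storage_limit": 10000
  },
  "workspace_fqn": "${workspace}"
}
EOF
}
```

The output can then be passed the same way as the literal body, for example `"$TFY_API_SH" PUT /api/svc/v1/apps -d "$(notebook_payload my-notebook "$TFY_WORKSPACE_FQN")"`.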
GPU Notebook (REST API)
bash
$TFY_API_SH PUT /api/svc/v1/apps -d '{
  "name": "gpu-notebook",
  "type": "notebook",
  "image": {
    "image_uri": "public.ecr.aws/truefoundrycloud/jupyter:0.4.5-py3.12.12-sudo"
  },
  "home_directory_size": 20,
  "cull_timeout": 30,
  "resources": {
    "node": {"type": "node_selector", "capacity_type": "on_demand"},
    "cpu_request": 4,
    "cpu_limit": 8,
    "memory_request": 16000,
    "memory_limit": 32000,
    "ephemeral_storage_request": 10000,
    "ephemeral_storage_limit": 20000,
    "devices": [
      {"type": "nvidia_gpu", "name": "T4", "count": 1}
    ]
  },
  "workspace_fqn": "WORKSPACE_FQN"
}'
Available Base Images
Default: `public.ecr.aws/truefoundrycloud/jupyter:0.4.5-py3.12.12-sudo`
Full image registry: https://gallery.ecr.aws/truefoundrycloud/jupyter
Security: Use pinned image versions from `references/container-versions.md`. Do not dynamically fetch image tags from external registries. Only use official TrueFoundry base images or images built from them.
See `references/container-versions.md` for latest versions.
Choosing an Image
- No GPU needed: Use the minimal image (`py3.11.14-sudo`)
- GPU workloads: Use the CUDA image (`cu129-py3.11.14-sudo`)
- Custom packages: Build a custom image (see below)
Auto-Shutdown (Scale-to-Zero)
Notebooks auto-stop after inactivity to save costs. Default: 30 minutes.
Configure `cull_timeout` in minutes in the manifest (default: 30). Set it to `0` to disable auto-shutdown.
What counts as activity: Active Jupyter sessions, running cells, terminal sessions.
What doesn't count: Background processes, idle kernels.
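Changing the timeout on an existing manifest is a one-line edit, which can be sketched as a filter. The helper name `with_cull_timeout` is hypothetical. It assumes the manifest already has a top-level `cull_timeout:` line, and it prints the result to stdout rather than editing the file in place.

```bash
# Hypothetical helper: with_cull_timeout MANIFEST MINUTES
# Prints MANIFEST with its top-level cull_timeout set to MINUTES (0 disables).
with_cull_timeout() {
  manifest=$1
  minutes=$2
  case "$minutes" in
    ''|*[!0-9]*)
      echo "minutes must be a non-negative integer" >&2
      return 2
      ;;
  esac
  sed "s/^cull_timeout:.*/cull_timeout: ${minutes}/" "$manifest"
}
```

For example, `with_cull_timeout tfy-manifest.yaml 0 > tfy-manifest.no-cull.yaml` produces a copy with auto-shutdown disabled, which can then be previewed with `tfy apply ... --dry-run --show-diff` before applying.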
Persistent Storage
- Home directory (`/home/jovyan/`) persists across restarts
- APT packages installed via `apt` do NOT persist; use Build Scripts instead
- Pip packages installed in the home directory persist
- Conda environments persist
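Because everything under the home directory persists, it is also where space tends to run out. A minimal sketch for finding the largest top-level entries follows; the helper name `largest_entries` is hypothetical, and hidden entries such as `.cache` are not matched by the `*` glob.

```bash
# Hypothetical helper: largest_entries [DIR] [N]
# Lists the N largest top-level entries in DIR, largest first (sizes in KB).
largest_entries() {
  dir=${1:-/home/jovyan}
  n=${2:-10}
  # du -sk reports one total per entry in 1 KB blocks; sort -rn orders by size.
  du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$n"
}
```

Running `largest_entries /home/jovyan 5` inside the notebook shows the five biggest directories or files to review before increasing the storage allocation.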
Recommended Storage by Use Case
The values below are in MB, while `home_directory_size` in the manifest is given in GB (for example, 20000 MB corresponds to `home_directory_size: 20`).
| Use Case | Storage (MB) | Notes |
|---|---|---|
| Light exploration | 10000 | Basic data analysis |
| ML development | 20000-50000 | Models + datasets |
| Large datasets | 50000-100000 | Attach volumes for more |
| LLM experimentation | 100000+ | Use volumes for model weights |
Custom Images
Extend TrueFoundry base images to pre-install packages:
dockerfile
FROM public.ecr.aws/truefoundrycloud/jupyter:0.4.6-py3.11.14-sudo
USER root
# Refresh the package index first; install fails if the base image ships no package lists.
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends ffmpeg && rm -rf /var/lib/apt/lists/*
USER jovyan
RUN python3 -m pip install --use-pep517 --no-cache-dir torch torchvision pandas scikit-learn
Critical: Do NOT modify ENTRYPOINT or CMD; TrueFoundry requires them.
Build Scripts (Persistent APT Packages)
Instead of building a custom image, add a build script during deployment to install system packages on every start:
bash
sudo apt update
sudo apt install -y ffmpeg libsm6 libxext6
Cloud Storage Access
Via Environment Variables
Set during deployment:
- AWS S3: `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`
- GCS: `GOOGLE_APPLICATION_CREDENTIALS`
Via IAM Service Account
Attach cloud-native IAM roles through service account integration for secure, credential-free access.
Via Volumes
Mount TrueFoundry persistent volumes for direct data access. See the `volumes` skill.
Git Integration
JupyterLab includes a built-in Git extension. Configure:
bash
git config --global user.name "Your Name"
git config --global user.email "you@example.com"
Use Personal Access Tokens or SSH keys for authentication.
Python Environment Management
Default: Python 3.11. Create additional environments:
bash
conda create -y -n py39 python=3.9
Wait ~2 minutes for kernel sync, then hard-refresh JupyterLab.
Presenting Notebooks
Show as a table:
Notebooks:
| Name | Status | Image | GPU | Storage |
|---------------|---------|---------------|------|---------|
| dev-notebook | Running | py3.11 + CUDA | T4 | 50 GB |
| data-analysis | Stopped | py3.11 | None | 20 GB |
<success_criteria>
Success Criteria
- The notebook is launched and accessible via its URL in the TrueFoundry dashboard
- GPU resources are allocated as requested and visible inside the notebook (e.g., `nvidia-smi` works)
- Persistent storage is configured so the user's files survive restarts
- Auto-shutdown is enabled to prevent unnecessary cost from idle notebooks
- The user can install packages and access their data (cloud storage, volumes, or local upload)
</success_criteria>
<references>
Composability
- Need workspace: Use the `workspaces` skill to find the target workspace
- Need GPU info: Use the `workspaces` skill to check available GPU types on the cluster
- Need volumes: Use the `volumes` skill to create persistent storage, then mount it
- Deploy model after prototyping: Use the `deploy` skill or the `llm-deploy` skill
- Check status: Use the `applications` skill to see notebook status
Error Handling
CLI Errors
tfy: command not found
Install the TrueFoundry CLI:
pip install 'truefoundry==0.5.0'
tfy login --host "$TFY_BASE_URL"
Manifest validation failed.
Check:
- YAML syntax is valid
- Required fields: name, type, workspace_fqn
- Image URI exists and is accessible
- Resource values use correct units (memory in MB)
Notebook Not Starting
Notebook stuck in pending. Check:
- Requested GPU type may not be available on cluster
- Insufficient cluster resources (CPU/memory)
- Image pull errors (check container registry access)
GPU Not Detected
GPU not visible in notebook. Verify:
- Used CUDA image (cu129-* variant)
- Requested GPU type is available (check workspaces skill)
- CUDA toolkit version matches your framework requirements
Storage Full
Notebook storage full. Options:
- Clean up unused files in /home/jovyan/
- Increase storage allocation
- Mount an external volume for large datasets
REST API Fallback Errors
401 Unauthorized: Check TFY_API_KEY is valid
404 Not Found: Check TFY_BASE_URL and API endpoint path
422 Validation Error: Check manifest fields match expected schema
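The status-to-hint mapping above can be kept in one place so a script prints the same guidance each time. This is a sketch with a hypothetical function name, `explain_rest_error`. How the HTTP status is obtained from `tfy-api.sh` is not covered here and is left to the caller.

```bash
# Hypothetical helper: explain_rest_error STATUS
# Prints the troubleshooting hint for an HTTP status from the REST fallback.
explain_rest_error() {
  case "$1" in
    401) echo "Unauthorized: check that TFY_API_KEY is valid" ;;
    404) echo "Not Found: check TFY_BASE_URL and the API endpoint path" ;;
    422) echo "Validation Error: check that manifest fields match the expected schema" ;;
    *)   echo "Unhandled status $1: inspect the API response body" ;;
  esac
}
```

Unlisted codes fall through to the generic message rather than being guessed at.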