jobs

Routing note: For ambiguous user intents, use the shared clarification templates in references/intent-clarification.md.
<objective>
<objective>

Jobs

Deploy, schedule, and monitor TrueFoundry job runs. Two paths:
  1. CLI (`tfy apply`) -- Write a YAML manifest and apply it. Works everywhere.
  2. REST API (fallback) -- When the CLI is unavailable, use `tfy-api.sh`.

When to Use

  • User asks "deploy a job", "create a job", "run a batch task"
  • User asks "schedule a job", "run a cron job"
  • User asks "show job runs", "list runs for my job"
  • User asks "is my job running", "job status"
  • User wants to check a specific job run
  • Debugging a failed job run

When NOT to Use

  • User wants to list job applications -> prefer the `applications` skill; ask if the user wants another valid path with `application_type: "job"`
</objective> <context>
</objective> <context>

Prerequisites

Always verify before deploying:
  1. Credentials -- `TFY_BASE_URL` and `TFY_API_KEY` must be set (env or `.env`)
  2. Workspace -- `TFY_WORKSPACE_FQN` required. Never auto-pick. Ask the user if missing.
  3. CLI -- Check if the `tfy` CLI is available: `tfy --version`. If not, install a pinned version (`pip install 'truefoundry==0.5.0'`).
For credential check commands and .env setup, see `references/prerequisites.md`.
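The three checks above can be scripted before any deploy step. The following is a minimal sketch, assuming the variables are read from the process environment (a `.env` file would have to be loaded separately). The helper name `preflight` is hypothetical and not part of the `tfy` CLI.

```python
import os
import shutil

REQUIRED_VARS = ("TFY_BASE_URL", "TFY_API_KEY", "TFY_WORKSPACE_FQN")


def preflight(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    # Report only variable names, never their values.
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    # shutil.which returns None when the executable is not on PATH.
    if which("tfy") is None:
        problems.append("tfy CLI not found on PATH")
    return problems
```

If `preflight()` reports a missing `TFY_WORKSPACE_FQN`, ask the user for it rather than picking a workspace.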
</context> <instructions>
</context> <instructions>

Step 1: Analyze the Job

  • What does the job do? (training, batch processing, data pipeline, maintenance)
  • One-time or scheduled?
  • Resource requirements (CPU/GPU/memory)
  • Expected duration
Security requirements
  • Never request or print raw secret values in chat.
  • For sensitive env vars (tokens/passwords/keys), require `tfy-secret://...` references instead of inline values.
  • For `build_source.type: git`, use trusted repositories and prefer immutable refs (commit SHA or pinned tag) over floating branches.
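The secret-reference rule above can be checked mechanically before a manifest is written. This is a rough sketch only: the name-based heuristic in `SENSITIVE_MARKERS` and the function name `find_inline_secrets` are assumptions for illustration, not TrueFoundry behavior.

```python
# Assumed heuristic: variable names containing these words are treated as sensitive.
SENSITIVE_MARKERS = ("TOKEN", "PASSWORD", "SECRET", "KEY")
SECRET_PREFIX = "tfy-secret://"


def find_inline_secrets(env):
    """Return names of sensitive-looking env vars whose value is not a secret reference."""
    flagged = []
    for name, value in env.items():
        looks_sensitive = any(marker in name.upper() for marker in SENSITIVE_MARKERS)
        if looks_sensitive and not str(value).startswith(SECRET_PREFIX):
            # Collect only the variable name so the raw value is never printed.
            flagged.append(name)
    return sorted(flagged)
```

Any name returned here should be replaced with a `tfy-secret://...` reference before deploying.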

Step 2: Generate YAML Manifest

Based on the job requirements, create a YAML manifest.
Security: Always confirm container image sources and git repository URLs with the user before deploying. Do not pull untrusted container images or clone unverified git repositories. Pin image tags to specific versions and avoid `:latest` in production.

Option A: Pre-built Image

```yaml
name: my-batch-job
type: job
image:
  type: image
  image_uri: my-registry/my-image:v1.0.0  # pin to a specific version
  command: python train.py
resources:
  cpu_request: 2
  cpu_limit: 4
  memory_request: 4000
  memory_limit: 8000
  ephemeral_storage_request: 1000
  ephemeral_storage_limit: 2000
env:
  ENVIRONMENT: production
workspace_fqn: cluster-id:workspace-name
```

Option B: Git Repo + Dockerfile

```yaml
name: my-batch-job
type: job
image:
  type: build
  build_source:
    type: git
    repo_url: https://github.com/user/repo
    branch_name: main
    ref: 3f2a1c9b0d7e6f5a4b3c2d1e0f9876543210abcd
  build_spec:
    type: dockerfile
    dockerfile_path: Dockerfile
    build_context_path: "."
    command: python train.py
resources:
  cpu_request: 2
  cpu_limit: 4
  memory_request: 4000
  memory_limit: 8000
env:
  ENVIRONMENT: production
workspace_fqn: cluster-id:workspace-name
```

Option C: Git Repo + PythonBuild (No Dockerfile)

```yaml
name: my-batch-job
type: job
image:
  type: build
  build_source:
    type: git
    repo_url: https://github.com/user/repo
    branch_name: main
    ref: 3f2a1c9b0d7e6f5a4b3c2d1e0f9876543210abcd
  build_spec:
    type: tfy-python-buildpack
    command: python train.py
    python_version: "3.11"
    python_dependencies:
      type: pip
      requirements_path: requirements.txt
resources:
  cpu_request: 2
  cpu_limit: 4
  memory_request: 4000
  memory_limit: 8000
workspace_fqn: cluster-id:workspace-name
```

Scheduled Jobs (Cron)

Add a `trigger` section for scheduled execution:
```yaml
name: nightly-retrain
type: job
trigger:
  type: cron
  schedule: "0 2 * * *"  # 2 AM daily
image:
  type: image
  image_uri: my-registry/my-image:v1.0.0
  command: python train.py
resources:
  cpu_request: 2
  cpu_limit: 4
  memory_request: 4000
  memory_limit: 8000
workspace_fqn: cluster-id:workspace-name
```
Cron format: `minute hour day_of_month month day_of_week`
Common schedules:
| Schedule      | Cron        | Description             |
|---------------|-------------|-------------------------|
| Every hour    | `0 * * * *` | Top of every hour       |
| Daily at 2 AM | `0 2 * * *` | Nightly jobs            |
| Weekly Monday | `0 9 * * 1` | Weekly Monday 9 AM      |
| Monthly 1st   | `0 0 1 * *` | First of month midnight |
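To tell a user when a schedule will next fire, the five-field format can be evaluated directly. The sketch below is a simplified illustration, not the platform's scheduler: it assumes each field is either `*` or a single integer (no ranges, lists, or steps), uses `0` for Sunday, and works on naive datetimes because the time zone used for schedules is not stated here. The function name `next_run` is hypothetical.

```python
from datetime import datetime, timedelta


def _matches(field, value):
    """True if a cron field ('*' or a single integer) accepts the value."""
    return field == "*" or int(field) == value


def next_run(expression, after):
    """Return the first minute strictly after `after` that matches the expression."""
    minute, hour, dom, month, dow = expression.split()
    candidate = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    for _ in range(366 * 24 * 60):  # search at most one year ahead
        cron_dow = (candidate.weekday() + 1) % 7  # cron: 0 = Sunday; Python: 0 = Monday
        dom_ok = _matches(dom, candidate.day)
        dow_ok = _matches(dow, cron_dow)
        # Standard cron: if both day fields are restricted, either one may match.
        day_ok = (dom_ok or dow_ok) if dom != "*" and dow != "*" else (dom_ok and dow_ok)
        if (day_ok and _matches(minute, candidate.minute)
                and _matches(hour, candidate.hour)
                and _matches(month, candidate.month)):
            return candidate
        candidate += timedelta(minutes=1)
    raise ValueError(f"no run found within a year for {expression!r}")
```

For example, `next_run("0 2 * * *", datetime(2026, 2, 10, 9, 0))` returns `datetime(2026, 2, 11, 2, 0)`.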

Manual Trigger with Retries

```yaml
name: my-job
type: job
trigger:
  type: manual
  num_retries: 3
image:
  type: image
  image_uri: my-registry/my-image:v1.0.0
  command: python job.py
resources:
  cpu_request: 2
  cpu_limit: 4
  memory_request: 4000
  memory_limit: 8000
workspace_fqn: cluster-id:workspace-name
```

Concurrency Policies

Three options for scheduled jobs when a run overlaps:
  • Forbid (default): Skip new run if previous still running
  • Allow: Run in parallel
  • Replace: Kill current, start new

Parameterized Jobs

```python
import argparse

# In your job script, use argparse for dynamic params
parser = argparse.ArgumentParser()
parser.add_argument("--epochs", type=int, default=10)
parser.add_argument("--batch-size", type=int, default=32)
args = parser.parse_args()

# Then set command: `python train.py --epochs 50 --batch-size 64`
```

GPU Jobs

```yaml
name: gpu-training-job
type: job
image:
  type: image
  image_uri: my-registry/my-image:v1.0.0
  command: python train.py
resources:
  cpu_request: 4
  cpu_limit: 8
  memory_request: 16000
  memory_limit: 32000
  devices:
    - type: nvidia_gpu
      name: A10_24GB
      count: 1
workspace_fqn: cluster-id:workspace-name
```

Job with Volume Mounts

```yaml
name: training-job
type: job
image:
  type: image
  image_uri: my-registry/my-image:v1.0.0
  command: python train.py
resources:
  cpu_request: 2
  cpu_limit: 4
  memory_request: 4000
  memory_limit: 8000
mounts:
  - mount_path: /data
    volume_fqn: your-volume-fqn
workspace_fqn: cluster-id:workspace-name
```

Step 3: Write and Apply Manifest

Write the manifest to `tfy-manifest.yaml`:
```bash
# Preview
tfy apply -f tfy-manifest.yaml --dry-run --show-diff

# Apply after user confirms
tfy apply -f tfy-manifest.yaml
```
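The preview-then-confirm order can also be enforced from a script. The sketch below assumes `tfy` is on `PATH` and uses only the flags shown in this step. The function names and the `confirm` callback are hypothetical.

```python
import subprocess

MANIFEST = "tfy-manifest.yaml"


def preview_command(manifest=MANIFEST):
    """Command line for the dry-run preview."""
    return ["tfy", "apply", "-f", manifest, "--dry-run", "--show-diff"]


def apply_command(manifest=MANIFEST):
    """Command line for the real apply."""
    return ["tfy", "apply", "-f", manifest]


def preview_then_apply(confirm, run=subprocess.run, manifest=MANIFEST):
    """Run the preview, then apply only if `confirm()` returns True."""
    run(preview_command(manifest), check=True)
    if not confirm():
        return False  # user declined; nothing was applied
    run(apply_command(manifest), check=True)
    return True
```

Passing `run` as a parameter keeps the confirmation gate testable without invoking the CLI.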

Fallback: REST API

If the `tfy` CLI is not available, convert the YAML manifest to JSON and deploy via REST API. See `references/cli-fallback.md` for the conversion process.
```bash
TFY_API_SH=~/.claude/skills/truefoundry-jobs/scripts/tfy-api.sh

$TFY_API_SH PUT /api/svc/v1/apps '{
  "manifest": { ... JSON version of the YAML manifest ... },
  "workspaceId": "WORKSPACE_ID"
}'
```
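The request body above can be assembled from a manifest held as a Python dict. This is a sketch under assumptions: the dict is a literal transcription of the Option A manifest, and `references/cli-fallback.md` remains the authority on any fields that must change during conversion. The function name `build_deploy_payload` is hypothetical.

```python
import json


def build_deploy_payload(manifest, workspace_id):
    """Wrap a manifest dict in the request body shape shown above."""
    return json.dumps({"manifest": manifest, "workspaceId": workspace_id}, indent=2)


# Literal dict version of the Option A YAML (assumed to need no field changes).
manifest = {
    "name": "my-batch-job",
    "type": "job",
    "image": {
        "type": "image",
        "image_uri": "my-registry/my-image:v1.0.0",
        "command": "python train.py",
    },
    "resources": {
        "cpu_request": 2,
        "cpu_limit": 4,
        "memory_request": 4000,
        "memory_limit": 8000,
    },
    "workspace_fqn": "cluster-id:workspace-name",
}
payload = build_deploy_payload(manifest, "WORKSPACE_ID")
```

The resulting `payload` string is what would be passed as the final argument to `$TFY_API_SH PUT /api/svc/v1/apps`.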

Step 4: Trigger the Job

After deployment, trigger manually via API:
```bash
TFY_API_SH=~/.claude/skills/truefoundry-jobs/scripts/tfy-api.sh
$TFY_API_SH POST /api/svc/v1/jobs/JOB_ID/runs '{}'
```

After Deploy -- Report Status

CRITICAL: Always report the deployment status and job details to the user. Do this automatically after deploy, without asking an extra verification prompt.

Check Job Status

```text
# Preferred (MCP tool call)
tfy_applications_list(filters={"workspace_fqn": "WORKSPACE_FQN", "application_name": "JOB_NAME"})
```

If MCP tool calls are unavailable, use API fallback:

```bash
TFY_API_SH=~/.claude/skills/truefoundry-jobs/scripts/tfy-api.sh

# Get job application details
$TFY_API_SH GET '/api/svc/v1/apps?workspaceFqn=WORKSPACE_FQN&applicationName=JOB_NAME'
```

Report to User

Always present this summary after deployment:
```text
Job deployed successfully!

Job: {job-name}
Workspace: {workspace-fqn}
Status: Suspended (deployed, ready to trigger)
Schedule: {cron expression if scheduled, or "Manual trigger"}

To trigger the job:
  - Dashboard: Click "Run Job" on the job page
  - API: POST /api/svc/v1/jobs/{JOB_ID}/runs

To monitor runs:
  - Use the job monitoring commands below
  - Or check the TrueFoundry dashboard
```
For scheduled jobs, also show when the next run will execute. For manually triggered jobs, remind the user how to trigger them.

.tfyignore

Create a `.tfyignore` file (follows `.gitignore` syntax) to exclude files from the Docker build:
```
.git/
__pycache__/
*.pyc
.env
data/
```

List Job Runs

When using the direct API, set `TFY_API_SH` to the full path of this skill's `scripts/tfy-api.sh`. See `references/tfy-api-setup.md` for paths per agent.

Via Tool Call

```
tfy_jobs_list_runs(job_id="job-id")
tfy_jobs_list_runs(job_id="job-id", job_run_name="run-name")  # get specific run
tfy_jobs_list_runs(job_id="job-id", filters={"sort_by": "createdAt"})
```

Via Direct API

```bash
# Set the path to tfy-api.sh for your agent (example for Claude Code):
TFY_API_SH=~/.claude/skills/truefoundry-jobs/scripts/tfy-api.sh

# List runs for a job
$TFY_API_SH GET /api/svc/v1/jobs/JOB_ID/runs

# Get specific run
$TFY_API_SH GET /api/svc/v1/jobs/JOB_ID/runs/RUN_NAME

# With filters
$TFY_API_SH GET '/api/svc/v1/jobs/JOB_ID/runs?sortBy=createdAt&searchPrefix=my-run'
```

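Filter Parameters

| Parameter       | API Key        | Description                   |
|-----------------|----------------|-------------------------------|
| `search_prefix` | `searchPrefix` | Filter runs by name prefix    |
| `sort_by`       | `sortBy`       | Sort field (e.g. `createdAt`) |
| `triggered_by`  | `triggeredBy`  | Filter by who triggered       |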
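The mapping in the table above, from tool-call parameter names to REST query keys, can be applied when building the request path. A minimal sketch, in which the helper name `runs_path` is hypothetical and only the three listed filters are accepted:

```python
from urllib.parse import urlencode

# Tool-call parameter name -> REST query key, as listed in the table above.
API_KEYS = {
    "search_prefix": "searchPrefix",
    "sort_by": "sortBy",
    "triggered_by": "triggeredBy",
}


def runs_path(job_id, **filters):
    """Build the job-runs path, translating snake_case filters to API keys."""
    unknown = set(filters) - set(API_KEYS)
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    query = urlencode({API_KEYS[name]: value for name, value in filters.items()})
    path = f"/api/svc/v1/jobs/{job_id}/runs"
    return f"{path}?{query}" if query else path
```

`runs_path("JOB_ID", sort_by="createdAt", search_prefix="my-run")` yields `/api/svc/v1/jobs/JOB_ID/runs?sortBy=createdAt&searchPrefix=my-run`, matching the filtered request shown under Via Direct API.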

Presenting Job Runs

Job Runs for data-pipeline:
| Run Name       | Status    | Started            | Duration |
|----------------|-----------|--------------------|---------|
| run-20260210-1 | SUCCEEDED | 2026-02-10 09:00   | 5m 32s  |
| run-20260210-2 | FAILED    | 2026-02-10 10:00   | 1m 05s  |
| run-20260210-3 | RUNNING   | 2026-02-10 11:00   | --       |
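A table in the layout above can be produced from a list of runs. The sketch below assumes each run has already been reduced to a dict with the hypothetical keys `name`, `status`, `started`, and `duration`. The real API response fields are not shown in this document and would need to be mapped first.

```python
def render_runs(job_name, runs):
    """Render run dicts (name, status, started, duration) as a pipe table."""
    header = ("Run Name", "Status", "Started", "Duration")
    # A missing or empty duration (for example a run still in progress) is shown as "--".
    rows = [(r["name"], r["status"], r["started"], r.get("duration") or "--") for r in runs]
    widths = [max(len(str(cell)) for cell in column) for column in zip(header, *rows)]

    def line(cells):
        return "| " + " | ".join(str(c).ljust(w) for c, w in zip(cells, widths)) + " |"

    separator = "|" + "|".join("-" * (w + 2) for w in widths) + "|"
    return "\n".join([f"Job Runs for {job_name}:", line(header), separator, *map(line, rows)])
```

Column widths are computed from the data, so long run names do not break the alignment.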
</instructions>
<success_criteria>
</instructions>
<success_criteria>

Success Criteria

  • The job has been deployed to the target workspace and the user can see it in the TrueFoundry dashboard
  • The user has been provided the job ID and knows how to trigger runs (manually or via cron schedule)
  • The agent has reported the deployment status including job name, workspace, and trigger type
  • Deployment status is verified automatically immediately after apply/deploy (no extra prompt)
  • Job logs are accessible for monitoring via the `logs` skill or the dashboard
  • For scheduled jobs, the cron expression is confirmed and the user knows when the next run will execute
</success_criteria>
<references>
</success_criteria>
<references>

Composability

  • Schedule jobs: Use cron trigger for automated scheduling
  • Monitor runs: Use the List Job Runs section above
  • Find job first: Use the `applications` skill with `application_type: "job"` to get the job app ID
  • Check logs: Use the `logs` skill with `job_run_name` to see run output
</references> <troubleshooting>
</references> <troubleshooting>

Error Handling

Job Not Found

Job ID not found. Use applications skill to list jobs:
tfy_applications_list(filters={"application_type": "job"})

No Runs Found

No runs found for this job. The job may not have been triggered yet.

CLI Errors

  • `tfy: command not found` -- Install with `pip install 'truefoundry==0.5.0'`
  • `tfy apply` validation errors -- Check YAML syntax, ensure required fields (name, type, image, resources, workspace_fqn) are present
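The required-field check can be run locally before calling `tfy apply`. A small sketch, assuming the manifest has already been loaded into a dict by a YAML parser of your choice. The helper name `missing_fields` is hypothetical, and the check covers only the top-level keys named above.

```python
REQUIRED_FIELDS = ("name", "type", "image", "resources", "workspace_fqn")


def missing_fields(manifest):
    """Return required top-level keys that are absent or empty in a manifest dict."""
    return [field for field in REQUIRED_FIELDS if not manifest.get(field)]
```

For instance, `missing_fields({"name": "my-job", "type": "job"})` returns `["image", "resources", "workspace_fqn"]`.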
</troubleshooting> </output>
</troubleshooting> </output>