Running Workloads on Hugging Face Jobs

Overview


Run any workload on fully managed Hugging Face infrastructure. No local setup required—jobs run on cloud CPUs, GPUs, or TPUs and can persist results to the Hugging Face Hub.
Common use cases:
  • Data Processing - Transform, filter, or analyze large datasets
  • Batch Inference - Run inference on thousands of samples
  • Experiments & Benchmarks - Reproducible ML experiments
  • Model Training - Fine-tune models (see the `model-trainer` skill for TRL-specific training)
  • Synthetic Data Generation - Generate datasets using LLMs
  • Development & Testing - Test code without local GPU setup
  • Scheduled Jobs - Automate recurring tasks
For model training specifically: See the `model-trainer` skill for TRL-based training workflows.

When to Use This Skill


Use this skill when users want to:
  • Run Python workloads on cloud infrastructure
  • Execute jobs without local GPU/TPU setup
  • Process data at scale
  • Run batch inference or experiments
  • Schedule recurring tasks
  • Use GPUs/TPUs for any workload
  • Persist results to the Hugging Face Hub

Key Directives


When assisting with jobs:
  1. ALWAYS use the `hf_jobs()` MCP tool - Submit jobs using `hf_jobs("uv", {...})` or `hf_jobs("run", {...})`. The `script` parameter accepts Python code directly. Do NOT save to local files unless the user explicitly requests it. Pass the script content as a string to `hf_jobs()`.
  2. Always handle authentication - Jobs that interact with the Hub require `HF_TOKEN` via secrets. See the Token Usage section below.
  3. Provide job details after submission - After submitting, provide the job ID, monitoring URL, and estimated time, and note that the user can request status checks later.
  4. Set appropriate timeouts - The default 30 minutes may be insufficient for long-running tasks.

Prerequisites Checklist


Before starting any job, verify:

Account & Authentication


  • Hugging Face Account with a Pro, Team, or Enterprise plan (Jobs require a paid plan)
  • Authenticated login: check with `hf_whoami()`
  • HF_TOKEN for Hub access ⚠️ CRITICAL - Required for any Hub operations (pushing models/datasets, downloading private repos, etc.)
  • Token must have appropriate permissions (read for downloads, write for uploads)

Token Usage (See Token Usage section for details)


When tokens are required:
  • Pushing models/datasets to Hub
  • Accessing private repositories
  • Using Hub APIs in scripts
  • Any authenticated Hub operations
How to provide tokens:

**hf_jobs MCP tool — `$HF_TOKEN` is auto-replaced with your real token:**

```python
{"secrets": {"HF_TOKEN": "$HF_TOKEN"}}
```

**HfApi().run_uv_job() — MUST pass the actual token:**

```python
from huggingface_hub import get_token

secrets = {"HF_TOKEN": get_token()}
```

**⚠️ CRITICAL:** The `$HF_TOKEN` placeholder is ONLY auto-replaced by the `hf_jobs` MCP tool. When using `HfApi().run_uv_job()`, you MUST pass the real token via `get_token()`. Passing the literal string `"$HF_TOKEN"` results in a 9-character invalid token and 401 errors.

Token Usage Guide


Understanding Tokens


What are HF Tokens?
  • Authentication credentials for Hugging Face Hub
  • Required for authenticated operations (push, private repos, API access)
  • Stored securely on your machine after `hf auth login`
Token Types:
  • Read Token - Can download models/datasets, read private repos
  • Write Token - Can push models/datasets, create repos, modify content
  • Organization Token - Can act on behalf of an organization

When Tokens Are Required


Always Required:
  • Pushing models/datasets to Hub
  • Accessing private repositories
  • Creating new repositories
  • Modifying existing repositories
  • Using Hub APIs programmatically
Not Required:
  • Downloading public models/datasets
  • Running jobs that don't interact with Hub
  • Reading public repository information
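The distinction above can be encoded directly in how a job config is built. The sketch below is a hypothetical helper (the name `job_config` and the `needs_hub` flag are illustrative, not part of any API): it only attaches `HF_TOKEN` when the job actually touches the Hub, so public-only jobs carry no credentials.

```python
def job_config(script: str, needs_hub: bool, flavor: str = "cpu-basic") -> dict:
    """Build an hf_jobs-style config dict; attach HF_TOKEN only when needed."""
    config = {"script": script, "flavor": flavor}
    if needs_hub:  # pushing, private repos, programmatic Hub API use
        config["secrets"] = {"HF_TOKEN": "$HF_TOKEN"}
    return config
```

A job that only downloads public datasets gets no `secrets` entry, which keeps the credential surface as small as possible.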

How to Provide Tokens to Jobs


Method 1: Automatic Token (Recommended)


```python
hf_jobs("uv", {
    "script": "your_script.py",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}  # ✅ Automatic replacement
})
```
How it works:
  • `$HF_TOKEN` is a placeholder that gets replaced with your actual token
  • Uses the token from your logged-in session (`hf auth login`)
  • Most secure and convenient method
  • Token is encrypted server-side when passed as a secret
Benefits:
  • No token exposure in code
  • Uses your current login session
  • Automatically updated if you re-login
  • Works seamlessly with MCP tools

Method 2: Explicit Token (Not Recommended)


```python
hf_jobs("uv", {
    "script": "your_script.py",
    "secrets": {"HF_TOKEN": "hf_abc123..."}  # ⚠️ Hardcoded token
})
```
When to use:
  • Only if the automatic token doesn't work
  • Testing with a specific token
  • Organization tokens (use with caution)
Security concerns:
  • Token visible in code/logs
  • Must manually update if token rotates
  • Risk of token exposure

Method 3: Environment Variable (Less Secure)


```python
hf_jobs("uv", {
    "script": "your_script.py",
    "env": {"HF_TOKEN": "hf_abc123..."}  # ⚠️ Less secure than secrets
})
```
Difference from secrets:
  • `env` variables are visible in job logs
  • `secrets` are encrypted server-side
  • Always prefer `secrets` for tokens

Using Tokens in Scripts


In your Python script, tokens are available as environment variables:

```python
# /// script
# dependencies = ["huggingface-hub"]
# ///
import os
from huggingface_hub import HfApi

# Token is automatically available if passed via secrets
token = os.environ.get("HF_TOKEN")

# Use with Hub API
api = HfApi(token=token)

# Or let huggingface_hub auto-detect
api = HfApi()  # Automatically uses HF_TOKEN env var
```
**Best practices:**
- Don't hardcode tokens in scripts
- Use `os.environ.get("HF_TOKEN")` to access
- Let `huggingface_hub` auto-detect when possible
- Verify token exists before Hub operations
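The best practices above can be bundled into one guard called at the top of a job script. This is a sketch (the helper name `require_hf_token` is an assumption, not an existing API): it fails fast with an actionable message instead of a confusing 401 deep inside a Hub call.

```python
import os


def require_hf_token() -> str:
    """Validate HF_TOKEN early and return it, failing fast with a clear message."""
    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError(
            "HF_TOKEN not set; pass it via secrets={'HF_TOKEN': '$HF_TOKEN'}"
        )
    if not token.startswith("hf_"):
        # Catches the literal "$HF_TOKEN" placeholder slipping through unreplaced
        raise RuntimeError("HF_TOKEN looks malformed (expected 'hf_' prefix)")
    return token
```

Calling `require_hf_token()` before any Hub operation turns a late, opaque authentication failure into an immediate, self-explanatory one.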

Token Verification


Check if you're logged in:
```python
from huggingface_hub import whoami
user_info = whoami()  # Returns your username if authenticated
```
Verify token in job:
```python
import os
assert "HF_TOKEN" in os.environ, "HF_TOKEN not found!"
token = os.environ["HF_TOKEN"]
print(f"Token starts with: {token[:7]}...")  # Should start with "hf_"
```

Common Token Issues


Error: 401 Unauthorized
  • Cause: Token missing or invalid
  • Fix: Add `secrets={"HF_TOKEN": "$HF_TOKEN"}` to the job config
  • Verify: Check that `hf_whoami()` works locally
Error: 403 Forbidden
Error: Token not found in environment
  • Cause: `secrets` not passed or wrong key name
  • Fix: Use `secrets={"HF_TOKEN": "$HF_TOKEN"}` (not `env`)
  • Verify: Script checks `os.environ.get("HF_TOKEN")`
Error: Repository access denied
  • Cause: Token doesn't have access to the private repo
  • Fix: Use a token from an account with access
  • Check: Verify repo visibility and your permissions
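For the 401 case, a quick in-job diagnostic (a sketch, not an official pattern) is to print the token's length and prefix without ever printing the token itself; a 9-character value that doesn't start with `hf_` is the signature of the `$HF_TOKEN` placeholder having been passed literally instead of being replaced.

```python
import os

token = os.environ.get("HF_TOKEN", "")
# Never print the token itself; length and prefix are enough to diagnose
print(f"HF_TOKEN length: {len(token)}, prefix ok: {token.startswith('hf_')}")
# length == 9 with prefix ok == False usually means the literal "$HF_TOKEN"
# placeholder reached the job unreplaced
```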

Token Security Best Practices


  1. Never commit tokens - Use the `$HF_TOKEN` placeholder or environment variables
  2. Use secrets, not env - Secrets are encrypted server-side
  3. Rotate tokens regularly - Generate new tokens periodically
  4. Use minimal permissions - Create tokens with only the needed permissions
  5. Don't share tokens - Each user should use their own token
  6. Monitor token usage - Check token activity in Hub settings
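One concrete way to reduce accidental exposure (supporting practices 1 and 2) is to scrub token-shaped strings from anything your script logs. The sketch below is illustrative; the function name and the exact regex are assumptions, not part of any library.

```python
import re


def redact_tokens(text: str) -> str:
    """Mask anything that looks like an HF token before it reaches logs."""
    return re.sub(r"hf_[A-Za-z0-9]{4,}", "hf_***", text)
```

Routing log messages through such a filter means a stray `print` of a config dict or error message can't leak the credential verbatim.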

Complete Token Example


```python
# Example: Push results to Hub
hf_jobs("uv", {
    "script": """
# /// script
# dependencies = ["huggingface-hub", "datasets"]
# ///
import os
from huggingface_hub import HfApi
from datasets import Dataset

# Verify token is available
assert "HF_TOKEN" in os.environ, "HF_TOKEN required!"

# Use token for Hub operations
api = HfApi(token=os.environ["HF_TOKEN"])

# Create and push dataset
data = {"text": ["Hello", "World"]}
dataset = Dataset.from_dict(data)
dataset.push_to_hub("username/my-dataset", token=os.environ["HF_TOKEN"])
print("✅ Dataset pushed successfully!")
""",
    "flavor": "cpu-basic",
    "timeout": "30m",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"}  # ✅ Token provided securely
})
```

Quick Start: Two Approaches


Approach 1: UV Scripts (Recommended)


UV scripts use PEP 723 inline dependencies for clean, self-contained workloads.
**MCP Tool:**
```python
hf_jobs("uv", {
    "script": """
# /// script
# dependencies = ["transformers", "torch"]
# ///
from transformers import pipeline
import torch

# Your workload here
classifier = pipeline("sentiment-analysis")
result = classifier("I love Hugging Face!")
print(result)
""",
    "flavor": "cpu-basic",
    "timeout": "30m"
})
```

**CLI Equivalent:**
```bash
hf jobs uv run my_script.py --flavor cpu-basic --timeout 30m
```

**Python API:**
```python
from huggingface_hub import run_uv_job
run_uv_job("my_script.py", flavor="cpu-basic", timeout="30m")
```

**Benefits:** Direct MCP tool usage, clean code, dependencies declared inline, no file saving required
**When to use:** Default choice for all workloads, custom logic, any scenario requiring `hf_jobs()`

Custom Docker Images for UV Scripts


By default, UV scripts use `ghcr.io/astral-sh/uv:python3.12-bookworm-slim`. For ML workloads with complex dependencies, use pre-built images:
```python
hf_jobs("uv", {
    "script": "inference.py",
    "image": "vllm/vllm-openai:latest",  # Pre-built image with vLLM
    "flavor": "a10g-large"
})
```
**CLI:**
```bash
hf jobs uv run --image vllm/vllm-openai:latest --flavor a10g-large inference.py
```
**Benefits:** Faster startup, pre-installed dependencies, optimized for specific frameworks

Python Version


By default, UV scripts use Python 3.12. Specify a different version:
```python
hf_jobs("uv", {
    "script": "my_script.py",
    "python": "3.11",  # Use Python 3.11
    "flavor": "cpu-basic"
})
```
**Python API:**
```python
from huggingface_hub import run_uv_job
run_uv_job("my_script.py", python="3.11")
```

Working with Scripts


⚠️ Important: There are two "script path" stories depending on how you run Jobs:
  • Using the `hf_jobs()` MCP tool (recommended in this repo): the `script` value must be inline code (a string) or a URL. A local filesystem path (like `"./scripts/foo.py"`) won't exist inside the remote container.
  • Using the `hf jobs uv run` CLI: local file paths do work (the CLI uploads your script).
Common mistake with the `hf_jobs()` MCP tool:

```python
# ❌ Will fail (remote container can't see your local path)
hf_jobs("uv", {"script": "./scripts/foo.py"})
```

**Correct patterns with `hf_jobs()` MCP tool:**

```python
# ✅ Inline: read the local script file and pass its contents
from pathlib import Path
script = Path("hf-jobs/scripts/foo.py").read_text()
hf_jobs("uv", {"script": script})

# ✅ URL: host the script somewhere reachable (e.g., a URL from GitHub)
```

**CLI equivalent (local paths supported):**

```bash
hf jobs uv run ./scripts/foo.py -- --your --args
```

Adding Dependencies at Runtime


Add extra dependencies beyond what's in the PEP 723 header:
```python
hf_jobs("uv", {
    "script": "inference.py",
    "dependencies": ["transformers", "torch>=2.0"],  # Extra deps
    "flavor": "a10g-small"
})
```
**Python API:**
```python
from huggingface_hub import run_uv_job
run_uv_job("inference.py", dependencies=["transformers", "torch>=2.0"])
```

Approach 2: Docker-Based Jobs


Run jobs with custom Docker images and commands.
**MCP Tool:**
```python
hf_jobs("run", {
    "image": "python:3.12",
    "command": ["python", "-c", "print('Hello from HF Jobs!')"],
    "flavor": "cpu-basic",
    "timeout": "30m"
})
```
**CLI Equivalent:**
```bash
hf jobs run python:3.12 python -c "print('Hello from HF Jobs!')"
```
**Python API:**
```python
from huggingface_hub import run_job
run_job(image="python:3.12", command=["python", "-c", "print('Hello!')"], flavor="cpu-basic")
```
**Benefits:** Full Docker control, use pre-built images, run any command
**When to use:** Need specific Docker images, non-Python workloads, complex environments

**Example with GPU:**
```python
hf_jobs("run", {
    "image": "pytorch/pytorch:2.6.0-cuda12.4-cudnn9-devel",
    "command": ["python", "-c", "import torch; print(torch.cuda.get_device_name())"],
    "flavor": "a10g-small",
    "timeout": "1h"
})
```
**Using Hugging Face Spaces as Images:**
You can use Docker images from HF Spaces:
```python
hf_jobs("run", {
    "image": "hf.co/spaces/lhoestq/duckdb",  # Space as Docker image
    "command": ["duckdb", "-c", "SELECT 'Hello from DuckDB!'"],
    "flavor": "cpu-basic"
})
```
**CLI:**
```bash
hf jobs run hf.co/spaces/lhoestq/duckdb duckdb -c "SELECT 'Hello!'"
```

Finding More UV Scripts on Hub


The `uv-scripts` organization provides ready-to-use UV scripts stored as datasets on the Hugging Face Hub:

```python
# Discover available UV script collections
dataset_search({"author": "uv-scripts", "sort": "downloads", "limit": 20})

# Explore a specific collection
hub_repo_details(["uv-scripts/classification"], repo_type="dataset", include_readme=True)
```

**Popular collections:** OCR, classification, synthetic-data, vLLM, dataset-creation

Hardware Selection


Reference: HF Jobs Hardware Docs (updated 07/2025)

| Workload Type | Recommended Hardware | Use Case |
|---|---|---|
| Data processing, testing | `cpu-basic`, `cpu-upgrade` | Lightweight tasks |
| Small models, demos | `t4-small` | <1B models, quick tests |
| Medium models | `t4-medium`, `l4x1` | 1-7B models |
| Large models, production | `a10g-small`, `a10g-large` | 7-13B models |
| Very large models | `a100-large` | 13B+ models |
| Batch inference | `a10g-large`, `a100-large` | High-throughput |
| Multi-GPU workloads | `l4x4`, `a10g-largex2`, `a10g-largex4` | Parallel/large models |
| TPU workloads | `v5e-1x1`, `v5e-2x2`, `v5e-2x4` | JAX/Flax, TPU-optimized |

All Available Flavors:
  • CPU: `cpu-basic`, `cpu-upgrade`
  • GPU: `t4-small`, `t4-medium`, `l4x1`, `l4x4`, `a10g-small`, `a10g-large`, `a10g-largex2`, `a10g-largex4`, `a100-large`
  • TPU: `v5e-1x1`, `v5e-2x2`, `v5e-2x4`
Guidelines:
  • Start with smaller hardware for testing
  • Scale up based on actual needs
  • Use multi-GPU for parallel workloads or large models
  • Use TPUs for JAX/Flax workloads
  • See `references/hardware_guide.md` for detailed specifications
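The table's model-size rows can be collapsed into a small lookup. This is a hypothetical helper (the name `suggest_flavor` and the single-flavor-per-tier choice are assumptions; the table often lists two options per tier) that returns a reasonable starting flavor from a parameter count in billions.

```python
def suggest_flavor(model_size_b: float) -> str:
    """Map model size (billions of params) to a starting flavor from the table."""
    if model_size_b < 1:
        return "t4-small"    # <1B models, quick tests
    if model_size_b <= 7:
        return "t4-medium"   # 1-7B models (l4x1 also fits this tier)
    if model_size_b <= 13:
        return "a10g-small"  # 7-13B models (a10g-large for headroom)
    return "a100-large"      # 13B+ models
```

Treat the result as a starting point and scale up only if the job actually runs out of memory or throughput.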

Critical: Saving Results


⚠️ EPHEMERAL ENVIRONMENT—MUST PERSIST RESULTS
The Jobs environment is temporary. All files are deleted when the job ends. If results aren't persisted, ALL WORK IS LOST.

Persistence Options


**1. Push to Hugging Face Hub (Recommended)**

```python
# Push models
model.push_to_hub("username/model-name", token=os.environ["HF_TOKEN"])

# Push datasets
dataset.push_to_hub("username/dataset-name", token=os.environ["HF_TOKEN"])

# Push artifacts
api.upload_file(
    path_or_fileobj="results.json",
    path_in_repo="results.json",
    repo_id="username/results",
    token=os.environ["HF_TOKEN"]
)
```

**2. Use External Storage**

```python
# Upload to S3, GCS, etc.
import boto3
s3 = boto3.client('s3')
s3.upload_file('results.json', 'my-bucket', 'results.json')
```

**3. Send Results via API**

```python
# POST results to your API
import requests
requests.post("https://your-api.com/results", json=results)
```

Required Configuration for Hub Push


**In job submission:**

```python
# hf_jobs MCP tool — placeholder is auto-replaced:
{"secrets": {"HF_TOKEN": "$HF_TOKEN"}}

# HfApi().run_uv_job() — must pass the real token:
from huggingface_hub import get_token
secrets = {"HF_TOKEN": get_token()}
```

**In script:**
```python
import os
from huggingface_hub import HfApi

# Token automatically available from secrets
api = HfApi(token=os.environ.get("HF_TOKEN"))

# Push your results
api.upload_file(...)
```

Verification Checklist


Before submitting:
  • Results persistence method chosen
  • Token in secrets if using Hub (MCP: `"$HF_TOKEN"`, Python API: `get_token()`)
  • Script handles a missing token gracefully
  • Test that the persistence path works
See: `references/hub_saving.md` for a detailed Hub persistence guide

Timeout Management


⚠️ DEFAULT: 30 MINUTES
Jobs automatically stop after the timeout. For long-running tasks like training, always set a custom timeout.

Setting Timeouts


**MCP Tool:**
```python
{
    "timeout": "2h"   # 2 hours
}
```
Supported formats:
  • Integer/float: seconds (e.g., `300` = 5 minutes)
  • String with suffix: `"5m"` (minutes), `"2h"` (hours), `"1d"` (days)
  • Examples: `"90m"`, `"2h"`, `"1.5h"`, `300`, `"1d"`
**Python API:**
```python
from huggingface_hub import run_job, run_uv_job

run_job(image="python:3.12", command=[...], timeout="2h")
run_uv_job("script.py", timeout=7200)  # 2 hours in seconds
```
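The supported formats can be mirrored in a small normalizer, which is handy when computing budgets or comparing timeouts programmatically. This is a sketch (the function is hypothetical, not part of `huggingface_hub`) assuming the formats listed above: bare numbers are seconds, and `s`/`m`/`h`/`d` suffixes scale accordingly.

```python
def timeout_to_seconds(value) -> float:
    """Normalize the documented timeout formats to seconds."""
    if isinstance(value, (int, float)):
        return float(value)  # bare numbers are already seconds
    units = {"s": 1, "m": 60, "h": 3600, "d": 86400}
    suffix = value[-1].lower()
    if suffix in units:
        return float(value[:-1]) * units[suffix]  # "90m", "2h", "1.5h", "1d"
    return float(value)  # plain numeric string
```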

Timeout Guidelines


| Scenario | Recommended | Notes |
|---|---|---|
| Quick test | 10-30 min | Verify setup |
| Data processing | 1-2 hours | Depends on data size |
| Batch inference | 2-4 hours | Large batches |
| Experiments | 4-8 hours | Multiple runs |
| Long-running | 8-24 hours | Production workloads |

Always add a 20-30% buffer for setup, network delays, and cleanup.
**On timeout:** Job killed immediately, all unsaved progress lost

Cost Estimation


General guidelines:
Total Cost = (Hours of runtime) × (Cost per hour)
Example calculations:
Quick test:
  • Hardware: cpu-basic ($0.10/hour)
  • Time: 15 minutes (0.25 hours)
  • Cost: $0.03
Data processing:
  • Hardware: l4x1 ($2.50/hour)
  • Time: 2 hours
  • Cost: $5.00
Batch inference:
  • Hardware: a10g-large ($5/hour)
  • Time: 4 hours
  • Cost: $20.00
Cost optimization tips:
  1. Start small - Test on cpu-basic or t4-small
  2. Monitor runtime - Set appropriate timeouts
  3. Use checkpoints - Resume if job fails
  4. Optimize code - Reduce unnecessary compute
  5. Choose right hardware - Don't over-provision
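The formula above is trivially mechanizable; the sketch below (a hypothetical helper, and the hourly rates in the examples are illustrative, not official pricing) also takes an optional `buffer` fraction for setup and retry headroom.

```python
def estimate_cost(runtime_hours: float, rate_per_hour: float, buffer: float = 0.0) -> float:
    """Total Cost = hours × rate, with optional fractional headroom, rounded to cents."""
    return round(runtime_hours * (1 + buffer) * rate_per_hour, 2)
```

For instance, the data-processing example (2 hours on a $2.50/hour flavor) comes to $5.00, and adding a 25% buffer raises it to $6.25.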

Monitoring and Tracking


Check Job Status


**MCP Tool:**
```python
# List all jobs
hf_jobs("ps")

# Inspect specific job
hf_jobs("inspect", {"job_id": "your-job-id"})

# View logs
hf_jobs("logs", {"job_id": "your-job-id"})

# Cancel a job
hf_jobs("cancel", {"job_id": "your-job-id"})
```

**Python API:**
```python
from huggingface_hub import list_jobs, inspect_job, fetch_job_logs, cancel_job

# List your jobs
jobs = list_jobs()

# List running jobs only
running = [j for j in list_jobs() if j.status.stage == "RUNNING"]

# Inspect specific job
job_info = inspect_job(job_id="your-job-id")

# View logs
for log in fetch_job_logs(job_id="your-job-id"):
    print(log)

# Cancel a job
cancel_job(job_id="your-job-id")
```

**CLI:**
```bash
hf jobs ps                    # List jobs
hf jobs logs <job-id>         # View logs
hf jobs cancel <job-id>       # Cancel job
```
**Remember:** Wait for the user to request status checks. Avoid polling repeatedly.

Job URLs


After submission, jobs have monitoring URLs:
https://huggingface.co/jobs/username/job-id
View logs, status, and details in the browser.
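When reporting job details back to a user (per the Key Directives), the monitoring URL can be assembled from the username and job ID. A minimal sketch, assuming the URL pattern shown above:

```python
def job_url(username: str, job_id: str) -> str:
    """Build the browser monitoring URL for a submitted job."""
    return f"https://huggingface.co/jobs/{username}/{job_id}"
```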

Wait for Multiple Jobs


```python
import time
from huggingface_hub import inspect_job, run_job

# Run multiple jobs
jobs = [run_job(image=img, command=cmd) for img, cmd in workloads]

# Wait for all to complete
for job in jobs:
    while inspect_job(job_id=job.id).status.stage not in ("COMPLETED", "ERROR"):
        time.sleep(10)
```

Scheduled Jobs


Run jobs on a schedule using CRON expressions or predefined schedules.

使用CRON表达式或预定义计划定时运行作业。

**MCP Tool:**

```python
# Schedule a UV script that runs every hour
hf_jobs("scheduled uv", {
    "script": "your_script.py",
    "schedule": "@hourly",
    "flavor": "cpu-basic"
})

# Schedule with CRON syntax
hf_jobs("scheduled uv", {
    "script": "your_script.py",
    "schedule": "0 9 * * 1",  # 9 AM every Monday
    "flavor": "cpu-basic"
})

# Schedule a Docker-based job
hf_jobs("scheduled run", {
    "image": "python:3.12",
    "command": ["python", "-c", "print('Scheduled!')"],
    "schedule": "@daily",
    "flavor": "cpu-basic"
})
```

**MCP工具:**

```python
# 定时运行UV脚本,每小时执行一次
hf_jobs("scheduled uv", {
    "script": "your_script.py",
    "schedule": "@hourly",
    "flavor": "cpu-basic"
})

# 使用CRON语法定时
hf_jobs("scheduled uv", {
    "script": "your_script.py",
    "schedule": "0 9 * * 1",  # 每周一上午9点
    "flavor": "cpu-basic"
})

# 定时运行基于Docker的作业
hf_jobs("scheduled run", {
    "image": "python:3.12",
    "command": ["python", "-c", "print('Scheduled!')"],
    "schedule": "@daily",
    "flavor": "cpu-basic"
})
```

**Python API:**

```python
from huggingface_hub import create_scheduled_job, create_scheduled_uv_job

# Schedule a Docker job
create_scheduled_job(
    image="python:3.12",
    command=["python", "-c", "print('Running on schedule!')"],
    schedule="@hourly"
)

# Schedule a UV script
create_scheduled_uv_job("my_script.py", schedule="@daily", flavor="cpu-basic")

# Schedule with GPU
create_scheduled_uv_job(
    "ml_inference.py",
    schedule="0 */6 * * *",  # Every 6 hours
    flavor="a10g-small"
)
```

**Python API:**

```python
from huggingface_hub import create_scheduled_job, create_scheduled_uv_job

# 定时运行Docker作业
create_scheduled_job(
    image="python:3.12",
    command=["python", "-c", "print('Running on schedule!')"],
    schedule="@hourly"
)

# 定时运行UV脚本
create_scheduled_uv_job("my_script.py", schedule="@daily", flavor="cpu-basic")

# 使用GPU定时运行
create_scheduled_uv_job(
    "ml_inference.py",
    schedule="0 */6 * * *",  # 每6小时一次
    flavor="a10g-small"
)
```

**Available schedules:**
- `@annually`, `@yearly` - Once per year
- `@monthly` - Once per month
- `@weekly` - Once per week
- `@daily` - Once per day
- `@hourly` - Once per hour
- CRON expression - Custom schedule (e.g., `"*/5 * * * *"` for every 5 minutes)

**可用的计划:**
- `@annually`, `@yearly` - 每年一次
- `@monthly` - 每月一次
- `@weekly` - 每周一次
- `@daily` - 每天一次
- `@hourly` - 每小时一次
- CRON表达式 - 自定义计划(如`"*/5 * * * *"`表示每5分钟一次)

**Manage scheduled jobs (MCP Tool):**

```python
hf_jobs("scheduled ps")                           # List scheduled jobs
hf_jobs("scheduled inspect", {"job_id": "..."})   # Inspect details
hf_jobs("scheduled suspend", {"job_id": "..."})   # Pause
hf_jobs("scheduled resume", {"job_id": "..."})    # Resume
hf_jobs("scheduled delete", {"job_id": "..."})    # Delete
```

**管理定时作业(MCP工具):**

```python
hf_jobs("scheduled ps")                           # 列出定时作业
hf_jobs("scheduled inspect", {"job_id": "..."})   # 查看详情
hf_jobs("scheduled suspend", {"job_id": "..."})   # 暂停
hf_jobs("scheduled resume", {"job_id": "..."})    # 恢复
hf_jobs("scheduled delete", {"job_id": "..."})    # 删除
```

**Python API for management:**

```python
from huggingface_hub import (
    list_scheduled_jobs,
    inspect_scheduled_job,
    suspend_scheduled_job,
    resume_scheduled_job,
    delete_scheduled_job
)

# List all scheduled jobs
scheduled = list_scheduled_jobs()

# Inspect a scheduled job
info = inspect_scheduled_job(scheduled_job_id)

# Suspend (pause) a scheduled job
suspend_scheduled_job(scheduled_job_id)

# Resume a scheduled job
resume_scheduled_job(scheduled_job_id)

# Delete a scheduled job
delete_scheduled_job(scheduled_job_id)
```

**用于管理的Python API:**

```python
from huggingface_hub import (
    list_scheduled_jobs,
    inspect_scheduled_job,
    suspend_scheduled_job,
    resume_scheduled_job,
    delete_scheduled_job
)

# 列出所有定时作业
scheduled = list_scheduled_jobs()

# 查看定时作业详情
info = inspect_scheduled_job(scheduled_job_id)

# 暂停定时作业
suspend_scheduled_job(scheduled_job_id)

# 恢复定时作业
resume_scheduled_job(scheduled_job_id)

# 删除定时作业
delete_scheduled_job(scheduled_job_id)
```
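Before submitting, it can save a failed run to sanity-check the schedule string locally. The helper below is illustrative only (it is not part of `huggingface_hub` or the Jobs API); it accepts the predefined `@` aliases and basic 5-field CRON expressions:

```python
# Minimal local sanity check for schedule strings; illustrative only,
# not part of huggingface_hub.
PREDEFINED = {"@annually", "@yearly", "@monthly", "@weekly", "@daily", "@hourly"}

# (min, max) for minute, hour, day-of-month, month, day-of-week
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 7)]

def is_valid_schedule(schedule: str) -> bool:
    """Accept a predefined alias or a 5-field CRON expression."""
    if schedule in PREDEFINED:
        return True
    fields = schedule.split()
    if len(fields) != 5:
        return False
    for field, (lo, hi) in zip(fields, FIELD_RANGES):
        # Each field may be "*", "*/step", a value, a range "a-b", or a list "a,b".
        for part in field.split(","):
            body, _, step = part.partition("/")
            if step and not step.isdigit():
                return False
            if body == "*":
                continue
            bounds = body.split("-")
            if len(bounds) > 2 or not all(b.isdigit() for b in bounds):
                return False
            if not all(lo <= int(b) <= hi for b in bounds):
                return False
    return True

print(is_valid_schedule("@hourly"))      # True
print(is_valid_schedule("0 9 * * 1"))    # True: 9 AM every Monday
print(is_valid_schedule("0 25 * * *"))   # False: hour 25 is out of range
```

This only catches shape and range errors; the Jobs backend remains the authority on what schedules it accepts.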

Webhooks: Trigger Jobs on Events

Webhooks:事件触发作业

Trigger jobs automatically when changes happen in Hugging Face repositories.

当Hugging Face仓库发生变更时,自动触发作业运行。

**Python API:**

```python
from huggingface_hub import create_webhook

# Create webhook that triggers a job when a repo changes
webhook = create_webhook(
    job_id=job.id,
    watched=[
        {"type": "user", "name": "your-username"},
        {"type": "org", "name": "your-org-name"}
    ],
    domains=["repo", "discussion"],
    secret="your-secret"
)
```

**How it works:**
1. The webhook listens for changes in watched repositories
2. When triggered, the job runs with the `WEBHOOK_PAYLOAD` environment variable set
3. Your script can parse the payload to understand what changed

**Use cases:**
- Auto-process new datasets when uploaded
- Trigger inference when models are updated
- Run tests when code changes
- Generate reports on repository activity

**Access webhook payload in script:**

```python
import os
import json

payload = json.loads(os.environ.get("WEBHOOK_PAYLOAD", "{}"))
print(f"Event type: {payload.get('event', {}).get('action')}")
```

See the Webhooks documentation for more details.

**Python API:**

```python
from huggingface_hub import create_webhook

# 创建Webhook,当仓库变更时触发作业
webhook = create_webhook(
    job_id=job.id,
    watched=[
        {"type": "user", "name": "your-username"},
        {"type": "org", "name": "your-org-name"}
    ],
    domains=["repo", "discussion"],
    secret="your-secret"
)
```

**工作原理:**
1. Webhook监听指定仓库的变更
2. 触发时,作业运行并通过环境变量`WEBHOOK_PAYLOAD`获取相关信息
3. 你的脚本可以解析该负载,了解具体发生了什么变更

**适用场景:**
- 上传新数据集时自动处理
- 模型更新时触发推理
- 代码变更时运行测试
- 针对仓库活动生成报告

**在脚本中访问Webhook负载:**

```python
import os
import json

payload = json.loads(os.environ.get("WEBHOOK_PAYLOAD", "{}"))
print(f"事件类型: {payload.get('event', {}).get('action')}")
```

更多详情请查看Webhooks文档。
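To make payload handling concrete, here is a small sketch of a job script that branches on the event. The payload keys used (`event.action`, `repo.name`) follow the documented webhook payload shape, but treat the exact structure as an assumption to verify against a real payload; the branch bodies are placeholders:

```python
import json
import os

# Example payload of the kind a webhook-triggered job receives; in a real
# job this JSON arrives via the WEBHOOK_PAYLOAD environment variable.
os.environ.setdefault("WEBHOOK_PAYLOAD", json.dumps({
    "event": {"action": "update", "scope": "repo.content"},
    "repo": {"type": "dataset", "name": "username/my-dataset"},
}))

payload = json.loads(os.environ.get("WEBHOOK_PAYLOAD", "{}"))
action = payload.get("event", {}).get("action")
repo = payload.get("repo", {}).get("name")

# Dispatch on what changed; replace the print calls with real work.
if action == "update" and repo:
    print(f"Reprocessing {repo} after update")
elif action == "create":
    print(f"New repo {repo}: running initial processing")
else:
    print("Ignoring event:", action)
```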

Common Workload Patterns

常见工作负载模式

This repository ships ready-to-run UV scripts in `hf-jobs/scripts/`. Prefer using them instead of inventing new templates.

本仓库在`hf-jobs/scripts/`中提供了现成可用的UV脚本。优先使用这些脚本,而非自行编写新模板。

Pattern 1: Dataset → Model Responses (vLLM) — `scripts/generate-responses.py`

模式1:数据集→模型响应(vLLM)——`scripts/generate-responses.py`

**What it does:** loads a Hub dataset (a chat `messages` or a `prompt` column), applies the model's chat template, generates responses with vLLM, and pushes the output dataset + dataset card back to the Hub.
**Requires:** GPU + write token (it pushes a dataset).

```python
from pathlib import Path

script = Path("hf-jobs/scripts/generate-responses.py").read_text()
hf_jobs("uv", {
    "script": script,
    "script_args": [
        "username/input-dataset",
        "username/output-dataset",
        "--messages-column", "messages",
        "--model-id", "Qwen/Qwen3-30B-A3B-Instruct-2507",
        "--temperature", "0.7",
        "--top-p", "0.8",
        "--max-tokens", "2048",
    ],
    "flavor": "a10g-large",
    "timeout": "4h",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"},
})
```
**功能:** 加载Hub中的数据集(聊天`messages`或`prompt`列),应用模型聊天模板,使用vLLM生成响应,并将输出数据集和数据集卡片推送回Hub。
**要求:** GPU + 写权限令牌(需要推送数据集)。

```python
from pathlib import Path

script = Path("hf-jobs/scripts/generate-responses.py").read_text()
hf_jobs("uv", {
    "script": script,
    "script_args": [
        "username/input-dataset",
        "username/output-dataset",
        "--messages-column", "messages",
        "--model-id", "Qwen/Qwen3-30B-A3B-Instruct-2507",
        "--temperature", "0.7",
        "--top-p", "0.8",
        "--max-tokens", "2048",
    ],
    "flavor": "a10g-large",
    "timeout": "4h",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"},
})
```

Pattern 2: CoT Self-Instruct Synthetic Data — `scripts/cot-self-instruct.py`

模式2:思维链自指令合成数据——`scripts/cot-self-instruct.py`

**What it does:** generates synthetic prompts/answers via CoT Self-Instruct, optionally filters outputs (answer-consistency / RIP), then pushes the generated dataset + dataset card to the Hub.
**Requires:** GPU + write token (it pushes a dataset).

```python
from pathlib import Path

script = Path("hf-jobs/scripts/cot-self-instruct.py").read_text()
hf_jobs("uv", {
    "script": script,
    "script_args": [
        "--seed-dataset", "davanstrien/s1k-reasoning",
        "--output-dataset", "username/synthetic-math",
        "--task-type", "reasoning",
        "--num-samples", "5000",
        "--filter-method", "answer-consistency",
    ],
    "flavor": "l4x4",
    "timeout": "8h",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"},
})
```
**功能:** 通过思维链自指令生成合成提示/答案,可选择过滤输出(答案一致性/RIP),然后将生成的数据集和数据集卡片推送到Hub。
**要求:** GPU + 写权限令牌(需要推送数据集)。

```python
from pathlib import Path

script = Path("hf-jobs/scripts/cot-self-instruct.py").read_text()
hf_jobs("uv", {
    "script": script,
    "script_args": [
        "--seed-dataset", "davanstrien/s1k-reasoning",
        "--output-dataset", "username/synthetic-math",
        "--task-type", "reasoning",
        "--num-samples", "5000",
        "--filter-method", "answer-consistency",
    ],
    "flavor": "l4x4",
    "timeout": "8h",
    "secrets": {"HF_TOKEN": "$HF_TOKEN"},
})
```

Pattern 3: Streaming Dataset Stats (Polars + HF Hub) — `scripts/finepdfs-stats.py`

模式3:流式数据集统计(Polars + HF Hub)——`scripts/finepdfs-stats.py`

**What it does:** scans Parquet directly from the Hub (no 300GB download), computes temporal stats, and optionally uploads results to a Hub dataset repo.
**Requires:** CPU is often enough; a token is needed only if you pass `--output-repo` (upload).

```python
from pathlib import Path

script = Path("hf-jobs/scripts/finepdfs-stats.py").read_text()
hf_jobs("uv", {
    "script": script,
    "script_args": [
        "--limit", "10000",
        "--show-plan",
        "--output-repo", "username/finepdfs-temporal-stats",
    ],
    "flavor": "cpu-upgrade",
    "timeout": "2h",
    "env": {"HF_XET_HIGH_PERFORMANCE": "1"},
    "secrets": {"HF_TOKEN": "$HF_TOKEN"},
})
```
**功能:** 直接从Hub扫描Parquet文件(无需下载300GB数据),计算时间统计信息,并可选地将结果上传到Hub数据集仓库。
**要求:** 通常CPU即可;仅当传入`--output-repo`(上传)时需要令牌。

```python
from pathlib import Path

script = Path("hf-jobs/scripts/finepdfs-stats.py").read_text()
hf_jobs("uv", {
    "script": script,
    "script_args": [
        "--limit", "10000",
        "--show-plan",
        "--output-repo", "username/finepdfs-temporal-stats",
    ],
    "flavor": "cpu-upgrade",
    "timeout": "2h",
    "env": {"HF_XET_HIGH_PERFORMANCE": "1"},
    "secrets": {"HF_TOKEN": "$HF_TOKEN"},
})
```

Common Failure Modes

常见失败模式

Out of Memory (OOM)

内存不足(OOM)

Fix:
  1. Reduce batch size or data chunk size
  2. Process data in smaller batches
  3. Upgrade hardware: cpu → t4 → a10g → a100
解决方法:
  1. 减小批量大小或数据块大小
  2. 分小批量处理数据
  3. 升级硬件:cpu → t4 → a10g → a100
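Steps 1 and 2 amount to restructuring the loop so only one chunk is in memory at a time. A minimal pure-Python sketch, with a placeholder `process` function standing in for your model or transform:

```python
def chunked(items, size):
    """Yield successive fixed-size chunks so peak memory stays bounded."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def process(batch):
    # Placeholder for the real work (inference, transform, etc.).
    return [x * 2 for x in batch]

data = list(range(10))
results = []
for batch in chunked(data, size=4):   # lower `size` if you hit OOM
    results.extend(process(batch))

print(results)  # doubled values 0..18, in order
```

For Hub datasets the same idea applies via streaming/iterable loading, so the full dataset never has to fit in RAM.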

Job Timeout

作业超时

Fix:
  1. Check logs for actual runtime
  2. Increase timeout with buffer: `"timeout": "3h"`
  3. Optimize code for faster execution
  4. Process data in chunks
解决方法:
  1. 查看日志了解实际运行时长
  2. 增加超时时间并预留缓冲:`"timeout": "3h"`
  3. 优化代码以提升执行速度
  4. 分块处理数据
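One way to pick the timeout is to read the runtime from a previous run's logs and pad it. A small sketch; the 1.5× factor and rounding up are arbitrary choices made here, not a Jobs requirement:

```python
import math

def timeout_with_buffer(observed_seconds: float, factor: float = 1.5) -> str:
    """Scale an observed runtime and round up to a whole-hour/minute string."""
    padded = observed_seconds * factor
    if padded >= 3600:
        return f"{math.ceil(padded / 3600)}h"
    return f"{math.ceil(padded / 60)}m"

# A run that took ~2h05m gets a 4h timeout (2h05m * 1.5 ≈ 3h08m, rounded up).
print(timeout_with_buffer(2 * 3600 + 5 * 60))  # 4h
```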

Hub Push Failures

Hub推送失败

Fix:
  1. Add token to secrets: MCP uses `"$HF_TOKEN"` (auto-replaced), Python API uses `get_token()` (must pass a real token)
  2. Verify the token in the script: `assert "HF_TOKEN" in os.environ`
  3. Check token permissions
  4. Verify the repo exists or can be created
解决方法:
  1. 在secrets中添加令牌:MCP工具使用`"$HF_TOKEN"`(自动替换),Python API使用`get_token()`(必须传入实际令牌)
  2. 在脚本中验证令牌:`assert "HF_TOKEN" in os.environ`
  3. 检查令牌权限
  4. 验证仓库是否存在或可创建
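A defensive variant of step 2: fail fast at the top of the script if the token never reached the environment, rather than failing at push time after hours of compute. The helper name here is hypothetical, not part of any library:

```python
import os

def require_hf_token() -> str:
    """Abort early with a clear message if HF_TOKEN is missing or empty."""
    token = os.environ.get("HF_TOKEN", "")
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Pass it via secrets={'HF_TOKEN': ...} "
            "when submitting the job."
        )
    return token

# Example: simulate a job environment that did receive the secret.
os.environ["HF_TOKEN"] = "hf_example_token"  # placeholder value
print(bool(require_hf_token()))  # True
```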

Missing Dependencies

依赖缺失

Fix: Add the missing packages to the PEP 723 header:

```python
# /// script
# dependencies = ["package1", "package2>=1.0.0"]
# ///
```

解决方法: 添加到PEP 723头中:

```python
# /// script
# dependencies = ["package1", "package2>=1.0.0"]
# ///
```

Authentication Errors

认证错误

Fix:
  1. Check `hf_whoami()` works locally
  2. Verify the token in secrets — MCP: `"$HF_TOKEN"`, Python API: `get_token()` (NOT `"$HF_TOKEN"`)
  3. Re-login: `hf auth login`
  4. Check the token has the required permissions
解决方法:
  1. 确认本地`hf_whoami()`可正常运行
  2. 验证secrets中的令牌——MCP工具使用`"$HF_TOKEN"`,Python API使用`get_token()`(而非`"$HF_TOKEN"`)
  3. 重新登录:`hf auth login`
  4. 检查令牌是否具备所需权限

Troubleshooting

故障排除

Common issues:
  • Job times out → Increase timeout, optimize code
  • Results not saved → Check persistence method, verify HF_TOKEN
  • Out of Memory → Reduce batch size, upgrade hardware
  • Import errors → Add dependencies to PEP 723 header
  • Authentication errors → Check token, verify secrets parameter
See `references/troubleshooting.md` for the complete troubleshooting guide.
常见问题:
  • 作业超时 → 增加超时时间、优化代码
  • 结果未保存 → 检查持久化方式、验证HF_TOKEN
  • 内存不足 → 减小批量大小、升级硬件
  • 导入错误 → 在PEP 723头中添加依赖
  • 认证错误 → 检查令牌、验证secrets参数
参考`references/troubleshooting.md`获取完整的故障排除指南。

Resources

资源

References (In This Skill)

本技能内的参考文档

  • `references/token_usage.md` - Complete token usage guide
  • `references/hardware_guide.md` - Hardware specs and selection
  • `references/hub_saving.md` - Hub persistence guide
  • `references/troubleshooting.md` - Common issues and solutions
  • `references/token_usage.md` - 完整的令牌使用指南
  • `references/hardware_guide.md` - 硬件规格与选择
  • `references/hub_saving.md` - Hub持久化指南
  • `references/troubleshooting.md` - 常见问题与解决方案

Scripts (In This Skill)

本技能内的脚本

  • `scripts/generate-responses.py` - vLLM batch generation: dataset → responses → push to Hub
  • `scripts/cot-self-instruct.py` - CoT Self-Instruct synthetic data generation + filtering → push to Hub
  • `scripts/finepdfs-stats.py` - Polars streaming stats over `finepdfs-edu` parquet on Hub (optional push)
  • `scripts/generate-responses.py` - vLLM批量生成:数据集→响应→推送到Hub
  • `scripts/cot-self-instruct.py` - 思维链自指令合成数据生成+过滤→推送到Hub
  • `scripts/finepdfs-stats.py` - 对Hub上的`finepdfs-edu` parquet文件进行Polars流式统计(可选推送)

External Links

外部链接

Official Documentation:
Related Tools:
官方文档:
相关工具:

Key Takeaways

核心要点

  1. Submit scripts inline - The
    script
    parameter accepts Python code directly; no file saving required unless user requests
  2. Jobs are asynchronous - Don't wait/poll; let user check when ready
  3. Always set timeout - Default 30 min may be insufficient; set appropriate timeout
  4. Always persist results - Environment is ephemeral; without persistence, all work is lost
  5. Use tokens securely - MCP: `secrets={"HF_TOKEN": "$HF_TOKEN"}`, Python API: `secrets={"HF_TOKEN": get_token()}`; `"$HF_TOKEN"` only works with the MCP tool
  6. Choose appropriate hardware - Start small, scale up based on needs (see hardware guide)
  7. Use UV scripts - Default to
    hf_jobs("uv", {...})
    with inline scripts for Python workloads
  8. Handle authentication - Verify tokens are available before Hub operations
  9. Monitor jobs - Provide job URLs and status check commands
  10. Optimize costs - Choose right hardware, set appropriate timeouts
  1. 内联提交脚本 -
    script
    参数可直接接收Python代码;除非用户要求,否则无需保存文件
  2. 作业为异步执行 - 不要等待/轮询;让用户在需要时自行检查
  3. 务必设置超时时间 - 默认30分钟可能不足;设置合适的超时时间
  4. 务必持久化结果 - 环境为临时状态;不持久化的话所有工作都会丢失
  5. 安全使用令牌 - MCP工具使用`secrets={"HF_TOKEN": "$HF_TOKEN"}`,Python API使用`secrets={"HF_TOKEN": get_token()}`;`"$HF_TOKEN"`仅适用于MCP工具
  6. 选择合适的硬件 - 从小规模开始,根据需求扩容(查看硬件指南)
  7. 使用UV脚本 - Python工作负载默认使用
    hf_jobs("uv", {...})
    和内联脚本
  8. 处理认证 - 在执行Hub操作前验证令牌是否可用
  9. 监控作业 - 提供作业URL和状态检查命令
  10. 优化成本 - 选择合适的硬件、设置合适的超时时间

Quick Reference: MCP Tool vs CLI vs Python API

快速参考:MCP工具 vs CLI vs Python API

| Operation | MCP Tool | CLI | Python API |
|---|---|---|---|
| Run UV script | `hf_jobs("uv", {...})` | `hf jobs uv run script.py` | `run_uv_job("script.py")` |
| Run Docker job | `hf_jobs("run", {...})` | `hf jobs run image cmd` | `run_job(image, command)` |
| List jobs | `hf_jobs("ps")` | `hf jobs ps` | `list_jobs()` |
| View logs | `hf_jobs("logs", {...})` | `hf jobs logs <id>` | `fetch_job_logs(job_id)` |
| Cancel job | `hf_jobs("cancel", {...})` | `hf jobs cancel <id>` | `cancel_job(job_id)` |
| Schedule UV | `hf_jobs("scheduled uv", {...})` | `hf jobs scheduled uv run SCHEDULE script.py` | `create_scheduled_uv_job()` |
| Schedule Docker | `hf_jobs("scheduled run", {...})` | `hf jobs scheduled run SCHEDULE image cmd` | `create_scheduled_job()` |
| List scheduled | `hf_jobs("scheduled ps")` | `hf jobs scheduled ps` | `list_scheduled_jobs()` |
| Delete scheduled | `hf_jobs("scheduled delete", {...})` | `hf jobs scheduled delete <id>` | `delete_scheduled_job()` |
| 操作 | MCP工具 | CLI | Python API |
|---|---|---|---|
| 运行UV脚本 | `hf_jobs("uv", {...})` | `hf jobs uv run script.py` | `run_uv_job("script.py")` |
| 运行Docker作业 | `hf_jobs("run", {...})` | `hf jobs run image cmd` | `run_job(image, command)` |
| 列出作业 | `hf_jobs("ps")` | `hf jobs ps` | `list_jobs()` |
| 查看日志 | `hf_jobs("logs", {...})` | `hf jobs logs <id>` | `fetch_job_logs(job_id)` |
| 取消作业 | `hf_jobs("cancel", {...})` | `hf jobs cancel <id>` | `cancel_job(job_id)` |
| 定时运行UV脚本 | `hf_jobs("scheduled uv", {...})` | `hf jobs scheduled uv run SCHEDULE script.py` | `create_scheduled_uv_job()` |
| 定时运行Docker作业 | `hf_jobs("scheduled run", {...})` | `hf jobs scheduled run SCHEDULE image cmd` | `create_scheduled_job()` |
| 列出定时作业 | `hf_jobs("scheduled ps")` | `hf jobs scheduled ps` | `list_scheduled_jobs()` |
| 删除定时作业 | `hf_jobs("scheduled delete", {...})` | `hf jobs scheduled delete <id>` | `delete_scheduled_job()` |