huggingface-lora-space-builder
Gradio LoRA Space Builder
Build and publish a Gradio demo on Hugging Face Spaces that runs inference with a user-provided LoRA. Use whenever someone asks to create, generate, ship, or publish "a Space", "a demo", "a Gradio app", or "a playground" for a LoRA, whether the base model is Qwen-Image, Qwen-Image-Edit, LTX, or another diffusion model. Also use when someone describes a LoRA they trained or host on the Hub and wants to share it. The default target is ZeroGPU hardware, and the default inference library is `diffusers` when the base model supports it. The output is a real, published Space (private by default) that the user can try in the browser, not a local script.
What "good" looks like for these demos
The demo should feel handcrafted for this specific LoRA, not a generic template with the LoRA bolted on. Two LoRAs that share a task can still need different demos: a pose-control video LoRA and an outpainting video LoRA both take video in and produce video out, but the inputs the user provides, the preprocessing, and the controls are completely different. Recognizing that is the central job here.
Concretely, a good demo:
- Loads fast and runs fast — minimal model loading, sensible step count, no wasted computation per call.
- Has a UI with exactly the controls this LoRA needs and nothing else. Excess sliders are a cost, not a feature.
- Shows the user what's happening — progress, intermediate outputs where useful, the seed used, a clear error when input is missing.
- Honors the LoRA's own recommendations from its model card: trigger words, recommended step count, recommended guidance scale, recommended LoRA scale, example inputs.
- Is creative where creativity helps — interactive canvases, before/after sliders, side-by-side previews of intermediate processing — and plain where plainness is right.
Workflow
Work through these phases in order. Information gathered in one phase decides the next.
- Gather the LoRA info needed to pick a pipeline and design a UI.
- Pick the base pipeline and inference recipe.
- Design the UI for this specific LoRA's task and inputs.
- Write `app.py`, `requirements.txt`, and `README.md` together; show all three to the user for one batched approval.
- Publish the Space (private).
Don't drip-feed questions across multiple turns. Batch them.
Phase 1 — Gather LoRA info
Required: a LoRA repo on the Hub (e.g. `username/my-lora`).

First, try to read the repo without a token. If that succeeds, the repo is public; proceed. If it fails with 401/403, the repo is private or gated and you need an authenticated session to read it. Don't immediately ask for a token; first check whether the user is already authenticated:
```python
from huggingface_hub import HfApi, get_token

cached_token = get_token()  # picks up HF_TOKEN env var or cached CLI login
if cached_token:
    try:
        info = HfApi().whoami(token=cached_token)
        username = info["name"]
        # info also has fine-grained token scope info if applicable
    except Exception:
        cached_token = None  # token exists but is invalid/expired
```

Then:
- If a valid cached token exists and it can read the repo, use it. No prompt needed.
- If no cached token, or the cached token can't read this private repo, ask the user for a token — once, with the explanation below.
When asking for a token (and only when you actually need to ask):
I need a Hugging Face access token with write scope (to read the LoRA if it's private/gated, and to publish the Space). Create one at https://huggingface.co/settings/tokens. Paste it here.
The same token will be reused for publishing in the final phase, so this is a one-time ask.
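The order above (anonymous read first, then the cached token, then ask once) can be sketched as a small decision function. This is a minimal sketch, not the skill's reference code: `pick_read_token` takes a `can_read(token)` probe so the logic is testable offline, and `hub_can_read` is a hypothetical probe that assumes a recent `huggingface_hub` providing `HfApi.auth_check`, which raises for private or gated repos the token can't access. Note that `token=None` in `huggingface_hub` means "use the saved token", so the probe passes `token=False` to force an anonymous check.

```python
from typing import Callable, Optional, Tuple


def pick_read_token(
    can_read: Callable[[Optional[str]], bool],
    cached_token: Optional[str],
) -> Tuple[bool, Optional[str]]:
    """Return (resolved, token); resolved=False means ask the user once."""
    if can_read(None):
        # Public repo: reading needs no token.
        return True, None
    if cached_token and can_read(cached_token):
        # Cached login (HF_TOKEN env var or CLI) already has access.
        return True, cached_token
    # Private/gated with no usable cached token: ask for a write-scoped token.
    return False, None


def hub_can_read(repo_id: str) -> Callable[[Optional[str]], bool]:
    """Hypothetical probe backed by huggingface_hub (imported lazily)."""

    def can_read(token: Optional[str]) -> bool:
        from huggingface_hub import HfApi
        from huggingface_hub.utils import HfHubHTTPError

        try:
            # token=False forces an anonymous request; assumes auth_check exists.
            HfApi().auth_check(repo_id, token=token if token else False)
            return True
        except HfHubHTTPError:
            return False

    return can_read
```

In practice this would be called as `pick_read_token(hub_can_read(repo_id), get_token())`, and only a `(False, None)` result triggers the token request.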
Then read what's in the repo:
- List the repo files (`huggingface_hub.HfApi().list_repo_files(repo_id)`). Look for `.safetensors` files, `README.md`, example images/videos, and multiple checkpoints.
- Fetch the model card (`huggingface_hub.ModelCard.load(repo_id)`). Its `data` attribute holds the structured metadata fields (`data.to_dict()` gives a plain dict); its `text` holds the README body.
- If multiple `.safetensors` files exist, pick the right one; see "Picking the LoRA weights file" in `references/zerogpu-and-publishing.md`. Briefly: the README-recommended file wins, then `pytorch_lora_weights.safetensors`, then the latest training checkpoint; otherwise ask.
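The priority rule for weights files fits in a pure function over the file list and README text. A minimal sketch under two assumptions of ours: a file counts as "README-recommended" when its path or basename appears literally in the README (and only one such file does), and the "latest training checkpoint" is the one with the largest step number in a `checkpoint-<N>` directory or a trailing `_<N>.safetensors` / `-<N>.safetensors` suffix. `None` means: ask the user.

```python
import os
import re
from typing import List, Optional

DEFAULT_NAME = "pytorch_lora_weights.safetensors"


def _checkpoint_step(path: str) -> Optional[int]:
    """Heuristic training step from 'checkpoint-500/...' or '..._000500.safetensors'."""
    m = re.search(r"checkpoint-(\d+)", path) or re.search(r"[-_](\d+)\.safetensors$", path)
    return int(m.group(1)) if m else None


def pick_weights_file(files: List[str], readme_text: str) -> Optional[str]:
    candidates = [f for f in files if f.endswith(".safetensors")]
    if len(candidates) == 1:
        return candidates[0]
    # 1. A single file that the README names explicitly wins.
    mentioned = [f for f in candidates
                 if f in readme_text or os.path.basename(f) in readme_text]
    if len(mentioned) == 1:
        return mentioned[0]
    # 2. The diffusers default filename at the repo root.
    if DEFAULT_NAME in candidates:
        return DEFAULT_NAME
    # 3. The latest training checkpoint, by step number.
    stepped = []
    for f in candidates:
        step = _checkpoint_step(f)
        if step is not None:
            stepped.append((step, f))
    if stepped:
        return max(stepped)[1]
    # 4. Still ambiguous (or no weights at all): ask the user.
    return None
```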
From the model card, try to determine:
- Base model: the `base_model` field, or text mentions in the README. Usually present. Use it to pick the pipeline reference file (see Phase 2).
- Task: `pipeline_tag` if set, otherwise inferred from the base model and README text. The five tasks this skill handles: `text-to-image`, `image-to-image`, `text-to-video`, `image-to-video`, `video-to-video`.
- Trigger words: often called "trigger word", "instance prompt", or "activation word"; sometimes embedded in example prompts.
- Recommended inference recipe: step count, guidance scale, true CFG scale, LoRA scale, resolution. Many LoRA cards include a Python snippet; trust its parameters (steps, guidance, CFG, LoRA scale, dtype). For loading mechanics, see `adapting-to-the-lora.md`, and prefer `pipe.load_lora_weights(...)` over whatever loading approach the snippet uses.
- Example prompts and example media: use these as Gradio examples in the UI.
- Sub-task / specific use case: for image edits and video LoRAs, "what does this LoRA actually do" matters as much as the task category. A relighting LoRA, a face-swap LoRA, and a style LoRA might all be image-to-image, but each needs a different UI.
When something can't be inferred, ask the user — once, in a single batched message. Format the question to make answering trivial. For task category, list the five options as a numbered choice. For sub-task, give a one-line description ("what does this LoRA do? e.g. 'relight portraits', 'apply manga style', 'extend videos to wider aspect ratios'"). Don't ask if you can already infer it confidently from the base model or README.
If the model card has nothing helpful at all — no base model, no task, no example — surface that clearly: "The model card has no usable info. I'll need you to tell me: (1) base model, (2) what this LoRA does, (3) recommended step count and guidance scale if you know them."
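The extraction and the single batched question can be sketched together. This assumes `data` is the card metadata as a plain dict (e.g. `ModelCard.load(repo_id).data.to_dict()`) and `text` is the README body. The trigger-word regex is a hypothetical heuristic rather than a card standard, and the sketch does not try to infer the task from the base model, which a real run would attempt before asking.

```python
import re
from typing import Dict, List, Optional, Tuple

TASKS = ["text-to-image", "image-to-image", "text-to-video",
         "image-to-video", "video-to-video"]

# Heuristic: "Trigger word: `xyz`", "**Instance prompt:** xyz", ...
TRIGGER_RE = re.compile(
    r"(?:trigger word|instance prompt|activation word)s?\**\s*[:=]\s*\**\s*`?([^`\n]+)",
    re.IGNORECASE,
)

QUESTIONS = {
    "base_model": "Which base model does this LoRA target?",
    "task": "Which task is it?\n" + "\n".join(f"  {i}. {t}" for i, t in enumerate(TASKS, 1)),
    "sub_task": "What does this LoRA do? (e.g. 'relight portraits', "
                "'apply manga style', 'extend videos to wider aspect ratios')",
}


def summarize_card(data: Dict, text: str) -> Tuple[Dict[str, Optional[str]], List[str]]:
    """Pull base model, task and trigger word; list required fields still missing."""
    base = data.get("base_model")
    if isinstance(base, list):  # base_model may be a list of repo ids
        base = base[0] if base else None
    task = data.get("pipeline_tag")
    if task not in TASKS:
        task = None  # unset or outside the five supported tasks
    match = TRIGGER_RE.search(text or "")
    info = {
        "base_model": base,
        "task": task,
        "trigger_word": match.group(1).strip() if match else None,
    }
    # Trigger words are optional, so they never count as missing.
    missing = [key for key in ("base_model", "task") if not info[key]]
    return info, missing


def batched_question(missing: List[str]) -> str:
    """One message covering every missing field; empty string if none."""
    if not missing:
        return ""
    return "\n\n".join(f"({i}) {QUESTIONS[k]}" for i, k in enumerate(missing, 1))
```

The caller appends `"sub_task"` to `missing` only when the README doesn't make the LoRA's purpose clear.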
Phase 2 — Pick the base pipeline
Two things to decide here: which reference file to load, and which pipeline class to use. They're not the same question: a base-model family file (e.g. `qwen-image.md`) covers multiple variants, and variants in the same family don't always share a pipeline class. Get this wrong and the Space either loads but produces wrong output, or fails at startup.

**Step 1: Load the reference file for this base model family.**
- `references/base-models/qwen-image.md` covers the Qwen-Image and Qwen-Image-Edit family (text-to-image and image-to-image).
- `references/base-models/ltx.md` covers the LTX family (text-to-video, image-to-video, video-to-video, including IC-LoRAs).
If the base model isn't in one of these files, this skill doesn't have first-class support yet. Tell the user, and ask whether they want to proceed by analogy (use the closest model's recipe and adjust) or stop. Don't guess silently.
**Step 2: Verify the pipeline class against the base model's own card.** This step is mandatory.
A new base model variant might use the same pipeline class with a different repo path, or a new pipeline class entirely. Don't trust the reference file's table alone — it's best-effort and can lag a recent release. Verify before committing:
```python
from huggingface_hub import ModelCard

base_card = ModelCard.load(base_model_id)
# Read base_card.text: find the diffusers inference snippet and note the pipeline class it imports.
```
The class imported in the base model card's diffusers snippet is the source of truth. Real examples where this matters:
- `Qwen-Image-Edit` uses `QwenImageEditPipeline`. `Qwen-Image-Edit-2509` and `Qwen-Image-Edit-2511` use `QwenImageEditPlusPipeline` — different class, different default parameters, takes a list of images instead of one. A LoRA targeting 2511 loaded onto `QwenImageEditPipeline` produces broken output.
- LTX-Video uses `LTXPipeline`/`LTXImageToVideoPipeline`/`LTXConditionPipeline`. LTX-2 uses `LTX2Pipeline` from a different module path. LTX-2.3 sometimes needs a native pipeline outside diffusers.
If the base model card has no diffusers snippet at all, fall back to the reference file's table — and tell the user you're falling back, in case they know something the table doesn't.
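Reading the snippet can be partly mechanized. A minimal sketch that pulls every `...Pipeline` name imported from `diffusers` (including submodule imports and parenthesized multi-line imports) out of a card body such as `base_card.text`. It is a regex heuristic that assumes `from diffusers... import ...` style imports; an empty result is the cue to fall back to the reference table and say so. If the snippet only uses the generic `DiffusionPipeline`, the concrete class is the `_class_name` declared in the base repo's `model_index.json`.

```python
import re
from typing import List

# "from diffusers import A, B", "from diffusers.x.y import A", or "import (\n A,\n B,\n)".
_IMPORT_RE = re.compile(r"from\s+diffusers(?:\.[\w.]+)?\s+import\s+(\([^)]*\)|[^\n]+)")


def pipeline_classes(card_text: str) -> List[str]:
    """Pipeline class names imported from diffusers, in order of appearance."""
    found: List[str] = []
    for match in _IMPORT_RE.finditer(card_text):
        names = re.sub(r"#[^\n]*", "", match.group(1))  # drop inline comments
        for raw in names.strip("()").split(","):
            name = raw.strip().split(" as ")[0].strip()  # ignore aliases
            if name.endswith("Pipeline") and name not in found:
                found.append(name)
    return found
```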
The cost of this verification is one Hub fetch and a few seconds of reading. The cost of skipping it is the failure mode described above: a "working" Space that's quietly using the wrong class.
**Step 3: Diffusers vs native pipeline.** Default to `diffusers` when the base model has a diffusers pipeline class. That's the case for Qwen-Image, Qwen-Image-Edit, and most of LTX. Some LTX variants (notably LTX-2.3 with certain IC-LoRAs) need a native pipeline; the LTX reference says when. Diffusers gives standard `load_lora_weights` / `set_adapters` semantics; the native path needs LoRA-specific glue code.
---
Phase 3 — Design the UI for this LoRA
Don't reach for a template. Reason from the LoRA's task and inputs to a UI.
Read `references/tasks.md` for the per-task baseline UI patterns (what the standard inputs and outputs look like for T2I, I2I, T2V, I2V, V2V).

Then read `references/adapting-to-the-lora.md`, which is about thinking through what this specific LoRA needs beyond its task category. That file is the most important one in this skill. The same task can need very different UIs: a pose-control LTX LoRA needs a video input and a pose-extraction preview; an outpaint LTX LoRA needs an aspect-ratio picker and a black-margin preview; a relighting Flux LoRA needs an image and a brush canvas for indicating where to add light. None of those reduce to "the V2V template" or "the I2I template".

**Self-check before writing the UI.** Write one sentence describing what a user does with this Space in 10 seconds. If that sentence doesn't distinguish this LoRA from any other LoRA of the same task, the UI isn't shaped enough yet.
Examples that pass the self-check:
- "Upload a video, pick a target aspect ratio, click Generate; the model fills the empty margins."
- "Draw colored brush strokes where you want light, pick an illumination style, click Generate; the model relights the photo."
- "Upload a video of someone moving and an image of a different character; the model produces a video of the character doing the motion."
Examples that fail:
- "Type a prompt and click generate." (Generic T2I — say more.)
- "Upload an image and an instruction." (Generic edit — what kind of edit?)
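For the outpainting example above, the aspect-ratio picker and black-margin preview come down to one piece of arithmetic: the smallest canvas with the target ratio that contains the original frame, with the frame centered. A minimal sketch; rounding canvas sizes up to a multiple of 32 is our placeholder assumption, so check the base-model reference for the model's real size constraint.

```python
from typing import Tuple


def outpaint_canvas(width: int, height: int, ratio_w: int, ratio_h: int,
                    multiple: int = 32) -> Tuple[int, int, int, int]:
    """Return (canvas_w, canvas_h, left, top) for centering the original frame.

    The canvas is the smallest ratio_w:ratio_h size containing the frame,
    rounded up to `multiple` (an assumed model constraint).
    """
    if width * ratio_h >= height * ratio_w:
        # Frame is at least as wide as the target ratio: add height.
        canvas_w = width
        canvas_h = -(-width * ratio_h // ratio_w)  # ceiling division
    else:
        # Frame is narrower than the target ratio: add width.
        canvas_h = height
        canvas_w = -(-height * ratio_w // ratio_h)
    canvas_w = -(-canvas_w // multiple) * multiple
    canvas_h = -(-canvas_h // multiple) * multiple
    left = (canvas_w - width) // 2
    top = (canvas_h - height) // 2
    return canvas_w, canvas_h, left, top
```

In the Space, `left` and `top` place the original frames on a black canvas for the preview, and the same numbers describe the region the model has to fill.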
**Gradio component freshness.** Gradio's component set evolves. Before defaulting to plain components, consider whether something newer fits better: for example `gr.ImageSlider` for before/after on edit LoRAs, `gr.BrowserState` for persistent preferences, or `@gr.render` for UIs that change based on input. If you're unsure whether a component exists or what its signature is, fetch the current Gradio docs at https://www.gradio.app/docs rather than guessing.

**When stock and Hub custom components aren't enough: creative mode.** If the LoRA's natural input is a shape that no Gradio component (built-in or on the Hub) expresses well, such as point sets, strokes, trajectories, multi-region annotations with metadata, 3D rotation gizmos, timeline scrubbers, or anything where the user manipulates something on top of media, drop down to custom HTML/JS via `gr.HTML`. See `references/creative-mode.md` for the Gradio primitives (`gr.HTML`, `head=` injection, `elem_id` addressing, the two JS↔Python state-sync approaches), the discipline around defining a JSON wire format, and the pitfalls. Don't reach for creative mode just because it would be cool; reach for it when the LoRA's input shape demands it. And don't skip the intermediate rung of Hub custom components (e.g. `gradio_image_annotation`) before going fully bespoke.
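Defining the JSON wire format first keeps the JS and Python sides honest. A minimal sketch of the Python half for a hypothetical brush-stroke payload: a `strokes` list of `{"color", "points"}` objects with coordinates normalized to 0–1, as a JS canvas inside `gr.HTML` might write into a hidden textbox. The field names are ours, not a Gradio format. Validating strictly here turns a malformed payload into a clear error instead of a silently bad generation.

```python
import json
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stroke:
    color: str                          # e.g. "#ffcc00"
    points: List[Tuple[float, float]]   # normalized (x, y) in [0, 1]


def parse_strokes(payload: str) -> List[Stroke]:
    """Parse the (hypothetical) stroke wire format; raise ValueError if malformed."""
    try:
        data = json.loads(payload or "{}")
    except json.JSONDecodeError as exc:
        raise ValueError(f"stroke payload is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("stroke payload must be a JSON object")
    strokes = []
    for i, item in enumerate(data.get("strokes", [])):
        if not isinstance(item, dict):
            raise ValueError(f"stroke {i} must be an object")
        color, raw_points = item.get("color"), item.get("points", [])
        if not isinstance(color, str) or not raw_points:
            raise ValueError(f"stroke {i} needs a color and at least one point")
        points = []
        for x, y in raw_points:
            if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
                raise ValueError(f"stroke {i} has a point outside [0, 1]")
            points.append((float(x), float(y)))
        strokes.append(Stroke(color=color, points=points))
    return strokes


def to_pixels(stroke: Stroke, width: int, height: int) -> List[Tuple[int, int]]:
    """Map normalized points onto pixel coordinates of a width x height image."""
    return [(round(x * (width - 1)), round(y * (height - 1))) for x, y in stroke.points]
```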
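**`gr.Examples` for media-input Spaces.** When the model repo has no suitable example media, pull from the shared input pools, split by modality so the HF dataset viewer renders thumbnails correctly: images from `linoyts/repo-to-space-example-inputs`, videos from `linoyts/repo-to-space-example-videos`. Both are CC0-licensed, carry `categories` and natural-language `caption` metadata, and document the same filtering/sorting recipe in each dataset's README. Pick 2–3 examples that fit the task, preprocess them into the format the model expects, and embed them in the Space. Set `cache_examples=True, cache_mode="lazy"` so examples are cached on first click rather than run at build time (see `references/zerogpu-and-publishing.md`).

Phase 4 — Write the Space files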
Before writing, tell the user concretely what's about to happen, naming the actual files. Not "I'll write the three files" but something like:
"Now I'll write the three files needed to publish a Space: `app.py` (the Gradio demo and inference code), `requirements.txt` (Python dependencies), and `README.md` (Space configuration, including the ZeroGPU hardware setting). Then I'll show all three for your review before publishing."
This anchors the user in what's being produced. Don't say "three files" without naming them — it's vague and signals lack of commitment to the deliverable.
The three files are tightly coupled: `requirements.txt` is determined by what `app.py` imports, and the YAML frontmatter in `README.md` sets the SDK version, hardware, and Space title, which have to match. Write them together, then show all three to the user for approval in one batched message before publishing.

Read `references/zerogpu-and-publishing.md` for the ZeroGPU rules. The non-obvious ones:
- Models go on `cuda` at module level (not lazy-loaded inside the GPU function). ZeroGPU emulates CUDA before a GPU is actually allocated, so this works at import time, and module-level placement is significantly faster than deferred placement.
- The function that runs inference is decorated with `@spaces.GPU(duration=...)`. Pick a duration appropriate for the task: short for image generation, longer for video.
- Don't use `torch.compile`; it's incompatible with ZeroGPU's process model.
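Picking `duration` can be made explicit with a small estimator. A sketch only: the per-step costs, overhead, safety factor, and cap below are placeholder numbers of ours, not measured ZeroGPU figures, so time one real call and adjust them. Computing the value once at import time and passing a plain int keeps the decorator simple. ZeroGPU's docs also describe a dynamic form where `duration` is a callable taking the same arguments as the decorated function; check that your `spaces` version supports it before relying on it.

```python
import math


def estimate_gpu_seconds(task: str, steps: int, num_frames: int = 1,
                         safety: float = 1.5, cap: int = 300) -> int:
    """Rough GPU time budget for one call (all constants are placeholders)."""
    is_video = task.endswith("-to-video")
    # Placeholder cost model: video cost scales with frame count.
    per_step = 0.05 * num_frames if is_video else 1.0
    # Placeholder fixed overhead: VAE decode, video export, etc.
    overhead = 30 if is_video else 10
    seconds = (steps * per_step + overhead) * safety
    return min(cap, math.ceil(seconds))
```

Usage in `app.py` would look like `@spaces.GPU(duration=estimate_gpu_seconds("text-to-image", DEFAULT_STEPS))`, where `DEFAULT_STEPS` is a hypothetical constant holding the card's recommended step count.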
app.py
Compose from the pieces decided in Phases 1–3. Don't paste from a template. Each section should be there because it's needed:
- Imports: `gradio as gr`, `torch`, `spaces`, the pipeline class, and anything the preprocessing needs.
- Constants: `LORA_REPO`, `BASE_MODEL`, recommended step count, guidance, LoRA scale, trigger word.
- Module-level model load: pipeline `from_pretrained`, `.to("cuda")`, `load_lora_weights`. If the LoRA repo is private, pass `token=os.environ["HF_TOKEN"]`.
- Preprocessing functions (if any): pose extraction, padding, mask building, etc. CPU code can run at module level; GPU code needs to be inside a `@spaces.GPU` function.
- The inference function: decorated with `@spaces.GPU(duration=...)`. It validates inputs, applies the trigger word, builds the pipeline kwargs, and returns outputs.
- The Gradio Blocks: the UI from Phase 3, wired to the inference function.
Common things to get right:
- Return the actually-used seed alongside the result so the user can reproduce it.
- A `progress=gr.Progress(track_tqdm=True)` parameter on the inference function surfaces diffusers' internal progress bar.
- Validate inputs: raise `gr.Error("Please upload an image first.")` when a required input is missing, rather than letting the pipeline fail with a cryptic error.
- On `gr.Examples`, use `cache_examples=True, cache_mode="lazy"`. Plain `cache_examples=True` runs examples at build time and fails on ZeroGPU; lazy mode defers caching to the first user click.
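The seed, trigger-word, and validation points are easy to get subtly wrong, so it can help to keep them in small pure helpers that the `@spaces.GPU` function calls. A minimal sketch: `MAX_SEED` and the prepend-the-trigger policy are our assumptions (some cards want the trigger elsewhere in the prompt), and `InputError` stands in for `gr.Error` so the helpers run without Gradio; in `app.py` you would raise `gr.Error` directly.

```python
import random
from typing import Optional

MAX_SEED = 2**32 - 1  # assumption: upper bound for user-facing seeds


class InputError(Exception):
    """Stand-in for gr.Error in this sketch."""


def resolve_seed(seed: int, randomize: bool, rng: Optional[random.Random] = None) -> int:
    """Return the seed that will actually be used, so the UI can display it."""
    if randomize:
        return (rng or random).randint(0, MAX_SEED)
    return int(seed)


def apply_trigger(prompt: str, trigger: str) -> str:
    """Prepend the trigger word once, unless the prompt already contains it."""
    prompt = prompt.strip()
    if not trigger or trigger.lower() in prompt.lower():
        return prompt
    return f"{trigger}, {prompt}" if prompt else trigger


def require(value, message: str):
    """Fail early with a readable message instead of a pipeline traceback."""
    if value is None or (isinstance(value, str) and not value.strip()):
        raise InputError(message)
    return value
```

Inside the decorated function this reads as `seed = resolve_seed(seed, randomize)`, then `generator=torch.Generator("cuda").manual_seed(seed)` in the pipeline call, and returning the output together with `seed`.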
requirements.txt
Don't ship a fixed minimal list and hope for the best. The "minimal" list works for plain T2I LoRAs and breaks the moment the base model has a vision-language text encoder, video output, or any non-trivial preprocessing. Derive `requirements.txt` from what the Space actually needs, in this order:
- Every top-level non-stdlib import in `app.py`. If `app.py` does `import cv2`, `requirements.txt` has `opencv-python`. If it does `from controlnet_aux import OpenposeDetector`, `requirements.txt` has `controlnet-aux`. Walk the imports mechanically. (Note the exclusions in the next paragraph: some imports are runtime built-ins and don't need to be listed.)
- What the base-model reference's "Required dependencies" subsection says. Each base-model file lists the non-obvious extras the pipeline pulls in: `torchvision` for Qwen-Image (Qwen 2.5-VL text encoder), `imageio[ffmpeg]` for LTX (video export), etc. Include all of them. These deps aren't picked up from imports because the pipeline's components import them transitively at load time.
- What the LoRA's own model card explicitly mentions installing. If the LoRA README has its own `pip install` block, lift the deps from there.
- The diffusers/ML stack: `diffusers`, `transformers`, `accelerate`, `peft`, `safetensors`. Default to plain (unpinned) names. Switch `diffusers` to `git+https://github.com/huggingface/diffusers` if the base-model reference says the model needs it (recent releases often do; Qwen-Image-Edit-2511 is a current example).
What not to list in `requirements.txt`:
- `gradio`: controlled by the `sdk_version:` field in `README.md`'s YAML frontmatter, not by `requirements.txt`. Listing it in requirements is at best ignored and at worst causes a version conflict with the SDK. Set the version in the README only.
- `torch`: provided by the Space runtime. Only add it if you need a specific version pinned (rare, and usually a sign something else is wrong).
- `spaces`: provided by the Space runtime. Only add it if you need a specific version pinned.
- `huggingface_hub`: provided by the Space runtime. Only add it if you need a specific version pinned.
These four come pre-installed in the ZeroGPU container. Listing them anyway is the kind of "include rather than skip" instinct that's right for non-baseline deps but wrong for baseline ones, because pinning conflicts with the runtime's managed versions.
Bias for everything else: include rather than skip when uncertain. A package the Space doesn't actually use causes a slightly slower build. A missing required package causes a startup-time crash that's much harder for the user to diagnose. These costs aren't symmetric — the test failure that prompted this rule was exactly the second kind.
But two specific deps are not safe to add reflexively because they routinely cause more problems than they solve on ZeroGPU:
- `xformers`: pinned to specific torch versions and a frequent source of conflicts. The ZeroGPU runtime ships torch 2.8+, so any pinned `xformers` version must support that. An additional gotcha on Blackwell: xformers' FA3 dispatch mis-gates the hardware (FA3 kernels are Hopper-only at `sm_90a`, but the dispatcher gates on `device_capability >= (9, 0)`, which also matches Blackwell) and crashes at kernel launch with `CUDA invalid argument`. If a Space using xformers attention hits this, disable FA3 dispatch at module load:

  ```python
  try:
      # Private xformers helper; may be absent in some versions, hence the guard.
      from xformers.ops.fmha import _set_use_fa3
      _set_use_fa3(False)
  except Exception:
      pass
  ```

  Only include `xformers` if `app.py` actually uses it.
- `flash-attn`: needs a build step and often fails to install. Same torch 2.8+ alignment caveat as `xformers`. Only include `flash-attn` if `app.py` actually uses it.
Pin other versions only when you have a reason (e.g. a known incompatibility, or matching a recipe from the model card).
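Walking the imports can be done mechanically with the standard library's `ast` module. A sketch assuming Python 3.10+ for `sys.stdlib_module_names` (with a small fallback set otherwise). The import-name→pip-name table is a hand-written sample you would extend; mapping `imageio` to `imageio[ffmpeg]` is our choice. `extras` is where the base-model reference's "Required dependencies" go. Runtime-provided packages are dropped per the rules above.

```python
import ast
import sys
from typing import Iterable, List

# Fallback for Python < 3.10; the real stdlib list is much longer.
STDLIB = getattr(sys, "stdlib_module_names",
                 frozenset({"os", "sys", "re", "json", "math", "random", "typing",
                            "pathlib", "tempfile", "dataclasses"}))
RUNTIME_PROVIDED = {"gradio", "torch", "spaces", "huggingface_hub"}
PIP_NAMES = {  # import name -> pip name, where they differ (sample only)
    "cv2": "opencv-python",
    "controlnet_aux": "controlnet-aux",
    "PIL": "pillow",
    "imageio": "imageio[ffmpeg]",
}
ML_STACK = ["diffusers", "transformers", "accelerate", "peft", "safetensors"]


def top_level_imports(source: str) -> List[str]:
    """Top-level package names of every absolute import, in source order."""
    names = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names += [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.append(node.module.split(".")[0])
    return names


def derive_requirements(app_source: str, extras: Iterable[str] = (),
                        diffusers_from_git: bool = False) -> str:
    reqs: List[str] = []

    def add(pkg: str) -> None:
        if pkg not in reqs:
            reqs.append(pkg)

    for name in top_level_imports(app_source):
        if name in STDLIB or name in RUNTIME_PROVIDED:
            continue
        add(PIP_NAMES.get(name, name))
    for pkg in list(extras) + ML_STACK:
        add(pkg)
    if diffusers_from_git:
        reqs = ["git+https://github.com/huggingface/diffusers" if r == "diffusers" else r
                for r in reqs]
    return "\n".join(reqs) + "\n"
```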
README.md
Spaces are configured by the YAML frontmatter at the top of `README.md`. This frontmatter is what selects ZeroGPU.

```markdown
---
title: <human-readable title>
emoji: 🎨
colorFrom: pink
colorTo: purple
sdk: gradio
sdk_version: <current Gradio version>
app_file: app.py
pinned: false
hardware: zero-a10g
short_description: <one short line for the Space tile, ~60 chars max>
models:
  - <base model repo>
  - <lora repo>
---

# <title>

A short description with links to the LoRA and base model.
```
Key fields:
- `sdk: gradio`: required for ZeroGPU.
- `sdk_version`: match the Gradio version you wrote against. Look up the current version (`pip index versions gradio`, or check https://www.gradio.app) rather than guessing.
- `hardware: zero-a10g`: the legacy identifier for ZeroGPU. The actual hardware is NVIDIA RTX Pro 6000 Blackwell, but the identifier is still `zero-a10g`. ZeroGPU is available to PRO, Team, and Enterprise accounts; if the user isn't subscribed, the Space will fall back to CPU. Mention this if you suspect they aren't on PRO.
- `models`: list the base and LoRA repos. This enables Hub caching and discovery.
- `short_description`: appears on the Space tile. **Keep it short (~60 characters or less).** The Hub's YAML validator rejects long values with a 400 from `https://huggingface.co/api/validate-yaml`, which surfaces as an `HfHubHTTPError` during `create_repo` or `upload_file`. The exact server-side limit isn't documented and may change, so target the visible-tile-length range rather than pushing right up to a cap. If you do hit the 400, the fix is almost always to shorten this field. One sentence describing what the Space does is plenty; longer prose belongs in the README body below the YAML.
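Because these fields must agree with each other and `short_description` has a soft length budget, generating the README from a few values and checking it before `upload_file` is cheap insurance. A minimal sketch without a YAML library: the 60-character budget mirrors the guidance above (the Hub's real limit is undocumented), the emoji and colors are placeholders, and string values are JSON-quoted, which YAML accepts because JSON strings are valid YAML double-quoted scalars.

```python
import json
from typing import List

SHORT_DESCRIPTION_BUDGET = 60  # guidance above, not a documented Hub limit


def render_readme(title: str, short_description: str, sdk_version: str,
                  models: List[str], body: str) -> str:
    """Build README.md text with ZeroGPU frontmatter; reject long descriptions."""
    if len(short_description) > SHORT_DESCRIPTION_BUDGET:
        raise ValueError(
            f"short_description is {len(short_description)} chars; keep it under "
            f"~{SHORT_DESCRIPTION_BUDGET} to avoid a 400 from the YAML validator"
        )

    def q(value: str) -> str:
        # JSON-quoted strings are valid YAML double-quoted scalars.
        return json.dumps(value, ensure_ascii=False)

    lines = [
        "---",
        f"title: {q(title)}",
        "emoji: 🎨",
        "colorFrom: pink",
        "colorTo: purple",
        "sdk: gradio",
        f"sdk_version: {q(sdk_version)}",
        "app_file: app.py",
        "pinned: false",
        "hardware: zero-a10g",
        f"short_description: {q(short_description)}",
        "models:",
        *[f"  - {q(m)}" for m in models],
        "---",
        "",
        f"# {title}",
        "",
        body,
    ]
    return "\n".join(lines) + "\n"
```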
Single batched approval — order of operations matters
The discipline here is write all three files first, then show them all together in one message. Not "write app.py → talk about it → write requirements → talk about it → write README → talk about it." That rhythm produces three approval moments even if you don't explicitly ask for approval, because the user is being asked to react after each file.
Concretely:
- Write `app.py`, `requirements.txt`, and `README.md` in succession with no intervening prose. No commentary between files. No "Now I'll write the next one." No description of what each file does as you produce it. Just the three files, back to back.
- Then, in a single message, ask for approval covering all three at once. Something like: "Here's the Space: `app.py` (N lines), `requirements.txt`, and `README.md`. Review and confirm to publish, or tell me what to change."
- The user responds once, covering whatever they want changed across any of the three files.
What to avoid:
- Walking through `app.py`'s structure or design choices after writing it but before writing the others. Save commentary for either the pre-writing announcement (Phase 4 opening) or the single approval message after all three exist.
- Asking "ready for the next one?" or "want me to continue with requirements?": those are implicit per-file approvals.
- Showing one file inline and offering to "show the next when you're ready" — same trap.
- Treating any of the three files as optional or as a follow-up. They are produced together as one deliverable.
If the user interrupts after seeing the first file with feedback or a question, that's fine — engage with it — but the rule still applies: the next time you produce code, produce all remaining files together, not one at a time.
Phase 5 — Publish the Space
Use the authenticated session from Phase 1. Default to private, so the user can vet the Space before flipping it public. Confirm the target username with the user before creating: "I'll publish to `{username}/{space_name}`, confirm?"

```python
from huggingface_hub import HfApi, SpaceHardware

api = HfApi(token=hf_token)  # token from the Phase 1 login
username = api.whoami()["name"]
repo_id = f"{username}/{space_name}"

api.create_repo(
    repo_id=repo_id,
    repo_type="space",
    space_sdk="gradio",
    space_hardware=SpaceHardware.ZERO_A10G,  # needs PRO/Team/Enterprise; see the 403 case below
    private=True,
    exist_ok=True,
)
```
```python
# Upload files
for path in ["app.py", "requirements.txt", "README.md"]:
    api.upload_file(path_or_fileobj=path, path_in_repo=path,
                    repo_id=repo_id, repo_type="space")
```

If the LoRA repo itself is private or gated, the Space needs the token at runtime to download the LoRA. Set it as a Space secret:

```python
api.add_space_secret(repo_id=repo_id, key="HF_TOKEN", value=hf_token)
```

…and in `app.py`, load the LoRA with `token=os.environ["HF_TOKEN"]`.

After upload, run the smoke-test in Phase 6 before sharing. The build runs asynchronously, and silent failures (wrong `weight_name`, missing dep, wrong pipeline class) only surface at first inference. Once the smoke-test passes, share the Space URL (`https://huggingface.co/spaces/{repo_id}`) and tell the user the Space is private, so they'll need to be logged in to view it. The build takes a few minutes; if anything fails, the logs are at `https://huggingface.co/spaces/{repo_id}/logs/container`.

Publish-time failures (before the build starts):
- `HfHubHTTPError: 400 Bad Request` from `https://huggingface.co/api/validate-yaml` during `create_repo` or `upload_file`. The README YAML failed server-side validation. By far the most common cause is a `short_description` that's too long; sometimes it's a stray field or a malformed value. Fix: shorten `short_description` to about 60 characters and retry. If shortening doesn't fix it, look for typos in field names or invalid values (e.g. unsupported colors in `colorFrom`/`colorTo`, or an invalid `hardware` string).
- 403 on `create_repo` with `space_hardware="zero-a10g"`: the user isn't on PRO/Team/Enterprise, so they can't request ZeroGPU at creation time. Fix: retry `create_repo` without `space_hardware` and leave `hardware: zero-a10g` in the README YAML; the Space gets created on CPU. The user can then either upgrade to PRO (which auto-promotes it to ZeroGPU) or apply for a community GPU grant (requested via the Space's hardware settings).
- 401/403 on `upload_file`: the token doesn't have write scope. Fix: ask the user for a write-scoped token.
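The 403 fallback can be scripted so that a non-PRO account still ends up with a working (CPU) Space. Here is a minimal sketch, assuming `api` is the `HfApi` instance from the publish step; `create_space` is a hypothetical helper. It reads the status code from the exception's `.response` attribute, which `HfHubHTTPError` carries because it is an HTTP error subclass. Any other status is re-raised, because a 400 or 401 needs one of the fixes above rather than a retry.

```python
def _status_code(err: Exception):
    # HfHubHTTPError exposes the HTTP response as `.response`; other errors may not.
    response = getattr(err, "response", None)
    return getattr(response, "status_code", None)


def create_space(api, repo_id: str, hardware: str = "zero-a10g") -> bool:
    """Create a private Gradio Space, retrying without ZeroGPU on a 403.

    Returns True if the requested hardware was accepted, False if the Space
    was created on the default (CPU) hardware instead.
    """
    common = dict(repo_id=repo_id, repo_type="space", space_sdk="gradio",
                  private=True, exist_ok=True)
    try:
        api.create_repo(space_hardware=hardware, **common)
        return True
    except Exception as err:
        if _status_code(err) != 403:
            raise  # e.g. a 400 from YAML validation: retrying won't help
        # Account can't request ZeroGPU at creation time: create on CPU and keep
        # `hardware: zero-a10g` in README.md so the Space can be promoted later.
        api.create_repo(**common)
        return False
```

A `False` return is the cue to tell the user the Space is running on CPU and to mention the PRO upgrade or GPU grant options.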
Common build failures (after the build starts):
- LoRA `weight_name` mismatch in `load_lora_weights` → check the actual filename via `list_repo_files`.
- Base model is gated and the token wasn't set as a Space secret.
- ZeroGPU not allocated (user not on PRO) → the Space falls back to CPU and is unusably slow.
- Diffusers version doesn't recognize the pipeline class → pin to git diffusers in `requirements.txt`.
- Missing dependency at module load → see the `requirements.txt` derivation rules above; the most common case is a transitive dep like `torchvision` for Qwen-Image's text encoder.
If a build fails, offer to read the logs and propose a fix.
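For the `weight_name` mismatch, it is better to derive the filename from the repo's real file list than to guess it. Here is a minimal sketch: `pick_lora_weight` is a hypothetical helper that takes the output of `list_repo_files` and returns the single `.safetensors` file. When there are several candidates (per-epoch checkpoints, for instance), it refuses to guess, and the model card should decide. `lora_repo`, `hf_token`, and `pipe` in the usage comment are placeholders.

```python
def pick_lora_weight(repo_files: list[str]) -> str:
    """Return the one .safetensors file in a LoRA repo, or raise if ambiguous."""
    candidates = sorted(f for f in repo_files if f.endswith(".safetensors"))
    if len(candidates) == 1:
        return candidates[0]
    if not candidates:
        raise ValueError("no .safetensors file found in the LoRA repo")
    raise ValueError(f"several candidate weights, pick one explicitly: {candidates}")


# Usage (needs network access and huggingface_hub):
#   from huggingface_hub import HfApi
#   files = HfApi(token=hf_token).list_repo_files(lora_repo)
#   pipe.load_lora_weights(lora_repo, weight_name=pick_lora_weight(files))
```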
Phase 6 — Smoke-test the Space
Before declaring the Space done and handing the URL to the user, exercise it once end-to-end. Several failure modes (wrong `weight_name`, wrong pipeline class, missing transitive dep, gated-base-model token issue) build cleanly and only surface at first inference. The `gradio` Python package ships a CLI that does exactly this: `gradio info` returns the endpoint signature, and `gradio predict` runs an actual inference. Both ship with the `gradio` pip dependency the Space already needs, so they're available in any environment where this skill ran.

**Step 1: Wait for the build.** `create_repo` returns immediately, but the container image is still building. Poll `HfApi().get_space_runtime(repo_id).stage` until it reaches `RUNNING`:

```python
import time
from huggingface_hub import HfApi

api = HfApi(token=hf_token)
deadline = time.monotonic() + 30 * 60  # arbitrary cap so a stuck build can't poll forever
while True:
    stage = api.get_space_runtime(repo_id).stage
    if stage == "RUNNING":
        break
    if stage in {"BUILD_ERROR", "RUNTIME_ERROR", "CONFIG_ERROR"}:
        raise RuntimeError(f"Build failed: {stage}. Logs: https://huggingface.co/spaces/{repo_id}/logs/container")
    if time.monotonic() > deadline:
        raise TimeoutError(f"Space still in stage {stage} after 30 minutes; check the logs")
    time.sleep(15)
```

If the build fails, fetch the container logs (`https://huggingface.co/spaces/{repo_id}/logs/container`), read the traceback, and propose a fix. Don't run `gradio info` against a Space that isn't running; it'll hang or return a 503.

**Step 2: Verify the endpoint signature.** `gradio info {repo_id} --token {hf_token}` returns the exposed endpoints and their parameter types. Read the output and confirm that: (a) the endpoint exists (the default is `/predict`, but Blocks Spaces often have a custom name taken from the Python function name); (b) the parameters, in order, match what `app.py` declares; and (c) file-typed params show `"type": "filepath"` as expected. If any of this is off, the user-facing UI may still look correct but API calls will fail, so fix and re-upload.

**Step 3: Run one real inference.** Pick the lightest viable input: the simplest example from the LoRA card, or one of the `gr.Examples` entries. Pass `--token` for private Spaces. For file inputs, the payload uses `{"path": "...", "meta": {"_type": "gradio.FileData"}}`.
```bash
# Text-to-image:
gradio predict {repo_id} /predict '{"prompt": "...", "aspect_ratio": "1:1", ...}' --token $HF_TOKEN

# Image-to-image (file input):
gradio predict {repo_id} /predict '{"input_image": {"path": "/tmp/sample.jpg", "meta": {"_type": "gradio.FileData"}}, "prompt": "..."}' --token $HF_TOKEN
```
If you don't have a local sample image for I2I, lift one from the LoRA repo (`hf_hub_download(repo_id, filename="example.png")`) or the base model card.
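Quoting the JSON payload by hand is easy to get wrong, especially the file wrapper. Here is a small sketch that builds the command from Python. `file_input` reproduces the `{"path": ..., "meta": {"_type": "gradio.FileData"}}` shape from Step 3, and `predict_command` (a hypothetical helper) JSON-encodes the payload and shell-quotes it with the standard library's `shlex.join`. The parameter names `input_image` and `prompt` are placeholders; use whatever `gradio info` reported for the endpoint.

```python
import json
import shlex


def file_input(path: str) -> dict:
    """Wrap a local file path the way Step 3 describes for file-typed params."""
    return {"path": path, "meta": {"_type": "gradio.FileData"}}


def predict_command(repo_id: str, endpoint: str, payload: dict,
                    token_env: str = "HF_TOKEN") -> str:
    """Build a `gradio predict` command line with a safely quoted JSON payload."""
    args = ["gradio", "predict", repo_id, endpoint, json.dumps(payload)]
    # The token stays a shell variable, so it never lands in logs or shell history.
    return shlex.join(args) + f" --token ${token_env}"
```

For example, `predict_command("me/my-space", "/predict", {"input_image": file_input("/tmp/sample.jpg"), "prompt": "..."})` yields a command that can be pasted into the shell as-is (`me/my-space` is a placeholder).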
**Caveat for creative-mode Spaces.** `gradio info` and `gradio predict` only exercise the Python endpoint — they tell you nothing about whether custom JS in a `gr.HTML` widget works. If the Space uses creative mode (see `references/creative-mode.md`), after the API smoke-test passes, **open the Space URL in a browser and verify the interaction once** before sharing. Server-side green plus broken JS is the most common failure mode for these.
**Step 4: Interpret the result.**
- **Returns successfully and the output looks plausible** → done. Share the URL.
- **HTTPError 503 / "Space is sleeping"** → the Space spun down between steps 1 and 3. Wake it (`api.restart_space(repo_id)`) and retry.
- **Inference error mentioning `weight_name` / `safetensors`** → the LoRA filename in `app.py` doesn't match the actual file in the LoRA repo. Re-check `list_repo_files`, fix `weight_name=`, re-upload `app.py`.
- **Inference error mentioning a missing pipeline class or attribute** → diffusers version too old. Switch `requirements.txt` to `git+https://github.com/huggingface/diffusers` and re-upload.
- **`ImportError` at module load** → missing dep. Add it to `requirements.txt` and re-upload. The runtime logs (`/logs/run`) name the missing package.
- **OOM** → reduce default resolution or step count, or pick a smaller base variant.
- **Timeout / hangs** → bump `@spaces.GPU(duration=...)` and re-upload.
The smoke-test exists to convert these from "user discovers it and reports back" to "you discover it and fix it before sharing." Don't skip it because the build went green — green-build-broken-inference is the most common failure mode for Spaces with a non-trivial pipeline.
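Matching an error to the cases in Step 4 can be roughed out in code. This is a heuristic sketch, not a catalogue of real Space error messages: `triage` is a hypothetical helper, and its substrings come from the list above. Order matters. `cannot import name 'QwenImagePipeline'` is technically an `ImportError`, but it points at an old diffusers, so that rule runs before the generic missing-dependency rule.

```python
def triage(error_text: str) -> str:
    """Map smoke-test error text to the likely fix from Step 4 (heuristic only)."""
    t = error_text.lower()
    if "503" in t or "sleeping" in t:
        return "Space asleep: api.restart_space(repo_id), then retry"
    # Checked before the generic import rule: a missing *Pipeline* class means old diffusers.
    if "pipeline" in t and ("cannot import name" in t or "has no attribute" in t):
        return "diffusers too old: use git+https://github.com/huggingface/diffusers"
    if "importerror" in t or "modulenotfounderror" in t or "no module named" in t:
        return "missing dependency: add it to requirements.txt (see /logs/run)"
    if "weight_name" in t or "safetensors" in t:
        return "LoRA filename mismatch: check list_repo_files, fix weight_name=, re-upload app.py"
    if "out of memory" in t or "outofmemoryerror" in t:
        return "OOM: lower default resolution or step count, or use a smaller base variant"
    if "timeout" in t or "timed out" in t:
        return "timeout: raise @spaces.GPU(duration=...) and re-upload"
    return "unrecognized: read the container logs"
```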

---

What to avoid
- A generic "one demo for all LoRAs" template. The whole point of this skill is to tailor.
- Lazy-loading the model inside the GPU function. Slow on ZeroGPU, and hides startup errors until first request.
- `torch.compile`. Not supported on ZeroGPU.
- `cache_examples=True` without `cache_mode="lazy"` on ZeroGPU.
- Uploading the LoRA weights into the Space repo. Pull from the LoRA's own Hub repo at runtime.
- Asking for the HF token only at the end, then discovering the LoRA was private all along and you couldn't read the model card.
- Exposing every diffusers knob. Pick the 1–3 controls that matter for this LoRA.
- Long preambles in the chat reply once the Space is published. The Space URL is the deliverable; keep the wrap-up brief.