run-models
Docs
- Reference: https://replicate.com/docs/llms.txt
- OpenAPI schema: https://api.replicate.com/openapi.json
- MCP server: https://mcp.replicate.com
- Per-model docs: https://replicate.com/{owner}/{model}/llms.txt
- Set Accept: text/markdown when requesting docs pages for Markdown responses.
Workflow
- Choose the right model - Search with the API or ask the user.
- Get model metadata - Fetch input and output schema via API.
- Create prediction - POST to /v1/predictions.
- Poll for results - GET prediction until status is "succeeded".
- Return output - Usually URLs to generated content.
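The poll-until-done step above can be sketched as a small loop. This is a minimal sketch, not an official client: the fetch callable is injected so the logic works with any HTTP library, and the base URL comes from the endpoints listed earlier.

```python
import time

API = "https://api.replicate.com/v1"  # base URL from the docs above

def poll_prediction(prediction_id, get_json, interval=1.0, timeout=300.0):
    """Poll GET /v1/predictions/{id} until a terminal state is reached.

    get_json is any callable that fetches a URL and returns the decoded
    JSON body, so the polling logic can run without a real network.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        prediction = get_json(f"{API}/predictions/{prediction_id}")
        if prediction["status"] in ("succeeded", "failed", "canceled"):
            return prediction
        time.sleep(interval)
    raise TimeoutError(f"prediction {prediction_id} did not finish in {timeout}s")
```

In production the interval and timeout should match the model's typical runtime; for long-running models, prefer webhooks over polling (see below).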
Three ways to get output
- Create a prediction, store its id from the response, and poll until completion.
- Set a Prefer: wait header when creating a prediction for a blocking synchronous response. Only recommended for very fast models. Max 60 seconds.
- Set an HTTPS webhook URL when creating a prediction, and Replicate will POST to that URL when the prediction completes.
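The three modes differ only in how the creation request is built. The sketch below assembles that request; the body layout ("model" plus "input") follows the identifier formats described in this document but should be checked against the endpoint's schema, and the model name, token, and webhook URL are placeholders.

```python
import json

def create_prediction_request(model, model_input, token, wait=False, webhook=None):
    """Build the pieces of a POST /v1/predictions call.

    Returns (url, headers, body) so any HTTP client can send it.
    """
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    if wait:
        # Blocking synchronous response; only for very fast models (max 60s).
        headers["Prefer"] = "wait"
    body = {"model": model, "input": model_input}
    if webhook:
        body["webhook"] = webhook  # Replicate POSTs the result here on completion
    return ("https://api.replicate.com/v1/predictions", headers, json.dumps(body))
```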
Guidelines
- Use the POST /v1/predictions endpoint, as it supports both official and community models.
- Every model has its own OpenAPI schema. Always fetch and check model schemas to make sure you're setting valid inputs. Even popular models change their schemas.
- Validate input parameters against schema constraints (enum, minimum, maximum values). Don't generate values that violate them.
- When unsure about a parameter value, use the model's default example or omit the optional parameter.
- Don't set optional inputs unless you have a reason to. Stick to the required inputs and let the model's defaults do the work.
- Use HTTPS URLs for file inputs whenever possible. You can also send base64-encoded files, but they should be avoided.
- Fire off multiple predictions concurrently. Don't wait for one to finish before starting the next.
- Output file URLs expire after 1 hour, so back them up if you need to keep them, using a service like Cloudflare R2.
- Webhooks are a good mechanism for receiving and storing prediction output.
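A pre-flight check against the schema constraints mentioned above can be done locally before sending a prediction. This is a simplified sketch that handles only the enum/minimum/maximum keywords, not full OpenAPI validation; the property names in the usage are made up.

```python
def validate_input(params, properties):
    """Check params against a model's OpenAPI input "properties" mapping.

    Returns a list of human-readable violations (empty if all pass).
    """
    errors = []
    for name, value in params.items():
        schema = properties.get(name)
        if schema is None:
            errors.append(f"{name}: not in schema")
            continue
        if "enum" in schema and value not in schema["enum"]:
            errors.append(f"{name}: {value!r} not in {schema['enum']}")
        if "minimum" in schema and value < schema["minimum"]:
            errors.append(f"{name}: {value} below minimum {schema['minimum']}")
        if "maximum" in schema and value > schema["maximum"]:
            errors.append(f"{name}: {value} above maximum {schema['maximum']}")
    return errors
```

For real use, a full JSON Schema validator is the safer choice; this just illustrates catching the common violations before the API rejects them.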
Predictions
- A prediction goes through these states: starting -> processing -> succeeded/failed/canceled.
- Official models use owner/name format. Community models require owner/name:version_id.
- The POST /v1/predictions endpoint handles both.
Webhooks
- Set webhook to an HTTPS URL when creating a prediction. Replicate POSTs the full prediction object when it completes.
- Filter events with webhook_events_filter: start, output, logs, completed.
- Validate webhook signatures using the Webhook-ID, Webhook-Timestamp, and Webhook-Signature headers. Get the signing secret from GET /v1/webhooks/default/secret.
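Signature validation can be sketched as follows. This assumes the Svix-style scheme Replicate documents: sign "{id}.{timestamp}.{body}" with HMAC-SHA256, keyed by the base64-decoded portion of the whsec_... secret, and compare against the (possibly multiple, space-separated) "v1,<sig>" entries in the Webhook-Signature header. Verify the details against the signing docs before relying on this.

```python
import base64
import hashlib
import hmac

def verify_webhook(secret, webhook_id, timestamp, body, signature_header):
    """Return True if the Webhook-Signature header matches the payload.

    secret is the whsec_... value from GET /v1/webhooks/default/secret;
    webhook_id and timestamp come from the Webhook-ID / Webhook-Timestamp
    headers, and body is the raw request body as a string.
    """
    key = base64.b64decode(secret.split("_", 1)[1])
    signed = f"{webhook_id}.{timestamp}.{body}".encode()
    expected = base64.b64encode(
        hmac.new(key, signed, hashlib.sha256).digest()).decode()
    # The header may hold several space-separated "v1,<sig>" entries.
    return any(hmac.compare_digest(expected, entry.split(",", 1)[-1])
               for entry in signature_header.split())
```

Also check the timestamp against the current time to reject replayed deliveries.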
Prediction lifetime
- Set lifetime to auto-cancel predictions that run too long (e.g. 30s, 5m, 1h). Measured from creation time.
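As a request-body fragment, the lifetime option might look like this. The field name and duration strings follow the description above; the model name and prompt are placeholders, and the exact schema should be confirmed against the API reference.

```python
# Hypothetical body for POST /v1/predictions: auto-cancel the prediction
# if it is still running 5 minutes after creation.
body = {
    "model": "owner/name",                    # placeholder model
    "input": {"prompt": "a photo of a fox"},  # placeholder input
    "lifetime": "5m",                         # also accepts e.g. "30s", "1h"
}
```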
Streaming
- Language models that support streaming include a stream URL in the response. Use SSE to receive incremental output.
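Consuming the stream URL means parsing Server-Sent Events. Below is a minimal SSE line parser, a sketch that handles only the event:/data: fields; the event names in the usage ("output", "done") are assumptions about what a streaming model sends, so check the per-model docs.

```python
def parse_sse(lines):
    """Yield (event, data) pairs from an iterable of SSE text lines.

    A blank line terminates one event; multiple data: lines are joined
    with newlines, per the SSE format.
    """
    event, data = "message", []
    for line in lines:
        if line == "":                      # blank line ends one event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[6:].strip()
        elif line.startswith("data:"):
            data.append(line[5:].lstrip())
```

Feed it the decoded lines from a GET on the stream URL (with Accept: text/event-stream) to receive tokens incrementally.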
File handling
- Prefer HTTPS URLs for file inputs. Output URLs from one prediction can be passed directly as file inputs to the next model.
- Output file URLs expire after 1 hour. Download and store them immediately if you need to keep them.
Multi-model workflows
- Chain models by passing output URLs as file inputs to the next model.
- Start all independent predictions in parallel, then collect results.
- Output URLs are valid for 1 hour, which is enough for pipeline steps.
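The chaining and parallelism points above can be combined in one sketch. run_model stands in for the create-and-poll calls earlier in this document and returns an output URL; the two model names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def run_pipeline(run_model, prompts):
    """Generate images for all prompts in parallel, then upscale each.

    run_model(model, model_input) -> output URL is injected, so the
    pipeline shape can be tested without the API.
    """
    with ThreadPoolExecutor() as pool:
        # Independent generations start concurrently, not one by one.
        images = list(pool.map(
            lambda p: run_model("owner/image-model", {"prompt": p}), prompts))
        # Output URLs (valid for 1 hour) feed the next model directly.
        return list(pool.map(
            lambda url: run_model("owner/upscaler", {"image": url}), images))
```

The 1-hour URL lifetime comfortably covers back-to-back pipeline steps, but download or re-upload anything a later, slower stage will need.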