run-models
Docs
- Reference: https://replicate.com/docs/llms.txt
- OpenAPI schema: https://api.replicate.com/openapi.json
- MCP server: https://mcp.replicate.com
- Per-model docs: https://replicate.com/{owner}/{model}/llms.txt
- Set Accept: text/markdown when requesting docs pages for Markdown responses.
Workflow
- Choose the right model - Search with the API or ask the user.
- Get model metadata - Fetch input and output schema via API.
- Create prediction - POST to /v1/predictions.
- Poll for results - GET prediction until status is "succeeded".
- Return output - Usually URLs to generated content.
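The poll-until-done step above can be sketched as a small loop. This is a minimal sketch, not an official client: the fetch callable is injected so the logic works with any HTTP library, and the base URL comes from the endpoints listed earlier.

```python
import time

API = "https://api.replicate.com/v1"  # base URL from the docs above

def poll_prediction(prediction_id, get_json, interval=1.0, timeout=300.0):
    """Poll GET /v1/predictions/{id} until a terminal state is reached.

    get_json is any callable that fetches a URL and returns the decoded
    JSON body, so the polling logic can run without a real network.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        prediction = get_json(f"{API}/predictions/{prediction_id}")
        if prediction["status"] in ("succeeded", "failed", "canceled"):
            return prediction
        time.sleep(interval)
    raise TimeoutError(f"prediction {prediction_id} did not finish in {timeout}s")
```

In production the interval and timeout should match the model's typical runtime; for long-running models, prefer webhooks over polling (see below).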
Three ways to get output
- Create a prediction, store its id from the response, and poll until completion.
- Set a Prefer: wait header when creating a prediction for a blocking synchronous response. Only recommended for very fast models. Max 60 seconds.
- Set an HTTPS webhook URL when creating a prediction, and Replicate will POST to that URL when the prediction completes.
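The three modes differ only in how the creation request is built. The sketch below assembles that request; the body layout ("model" plus "input") follows the identifier formats described in this document but should be checked against the endpoint's schema, and the model name, token, and webhook URL are placeholders.

```python
import json

def create_prediction_request(model, model_input, token, wait=False, webhook=None):
    """Build the pieces of a POST /v1/predictions call.

    Returns (url, headers, body) so any HTTP client can send it.
    """
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    if wait:
        # Blocking synchronous response; only for very fast models (max 60s).
        headers["Prefer"] = "wait"
    body = {"model": model, "input": model_input}
    if webhook:
        body["webhook"] = webhook  # Replicate POSTs the result here on completion
    return ("https://api.replicate.com/v1/predictions", headers, json.dumps(body))
```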
Guidelines
- Use the POST /v1/predictions endpoint, as it supports both official and community models.
- Every model has its own OpenAPI schema. Always fetch and check model schemas to make sure you're setting valid inputs. Even popular models change their schemas.
- Validate input parameters against schema constraints (enum, minimum, maximum values). Don't generate values that violate them.
- When unsure about a parameter value, use the model's default example or omit the optional parameter.
- Don't set optional inputs unless you have a reason to. Stick to the required inputs and let the model's defaults do the work.
- Use HTTPS URLs for file inputs whenever possible. You can also send base64-encoded files, but they should be avoided.
- Fire off multiple predictions concurrently. Don't wait for one to finish before starting the next.
- Output file URLs expire after 1 hour, so back them up if you need to keep them, using a service like Cloudflare R2.
- Webhooks are a good mechanism for receiving and storing prediction output.
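A pre-flight check against the schema constraints mentioned above can be done locally before sending a prediction. This is a simplified sketch that handles only the enum/minimum/maximum keywords, not full OpenAPI validation; the property names in the usage are made up.

```python
def validate_input(params, properties):
    """Check params against a model's OpenAPI input "properties" mapping.

    Returns a list of human-readable violations (empty if all pass).
    """
    errors = []
    for name, value in params.items():
        schema = properties.get(name)
        if schema is None:
            errors.append(f"{name}: not in schema")
            continue
        if "enum" in schema and value not in schema["enum"]:
            errors.append(f"{name}: {value!r} not in {schema['enum']}")
        if "minimum" in schema and value < schema["minimum"]:
            errors.append(f"{name}: {value} below minimum {schema['minimum']}")
        if "maximum" in schema and value > schema["maximum"]:
            errors.append(f"{name}: {value} above maximum {schema['maximum']}")
    return errors
```

For real use, a full JSON Schema validator is the safer choice; this just illustrates catching the common violations before the API rejects them.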
Predictions
- A prediction goes through these states: starting -> processing -> succeeded/failed/canceled.
- Official models use owner/name format. Community models require owner/name:version_id.
- The POST /v1/predictions endpoint handles both.
Webhooks
- Set webhook to an HTTPS URL when creating a prediction. Replicate POSTs the full prediction object when it completes.
- Filter events with webhook_events_filter: start, output, logs, completed.
- Validate webhook signatures using the Webhook-ID, Webhook-Timestamp, and Webhook-Signature headers. Get the signing secret from GET /v1/webhooks/default/secret.
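Signature validation can be sketched as follows. This assumes the Svix-style scheme Replicate documents: sign "{id}.{timestamp}.{body}" with HMAC-SHA256, keyed by the base64-decoded portion of the whsec_... secret, and compare against the (possibly multiple, space-separated) "v1,<sig>" entries in the Webhook-Signature header. Verify the details against the signing docs before relying on this.

```python
import base64
import hashlib
import hmac

def verify_webhook(secret, webhook_id, timestamp, body, signature_header):
    """Return True if the Webhook-Signature header matches the payload.

    secret is the whsec_... value from GET /v1/webhooks/default/secret;
    webhook_id and timestamp come from the Webhook-ID / Webhook-Timestamp
    headers, and body is the raw request body as a string.
    """
    key = base64.b64decode(secret.split("_", 1)[1])
    signed = f"{webhook_id}.{timestamp}.{body}".encode()
    expected = base64.b64encode(
        hmac.new(key, signed, hashlib.sha256).digest()).decode()
    # The header may hold several space-separated "v1,<sig>" entries.
    return any(hmac.compare_digest(expected, entry.split(",", 1)[-1])
               for entry in signature_header.split())
```

Also check the timestamp against the current time to reject replayed deliveries.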
Prediction lifetime
- Set lifetime to auto-cancel predictions that run too long (e.g. 30s, 5m, 1h). Measured from creation time.
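As a request-body fragment, the lifetime option might look like this. The field name and duration strings follow the description above; the model name and prompt are placeholders, and the exact schema should be confirmed against the API reference.

```python
# Hypothetical body for POST /v1/predictions: auto-cancel the prediction
# if it is still running 5 minutes after creation.
body = {
    "model": "owner/name",                    # placeholder model
    "input": {"prompt": "a photo of a fox"},  # placeholder input
    "lifetime": "5m",                         # also accepts e.g. "30s", "1h"
}
```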
Streaming
- Language models that support streaming include a stream URL in the response. Use SSE to receive incremental output.
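Consuming the stream URL means parsing Server-Sent Events. Below is a minimal SSE line parser, a sketch that handles only the event:/data: fields; the event names in the usage ("output", "done") are assumptions about what a streaming model sends, so check the per-model docs.

```python
def parse_sse(lines):
    """Yield (event, data) pairs from an iterable of SSE text lines.

    A blank line terminates one event; multiple data: lines are joined
    with newlines, per the SSE format.
    """
    event, data = "message", []
    for line in lines:
        if line == "":                      # blank line ends one event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[6:].strip()
        elif line.startswith("data:"):
            data.append(line[5:].lstrip())
```

Feed it the decoded lines from a GET on the stream URL (with Accept: text/event-stream) to receive tokens incrementally.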
File handling
- Prefer HTTPS URLs for file inputs. Output URLs from one prediction can be passed directly as file inputs to the next model.
- Output file URLs expire after 1 hour. Download and store them immediately if you need to keep them.
Multi-model workflows
- Chain models by passing output URLs as file inputs to the next model.
- Start all independent predictions in parallel, then collect results.
- Output URLs are valid for 1 hour, which is enough for pipeline steps.
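The chaining and parallelism points above can be combined in one sketch. run_model stands in for the create-and-poll calls earlier in this document and returns an output URL; the two model names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def run_pipeline(run_model, prompts):
    """Generate images for all prompts in parallel, then upscale each.

    run_model(model, model_input) -> output URL is injected, so the
    pipeline shape can be tested without the API.
    """
    with ThreadPoolExecutor() as pool:
        # Independent generations start concurrently, not one by one.
        images = list(pool.map(
            lambda p: run_model("owner/image-model", {"prompt": p}), prompts))
        # Output URLs (valid for 1 hour) feed the next model directly.
        return list(pool.map(
            lambda url: run_model("owner/upscaler", {"image": url}), images))
```

The 1-hour URL lifetime comfortably covers back-to-back pipeline steps, but download or re-upload anything a later, slower stage will need.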