run-models

Docs

Workflow

  1. Choose the right model - Search with the API or ask the user.
  2. Get model metadata - Fetch input and output schema via API.
  3. Create prediction - POST to /v1/predictions.
  4. Poll for results - GET the prediction until its status is terminal ("succeeded", "failed", or "canceled").
  5. Return output - Usually URLs to generated content.
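Steps 3-4 above boil down to a small polling loop. In this sketch, get_prediction stands in for an authenticated GET /v1/predictions/{id} call; it is injected as a parameter so the example does not assume any particular HTTP client.

```python
import time

def wait_for_prediction(get_prediction, prediction_id, interval=1.0, timeout=300):
    """Poll a prediction until it reaches a terminal state.

    get_prediction stands in for GET /v1/predictions/{id} and must
    return the prediction as a dict with a "status" key.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        prediction = get_prediction(prediction_id)
        # Terminal states; anything else means keep polling.
        if prediction["status"] in ("succeeded", "failed", "canceled"):
            return prediction
        time.sleep(interval)
    raise TimeoutError(f"prediction {prediction_id} did not finish in {timeout}s")
```

Checking for all three terminal states (not just "succeeded") keeps the loop from spinning forever when a prediction fails or is canceled.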

Three ways to get output

  1. Create a prediction, store its id from the response, and poll until completion.
  2. Set a "Prefer: wait" header when creating a prediction for a blocking, synchronous response. Only recommended for very fast models; the wait is capped at 60 seconds.
  3. Set an HTTPS webhook URL when creating a prediction, and Replicate will POST to that URL when the prediction completes.
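For option 2, the blocking call differs from the async one only in its headers. A minimal helper, assuming the "wait" preference accepts an optional seconds value up to the 60-second cap mentioned above:

```python
def blocking_headers(token: str, wait_seconds: int = 60) -> dict:
    """Headers for a blocking create-prediction call (option 2 above).

    "Prefer: wait=N" asks the API to hold the connection open for up to
    N seconds; if the model is slower, you get the in-progress
    prediction back and fall back to polling (option 1).
    """
    if not 1 <= wait_seconds <= 60:
        raise ValueError("wait must be between 1 and 60 seconds")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "Prefer": f"wait={wait_seconds}",
    }
```

Pass these headers to whatever HTTP client you use for POST /v1/predictions.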

Guidelines

  • Use the POST /v1/predictions endpoint, as it supports both official and community models.
  • Every model has its own OpenAPI schema. Always fetch and check model schemas to make sure you're setting valid inputs. Even popular models change their schemas.
  • Validate input parameters against schema constraints (minimum, maximum, enum values). Don't generate values that violate them.
  • When unsure about a parameter value, use the model's default example or omit the optional parameter.
  • Don't set optional inputs unless you have a reason to. Stick to the required inputs and let the model's defaults do the work.
  • Use HTTPS URLs for file inputs whenever possible. Base64-encoded files also work, but avoid them; they bloat request payloads.
  • Fire off multiple predictions concurrently. Don't wait for one to finish before starting the next.
  • Output file URLs expire after 1 hour, so back them up if you need to keep them, using a service like Cloudflare R2.
  • Webhooks are a good mechanism for receiving and storing prediction output.
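Checking inputs against the schema constraints above can be a few simple guards. Here prop is one property object from a model's OpenAPI input schema; a full validator would also check keywords like type and format:

```python
def validate_input(value, name, prop: dict):
    """Check one input value against its OpenAPI property schema.

    Covers the constraints called out above: enum, minimum, maximum.
    Raises ValueError on the first violation, returns the value if OK.
    """
    if "enum" in prop and value not in prop["enum"]:
        raise ValueError(f"{name}: {value!r} is not one of {prop['enum']}")
    if "minimum" in prop and value < prop["minimum"]:
        raise ValueError(f"{name}: {value} is below minimum {prop['minimum']}")
    if "maximum" in prop and value > prop["maximum"]:
        raise ValueError(f"{name}: {value} is above maximum {prop['maximum']}")
    return value
```

Run this over each input before creating the prediction, using the schema fetched in step 2 of the workflow.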

Predictions

  • A prediction goes through these states: starting -> processing -> succeeded / failed / canceled.
  • Official models use the owner/name format. Community models require owner/name:version_id.
  • The POST /v1/predictions endpoint handles both.
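The two reference formats can be told apart by the presence of a version suffix. A hypothetical helper (names and return shape are illustrative, not part of the API):

```python
def parse_model_ref(ref: str) -> dict:
    """Split a model reference into its parts.

    "owner/name" -> official model; "owner/name:version_id" ->
    community model pinned to a specific version.
    """
    name_part, _, version = ref.partition(":")
    owner, _, name = name_part.partition("/")
    if not owner or not name:
        raise ValueError(f"expected owner/name[:version_id], got {ref!r}")
    return {"owner": owner, "name": name, "version": version or None}
```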

Webhooks

  • Set webhook to an HTTPS URL when creating a prediction. Replicate POSTs the full prediction object when it completes.
  • Filter events with webhook_events_filter: start, output, logs, completed.
  • Validate webhook signatures using the Webhook-ID, Webhook-Timestamp, and Webhook-Signature headers. Get the signing secret from GET /v1/webhooks/default/secret.
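Signature validation can be done with the standard library. This sketch assumes the Svix-style scheme Replicate documents: the secret is "whsec_" plus a base64 key, the signed content is "{id}.{timestamp}.{body}", and Webhook-Signature holds space-separated "v1,&lt;base64 signature&gt;" entries:

```python
import base64
import hashlib
import hmac

def verify_webhook(secret, webhook_id, timestamp, body, signature_header):
    """Return True if any signature in the header matches the payload."""
    # Strip the "whsec_" prefix and decode the base64 signing key.
    key = base64.b64decode(secret.split("_", 1)[1])
    signed_content = f"{webhook_id}.{timestamp}.{body}".encode()
    expected = base64.b64encode(
        hmac.new(key, signed_content, hashlib.sha256).digest()
    ).decode()
    # Header may list several versioned signatures; accept any v-prefixed match.
    return any(
        hmac.compare_digest(expected, part.partition(",")[2])
        for part in signature_header.split()
    )
```

Also reject requests whose Webhook-Timestamp is far from the current time, to limit replay attacks.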

Prediction lifetime

  • Set lifetime to auto-cancel predictions that run too long (e.g. 30s, 5m, 1h). Measured from creation time.
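As a sketch, attaching a lifetime to a create-prediction body might look like this; the field name and duration strings follow this section, with a loose format check:

```python
def with_lifetime(body: dict, lifetime: str) -> dict:
    """Return a copy of a create-prediction body with a lifetime cap.

    Accepts duration strings like "30s", "5m", "1h" (digits plus a
    seconds/minutes/hours suffix, as in the examples above).
    """
    if len(lifetime) < 2 or lifetime[-1] not in "smh" or not lifetime[:-1].isdigit():
        raise ValueError(f"bad lifetime: {lifetime!r}")
    return {**body, "lifetime": lifetime}
```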

Streaming

  • Language models that support streaming include a stream URL in the response. Use SSE to receive incremental output.
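Incremental output arrives as server-sent events. A minimal parser for complete event chunks; a production client should also handle id: and retry: fields and events split across reads:

```python
def parse_sse(chunk: str):
    """Parse server-sent events from a text chunk into (event, data) pairs.

    Events are blank-line separated; each may carry an event: name and
    one or more data: lines (joined with newlines per the SSE spec).
    """
    events = []
    for block in chunk.split("\n\n"):
        event, data_lines = "message", []
        for line in block.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
        if data_lines:
            events.append((event, "\n".join(data_lines)))
    return events
```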

File handling

  • Prefer HTTPS URLs for file inputs. Output URLs from one prediction can be passed directly as file inputs to the next model.
  • Output file URLs expire after 1 hour. Download and store them immediately if you need to keep them.
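Since output URLs expire after about an hour, grab the bytes as soon as a prediction succeeds. In this sketch, fetch is any url -> bytes callable (wrapping urllib, httpx, etc.), injected so the example stays client-agnostic:

```python
def snapshot_outputs(urls, fetch):
    """Fetch output files before their URLs expire.

    Returns {filename: bytes}; persist the bytes wherever you keep
    files (local disk, Cloudflare R2, S3, ...).
    """
    files = {}
    for url in urls:
        # Derive a filename from the URL path, ignoring any query string.
        name = url.split("?")[0].rstrip("/").rsplit("/", 1)[-1] or "output.bin"
        files[name] = fetch(url)
    return files
```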

Multi-model workflows

  • Chain models by passing output URLs as file inputs to the next model.
  • Start all independent predictions in parallel, then collect results.
  • Output URLs are valid for 1 hour, which is enough for pipeline steps.
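Fanning out independent predictions is straightforward with a thread pool. Here run_prediction stands in for a create-then-wait call against the API, so the sketch works with any client:

```python
from concurrent.futures import ThreadPoolExecutor

def run_all(run_prediction, jobs, max_workers=8):
    """Start independent predictions in parallel and collect the results.

    jobs is a list of (model_ref, input) tuples; results come back in
    the same order the jobs were submitted.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_prediction, ref, inp) for ref, inp in jobs]
        return [f.result() for f in futures]
```

For a chained pipeline, pass the output URLs from one run_all stage as file inputs to the next.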