compare-models
Docs
- Reference: https://replicate.com/docs/llms.txt
- OpenAPI schema: https://api.replicate.com/openapi.json
- MCP server: https://mcp.replicate.com
- Per-model docs: https://replicate.com/{owner}/{model}/llms.txt
- `Accept: text/markdown`: set this header when requesting docs pages for Markdown responses.
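To get Markdown back from a docs page, set the `Accept` header on the request. A minimal sketch using Python's standard library; the owner/model pair is only an example:

```python
import urllib.request

def model_docs_request(owner: str, model: str) -> urllib.request.Request:
    """Build a request for a model's llms.txt docs, asking for Markdown."""
    url = f"https://replicate.com/{owner}/{model}/llms.txt"
    # The Accept header asks the docs endpoint for a Markdown response.
    return urllib.request.Request(url, headers={"Accept": "text/markdown"})

# Example model reference (illustrative); send with urllib.request.urlopen(req).
req = model_docs_request("black-forest-labs", "flux-schnell")
```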
Workflow
- Search or browse collections to build a shortlist of candidate models.
- Fetch each model's schema to compare inputs, outputs, and capabilities.
- Check pricing from model metadata or the Replicate website.
- Run a small batch of test predictions to compare output quality.
- Pick the model that best fits your constraints (cost, latency, quality).
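The last step, picking under constraints, can be sketched as a filter-then-rank over your shortlist. The model names, prices, latencies, and quality scores below are made up for illustration, not real Replicate data:

```python
candidates = [
    # (model, cost per run in USD, median latency in s, quality score 1-5)
    ("owner/model-turbo", 0.003, 1.2, 3.5),
    ("owner/model-pro", 0.055, 9.8, 4.8),
    ("owner/model-base", 0.012, 4.1, 4.2),
]

def pick(candidates, max_cost, max_latency):
    """Keep models within the cost/latency budget, then take the best quality."""
    in_budget = [c for c in candidates if c[1] <= max_cost and c[2] <= max_latency]
    return max(in_budget, key=lambda c: c[3])[0] if in_budget else None

print(pick(candidates, max_cost=0.02, max_latency=5.0))  # owner/model-base
```

Measure cost and latency from real test predictions (step 4) rather than guessing; the numbers are what make this ranking meaningful.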
What to compare
- Speed: Check `metrics.predict_time` on completed predictions for actual inference time. Official models are always warm. Community models can cold-boot.
- Cost: Official models have predictable per-run pricing. Community models charge by compute time (GPU-seconds). Run a few predictions and check the `metrics` field for actual cost.
- Quality: Run the same prompts through each model and compare outputs. Quality is subjective. Match it to your use case, not a leaderboard.
- Capabilities: Compare input schemas for supported features (reference images, masks, aspect ratios, streaming, multi-image input). Check output formats.
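For community models, a per-run cost estimate is the prediction's `metrics.predict_time` multiplied by the per-second rate of the hardware it ran on. A sketch with a hypothetical GPU rate; check Replicate's pricing page for real hardware rates:

```python
# Shape of a completed prediction's metrics field (values are illustrative).
prediction = {
    "metrics": {"predict_time": 3.7},  # seconds of inference
}

GPU_RATE_PER_SECOND = 0.000725  # hypothetical $/s for some GPU tier

cost = prediction["metrics"]["predict_time"] * GPU_RATE_PER_SECOND
print(f"~${cost:.6f} per run")
```

Averaging this over a handful of representative test predictions gives a more honest number than a single run, since predict_time varies with input size.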
Key tradeoffs
- Lowest cost: smaller/distilled models. Accept slower inference and lower quality.
- Lowest latency: official models or schnell/turbo variants. Accept higher cost per run.
- Highest quality: pro/max/quality variants. Accept slower inference and higher cost.
- Most control: models with ControlNet, masks, or reference images. Accept more complex input setup.
Official vs community models
- Official models: always warm, stable APIs, predictable pricing, maintained by Replicate.
- Community models: may cold-boot, require version pinning, maintained by the author.
- If a community model meets your needs and an official model doesn't, consider creating a deployment for consistent uptime.
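Version pinning for a community model means referencing it as `owner/model:version` rather than the bare `owner/model`, so the model can't change underneath you. A trivial helper; the owner, model, and version hash below are placeholders:

```python
def pinned_ref(owner: str, model: str, version: str) -> str:
    """Format a version-pinned model reference, e.g. for replicate.run()."""
    return f"{owner}/{model}:{version}"

ref = pinned_ref("some-author", "some-model", "abc123")  # hypothetical IDs
```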
Prompting guidance
For prompting techniques and task-specific guidance:
- Image generation and editing: see the prompt-images skill.
- Video generation: see the prompt-videos skill.