cloudflare-workers-ai
Cloudflare Workers AI - Complete Reference
Production-ready knowledge domain for building AI-powered applications with Cloudflare Workers AI.
Status: Production Ready ✅
Last Updated: 2025-11-21
Dependencies: cloudflare-worker-base (for Worker setup)
Latest Versions: wrangler@4.43.0, @cloudflare/workers-types@4.20251014.0
Quick Start (5 minutes)
1. Add AI Binding
wrangler.jsonc:
```jsonc
{
  "ai": {
    "binding": "AI"
  }
}
```
2. Run Your First Model
```typescript
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Run a text-generation model through the AI binding
    const response = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
      prompt: 'What is Cloudflare?',
    });
    return Response.json(response);
  },
};
```
3. Add Streaming (Recommended)
```typescript
const stream = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
  messages: [{ role: 'user', content: 'Tell me a story' }],
  stream: true, // Always use streaming for text generation!
});

return new Response(stream, {
  headers: { 'content-type': 'text/event-stream' },
});
```
Why streaming?
- Prevents buffering large responses in memory
- Faster time-to-first-token
- Better user experience for long-form content
- Avoids Worker timeout issues
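On the consuming side, a streamed response arrives as chunks of server-sent events that may split in the middle of a line. The sketch below shows one way to reassemble tokens from such chunks. It assumes each event is a line of the form `data: {"response":"..."}` and that the stream ends with `data: [DONE]`; that event shape and the `createSseParser` name are assumptions for illustration, not something this reference specifies.

```typescript
// Incremental parser for server-sent events carrying JSON payloads.
// Assumption: events look like `data: {"response":"..."}` and the
// stream is terminated by `data: [DONE]`.
function createSseParser(): (chunk: string) => string[] {
  let buffer = '';
  return (chunk: string): string[] => {
    buffer += chunk;
    const lines = buffer.split('\n');
    // The last element may be an incomplete line; keep it for the next chunk.
    buffer = lines.pop() ?? '';
    const tokens: string[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (!trimmed.startsWith('data:')) continue;
      const payload = trimmed.slice('data:'.length).trim();
      if (payload === '' || payload === '[DONE]') continue;
      try {
        const parsed = JSON.parse(payload) as { response?: unknown };
        if (typeof parsed.response === 'string') tokens.push(parsed.response);
      } catch {
        // Skip malformed events rather than aborting the whole stream.
      }
    }
    return tokens;
  };
}
```

A caller would decode each network chunk to text, pass it to the parser, and append the returned tokens to the UI as they arrive.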
Workers AI API Reference
Core API: env.AI.run()
```typescript
const response = await env.AI.run(model, inputs, options?);
```
| Parameter | Type | Description |
|---|---|---|
| `model` | string | Model ID (e.g., `@cf/meta/llama-3.1-8b-instruct`) |
| `inputs` | object | Model-specific inputs (see the input types by model category below) |
| `options.gateway.id` | string | AI Gateway ID for caching/logging |
| `options.gateway.skipCache` | boolean | Skip AI Gateway cache |

Returns: `Promise<ModelOutput>` (non-streaming) or `ReadableStream` (streaming)
Input Types by Model Category
| Category | Key Inputs | Output |
|---|---|---|
| Text Generation | | |
| Embeddings | | |
| Image Generation | | Binary PNG |
| Vision | | |
📄 Full model details: Load `references/models-catalog.md` for the complete model list, parameters, and rate limits.
Model Selection Guide
Text Generation (LLMs)
| Model | Best For | Rate Limit | Size |
|---|---|---|---|
| | General purpose, fast | 300/min | 8B |
| | Ultra-fast, simple tasks | 300/min | 1B |
| | High quality, complex reasoning | 150/min | 14B |
| | Coding, technical content | 300/min | 32B |
| | Fast, efficient | 400/min | 7B |
Text Embeddings
| Model | Dimensions | Best For | Rate Limit |
|---|---|---|---|
| `@cf/baai/bge-base-en-v1.5` | 768 | General purpose RAG | 3000/min |
| `@cf/baai/bge-large-en-v1.5` | 1024 | High accuracy search | 1500/min |
| `@cf/baai/bge-small-en-v1.5` | 384 | Fast, low storage | 3000/min |
Image Generation
| Model | Best For | Rate Limit | Speed |
|---|---|---|---|
| | High quality, photorealistic | 720/min | Fast |
| | General purpose | 720/min | Medium |
| | Artistic, stylized | 720/min | Fast |
Vision Models
| Model | Best For | Rate Limit |
|---|---|---|
| | Image understanding | 720/min |
| | Fast image captioning | 720/min |
Common Patterns
Pattern 1: Chat with Streaming
```typescript
// Assumes `app` is a Hono app whose bindings include AI
app.post('/chat', async (c) => {
  const { messages } = await c.req.json<{ messages: Array<{ role: string; content: string }> }>();
  const stream = await c.env.AI.run('@cf/meta/llama-3.1-8b-instruct', { messages, stream: true });
  return new Response(stream, { headers: { 'content-type': 'text/event-stream' } });
});
```
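The chat handler forwards the request body to the model as-is. A small validation step in front of it could look like the sketch below, which checks roles and caps the total input size before anything reaches the model. The `parseMessages` name and the default limits (50 messages, 8,000 characters) are hypothetical values chosen for illustration; real limits depend on the model's context window.

```typescript
type Role = 'system' | 'user' | 'assistant';

interface ChatMessage {
  role: Role;
  content: string;
}

const ROLES: readonly string[] = ['system', 'user', 'assistant'];

function isRole(value: unknown): value is Role {
  return typeof value === 'string' && ROLES.indexOf(value) !== -1;
}

// Validates an untrusted request body and returns typed messages.
// Hypothetical limits: tune maxMessages and maxChars per model.
function parseMessages(body: unknown, maxMessages = 50, maxChars = 8000): ChatMessage[] {
  if (typeof body !== 'object' || body === null) {
    throw new Error('Body must be a JSON object');
  }
  const raw = (body as { messages?: unknown }).messages;
  if (!Array.isArray(raw) || raw.length === 0) {
    throw new Error('messages must be a non-empty array');
  }
  if (raw.length > maxMessages) {
    throw new Error('Too many messages');
  }
  let totalChars = 0;
  const result: ChatMessage[] = [];
  for (const item of raw) {
    const { role, content } = (item ?? {}) as { role?: unknown; content?: unknown };
    if (!isRole(role)) throw new Error('Invalid role');
    if (typeof content !== 'string' || content.length === 0) {
      throw new Error('content must be a non-empty string');
    }
    totalChars += content.length;
    if (totalChars > maxChars) throw new Error('Input too long');
    result.push({ role, content });
  }
  return result;
}
```

In the handler, a thrown error would typically be turned into a 400 response before the model is ever called. Character counts are only a rough proxy for tokens.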
Pattern 2: RAG (Retrieval Augmented Generation)
```typescript
// 1. Generate embedding for query
const embeddings = await env.AI.run('@cf/baai/bge-base-en-v1.5', { text: [userQuery] });

// 2. Search Vectorize
const matches = await env.VECTORIZE.query(embeddings.data[0], { topK: 3 });

// 3. Build context
const context = matches.matches.map((m) => m.metadata.text).join('\n\n');

// 4. Generate with context
const stream = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
  messages: [
    { role: 'system', content: `Answer using this context:\n${context}` },
    { role: 'user', content: userQuery },
  ],
  stream: true,
});

return new Response(stream, { headers: { 'content-type': 'text/event-stream' } });
```
📄 More patterns: Load `references/best-practices.md` for structured output, image generation, multi-model consensus, and production patterns.
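The context-building step in the RAG pattern joins every match's text unconditionally. A slightly more defensive variant is sketched below: it skips weak matches and matches without text, and caps the context size. The `buildContext` helper, the `minScore` threshold of 0.5, and the 4,000-character cap are hypothetical, and the sketch assumes a higher score means a closer match (as with cosine similarity).

```typescript
interface VectorMatch {
  score: number;
  metadata?: Record<string, unknown>;
}

// Joins the text of sufficiently relevant matches into one context string.
// Hypothetical defaults: minScore and maxChars should be tuned per index.
function buildContext(matches: VectorMatch[], minScore = 0.5, maxChars = 4000): string {
  const parts: string[] = [];
  let used = 0;
  for (const match of matches) {
    if (match.score < minScore) continue;
    const text = match.metadata?.text;
    if (typeof text !== 'string' || text.length === 0) continue;
    // Count the two-character separator for every part after the first.
    const cost = text.length + (parts.length > 0 ? 2 : 0);
    if (used + cost > maxChars) break;
    parts.push(text);
    used += cost;
  }
  return parts.join('\n\n');
}
```

If the result is an empty string, the caller may prefer to answer without context or return a "no relevant documents" message instead of prompting the model with an empty context.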
AI Gateway Integration
Enable caching, logging, and cost tracking with AI Gateway:
```typescript
const response = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', { prompt: 'Hello' }, {
  gateway: { id: 'my-gateway', skipCache: false },
});
```
Benefits: Cost tracking, response caching (50-90% savings on repeated queries), request logging, rate limiting, analytics.
Rate Limits & Pricing
Information last verified: 2025-01-14
Rate limits and pricing vary significantly by model. Always check the official documentation for the most current information:
- Rate Limits: https://developers.cloudflare.com/workers-ai/platform/limits/
- Pricing: https://developers.cloudflare.com/workers-ai/platform/pricing/
Free Tier: 10,000 neurons/day
Paid Tier: $0.011 per 1,000 neurons
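As a rough worked example of the two figures above, the sketch below estimates a day's cost from a neuron count. It assumes the 10,000 free neurons are deducted before the paid rate applies, which this reference does not state explicitly; the function name is hypothetical, and the official pricing page remains the source of truth.

```typescript
const FREE_NEURONS_PER_DAY = 10000;
const USD_PER_1000_NEURONS = 0.011;

// Estimates the daily cost in USD for a given neuron count.
// Assumption: the free allocation is subtracted before billing.
function estimateDailyCostUsd(neuronsUsed: number): number {
  const billable = Math.max(0, neuronsUsed - FREE_NEURONS_PER_DAY);
  return (billable / 1000) * USD_PER_1000_NEURONS;
}
```

Under this assumption, a day with 110,000 neurons has 100,000 billable neurons, which comes to roughly $1.10, while any day at or below 10,000 neurons costs nothing.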
📄 Per-model details: See `references/models-catalog.md` for specific rate limits and pricing for each model.
Production Checklist
Essential before deploying:
- Enable AI Gateway for cost tracking
- Implement streaming for text generation
- Add rate limit retry with exponential backoff
- Validate input length (prevent token limit errors)
- Add input sanitization (prevent prompt injection)
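The retry item in the checklist can be sketched as a small generic wrapper. The helper names, the 250 ms base delay, the 8-second cap, and the four-attempt default are all hypothetical. How a rate-limit error surfaces from the binding is not covered here, so the caller supplies an `isRetryable` predicate.

```typescript
// Delay for a given zero-based attempt: base * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 250, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => {
    setTimeout(resolve, ms);
  });

// Calls fn until it succeeds, the error is not retryable,
// or maxAttempts is reached; rethrows the last error on failure.
async function withRetry<T>(
  fn: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  maxAttempts = 4,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (!isRetryable(err) || attempt === maxAttempts - 1) break;
      await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```

A call site would wrap the model invocation in `fn` and pass a predicate that recognizes rate-limit failures in whatever form they arrive. Adding random jitter to the delay is a common refinement when many clients retry at once.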
📄 Full checklist: Load `references/best-practices.md` for the complete production checklist, error handling patterns, monitoring, and cost optimization.
External SDK Integrations
Workers AI exposes OpenAI-compatible endpoints and integrates with the Vercel AI SDK:
```typescript
import OpenAI from 'openai';
import { createWorkersAI } from 'workers-ai-provider';

// OpenAI SDK - use same patterns with Workers AI models
const openai = new OpenAI({
  apiKey: env.CLOUDFLARE_API_KEY,
  baseURL: `https://api.cloudflare.com/client/v4/accounts/${env.CLOUDFLARE_ACCOUNT_ID}/ai/v1`,
});

// Vercel AI SDK - native integration
const workersai = createWorkersAI({ binding: env.AI });
```
📄 Full integration guide: Load `references/integrations.md` for OpenAI SDK, Vercel AI SDK, and REST API examples.
Limits Summary
| Feature | Limit |
|---|---|
| Concurrent requests | No hard limit (rate limits apply) |
| Max input tokens | Varies by model (typically 2K-128K) |
| Max output tokens | Varies by model (typically 512-2048) |
| Streaming chunk size | ~1 KB |
| Image size (output) | ~5 MB |
| Request timeout | Workers timeout applies (30s default, 5m max CPU) |
| Daily free neurons | 10,000 |
| Rate limits | See "Rate Limits & Pricing" section |
When to Load References
| Reference File | Load When... |
|---|---|
| `references/models-catalog.md` | Choosing a model, checking rate limits, comparing model capabilities |
| `references/best-practices.md` | Production deployment, error handling, cost optimization, security |
| `references/integrations.md` | Using OpenAI SDK, Vercel AI SDK, or REST API instead of native binding |