cloudflare-workers-ai


Cloudflare Workers AI - Complete Reference


Production-ready knowledge domain for building AI-powered applications with Cloudflare Workers AI.
  • Status: Production Ready ✅
  • Last Updated: 2025-11-21
  • Dependencies: cloudflare-worker-base (for Worker setup)
  • Latest Versions: wrangler@4.43.0, @cloudflare/workers-types@4.20251014.0



Quick Start (5 minutes)


1. Add AI Binding


wrangler.jsonc:
```jsonc
{
  "ai": {
    "binding": "AI"
  }
}
```

2. Run Your First Model


```typescript
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const response = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
      prompt: 'What is Cloudflare?',
    });

    return Response.json(response);
  },
};
```

3. Add Streaming (Recommended)


```typescript
const stream = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
  messages: [{ role: 'user', content: 'Tell me a story' }],
  stream: true, // Always use streaming for text generation!
});

return new Response(stream, {
  headers: { 'content-type': 'text/event-stream' },
});
```
Why streaming?
  • Prevents buffering large responses in memory
  • Faster time-to-first-token
  • Better user experience for long-form content
  • Avoids Worker timeout issues

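A client reading the streamed response receives server-sent events. The sketch below assumes each event is a `data:` line carrying JSON with a `response` field and that the stream ends with `data: [DONE]`, which is how Workers AI text-generation streams are typically framed; the class name `WorkersAiSseParser` is hypothetical. It buffers partial lines because a network chunk can end in the middle of an event.

```typescript
// Hypothetical incremental parser for Workers AI text-generation SSE output.
// Assumes events look like `data: {"response":"..."}` and the stream ends with `data: [DONE]`.
class WorkersAiSseParser {
  private buffer = '';
  private done = false;

  // Feed one decoded text chunk; returns the text tokens completed by this chunk.
  push(chunk: string): string[] {
    if (this.done) return [];
    this.buffer += chunk;
    const lines = this.buffer.split('\n');
    // The last element may be an incomplete line; keep it for the next chunk.
    this.buffer = lines.pop() ?? '';

    const tokens: string[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (!trimmed.startsWith('data:')) continue; // skip blank separators and other fields
      const payload = trimmed.slice('data:'.length).trim();
      if (payload === '[DONE]') {
        this.done = true;
        break;
      }
      try {
        const event = JSON.parse(payload) as { response?: string };
        if (event.response) tokens.push(event.response);
      } catch {
        // Ignore lines that are not valid JSON.
      }
    }
    return tokens;
  }

  get finished(): boolean {
    return this.done;
  }
}
```

In a browser, each chunk read from `response.body` can be decoded with a `TextDecoder` (`decoder.decode(value, { stream: true })`) and passed to `push()`.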

Workers AI API Reference


Core API: `env.AI.run()`


```typescript
// `options` is optional
const response = await env.AI.run(model, inputs, options);
```

| Parameter | Type | Description |
|---|---|---|
| `model` | string | Model ID (e.g., `@cf/meta/llama-3.1-8b-instruct`) |
| `inputs` | object | Model-specific inputs (see the model categories below) |
| `options.gateway.id` | string | AI Gateway ID for caching/logging |
| `options.gateway.skipCache` | boolean | Skip the AI Gateway cache |

Returns: a `Promise` that resolves to the model output (non-streaming) or to a `ReadableStream` of server-sent events (when `stream: true` is passed).

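Because `env.AI.run()` resolves to either an output object or a `ReadableStream`, callers usually narrow the result before using it. A minimal sketch for text generation; the `TextGenerationResult` type below is a simplified stand-in assumed for illustration, not the exact type exported by `@cloudflare/workers-types`, and `textFromResult` is a hypothetical helper.

```typescript
// Simplified stand-in for the text-generation result (assumption, not the official type).
type TextGenerationResult = { response?: string } | ReadableStream<Uint8Array>;

// Returns the generated text from a non-streaming call, or throws if a stream was returned.
function textFromResult(result: TextGenerationResult): string {
  if (result instanceof ReadableStream) {
    throw new Error('Received a stream: read it as server-sent events or call run() without stream: true');
  }
  return result.response ?? '';
}
```

Throwing on a stream keeps the non-streaming code path explicit instead of silently returning an empty string.
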
Input Types by Model Category


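| Category | Key inputs | Output |
|---|---|---|
| Text Generation | `messages[]` (or `prompt`), `stream`, `max_tokens`, `temperature` | `{ response: string }` (a stream when `stream: true`) |
| Embeddings | `text: string \| string[]` | `{ data: number[][], shape: number[] }` |
| Image Generation | `prompt`, `num_steps`, `guidance` | Binary PNG (e.g. Stable Diffusion XL); `@cf/black-forest-labs/flux-1-schnell` returns `{ image: string }` (base64) |
| Vision | `messages[].content[].image_url` | `{ response: string }` |

📄 Full model details: load `references/models-catalog.md` for the complete model list, parameters, and rate limits.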

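The embeddings output stores one vector per input string in `data`, in input order, with `shape` giving `[count, dimensions]`. As a sketch of how that output can be used without a vector database, the hypothetical helpers below take the result of embedding a query together with candidate documents in a single call and rank the documents by cosine similarity.

```typescript
// Shape of the embeddings output described in the table above.
interface EmbeddingOutput {
  shape: number[];
  data: number[][];
}

// Cosine similarity between two equal-length vectors (0 if either vector is all zeros).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('vectors must have the same dimensions');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Treats data[0] as the query vector and returns the indices of the remaining
// (document) vectors, most similar first. Index 0 refers to the first document.
function rankDocuments(output: EmbeddingOutput): number[] {
  const [query, ...docs] = output.data;
  return docs
    .map((vector, index) => ({ index, score: cosineSimilarity(query, vector) }))
    .sort((x, y) => y.score - x.score)
    .map((entry) => entry.index);
}
```

Usage would look roughly like `rankDocuments(await env.AI.run('@cf/baai/bge-base-en-v1.5', { text: [query, ...docs] }))`. Beyond a handful of documents, a vector index such as Vectorize is the better fit.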

Model Selection Guide


Text Generation (LLMs)


| Model | Best for | Rate limit | Size |
|---|---|---|---|
| `@cf/meta/llama-3.1-8b-instruct` | General purpose, fast | 300/min | 8B |
| `@cf/meta/llama-3.2-1b-instruct` | Ultra-fast, simple tasks | 300/min | 1B |
| `@cf/qwen/qwen1.5-14b-chat-awq` | High quality, complex reasoning | 150/min | 14B |
| `@cf/deepseek-ai/deepseek-r1-distill-qwen-32b` | Coding, technical content | 300/min | 32B |
| `@hf/thebloke/mistral-7b-instruct-v0.1-awq` | Fast, efficient | 400/min | 7B |

Text Embeddings


| Model | Dimensions | Best for | Rate limit |
|---|---|---|---|
| `@cf/baai/bge-base-en-v1.5` | 768 | General-purpose RAG | 3000/min |
| `@cf/baai/bge-large-en-v1.5` | 1024 | High-accuracy search | 1500/min |
| `@cf/baai/bge-small-en-v1.5` | 384 | Fast, low storage | 3000/min |

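The dimensions column matters when storing vectors: a Vectorize index is created with a fixed dimension count, so the embedding model has to match it. A minimal guard using the dimensions from the table above; `assertEmbeddingDimensions` is a hypothetical helper, and the index dimension is assumed to be passed in by the caller.

```typescript
// Output dimensions for the BGE embedding models listed above.
const BGE_DIMENSIONS = {
  '@cf/baai/bge-small-en-v1.5': 384,
  '@cf/baai/bge-base-en-v1.5': 768,
  '@cf/baai/bge-large-en-v1.5': 1024,
} as const;

type BgeModel = keyof typeof BGE_DIMENSIONS;

// Throws if the model's dimensions differ from the index, or if any returned vector has the wrong length.
function assertEmbeddingDimensions(
  model: BgeModel,
  output: { shape: number[]; data: number[][] },
  indexDimensions: number,
): void {
  const expected = BGE_DIMENSIONS[model];
  if (expected !== indexDimensions) {
    throw new Error(`${model} produces ${expected}-dimensional vectors, but the index expects ${indexDimensions}`);
  }
  output.data.forEach((vector, i) => {
    if (vector.length !== expected) {
      throw new Error(`vector ${i} has ${vector.length} dimensions, expected ${expected}`);
    }
  });
}
```

Switching embedding models later generally means re-embedding the corpus into a new index, since vectors from different models are not comparable.
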
Image Generation


| Model | Best for | Rate limit | Speed |
|---|---|---|---|
| `@cf/black-forest-labs/flux-1-schnell` | High quality, photorealistic | 720/min | Fast |
| `@cf/stabilityai/stable-diffusion-xl-base-1.0` | General purpose | 720/min | Medium |
| `@cf/lykon/dreamshaper-8-lcm` | Artistic, stylized | 720/min | Fast |

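Image models differ in output format: Stable Diffusion XL returns binary image data, while `@cf/black-forest-labs/flux-1-schnell` is documented as returning JSON with a base64-encoded `image` string (JPEG at the time of writing; confirm against the model's documentation). A sketch for turning that base64 string into an HTTP response; `base64ToBytes` and `imageResponse` are hypothetical helpers, and the default content type is an assumption.

```typescript
// Decodes a base64 string into raw bytes using the standard atob() global.
function base64ToBytes(b64: string) {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Wraps a base64-encoded image (e.g. the `image` field of a flux-1-schnell result) in a Response.
function imageResponse(b64: string, contentType = 'image/jpeg'): Response {
  return new Response(base64ToBytes(b64), { headers: { 'content-type': contentType } });
}
```

Inside a Worker this would be used roughly as `const { image } = await env.AI.run('@cf/black-forest-labs/flux-1-schnell', { prompt }); return imageResponse(image);`.
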
Vision Models


| Model | Best for | Rate limit |
|---|---|---|
| `@cf/meta/llama-3.2-11b-vision-instruct` | Image understanding | 720/min |
| `@cf/unum/uform-gen2-qwen-500m` | Fast image captioning | 720/min |


Common Patterns


Pattern 1: Chat with Streaming


```typescript
import { Hono } from 'hono';
const app = new Hono<{ Bindings: { AI: Ai } }>();

app.post('/chat', async (c) => {
  const { messages } = await c.req.json<{ messages: Array<{ role: string; content: string }> }>();
  const stream = await c.env.AI.run('@cf/meta/llama-3.1-8b-instruct', { messages, stream: true });
  return new Response(stream, { headers: { 'content-type': 'text/event-stream' } });
});

export default app;
```

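The handler above forwards whatever `messages` the client sends. A minimal validation sketch, relevant to the input-length and prompt-injection items in the production checklist: it accepts only `user` and `assistant` roles (so a client cannot supply its own `system` prompt), caps the message count and total characters, and returns typed messages. The limits and the `validateChatMessages` name are illustrative assumptions, and character counts are only a rough proxy for tokens.

```typescript
type ChatRole = 'user' | 'assistant';
interface ChatMessage {
  role: ChatRole;
  content: string;
}

// Illustrative limits (assumptions); tune them to the model's context window.
const MAX_MESSAGES = 50;
const MAX_TOTAL_CHARS = 16_000;

// Validates untrusted JSON and returns typed messages, or throws with a reason.
function validateChatMessages(input: unknown): ChatMessage[] {
  if (!Array.isArray(input) || input.length === 0) throw new Error('messages must be a non-empty array');
  if (input.length > MAX_MESSAGES) throw new Error(`at most ${MAX_MESSAGES} messages allowed`);

  let totalChars = 0;
  return input.map((raw, i) => {
    if (typeof raw !== 'object' || raw === null) throw new Error(`message ${i}: must be an object`);
    const msg = raw as { role?: unknown; content?: unknown };
    if (msg.role !== 'user' && msg.role !== 'assistant') {
      throw new Error(`message ${i}: role must be "user" or "assistant"`);
    }
    if (typeof msg.content !== 'string' || msg.content.trim() === '') {
      throw new Error(`message ${i}: content must be a non-empty string`);
    }
    totalChars += msg.content.length;
    if (totalChars > MAX_TOTAL_CHARS) throw new Error(`messages exceed ${MAX_TOTAL_CHARS} characters`);
    return { role: msg.role, content: msg.content };
  });
}
```

In the chat handler, the parsed body's `messages` would be passed through this function first, with a thrown error turned into a 400 response; any system prompt is then prepended server-side.
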
Pattern 2: RAG (Retrieval Augmented Generation)


```typescript
// Runs inside a fetch handler. `userQuery` holds the user's question, and
// env.VECTORIZE is a Vectorize index whose vectors store their source text in metadata.text.
// 1. Generate embedding for query
const embeddings = await env.AI.run('@cf/baai/bge-base-en-v1.5', { text: [userQuery] });
// 2. Search Vectorize (metadata is only included in matches when requested)
const matches = await env.VECTORIZE.query(embeddings.data[0], { topK: 3, returnMetadata: 'all' });
// 3. Build context
const context = matches.matches.map((m) => String(m.metadata?.text ?? '')).join('\n\n');
// 4. Generate with context
const stream = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
  messages: [
    { role: 'system', content: `Answer using this context:\n${context}` },
    { role: 'user', content: userQuery },
  ],
  stream: true,
});
return new Response(stream, { headers: { 'content-type': 'text/event-stream' } });
```
📄 More patterns: load `references/best-practices.md` for structured output, image generation, multi-model consensus, and production patterns.

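The query above assumes each stored vector carries its source text in `metadata.text`. A sketch of the ingestion side, assuming the chunks are embedded with the same `@cf/baai/bge-base-en-v1.5` model and written with the Vectorize binding's `upsert()`; the `toVectorizeRecords` helper and the `doc-<n>` ID scheme are hypothetical.

```typescript
// Shape of a record passed to a Vectorize index's upsert(): id, values, optional metadata.
interface VectorRecord {
  id: string;
  values: number[];
  metadata: { text: string };
}

// Pairs each text chunk with its embedding. `embeddings.data[i]` is assumed to
// correspond to `chunks[i]`, since the model returns vectors in input order.
function toVectorizeRecords(
  chunks: string[],
  embeddings: { shape: number[]; data: number[][] },
  idPrefix = 'doc',
): VectorRecord[] {
  if (embeddings.data.length !== chunks.length) {
    throw new Error(`got ${embeddings.data.length} vectors for ${chunks.length} chunks`);
  }
  return chunks.map((text, i) => ({
    id: `${idPrefix}-${i}`,
    values: embeddings.data[i],
    metadata: { text },
  }));
}
```

Inside a Worker this would be used roughly as `const embeddings = await env.AI.run('@cf/baai/bge-base-en-v1.5', { text: chunks }); await env.VECTORIZE.upsert(toVectorizeRecords(chunks, embeddings));`. Using the same model for ingestion and queries keeps the vectors comparable.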

AI Gateway Integration


Enable caching, logging, and cost tracking with AI Gateway:
```typescript
const response = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', { prompt: 'Hello' }, {
  gateway: { id: 'my-gateway', skipCache: false },
});
```
Benefits: cost tracking, response caching (repeated identical requests are served from the cache instead of re-running the model, so savings grow with the share of repeated queries), request logging, rate limiting, and analytics.


Rate Limits & Pricing


Information last verified: 2025-01-14
Rate limits and pricing vary significantly by model. Always check the official Cloudflare Workers AI documentation for the most current information.
  • Free tier: 10,000 neurons/day
  • Paid tier: $0.011 per 1,000 neurons
📄 Per-model details: see `references/models-catalog.md` for the specific rate limits and pricing of each model.

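Using the figures above (10,000 free neurons per day, then $0.011 per 1,000 neurons), daily cost can be estimated from neuron usage. A rough sketch: it assumes the free allocation applies per day, ignores any other plan charges, and the constant and function names are hypothetical. Neurons consumed per request vary by model.

```typescript
const FREE_NEURONS_PER_DAY = 10_000;
const USD_PER_1000_NEURONS = 0.011;

// Estimated USD cost for one day's neuron usage, after the daily free allocation.
function estimateDailyCostUsd(neuronsUsed: number): number {
  const billable = Math.max(0, neuronsUsed - FREE_NEURONS_PER_DAY);
  return (billable / 1000) * USD_PER_1000_NEURONS;
}
```

For example, 110,000 neurons in a day leaves 100,000 billable neurons, or about $1.10.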

Production Checklist


Essential before deploying:
  • Enable AI Gateway for cost tracking
  • Implement streaming for text generation
  • Add rate limit retry with exponential backoff
  • Validate input length (prevent token limit errors)
  • Add input sanitization (prevent prompt injection)
📄 Full checklist: load `references/best-practices.md` for the complete production checklist, error handling patterns, monitoring, and cost optimization.

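For the retry item, a minimal sketch of exponential backoff with jitter. How a rate-limit error surfaces from `env.AI.run()` is not specified here, so the caller supplies an `isRetryable` predicate; `withRetry`, `backoffDelayMs`, and the default delays are hypothetical.

```typescript
// Exponential backoff: baseMs * 2^attempt, capped at maxMs (attempt starts at 0).
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8_000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Retries `fn` while `isRetryable(error)` is true, up to `maxRetries` extra attempts.
// Each wait is a random value between half and the full backoff delay ("equal jitter").
async function withRetry<T>(
  fn: () => Promise<T>,
  isRetryable: (error: unknown) => boolean,
  maxRetries = 4,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt >= maxRetries || !isRetryable(error)) throw error;
      const delay = backoffDelayMs(attempt);
      const jittered = delay / 2 + Math.random() * (delay / 2);
      await new Promise((resolve) => setTimeout(resolve, jittered));
    }
  }
}
```

A call might look like `await withRetry(() => env.AI.run(model, inputs), isRateLimitError)`, where `isRateLimitError` is an app-specific (hypothetical) check. Jitter spreads retries out so many clients hitting the same limit do not retry in lockstep.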

External SDK Integrations


Workers AI supports the OpenAI SDK (through an OpenAI-compatible endpoint) and the Vercel AI SDK:
```typescript
import OpenAI from 'openai';
import { createWorkersAI } from 'workers-ai-provider';

// OpenAI SDK: point the client at the Workers AI OpenAI-compatible endpoint
const openai = new OpenAI({
  apiKey: env.CLOUDFLARE_API_KEY, // a Cloudflare API token with Workers AI access
  baseURL: `https://api.cloudflare.com/client/v4/accounts/${env.CLOUDFLARE_ACCOUNT_ID}/ai/v1`,
});

// Vercel AI SDK: native integration through the AI binding
const workersai = createWorkersAI({ binding: env.AI });
```
📄 Full integration guide: load `references/integrations.md` for OpenAI SDK, Vercel AI SDK, and REST API examples.

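Outside a Worker, models can also be called over the REST API. A sketch that builds the request, assuming the `POST /client/v4/accounts/{account_id}/ai/run/{model}` endpoint with a bearer API token; `buildRunRequest` is a hypothetical helper, and the account ID and token in the test are placeholders.

```typescript
// Builds a REST request to run a Workers AI model (endpoint shape assumed as described above).
function buildRunRequest(
  accountId: string,
  apiToken: string,
  model: string,
  inputs: Record<string, unknown>,
): Request {
  // Model IDs such as "@cf/meta/llama-3.1-8b-instruct" are appended to the path as-is.
  const url = `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${model}`;
  return new Request(url, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiToken}`,
      'content-type': 'application/json',
    },
    body: JSON.stringify(inputs),
  });
}
```

The result can be sent with `fetch(buildRunRequest(accountId, apiToken, '@cf/meta/llama-3.1-8b-instruct', { prompt: 'What is Cloudflare?' }))`; keep the token server-side rather than shipping it to browsers.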

Limits Summary


| Feature | Limit |
|---|---|
| Concurrent requests | No hard limit (rate limits apply) |
| Max input tokens | Varies by model (typically 2K-128K) |
| Max output tokens | Varies by model (typically 512-2048) |
| Streaming chunk size | ~1 KB |
| Image size (output) | ~5 MB |
| Request timeout | Workers limits apply (CPU time: 30 s by default, configurable up to 5 min) |
| Daily free neurons | 10,000 |
| Rate limits | See the "Rate Limits & Pricing" section |


When to Load References


| Reference file | Load when... |
|---|---|
| `references/models-catalog.md` | Choosing a model, checking rate limits, comparing model capabilities |
| `references/best-practices.md` | Production deployment, error handling, cost optimization, security |
| `references/integrations.md` | Using the OpenAI SDK, Vercel AI SDK, or REST API instead of the native binding |


References
