cloudflare-workers-ai
Cloudflare Workers AI
Status: Production Ready ✅
Last Updated: 2026-01-21
Dependencies: cloudflare-worker-base (for Worker setup)
Latest Versions: wrangler@4.58.0, @cloudflare/workers-types@4.20260109.0, workers-ai-provider@3.0.2
Recent Updates (2025):
- April 2025 - Performance: Llama 3.3 70B 2-4x faster (speculative decoding, prefix caching), BGE embeddings 2x faster
- April 2025 - Breaking Changes: max_tokens now correctly defaults to 256 (was not respected), BGE pooling parameter (cls NOT backwards compatible with mean)
- 2025 - New Models (14): Mistral 3.1 24B (vision+tools), Gemma 3 12B (128K context), EmbeddingGemma 300M, Llama 4 Scout, GPT-OSS 120B/20B, Qwen models (QwQ 32B, Coder 32B), Leonardo image gen, Deepgram Aura 2, Whisper v3 Turbo, IBM Granite, Nova 3
- 2025 - Platform: Context windows API change (tokens not chars), unit-based pricing with per-model granularity, workers-ai-provider v3.0.2 (AI SDK v5), LoRA rank up to 32 (was 8), 100 adapters per account
- October 2025: Model deprecations (use Llama 4, GPT-OSS instead)
Quick Start (5 Minutes)
```typescript
// 1. Add AI binding to wrangler.jsonc
{ "ai": { "binding": "AI" } }

// 2. Run model with streaming (recommended)
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const stream = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
      messages: [{ role: 'user', content: 'Tell me a story' }],
      stream: true, // Always stream for text generation!
    });
    return new Response(stream, {
      headers: { 'content-type': 'text/event-stream' },
    });
  },
};
```

Why streaming? It prevents buffering the full response in memory, gives a faster time-to-first-token, and avoids Worker timeout issues.
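The event-stream body can be consumed incrementally on the client. A minimal parser for the `data:` lines, a sketch that assumes the SSE payload shape Workers AI emits (JSON objects with a `response` field, terminated by `data: [DONE]`):

```typescript
// Parses the "data: ..." lines of an SSE chunk into token strings.
// Assumes each payload is a JSON object with a `response` string field,
// and that the stream ends with `data: [DONE]`.
function parseSSEChunk(chunk: string): string[] {
  const tokens: string[] = [];
  for (const line of chunk.split('\n')) {
    if (!line.startsWith('data: ')) continue;
    const payload = line.slice(6).trim();
    if (payload === '[DONE]') break;
    try {
      const parsed = JSON.parse(payload);
      if (typeof parsed.response === 'string') tokens.push(parsed.response);
    } catch {
      // Ignore JSON split across chunk boundaries in this simple sketch
    }
  }
  return tokens;
}
```

In a real client you would buffer partial lines between reads of the `ReadableStream` rather than dropping them.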
Known Issues Prevention
This skill prevents 7 documented issues:
Issue #1: Context Window Validation Changed to Tokens (February 2025)
Error: `"Exceeded character limit"` despite the model supporting a larger context
Source: Cloudflare Changelog
Why It Happens: Before February 2025, Workers AI validated prompts against a hard 6144-character limit, even for models with larger token-based context windows (e.g., Mistral with 32K tokens). After the update, validation switched to token-based counting.
Prevention: Count tokens (not characters) when checking context window limits.

```typescript
import { encode } from 'gpt-tokenizer'; // or a model-specific tokenizer

const tokens = encode(prompt);
const contextWindow = 32768; // Model's max tokens (check docs)
const maxResponseTokens = 2048;

if (tokens.length + maxResponseTokens > contextWindow) {
  throw new Error(`Prompt exceeds context window: ${tokens.length} tokens`);
}

const response = await env.AI.run('@cf/mistral/mistral-7b-instruct-v0.2', {
  messages: [{ role: 'user', content: prompt }],
  max_tokens: maxResponseTokens,
});
```
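When pulling in a tokenizer dependency isn't an option, a rough character-based pre-check can be used instead. The ~3-4 characters-per-token ratio below is a heuristic assumption for English text, not a Workers AI guarantee; dividing by 3 deliberately overestimates to stay conservative:

```typescript
// Heuristic: ~3-4 chars per token for English text (assumption; varies by model).
// Dividing by 3 overestimates the count so the check fails safe.
function approxTokenCount(text: string): number {
  return Math.ceil(text.length / 3);
}

function fitsContextWindow(
  prompt: string,
  contextWindow: number,
  maxResponseTokens: number
): boolean {
  return approxTokenCount(prompt) + maxResponseTokens <= contextWindow;
}

// A 3,000-character prompt is treated as ~1,000 tokens
const ok = fitsContextWindow('x'.repeat(3000), 32768, 2048);
```

Use a real tokenizer before trusting a borderline result; the heuristic is only a cheap first filter.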
Issue #2: Neuron Consumption Discrepancies in Dashboard
Error: Dashboard neuron usage significantly exceeds expected token-based calculations
Source: Cloudflare Community Discussion
Why It Happens: Users report the dashboard showing neuron consumption in the hundreds of millions for token usage in the thousands, particularly with AutoRAG features and certain models. The discrepancy between expected neuron consumption (based on pricing docs) and actual dashboard metrics is not fully documented.
Prevention: Monitor neuron usage via AI Gateway logs and correlate with requests. File support ticket if consumption significantly exceeds expectations.
```typescript
// Use AI Gateway for detailed request logging
const response = await env.AI.run(
  '@cf/meta/llama-3.1-8b-instruct',
  { messages: [{ role: 'user', content: query }] },
  { gateway: { id: 'my-gateway' } }
);

// Monitor usage at: https://dash.cloudflare.com → AI → Workers AI
// Compare neuron usage with token counts
// File a support ticket with details if the discrepancy persists
```
来源:Cloudflare社区讨论
问题原因:用户反馈,在使用AutoRAG功能和特定模型时,控制台显示的神经元消耗达到数亿级别,而实际令牌使用量仅为数千级别。基于定价文档计算的预期神经元消耗与控制台实际指标之间的差异尚未完全记录。
预防措施:通过AI Gateway日志监控神经元使用情况,并与请求关联。如果消耗远高于预期,提交支持工单。
typescript
// 使用AI Gateway进行详细请求日志记录
const response = await env.AI.run(
'@cf/meta/llama-3.1-8b-instruct',
{ messages: [{ role: 'user', content: query }] },
{ gateway: { id: 'my-gateway' } }
);
// 在以下地址监控控制台:https://dash.cloudflare.com → AI → Workers AI
// 对比神经元使用量与令牌计数
// 如果差异持续存在,提交包含详细信息的支持工单Issue #3: AI Binding Requires Remote or Latest Tooling in Local Dev
问题3:本地开发中AI绑定需要远程或最新工具
Error: `"MiniflareCoreError: wrapped binding module can't be resolved (internal modules only)"`
Source: GitHub Issue #6796
Why It Happens: When using Workers AI bindings with Miniflare in local development (particularly with custom Vite plugins), the AI binding depends on external workers that aren't properly exposed by older versions of `unstable_getMiniflareWorkerOptions`. The error occurs when Miniflare can't resolve the internal AI worker module.
Prevention: Use remote bindings for AI in local dev, or update to the latest @cloudflare/vite-plugin.

```jsonc
// wrangler.jsonc - Option 1: Use remote AI binding in local dev
{
  "ai": { "binding": "AI" },
  "dev": {
    "remote": true // Use the production AI binding locally
  }
}
```
Option 2: Update to latest tooling
```bash
npm install -D @cloudflare/vite-plugin@latest
```
Option 3: Use wrangler dev instead of custom Miniflare
```bash
npm run dev
```

Issue #4: Flux Image Generation NSFW Filter False Positives
Error: `"AiError: Input prompt contains NSFW content (code 3030)"` for innocent prompts
Source: Cloudflare Community Discussion
Why It Happens: Flux image generation models (`@cf/black-forest-labs/flux-1-schnell`) sometimes trigger false-positive NSFW content errors even with innocent single-word prompts like "hamburger". The NSFW filter can be overly sensitive without context.
Prevention: Add descriptive context around potential trigger words instead of using single-word prompts.

```typescript
// ❌ May trigger error 3030
const blocked = await env.AI.run('@cf/black-forest-labs/flux-1-schnell', {
  prompt: 'hamburger', // Single word triggers the filter
});

// ✅ Add context to avoid false positives
const image = await env.AI.run('@cf/black-forest-labs/flux-1-schnell', {
  prompt: 'A photo of a delicious large hamburger on a plate with lettuce and tomato',
  num_steps: 4,
});
```
Issue #5: Image Generation Error 1000 - Missing num_steps Parameter
Error: `"Error: unexpected type 'int32' with value 'undefined' (code 1000)"`
Source: Cloudflare Community Discussion
Why It Happens: Image generation API calls return error code 1000 when the `num_steps` parameter is not provided, even though the documentation suggests it's optional. The parameter is actually required for most Flux models.
Prevention: Always include `num_steps` for image generation models (typically 4 for Flux Schnell).

```typescript
// ✅ Always include num_steps for image generation
const image = await env.AI.run('@cf/black-forest-labs/flux-1-schnell', {
  prompt: 'A beautiful sunset over mountains',
  num_steps: 4, // Required - typically 4 for Flux Schnell
});
// Note: FLUX.2 [klein] 4B has fixed steps=4 (cannot be adjusted)
```
Issue #6: Zod v4 Incompatibility with Structured Output Tools
Error: Syntax errors and failed transpilation when using Stagehand with Zod v4
Source: GitHub Issue #10798
Why It Happens: Stagehand (browser automation) and some structured output examples in Workers AI fail with Zod v4 (now the default). The underlying `zod-to-json-schema` library doesn't yet support Zod v4, causing transpilation failures.
Prevention: Pin Zod to v3 until `zod-to-json-schema` supports v4.
Install Zod v3 specifically
```bash
npm install zod@3
```
Or pin in package.json
```jsonc
{
  "dependencies": {
    "zod": "~3.23.8" // Pin to v3 for compatibility
  }
}
```
Issue #7: AI Gateway Cache Headers for Per-Request Control
Not an error, but important feature: AI Gateway supports per-request cache control via HTTP headers for custom TTL, cache bypass, and custom cache keys beyond dashboard defaults.
Source: AI Gateway Caching Documentation
Use When: You need different caching behavior for different requests (e.g., 1 hour for expensive queries, skip cache for real-time data).
Implementation: See AI Gateway Integration section below for header usage.
API Reference
```typescript
env.AI.run(
  model: string,
  inputs: ModelInputs,
  options?: { gateway?: { id: string; skipCache?: boolean } }
): Promise<ModelOutput | ReadableStream>
```

Model Selection Guide (Updated 2025)
Text Generation (LLMs)
| Model | Best For | Rate Limit | Size | Notes |
|---|---|---|---|---|
| 2025 Models | | | | |
| Llama 4 Scout | Latest Llama, general purpose | 300/min | 17B | NEW 2025 |
| GPT-OSS 120B | Largest open-source GPT | 300/min | 120B | NEW 2025 |
| GPT-OSS 20B | Smaller open-source GPT | 300/min | 20B | NEW 2025 |
| Gemma 3 12B | 128K context, 140+ languages | 300/min | 12B | NEW 2025, vision |
| Mistral 3.1 24B | Vision + tool calling | 300/min | 24B | NEW 2025 |
| Qwen QwQ 32B | Reasoning, complex tasks | 300/min | 32B | NEW 2025 |
| Qwen Coder 32B | Coding specialist | 300/min | 32B | NEW 2025 |
| - | Fast quantized | 300/min | 30B | NEW 2025 |
| IBM Granite Micro | Small, efficient | 300/min | Micro | NEW 2025 |
| Performance (2025) | | | | |
| Llama 3.3 70B Fast | 2-4x faster (2025 update) | 300/min | 70B | Speculative decoding |
| Llama 3.1 8B FP8 Fast | Fast 8B variant | 300/min | 8B | - |
| Standard Models | | | | |
| Llama 3.1 8B | General purpose | 300/min | 8B | - |
| Llama 3.2 1B | Ultra-fast, simple tasks | 300/min | 1B | - |
| - | Coding, technical | 300/min | 32B | - |
Text Embeddings (2x Faster - 2025)
| Model | Dimensions | Best For | Rate Limit | Notes |
|---|---|---|---|---|
| EmbeddingGemma 300M | 768 | Best-in-class RAG | 3000/min | NEW 2025 |
| BGE-base en-v1.5 | 768 | General RAG (2x faster) | 3000/min | `pooling: "cls"` recommended |
| BGE-large en-v1.5 | 1024 | High accuracy (2x faster) | 1500/min | `pooling: "cls"` recommended |
| BGE-small en-v1.5 | 384 | Fast, low storage (2x faster) | 3000/min | `pooling: "cls"` recommended |
| Qwen3 Embedding 0.6B | 768 | Qwen embeddings | 3000/min | NEW 2025 |

CRITICAL (2025): BGE models now support the `pooling: "cls"` parameter (recommended), but it is NOT backwards compatible with `pooling: "mean"` (the default).
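Because `cls`-pooled and `mean`-pooled vectors are incompatible, and models differ in dimensionality, it pays to fail fast when vectors from different sources get mixed. An illustrative cosine-similarity helper with a dimension guard:

```typescript
// Cosine similarity for embedding vectors. Throws on dimension mismatch,
// which catches accidental mixing of models (e.g. 768-dim BGE-base vs
// 1024-dim BGE-large). Note it cannot detect cls/mean pooling mismatches -
// those produce same-dimension but incomparable vectors, so track pooling
// mode in your index metadata.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```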
Image Generation
| Model | Best For | Rate Limit | Notes |
|---|---|---|---|
| Flux 1 Schnell | High quality, photorealistic | 720/min | ⚠️ See warnings below |
| Leonardo Lucid | Leonardo AI style | 720/min | NEW 2025, requires num_steps |
| Leonardo Phoenix | Leonardo AI variant | 720/min | NEW 2025, requires num_steps |
| - | General purpose | 720/min | Requires num_steps |

⚠️ Common Image Generation Issues:
- Error 1000: Always include the `num_steps: 4` parameter (required despite docs suggesting it's optional)
- Error 3030 (NSFW filter): Single words like "hamburger" may trigger false positives - add descriptive context to prompts

```typescript
// ✅ Correct pattern for image generation
const image = await env.AI.run('@cf/black-forest-labs/flux-1-schnell', {
  prompt: 'A photo of a delicious hamburger on a plate with fresh vegetables',
  num_steps: 4, // Required to avoid error 1000
});
// Descriptive context helps avoid NSFW false positives (error 3030)
```
Vision Models
| Model | Best For | Rate Limit | Notes |
|---|---|---|---|
| - | Image understanding | 720/min | - |
| Gemma 3 12B | Vision + text (128K context) | 300/min | NEW 2025 |
Audio Models (2025)
| Model | Type | Rate Limit | Notes |
|---|---|---|---|
| Deepgram Aura 2 | Text-to-speech (English) | 720/min | NEW 2025 |
| - | Text-to-speech (Spanish) | 720/min | NEW 2025 |
| Deepgram Nova 3 | Speech-to-text (+ WebSocket) | 720/min | NEW 2025 |
| Whisper v3 Turbo | Speech-to-text (faster) | 720/min | NEW 2025 |
Common Patterns
RAG (Retrieval Augmented Generation)
```typescript
// 1. Generate embeddings
const embeddings = await env.AI.run('@cf/baai/bge-base-en-v1.5', { text: [userQuery] });

// 2. Search Vectorize
const matches = await env.VECTORIZE.query(embeddings.data[0], { topK: 3 });
const context = matches.matches.map((m) => m.metadata.text).join('\n\n');

// 3. Generate with context
const response = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
  messages: [
    { role: 'system', content: `Answer using this context:\n${context}` },
    { role: 'user', content: userQuery },
  ],
  stream: true,
});
```
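Before documents can be embedded and inserted into Vectorize for a RAG pipeline, they usually need to be chunked. A minimal fixed-size splitter with overlap; the sizes here are illustrative and should be tuned to the embedding model's input limits:

```typescript
// Minimal fixed-size chunker with overlap for embedding pipelines.
// chunkSize and overlap are measured in characters and are illustrative
// defaults, not Workers AI requirements.
function chunkText(text: string, chunkSize = 500, overlap = 50): string[] {
  if (overlap >= chunkSize) throw new Error('overlap must be smaller than chunkSize');
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // Last chunk reached the end
  }
  return chunks;
}
```

Production chunkers usually split on sentence or paragraph boundaries instead of raw character offsets, but the sliding-window shape is the same.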
Structured Output with Zod
```typescript
import { z } from 'zod';

const Schema = z.object({ name: z.string(), items: z.array(z.string()) });

const response = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
  messages: [{
    role: 'user',
    content: `Generate JSON matching: ${JSON.stringify(Schema.shape)}`
  }],
});

const validated = Schema.parse(JSON.parse(response.response));
```
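Models frequently wrap JSON output in markdown fences or surrounding prose, which makes a bare `JSON.parse` brittle. An illustrative extraction helper to run before schema validation:

```typescript
// Strips markdown code fences and extracts the first JSON object from raw
// model output, so the result can be fed to JSON.parse and then Schema.parse.
// Illustrative sketch; it assumes a single top-level JSON object.
function extractJSON(raw: string): unknown {
  const fenced = raw.match(/```(?:json)?\s*([\s\S]*?)```/);
  const candidate = fenced ? fenced[1] : raw;
  const start = candidate.indexOf('{');
  const end = candidate.lastIndexOf('}');
  if (start === -1 || end === -1) {
    throw new Error('No JSON object found in model output');
  }
  return JSON.parse(candidate.slice(start, end + 1));
}
```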
AI Gateway Integration
Provides caching, logging, cost tracking, and analytics for AI requests.
Basic Gateway Usage
```typescript
const response = await env.AI.run(
  '@cf/meta/llama-3.1-8b-instruct',
  { prompt: 'Hello' },
  { gateway: { id: 'my-gateway', skipCache: false } }
);

// Access logs and send feedback
const gateway = env.AI.gateway('my-gateway');
await gateway.patchLog(env.AI.aiGatewayLogId, {
  feedback: { rating: 1, comment: 'Great response' },
});
```
Per-Request Cache Control (Advanced)
Override default cache behavior with HTTP headers for fine-grained control:
```typescript
// Custom cache TTL (1 hour for expensive queries)
const cachedResponse = await fetch(
  `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayId}/workers-ai/@cf/meta/llama-3.1-8b-instruct`,
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${env.CLOUDFLARE_API_KEY}`,
      'Content-Type': 'application/json',
      'cf-aig-cache-ttl': '3600', // 1 hour in seconds (min: 60, max: 2592000)
    },
    body: JSON.stringify({
      messages: [{ role: 'user', content: prompt }],
    }),
  }
);

// Skip cache for real-time data
const freshResponse = await fetch(gatewayUrl, {
  headers: {
    'cf-aig-skip-cache': 'true', // Bypass cache entirely
  },
  // ...
});

// Check whether a response was served from cache
const cacheStatus = freshResponse.headers.get('cf-aig-cache-status'); // "HIT" or "MISS"
```

Available Cache Headers:
- `cf-aig-cache-ttl`: Set a custom TTL in seconds (60s to 1 month)
- `cf-aig-skip-cache`: Bypass the cache entirely (`'true'`)
- `cf-aig-cache-key`: Custom cache key for granular control
- `cf-aig-cache-status`: Response header showing `"HIT"` or `"MISS"`

Benefits: Cost tracking, caching (reduces duplicate inference), logging, rate limiting, analytics, per-request cache customization.
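When prompts differ only in whitespace or casing, a normalized `cf-aig-cache-key` lets semantically identical requests share one cache entry. A sketch using FNV-1a as the hash; the hash choice is illustrative (any stable hash works), while the header name comes from the list above:

```typescript
// FNV-1a: a small, stable, non-cryptographic string hash.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

// Builds a deterministic cf-aig-cache-key value: prompts that differ only
// in surrounding whitespace or casing map to the same cache entry.
function cacheKeyFor(model: string, prompt: string): string {
  return `${model}:${fnv1a(prompt.trim().toLowerCase())}`;
}
```

Pass the result in the `cf-aig-cache-key` header of the gateway request.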
Rate Limits & Pricing (Updated 2025)
Rate Limits (per minute)
| Task Type | Default Limit | Notes |
|---|---|---|
| Text Generation | 300/min | Some fast models: 400-1500/min |
| Text Embeddings | 3000/min | BGE-large: 1500/min |
| Image Generation | 720/min | All image models |
| Vision Models | 720/min | Image understanding |
| Audio (TTS/STT) | 720/min | Deepgram, Whisper |
| Translation | 720/min | M2M100, Opus MT |
| Classification | 2000/min | Text classification |
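To stay under the per-minute limits above without relying on server-side 429s, requests can be gated client-side. A minimal sliding-window limiter (illustrative; Workers AI enforces the real limit server-side, so treat this as a smoothing layer, not a guarantee):

```typescript
// Minimal sliding-window rate limiter: allows at most `limit` acquisitions
// per rolling window. Timestamps outside the window are evicted on each call.
class RateWindow {
  private stamps: number[] = [];
  constructor(private limit: number, private windowMs = 60_000) {}

  tryAcquire(now = Date.now()): boolean {
    this.stamps = this.stamps.filter((t) => now - t < this.windowMs);
    if (this.stamps.length >= this.limit) return false;
    this.stamps.push(now);
    return true;
  }
}

// e.g. gate text-generation calls at 300/min
const textGenLimiter = new RateWindow(300);
```

When `tryAcquire` returns `false`, queue or reject the request rather than calling `env.AI.run`.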
Pricing (Unit-Based, Billed in Neurons - 2025)
Free Tier:
- 10,000 neurons per day
- Resets daily at 00:00 UTC
Paid Tier ($0.011 per 1,000 neurons):
- 10,000 neurons/day included
- Unlimited usage above free allocation
2025 Model Costs (per 1M tokens):
| Model | Input | Output | Notes |
|---|---|---|---|
| 2025 Models | |||
| Llama 4 Scout 17B | $0.270 | $0.850 | NEW 2025 |
| GPT-OSS 120B | $0.350 | $0.750 | NEW 2025 |
| GPT-OSS 20B | $0.200 | $0.300 | NEW 2025 |
| Gemma 3 12B | $0.345 | $0.556 | NEW 2025 |
| Mistral 3.1 24B | $0.351 | $0.555 | NEW 2025 |
| Qwen QwQ 32B | $0.660 | $1.000 | NEW 2025 |
| Qwen Coder 32B | $0.660 | $1.000 | NEW 2025 |
| IBM Granite Micro | $0.017 | $0.112 | NEW 2025 |
| EmbeddingGemma 300M | $0.012 | N/A | NEW 2025 |
| Qwen3 Embedding 0.6B | $0.012 | N/A | NEW 2025 |
| Performance (2025) | |||
| Llama 3.3 70B Fast | $0.293 | $2.253 | 2-4x faster |
| Llama 3.1 8B FP8 Fast | $0.045 | $0.384 | Fast variant |
| Standard Models | |||
| Llama 3.2 1B | $0.027 | $0.201 | - |
| Llama 3.1 8B | $0.282 | $0.827 | - |
| Deepseek R1 32B | $0.497 | $4.881 | - |
| BGE-base (2x faster) | $0.067 | N/A | 2025 speedup |
| BGE-large (2x faster) | $0.204 | N/A | 2025 speedup |
| Image Models (2025) | | | |
| Flux 1 Schnell | $0.0000528 per 512x512 tile | N/A | - |
| Leonardo Lucid | $0.006996 per 512x512 tile | N/A | NEW 2025 |
| Leonardo Phoenix | $0.005830 per 512x512 tile | N/A | NEW 2025 |
| Audio Models (2025) | | | |
| Deepgram Aura 2 | $0.030 per 1k chars | N/A | NEW 2025 |
| Deepgram Nova 3 | $0.0052 per audio min | N/A | NEW 2025 |
| Whisper v3 Turbo | $0.0005 per audio min | N/A | NEW 2025 |
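The per-1M-token prices above can be turned into a rough per-request cost estimate. The price entries below are copied from the table; actual billing happens in neurons, so treat this as a planning tool rather than an invoice predictor:

```typescript
// Rough USD cost estimate from the per-1M-token prices in the table above.
// Real billing is neuron-based; this only approximates relative cost.
type Price = { input: number; output: number }; // USD per 1M tokens

const prices: Record<string, Price> = {
  '@cf/meta/llama-3.1-8b-instruct': { input: 0.282, output: 0.827 },
  'llama-3.3-70b-fast': { input: 0.293, output: 2.253 }, // illustrative key
};

function estimateCostUSD(model: string, inputTokens: number, outputTokens: number): number {
  const p = prices[model];
  if (!p) throw new Error(`No price entry for ${model}`);
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}

// e.g. 1,000 input + 500 output tokens on Llama 3.1 8B ≈ $0.0007
const cost = estimateCostUSD('@cf/meta/llama-3.1-8b-instruct', 1000, 500);
```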
Error Handling with Retry
```typescript
async function runAIWithRetry(
  env: Env,
  model: string,
  inputs: any,
  maxRetries = 3
): Promise<any> {
  let lastError: Error;
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await env.AI.run(model, inputs);
    } catch (error) {
      lastError = error as Error;
      // Rate limit - retry with exponential backoff
      if (lastError.message.toLowerCase().includes('rate limit')) {
        await new Promise((resolve) => setTimeout(resolve, Math.pow(2, i) * 1000));
        continue;
      }
      throw error; // Other errors - fail immediately
    }
  }
  throw lastError!;
}
```
OpenAI Compatibility
```typescript
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: env.CLOUDFLARE_API_KEY,
  baseURL: `https://api.cloudflare.com/client/v4/accounts/${env.ACCOUNT_ID}/ai/v1`,
});

// Chat completions
await openai.chat.completions.create({
  model: '@cf/meta/llama-3.1-8b-instruct',
  messages: [{ role: 'user', content: 'Hello!' }],
});
```

Endpoints: `/v1/chat/completions`, `/v1/embeddings`
Vercel AI SDK Integration (workers-ai-provider v3.0.2)
```typescript
import { createWorkersAI } from 'workers-ai-provider'; // v3.0.2 with AI SDK v5
import { generateText, streamText } from 'ai';

const workersai = createWorkersAI({ binding: env.AI });

// Generate or stream
await generateText({
  model: workersai('@cf/meta/llama-3.1-8b-instruct'),
  prompt: 'Write a poem',
});
```
import { createWorkersAI } from 'workers-ai-provider'; // 配合AI SDK v5的v3.0.2版本
import { generateText, streamText } from 'ai';
const workersai = createWorkersAI({ binding: env.AI });
// 生成或流式传输结果
await generateText({
model: workersai('@cf/meta/llama-3.1-8b-instruct'),
prompt: '写一首诗',
});Community Tips
社区技巧
Note: These tips come from community discussions and production experience.
Hono Framework Streaming Pattern
When using Workers AI streaming with Hono, return the stream directly as a Response (not through Hono's streaming utilities):
```typescript
import { Hono } from 'hono';

type Bindings = { AI: Ai };
const app = new Hono<{ Bindings: Bindings }>();

app.post('/chat', async (c) => {
  const { prompt } = await c.req.json();
  const stream = await c.env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
    messages: [{ role: 'user', content: prompt }],
    stream: true,
  });
  // Return the stream directly (not c.stream())
  return new Response(stream, {
    headers: {
      'content-type': 'text/event-stream',
      'cache-control': 'no-cache',
      'connection': 'keep-alive',
    },
  });
});
```

Source: Hono Discussion #2409
Troubleshooting Unexplained AI Binding Failures
If experiencing unexplained Workers AI failures:
```bash
# 1. Check wrangler version
npx wrangler --version

# 2. Clear wrangler cache
rm -rf ~/.wrangler

# 3. Update to latest stable
npm install -D wrangler@latest
```

4. Check local network/firewall settings: some corporate firewalls block Workers AI endpoints.
**Note**: Most "version incompatibility" issues turn out to be network configuration problems.
---
References
- Workers AI Docs
- Models Catalog
- AI Gateway
- Pricing
- Changelog
- LoRA Adapters
- MCP Tool: Use `mcp__cloudflare-docs__search_cloudflare_documentation` for the latest docs