awesome-free-llm-apis


Awesome Free LLM APIs


Skill by ara.so — Daily 2026 Skills collection.
A curated list of LLM providers offering permanent free tiers for text inference — no trial credits, no expiry. All endpoints listed are OpenAI SDK-compatible unless noted.


Provider Overview


Provider APIs (trained/fine-tuned by the company)


| Provider | Notable Models | Rate Limits | Region |
|---|---|---|---|
| Cohere | Command A, Command R+, Aya Expanse 32B | 20 RPM, 1K req/mo | 🇺🇸 |
| Google Gemini | Gemini 2.5 Pro, Flash, Flash-Lite | 5–15 RPM, 100–1K RPD | 🇺🇸 (not EU/UK/CH) |
| Mistral AI | Mistral Large 3, Small 3.1, Ministral 8B | 1 req/s, 1B tok/mo | 🇪🇺 |
| Zhipu AI | GLM-4.7-Flash, GLM-4.5-Flash, GLM-4.6V-Flash | Undocumented | 🇨🇳 |

Inference Providers (host open-weight models)


| Provider | Notable Models | Rate Limits | Region |
|---|---|---|---|
| Cerebras | Llama 3.3 70B, Qwen3 235B, GPT-OSS-120B | 30 RPM, 14,400 RPD | 🇺🇸 |
| Cloudflare Workers AI | Llama 3.3 70B, Qwen QwQ 32B | 10K neurons/day | 🇺🇸 |
| GitHub Models | GPT-4o, Llama 3.3 70B, DeepSeek-R1 | 10–15 RPM, 50–150 RPD | 🇺🇸 |
| Groq | Llama 3.3 70B, Llama 4 Scout, Kimi K2 | 30 RPM, 1K RPD | 🇺🇸 |
| Hugging Face | Llama 3.3 70B, Qwen2.5 72B, Mistral 7B | $0.10/mo free credits | 🇺🇸 |
| Kluster AI | DeepSeek-R1, Llama 4 Maverick, Qwen3-235B | Undocumented | 🇺🇸 |
| LLM7.io | DeepSeek R1, Flash-Lite, Qwen2.5 Coder | 30 RPM (120 with token) | 🇬🇧 |
| NVIDIA NIM | Llama 3.3 70B, Mistral Large, Qwen3 235B | 40 RPM | 🇺🇸 |
| Ollama Cloud | DeepSeek-V3.2, Qwen3.5, Kimi-K2.5 | 1 concurrent, light usage | 🇺🇸 |
| OpenRouter | DeepSeek R1, Llama 3.3 70B, GPT-OSS-120B | 20 RPM, 50 RPD (1K with $10+) | 🇺🇸 |


Getting API Keys


Each provider has its own key management page. Store keys as environment variables; never hardcode them:

```bash
export GROQ_API_KEY="your_groq_key"
export GEMINI_API_KEY="your_gemini_key"
export OPENROUTER_API_KEY="your_openrouter_key"
export MISTRAL_API_KEY="your_mistral_key"
export COHERE_API_KEY="your_cohere_key"
export CEREBRAS_API_KEY="your_cerebras_key"
export GITHUB_TOKEN="your_github_pat"
export HF_TOKEN="your_huggingface_token"
export NVIDIA_API_KEY="your_nvidia_key"
export CLOUDFLARE_API_TOKEN="your_cf_token"
export CLOUDFLARE_ACCOUNT_ID="your_cf_account_id"
```
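Before wiring these into an app, it can help to verify which keys are actually set. A minimal sketch (the `PROVIDER_KEYS` mapping and `configured_providers` helper are illustrative names, not part of any SDK):

```python
import os

# Env var used by each provider in this list (illustrative mapping)
PROVIDER_KEYS = {
    "Groq": "GROQ_API_KEY",
    "Google Gemini": "GEMINI_API_KEY",
    "OpenRouter": "OPENROUTER_API_KEY",
    "Mistral AI": "MISTRAL_API_KEY",
    "Cohere": "COHERE_API_KEY",
    "Cerebras": "CEREBRAS_API_KEY",
    "GitHub Models": "GITHUB_TOKEN",
    "Hugging Face": "HF_TOKEN",
    "NVIDIA NIM": "NVIDIA_API_KEY",
}

def configured_providers() -> list[str]:
    """Return providers whose API key env var is set and non-empty."""
    return [name for name, var in PROVIDER_KEYS.items() if os.environ.get(var)]

if __name__ == "__main__":
    print("Configured:", ", ".join(configured_providers()) or "none")
```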

---

OpenAI SDK Integration


All providers (except Ollama Cloud) are OpenAI SDK-compatible; just swap the `base_url` and `api_key`.

Python

```python
from openai import OpenAI
import os

# ── Groq ──────────────────────────────────────────────────────────────────────
client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],
)
response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

# ── Google Gemini ─────────────────────────────────────────────────────────────
client = OpenAI(
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
    api_key=os.environ["GEMINI_API_KEY"],
)
response = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[{"role": "user", "content": "Explain quantum entanglement."}],
)

# ── Mistral AI ────────────────────────────────────────────────────────────────
client = OpenAI(
    base_url="https://api.mistral.ai/v1",
    api_key=os.environ["MISTRAL_API_KEY"],
)
response = client.chat.completions.create(
    model="mistral-small-latest",
    messages=[{"role": "user", "content": "Write a haiku about code."}],
)

# ── OpenRouter ────────────────────────────────────────────────────────────────
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)
response = client.chat.completions.create(
    model="deepseek/deepseek-r1",  # free model on OpenRouter
    messages=[{"role": "user", "content": "What is 2+2?"}],
    extra_headers={
        "HTTP-Referer": "https://yourapp.com",  # optional but recommended
        "X-Title": "My App",
    },
)

# ── Cerebras ──────────────────────────────────────────────────────────────────
client = OpenAI(
    base_url="https://api.cerebras.ai/v1",
    api_key=os.environ["CEREBRAS_API_KEY"],
)
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Tell me a joke."}],
)

# ── NVIDIA NIM ────────────────────────────────────────────────────────────────
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)
response = client.chat.completions.create(
    model="meta/llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Summarize this text."}],
)

# ── GitHub Models ─────────────────────────────────────────────────────────────
client = OpenAI(
    base_url="https://models.inference.ai.azure.com",
    api_key=os.environ["GITHUB_TOKEN"],
)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Draft an email."}],
)

# ── Cohere (OpenAI-compatible endpoint) ───────────────────────────────────────
client = OpenAI(
    base_url="https://api.cohere.com/compatibility/v1",
    api_key=os.environ["COHERE_API_KEY"],
)
response = client.chat.completions.create(
    model="command-a-03-2025",
    messages=[{"role": "user", "content": "Translate to French: Hello world"}],
)
```

JavaScript / TypeScript


```typescript
import OpenAI from "openai";

// ── Groq ──────────────────────────────────────────────────────────────────────
const groq = new OpenAI({
  baseURL: "https://api.groq.com/openai/v1",
  apiKey: process.env.GROQ_API_KEY,
});

const completion = await groq.chat.completions.create({
  model: "llama-3.3-70b-versatile",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(completion.choices[0].message.content);

// ── OpenRouter with free model router ────────────────────────────────────────
const openrouter = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.OPENROUTER_API_KEY,
  defaultHeaders: {
    "HTTP-Referer": "https://yourapp.com",
    "X-Title": "My App",
  },
});

// Use the free models router — automatically picks an available free model
const freeCompletion = await openrouter.chat.completions.create({
  model: "openrouter/free",
  messages: [{ role: "user", content: "What is the capital of France?" }],
});

// ── Mistral ───────────────────────────────────────────────────────────────────
const mistral = new OpenAI({
  baseURL: "https://api.mistral.ai/v1",
  apiKey: process.env.MISTRAL_API_KEY,
});

const mistralCompletion = await mistral.chat.completions.create({
  model: "mistral-small-latest",
  messages: [{ role: "user", content: "Explain async/await in JavaScript." }],
});
```

Cloudflare Workers AI


Cloudflare uses a slightly different auth pattern:

```python
import requests, os

ACCOUNT_ID = os.environ["CLOUDFLARE_ACCOUNT_ID"]
API_TOKEN  = os.environ["CLOUDFLARE_API_TOKEN"]

response = requests.post(
    f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/"
    "@cf/meta/llama-3.3-70b-instruct-fp8-fast",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json={"messages": [{"role": "user", "content": "What is Cloudflare Workers?"}]},
)
result = response.json()
print(result["result"]["response"])
```

```typescript
// Cloudflare Workers runtime (inside a Worker with an `AI` binding)
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const response = await env.AI.run("@cf/meta/llama-3.3-70b-instruct-fp8-fast", {
      messages: [{ role: "user", content: "Hello from Workers AI!" }],
    });
    return Response.json(response);
  },
};
```


Ollama Cloud (Non-OpenAI API)


Ollama Cloud uses the Ollama API format, not the OpenAI format:

```python
import requests, os

response = requests.post(
    "https://ollama.com/api/chat",
    headers={"Authorization": f"Bearer {os.environ['OLLAMA_API_KEY']}"},
    json={
        "model": "deepseek-v3.2",
        "messages": [{"role": "user", "content": "What is 2 + 2?"}],
        "stream": False,
    },
)
print(response.json()["message"]["content"])
```

Using the ollama Python client


import ollama, os
client = ollama.Client( host="https://ollama.com", headers={"Authorization": f"Bearer {os.environ['OLLAMA_API_KEY']}"}, ) response = client.chat( model="qwen3.5", messages=[{"role": "user", "content": "Write a poem about the sea."}], ) print(response["message"]["content"])


---

Hugging Face Inference API


```python
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://router.huggingface.co/novita/v3/openai",
    api_key=os.environ["HF_TOKEN"],
)

response = client.chat.completions.create(
    model="meta-llama/llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Summarize the theory of relativity."}],
    max_tokens=512,
)
print(response.choices[0].message.content)
```

Streaming Responses


```python
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],
)

stream = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Write a short story about a robot."}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```

```typescript
const stream = await groq.chat.completions.create({
  model: "llama-3.3-70b-versatile",
  messages: [{ role: "user", content: "Write a haiku." }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
```


Provider Fallback Pattern


Cycle through providers when rate limits are hit:

```python
from openai import OpenAI, RateLimitError
import os

PROVIDERS = [
    {
        "name": "Groq",
        "base_url": "https://api.groq.com/openai/v1",
        "api_key": os.environ.get("GROQ_API_KEY"),
        "model": "llama-3.3-70b-versatile",
    },
    {
        "name": "Cerebras",
        "base_url": "https://api.cerebras.ai/v1",
        "api_key": os.environ.get("CEREBRAS_API_KEY"),
        "model": "llama-3.3-70b",
    },
    {
        "name": "Mistral",
        "base_url": "https://api.mistral.ai/v1",
        "api_key": os.environ.get("MISTRAL_API_KEY"),
        "model": "mistral-small-latest",
    },
    {
        "name": "OpenRouter",
        "base_url": "https://openrouter.ai/api/v1",
        "api_key": os.environ.get("OPENROUTER_API_KEY"),
        "model": "openrouter/free",
    },
]

def chat_with_fallback(messages: list[dict], **kwargs) -> str:
    for provider in PROVIDERS:
        if not provider["api_key"]:
            continue
        try:
            client = OpenAI(
                base_url=provider["base_url"],
                api_key=provider["api_key"],
            )
            response = client.chat.completions.create(
                model=provider["model"],
                messages=messages,
                **kwargs,
            )
            return response.choices[0].message.content
        except RateLimitError:
            print(f"Rate limited on {provider['name']}, trying next...")
            continue
        except Exception as e:
            print(f"Error on {provider['name']}: {e}, trying next...")
            continue
    raise RuntimeError("All providers exhausted.")
```

Usage


```python
answer = chat_with_fallback(
    messages=[{"role": "user", "content": "What is the speed of light?"}]
)
print(answer)
```


---

OpenRouter Free Models Router


OpenRouter provides a special router that automatically selects available free models:

```python
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)
```

Use the free router — picks from 29+ free models automatically


```python
response = client.chat.completions.create(
    model="openrouter/free",
    messages=[{"role": "user", "content": "Explain recursion."}],
)
```

Or use model fallbacks for priority ordering


```python
response = client.chat.completions.create(
    model="deepseek/deepseek-r1",
    messages=[{"role": "user", "content": "Explain recursion."}],
    extra_body={
        "route": "fallback",
        "models": [
            "deepseek/deepseek-r1",
            "meta-llama/llama-3.3-70b-instruct:free",
            "openrouter/free",
        ],
    },
)
```

---

LangChain Integration


```python
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
import os
```

Works with any OpenAI-compatible provider


```python
llm = ChatOpenAI(
    model="llama-3.3-70b-versatile",
    openai_api_base="https://api.groq.com/openai/v1",
    openai_api_key=os.environ["GROQ_API_KEY"],
    temperature=0.7,
)
response = llm.invoke([HumanMessage(content="What are the SOLID principles?")])
print(response.content)
```

Gemini via LangChain


```python
gemini = ChatOpenAI(
    model="gemini-2.0-flash",
    openai_api_base="https://generativelanguage.googleapis.com/v1beta/openai/",
    openai_api_key=os.environ["GEMINI_API_KEY"],
)
```

---

Rate Limit Reference


| Provider | RPM | RPD | Notes |
|---|---|---|---|
| Groq | 30 | 1,000 | 14,400 RPD for Llama 3.1 8B only |
| Cerebras | 30 | 14,400 | |
| Gemini Flash | 15 | 1,500 | Not in EU/UK/CH |
| Gemini 2.5 Pro | 5 | 25 | Not in EU/UK/CH |
| GitHub Models | 10–15 | 50–150 | Varies by model tier |
| OpenRouter (free) | 20 | 50 | 1K RPD after $10+ purchase |
| Mistral | 1 req/s | | 1B tokens/month cap |
| NVIDIA NIM | 40 | | |
| Cloudflare Workers AI | | | 10K neurons/day |
| Cohere | 20 | | 1K requests/month |
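One way to stay under the per-minute caps above is a small client-side throttle. This is a minimal sliding-window sketch (the `RpmThrottle` class is illustrative, not a provider SDK feature); set `rpm` to whatever limit applies to your provider:

```python
import time
from collections import deque

class RpmThrottle:
    """Block until a request slot is free under a requests-per-minute cap."""

    def __init__(self, rpm: int):
        self.rpm = rpm
        self.sent: deque[float] = deque()  # timestamps of recent requests

    def wait(self) -> None:
        now = time.monotonic()
        # Drop timestamps older than the 60-second window
        while self.sent and now - self.sent[0] >= 60:
            self.sent.popleft()
        if len(self.sent) >= self.rpm:
            # Sleep until the oldest request leaves the window
            time.sleep(60 - (now - self.sent[0]))
        self.sent.append(time.monotonic())

throttle = RpmThrottle(rpm=30)  # e.g. Groq's 30 RPM
# throttle.wait()  # call before each client.chat.completions.create(...)
```

Calls under the cap return immediately; the first call over the cap sleeps until the window frees up.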


Common Troubleshooting


**AuthenticationError**
- Double-check the env var is set: `echo $GROQ_API_KEY`
- Ensure the key is for the correct provider
- Some providers (GitHub Models) require a classic PAT, not a fine-grained token

**RateLimitError**
- Implement exponential backoff or use the fallback pattern above
- Switch to a provider with higher limits (Cerebras: 14,400 RPD)
- For Groq, use `llama-3.1-8b-instant` for the 14,400 RPD limit

**Model not found**
- Check the exact model ID on the provider's docs/dashboard
- OpenRouter free models have a `:free` suffix: `meta-llama/llama-3.3-70b-instruct:free`
- Cloudflare models use the `@cf/` prefix: `@cf/meta/llama-3.3-70b-instruct-fp8-fast`

**Gemini free tier unavailable**
- The free tier is not available in the EU, UK, or Switzerland
- Use a VPN or switch to a different provider such as Groq or Mistral

**Ollama Cloud not working with the OpenAI SDK**
- Ollama Cloud uses its own API format; use the `ollama` Python package or raw HTTP

**OpenRouter 50 RPD limit**
- Make a one-time $10 credit purchase to permanently unlock 1,000 RPD for free models
- Alternatively, use the `openrouter/free` router to distribute requests across all free models

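The exponential-backoff advice above can be sketched as a small retry wrapper. This is a generic illustration (the name `with_backoff` and the delay schedule are arbitrary choices, not provider guidance); pass your SDK's rate-limit exception, e.g. `openai.RateLimitError`, as `retry_on`:

```python
import random
import time

def with_backoff(call, retries: int = 5, base: float = 1.0,
                 retry_on: type[Exception] = Exception):
    """Retry `call`, sleeping base * 2**attempt (plus jitter) between failures."""
    for attempt in range(retries):
        try:
            return call()
        except retry_on:
            if attempt == retries - 1:
                raise  # out of retries; surface the error
            time.sleep(base * 2 ** attempt + random.random() * base)

# Usage with an OpenAI-compatible client:
# from openai import RateLimitError
# answer = with_backoff(
#     lambda: client.chat.completions.create(model=..., messages=...),
#     retry_on=RateLimitError,
# )
```

The jitter spreads retries out so many clients hitting the same limit don't retry in lockstep.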

Choosing the Right Provider


```
Need highest RPD?         → Cerebras (14,400 RPD)
Need smartest free model? → Gemini 2.5 Pro (if not in EU/UK/CH)
Need EU-hosted?           → Mistral AI (France)
Need most model variety?  → OpenRouter (29+ free models) or Cloudflare (48+ models)
Need fastest inference?   → Groq (purpose-built inference chips)
Need reasoning model?     → DeepSeek-R1 on Groq/OpenRouter/Kluster AI
Need vision?              → Gemini Flash, Llama 4 Scout (Groq), GLM-4.6V-Flash (Zhipu)
No rate limit concern?    → Cloudflare (10K neurons/day, compute-based)
```
需要最高日请求量?         → Cerebras(14400次/天)
需要最智能的免费模型? → Gemini 2.5 Pro(若不在欧盟/英国/瑞士)
需要欧盟托管?           → Mistral AI(法国)
需要最多模型选择?  → OpenRouter(29+个免费模型)或Cloudflare(48+个模型)
需要最快推理速度?   → Groq(专用推理芯片)
需要推理型模型?     → Groq/OpenRouter/Kluster AI上的DeepSeek-R1
需要视觉能力?              → Gemini Flash、Groq上的Llama 4 Scout、智谱的GLM-4.6V-Flash
无需担心速率限制?    → Cloudflare(10000神经元/天,基于计算量)