# FreeLLMAPI Proxy
Skill by ara.so — Daily 2026 Skills collection.
FreeLLMAPI is a self-hosted, OpenAI-compatible proxy that aggregates free-tier API keys from ~14 AI providers (Google, Groq, Cerebras, SambaNova, NVIDIA, Mistral, OpenRouter, GitHub Models, Hugging Face, Cohere, Cloudflare, Zhipu, Moonshot, MiniMax) behind a single `/v1/chat/completions` endpoint. It handles automatic failover on 429/5xx, per-key rate tracking, sticky sessions for multi-turn conversations, and AES-256-GCM encrypted key storage.

## Installation
Prerequisites: Node.js 20+, npm.
```bash
git clone https://github.com/tashfeenahmed/freellmapi.git
cd freellmapi
npm install

# Generate encryption key and set up environment
cp .env.example .env
echo "ENCRYPTION_KEY=$(node -e "console.log(require('crypto').randomBytes(32).toString('hex'))")" >> .env

# Development (server + Vite dashboard on :5173)
npm run dev

# Production build
npm run build
node server/dist/index.js   # serves API + dashboard on :3001
```

---

## Environment Variables
```bash
# .env
ENCRYPTION_KEY=<64-char hex string>   # Required — AES-256 key for provider key storage
PORT=3001                             # Optional — defaults to 3001
NODE_ENV=production                   # Optional
```
Never commit `.env`. The `ENCRYPTION_KEY` protects all stored provider API keys.
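Purely illustrative: the server does this internally in Node, but this Python sketch (using the `cryptography` package, an assumption, not the project's actual code) shows why the key must be exactly 64 hex characters:

```python
# Illustrative only — not the server's actual code. A 64-hex-char string
# decodes to the 32-byte key that AES-256-GCM requires.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

hex_key = os.environ["ENCRYPTION_KEY"]   # 64 hex chars from .env
key = bytes.fromhex(hex_key)
assert len(key) == 32, "ENCRYPTION_KEY must be 64 hex characters (32 bytes)"

aesgcm = AESGCM(key)
nonce = os.urandom(12)                   # standard 96-bit GCM nonce
ciphertext = aesgcm.encrypt(nonce, b"sk-provider-api-key", None)
assert aesgcm.decrypt(nonce, ciphertext, None) == b"sk-provider-api-key"
```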
---

## Key Commands
```bash
npm run dev     # Start Express server + Vite dashboard in watch mode
npm run build   # Compile TypeScript server + build React dashboard
npm run lint    # ESLint across server/ and client/
npm run test    # Run test suite
```

## Provider Setup
- Open the dashboard at http://localhost:5173 (dev) or http://localhost:3001 (prod).
- Navigate to the Keys page.
- Add raw API keys for each provider you have. Keys are encrypted before SQLite storage.
- Navigate to Fallback Chain to reorder provider priority.
- Copy your unified bearer token (`freellmapi-…`) from the Keys page header.

Supported providers and where to get a free key:
| Provider | Where to get a free key |
|---|---|
| Google Gemini | https://ai.google.dev |
| Groq | https://groq.com |
| Cerebras | https://cerebras.ai |
| SambaNova | https://cloud.sambanova.ai |
| NVIDIA NIM | https://build.nvidia.com |
| Mistral | https://mistral.ai |
| OpenRouter | https://openrouter.ai |
| GitHub Models | https://github.com/marketplace/models |
| Hugging Face | https://huggingface.co |
| Cohere | https://cohere.com |
| Cloudflare Workers AI | https://developers.cloudflare.com/workers-ai |
| Zhipu | https://bigmodel.cn |
| Moonshot | https://platform.moonshot.cn |
| MiniMax | https://platform.minimax.io |
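Once keys are added, one quick sanity check is to list models through the proxy's documented `/v1/models` route with the unified key (a sketch; the key value is a placeholder):

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3001/v1",
    api_key="freellmapi-your-unified-key",  # from the Keys page header
)

# Prints models aggregated from every provider you added a key for
for model in client.models.list():
    print(model.id)
```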
## Using the API

### Python (openai SDK)

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3001/v1",
    api_key="freellmapi-your-unified-key",  # from dashboard Keys page
)

# Let the router pick the best available provider
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Explain async/await in Python in two sentences."}],
)
print(response.choices[0].message.content)

# Which provider actually served this request (headers live on the raw
# httpx response — see Response Headers below):
print("Routed via:", response._response.headers.get("x-routed-via"))
```

### Request a specific model

```python
# Request a specific model — router finds a provider that has it
response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "Write a haiku about SQLite."}],
)
```
### Streaming

```python
stream = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "List 5 TypeScript best practices."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```

### curl
```bash
# Non-streaming
curl http://localhost:3001/v1/chat/completions \
  -H "Authorization: Bearer $FREELLMAPI_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "model": "auto", "messages": [{"role": "user", "content": "Hello"}] }'

# Streaming
curl http://localhost:3001/v1/chat/completions \
  -H "Authorization: Bearer $FREELLMAPI_KEY" \
  -H "Content-Type: application/json" \
  --no-buffer \
  -d '{ "model": "auto", "messages": [{"role": "user", "content": "Count to 5 slowly"}], "stream": true }'

# List available models
curl http://localhost:3001/v1/models \
  -H "Authorization: Bearer $FREELLMAPI_KEY"
```

### TypeScript / Node.js
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:3001/v1",
  apiKey: process.env.FREELLMAPI_KEY,
});

async function chat(userMessage: string): Promise<string> {
  const response = await client.chat.completions.create({
    model: "auto",
    messages: [{ role: "user", content: userMessage }],
  });
  return response.choices[0].message.content ?? "";
}

// Streaming version
async function streamChat(userMessage: string): Promise<void> {
  const stream = await client.chat.completions.create({
    model: "auto",
    messages: [{ role: "user", content: userMessage }],
    stream: true,
  });
  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta?.content;
    if (delta) process.stdout.write(delta);
  }
  console.log();
}
```

## Tool Calling
Tool calling works across all supported providers. OpenAI-compatible providers receive requests verbatim; Gemini requests are automatically translated to `functionDeclarations`/`functionResponse` format and back.

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3001/v1",
    api_key="freellmapi-your-unified-key",
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

# Step 1: Model requests a tool call
first = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "What's the weather in Karachi?"}],
    tools=tools,
    tool_choice="required",
)
call = first.choices[0].message.tool_calls[0]
print(f"Tool requested: {call.function.name}({call.function.arguments})")
```
```python
# Step 2: Execute the tool locally, feed result back
final = client.chat.completions.create(
    model="auto",
    messages=[
        {"role": "user", "content": "What's the weather in Karachi?"},
        first.choices[0].message,  # assistant message with tool_calls
        {
            "role": "tool",
            "tool_call_id": call.id,
            "content": '{"temp_c": 32, "condition": "sunny"}',
        },
    ],
    tools=tools,
)
print(final.choices[0].message.content)
```
### Streaming tool calls

```python
stream = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "What's the weather in Karachi?"}],
    tools=tools,
    tool_choice="required",
    stream=True,
)

tool_call_chunks = []
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.tool_calls:
        tool_call_chunks.extend(delta.tool_calls)
    if chunk.choices[0].finish_reason == "tool_calls":
        print("Tool call complete — assemble chunks and execute")
```
## Multi-turn Conversations (Sticky Sessions)

The proxy keeps multi-turn conversations on the same model for 30 minutes to avoid hallucination spikes from mid-conversation model switches. Pass a consistent `session_id` in requests if the provider supports it, or rely on the proxy's automatic session tracking.
messages = [{"role": "system", "content": "You are a helpful coding assistant."}]代理会将多轮对话保持在同一模型上30分钟,避免对话中途切换模型导致幻觉激增。如果提供商支持,可在请求中传入一致的,或依赖代理的自动会话跟踪功能。
session_idpython
messages = [{"role": "system", "content": "You are a helpful coding assistant."}]Turn 1
第一轮
messages.append({"role": "user", "content": "Write a Python function to flatten a nested list."})
resp1 = client.chat.completions.create(model="auto", messages=messages)
assistant_msg = resp1.choices[0].message
messages.append({"role": "assistant", "content": assistant_msg.content})
print(assistant_msg.content)
messages.append({"role": "user", "content": "Write a Python function to flatten a nested list."})
resp1 = client.chat.completions.create(model="auto", messages=messages)
assistant_msg = resp1.choices[0].message
messages.append({"role": "assistant", "content": assistant_msg.content})
print(assistant_msg.content)
Turn 2 — sticky session keeps same provider
第二轮 — 粘性会话保持使用同一提供商
messages.append({"role": "user", "content": "Now add type hints to that function."})
resp2 = client.chat.completions.create(model="auto", messages=messages)
print(resp2.choices[0].message.content)
---messages.append({"role": "user", "content": "Now add type hints to that function."})
resp2 = client.chat.completions.create(model="auto", messages=messages)
print(resp2.choices[0].message.content)
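The docs above don't pin down how a `session_id` travels on the wire. If you want to pass one explicitly rather than rely on automatic tracking, the openai SDK's `extra_body` is one way to attach a non-standard field; treat the exact field placement as an assumption:

```python
import uuid

session_id = str(uuid.uuid4())

# Assumption: the proxy reads a top-level `session_id` field in the request
# body; adjust to whatever your deployment actually expects.
resp = client.chat.completions.create(
    model="auto",
    messages=messages,
    extra_body={"session_id": session_id},
)
```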
---

## LangChain Integration
```python
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
import os

llm = ChatOpenAI(
    model="auto",
    openai_api_base="http://localhost:3001/v1",
    openai_api_key=os.environ["FREELLMAPI_KEY"],
    streaming=True,
)

response = llm.invoke([HumanMessage(content="Summarise the CAP theorem in one paragraph.")])
print(response.content)
```

## Response Headers
Every response includes diagnostic headers:

| Header | Description |
|---|---|
| `x-routed-via` | Which provider/model actually served the request |
| `x-fallback-attempts` | Number of providers tried before success (only present if > 0) |

```python
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "hi"}],
)

# Headers are on the raw httpx response:
raw = response._response  # openai SDK exposes underlying httpx response
print(raw.headers.get("x-routed-via"))         # e.g. "groq/llama-4-scout"
print(raw.headers.get("x-fallback-attempts"))  # e.g. "2"
```
---

## How the Router Works

```
Request arrives
      │
      ▼
Router scans fallback chain (priority order)
      │
      ├─ For each model: is there a healthy key under all rate caps?
      │    RPM / RPD / TPM / TPD tracked per (platform, model, key)
      │
      ├─ Picks first viable (platform, model, key) tuple
      │
      ├─ Decrypts key in-memory, calls provider SDK
      │
      └─ On 429 / 5xx / timeout:
           Put key on cooldown → retry next model (up to 20 attempts)
```

**Rate limit tracking:** The router tracks `RPM`, `RPD`, `TPM`, and `TPD` counters per `(platform, model, key)` triple. When a key hits a cap it's cooled down automatically and the next viable key/model is tried (a toy sketch of this bookkeeping follows below).

**Health checks:** Background probes classify each key as `healthy`, `rate_limited`, `invalid`, or `error`. The router skips non-healthy keys without making a live request.
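Not the project's actual implementation, but a toy Python sketch of per-`(platform, model, key)` RPM bookkeeping with cooldown, to make the mechanism concrete (the cap and cooldown length here are made up):

```python
import time
from collections import defaultdict, deque

RPM_CAP = 30              # hypothetical per-key requests/minute cap
COOLDOWN_SECONDS = 60     # hypothetical cooldown after a 429

request_times = defaultdict(deque)   # (platform, model, key) -> timestamps
cooldown_until = defaultdict(float)  # (platform, model, key) -> unix time

def viable(route: tuple[str, str, str]) -> bool:
    """Is this (platform, model, key) under its RPM cap and not cooling down?"""
    now = time.time()
    if now < cooldown_until[route]:
        return False
    times = request_times[route]
    while times and now - times[0] > 60:   # drop requests older than a minute
        times.popleft()
    return len(times) < RPM_CAP

def record_request(route: tuple[str, str, str]) -> None:
    request_times[route].append(time.time())

def record_rate_limit(route: tuple[str, str, str]) -> None:
    """On 429, cool the key down so the router skips it for a while."""
    cooldown_until[route] = time.time() + COOLDOWN_SECONDS
```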
## Dashboard Pages
| Page | Purpose |
|---|---|
| Keys | Add/remove provider credentials, view health status, copy unified API key |
| Fallback Chain | Drag to reorder provider priority |
| Playground | Interactive chat showing which provider served each message + latency |
| Analytics | Request volume, success rate, token counts, latency, per-provider breakdown (24h/7d/30d) |
## Production Deployment (Raspberry Pi / Linux)
```bash
# Build
npm run build

# Install PM2
npm install -g pm2

# Start
pm2 start server/dist/index.js --name freellmapi
pm2 save
pm2 startup
```
### nginx reverse proxy (optional)

```nginx
# /etc/nginx/sites-available/freellmapi
server {
    listen 80;
    server_name your.domain.com;

    location / {
        proxy_pass http://localhost:3001;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_buffering off;   # Required for SSE streaming
        proxy_cache off;       # Required for SSE streaming
    }
}
```

Memory footprint: ~40 MB RSS at idle on a Pi 4.
---

## Adding a New Provider
Create a new adapter in `server/src/providers/`:

```typescript
// server/src/providers/myprovider.ts
import type { ProviderAdapter, ChatRequest, ChatResponse } from "../types";

export const myProviderAdapter: ProviderAdapter = {
  name: "myprovider",
  models: ["my-model-v1", "my-model-v2"],

  async chat(request: ChatRequest, apiKey: string): Promise<ChatResponse> {
    // Call provider API, return OpenAI-shaped response
    const res = await fetch("https://api.myprovider.com/v1/chat", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: request.model,
        messages: request.messages,
      }),
    });
    const data = await res.json();
    return {
      id: data.id,
      object: "chat.completion",
      choices: [{ message: data.choices[0].message, finish_reason: "stop", index: 0 }],
      usage: data.usage,
    };
  },

  async *stream(request: ChatRequest, apiKey: string): AsyncGenerator<string> {
    // Yield SSE chunks
  },
};
```

Register it in `server/src/providers/index.ts` and add rate limit caps to the router config.
## Troubleshooting
"No healthy keys available"
- Check the Keys dashboard — all keys may be rate-limited or invalid.
- Wait for cooldown (usually a few minutes for RPM limits) or add more keys.
- Verify the key is valid by testing it directly against the provider's API.
Requests always fall back to the same provider
- Check the Fallback Chain order in the dashboard.
- Ensure keys for higher-priority providers are marked .
healthy
Streaming stops mid-response
- If behind nginx, ensure is set.
proxy_buffering off - Check provider-side token/minute caps — the stream may be cut by a mid-stream rate limit.
ENCRYPTION_KEY- Ensure in
ENCRYPTION_KEYis exactly 64 hex characters (32 bytes)..env - Regenerate:
node -e "console.log(require('crypto').randomBytes(32).toString('hex'))"
Tool calls not working with a specific provider
- Not all free-tier models support function calling. Check the provider's docs.
- Try — the router will pick a tool-capable model.
model="auto" - Gemini tool calls are auto-translated; others pass through as-is.
High latency on first request
- Health checks run periodically in the background. The first request after startup may probe a few keys. Subsequent requests are faster.
"No healthy keys available"(无健康密钥可用)
- 检查Keys控制台——所有密钥可能都被限流或无效。
- 等待冷却(通常RPM限制只需几分钟)或添加更多密钥。
- 通过直接调用提供商API验证密钥是否有效。
请求总是故障转移到同一提供商
- 检查控制台中的Fallback Chain顺序。
- 确保高优先级提供商的密钥标记为。
healthy
流式传输中途停止
- 如果使用nginx反向代理,确保已设置。
proxy_buffering off - 检查提供商的每分钟令牌限制——流式传输可能因中途触发速率限制而被切断。
启动时出现错误
ENCRYPTION_KEY- 确保中的
.env恰好是64个十六进制字符(32字节)。ENCRYPTION_KEY - 重新生成:
node -e "console.log(require('crypto').randomBytes(32).toString('hex'))"
特定提供商的工具调用无法工作
- 并非所有免费层级模型都支持函数调用。请查看提供商文档。
- 尝试使用——路由会选择支持工具调用的模型。
model="auto" - Gemini的工具调用会自动转换,其他提供商则直接传递请求。
首次请求延迟高
- 健康检查会定期在后台运行。启动后的首次请求可能会探测多个密钥。后续请求会更快。
## Limitations
- Text-only — no vision/multimodal inputs
- No embeddings (`/v1/embeddings`)
- No image generation (`/v1/images/*`)
- No audio/speech (`/v1/audio/*`)
- No legacy completions (`/v1/completions`)
- No moderation (`/v1/moderations`)
- `n > 1` not supported (single completion per request)
- Single-user by design — no per-user billing or multi-tenant auth
- Personal/experimental use only — review each provider's ToS before production use