# FreeLLMAPI Proxy
Skill by ara.so — Daily 2026 Skills collection.
FreeLLMAPI is a self-hosted, OpenAI-compatible proxy that aggregates free-tier API keys from ~14 AI providers (Google, Groq, Cerebras, SambaNova, NVIDIA, Mistral, OpenRouter, GitHub Models, Hugging Face, Cohere, Cloudflare, Zhipu, Moonshot, MiniMax) behind a single `/v1/chat/completions` endpoint. It handles automatic failover on 429/5xx, per-key rate tracking, sticky sessions for multi-turn conversations, and AES-256-GCM encrypted key storage.

## Installation
Prerequisites: Node.js 20+, npm.
```bash
git clone https://github.com/tashfeenahmed/freellmapi.git
cd freellmapi
npm install

# Generate encryption key and set up environment
cp .env.example .env
echo "ENCRYPTION_KEY=$(node -e "console.log(require('crypto').randomBytes(32).toString('hex'))")" >> .env

# Development (server + Vite dashboard on :5173)
npm run dev

# Production build
npm run build
node server/dist/index.js   # serves API + dashboard on :3001
```

---

## Environment Variables
```bash
# .env
ENCRYPTION_KEY=<64-char hex string>   # Required — AES-256 key for provider key storage
PORT=3001                             # Optional — defaults to 3001
NODE_ENV=production                   # Optional
```
Never commit `.env`. The `ENCRYPTION_KEY` protects all stored provider API keys.
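Purely illustrative: the server does this internally in Node, but this Python sketch (using the `cryptography` package, an assumption, not the project's actual code) shows why the key must be exactly 64 hex characters:

```python
# Illustrative only — not the server's actual code. A 64-hex-char string
# decodes to the 32-byte key that AES-256-GCM requires.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

hex_key = os.environ["ENCRYPTION_KEY"]   # 64 hex chars from .env
key = bytes.fromhex(hex_key)
assert len(key) == 32, "ENCRYPTION_KEY must be 64 hex characters (32 bytes)"

aesgcm = AESGCM(key)
nonce = os.urandom(12)                   # standard 96-bit GCM nonce
ciphertext = aesgcm.encrypt(nonce, b"sk-provider-api-key", None)
assert aesgcm.decrypt(nonce, ciphertext, None) == b"sk-provider-api-key"
```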
---

## Key Commands
```bash
npm run dev     # Start Express server + Vite dashboard in watch mode
npm run build   # Compile TypeScript server + build React dashboard
npm run lint    # ESLint across server/ and client/
npm run test    # Run test suite
```

## Provider Setup
- Open the dashboard at http://localhost:5173 (dev) or http://localhost:3001 (prod).
- Navigate to the Keys page.
- Add raw API keys for each provider you have. Keys are encrypted before SQLite storage.
- Navigate to Fallback Chain to reorder provider priority.
- Copy your unified bearer token (`freellmapi-…`) from the Keys page header.

Supported providers and where to get a free key:
| Provider | Where to get a free key |
|---|---|
| Google Gemini | https://ai.google.dev |
| Groq | https://groq.com |
| Cerebras | https://cerebras.ai |
| SambaNova | https://cloud.sambanova.ai |
| NVIDIA NIM | https://build.nvidia.com |
| Mistral | https://mistral.ai |
| OpenRouter | https://openrouter.ai |
| GitHub Models | https://github.com/marketplace/models |
| Hugging Face | https://huggingface.co |
| Cohere | https://cohere.com |
| Cloudflare Workers AI | https://developers.cloudflare.com/workers-ai |
| Zhipu | https://bigmodel.cn |
| Moonshot | https://platform.moonshot.cn |
| MiniMax | https://platform.minimax.io |
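Once keys are added, one quick sanity check is to list models through the proxy's documented `/v1/models` route with the unified key (a sketch; the key value is a placeholder):

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3001/v1",
    api_key="freellmapi-your-unified-key",  # from the Keys page header
)

# Prints models aggregated from every provider you added a key for
for model in client.models.list():
    print(model.id)
```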
## Using the API

### Python (openai SDK)

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3001/v1",
    api_key="freellmapi-your-unified-key",  # from dashboard Keys page
)

# Let the router pick the best available provider
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Explain async/await in Python in two sentences."}],
)
print(response.choices[0].message.content)

# Which provider actually served this request (headers live on the raw
# httpx response — see Response Headers below):
print("Routed via:", response._response.headers.get("x-routed-via"))
```

### Request a specific model

```python
# Request a specific model — router finds a provider that has it
response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "Write a haiku about SQLite."}],
)
```
### Streaming

```python
stream = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "List 5 TypeScript best practices."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```

### curl
```bash
# Non-streaming
curl http://localhost:3001/v1/chat/completions \
  -H "Authorization: Bearer $FREELLMAPI_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "model": "auto", "messages": [{"role": "user", "content": "Hello"}] }'

# Streaming
curl http://localhost:3001/v1/chat/completions \
  -H "Authorization: Bearer $FREELLMAPI_KEY" \
  -H "Content-Type: application/json" \
  --no-buffer \
  -d '{ "model": "auto", "messages": [{"role": "user", "content": "Count to 5 slowly"}], "stream": true }'

# List available models
curl http://localhost:3001/v1/models \
  -H "Authorization: Bearer $FREELLMAPI_KEY"
```

### TypeScript / Node.js
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:3001/v1",
  apiKey: process.env.FREELLMAPI_KEY,
});

async function chat(userMessage: string): Promise<string> {
  const response = await client.chat.completions.create({
    model: "auto",
    messages: [{ role: "user", content: userMessage }],
  });
  return response.choices[0].message.content ?? "";
}

// Streaming version
async function streamChat(userMessage: string): Promise<void> {
  const stream = await client.chat.completions.create({
    model: "auto",
    messages: [{ role: "user", content: userMessage }],
    stream: true,
  });
  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta?.content;
    if (delta) process.stdout.write(delta);
  }
  console.log();
}
```

## Tool Calling
Tool calling works across all supported providers. OpenAI-compatible providers receive requests verbatim; Gemini requests are automatically translated to `functionDeclarations`/`functionResponse` format and back.

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3001/v1",
    api_key="freellmapi-your-unified-key",
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

# Step 1: Model requests a tool call
first = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "What's the weather in Karachi?"}],
    tools=tools,
    tool_choice="required",
)
call = first.choices[0].message.tool_calls[0]
print(f"Tool requested: {call.function.name}({call.function.arguments})")
```
```python
# Step 2: Execute the tool locally, feed result back
final = client.chat.completions.create(
    model="auto",
    messages=[
        {"role": "user", "content": "What's the weather in Karachi?"},
        first.choices[0].message,  # assistant message with tool_calls
        {
            "role": "tool",
            "tool_call_id": call.id,
            "content": '{"temp_c": 32, "condition": "sunny"}',
        },
    ],
    tools=tools,
)
print(final.choices[0].message.content)
```
### Streaming tool calls

```python
stream = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "What's the weather in Karachi?"}],
    tools=tools,
    tool_choice="required",
    stream=True,
)

tool_call_chunks = []
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.tool_calls:
        tool_call_chunks.extend(delta.tool_calls)
    if chunk.choices[0].finish_reason == "tool_calls":
        print("Tool call complete — assemble chunks and execute")
```
## Multi-turn Conversations (Sticky Sessions)

The proxy keeps multi-turn conversations on the same model for 30 minutes to avoid hallucination spikes from mid-conversation model switches. Pass a consistent `session_id` in requests if the provider supports it, or rely on the proxy's automatic session tracking.
messages = [{"role": "system", "content": "You are a helpful coding assistant."}]代理会将多轮对话保持在同一模型上30分钟,避免对话中途切换模型导致幻觉激增。如果提供商支持,可在请求中传入一致的,或依赖代理的自动会话跟踪功能。
session_idpython
messages = [{"role": "system", "content": "You are a helpful coding assistant."}]Turn 1
第一轮
messages.append({"role": "user", "content": "Write a Python function to flatten a nested list."})
resp1 = client.chat.completions.create(model="auto", messages=messages)
assistant_msg = resp1.choices[0].message
messages.append({"role": "assistant", "content": assistant_msg.content})
print(assistant_msg.content)
messages.append({"role": "user", "content": "Write a Python function to flatten a nested list."})
resp1 = client.chat.completions.create(model="auto", messages=messages)
assistant_msg = resp1.choices[0].message
messages.append({"role": "assistant", "content": assistant_msg.content})
print(assistant_msg.content)
Turn 2 — sticky session keeps same provider
第二轮 — 粘性会话保持使用同一提供商
messages.append({"role": "user", "content": "Now add type hints to that function."})
resp2 = client.chat.completions.create(model="auto", messages=messages)
print(resp2.choices[0].message.content)
---messages.append({"role": "user", "content": "Now add type hints to that function."})
resp2 = client.chat.completions.create(model="auto", messages=messages)
print(resp2.choices[0].message.content)
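The docs above don't pin down how a `session_id` travels on the wire. If you want to pass one explicitly rather than rely on automatic tracking, the openai SDK's `extra_body` is one way to attach a non-standard field; treat the exact field placement as an assumption:

```python
import uuid

session_id = str(uuid.uuid4())

# Assumption: the proxy reads a top-level `session_id` field in the request
# body; adjust to whatever your deployment actually expects.
resp = client.chat.completions.create(
    model="auto",
    messages=messages,
    extra_body={"session_id": session_id},
)
```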
---

## LangChain Integration
```python
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
import os

llm = ChatOpenAI(
    model="auto",
    openai_api_base="http://localhost:3001/v1",
    openai_api_key=os.environ["FREELLMAPI_KEY"],
    streaming=True,
)

response = llm.invoke([HumanMessage(content="Summarise the CAP theorem in one paragraph.")])
print(response.content)
```

## Response Headers
Every response includes diagnostic headers:

| Header | Description |
|---|---|
| `x-routed-via` | Which provider/model actually served the request |
| `x-fallback-attempts` | Number of providers tried before success (only present if > 0) |

```python
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "hi"}],
)

# Headers are on the raw httpx response:
raw = response._response  # openai SDK exposes underlying httpx response
print(raw.headers.get("x-routed-via"))         # e.g. "groq/llama-4-scout"
print(raw.headers.get("x-fallback-attempts"))  # e.g. "2"
```
---

## How the Router Works

```
Request arrives
      │
      ▼
Router scans fallback chain (priority order)
      │
      ├─ For each model: is there a healthy key under all rate caps?
      │    RPM / RPD / TPM / TPD tracked per (platform, model, key)
      │
      ├─ Picks first viable (platform, model, key) tuple
      │
      ├─ Decrypts key in-memory, calls provider SDK
      │
      └─ On 429 / 5xx / timeout:
           Put key on cooldown → retry next model (up to 20 attempts)
```

**Rate limit tracking:** The router tracks `RPM`, `RPD`, `TPM`, and `TPD` counters per `(platform, model, key)` triple. When a key hits a cap it's cooled down automatically and the next viable key/model is tried (a toy sketch of this bookkeeping follows below).

**Health checks:** Background probes classify each key as `healthy`, `rate_limited`, `invalid`, or `error`. The router skips non-healthy keys without making a live request.
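Not the project's actual implementation, but a toy Python sketch of per-`(platform, model, key)` RPM bookkeeping with cooldown, to make the mechanism concrete (the cap and cooldown length here are made up):

```python
import time
from collections import defaultdict, deque

RPM_CAP = 30              # hypothetical per-key requests/minute cap
COOLDOWN_SECONDS = 60     # hypothetical cooldown after a 429

request_times = defaultdict(deque)   # (platform, model, key) -> timestamps
cooldown_until = defaultdict(float)  # (platform, model, key) -> unix time

def viable(route: tuple[str, str, str]) -> bool:
    """Is this (platform, model, key) under its RPM cap and not cooling down?"""
    now = time.time()
    if now < cooldown_until[route]:
        return False
    times = request_times[route]
    while times and now - times[0] > 60:   # drop requests older than a minute
        times.popleft()
    return len(times) < RPM_CAP

def record_request(route: tuple[str, str, str]) -> None:
    request_times[route].append(time.time())

def record_rate_limit(route: tuple[str, str, str]) -> None:
    """On 429, cool the key down so the router skips it for a while."""
    cooldown_until[route] = time.time() + COOLDOWN_SECONDS
```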
## Dashboard Pages
| Page | Purpose |
|---|---|
| Keys | Add/remove provider credentials, view health status, copy unified API key |
| Fallback Chain | Drag to reorder provider priority |
| Playground | Interactive chat showing which provider served each message + latency |
| Analytics | Request volume, success rate, token counts, latency, per-provider breakdown (24h/7d/30d) |
## Production Deployment (Raspberry Pi / Linux)
```bash
# Build
npm run build

# Install PM2
npm install -g pm2

# Start
pm2 start server/dist/index.js --name freellmapi
pm2 save
pm2 startup
```
### nginx reverse proxy (optional)

```nginx
# /etc/nginx/sites-available/freellmapi
server {
    listen 80;
    server_name your.domain.com;

    location / {
        proxy_pass http://localhost:3001;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_buffering off;   # Required for SSE streaming
        proxy_cache off;       # Required for SSE streaming
    }
}
```

Memory footprint: ~40 MB RSS at idle on a Pi 4.
---

## Adding a New Provider
Create a new adapter in `server/src/providers/`:

```typescript
// server/src/providers/myprovider.ts
import type { ProviderAdapter, ChatRequest, ChatResponse } from "../types";

export const myProviderAdapter: ProviderAdapter = {
  name: "myprovider",
  models: ["my-model-v1", "my-model-v2"],

  async chat(request: ChatRequest, apiKey: string): Promise<ChatResponse> {
    // Call provider API, return OpenAI-shaped response
    const res = await fetch("https://api.myprovider.com/v1/chat", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: request.model,
        messages: request.messages,
      }),
    });
    const data = await res.json();
    return {
      id: data.id,
      object: "chat.completion",
      choices: [{ message: data.choices[0].message, finish_reason: "stop", index: 0 }],
      usage: data.usage,
    };
  },

  async *stream(request: ChatRequest, apiKey: string): AsyncGenerator<string> {
    // Yield SSE chunks
  },
};
```

Register it in `server/src/providers/index.ts` and add rate limit caps to the router config.
## Troubleshooting
"No healthy keys available"
- Check the Keys dashboard — all keys may be rate-limited or invalid.
- Wait for cooldown (usually a few minutes for RPM limits) or add more keys.
- Verify the key is valid by testing it directly against the provider's API.
Requests always fall back to the same provider
- Check the Fallback Chain order in the dashboard.
- Ensure keys for higher-priority providers are marked .
healthy
Streaming stops mid-response
- If behind nginx, ensure is set.
proxy_buffering off - Check provider-side token/minute caps — the stream may be cut by a mid-stream rate limit.
ENCRYPTION_KEY- Ensure in
ENCRYPTION_KEYis exactly 64 hex characters (32 bytes)..env - Regenerate:
node -e "console.log(require('crypto').randomBytes(32).toString('hex'))"
Tool calls not working with a specific provider
- Not all free-tier models support function calling. Check the provider's docs.
- Try — the router will pick a tool-capable model.
model="auto" - Gemini tool calls are auto-translated; others pass through as-is.
High latency on first request
- Health checks run periodically in the background. The first request after startup may probe a few keys. Subsequent requests are faster.
"No healthy keys available"(无健康密钥可用)
- 检查Keys控制台——所有密钥可能都被限流或无效。
- 等待冷却(通常RPM限制只需几分钟)或添加更多密钥。
- 通过直接调用提供商API验证密钥是否有效。
请求总是故障转移到同一提供商
- 检查控制台中的Fallback Chain顺序。
- 确保高优先级提供商的密钥标记为。
healthy
流式传输中途停止
- 如果使用nginx反向代理,确保已设置。
proxy_buffering off - 检查提供商的每分钟令牌限制——流式传输可能因中途触发速率限制而被切断。
启动时出现错误
ENCRYPTION_KEY- 确保中的
.env恰好是64个十六进制字符(32字节)。ENCRYPTION_KEY - 重新生成:
node -e "console.log(require('crypto').randomBytes(32).toString('hex'))"
特定提供商的工具调用无法工作
- 并非所有免费层级模型都支持函数调用。请查看提供商文档。
- 尝试使用——路由会选择支持工具调用的模型。
model="auto" - Gemini的工具调用会自动转换,其他提供商则直接传递请求。
首次请求延迟高
- 健康检查会定期在后台运行。启动后的首次请求可能会探测多个密钥。后续请求会更快。
## Limitations
- Text-only — no vision/multimodal inputs
- No embeddings (`/v1/embeddings`)
- No image generation (`/v1/images/*`)
- No audio/speech (`/v1/audio/*`)
- No legacy completions (`/v1/completions`)
- No moderation (`/v1/moderations`)
- `n > 1` not supported (single completion per request)
- Single-user by design — no per-user billing or multi-tenant auth
- Personal/experimental use only — review each provider's ToS before production use