portkey-python-sdk


Portkey Python SDK

The Portkey Python SDK provides a unified interface to 200+ LLMs through the Portkey AI Gateway. Built on top of the OpenAI SDK for seamless compatibility, it adds production-grade features: automatic fallbacks, retries, load balancing, semantic caching, guardrails, and comprehensive observability.
Additional References:
  • API Reference - Response structures, error handling
  • Advanced Features - Tool calling, embeddings, audio, images
  • Framework Integrations - LangChain, LlamaIndex, Strands, Google ADK
  • Provider Configuration - Azure, AWS Bedrock, Vertex AI setup


Installation

bash
pip install portkey-ai

Or with poetry/uv:

bash
poetry add portkey-ai
# or
uv add portkey-ai

---

Quick Start


python
import os
from portkey_ai import Portkey

client = Portkey(
    api_key=os.environ["PORTKEY_API_KEY"],
    virtual_key="your-openai-virtual-key"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)

Authentication

API Key + Virtual Key (Recommended)

Virtual keys securely store provider API keys in Portkey's vault:
python
import os
from portkey_ai import Portkey

client = Portkey(
    api_key=os.environ["PORTKEY_API_KEY"],  # From app.portkey.ai
    virtual_key="openai-virtual-key-xxx"     # From app.portkey.ai/virtual-keys
)

Using Config IDs

Pre-configure routing, fallbacks, and caching in the dashboard:
python
client = Portkey(
    api_key=os.environ["PORTKEY_API_KEY"],
    config="pc-config-xxx"  # Config ID from dashboard
)

Chat Completions

Basic Request

python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing briefly."}
    ]
)

print(response.choices[0].message.content)
print(f"Tokens used: {response.usage.total_tokens}")

Streaming

python
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a short story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
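
For convenience, the per-chunk loop above can be wrapped in a small helper that accumulates the streamed deltas into one string. This is a sketch, not part of the SDK; the `fake_chunk` objects below only mimic the OpenAI-style chunk shape for illustration.

```python
from types import SimpleNamespace

def collect_stream(stream):
    """Accumulate delta content from an OpenAI-style chunk stream into one string."""
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # final chunks may carry content=None
            parts.append(delta)
    return "".join(parts)

# Stand-in chunks mimicking the chat-completion chunk shape (illustration only)
def fake_chunk(text):
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])

print(collect_stream([fake_chunk("Hel"), fake_chunk("lo"), fake_chunk(None)]))  # Hello
```

With a real stream, `collect_stream(stream)` returns the full completion text once the stream ends.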

Async Support

python
import asyncio
from portkey_ai import AsyncPortkey

async def main():
    client = AsyncPortkey(
        api_key=os.environ["PORTKEY_API_KEY"],
        virtual_key="openai-key"
    )
    
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response.choices[0].message.content)

asyncio.run(main())

Async Streaming

python
async def stream_response():
    client = AsyncPortkey(
        api_key=os.environ["PORTKEY_API_KEY"],
        virtual_key="openai-key"
    )
    
    stream = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Write a poem"}],
        stream=True
    )
    
    async for chunk in stream:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)

Gateway Features

Fallbacks

Automatic failover when a provider fails:
python
client = Portkey(
    api_key=os.environ["PORTKEY_API_KEY"],
    config={
        "strategy": {"mode": "fallback"},
        "targets": [
            {
                "virtual_key": "openai-key",
                "override_params": {"model": "gpt-4o"}
            },
            {
                "virtual_key": "anthropic-key",
                "override_params": {"model": "claude-3-5-sonnet-20241022"}
            }
        ]
    }
)

# If OpenAI fails, automatically tries Anthropic
response = client.chat.completions.create(
    messages=[{"role": "user", "content": "Hello!"}]
)

Load Balancing

Distribute traffic across providers:
python
client = Portkey(
    api_key=os.environ["PORTKEY_API_KEY"],
    config={
        "strategy": {"mode": "loadbalance"},
        "targets": [
            {"virtual_key": "openai-key-1", "weight": 0.7},
            {"virtual_key": "openai-key-2", "weight": 0.3}
        ]
    }
)
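
The weighting itself happens inside the gateway, so nothing more is needed client-side. For intuition only, the `weight` values behave like plain weighted random selection, as this standalone sketch illustrates (it is not how the SDK routes anything):

```python
import random

# Targets as in the config above; the gateway applies weights server-side
targets = [
    {"virtual_key": "openai-key-1", "weight": 0.7},
    {"virtual_key": "openai-key-2", "weight": 0.3},
]

def pick_target(targets):
    """Weighted random choice: the intuition behind loadbalance weights."""
    keys = [t["virtual_key"] for t in targets]
    weights = [t["weight"] for t in targets]
    return random.choices(keys, weights=weights, k=1)[0]
```

Over many requests, roughly 70% land on the first key and 30% on the second.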

Automatic Retries

python
client = Portkey(
    api_key=os.environ["PORTKEY_API_KEY"],
    config={
        "retry": {
            "attempts": 3,
            "on_status_codes": [429, 500, 502, 503, 504]
        },
        "virtual_key": "openai-key"
    }
)

Semantic Caching

Reduce costs and latency with intelligent caching:
python
client = Portkey(
    api_key=os.environ["PORTKEY_API_KEY"],
    config={
        "cache": {
            "mode": "semantic",  # or "simple" for exact match
            "max_age": 3600      # TTL in seconds
        },
        "virtual_key": "openai-key"
    }
)

# Similar queries return cached responses
response1 = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is the capital of France?"}]
)
response2 = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me France's capital"}]
)  # Returns cached response

Request Timeout

python
client = Portkey(
    api_key=os.environ["PORTKEY_API_KEY"],
    virtual_key="openai-key",
    request_timeout=30  # 30 seconds
)

Observability

Trace IDs

Link related requests for debugging:
python
import uuid

client = Portkey(
    api_key=os.environ["PORTKEY_API_KEY"],
    virtual_key="openai-key",
    trace_id=str(uuid.uuid4())
)

Custom Metadata

Add searchable metadata to requests:
python
client = Portkey(
    api_key=os.environ["PORTKEY_API_KEY"],
    virtual_key="openai-key",
    metadata={
        "user_id": "user-123",
        "session_id": "session-456",
        "environment": "production"
    }
)

Per-Request Options

python
response = client.with_options(
    trace_id="unique-trace-id",
    metadata={"request_type": "summarization"}
).chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this..."}]
)

Common Patterns

Multi-turn Conversation

python
messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "What is Python?"},
    {"role": "assistant", "content": "Python is a high-level programming language..."},
    {"role": "user", "content": "Show me a hello world example."}
]

response = client.chat.completions.create(model="gpt-4o", messages=messages)
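
To keep a conversation going, append each completed exchange to the history before the next request. `append_exchange` below is a hypothetical helper, not an SDK function, and the commented request lines are illustrative:

```python
def append_exchange(messages, user_text, assistant_text):
    """Record one user/assistant round trip in the chat history (in place)."""
    messages.append({"role": "user", "content": user_text})
    messages.append({"role": "assistant", "content": assistant_text})
    return messages

history = [{"role": "system", "content": "You are a helpful coding assistant."}]

# Each turn: send the full history, then record both sides of the exchange
# response = client.chat.completions.create(model="gpt-4o", messages=history)
# append_exchange(history, user_input, response.choices[0].message.content)
```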

JSON Output

python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Extract as JSON with name and age fields."},
        {"role": "user", "content": "John is 30 years old."}
    ],
    response_format={"type": "json_object"}
)

# Returns: {"name": "John", "age": 30}
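
Even in JSON mode, `response.choices[0].message.content` arrives as a string, so parse and validate it before use. A minimal sketch (the field names here just match the example above):

```python
import json

def parse_json_reply(content, required=("name", "age")):
    """Parse a JSON-mode reply and verify the expected fields are present."""
    data = json.loads(content)
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"reply missing fields: {missing}")
    return data

print(parse_json_reply('{"name": "John", "age": 30}'))  # {'name': 'John', 'age': 30}
```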

Production Setup with Fallbacks + Caching

python
def create_production_client():
    return Portkey(
        api_key=os.environ["PORTKEY_API_KEY"],
        config={
            "strategy": {"mode": "fallback"},
            "targets": [
                {
                    "virtual_key": os.environ["OPENAI_VIRTUAL_KEY"],
                    "override_params": {"model": "gpt-4o"},
                    "retry": {"attempts": 2, "on_status_codes": [429, 500]}
                },
                {
                    "virtual_key": os.environ["ANTHROPIC_VIRTUAL_KEY"],
                    "override_params": {"model": "claude-3-5-sonnet-20241022"}
                }
            ],
            "cache": {"mode": "semantic", "max_age": 3600}
        },
        trace_id="production-session",
        metadata={"environment": "production"}
    )

Best Practices

  1. Use environment variables - Never hardcode API keys
  2. Implement fallbacks - Always have backup providers for production
  3. Use streaming - Better UX for long responses
  4. Add tracing - Enable observability with trace IDs and metadata
  5. Enable caching - Reduce costs with semantic caching
  6. Handle errors - Implement retry logic with exponential backoff
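
The gateway-side retry config shown earlier covers transient provider errors; for failures that still surface in your application, a client-side backoff wrapper is a common pattern. This is a sketch: it catches `Exception` broadly because the SDK's specific exception classes aren't covered here, and the commented call is illustrative.

```python
import random
import time

def with_backoff(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on any exception with exponential backoff plus jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the error
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

# Illustrative usage:
# response = with_backoff(lambda: client.chat.completions.create(
#     model="gpt-4o",
#     messages=[{"role": "user", "content": "Hello!"}],
# ))
```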

Resources
