portkey-python-sdk


Portkey Python SDK

The Portkey Python SDK provides a unified interface to 200+ LLMs through the Portkey AI Gateway. Built on top of the OpenAI SDK for seamless compatibility, it adds production-grade features: automatic fallbacks, retries, load balancing, semantic caching, guardrails, and comprehensive observability.
Additional References:
  • API Reference - Response structures, error handling
  • Advanced Features - Tool calling, embeddings, audio, images
  • Framework Integrations - LangChain, LlamaIndex, Strands, Google ADK
  • Provider Configuration - Azure, AWS Bedrock, Vertex AI setup


Installation

bash
pip install portkey-ai

Or with poetry/uv:

bash
poetry add portkey-ai
# or
uv add portkey-ai

---

Quick Start


python
import os
from portkey_ai import Portkey

client = Portkey(
    api_key=os.environ["PORTKEY_API_KEY"],
    virtual_key="your-openai-virtual-key"
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)

Authentication

API Key + Virtual Key (Recommended)

Virtual keys securely store provider API keys in Portkey's vault:
python
import os
from portkey_ai import Portkey

client = Portkey(
    api_key=os.environ["PORTKEY_API_KEY"],  # From app.portkey.ai
    virtual_key="openai-virtual-key-xxx"     # From app.portkey.ai/virtual-keys
)

Using Config IDs

Pre-configure routing, fallbacks, and caching in the dashboard:
python
client = Portkey(
    api_key=os.environ["PORTKEY_API_KEY"],
    config="pc-config-xxx"  # Config ID from dashboard
)

Chat Completions

Basic Request

python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing briefly."}
    ]
)

print(response.choices[0].message.content)
print(f"Tokens used: {response.usage.total_tokens}")

Streaming

python
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a short story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
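
For convenience, the per-chunk loop above can be wrapped in a small helper that accumulates the streamed deltas into one string. This is a sketch, not part of the SDK; the `fake_chunk` objects below only mimic the OpenAI-style chunk shape for illustration.

```python
from types import SimpleNamespace

def collect_stream(stream):
    """Accumulate delta content from an OpenAI-style chunk stream into one string."""
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # final chunks may carry content=None
            parts.append(delta)
    return "".join(parts)

# Stand-in chunks mimicking the chat-completion chunk shape (illustration only)
def fake_chunk(text):
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])

print(collect_stream([fake_chunk("Hel"), fake_chunk("lo"), fake_chunk(None)]))  # Hello
```

With a real stream, `collect_stream(stream)` returns the full completion text once the stream ends.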

Async Support

python
import asyncio
from portkey_ai import AsyncPortkey

async def main():
    client = AsyncPortkey(
        api_key=os.environ["PORTKEY_API_KEY"],
        virtual_key="openai-key"
    )
    
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response.choices[0].message.content)

asyncio.run(main())

Async Streaming

python
async def stream_response():
    client = AsyncPortkey(
        api_key=os.environ["PORTKEY_API_KEY"],
        virtual_key="openai-key"
    )
    
    stream = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Write a poem"}],
        stream=True
    )
    
    async for chunk in stream:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)

Gateway Features

Fallbacks

Automatic failover when a provider fails:
python
client = Portkey(
    api_key=os.environ["PORTKEY_API_KEY"],
    config={
        "strategy": {"mode": "fallback"},
        "targets": [
            {
                "virtual_key": "openai-key",
                "override_params": {"model": "gpt-4o"}
            },
            {
                "virtual_key": "anthropic-key",
                "override_params": {"model": "claude-3-5-sonnet-20241022"}
            }
        ]
    }
)

# If OpenAI fails, automatically tries Anthropic
response = client.chat.completions.create(
    messages=[{"role": "user", "content": "Hello!"}]
)

Load Balancing

Distribute traffic across providers:
python
client = Portkey(
    api_key=os.environ["PORTKEY_API_KEY"],
    config={
        "strategy": {"mode": "loadbalance"},
        "targets": [
            {"virtual_key": "openai-key-1", "weight": 0.7},
            {"virtual_key": "openai-key-2", "weight": 0.3}
        ]
    }
)
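
The weighting itself happens inside the gateway, so nothing more is needed client-side. For intuition only, the `weight` values behave like plain weighted random selection, as this standalone sketch illustrates (it is not how the SDK routes anything):

```python
import random

# Targets as in the config above; the gateway applies weights server-side
targets = [
    {"virtual_key": "openai-key-1", "weight": 0.7},
    {"virtual_key": "openai-key-2", "weight": 0.3},
]

def pick_target(targets):
    """Weighted random choice: the intuition behind loadbalance weights."""
    keys = [t["virtual_key"] for t in targets]
    weights = [t["weight"] for t in targets]
    return random.choices(keys, weights=weights, k=1)[0]
```

Over many requests, roughly 70% land on the first key and 30% on the second.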

Automatic Retries

python
client = Portkey(
    api_key=os.environ["PORTKEY_API_KEY"],
    config={
        "retry": {
            "attempts": 3,
            "on_status_codes": [429, 500, 502, 503, 504]
        },
        "virtual_key": "openai-key"
    }
)

Semantic Caching

Reduce costs and latency with intelligent caching:
python
client = Portkey(
    api_key=os.environ["PORTKEY_API_KEY"],
    config={
        "cache": {
            "mode": "semantic",  # or "simple" for exact match
            "max_age": 3600      # TTL in seconds
        },
        "virtual_key": "openai-key"
    }
)

# Similar queries return cached responses
response1 = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is the capital of France?"}]
)
response2 = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me France's capital"}]
)  # Returns cached response

Request Timeout

python
client = Portkey(
    api_key=os.environ["PORTKEY_API_KEY"],
    virtual_key="openai-key",
    request_timeout=30  # 30 seconds
)

Observability

Trace IDs

Link related requests for debugging:
python
import uuid

client = Portkey(
    api_key=os.environ["PORTKEY_API_KEY"],
    virtual_key="openai-key",
    trace_id=str(uuid.uuid4())
)

Custom Metadata

Add searchable metadata to requests:
python
client = Portkey(
    api_key=os.environ["PORTKEY_API_KEY"],
    virtual_key="openai-key",
    metadata={
        "user_id": "user-123",
        "session_id": "session-456",
        "environment": "production"
    }
)

Per-Request Options

python
response = client.with_options(
    trace_id="unique-trace-id",
    metadata={"request_type": "summarization"}
).chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this..."}]
)

Common Patterns

Multi-turn Conversation

python
messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "What is Python?"},
    {"role": "assistant", "content": "Python is a high-level programming language..."},
    {"role": "user", "content": "Show me a hello world example."}
]

response = client.chat.completions.create(model="gpt-4o", messages=messages)
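
To keep a conversation going, append each completed exchange to the history before the next request. `append_exchange` below is a hypothetical helper, not an SDK function, and the commented request lines are illustrative:

```python
def append_exchange(messages, user_text, assistant_text):
    """Record one user/assistant round trip in the chat history (in place)."""
    messages.append({"role": "user", "content": user_text})
    messages.append({"role": "assistant", "content": assistant_text})
    return messages

history = [{"role": "system", "content": "You are a helpful coding assistant."}]

# Each turn: send the full history, then record both sides of the exchange
# response = client.chat.completions.create(model="gpt-4o", messages=history)
# append_exchange(history, user_input, response.choices[0].message.content)
```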

JSON Output

python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Extract as JSON with name and age fields."},
        {"role": "user", "content": "John is 30 years old."}
    ],
    response_format={"type": "json_object"}
)

# Returns: {"name": "John", "age": 30}
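
Even in JSON mode, `response.choices[0].message.content` arrives as a string, so parse and validate it before use. A minimal sketch (the field names here just match the example above):

```python
import json

def parse_json_reply(content, required=("name", "age")):
    """Parse a JSON-mode reply and verify the expected fields are present."""
    data = json.loads(content)
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"reply missing fields: {missing}")
    return data

print(parse_json_reply('{"name": "John", "age": 30}'))  # {'name': 'John', 'age': 30}
```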

Production Setup with Fallbacks + Caching

python
def create_production_client():
    return Portkey(
        api_key=os.environ["PORTKEY_API_KEY"],
        config={
            "strategy": {"mode": "fallback"},
            "targets": [
                {
                    "virtual_key": os.environ["OPENAI_VIRTUAL_KEY"],
                    "override_params": {"model": "gpt-4o"},
                    "retry": {"attempts": 2, "on_status_codes": [429, 500]}
                },
                {
                    "virtual_key": os.environ["ANTHROPIC_VIRTUAL_KEY"],
                    "override_params": {"model": "claude-3-5-sonnet-20241022"}
                }
            ],
            "cache": {"mode": "semantic", "max_age": 3600}
        },
        trace_id="production-session",
        metadata={"environment": "production"}
    )

Best Practices

  1. Use environment variables - Never hardcode API keys
  2. Implement fallbacks - Always have backup providers for production
  3. Use streaming - Better UX for long responses
  4. Add tracing - Enable observability with trace IDs and metadata
  5. Enable caching - Reduce costs with semantic caching
  6. Handle errors - Implement retry logic with exponential backoff
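
The gateway-side retry config shown earlier covers transient provider errors; for failures that still surface in your application, a client-side backoff wrapper is a common pattern. This is a sketch: it catches `Exception` broadly because the SDK's specific exception classes aren't covered here, and the commented call is illustrative.

```python
import random
import time

def with_backoff(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on any exception with exponential backoff plus jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the error
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

# Illustrative usage:
# response = with_backoff(lambda: client.chat.completions.create(
#     model="gpt-4o",
#     messages=[{"role": "user", "content": "Hello!"}],
# ))
```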

Resources
