Z.ai API Skill

Quick Reference

Base URL: `https://api.z.ai/api/paas/v4`
Coding Plan URL: `https://api.z.ai/api/coding/paas/v4`
Auth: `Authorization: Bearer YOUR_API_KEY`
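As a minimal sketch of the base URL and auth header in action, using only the standard library (the payload shape mirrors the chat example further down):

```python
import json
import urllib.request

BASE_URL = "https://api.z.ai/api/paas/v4"

def chat_request(api_key: str, payload: dict) -> urllib.request.Request:
    # Build an authenticated POST to /chat/completions
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# To actually send it (requires a real key):
# with urllib.request.urlopen(chat_request("YOUR_KEY", {
#     "model": "glm-4.7",
#     "messages": [{"role": "user", "content": "Hello!"}],
# })) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

In practice the official SDK (below) is simpler; the raw request is useful from environments without it.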

Core Endpoints

| Endpoint | Purpose |
|---|---|
| /chat/completions | Text/vision chat |
| /images/generations | Image generation |
| /videos/generations | Video generation (async) |
| /audio/transcriptions | Speech-to-text |
| /web_search | Web search |
| /async-result/{id} | Poll async tasks |
| /v1/agents | Translation, slides, effects |

Model Selection

Chat (pick by need):
  • glm-4.7
    — Latest flagship, best quality, agentic coding
  • glm-4.7-flash
    — Fast, high quality
  • glm-4.6
    — Reliable general use
  • glm-4.5-flash
    — Fastest, lower cost
Vision:
  • glm-4.6v
    — Best multimodal (images, video, files)
  • glm-4.6v-flash
    — Fast vision
Media:
  • glm-image
    — High-quality images (HD, ~20s)
  • cogview-4-250304
    — Fast images (~5-10s)
  • cogvideox-3
    — Video, up to 4K, 5-10s
  • viduq1-text/image
    — Vidu video generation

Implementation Patterns

Basic Chat

```python
from zai import ZaiClient

client = ZaiClient(api_key="YOUR_KEY")

response = client.chat.completions.create(
    model="glm-4.7",
    messages=[
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Hello!"}
    ]
)
print(response.choices[0].message.content)
```

OpenAI SDK Compatibility

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_ZAI_KEY",
    base_url="https://api.z.ai/api/paas/v4/"
)
```

Use it exactly like the OpenAI SDK.

Streaming

```python
response = client.chat.completions.create(
    model="glm-4.7",
    messages=[...],
    stream=True
)
for chunk in response:
    # Guard against empty final chunks, which carry no delta content
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

Function Calling

```python
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"}
            },
            "required": ["city"]
        }
    }
}]

response = client.chat.completions.create(
    model="glm-4.7",
    messages=[{"role": "user", "content": "Weather in Tokyo?"}],
    tools=tools,
    tool_choice="auto"
)
```

Handle `tool_calls` in `response.choices[0].message.tool_calls`.
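The round trip can be sketched as: execute the requested tool, append a tool message, then call the API again. The dispatcher below is illustrative, and `get_weather` here is a stand-in implementation:

```python
import json

def get_weather(city: str) -> str:
    # Stand-in for a real weather lookup
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def run_tool_call(name: str, arguments_json: str) -> str:
    # The model returns arguments as a JSON string; parse and dispatch
    args = json.loads(arguments_json)
    return TOOLS[name](**args)

# After create() returns tool_calls:
# for call in response.choices[0].message.tool_calls:
#     result = run_tool_call(call.function.name, call.function.arguments)
#     messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
# # Then call client.chat.completions.create(...) again with the updated messages.
```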

Vision (Images/Video/Files)

```python
response = client.chat.completions.create(
    model="glm-4.6v",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://..."}},
            {"type": "text", "text": "Describe this image"}
        ]
    }]
)
```
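Local files can also be sent inline. This sketch assumes the API accepts base64 data URLs in `image_url`, as is common in OpenAI-compatible vision endpoints:

```python
import base64
from pathlib import Path

def image_part_from_file(path: str, mime: str = "image/jpeg") -> dict:
    # Encode a local image as a data URL so no public hosting is needed
    b64 = base64.b64encode(Path(path).read_bytes()).decode()
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}}

# Usage: put the returned dict in the "content" list alongside the text part.
```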

Image Generation

```python
response = client.images.generate(
    model="glm-image",
    prompt="A serene mountain at sunset",
    size="1280x1280",
    quality="hd"
)
print(response.data[0].url)  # Expires in 30 days
```

Video Generation (Async)

```python
import time

# Submit
response = client.videos.generate(
    model="cogvideox-3",
    prompt="A cat playing with yarn",
    size="1920x1080",
    duration=5
)
task_id = response.id

# Poll for result
while True:
    result = client.async_result.get(task_id)
    if result.task_status == "SUCCESS":
        print(result.video_result[0].url)
        break
    time.sleep(5)
```
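For production use, the polling loop is better wrapped with a timeout and a failure check. A sketch; the `"FAIL"` status name is an assumption, so check references/error-codes.md for the exact values:

```python
import time

def poll_async(fetch, interval=5.0, timeout=600.0):
    """Poll an async task until it succeeds, fails, or times out.

    fetch: zero-arg callable returning the task result, e.g.
           lambda: client.async_result.get(task_id)
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = fetch()
        if result.task_status == "SUCCESS":
            return result
        if result.task_status == "FAIL":  # assumed failure status name
            raise RuntimeError("async task failed")
        time.sleep(interval)
    raise TimeoutError("async task did not finish in time")

# Usage:
# result = poll_async(lambda: client.async_result.get(task_id))
# print(result.video_result[0].url)
```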

Web Search Integration

```python
response = client.chat.completions.create(
    model="glm-4.7",
    messages=[{"role": "user", "content": "Latest AI news?"}],
    tools=[{
        "type": "web_search",
        "web_search": {
            "enable": True,
            "search_result": True
        }
    }]
)
```

Access `response.web_search` for sources.

Thinking Mode (Chain-of-Thought)

```python
response = client.chat.completions.create(
    model="glm-4.7",
    messages=[...],
    thinking={"type": "enabled"},
    stream=True  # Recommended with thinking
)
```

Access `reasoning_content` in the response.
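When streaming with thinking enabled, reasoning and answer tokens can be handled separately. This sketch assumes the reasoning arrives as `delta.reasoning_content` alongside the usual `delta.content`:

```python
def print_stream(chunks) -> str:
    # Print chain-of-thought as it streams; collect and return the final answer
    answer = []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        if getattr(delta, "reasoning_content", None):
            print(delta.reasoning_content, end="")
        if getattr(delta, "content", None):
            answer.append(delta.content)
    return "".join(answer)

# Usage:
# final = print_stream(client.chat.completions.create(
#     model="glm-4.7", messages=[...],
#     thinking={"type": "enabled"}, stream=True))
```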

Key Parameters

| Parameter | Values | Notes |
|---|---|---|
| temperature | 0.0-1.0 | Default: 1.0 (GLM-4.7), 0.6 (GLM-4.5) |
| top_p | 0.01-1.0 | Default ~0.95 |
| max_tokens | varies | Max: 128K (GLM-4.7), 96K (GLM-4.5) |
| stream | bool | Enable SSE streaming |
| response_format | `{"type": "json_object"}` | Force JSON output |
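For example, `response_format` pairs naturally with a defensive parse; the fence-stripping below is a precaution, not a documented behavior:

```python
import json

def parse_json_reply(content: str) -> dict:
    # With response_format={"type": "json_object"} the reply should be raw JSON;
    # strip a stray Markdown fence defensively before parsing
    text = content.strip()
    if text.startswith("```"):
        text = text.strip("`")
        if "\n" in text:
            text = text.split("\n", 1)[1]
    return json.loads(text)

# Usage:
# response = client.chat.completions.create(
#     model="glm-4.7",
#     messages=[{"role": "user", "content": "List three colors as JSON."}],
#     response_format={"type": "json_object"},
# )
# data = parse_json_reply(response.choices[0].message.content)
```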

Error Handling

  • 429: Rate limited — implement exponential backoff
  • 401: Bad API key — verify credentials
  • sensitive: Content filtered — modify input
```python
reason = response.choices[0].finish_reason
if reason == "tool_calls":
    ...  # Execute the function and continue the conversation
elif reason == "length":
    ...  # Increase max_tokens or truncate the input
elif reason == "sensitive":
    ...  # Content was filtered; modify the input
```
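The 429 case can be handled with a small retry wrapper. A sketch; in practice, catch the SDK's specific rate-limit exception rather than matching the message text:

```python
import random
import time

def with_backoff(call, max_retries=5, base=1.0):
    # Retry rate-limited calls with exponential backoff plus jitter
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as exc:
            if "429" not in str(exc) or attempt == max_retries - 1:
                raise
            time.sleep(base * 2 ** attempt + random.random() * base)

# Usage:
# response = with_backoff(lambda: client.chat.completions.create(
#     model="glm-4.7", messages=[{"role": "user", "content": "Hello!"}]))
```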

Reference Files

For detailed API specifications, consult:
  • references/chat-completions.md
    — Full chat API, parameters, models
  • references/tools-and-functions.md
    — Function calling, web search, retrieval
  • references/media-generation.md
    — Image, video, audio APIs
  • references/agents.md
    — Translation, slides, effects agents
  • references/error-codes.md
    — Error handling, rate limits