agent-platform-inference

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

Agent Platform GenAI Inference Skill

Agent Platform GenAI推理技能

This skill provides instructions for authenticating and connecting to Google Cloud Agent Platform to use Generative AI models. It covers both First-Party (Gemini) and Third-Party (OpenMaaS) models.

本技能提供了连接Google Cloud Agent Platform并使用生成式AI模型的身份验证及连接指南，涵盖第一方（Gemini）和第三方（OpenMaaS）模型。

Phase 0: Environment Setup

阶段0：环境设置

CRITICAL: Before running any of the Python sample scripts in the

scripts/

directory (e.g.,

scripts/openmaas_openai_sdk.py

), you MUST ensure the environment is correctly initialized by following these steps:

Google Cloud Authentication: Authenticate with your Google Cloud credentials and configure active Application Default Credentials (ADC) for Agent Platform access:
bash
```
gcloud auth login
gcloud auth application-default login
```

Enable API (if not already enabled):

bash

gcloud services enable aiplatform.googleapis.com

Virtual Environment: Create and activate a dedicated local virtual environment:
bash
```
python3 -m venv .venv
source .venv/bin/activate
```
Install Dependencies: Install the required SDKs:
bash
```
pip install -r scripts/requirements.txt
```
Verify Setup (Optional): Run all sample scripts at once to verify the environment is working end-to-end:
bash
```
./scripts/verify_all.sh
```
Execution: Advise the user that every time they execute a Python snippet from this skill, they must ensure this virtual environment is activated first.

[!IMPORTANT] CRITICAL: Model IDs & Availability

Gemini Models: See Gemini Models for valid Model IDs and Regions.

OpenMaaS Models: See [Use Open Models on Agent Platform] (https://docs.cloud.google.com/gemini-enterprise-agent-platform/models/maas/use-open-models) for Llama, DeepSeek, Qwen, etc.

Incomplete Lists: The Model IDs listed in this skill are examples only and may be incomplete or outdated.

Action: Always verify the Model ID and Region using the links above before generating code.

关键提示：在运行

scripts/

目录下的任何Python示例脚本（如

scripts/openmaas_openai_sdk.py

）之前，您必须按照以下步骤确保环境已正确初始化：

Google Cloud身份验证：使用您的Google Cloud凭据进行身份验证，并配置用于Agent Platform访问的有效应用默认凭据（ADC）：
bash
```
gcloud auth login
gcloud auth application-default login
```

启用API（若尚未启用）：

bash

gcloud services enable aiplatform.googleapis.com

虚拟环境：创建并激活专属本地虚拟环境：
bash
```
python3 -m venv .venv
source .venv/bin/activate
```
安装依赖：安装所需SDK：
bash
```
pip install -r scripts/requirements.txt
```
验证设置（可选）：一次性运行所有示例脚本，验证环境端到端运行正常：
bash
```
./scripts/verify_all.sh
```
执行说明：提醒用户，每次执行本技能中的Python代码片段时，必须确保已激活该虚拟环境。

[!IMPORTANT] 关键提示：模型ID与可用性

Gemini模型：请查看Gemini模型文档获取有效的模型ID和区域信息。

OpenMaaS模型：请查看[在Agent Platform上使用开源模型] (https://docs.cloud.google.com/gemini-enterprise-agent-platform/models/maas/use-open-models) 获取Llama、DeepSeek、Qwen等模型的相关信息。

列表不全：本技能中列出的模型ID仅为示例，可能不完整或已过时。

操作建议：生成代码前，请务必通过上述链接验证模型ID和区域信息。

Workflow Decision Tree

工作流决策树

Model Family Identification: Has the user specified whether they want to call a Gemini (First-Party) model or an OpenMaaS (Third-Party, e.g. Llama, DeepSeek, Qwen) model?
- No -> Ask the user which model family they want to use. If they provide a specific model name, infer the family from the name.
- Yes -> Proceed to Step 2.
SDK Choice: Which SDK does the user want to use?
- Gemini + GenAI SDK (preferred for Gemini) -> Proceed to [1. Gemini Models].
- Gemini + legacy Vertex AI SDK -> Proceed to [1. Gemini Models].
- OpenMaaS + OpenAI SDK (preferred for OpenMaaS) -> Proceed to [2. OpenMaaS Models].
- OpenMaaS + GenAI SDK -> Proceed to [2. OpenMaaS Models].
- Unsure -> Default to the preferred SDK for the chosen family.
Troubleshooting: Is the user reporting an error (429 Resource Exhausted, 400 User Validation, 404 Not Found, etc.)?
- Yes -> Proceed to [3. Troubleshooting & Common Error Codes].
- No -> Proceed with the SDK choice from Step 2.

模型家族识别：用户是否指定了要调用Gemini（第一方）模型还是OpenMaaS（第三方，如Llama、DeepSeek、Qwen）模型？
- 未指定 -> 询问用户想要使用的模型家族。若用户提供了具体模型名称，可从名称推断所属家族。
- 已指定 -> 进入步骤2。
SDK选择：用户想要使用哪种SDK？
- Gemini + GenAI SDK（Gemini推荐方案） -> 进入[1. Gemini模型]。
- Gemini + 旧版Vertex AI SDK -> 进入[1. Gemini模型]。
- OpenMaaS + OpenAI SDK（OpenMaaS推荐方案） -> 进入[2. OpenMaaS模型]。
- OpenMaaS + GenAI SDK -> 进入[2. OpenMaaS模型]。
- 不确定 -> 默认选择对应模型家族的推荐SDK。
故障排查：用户是否报告了错误（如429资源耗尽、400用户验证错误、404未找到错误等）？
- 是 -> 进入[3. 故障排查与常见错误码]。
- 否 -> 按照步骤2选择的SDK继续操作。

1. Gemini Models

1. Gemini模型

For Gemini models (e.g.,

gemini-2.5-pro

gemini-3-flash-preview

), the GenAI SDK (

google-genai

) is the PREFERRED method. The legacy

vertexai

SDK is still supported but GenAI SDK is recommended for new projects.

[!IMPORTANT] Preview Models (including Gemini 3.1) are often ONLY available in the
global
region. Stable models are available in
us-central1
and other regions.

对于Gemini模型（如

gemini-2.5-pro

、

gemini-3-flash-preview

），GenAI SDK（

google-genai

）是首选方法。旧版

vertexai

SDK仍受支持，但新项目推荐使用GenAI SDK。

[!IMPORTANT] 预览模型（包括Gemini 3.1）通常仅在
global
区域可用。稳定模型在
us-central1
及其他区域可用。

Choosing the Right SDK

选择合适的SDK

Gemini Models: GenAI SDK (
```
google-genai
```
) is PREFERRED. Use OpenAI SDK for compatibility, or Legacy SDK (
```
vertexai
```
) if needed.
OpenMaaS Models: OpenAI SDK is HIGHLY RECOMMENDED. Use GenAI SDK or Legacy SDK if you have specific infrastructure requirements.

Gemini模型：GenAI SDK（
```
google-genai
```
）为首选。如需兼容性可使用OpenAI SDK，或根据需求使用旧版SDK（
```
vertexai
```
）。
OpenMaaS模型：强烈推荐使用标准OpenAI SDK。若有特定基础设施需求，可使用GenAI SDK或旧版SDK。

Installation

安装

bash

pip install google-genai

bash

pip install google-genai

Python Example (GenAI SDK - Preferred)

Python示例（GenAI SDK - 首选）

See

scripts/gemini_genai_sdk.py

for the complete code.

完整代码请查看

scripts/gemini_genai_sdk.py

。

Alternative: OpenAI SDK (Chat Completions)

替代方案：OpenAI SDK（聊天补全）

Use the standard OpenAI SDK with the Agent Platform endpoint. This is great for cross-compatibility.

See

scripts/gemini_openai_sdk.py

for the complete code.

结合Agent Platform端点使用标准OpenAI SDK，这非常适合跨兼容性场景。

完整代码请查看

scripts/gemini_openai_sdk.py

。

Legacy: Agent Platform SDK

旧版方案：Agent Platform SDK

The legacy

vertexai

SDK is still widely used but

google-genai

is preferred for new Gemini projects.

See

scripts/gemini_vertexai_sdk.py

for the complete code.

Documentation: Google GenAI SDK

Documentation: Agent Platform Gemini Models

旧版

vertexai

SDK仍被广泛使用，但新Gemini项目首选

google-genai

。

完整代码请查看

scripts/gemini_vertexai_sdk.py

。

文档：Google GenAI SDK

文档：Agent Platform Gemini模型

2. OpenMaaS Models (Llama, DeepSeek, Qwen, etc.)

2. OpenMaaS模型（Llama、DeepSeek、Qwen等）

For OpenMaaS (Model-as-a-Service) models, the HIGHLY RECOMMENDED approach is to use the standard OpenAI SDK with a specific Vertex AI endpoint.

[!WARNING] While
GenerativeModel
can support some OpenMaaS models, it is discouraged. Use the OpenAI SDK for best compatibility (especially for Chat Completions).

对于OpenMaaS（模型即服务）模型，强烈推荐的方式是结合特定Vertex AI端点使用标准OpenAI SDK。

[!WARNING] 虽然
GenerativeModel
可以支持部分OpenMaaS模型，但不建议使用。为获得最佳兼容性（尤其是聊天补全功能），请使用OpenAI SDK。

Installation

安装

bash

pip install openai google-auth

bash

pip install openai google-auth

Authentication for OpenAI SDK

OpenAI SDK的身份验证

You MUST use a Google Cloud OAuth access token as the API key for the OpenAI SDK.

python

import google.auth
from google.auth.transport.requests import Request

def get_gcp_access_token():
    creds, _ = google.auth.default()
    creds.refresh(Request())
    return creds.token

[!NOTE] Google Cloud access tokens typically expire after 1 hour. The
get_gcp_access_token()
function above retrieves a fresh token at the time it is called.

For long-running applications, you implement a refresh mechanism. See Refresh the access token for details.

您必须使用Google Cloud OAuth访问令牌作为OpenAI SDK的API密钥。

python

import google.auth
from google.auth.transport.requests import Request

def get_gcp_access_token():
    creds, _ = google.auth.default()
    creds.refresh(Request())
    return creds.token

[!NOTE] Google Cloud访问令牌通常在1小时后过期。上述
get_gcp_access_token()
函数会在调用时获取新令牌。

对于长时间运行的应用，您需要实现令牌刷新机制。详情请查看刷新访问令牌。

Configuration (Base URL)

配置（基础URL）

Global Endpoint (Recommended for most models requiring global availability):

https://aiplatform.googleapis.com/v1/projects/{PROJECT_ID}/locations/global/endpoints/openapi

Regional Endpoint:

https://{REGION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}/locations/{REGION}/endpoints/openapi

全局端点（推荐用于大多数需要全局可用性的模型）：

https://aiplatform.googleapis.com/v1/projects/{PROJECT_ID}/locations/global/endpoints/openapi

区域端点：

https://{REGION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}/locations/{REGION}/endpoints/openapi

Python Example (OpenMaaS - Chat Completions)

Python示例（OpenMaaS - 聊天补全）

See

scripts/openmaas_openai_sdk.py

for the complete code.

[!TIP] Alternative: Environment Variables You can set environment variables in your shell instead of updating the code.
bash
export OPENAI_BASE_URL="https://aiplatform.googleapis.com/v1/projects/YOUR_PROJECT_ID/locations/global/endpoints/openapi"
export OPENAI_API_KEY="$(gcloud auth print-access-token)"
Then initialize the client without arguments:
client = OpenAI()

完整代码请查看

scripts/openmaas_openai_sdk.py

。

[!TIP] 替代方案：环境变量 您可以在Shell中设置环境变量，而非修改代码。
bash
export OPENAI_BASE_URL="https://aiplatform.googleapis.com/v1/projects/YOUR_PROJECT_ID/locations/global/endpoints/openapi"
export OPENAI_API_KEY="$(gcloud auth print-access-token)"
然后无需参数初始化客户端：
client = OpenAI()

Python Example (OpenMaaS - Completions API)

Python示例（OpenMaaS - 补全API）

The following models support the legacy Completions API:

zai-org/glm-5-maas

moonshotai/kimi-k2-thinking-maas

minimaxai/minimax-m2-maas

deepseek-ai/deepseek-v3.1-maas

, and

deepseek-ai/deepseek-v3.2-maas

python

response = client.completions.create(
    model="deepseek-ai/deepseek-v3.2-maas",
    prompt="Once upon a time",
    max_tokens=100
)
print(response.choices[0].text)

以下模型支持旧版补全API：

zai-org/glm-5-maas

、

moonshotai/kimi-k2-thinking-maas

、

minimaxai/minimax-m2-maas

、

deepseek-ai/deepseek-v3.1-maas

和

deepseek-ai/deepseek-v3.2-maas

。

python

response = client.completions.create(
    model="deepseek-ai/deepseek-v3.2-maas",
    prompt="Once upon a time",
    max_tokens=100
)
print(response.choices[0].text)

Python Example (OpenMaaS - Embeddings)

Python示例（OpenMaaS - 嵌入）

python

undefined

python

undefined

Verify specific Embedding Model ID on Model Garden (e.g., intfloat/multilingual-e5-small)

在Model Garden中验证特定嵌入模型ID（如intfloat/multilingual-e5-small）

response = client.embeddings.create( model="intfloat/multilingual-e5-large-maas", input="The quick brown fox jumps over the lazy dog", ) print(response.data[0].embedding)

undefined

response = client.embeddings.create( model="intfloat/multilingual-e5-large-maas", input="The quick brown fox jumps over the lazy dog", ) print(response.data[0].embedding)

undefined

Alternative: GenAI SDK

替代方案：GenAI SDK

The

google-genai

SDK can also access OpenMaaS models via the

vertexai

backend.

See

scripts/openmaas_genai_sdk.py

for the complete code.

[!IMPORTANT] Model ID Format: For GenAI SDK with OpenMaaS, you MUST use the full path:
publishers/PUBLISHER/models/MODEL
(e.g.,
publishers/zai-org/models/glm-5-maas
).

google-genai

SDK也可通过

vertexai

后端访问OpenMaaS模型。

完整代码请查看

scripts/openmaas_genai_sdk.py

。

[!IMPORTANT] 模型ID格式：使用GenAI SDK访问OpenMaaS模型时，您必须使用完整路径：
publishers/PUBLISHER/models/MODEL
（例如
publishers/zai-org/models/glm-5-maas
）。

Legacy: Agent Platform SDK (OpenMaaS)

旧版方案：Agent Platform SDK（OpenMaaS）

For OpenMaaS, you can also use

GenerativeModel

(if supported).

See

scripts/openmaas_vertexai_sdk.py

for the complete code.

[!IMPORTANT] Model ID Format: For Agent Platform SDK with OpenMaaS, you MUST use the full path:
publishers/PUBLISHER/models/MODEL
.

对于OpenMaaS，您也可以使用

GenerativeModel

（若支持）。

完整代码请查看

scripts/openmaas_vertexai_sdk.py

。

[!IMPORTANT] 模型ID格式：使用Agent Platform SDK访问OpenMaaS模型时，您必须使用完整路径：
publishers/PUBLISHER/models/MODEL
。

Model Reference & Availability

模型参考与可用性

Documentation: Use Open Models on Agent Platform

[!TIP] Self-Deployment for Control: If you need dedicated hardware (GPUs/TPUs), guaranteed capacity, or specific regional placement not offered by MaaS, you can Self-Deploy these models to Agent Platform Endpoints. Search for the model in Model Garden and click "Deploy" to select your machine type.

[!IMPORTANT] Finding Inference Examples: The list above is a starting point. For the definitive inference snippets (especially for Chat Completions payload structure):
Consult the Use Open Models on Agent Platform list.

Click the link for your specific model (e.g., "DeepSeek-V3") to visit its Model Garden page.
Look for the "Sample Code" or "Use this model" button on the Model Garden page to get the exact
curl
or Python code for that specific model version.

[!NOTE] This list is INCOMPLETE. See [Use Open Models on Agent Platform] (https://docs.cloud.google.com/gemini-enterprise-agent-platform/models/maas/use-open-models) for the full list of supported models.

Model Family	Model ID Examples	Location	Notes
Llama 4	`meta/llama-4-maverick-17b-128e-instruct-maas`	`us-east5`
Llama 4	`meta/llama-4-scout-17b-16e-instruct-maas`	`us-east5`
Llama 3.3	`meta/llama-3.3-70b-instruct-maas`	`us-central1`
DeepSeek	`deepseek-ai/deepseek-v3.2-maas`	`global`	Global ONLY
DeepSeek	`deepseek-ai/deepseek-v3.1-maas`	`us-west2`	US-West2 ONLY
DeepSeek	`deepseek-ai/deepseek-r1-0528-maas`	`us-central1`
Qwen 3	`qwen/qwen3-coder-480b-a35b-instruct-maas`	`global`
Qwen 3	`qwen/qwen3-next-80b-a3b-instruct-maas`	`global`
Kimi	`moonshotai/kimi-k2-thinking-maas`	`global`
MiniMax	`minimaxai/minimax-m2-maas`	`global`
GLM	`zai-org/glm-4.7-maas` , `zai-org/glm-5-maas`	`global`

文档：在Agent Platform上使用开源模型

[!TIP] 自主部署以获得控制权：如果您需要专用硬件（GPU/TPU）、保证容量或MaaS未提供的特定区域部署，您可以将这些模型自主部署到Agent Platform端点。在Model Garden中搜索模型，点击“部署”选择您的机器类型即可。

[!IMPORTANT] 查找推理示例：以上列表仅为起点。如需权威的推理代码片段（尤其是聊天补全的请求体结构）：
查看在Agent Platform上使用开源模型列表。

点击您使用的特定模型链接（例如“DeepSeek-V3”），进入其Model Garden页面。
在Model Garden页面中查找**“示例代码”或“使用此模型”**按钮，获取该特定模型版本的准确
curl
或Python代码。

[!NOTE] 本列表不完整。请查看[在Agent Platform上使用开源模型] (https://docs.cloud.google.com/gemini-enterprise-agent-platform/models/maas/use-open-models) 获取支持的完整模型列表。

模型家族	模型ID示例	区域	说明
Llama 4	`meta/llama-4-maverick-17b-128e-instruct-maas`	`us-east5`
Llama 4	`meta/llama-4-scout-17b-16e-instruct-maas`	`us-east5`
Llama 3.3	`meta/llama-3.3-70b-instruct-maas`	`us-central1`
DeepSeek	`deepseek-ai/deepseek-v3.2-maas`	`global`	仅全局可用
DeepSeek	`deepseek-ai/deepseek-v3.1-maas`	`us-west2`	仅美西2区可用
DeepSeek	`deepseek-ai/deepseek-r1-0528-maas`	`us-central1`
Qwen 3	`qwen/qwen3-coder-480b-a35b-instruct-maas`	`global`
Qwen 3	`qwen/qwen3-next-80b-a3b-instruct-maas`	`global`
Kimi	`moonshotai/kimi-k2-thinking-maas`	`global`
MiniMax	`minimaxai/minimax-m2-maas`	`global`
GLM	`zai-org/glm-4.7-maas` , `zai-org/glm-5-maas`	`global`

3. Troubleshooting & Common Error Codes

3. 故障排查与常见错误码

429: Resource Exhausted

429：资源耗尽

Cause: OpenMaaS and Gemini models use Dynamic Shared Quota (DSQ). Resources are pooled and allocated dynamically based on availability. A 429 error indicates the shared pool is temporarily exhausted, not necessarily that your specific project quota is hit (though it can be).
Solution: Implement strict exponential backoff and retry strategies.
High Throughput: For production workloads requiring high throughput or guaranteed capacity, consider Provisioned Throughput (PT).
Important: Quota increases through normal cloud processes (Cloud Console) are NOT applicable for DSQ constraints.
Documentation: Quotas and limits (DSQ)

原因：OpenMaaS和Gemini模型使用动态共享配额（DSQ）。资源被集中管理，并根据可用性动态分配。429错误表示共享池暂时耗尽，不一定意味着您的项目配额已用完（但也有可能）。
解决方案：严格实现指数退避与重试策略。
高吞吐量场景：对于需要高吞吐量或保证容量的生产工作负载，可考虑预配置吞吐量（PT）。
重要提示：通过常规云流程（云控制台）申请配额提升不适用于DSQ限制。
文档：配额与限制（DSQ）

400: User Validation Error

400：用户验证错误

Cause: Invalid request format, unsupported parameter, or incorrect Model ID.
Action: Double-check your request payload and parameters. Verify the Model ID and Region are correct.

原因：请求格式无效、参数不支持或模型ID错误。
操作：仔细检查请求体和参数，验证模型ID和区域是否正确。

404: Not Found / Model Not Available

404：未找到/模型不可用

Cause: The model is not enabled, or not available in the specified project or region.
Action:
1. Check Location Availability:
  - OpenMaaS: Verify the model is available in your region. See Model Availability by Location.
  - Gemini: 
    - Source of Truth: Always check Gemini Model Locations for the authoritative list.
    
    - Preview Models: All Preview models (e.g., Gemini 3.1, experimental versions) are often ONLY available in the
      us-central1
      or
      global
      regions.
    - Stable Models: (e.g., Gemini 2.5 Pro) Available in
      us-central1
      ,
      europe-west4
      , and many other regions.
    - Important: If you get a 404/400 error, try switching your client location to
      us-central1
      or
      global
      .
2. Enable Llama Models: For Llama 3.3 and Llama 4, you MUST enable the model in Model Garden before use. Go to the [Model Garden] (https://console.cloud.google.com/agent-platform/model-garden), search for the model card (e.g., "Llama 3.3 API Service"), and click Enable. Only then can you make inference requests.

原因：模型未启用，或在指定项目/区域不可用。
操作：
1. 检查区域可用性：
  - OpenMaaS：验证模型在您的区域是否可用。请查看按区域划分的模型可用性。
  - Gemini： 
    - 权威来源：请始终查看Gemini模型区域获取权威列表。
    
    - 预览模型：所有预览模型（如Gemini 3.1、实验版本）通常仅在
      us-central1
      或
      global
      区域可用。
    - 稳定模型：（如Gemini 2.5 Pro）在
      us-central1
      、
      europe-west4
      及其他多个区域可用。
    - 重要提示：如果遇到404/400错误，请尝试将客户端区域切换为
      us-central1
      或
      global
      。
2. 启用Llama模型：对于Llama 3.3和Llama 4，您必须先在Model Garden中启用模型才能使用。进入[Model Garden] (https://console.cloud.google.com/agent-platform/model-garden)，搜索模型卡片（例如“Llama 3.3 API Service”），点击启用。之后才能发起推理请求。