agent-platform-inference
Agent Platform GenAI Inference Skill
This skill provides instructions for authenticating and connecting to Google
Cloud Agent Platform to use Generative AI models. It covers both First-Party
(Gemini) and Third-Party (OpenMaaS) models.
Phase 0: Environment Setup
CRITICAL: Before running any of the Python sample scripts in the `scripts/` directory (e.g., `scripts/openmaas_openai_sdk.py`), you MUST ensure the environment is correctly initialized by following these steps:

- Google Cloud Authentication: Authenticate with your Google Cloud credentials and configure active Application Default Credentials (ADC) for Agent Platform access:
  ```bash
  gcloud auth login
  gcloud auth application-default login
  ```
- Enable API (if not already enabled):
  ```bash
  gcloud services enable aiplatform.googleapis.com
  ```
- Virtual Environment: Create and activate a dedicated local virtual environment:
  ```bash
  python3 -m venv .venv
  source .venv/bin/activate
  ```
- Install Dependencies: Install the required SDKs:
  ```bash
  pip install -r scripts/requirements.txt
  ```
- Verify Setup (Optional): Run all sample scripts at once to verify the environment is working end-to-end:
  ```bash
  ./scripts/verify_all.sh
  ```
- Execution: Advise the user that every time they execute a Python snippet from this skill, they must ensure this virtual environment is activated first.
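The Execution step asks that the virtual environment be active before any snippet runs. One way to check this from Python, using only the standard library, is sketched below. The function name `in_virtualenv` is hypothetical and is not part of this skill's scripts.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment detected; run `source .venv/bin/activate` first.")
```

This only detects environments created with `venv` (or compatible tools); it does not confirm that the required SDKs are installed.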
> [!IMPORTANT]
> CRITICAL: Model IDs & Availability
> - Gemini Models: See Gemini Models for valid Model IDs and Regions.
> - OpenMaaS Models: See [Use Open Models on Agent Platform](https://docs.cloud.google.com/gemini-enterprise-agent-platform/models/maas/use-open-models) for Llama, DeepSeek, Qwen, etc.
> - Incomplete Lists: The Model IDs listed in this skill are examples only and may be incomplete or outdated.
> - Action: Always verify the Model ID and Region using the links above before generating code.
Workflow Decision Tree
1. Model Family Identification: Has the user specified whether they want to call a Gemini (First-Party) model or an OpenMaaS (Third-Party, e.g. Llama, DeepSeek, Qwen) model?
   - No -> Ask the user which model family they want to use. If they provide a specific model name, infer the family from the name.
   - Yes -> Proceed to Step 2.
2. SDK Choice: Which SDK does the user want to use?
   - Gemini + GenAI SDK (preferred for Gemini) -> Proceed to [1. Gemini Models].
   - Gemini + legacy Vertex AI SDK -> Proceed to [1. Gemini Models].
   - OpenMaaS + OpenAI SDK (preferred for OpenMaaS) -> Proceed to [2. OpenMaaS Models].
   - OpenMaaS + GenAI SDK -> Proceed to [2. OpenMaaS Models].
   - Unsure -> Default to the preferred SDK for the chosen family.
3. Troubleshooting: Is the user reporting an error (429 Resource Exhausted, 400 User Validation, 404 Not Found, etc.)?
   - Yes -> Proceed to [3. Troubleshooting & Common Error Codes].
   - No -> Proceed with the SDK choice from Step 2.
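The first two steps of the decision tree (infer the family from a model name, then fall back to the preferred SDK) can be sketched as a small heuristic. The names `infer_model_family`, `choose_sdk`, `OPENMAAS_HINTS`, and `PREFERRED_SDK` are hypothetical, and the keyword list is an assumption drawn only from the model families named in this skill, so it will miss other models.

```python
from typing import Optional

# Name fragments taken from the model families mentioned in this skill.
# The list is illustrative and will not cover every OpenMaaS model.
OPENMAAS_HINTS = ("llama", "deepseek", "qwen", "kimi", "minimax", "glm")

# Preferred SDK package per family, as stated in the decision tree.
PREFERRED_SDK = {"gemini": "google-genai", "openmaas": "openai"}


def infer_model_family(model_name: str) -> str:
    """Return "gemini", "openmaas", or "unknown" for a model name."""
    name = model_name.strip().lower()
    # Look at the last path segment so full resource paths also work.
    short_name = name.rsplit("/", 1)[-1]
    if short_name.startswith("gemini"):
        return "gemini"
    if any(hint in name for hint in OPENMAAS_HINTS):
        return "openmaas"
    return "unknown"  # Step 1: ask the user which family they want.


def choose_sdk(model_name: str, requested_sdk: Optional[str] = None) -> Optional[str]:
    """Return the requested SDK, else the preferred one, else None."""
    if requested_sdk:
        return requested_sdk
    return PREFERRED_SDK.get(infer_model_family(model_name))
```

A `None` result from `choose_sdk` corresponds to the "ask the user" branch rather than a silent default.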
1. Gemini Models
For Gemini models (e.g., `gemini-2.5-pro`, `gemini-3-flash-preview`), the GenAI SDK (`google-genai`) is the PREFERRED method. The legacy `vertexai` SDK is still supported, but the GenAI SDK is recommended for new projects.

> [!IMPORTANT]
> Preview Models (including Gemini 3.1) are often ONLY available in the `global` region. Stable models are available in `us-central1` and other regions.
Choosing the Right SDK
- Gemini Models: GenAI SDK (`google-genai`) is PREFERRED. Use the OpenAI SDK for compatibility, or the Legacy SDK (`vertexai`) if needed.
- OpenMaaS Models: OpenAI SDK is HIGHLY RECOMMENDED. Use the GenAI SDK or Legacy SDK if you have specific infrastructure requirements.
Installation
```bash
pip install google-genai
```
Python Example (GenAI SDK - Preferred)
See `scripts/gemini_genai_sdk.py` for the complete code.
Alternative: OpenAI SDK (Chat Completions)
Use the standard OpenAI SDK with the Agent Platform endpoint. This is great for cross-compatibility.
See `scripts/gemini_openai_sdk.py` for the complete code.
Legacy: Agent Platform SDK
The legacy `vertexai` SDK is still widely used, but `google-genai` is preferred for new Gemini projects.
See `scripts/gemini_vertexai_sdk.py` for the complete code.
Documentation: Google GenAI SDK
Documentation: Agent Platform Gemini Models
2. OpenMaaS Models (Llama, DeepSeek, Qwen, etc.)
For OpenMaaS (Model-as-a-Service) models, the HIGHLY RECOMMENDED approach is
to use the standard OpenAI SDK with a specific Vertex AI endpoint.
> [!WARNING]
> While `GenerativeModel` can support some OpenMaaS models, it is discouraged. Use the OpenAI SDK for best compatibility (especially for Chat Completions).
Installation
```bash
pip install openai google-auth
```
Authentication for OpenAI SDK
You MUST use a Google Cloud OAuth access token as the API key for the OpenAI
SDK.
```python
import google.auth
from google.auth.transport.requests import Request


def get_gcp_access_token():
    # Load Application Default Credentials (ADC) from the environment.
    creds, _ = google.auth.default()
    # Refresh to obtain a fresh, short-lived OAuth access token.
    creds.refresh(Request())
    return creds.token
```
> [!NOTE]
> Google Cloud access tokens typically expire after 1 hour. The `get_gcp_access_token()` function above retrieves a fresh token at the time it is called.
>
> For long-running applications, you must implement a refresh mechanism. See Refresh the access token for details.
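For long-running applications, one possible shape for a refresh mechanism is a small cache that refetches the token shortly before it expires. The sketch below is illustrative only: the `CachedToken` class, the 3600-second lifetime, and the 300-second safety margin are assumptions, and `fetch_token` stands in for a function such as `get_gcp_access_token()`.

```python
import time
from typing import Callable, Optional


class CachedToken:
    """Cache an access token and refetch it shortly before it expires."""

    def __init__(
        self,
        fetch_token: Callable[[], str],
        ttl_seconds: float = 3600.0,    # assumed token lifetime (about 1 hour)
        refresh_margin: float = 300.0,  # assumed safety margin before expiry
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_token = fetch_token
        self._ttl = ttl_seconds
        self._margin = refresh_margin
        self._clock = clock
        self._token: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        """Return the cached token, refetching it when it is close to expiry."""
        now = self._clock()
        age = now - self._fetched_at
        if self._token is None or age >= self._ttl - self._margin:
            self._token = self._fetch_token()
            self._fetched_at = now
        return self._token
```

With this helper, `CachedToken(get_gcp_access_token).get()` would call Google auth only when the cached token is missing or near expiry. The application still has to hand the new token to the OpenAI client, for example by constructing a new client.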
Configuration (Base URL)
- Global Endpoint (recommended for most models requiring global availability):
  `https://aiplatform.googleapis.com/v1/projects/{PROJECT_ID}/locations/global/endpoints/openapi`
- Regional Endpoint:
  `https://{REGION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}/locations/{REGION}/endpoints/openapi`
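The two URL patterns differ only in the host prefix and the location segment, so a small helper can build either one. This is a minimal sketch; `build_openapi_base_url` is a hypothetical name and is not part of this skill's scripts.

```python
def build_openapi_base_url(project_id: str, location: str = "global") -> str:
    """Build the OpenAI-compatible base URL for a project and location."""
    if not project_id:
        raise ValueError("project_id must not be empty")
    # The global endpoint has no region prefix on the host name.
    host = (
        "aiplatform.googleapis.com"
        if location == "global"
        else f"{location}-aiplatform.googleapis.com"
    )
    return (
        f"https://{host}/v1/projects/{project_id}"
        f"/locations/{location}/endpoints/openapi"
    )
```

The result is the value to pass as `base_url` to the OpenAI client, with a Google Cloud access token as the API key.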
Python Example (OpenMaaS - Chat Completions)
See `scripts/openmaas_openai_sdk.py` for the complete code.
> [!TIP]
> Alternative: Environment Variables
> You can set environment variables in your shell instead of updating the code.
> ```bash
> export OPENAI_BASE_URL="https://aiplatform.googleapis.com/v1/projects/YOUR_PROJECT_ID/locations/global/endpoints/openapi"
> export OPENAI_API_KEY="$(gcloud auth print-access-token)"
> ```
> Then initialize the client without arguments: `client = OpenAI()`
Python Example (OpenMaaS - Completions API)
The following models support the legacy Completions API: `zai-org/glm-5-maas`, `moonshotai/kimi-k2-thinking-maas`, `minimaxai/minimax-m2-maas`, `deepseek-ai/deepseek-v3.1-maas`, and `deepseek-ai/deepseek-v3.2-maas`.
```python
# Assumes `client` is an OpenAI client already configured with the
# Agent Platform base URL and a Google Cloud access token (see above).
response = client.completions.create(
    model="deepseek-ai/deepseek-v3.2-maas",
    prompt="Once upon a time",
    max_tokens=100,
)
print(response.choices[0].text)
```
Python Example (OpenMaaS - Embeddings)
```python
# Verify the specific Embedding Model ID on Model Garden (e.g., intfloat/multilingual-e5-small)
response = client.embeddings.create(
    model="intfloat/multilingual-e5-large-maas",
    input="The quick brown fox jumps over the lazy dog",
)
print(response.data[0].embedding)
```
Alternative: GenAI SDK
The `google-genai` SDK can also access OpenMaaS models via the `vertexai` backend.
See `scripts/openmaas_genai_sdk.py` for the complete code.
> [!IMPORTANT]
> Model ID Format: For GenAI SDK with OpenMaaS, you MUST use the full path: `publishers/PUBLISHER/models/MODEL` (e.g., `publishers/zai-org/models/glm-5-maas`).
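The short IDs used with the OpenAI SDK in this skill (for example `zai-org/glm-5-maas`) have the shape `PUBLISHER/MODEL`, so the full path can be derived mechanically. The helper below is a sketch under that assumption; `to_publisher_model_path` is a hypothetical name, and the resulting ID should still be verified against the documentation.

```python
def to_publisher_model_path(model_id: str) -> str:
    """Convert 'PUBLISHER/MODEL' into 'publishers/PUBLISHER/models/MODEL'."""
    if model_id.startswith("publishers/"):
        return model_id  # already a full path, leave it unchanged
    publisher, sep, model = model_id.partition("/")
    if not sep or not publisher or not model:
        raise ValueError(f"expected 'PUBLISHER/MODEL', got {model_id!r}")
    return f"publishers/{publisher}/models/{model}"
```

Raising on IDs without a publisher segment avoids silently sending a malformed path, which would otherwise surface later as a 400 or 404 error.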
Legacy: Agent Platform SDK (OpenMaaS)
For OpenMaaS, you can also use `GenerativeModel` (if supported).
See `scripts/openmaas_vertexai_sdk.py` for the complete code.
> [!IMPORTANT]
> Model ID Format: For Agent Platform SDK with OpenMaaS, you MUST use the full path: `publishers/PUBLISHER/models/MODEL`.
Model Reference & Availability
Documentation: Use Open Models on Agent Platform
> [!TIP]
> Self-Deployment for Control: If you need dedicated hardware (GPUs/TPUs), guaranteed capacity, or specific regional placement not offered by MaaS, you can Self-Deploy these models to Agent Platform Endpoints. Search for the model in Model Garden and click "Deploy" to select your machine type.

> [!IMPORTANT]
> Finding Inference Examples: The table below is a starting point. For the definitive inference snippets (especially for Chat Completions payload structure):
> - Consult the Use Open Models on Agent Platform list.
> - Click the link for your specific model (e.g., "DeepSeek-V3") to visit its Model Garden page.
> - Look for the "Sample Code" or "Use this model" button on the Model Garden page to get the exact `curl` or Python code for that specific model version.

> [!NOTE]
> This list is INCOMPLETE. See [Use Open Models on Agent Platform](https://docs.cloud.google.com/gemini-enterprise-agent-platform/models/maas/use-open-models) for the full list of supported models.

| Model Family | Model ID Examples | Location | Notes |
|---|---|---|---|
| Llama 4 | | | |
| Llama 4 | | | |
| Llama 3.3 | | | |
| DeepSeek | | | Global ONLY |
| DeepSeek | | | US-West2 ONLY |
| DeepSeek | | | |
| Qwen 3 | | | |
| Qwen 3 | | | |
| Kimi | | | |
| MiniMax | | | |
| GLM | | | |
3. Troubleshooting & Common Error Codes
429: Resource Exhausted
- Cause: OpenMaaS and Gemini models use Dynamic Shared Quota (DSQ). Resources are pooled and allocated dynamically based on availability. A 429 error indicates the shared pool is temporarily exhausted, not necessarily that your specific project quota is hit (though it can be).
- Solution: Implement strict exponential backoff and retry strategies.
- High Throughput: For production workloads requiring high throughput or guaranteed capacity, consider Provisioned Throughput (PT).
- Important: Quota increases through normal cloud processes (Cloud Console) are NOT applicable for DSQ constraints.
- Documentation: Quotas and limits (DSQ)
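The backoff-and-retry advice for 429 errors can be sketched as a generic helper. This is a minimal illustration, not this skill's implementation: `retry_with_backoff` and its parameters are hypothetical names, and the defaults (5 attempts, 1-second base delay, 60-second cap) are assumptions rather than documented recommendations. It uses "full jitter", meaning each wait is a random duration up to the capped exponential delay.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying retryable failures with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not is_retryable(exc):
                raise
            # Full jitter: wait a random time up to the capped exponential delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    raise RuntimeError("max_attempts must be at least 1")
```

With the OpenAI SDK, `is_retryable` could be something like `lambda exc: isinstance(exc, openai.RateLimitError)`. Errors such as 400 should not be retried, since the same request will fail again.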
400: User Validation Error
- Cause: Invalid request format, unsupported parameter, or incorrect Model ID.
- Action: Double-check your request payload and parameters. Verify the Model ID and Region are correct.
404: Not Found / Model Not Available
- Cause: The model is not enabled, or not available in the specified project or region.
- Action:
  - Check Location Availability:
    - OpenMaaS: Verify the model is available in your region. See Model Availability by Location.
    - Gemini:
      - Source of Truth: Always check Gemini Model Locations for the authoritative list.
      - Preview Models: Preview models (e.g., Gemini 3.1, experimental versions) are often ONLY available in the `us-central1` or `global` regions.
      - Stable Models (e.g., Gemini 2.5 Pro): Available in `us-central1`, `europe-west4`, and many other regions.
      - Important: If you get a 404/400 error, try switching your client location to `us-central1` or `global`.
  - Enable Llama Models: For Llama 3.3 and Llama 4, you MUST enable the model in Model Garden before use. Go to the [Model Garden](https://console.cloud.google.com/agent-platform/model-garden), search for the model card (e.g., "Llama 3.3 API Service"), and click Enable. Only then can you make inference requests.
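The "try switching your client location" advice can be sketched as a loop over candidate locations. The helper below is hypothetical (`first_working_location` is not part of this skill's scripts): `call_in_location` stands in for whatever function builds a client for a location and sends the request, and the default order `us-central1`, then `global` simply mirrors the two locations named in the list above.

```python
from typing import Callable, Iterable, Tuple, TypeVar

T = TypeVar("T")


def first_working_location(
    call_in_location: Callable[[str], T],
    is_not_found: Callable[[Exception], bool],
    locations: Iterable[str] = ("us-central1", "global"),
) -> Tuple[str, T]:
    """Try each location in order and return the first (location, result)."""
    last_error = None
    for location in locations:
        try:
            return location, call_in_location(location)
        except Exception as exc:
            # Only "not found" style errors justify trying another location.
            if not is_not_found(exc):
                raise
            last_error = exc
    if last_error is None:
        raise ValueError("locations must not be empty")
    raise last_error
```

With the OpenAI SDK, `is_not_found` could be something like `lambda exc: isinstance(exc, openai.NotFoundError)`. A fallback like this is a convenience for experimentation; checking the authoritative location lists remains the reliable fix.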