Loading...
Loading...
Connects to and performs inference with Google Cloud Agent Platform GenAI models, including First-Party Gemini models and Third-Party OpenMaaS models (Llama, DeepSeek, Qwen, etc.). Use when you need to generate code for calling Gemini or OpenMaaS models, authenticate with GenAI SDK, OpenAI SDK, or legacy Agent Platform SDK, configure base URLs and global/regional endpoints, or troubleshoot 429 Resource Exhausted (DSQ), 400 User Validation, or 404 Not Found errors. Don't use for deploying models to endpoints or for running model evaluations.
npx skill4agent add google/skills agent-platform-inferencescripts/scripts/openmaas_openai_sdk.pygcloud auth login
gcloud auth application-default logingcloud services enable aiplatform.googleapis.compython3 -m venv .venv
source .venv/bin/activatepip install -r scripts/requirements.txt./scripts/verify_all.sh<!-- enableFinding(LINE_OVER_80) -->[!IMPORTANT] CRITICAL: Model IDs & Availability
- Gemini Models: See Gemini Models for valid Model IDs and Regions.
- OpenMaaS Models: See [Use Open Models on Agent Platform] (https://docs.cloud.google.com/gemini-enterprise-agent-platform/models/maas/use-open-models) for Llama, DeepSeek, Qwen, etc.
- Incomplete Lists: The Model IDs listed in this skill are examples only and may be incomplete or outdated.
- Action: Always verify the Model ID and Region using the links above before generating code.
gemini-2.5-progemini-3-flash-previewgoogle-genaivertexai[!IMPORTANT] Preview Models (including Gemini 3.1) are often ONLY available in theregion. Stable models are available inglobaland other regions.us-central1
google-genaivertexaipip install google-genaiscripts/gemini_genai_sdk.pyscripts/gemini_openai_sdk.pyvertexaigoogle-genaiscripts/gemini_vertexai_sdk.py[!WARNING] Whilecan support some OpenMaaS models, it is discouraged. Use the OpenAI SDK for best compatibility (especially for Chat Completions).GenerativeModel
pip install openai google-authimport google.auth
from google.auth.transport.requests import Request
def get_gcp_access_token():
creds, _ = google.auth.default()
creds.refresh(Request())
return creds.token<!-- disableFinding(LINE_OVER_80) -->[!NOTE] Google Cloud access tokens typically expire after 1 hour. Thefunction above retrieves a fresh token at the time it is called.get_gcp_access_token()
<!-- enableFinding(LINE_OVER_80) -->For long-running applications, you implement a refresh mechanism. See Refresh the access token for details.
https://aiplatform.googleapis.com/v1/projects/{PROJECT_ID}/locations/global/endpoints/openapihttps://{REGION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}/locations/{REGION}/endpoints/openapiscripts/openmaas_openai_sdk.py[!TIP] Alternative: Environment Variables You can set environment variables in your shell instead of updating the code.bashexport OPENAI_BASE_URL="https://aiplatform.googleapis.com/v1/projects/YOUR_PROJECT_ID/locations/global/endpoints/openapi" export OPENAI_API_KEY="$(gcloud auth print-access-token)"Then initialize the client without arguments:client = OpenAI()
zai-org/glm-5-maasmoonshotai/kimi-k2-thinking-maasminimaxai/minimax-m2-maasdeepseek-ai/deepseek-v3.1-maasdeepseek-ai/deepseek-v3.2-maasresponse = client.completions.create(
model="deepseek-ai/deepseek-v3.2-maas",
prompt="Once upon a time",
max_tokens=100
)
print(response.choices[0].text)# Verify specific Embedding Model ID on Model Garden (e.g., intfloat/multilingual-e5-small)
response = client.embeddings.create(
model="intfloat/multilingual-e5-large-maas",
input="The quick brown fox jumps over the lazy dog",
)
print(response.data[0].embedding)google-genaivertexaiscripts/openmaas_genai_sdk.py[!IMPORTANT] Model ID Format: For GenAI SDK with OpenMaaS, you MUST use the full path:(e.g.,publishers/PUBLISHER/models/MODEL).publishers/zai-org/models/glm-5-maas
GenerativeModelscripts/openmaas_vertexai_sdk.py[!IMPORTANT] Model ID Format: For Agent Platform SDK with OpenMaaS, you MUST use the full path:.publishers/PUBLISHER/models/MODEL
[!TIP] Self-Deployment for Control: If you need dedicated hardware (GPUs/TPUs), guaranteed capacity, or specific regional placement not offered by MaaS, you can Self-Deploy these models to Agent Platform Endpoints. Search for the model in Model Garden and click "Deploy" to select your machine type.
[!IMPORTANT] Finding Inference Examples: The list above is a starting point. For the definitive inference snippets (especially for Chat Completions payload structure):
- Consult the Use Open Models on Agent Platform list.
- Click the link for your specific model (e.g., "DeepSeek-V3") to visit its Model Garden page.
- Look for the "Sample Code" or "Use this model" button on the Model Garden page to get the exact
or Python code for that specific model version.curl
[!NOTE] This list is INCOMPLETE. See [Use Open Models on Agent Platform] (https://docs.cloud.google.com/gemini-enterprise-agent-platform/models/maas/use-open-models) for the full list of supported models.
| Model Family | Model ID Examples | Location | Notes |
|---|---|---|---|
| Llama 4 | | | |
| Llama 4 | | | |
| Llama 3.3 | | | |
| DeepSeek | | | Global ONLY |
| DeepSeek | | | US-West2 ONLY |
| DeepSeek | | | |
| Qwen 3 | | | |
| Qwen 3 | | | |
| Kimi | | | |
| MiniMax | | | |
| GLM | | |
us-central1globalus-central1europe-west4us-central1global