# Azure AI Gateway
Bootstrap and configure Azure API Management (APIM) as an AI Gateway for securing, observing, and controlling AI models, tools (MCP Servers), and agents.
## Skill Activation Triggers
Use this skill immediately when the user asks to:
- "Set up a gateway for my model"
- "Set up a gateway for my tools"
- "Set up a gateway for my agents"
- "Add a gateway to my MCP server"
- "Protect my AI model with a gateway"
- "Secure my AI agents"
- "Rate-limit my model requests"
- "Rate-limit my tool requests"
- "Limit tokens for my model"
- "Add rate limiting to my MCP server"
- "Enable semantic caching for my AI API"
- "Add content safety to my AI endpoint"
- "Add my model behind gateway"
- "Import API from OpenAPI spec"
- "Add API to gateway from swagger"
- "Convert my API to MCP"
- "Expose my API as MCP server"
Key Indicators:
- User deploying Azure OpenAI, AI Foundry, or other AI models
- User creating or managing MCP servers
- User needs token limits, rate limiting, or quota management
- User wants to cache AI responses to reduce costs
- User needs content filtering or safety controls
- User wants load balancing across multiple AI backends
Secondary Triggers (Proactive Recommendations):
- After model creation: Recommend AI Gateway for security, caching, and token limits
- After MCP server creation: Recommend AI Gateway for rate limiting, content safety, and auth
## Overview
Azure API Management serves as an AI Gateway that provides:
- Security: Authentication, authorization, and content safety
- Observability: Token metrics, logging, and monitoring
- Control: Rate limiting, token limits, and load balancing
- Optimization: Semantic caching to reduce costs and latency
```
AI Models ──┐                         ┌── Azure OpenAI
MCP Tools ──┼── AI Gateway (APIM)  ──┼── AI Foundry
Agents ─────┘                         └── Custom Models
```

## Key Resources
- GitHub Repo: https://github.com/Azure-Samples/AI-Gateway (aka.ms/aigateway)
- Docs:
## Configuration Rules

Default to the Basicv2 SKU when creating new APIM instances:
- Cheaper than other tiers
- Creates quickly (~5-10 minutes vs. 30+ for Premium)
- Supports all AI Gateway policies
## Pattern 1: Quick Bootstrap AI Gateway
Deploy APIM with the Basicv2 SKU for AI workloads.

```bash
# Create resource group
az group create --name rg-aigateway --location eastus2

# Deploy APIM with Bicep
az deployment group create \
  --resource-group rg-aigateway \
  --template-file main.bicep \
  --parameters apimSku=Basicv2
```
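Once the deployment finishes, the gateway URL can be read back from the template outputs. This sketch assumes the default deployment name `main` (derived from the template file name); pass `--name` explicitly if you named the deployment differently.

```shell
# Read the gatewayUrl output emitted by the Bicep template
az deployment group show \
  --resource-group rg-aigateway \
  --name main \
  --query "properties.outputs.gatewayUrl.value" -o tsv
```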
### Bicep Template

```bicep
param location string = resourceGroup().location
param apimSku string = 'Basicv2'
param apimManagedIdentityType string = 'SystemAssigned'

// NOTE: Using 2024-06-01-preview because Basicv2 SKU support currently requires this preview API version.
// Update to the latest stable (GA) API version once Basicv2 is available there.
resource apimService 'Microsoft.ApiManagement/service@2024-06-01-preview' = {
  name: 'apim-aigateway-${uniqueString(resourceGroup().id)}'
  location: location
  sku: {
    name: apimSku
    capacity: 1
  }
  properties: {
    publisherEmail: 'admin@contoso.com'
    publisherName: 'Contoso'
  }
  identity: {
    type: apimManagedIdentityType
  }
}

output gatewayUrl string = apimService.properties.gatewayUrl
output principalId string = apimService.identity.principalId
```

## Pattern 2: Semantic Caching
Cache similar prompts to reduce costs and latency.

```xml
<policies>
    <inbound>
        <base />
        <!-- Cache lookup with 0.8 similarity threshold -->
        <azure-openai-semantic-cache-lookup
            score-threshold="0.8"
            embeddings-backend-id="embeddings-backend"
            embeddings-backend-auth="system-assigned" />
        <set-backend-service backend-id="{backend-id}" />
    </inbound>
    <outbound>
        <!-- Cache responses for 120 seconds -->
        <azure-openai-semantic-cache-store duration="120" />
        <base />
    </outbound>
</policies>
```

Options:

| Parameter | Range | Description |
|---|---|---|
| `score-threshold` | 0.7-0.95 | Higher = stricter matching |
| `duration` | 60-3600 | Cache TTL in seconds |
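As a quick sanity check, two semantically similar prompts sent within the TTL should produce one backend call and one cache hit. The gateway host, deployment name, and key below are placeholders, not values from this document.

```shell
GATEWAY="https://<apim-service-name>.azure-api.net"
KEY="<subscription-key>"

# The first call goes to the backend; the second, near-identical prompt
# should be answered from the semantic cache within the 120 s TTL
for PROMPT in "What is the capital of France?" "What's France's capital city?"; do
  curl -s "$GATEWAY/openai/deployments/<deployment>/chat/completions?api-version=2024-02-01" \
    -H "api-key: $KEY" -H "Content-Type: application/json" \
    -d "{\"messages\":[{\"role\":\"user\",\"content\":\"$PROMPT\"}]}"
done
```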
## Pattern 3: Token Rate Limiting
Limit tokens per minute to control costs and prevent abuse.

```xml
<policies>
    <inbound>
        <base />
        <set-backend-service backend-id="{backend-id}" />
        <!-- Limit to 500 tokens per minute per subscription -->
        <azure-openai-token-limit
            counter-key="@(context.Subscription.Id)"
            tokens-per-minute="500"
            estimate-prompt-tokens="false"
            remaining-tokens-variable-name="remainingTokens" />
    </inbound>
</policies>
```

Options:

| Parameter | Values | Description |
|---|---|---|
| `counter-key` | Subscription.Id, Request.IpAddress, custom | Grouping key for limits |
| `tokens-per-minute` | 100-100000 | Token quota |
| `estimate-prompt-tokens` | true/false | true = faster but less accurate |
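With a 500 TPM budget, requests past the quota are rejected with HTTP 429 until the window resets. A simple loop makes this visible; the host, deployment, and key are illustrative placeholders.

```shell
GATEWAY="https://<apim-service-name>.azure-api.net"

# Once the 500-token budget for the current minute is spent,
# the gateway returns 429 instead of forwarding to the backend
for i in $(seq 1 10); do
  curl -s -o /dev/null -w "request $i: HTTP %{http_code}\n" \
    "$GATEWAY/openai/deployments/<deployment>/chat/completions?api-version=2024-02-01" \
    -H "api-key: <subscription-key>" -H "Content-Type: application/json" \
    -d '{"messages":[{"role":"user","content":"Write a long story."}]}'
done
```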
## Pattern 4: Content Safety
Filter harmful content and detect jailbreak attempts.

```xml
<policies>
    <inbound>
        <base />
        <set-backend-service backend-id="{backend-id}" />
        <!-- Block severity 4+ content, detect jailbreaks -->
        <llm-content-safety backend-id="content-safety-backend" shield-prompt="true">
            <categories output-type="EightSeverityLevels">
                <category name="Hate" threshold="4" />
                <category name="Sexual" threshold="4" />
                <category name="SelfHarm" threshold="4" />
                <category name="Violence" threshold="4" />
            </categories>
            <blocklists>
                <id>custom-blocklist</id>
            </blocklists>
        </llm-content-safety>
    </inbound>
</policies>
```

Options:

| Parameter | Range | Description |
|---|---|---|
| `threshold` | 0-7 | 0 = safe, 7 = severe |
| `shield-prompt` | true/false | Detect jailbreak attempts |
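A prompt that trips a category threshold or the prompt shield is rejected by the gateway before it reaches the model (typically with an error status rather than a completion). The host, deployment, and key are placeholders.

```shell
# Request with a policy-violating prompt: the gateway blocks it
# at the inbound stage, so no tokens are spent at the backend
curl -s -o /dev/null -w "HTTP %{http_code}\n" \
  "https://<apim-service-name>.azure-api.net/openai/deployments/<deployment>/chat/completions?api-version=2024-02-01" \
  -H "api-key: <subscription-key>" -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"<prompt that violates a category>"}]}'
```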
## Pattern 5: Rate Limits for MCPs/OpenAPI Tools
Protect MCP servers and tools with request rate limiting.

```xml
<policies>
    <inbound>
        <base />
        <!-- 10 calls per 60 seconds per IP -->
        <rate-limit-by-key
            calls="10"
            renewal-period="60"
            counter-key="@(context.Request.IpAddress)"
            remaining-calls-variable-name="remainingCalls" />
    </inbound>
    <outbound>
        <set-header name="X-Rate-Limit-Remaining" exists-action="override">
            <value>@(context.Variables.GetValueOrDefault<int>("remainingCalls", 0).ToString())</value>
        </set-header>
        <base />
    </outbound>
</policies>
```
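Clients can watch their remaining budget via the header set in the outbound section. The host, path, and key below are placeholders.

```shell
# X-Rate-Limit-Remaining decrements on each call; once it reaches 0,
# further calls within the 60 s window are answered with HTTP 429
curl -si "https://<apim-service-name>.azure-api.net/<api-path>/<operation>" \
  -H "Ocp-Apim-Subscription-Key: <subscription-key>" \
  | grep -i "X-Rate-Limit-Remaining"
```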
## Pattern 6: Managed Identity Authentication
Secure backend access with managed identity instead of API keys.

```xml
<policies>
    <inbound>
        <base />
        <!-- Managed identity auth to Azure OpenAI -->
        <authentication-managed-identity
            resource="https://cognitiveservices.azure.com"
            output-token-variable-name="managed-id-access-token"
            ignore-error="false" />
        <set-header name="Authorization" exists-action="override">
            <value>@("Bearer " + (string)context.Variables["managed-id-access-token"])</value>
        </set-header>
        <set-backend-service backend-id="{backend-id}" />
        <!-- Emit token metrics for monitoring -->
        <azure-openai-emit-token-metric namespace="openai">
            <dimension name="Subscription ID" value="@(context.Subscription.Id)" />
            <dimension name="Client IP" value="@(context.Request.IpAddress)" />
            <dimension name="API ID" value="@(context.Api.Id)" />
        </azure-openai-emit-token-metric>
    </inbound>
</policies>
```
## Pattern 7: Load Balancing with Retry
Distribute load across multiple backends with automatic failover.

```xml
<policies>
    <inbound>
        <base />
        <set-backend-service backend-id="{backend-pool-id}" />
    </inbound>
    <backend>
        <!-- Retry on 429 (rate limit) or 503 (service unavailable) -->
        <retry count="2" interval="0" first-fast-retry="true"
               condition="@(context.Response.StatusCode == 429 || context.Response.StatusCode == 503)">
            <set-backend-service backend-id="{backend-pool-id}" />
            <forward-request buffer-request-body="true" />
        </retry>
    </backend>
    <on-error>
        <choose>
            <when condition="@(context.Response.StatusCode == 503)">
                <return-response>
                    <set-status code="503" reason="Service Unavailable" />
                </return-response>
            </when>
        </choose>
    </on-error>
</policies>
```
## Pattern 8: Add AI Foundry Model Behind Gateway
When user asks to "add my model behind gateway", first discover available models from Azure AI Foundry, then ask which model to add.
### Step 1: Discover AI Foundry Projects and Available Models
```bash
# Set environment variables
accountName="<ai-foundry-resource-name>"
resourceGroupName="<resource-group>"

# List AI Foundry resources (AI Services accounts)
az cognitiveservices account list --query "[?kind=='AIServices'].{name:name, resourceGroup:resourceGroup, location:location}" -o table

# List available models in the AI Foundry resource
az cognitiveservices account list-models \
  -n $accountName \
  -g $resourceGroupName \
  | jq '.[] | { name: .name, format: .format, version: .version, sku: .skus[0].name, capacity: .skus[0].capacity.default }'

# List already deployed models
az cognitiveservices account deployment list \
  -n $accountName \
  -g $resourceGroupName
```
### Step 2: Ask User Which Model to Add
After listing the available models, use the ask_user tool to present the models as choices and let the user select which model to add behind the gateway.
Example choices to present:
- Model deployments from the discovered list
- Include model name, format (provider), version, and SKU info
### Step 3: Deploy the Model (if not already deployed)
```bash
# Deploy the selected model to AI Foundry
az cognitiveservices account deployment create \
  -n $accountName \
  -g $resourceGroupName \
  --deployment-name <model-name> \
  --model-name <model-name> \
  --model-version <version> \
  --model-format <format> \
  --sku-capacity 1 \
  --sku-name <sku>
```
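For example, the placeholders filled in for a hypothetical gpt-4o-mini deployment. The model name, version, format, and SKU are illustrative; substitute the values actually returned by the Step 1 listing.

```shell
# Illustrative values only -- use the model/version/format/SKU discovered in Step 1
az cognitiveservices account deployment create \
  -n $accountName \
  -g $resourceGroupName \
  --deployment-name gpt-4o-mini \
  --model-name gpt-4o-mini \
  --model-version "2024-07-18" \
  --model-format OpenAI \
  --sku-capacity 1 \
  --sku-name GlobalStandard
```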
### Step 4: Configure APIM Backend for Selected Model
```bash
# Get the AI Foundry inference endpoint
ENDPOINT=$(az cognitiveservices account show \
  -n $accountName \
  -g $resourceGroupName \
  | jq -r '.properties.endpoints["Azure AI Model Inference API"]')

# Create APIM backend for the selected model
az apim backend create \
  --resource-group <apim-resource-group> \
  --service-name <apim-service-name> \
  --backend-id <model-deployment-name>-backend \
  --protocol http \
  --url "${ENDPOINT}"
```
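To confirm the backend was registered with the expected endpoint before wiring up the API, read it back:

```shell
# Verify the URL the gateway will forward requests to
az apim backend show \
  --resource-group <apim-resource-group> \
  --service-name <apim-service-name> \
  --backend-id <model-deployment-name>-backend \
  --query "url" -o tsv
```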
### Step 5: Create API and Apply Policies
```bash
# Import Azure OpenAI API specification
az apim api import \
  --resource-group <apim-resource-group> \
  --service-name <apim-service-name> \
  --path <model-deployment-name> \
  --specification-format OpenApiJson \
  --specification-url "https://raw.githubusercontent.com/Azure/azure-rest-api-specs/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference/stable/2024-02-01/inference.json"
```
### Step 6: Grant APIM Access to AI Foundry
```bash
# Get APIM managed identity principal ID
APIM_PRINCIPAL_ID=$(az apim show \
  --name <apim-service-name> \
  --resource-group <apim-resource-group> \
  --query "identity.principalId" -o tsv)

# Get AI Foundry resource ID
AI_RESOURCE_ID=$(az cognitiveservices account show \
  -n $accountName \
  -g $resourceGroupName \
  --query "id" -o tsv)

# Assign Cognitive Services User role
az role assignment create \
  --assignee $APIM_PRINCIPAL_ID \
  --role "Cognitive Services User" \
  --scope $AI_RESOURCE_ID
```
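Role assignments can take a minute or two to propagate. Before testing calls through the gateway, confirm the assignment exists at the right scope:

```shell
# Should print "Cognitive Services User" once the assignment is in place
az role assignment list \
  --assignee $APIM_PRINCIPAL_ID \
  --scope $AI_RESOURCE_ID \
  --query "[].roleDefinitionName" -o tsv
```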
### Bicep Template for Backend Configuration
```bicep
param apimServiceName string
param backendId string
param aiFoundryEndpoint string
param modelDeploymentName string

resource apimService 'Microsoft.ApiManagement/service@2024-06-01-preview' existing = {
  name: apimServiceName
}

resource backend 'Microsoft.ApiManagement/service/backends@2024-06-01-preview' = {
  parent: apimService
  name: backendId
  properties: {
    protocol: 'http'
    url: '${aiFoundryEndpoint}openai/deployments/${modelDeploymentName}'
    credentials: {
      header: {}
    }
    tls: {
      validateCertificateChain: true
      validateCertificateName: true
    }
  }
}
```

## Pattern 9: Import API from OpenAPI Specification
Add an API to the gateway from an OpenAPI/Swagger specification, either from a local file or web URL.
### Step 1: Import API from Web URL
```bash
# Import API from a publicly accessible OpenAPI spec URL
az apim api import \
  --resource-group <apim-resource-group> \
  --service-name <apim-service-name> \
  --api-id <api-id> \
  --path <api-path> \
  --display-name "<API Display Name>" \
  --specification-format OpenApiJson \
  --specification-url "https://example.com/openapi.json"
```
### Step 2: Import API from Local File
```bash
# Import API from a local OpenAPI spec file (JSON or YAML)
az apim api import \
  --resource-group <apim-resource-group> \
  --service-name <apim-service-name> \
  --api-id <api-id> \
  --path <api-path> \
  --display-name "<API Display Name>" \
  --specification-format OpenApi \
  --specification-path "./openapi.yaml"
```
### Step 3: Configure Backend for the API
```bash
# Create backend pointing to your API server
az apim backend create \
  --resource-group <apim-resource-group> \
  --service-name <apim-service-name> \
  --backend-id <backend-id> \
  --protocol http \
  --url "https://your-api-server.com"

# Update API to use the backend
az apim api update \
  --resource-group <apim-resource-group> \
  --service-name <apim-service-name> \
  --api-id <api-id> \
  --set properties.serviceUrl="https://your-api-server.com"
```
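After import, the API is reachable through the gateway path rather than the origin server. The host, path, and key below are placeholders.

```shell
# Call the imported API through APIM instead of the origin server
curl -s "https://<apim-service-name>.azure-api.net/<api-path>/<operation>" \
  -H "Ocp-Apim-Subscription-Key: <subscription-key>"
```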
### Step 4: Apply Policies (Optional)
```xml
<policies>
    <inbound>
        <base />
        <set-backend-service backend-id="{backend-id}" />
        <!-- Add rate limiting -->
        <rate-limit-by-key
            calls="100"
            renewal-period="60"
            counter-key="@(context.Request.IpAddress)" />
    </inbound>
    <outbound>
        <base />
    </outbound>
</policies>
```

### Supported Specification Formats
| Format | Value | File Extension |
|---|---|---|
| OpenAPI 3.x JSON | `OpenApiJson` | .json |
| OpenAPI 3.x YAML | `OpenApi` | .yaml / .yml |
| Swagger 2.0 JSON | `Swagger` | .json |
| Swagger 2.0 (link) | `Swagger` (with `--specification-url`) | URL |
| WSDL | `Wsdl` | .wsdl / .xml |
| WADL | `Wadl` | .xml |
## Pattern 10: Convert API to MCP Server
Convert existing APIM API operations into an MCP (Model Context Protocol) server, enabling AI agents to discover and use your APIs as tools.
### Prerequisites

- APIM instance with Basicv2 SKU or higher
- Existing API imported into APIM
- MCP feature enabled on APIM
### Step 1: List Existing APIs in APIM
```bash
# List all APIs in APIM
az apim api list \
  --resource-group <apim-resource-group> \
  --service-name <apim-service-name> \
  --query "[].{id:name, displayName:displayName, path:path}" \
  -o table
```
### Step 2: Ask User Which API to Convert
After listing the APIs, use the ask_user tool to let the user select which API to convert to an MCP server.
### Step 3: List API Operations
```bash
# List all operations for the selected API
az apim api operation list \
  --resource-group <apim-resource-group> \
  --service-name <apim-service-name> \
  --api-id <api-id> \
  --query "[].{operationId:name, displayName:displayName, method:method, urlTemplate:urlTemplate}" \
  -o table
```
### Step 4: Ask User Which Operations to Expose as MCP Tools
After listing the operations, use the ask_user tool to present the operations as choices. Let the user select which operations to expose as MCP tools. Users may want to expose all operations or only a subset.
Example choices to present:
- All operations (convert entire API)
- Individual operations from the discovered list
- Include operation name, method, and URL template
### Step 5: Enable MCP Server on APIM
Enable MCP server capability via ARM/Bicep or the Portal. Note: MCP configuration itself is done through APIM policies and product configuration.

### Step 6: Configure MCP Endpoint for API
Create an MCP-compatible endpoint that exposes your API operations as tools:

```xml
<policies>
    <inbound>
        <base />
        <!-- MCP tools/list endpoint handler -->
        <choose>
            <when condition="@(context.Request.Url.Path.EndsWith(&quot;/mcp/tools/list&quot;))">
                <return-response>
                    <set-status code="200" reason="OK" />
                    <set-header name="Content-Type" exists-action="override">
                        <value>application/json</value>
                    </set-header>
                    <set-body>@{
                        var tools = new JArray();
                        // Define your API operations as MCP tools
                        tools.Add(new JObject(
                            new JProperty("name", "operation_name"),
                            new JProperty("description", "Description of what this operation does"),
                            new JProperty("inputSchema", new JObject(
                                new JProperty("type", "object"),
                                new JProperty("properties", new JObject(
                                    new JProperty("param1", new JObject(
                                        new JProperty("type", "string"),
                                        new JProperty("description", "Parameter description")
                                    ))
                                ))
                            ))
                        ));
                        return new JObject(new JProperty("tools", tools)).ToString();
                    }</set-body>
                </return-response>
            </when>
        </choose>
    </inbound>
</policies>
```
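An MCP client, or plain curl, can then discover the tools: the handler above returns the tool manifest as JSON. The host, path, and key below are placeholders.

```shell
# List the tools exposed by the policy's /mcp/tools/list handler
curl -s -X POST "https://<apim-service-name>.azure-api.net/<api-path>/mcp/tools/list" \
  -H "Ocp-Apim-Subscription-Key: <subscription-key>" \
  -H "Content-Type: application/json" | jq .
```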
### Step 7: Bicep Template for MCP-Enabled API
```bicep
param apimServiceName string
param apiId string
param apiDisplayName string
param apiPath string
param backendUrl string

resource apimService 'Microsoft.ApiManagement/service@2024-06-01-preview' existing = {
  name: apimServiceName
}

resource api 'Microsoft.ApiManagement/service/apis@2024-06-01-preview' = {
  parent: apimService
  name: apiId
  properties: {
    displayName: apiDisplayName
    path: apiPath
    protocols: ['https']
    serviceUrl: backendUrl
    subscriptionRequired: true
    apiType: 'http'
  }
}

// MCP tools/list operation
resource mcpToolsListOperation 'Microsoft.ApiManagement/service/apis/operations@2024-06-01-preview' = {
  parent: api
  name: 'mcp-tools-list'
  properties: {
    displayName: 'MCP Tools List'
    method: 'POST'
    urlTemplate: '/mcp/tools/list'
    description: 'List available MCP tools'
  }
}

// MCP tools/call operation
resource mcpToolsCallOperation 'Microsoft.ApiManagement/service/apis/operations@2024-06-01-preview' = {
  parent: api
  name: 'mcp-tools-call'
  properties: {
    displayName: 'MCP Tools Call'
    method: 'POST'
    urlTemplate: '/mcp/tools/call'
    description: 'Call an MCP tool'
  }
}
```
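Given the `urlTemplate` values in the Bicep template, the full endpoint URLs an MCP client will call are the gateway URL plus the API's `path` plus the template. A small sketch (the gateway URL and api path values are illustrative):

```python
# Sketch: derive the MCP endpoint URLs from the APIM gateway URL and the
# API's `path` property. The '/mcp/tools/list' and '/mcp/tools/call'
# templates come from the Bicep template above; the example values are made up.
def mcp_endpoints(gateway_url: str, api_path: str) -> dict:
    base = f"{gateway_url.rstrip('/')}/{api_path.strip('/')}"
    return {
        "tools_list": f"{base}/mcp/tools/list",
        "tools_call": f"{base}/mcp/tools/call",
    }

if __name__ == "__main__":
    eps = mcp_endpoints("https://my-apim.azure-api.net/", "weather")
    print(eps["tools_list"])  # https://my-apim.azure-api.net/weather/mcp/tools/list
```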
Step 8: Test MCP Endpoint
```bash
# Get APIM gateway URL
GATEWAY_URL=$(az apim show \
  --name <apim-service-name> \
  --resource-group <apim-resource-group> \
  --query "gatewayUrl" -o tsv)

# Test MCP tools/list endpoint
curl -X POST "${GATEWAY_URL}/<api-path>/mcp/tools/list" \
  -H "Content-Type: application/json" \
  -H "Ocp-Apim-Subscription-Key: <subscription-key>" \
  -d '{}'
```
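The same test call can be assembled in Python, for example inside a test harness. This sketch only builds the request without sending it; the gateway URL, API path, and subscription key are placeholders:

```python
import json
import urllib.request

# Build (but do not send) the same tools/list request the curl command makes.
# GATEWAY_URL, API_PATH, and SUBSCRIPTION_KEY are placeholders for your values.
GATEWAY_URL = "https://my-apim.azure-api.net"
API_PATH = "my-api"
SUBSCRIPTION_KEY = "<subscription-key>"

def build_tools_list_request() -> urllib.request.Request:
    url = f"{GATEWAY_URL}/{API_PATH}/mcp/tools/list"
    return urllib.request.Request(
        url,
        data=json.dumps({}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY,
        },
        method="POST",
    )

if __name__ == "__main__":
    req = build_tools_list_request()
    print(req.get_method(), req.get_full_url())
    # To actually send it: urllib.request.urlopen(req)
```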
MCP Tool Definition Schema
When converting API operations to MCP tools, use this schema:
```json
{
  "tools": [
    {
      "name": "get_weather",
      "description": "Get current weather for a location",
      "inputSchema": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "City name or coordinates"
          }
        },
        "required": ["location"]
      }
    }
  ]
}
```

Reference
Lab References (AI-Gateway Repo)
Essential Labs to Get Started:
| Scenario | Lab | Description |
|---|---|---|
| Semantic Caching | semantic-caching | Cache similar prompts to reduce costs |
| Token Rate Limiting | token-rate-limiting | Limit tokens per minute |
| Content Safety | content-safety | Filter harmful content |
| Load Balancing | backend-pool-load-balancing | Distribute load across backends |
| MCP from API | mcp-from-api | Convert OpenAPI to MCP server |
| Zero to Production | zero-to-production | Complete production setup guide |
Find more labs at: https://github.com/Azure-Samples/AI-Gateway/tree/main/labs
Quick Start Checklist
Prerequisites
- Azure subscription created
- Azure CLI installed and authenticated (`az login`)
- Resource group created for AI Gateway resources
Deployment
- Deploy APIM with Basicv2 SKU
- Configure managed identity
- Add backend for Azure OpenAI or AI Foundry
- Apply policies (caching, rate limits, content safety)
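The first deployment step can be scripted. This sketch only assembles the `az apim create` command; the service name, resource group, and publisher details are placeholders, and the flag set should be verified against `az apim create --help` for your CLI version:

```python
import shlex

# Sketch: assemble (not run) the `az apim create` command for a Basicv2 gateway.
# Service name, resource group, and publisher details are placeholders; the
# "BasicV2" SKU value assumes a recent Azure CLI -- verify with `az apim create --help`.
def apim_create_command(name: str, resource_group: str) -> list:
    return [
        "az", "apim", "create",
        "--name", name,
        "--resource-group", resource_group,
        "--publisher-name", "Contoso",
        "--publisher-email", "admin@contoso.com",
        "--sku-name", "BasicV2",
    ]

if __name__ == "__main__":
    cmd = apim_create_command("my-ai-gateway", "rg-ai-gateway")
    print(shlex.join(cmd))
    # Execute with: subprocess.run(cmd, check=True)
```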
Verification
- Test API endpoint through gateway
- Verify token metrics in Application Insights
- Check rate limiting headers in response
- Validate content safety filtering
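Checking the rate-limiting headers can be automated. The header names below are examples only: APIM's token-limit policy lets you choose the token header names yourself, so substitute whatever your policy configuration emits:

```python
# Sketch: read token/rate-limit headers from a gateway response. The names
# "x-remaining-tokens" and "Retry-After" are illustrative; APIM's token-limit
# policy headers are configurable, so match your own policy settings.
def parse_limit_headers(headers: dict) -> dict:
    normalized = {k.lower(): v for k, v in headers.items()}
    return {
        "remaining_tokens": int(normalized.get("x-remaining-tokens", -1)),
        "retry_after_s": int(normalized.get("retry-after", 0)),
    }

if __name__ == "__main__":
    info = parse_limit_headers({"X-Remaining-Tokens": "1200", "Retry-After": "30"})
    print(info)  # {'remaining_tokens': 1200, 'retry_after_s': 30}
```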
Best Practices
| Practice | Description |
|---|---|
| Default to Basicv2 | Use the Basicv2 SKU for cost/speed optimization |
| Use managed identity | Prefer managed identity over API keys for backend auth |
| Enable token metrics | Use the `azure-openai-emit-token-metric` policy to track token usage in Application Insights |
| Semantic caching | Cache similar prompts to reduce costs (60-80% savings possible) |
| Rate limit by key | Use subscription ID or IP for granular rate limiting |
| Content safety | Enable the `llm-content-safety` policy to filter harmful content |
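The 60-80% savings figure depends directly on your cache hit rate, since a semantic-cache hit avoids the model call entirely. A back-of-envelope model with illustrative numbers:

```python
# Illustrative cost model for semantic caching: a cache hit skips the model
# call, so savings scale linearly with hit rate. All numbers are made up.
def monthly_cost(requests: int, cost_per_request: float, hit_rate: float) -> float:
    billable = requests * (1 - hit_rate)
    return billable * cost_per_request

if __name__ == "__main__":
    baseline = monthly_cost(1_000_000, 0.002, hit_rate=0.0)    # no caching
    with_cache = monthly_cost(1_000_000, 0.002, hit_rate=0.7)  # ~70% saved
    print(round(baseline, 2), round(with_cache, 2))
```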
Troubleshooting
| Issue | Symptom | Solution |
|---|---|---|
| Slow APIM creation | Deployment takes 30+ minutes | Use Basicv2 SKU instead of Premium |
| Token limit exceeded | 429 response | Increase the `tokens-per-minute` value in the token-limit policy |
| Cache not working | No cache hits | Lower the `score-threshold` in the semantic-cache-lookup policy |
| Content blocked | False positives | Increase category thresholds |
| Backend auth fails | 401 from Azure OpenAI | Assign the Cognitive Services User role to the APIM managed identity |
| Rate limit too strict | Legitimate requests blocked | Increase `calls` or `renewal-period` in the rate-limit policy |
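When clients do hit 429 responses from the token or rate limits above, they should back off rather than hammer the gateway. A minimal retry sketch, where `send` is a stand-in for your actual HTTP call:

```python
import time

# Sketch: exponential backoff on 429 responses from the gateway. `send` is a
# stand-in for the real HTTP call and returns a status code here for brevity.
def call_with_backoff(send, max_retries: int = 4, base_delay: float = 1.0):
    for attempt in range(max_retries + 1):
        status = send()
        if status != 429:
            return status
        if attempt < max_retries:
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, 8s...
    return 429

if __name__ == "__main__":
    responses = iter([429, 429, 200])
    print(call_with_backoff(lambda: next(responses), base_delay=0.01))  # 200
```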