azure-aigateway


Azure AI Gateway


Bootstrap and configure Azure API Management (APIM) as an AI Gateway for securing, observing, and controlling AI models, tools (MCP Servers), and agents.

Skill Activation Triggers


Use this skill immediately when the user asks to:
  • "Set up a gateway for my model"
  • "Set up a gateway for my tools"
  • "Set up a gateway for my agents"
  • "Add a gateway to my MCP server"
  • "Protect my AI model with a gateway"
  • "Secure my AI agents"
  • "Ratelimit my model requests"
  • "Ratelimit my tool requests"
  • "Limit tokens for my model"
  • "Add rate limiting to my MCP server"
  • "Enable semantic caching for my AI API"
  • "Add content safety to my AI endpoint"
  • "Add my model behind gateway"
  • "Import API from OpenAPI spec"
  • "Add API to gateway from swagger"
  • "Convert my API to MCP"
  • "Expose my API as MCP server"
Key Indicators:
  • User deploying Azure OpenAI, AI Foundry, or other AI models
  • User creating or managing MCP servers
  • User needs token limits, rate limiting, or quota management
  • User wants to cache AI responses to reduce costs
  • User needs content filtering or safety controls
  • User wants load balancing across multiple AI backends
Secondary Triggers (Proactive Recommendations):
  • After model creation: Recommend AI Gateway for security, caching, and token limits
  • After MCP server creation: Recommend AI Gateway for rate limiting, content safety, and auth

Overview


Azure API Management serves as an AI Gateway that provides:
  • Security: Authentication, authorization, and content safety
  • Observability: Token metrics, logging, and monitoring
  • Control: Rate limiting, token limits, and load balancing
  • Optimization: Semantic caching to reduce costs and latency
```
AI Models ──┐                       ┌── Azure OpenAI
MCP Tools ──┼── AI Gateway (APIM) ──┼── AI Foundry
Agents ─────┘                       └── Custom Models
```

Key Resources


Configuration Rules


Default to the `Basicv2` SKU when creating new APIM instances:
  • Cheaper than other tiers
  • Creates quickly (~5-10 minutes vs 30+ for Premium)
  • Supports all AI Gateway policies

Pattern 1: Quick Bootstrap AI Gateway


Deploy APIM with Basicv2 SKU for AI workloads.
```bash
# Create resource group
az group create --name rg-aigateway --location eastus2

# Deploy APIM with Bicep
az deployment group create \
  --resource-group rg-aigateway \
  --template-file main.bicep \
  --parameters apimSku=Basicv2
```

Bicep Template


```bicep
param location string = resourceGroup().location
param apimSku string = 'Basicv2'
param apimManagedIdentityType string = 'SystemAssigned'

// NOTE: Using 2024-06-01-preview because Basicv2 SKU support currently requires this preview API version.
//       Update to the latest stable (GA) API version once Basicv2 is available there.
resource apimService 'Microsoft.ApiManagement/service@2024-06-01-preview' = {
  name: 'apim-aigateway-${uniqueString(resourceGroup().id)}'
  location: location
  sku: {
    name: apimSku
    capacity: 1
  }
  properties: {
    publisherEmail: 'admin@contoso.com'
    publisherName: 'Contoso'
  }
  identity: {
    type: apimManagedIdentityType
  }
}

output gatewayUrl string = apimService.properties.gatewayUrl
output principalId string = apimService.identity.principalId
```

Pattern 2: Semantic Caching


Cache similar prompts to reduce costs and latency.
```xml
<policies>
    <inbound>
        <base />
        <!-- Cache lookup with 0.8 similarity threshold -->
        <azure-openai-semantic-cache-lookup
            score-threshold="0.8"
            embeddings-backend-id="embeddings-backend"
            embeddings-backend-auth="system-assigned" />
        <set-backend-service backend-id="{backend-id}" />
    </inbound>
    <outbound>
        <!-- Cache responses for 120 seconds -->
        <azure-openai-semantic-cache-store duration="120" />
        <base />
    </outbound>
</policies>
```

Options:

| Parameter | Range | Description |
|-----------|-------|-------------|
| `score-threshold` | 0.7-0.95 | Higher = stricter matching |
| `duration` | 60-3600 | Cache TTL in seconds |
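The lookup-then-store flow can be sketched in Python. This is a toy in-memory model of the policy's behavior, not how APIM implements it: cosine similarity stands in for the embeddings backend, and the threshold and TTL mirror the policy attributes above.

```python
import math
import time

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class SemanticCache:
    """Toy semantic cache: match by embedding similarity, expire entries by TTL."""
    def __init__(self, score_threshold=0.8, duration=120):
        self.score_threshold = score_threshold
        self.duration = duration
        self.entries = []  # (embedding, response, stored_at)

    def lookup(self, embedding, now=None):
        now = time.time() if now is None else now
        for emb, response, stored_at in self.entries:
            if now - stored_at < self.duration and cosine(embedding, emb) >= self.score_threshold:
                return response  # cache hit: the backend call is skipped
        return None

    def store(self, embedding, response, now=None):
        self.entries.append((embedding, response, time.time() if now is None else now))

cache = SemanticCache(score_threshold=0.8, duration=120)
cache.store([1.0, 0.0], "cached answer", now=0)
print(cache.lookup([0.9, 0.1], now=10))   # similar prompt within TTL -> hit
print(cache.lookup([0.9, 0.1], now=300))  # past the 120 s TTL -> miss
```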

Pattern 3: Token Rate Limiting


Limit tokens per minute to control costs and prevent abuse.
```xml
<policies>
    <inbound>
        <base />
        <set-backend-service backend-id="{backend-id}" />
        <!-- Limit to 500 tokens per minute per subscription -->
        <azure-openai-token-limit
            counter-key="@(context.Subscription.Id)"
            tokens-per-minute="500"
            estimate-prompt-tokens="false"
            remaining-tokens-variable-name="remainingTokens" />
    </inbound>
</policies>
```

Options:

| Parameter | Values | Description |
|-----------|--------|-------------|
| `counter-key` | Subscription.Id, Request.IpAddress, custom | Grouping key for limits |
| `tokens-per-minute` | 100-100000 | Token quota |
| `estimate-prompt-tokens` | true/false | true = faster but less accurate |
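The policy's effect can be approximated with a per-key fixed-window counter. This is a sketch of the observable behavior, not APIM's internal implementation; the key and quota mirror the policy attributes above.

```python
import time
from collections import defaultdict

class TokenLimiter:
    """Per-key tokens-per-minute limit: reject once the current minute's quota is exhausted."""
    def __init__(self, tokens_per_minute=500):
        self.tokens_per_minute = tokens_per_minute
        self.windows = defaultdict(int)  # (key, minute) -> tokens consumed

    def try_consume(self, counter_key, tokens, now=None):
        now = time.time() if now is None else now
        window = (counter_key, int(now // 60))
        if self.windows[window] + tokens > self.tokens_per_minute:
            # Over quota: the gateway would return 429 and report remaining tokens
            return False, self.tokens_per_minute - self.windows[window]
        self.windows[window] += tokens
        return True, self.tokens_per_minute - self.windows[window]

limiter = TokenLimiter(tokens_per_minute=500)
print(limiter.try_consume("sub-1", 400, now=0))   # (True, 100)
print(limiter.try_consume("sub-1", 200, now=30))  # (False, 100) -> would exceed quota
print(limiter.try_consume("sub-1", 200, now=60))  # (True, 300) -> new minute window
```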

Pattern 4: Content Safety


Filter harmful content and detect jailbreak attempts.
```xml
<policies>
    <inbound>
        <base />
        <set-backend-service backend-id="{backend-id}" />
        <!-- Block severity 4+ content, detect jailbreaks -->
        <llm-content-safety backend-id="content-safety-backend" shield-prompt="true">
            <categories output-type="EightSeverityLevels">
                <category name="Hate" threshold="4" />
                <category name="Sexual" threshold="4" />
                <category name="SelfHarm" threshold="4" />
                <category name="Violence" threshold="4" />
            </categories>
            <blocklists>
                <id>custom-blocklist</id>
            </blocklists>
        </llm-content-safety>
    </inbound>
</policies>
```

Options:

| Parameter | Range | Description |
|-----------|-------|-------------|
| `threshold` | 0-7 | 0 = safe, 7 = severe |
| `shield-prompt` | true/false | Detect jailbreak attempts |
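The category/threshold logic amounts to blocking when any analyzed severity reaches its configured threshold. A sketch of that decision (in production the severities come from the Azure AI Content Safety service; here they are supplied directly):

```python
# Thresholds mirror the policy above: block at severity 4 or higher per category.
THRESHOLDS = {"Hate": 4, "Sexual": 4, "SelfHarm": 4, "Violence": 4}

def should_block(severities: dict, thresholds: dict = THRESHOLDS) -> bool:
    """Block when any category's severity (0 = safe .. 7 = severe) meets its threshold."""
    return any(severities.get(cat, 0) >= limit for cat, limit in thresholds.items())

print(should_block({"Hate": 2, "Violence": 3}))  # False: all severities below 4
print(should_block({"Violence": 5}))             # True: 5 >= 4
```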

Pattern 5: Rate Limits for MCPs/OpenAPI Tools


Protect MCP servers and tools with request rate limiting.
```xml
<policies>
    <inbound>
        <base />
        <!-- 10 calls per 60 seconds per IP -->
        <rate-limit-by-key
            calls="10"
            renewal-period="60"
            counter-key="@(context.Request.IpAddress)"
            remaining-calls-variable-name="remainingCalls" />
    </inbound>
    <outbound>
        <set-header name="X-Rate-Limit-Remaining" exists-action="override">
            <value>@(context.Variables.GetValueOrDefault<int>("remainingCalls", 0).ToString())</value>
        </set-header>
        <base />
    </outbound>
</policies>
```
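On the consuming side, the `X-Rate-Limit-Remaining` header set in the outbound section lets clients throttle themselves before they start receiving 429s. A minimal sketch (the `reserve` margin is an arbitrary illustrative choice, not part of the policy):

```python
def calls_left(headers: dict) -> int:
    """Parse the remaining-call budget the gateway reports; treat absent/bad values as 0."""
    try:
        return int(headers.get("X-Rate-Limit-Remaining", "0"))
    except ValueError:
        return 0

def should_pause(headers: dict, reserve: int = 2) -> bool:
    """Back off when fewer than `reserve` calls remain in the renewal period."""
    return calls_left(headers) < reserve

print(should_pause({"X-Rate-Limit-Remaining": "7"}))  # False: plenty of budget
print(should_pause({"X-Rate-Limit-Remaining": "1"}))  # True: back off before a 429
```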

Pattern 6: Managed Identity Authentication


Secure backend access with managed identity instead of API keys.
```xml
<policies>
    <inbound>
        <base />
        <!-- Managed identity auth to Azure OpenAI -->
        <authentication-managed-identity
            resource="https://cognitiveservices.azure.com"
            output-token-variable-name="managed-id-access-token"
            ignore-error="false" />
        <set-header name="Authorization" exists-action="override">
            <value>@("Bearer " + (string)context.Variables["managed-id-access-token"])</value>
        </set-header>
        <set-backend-service backend-id="{backend-id}" />
        <!-- Emit token metrics for monitoring -->
        <azure-openai-emit-token-metric namespace="openai">
            <dimension name="Subscription ID" value="@(context.Subscription.Id)" />
            <dimension name="Client IP" value="@(context.Request.IpAddress)" />
            <dimension name="API ID" value="@(context.Api.Id)" />
        </azure-openai-emit-token-metric>
    </inbound>
</policies>
```

Pattern 7: Load Balancing with Retry


Distribute load across multiple backends with automatic failover.
```xml
<policies>
    <inbound>
        <base />
        <set-backend-service backend-id="{backend-pool-id}" />
    </inbound>
    <backend>
        <!-- Retry on 429 (rate limit) or 503 (service unavailable) -->
        <retry count="2" interval="0" first-fast-retry="true"
            condition="@(context.Response.StatusCode == 429 || context.Response.StatusCode == 503)">
            <set-backend-service backend-id="{backend-pool-id}" />
            <forward-request buffer-request-body="true" />
        </retry>
    </backend>
    <on-error>
        <when condition="@(context.Response.StatusCode == 503)">
            <return-response>
                <set-status code="503" reason="Service Unavailable" />
            </return-response>
        </when>
    </on-error>
</policies>
```
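The retry condition above (retry up to twice on 429 or 503, first retry immediately) can be sketched as a loop over a backend pool. The stubbed `send` function stands in for forwarding to whichever backend the pool selects; real APIM does all of this inside the gateway:

```python
RETRYABLE = {429, 503}

def forward_with_retry(send, request, retries=2):
    """Send once, then retry up to `retries` times while the status is 429/503."""
    status, body = send(request)
    for _ in range(retries):
        if status not in RETRYABLE:
            break
        status, body = send(request)  # the pool picks the next healthy backend
    return status, body

# Stub backend pool: first two attempts are throttled/unavailable, third succeeds.
responses = iter([(429, ""), (503, ""), (200, "ok")])
print(forward_with_retry(lambda req: next(responses), {}))  # (200, 'ok')
```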

Pattern 8: Add AI Foundry Model Behind Gateway


When the user asks to "add my model behind gateway", first discover the available models from Azure AI Foundry, then ask which model to add.

Step 1: Discover AI Foundry Projects and Available Models


```bash
# Set environment variables
accountName="<ai-foundry-resource-name>"
resourceGroupName="<resource-group>"

# List AI Foundry resources (AI Services accounts)
az cognitiveservices account list \
  --query "[?kind=='AIServices'].{name:name, resourceGroup:resourceGroup, location:location}" \
  -o table

# List available models in the AI Foundry resource
az cognitiveservices account list-models \
  -n $accountName \
  -g $resourceGroupName \
  | jq '.[] | { name: .name, format: .format, version: .version, sku: .skus[0].name, capacity: .skus[0].capacity.default }'

# List already deployed models
az cognitiveservices account deployment list \
  -n $accountName \
  -g $resourceGroupName
```

Step 2: Ask User Which Model to Add


After listing the available models, use the ask_user tool to present the models as choices and let the user select which model to add behind the gateway.
Example choices to present:
  • Model deployments from the discovered list
  • Include model name, format (provider), version, and SKU info

Step 3: Deploy the Model (if not already deployed)


```bash
# Deploy the selected model to AI Foundry
az cognitiveservices account deployment create \
  -n $accountName \
  -g $resourceGroupName \
  --deployment-name <model-name> \
  --model-name <model-name> \
  --model-version <version> \
  --model-format <format> \
  --sku-capacity 1 \
  --sku-name <sku>
```

Step 4: Configure APIM Backend for Selected Model


```bash
# Get the AI Foundry inference endpoint
ENDPOINT=$(az cognitiveservices account show \
  -n $accountName \
  -g $resourceGroupName \
  | jq -r '.properties.endpoints["Azure AI Model Inference API"]')

# Create APIM backend for the selected model
az apim backend create \
  --resource-group <apim-resource-group> \
  --service-name <apim-service-name> \
  --backend-id <model-deployment-name>-backend \
  --protocol http \
  --url "${ENDPOINT}"
```

Step 5: Create API and Apply Policies


```bash
# Import Azure OpenAI API specification
az apim api import \
  --resource-group <apim-resource-group> \
  --service-name <apim-service-name> \
  --path <model-deployment-name> \
  --specification-format OpenApiJson \
  --specification-url "https://raw.githubusercontent.com/Azure/azure-rest-api-specs/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference/stable/2024-02-01/inference.json"
```

Step 6: Grant APIM Access to AI Foundry


```bash
# Get APIM managed identity principal ID
APIM_PRINCIPAL_ID=$(az apim show \
  --name <apim-service-name> \
  --resource-group <apim-resource-group> \
  --query "identity.principalId" -o tsv)

# Get AI Foundry resource ID
AI_RESOURCE_ID=$(az cognitiveservices account show \
  -n $accountName \
  -g $resourceGroupName \
  --query "id" -o tsv)

# Assign Cognitive Services User role
az role assignment create \
  --assignee $APIM_PRINCIPAL_ID \
  --role "Cognitive Services User" \
  --scope $AI_RESOURCE_ID
```

Bicep Template for Backend Configuration


```bicep
param apimServiceName string
param backendId string
param aiFoundryEndpoint string
param modelDeploymentName string

resource apimService 'Microsoft.ApiManagement/service@2024-06-01-preview' existing = {
  name: apimServiceName
}

resource backend 'Microsoft.ApiManagement/service/backends@2024-06-01-preview' = {
  parent: apimService
  name: backendId
  properties: {
    protocol: 'http'
    url: '${aiFoundryEndpoint}openai/deployments/${modelDeploymentName}'
    credentials: {
      header: {}
    }
    tls: {
      validateCertificateChain: true
      validateCertificateName: true
    }
  }
}
```
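One detail worth noting: the backend `url` expression concatenates `aiFoundryEndpoint` and the deployment path directly, so it assumes the endpoint value ends with a trailing slash. A small helper that normalizes either form when scripting this outside Bicep (the hostname below is a placeholder, not a real resource):

```python
def backend_url(endpoint: str, deployment: str) -> str:
    """Join the AI Foundry endpoint and deployment path, tolerating a missing trailing slash."""
    return f"{endpoint.rstrip('/')}/openai/deployments/{deployment}"

print(backend_url("https://my-foundry.cognitiveservices.azure.com/", "gpt-4o"))
```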

Pattern 9: Import API from OpenAPI Specification


Add an API to the gateway from an OpenAPI/Swagger specification, either from a local file or web URL.

Step 1: Import API from Web URL


```bash
# Import API from a publicly accessible OpenAPI spec URL
az apim api import \
  --resource-group <apim-resource-group> \
  --service-name <apim-service-name> \
  --api-id <api-id> \
  --path <api-path> \
  --display-name "<API Display Name>" \
  --specification-format OpenApiJson \
  --specification-url "https://example.com/openapi.json"
```

Step 2: Import API from Local File


```bash
# Import API from a local OpenAPI spec file (JSON or YAML)
az apim api import \
  --resource-group <apim-resource-group> \
  --service-name <apim-service-name> \
  --api-id <api-id> \
  --path <api-path> \
  --display-name "<API Display Name>" \
  --specification-format OpenApi \
  --specification-path "./openapi.yaml"
```

Step 3: Configure Backend for the API


```bash
# Create backend pointing to your API server
az apim backend create \
  --resource-group <apim-resource-group> \
  --service-name <apim-service-name> \
  --backend-id <backend-id> \
  --protocol http \
  --url "https://your-api-server.com"

# Update API to use the backend
az apim api update \
  --resource-group <apim-resource-group> \
  --service-name <apim-service-name> \
  --api-id <api-id> \
  --set properties.serviceUrl="https://your-api-server.com"
```

Step 4: Apply Policies (Optional)


```xml
<policies>
    <inbound>
        <base />
        <set-backend-service backend-id="{backend-id}" />
        <!-- Add rate limiting -->
        <rate-limit-by-key
            calls="100"
            renewal-period="60"
            counter-key="@(context.Request.IpAddress)" />
    </inbound>
    <outbound>
        <base />
    </outbound>
</policies>
```

Supported Specification Formats


| Format | Value | File Extension |
|--------|-------|----------------|
| OpenAPI 3.x JSON | `OpenApiJson` | `.json` |
| OpenAPI 3.x YAML | `OpenApi` | `.yaml`, `.yml` |
| Swagger 2.0 JSON | `SwaggerJson` | `.json` |
| Swagger 2.0 (link) | `SwaggerLinkJson` | URL |
| WSDL | `Wsdl` | `.wsdl` |
| WADL | `Wadl` | `.wadl` |
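When scripting imports, the right `--specification-format` value can be derived from the spec's file extension. A convenience sketch based on the table above (note `.json` is ambiguous: Swagger 2.0 JSON would use `SwaggerJson` instead, so the mapping defaults to OpenAPI):

```python
import os

# Extension -> az apim --specification-format value (defaults .json to OpenAPI 3.x)
SPEC_FORMATS = {
    ".json": "OpenApiJson",  # use SwaggerJson instead for Swagger 2.0 files
    ".yaml": "OpenApi",
    ".yml": "OpenApi",
    ".wsdl": "Wsdl",
    ".wadl": "Wadl",
}

def spec_format(filename: str) -> str:
    """Map a spec file extension to an az apim --specification-format value."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in SPEC_FORMATS:
        raise ValueError(f"unsupported spec extension: {ext!r}")
    return SPEC_FORMATS[ext]

print(spec_format("./openapi.yaml"))  # OpenApi
print(spec_format("api.json"))        # OpenApiJson
```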

Pattern 10: Convert API to MCP Server


Convert existing APIM API operations into an MCP (Model Context Protocol) server, enabling AI agents to discover and use your APIs as tools.

Prerequisites


  • APIM instance with Basicv2 SKU or higher
  • Existing API imported into APIM
  • MCP feature enabled on APIM

Step 1: List Existing APIs in APIM


```bash
# List all APIs in APIM
az apim api list \
  --resource-group <apim-resource-group> \
  --service-name <apim-service-name> \
  --query "[].{id:name, displayName:displayName, path:path}" \
  -o table
```

Step 2: Ask User Which API to Convert


After listing the APIs, use the ask_user tool to let the user select which API to convert to an MCP server.

Step 3: List API Operations


```bash
# List all operations for the selected API
az apim api operation list \
  --resource-group <apim-resource-group> \
  --service-name <apim-service-name> \
  --api-id <api-id> \
  --query "[].{operationId:name, displayName:displayName, method:method, urlTemplate:urlTemplate}" \
  -o table
```

Step 4: Ask User Which Operations to Expose as MCP Tools


After listing the operations, use the ask_user tool to present the operations as choices. Let the user select which operations to expose as MCP tools. Users may want to expose all operations or only a subset.
Example choices to present:
  • All operations (convert entire API)
  • Individual operations from the discovered list
  • Include operation name, method, and URL template

Step 5: Enable MCP Server on APIM


```bash
# Enable MCP server capability (via ARM/Bicep or Portal)
# Note: MCP configuration is done via APIM policies and product configuration
```

Step 6: Configure MCP Endpoint for API


Create an MCP-compatible endpoint that exposes your API operations as tools:
```xml
<policies>
    <inbound>
        <base />
        <!-- MCP tools/list endpoint handler -->
        <choose>
            <when condition="@(context.Request.Url.Path.EndsWith(&quot;/mcp/tools/list&quot;))">
                <return-response>
                    <set-status code="200" reason="OK" />
                    <set-header name="Content-Type" exists-action="override">
                        <value>application/json</value>
                    </set-header>
                    <set-body>@{
                        var tools = new JArray();
                        // Define your API operations as MCP tools
                        tools.Add(new JObject(
                            new JProperty("name", "operation_name"),
                            new JProperty("description", "Description of what this operation does"),
                            new JProperty("inputSchema", new JObject(
                                new JProperty("type", "object"),
                                new JProperty("properties", new JObject(
                                    new JProperty("param1", new JObject(
                                        new JProperty("type", "string"),
                                        new JProperty("description", "Parameter description")
                                    ))
                                ))
                            ))
                        ));
                        return new JObject(new JProperty("tools", tools)).ToString();
                    }</set-body>
                </return-response>
            </when>
        </choose>
    </inbound>
</policies>
```
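The JSON the policy emits follows the MCP `tools/list` result shape: a `tools` array whose entries carry `name`, `description`, and a JSON Schema `inputSchema`. The same payload can be sketched in Python for testing outside the gateway (the tool name and parameter are placeholders, as in the policy):

```python
import json

def tools_list_response(tools):
    """Build an MCP tools/list result: {"tools": [{name, description, inputSchema}, ...]}."""
    return json.dumps({"tools": tools})

tool = {
    "name": "operation_name",
    "description": "Description of what this operation does",
    "inputSchema": {
        "type": "object",
        "properties": {
            "param1": {"type": "string", "description": "Parameter description"},
        },
    },
}
print(tools_list_response([tool]))
```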

Step 7: Bicep Template for MCP-Enabled API


```bicep
param apimServiceName string
param apiId string
param apiDisplayName string
param apiPath string
param backendUrl string

resource apimService 'Microsoft.ApiManagement/service@2024-06-01-preview' existing = {
  name: apimServiceName
}

resource api 'Microsoft.ApiManagement/service/apis@2024-06-01-preview' = {
  parent: apimService
  name: apiId
  properties: {
    displayName: apiDisplayName
    path: apiPath
    protocols: ['https']
    serviceUrl: backendUrl
    subscriptionRequired: true
    // MCP endpoints
    apiType: 'http'
  }
}

// MCP tools/list operation
resource mcpToolsListOperation 'Microsoft.ApiManagement/service/apis/operations@2024-06-01-preview' = {
  parent: api
  name: 'mcp-tools-list'
  properties: {
    displayName: 'MCP Tools List'
    method: 'POST'
    urlTemplate: '/mcp/tools/list'
    description: 'List available MCP tools'
  }
}

// MCP tools/call operation
resource mcpToolsCallOperation 'Microsoft.ApiManagement/service/apis/operations@2024-06-01-preview' = {
  parent: api
  name: 'mcp-tools-call'
  properties: {
    displayName: 'MCP Tools Call'
    method: 'POST'
    urlTemplate: '/mcp/tools/call'
    description: 'Call an MCP tool'
  }
}
```
bicep
param apimServiceName string
param apiId string
param apiDisplayName string
param apiPath string
param backendUrl string

resource apimService 'Microsoft.ApiManagement/service@2024-06-01-preview' existing = {
  name: apimServiceName
}

resource api 'Microsoft.ApiManagement/service/apis@2024-06-01-preview' = {
  parent: apimService
  name: apiId
  properties: {
    displayName: apiDisplayName
    path: apiPath
    protocols: ['https']
    serviceUrl: backendUrl
    subscriptionRequired: true
    // MCP端点
    apiType: 'http'
  }
}

// MCP tools/list操作
resource mcpToolsListOperation 'Microsoft.ApiManagement/service/apis/operations@2024-06-01-preview' = {
  parent: api
  name: 'mcp-tools-list'
  properties: {
    displayName: 'MCP工具列表'
    method: 'POST'
    urlTemplate: '/mcp/tools/list'
    description: '列出可用的MCP工具'
  }
}

// MCP tools/call操作
resource mcpToolsCallOperation 'Microsoft.ApiManagement/service/apis/operations@2024-06-01-preview' = {
  parent: api
  name: 'mcp-tools-call'
  properties: {
    displayName: '调用MCP工具'
    method: 'POST'
    urlTemplate: '/mcp/tools/call'
    description: '调用MCP工具'
  }
}
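The template above can be deployed with the Azure CLI. A minimal sketch, assuming the template is saved as `mcp-api.bicep`; the file name, resource group, and all parameter values below are placeholders for your environment, not part of the original template:

```shell
# Sketch: deploy the MCP-enabled API template. All names/values here are
# placeholders; substitute your own before running against Azure.
cat > mcp-api.params.json <<'EOF'
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "apimServiceName": { "value": "my-apim" },
    "apiId": { "value": "mcp-api" },
    "apiDisplayName": { "value": "MCP API" },
    "apiPath": { "value": "mcp-api" },
    "backendUrl": { "value": "https://backend.example.com" }
  }
}
EOF

RESOURCE_GROUP="my-ai-gateway-rg"   # placeholder: your AI Gateway resource group
if command -v az >/dev/null 2>&1; then
  az deployment group create \
    --resource-group "$RESOURCE_GROUP" \
    --template-file mcp-api.bicep \
    --parameters @mcp-api.params.json \
    || echo "deployment failed (requires az login and an existing resource group)"
else
  echo "az CLI not found; parameters written to mcp-api.params.json"
fi
```

Keeping parameters in a file makes the same template reusable across environments (dev/prod) by swapping the parameters file only.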

Step 8: Test MCP Endpoint

步骤8:测试MCP端点

bash
# Get APIM gateway URL
GATEWAY_URL=$(az apim show \
  --name <apim-service-name> \
  --resource-group <apim-resource-group> \
  --query "gatewayUrl" -o tsv)

# Test MCP tools/list endpoint
curl -X POST "${GATEWAY_URL}/<api-path>/mcp/tools/list" \
  -H "Content-Type: application/json" \
  -H "Ocp-Apim-Subscription-Key: <subscription-key>" \
  -d '{}'
bash
# 获取APIM网关URL
GATEWAY_URL=$(az apim show \
  --name <apim-service-name> \
  --resource-group <apim-resource-group> \
  --query "gatewayUrl" -o tsv)

# 测试MCP tools/list端点
curl -X POST "${GATEWAY_URL}/<api-path>/mcp/tools/list" \
  -H "Content-Type: application/json" \
  -H "Ocp-Apim-Subscription-Key: <订阅密钥>" \
  -d '{}'
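Once tools/list responds, the natural follow-up is tools/call. The request body below is illustrative only: the `get_weather` tool name and its arguments are assumptions, and the commented curl line requires a live gateway and subscription key.

```shell
# Illustrative tools/call request body. Tool name and arguments are
# assumptions; substitute a tool returned by your own tools/list response.
cat > call.json <<'EOF'
{
  "name": "get_weather",
  "arguments": { "location": "Seattle" }
}
EOF

# Against a live gateway (requires GATEWAY_URL and a subscription key):
# curl -X POST "${GATEWAY_URL}/<api-path>/mcp/tools/call" \
#   -H "Content-Type: application/json" \
#   -H "Ocp-Apim-Subscription-Key: <subscription-key>" \
#   -d @call.json

# Validate the JSON locally before sending it
python3 -m json.tool call.json
```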

MCP Tool Definition Schema

MCP工具定义Schema

When converting API operations to MCP tools, use this schema:
json
{
  "tools": [
    {
      "name": "get_weather",
      "description": "Get current weather for a location",
      "inputSchema": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "City name or coordinates"
          }
        },
        "required": ["location"]
      }
    }
  ]
}
将API操作转换为MCP工具时,请使用以下schema:
json
{
  "tools": [
    {
      "name": "get_weather",
      "description": "获取指定地点的当前天气",
      "inputSchema": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "城市名称或坐标"
          }
        },
        "required": ["location"]
      }
    }
  ]
}
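A quick local sanity check for tool definitions of this shape can catch broken schemas before they reach the gateway. This is a sketch; the `tools.json` file name and the specific checks are assumptions, not requirements of MCP itself.

```shell
# Check that each tool has a name, a description, and an object-typed
# inputSchema whose "required" fields actually exist in "properties".
cat > tools.json <<'EOF'
{
  "tools": [
    {
      "name": "get_weather",
      "description": "Get current weather for a location",
      "inputSchema": {
        "type": "object",
        "properties": {
          "location": { "type": "string", "description": "City name or coordinates" }
        },
        "required": ["location"]
      }
    }
  ]
}
EOF

python3 - <<'EOF'
import json

with open("tools.json") as f:
    tools = json.load(f)["tools"]

for tool in tools:
    assert tool["name"] and tool["description"], "name and description are required"
    schema = tool["inputSchema"]
    assert schema["type"] == "object", "inputSchema must be an object schema"
    missing = set(schema.get("required", [])) - set(schema.get("properties", {}))
    assert not missing, f"required fields missing from properties: {missing}"

print(f"{len(tools)} tool definition(s) valid")
EOF
```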

Reference

参考文档

Lab References (AI-Gateway Repo)

实验室参考(AI-Gateway仓库)

Essential Labs to Get Started:

| Scenario | Lab | Description |
| --- | --- | --- |
| Semantic Caching | semantic-caching | Cache similar prompts to reduce costs |
| Token Rate Limiting | token-rate-limiting | Limit tokens per minute |
| Content Safety | content-safety | Filter harmful content |
| Load Balancing | backend-pool-load-balancing | Distribute load across backends |
| MCP from API | mcp-from-api | Convert OpenAPI to MCP server |
| Zero to Production | zero-to-production | Complete production setup guide |

入门必备实验室:

| 场景 | 实验室 | 描述 |
| --- | --- | --- |
| 语义缓存 | semantic-caching | 缓存相似提示词以降低成本 |
| 令牌速率限制 | token-rate-limiting | 限制每分钟的令牌使用量 |
| 内容安全 | content-safety | 过滤有害内容 |
| 负载均衡 | backend-pool-load-balancing | 在多个后端之间分配负载 |
| 从API创建MCP | mcp-from-api | 将OpenAPI转换为MCP服务器 |
| 从0到生产 | zero-to-production | 完整的生产环境搭建指南 |

Quick Start Checklist

快速开始检查清单

Prerequisites

前提条件

  • Azure subscription created
  • Azure CLI installed and authenticated (
    az login
    )
  • Resource group created for AI Gateway resources
  • 已创建Azure订阅
  • 已安装Azure CLI并完成身份验证 (
    az login
    )
  • 已为AI网关资源创建资源组

Deployment

部署

  • Deploy APIM with Basicv2 SKU
  • Configure managed identity
  • Add backend for Azure OpenAI or AI Foundry
  • Apply policies (caching, rate limits, content safety)
  • 部署Basicv2 SKU的APIM实例
  • 配置托管身份
  • 为Azure OpenAI或AI Foundry添加后端
  • 应用策略(缓存、速率限制、内容安全)

Verification

验证

  • Test API endpoint through gateway
  • Verify token metrics in Application Insights
  • Check rate limiting headers in response
  • Validate content safety filtering
  • 通过网关测试API端点
  • 在Application Insights中验证令牌指标
  • 检查响应中的速率限制头
  • 验证内容安全过滤

Best Practices

最佳实践

| Practice | Description |
| --- | --- |
| Default to Basicv2 | Use Basicv2 SKU for cost/speed optimization |
| Use managed identity | Prefer managed identity over API keys for backend auth |
| Enable token metrics | Use azure-openai-emit-token-metric for cost tracking |
| Semantic caching | Cache similar prompts to reduce costs (60-80% savings possible) |
| Rate limit by key | Use subscription ID or IP for granular rate limiting |
| Content safety | Enable shield-prompt to detect jailbreak attempts |

| 实践 | 描述 |
| --- | --- |
| 默认使用Basicv2 | 使用Basicv2 SKU优化成本和速度 |
| 使用托管身份 | 优先使用托管身份而非API密钥进行后端认证 |
| 启用令牌指标 | 使用azure-openai-emit-token-metric进行成本跟踪 |
| 语义缓存 | 缓存相似提示词以降低成本(可节省60-80%) |
| 按密钥限制速率 | 使用订阅ID或IP实现细粒度速率限制 |
| 内容安全 | 启用shield-prompt以检测越狱攻击尝试 |
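For the token-metrics practice above, the inbound policy fragment looks roughly like this. This is a sketch based on APIM's documented azure-openai-emit-token-metric policy; the namespace value and the choice of dimensions are assumptions, so adjust them to match your monitoring setup.

```xml
<!-- Inbound policy: emit prompt/completion/total token counts as metrics,
     split by subscription and API for per-consumer cost tracking. -->
<azure-openai-emit-token-metric namespace="openai">
    <dimension name="Subscription ID" />
    <dimension name="API ID" />
</azure-openai-emit-token-metric>
```

The emitted metrics land in Application Insights, which is what the "Verify token metrics" step in the checklist above inspects.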

Troubleshooting

故障排除

| Issue | Symptom | Solution |
| --- | --- | --- |
| Slow APIM creation | Deployment takes 30+ minutes | Use Basicv2 SKU instead of Premium |
| Token limit exceeded | 429 response | Increase tokens-per-minute or add load balancing |
| Cache not working | No cache hits | Lower score-threshold (e.g., 0.7) |
| Content blocked | False positives | Increase category thresholds |
| Backend auth fails | 401 from Azure OpenAI | Assign Cognitive Services User role to APIM managed identity |
| Rate limit too strict | Legitimate requests blocked | Increase calls or renewal-period |

| 问题 | 症状 | 解决方案 |
| --- | --- | --- |
| APIM创建缓慢 | 部署耗时30分钟以上 | 使用Basicv2 SKU替代Premium |
| 令牌限制超出 | 429响应 | 提高tokens-per-minute或添加负载均衡 |
| 缓存不生效 | 无缓存命中 | 降低score-threshold(例如0.7) |
| 内容被误拦截 | 误报 | 提高分类阈值 |
| 后端认证失败 | Azure OpenAI返回401 | 为APIM托管身份分配认知服务用户角色 |
| 速率限制过严 | 合法请求被拦截 | 提高calls或renewal-period |

Additional Resources

额外资源