azure-aigateway


Azure AI Gateway


Bootstrap and configure Azure API Management (APIM) as an AI Gateway for securing, observing, and controlling AI models, tools (MCP Servers), and agents.

Skill Activation Triggers


Use this skill immediately when the user asks to:
  • "Set up a gateway for my model"
  • "Set up a gateway for my tools"
  • "Set up a gateway for my agents"
  • "Add a gateway to my MCP server"
  • "Protect my AI model with a gateway"
  • "Secure my AI agents"
  • "Ratelimit my model requests"
  • "Ratelimit my tool requests"
  • "Limit tokens for my model"
  • "Add rate limiting to my MCP server"
  • "Enable semantic caching for my AI API"
  • "Add content safety to my AI endpoint"
  • "Add my model behind gateway"
  • "Import API from OpenAPI spec"
  • "Add API to gateway from swagger"
  • "Convert my API to MCP"
  • "Expose my API as MCP server"
Key Indicators:
  • User deploying Azure OpenAI, AI Foundry, or other AI models
  • User creating or managing MCP servers
  • User needs token limits, rate limiting, or quota management
  • User wants to cache AI responses to reduce costs
  • User needs content filtering or safety controls
  • User wants load balancing across multiple AI backends
Secondary Triggers (Proactive Recommendations):
  • After model creation: Recommend AI Gateway for security, caching, and token limits
  • After MCP server creation: Recommend AI Gateway for rate limiting, content safety, and auth

Overview


Azure API Management serves as an AI Gateway that provides:
  • Security: Authentication, authorization, and content safety
  • Observability: Token metrics, logging, and monitoring
  • Control: Rate limiting, token limits, and load balancing
  • Optimization: Semantic caching to reduce costs and latency
```
AI Models ──┐                       ┌── Azure OpenAI
MCP Tools ──┼── AI Gateway (APIM) ──┼── AI Foundry
Agents ─────┘                       └── Custom Models
```

Key Resources


Configuration Rules


Default to the `Basicv2` SKU when creating new APIM instances:
  • Cheaper than other tiers
  • Creates quickly (~5-10 minutes vs 30+ for Premium)
  • Supports all AI Gateway policies

Pattern 1: Quick Bootstrap AI Gateway


Deploy APIM with Basicv2 SKU for AI workloads.
```bash
# Create resource group
az group create --name rg-aigateway --location eastus2

# Deploy APIM with Bicep
az deployment group create \
  --resource-group rg-aigateway \
  --template-file main.bicep \
  --parameters apimSku=Basicv2
```

Bicep Template


```bicep
param location string = resourceGroup().location
param apimSku string = 'Basicv2'
param apimManagedIdentityType string = 'SystemAssigned'

// NOTE: Using 2024-06-01-preview because Basicv2 SKU support currently requires this preview API version.
//       Update to the latest stable (GA) API version once Basicv2 is available there.
resource apimService 'Microsoft.ApiManagement/service@2024-06-01-preview' = {
  name: 'apim-aigateway-${uniqueString(resourceGroup().id)}'
  location: location
  sku: {
    name: apimSku
    capacity: 1
  }
  properties: {
    publisherEmail: 'admin@contoso.com'
    publisherName: 'Contoso'
  }
  identity: {
    type: apimManagedIdentityType
  }
}

output gatewayUrl string = apimService.properties.gatewayUrl
output principalId string = apimService.identity.principalId
```

Pattern 2: Semantic Caching


Cache similar prompts to reduce costs and latency.
```xml
<policies>
    <inbound>
        <base />
        <!-- Cache lookup with 0.8 similarity threshold -->
        <azure-openai-semantic-cache-lookup
            score-threshold="0.8"
            embeddings-backend-id="embeddings-backend"
            embeddings-backend-auth="system-assigned" />
        <set-backend-service backend-id="{backend-id}" />
    </inbound>
    <outbound>
        <!-- Cache responses for 120 seconds -->
        <azure-openai-semantic-cache-store duration="120" />
        <base />
    </outbound>
</policies>
```

Options:

| Parameter | Range | Description |
|-----------|-------|-------------|
| `score-threshold` | 0.7-0.95 | Higher = stricter matching |
| `duration` | 60-3600 | Cache TTL in seconds |
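The lookup-then-store flow can be sketched in Python. This is a toy in-memory model of the policy's behavior, not how APIM implements it: cosine similarity stands in for the embeddings backend, and the threshold and TTL mirror the policy attributes above.

```python
import math
import time

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class SemanticCache:
    """Toy semantic cache: match by embedding similarity, expire entries by TTL."""
    def __init__(self, score_threshold=0.8, duration=120):
        self.score_threshold = score_threshold
        self.duration = duration
        self.entries = []  # (embedding, response, stored_at)

    def lookup(self, embedding, now=None):
        now = time.time() if now is None else now
        for emb, response, stored_at in self.entries:
            if now - stored_at < self.duration and cosine(embedding, emb) >= self.score_threshold:
                return response  # cache hit: the backend call is skipped
        return None

    def store(self, embedding, response, now=None):
        self.entries.append((embedding, response, time.time() if now is None else now))

cache = SemanticCache(score_threshold=0.8, duration=120)
cache.store([1.0, 0.0], "cached answer", now=0)
print(cache.lookup([0.9, 0.1], now=10))   # similar prompt within TTL -> hit
print(cache.lookup([0.9, 0.1], now=300))  # past the 120 s TTL -> miss
```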

Pattern 3: Token Rate Limiting


Limit tokens per minute to control costs and prevent abuse.
```xml
<policies>
    <inbound>
        <base />
        <set-backend-service backend-id="{backend-id}" />
        <!-- Limit to 500 tokens per minute per subscription -->
        <azure-openai-token-limit
            counter-key="@(context.Subscription.Id)"
            tokens-per-minute="500"
            estimate-prompt-tokens="false"
            remaining-tokens-variable-name="remainingTokens" />
    </inbound>
</policies>
```

Options:

| Parameter | Values | Description |
|-----------|--------|-------------|
| `counter-key` | Subscription.Id, Request.IpAddress, custom | Grouping key for limits |
| `tokens-per-minute` | 100-100000 | Token quota |
| `estimate-prompt-tokens` | true/false | true = faster but less accurate |
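The policy's effect can be approximated with a per-key fixed-window counter. This is a sketch of the observable behavior, not APIM's internal implementation; the key and quota mirror the policy attributes above.

```python
import time
from collections import defaultdict

class TokenLimiter:
    """Per-key tokens-per-minute limit: reject once the current minute's quota is exhausted."""
    def __init__(self, tokens_per_minute=500):
        self.tokens_per_minute = tokens_per_minute
        self.windows = defaultdict(int)  # (key, minute) -> tokens consumed

    def try_consume(self, counter_key, tokens, now=None):
        now = time.time() if now is None else now
        window = (counter_key, int(now // 60))
        if self.windows[window] + tokens > self.tokens_per_minute:
            # Over quota: the gateway would return 429 and report remaining tokens
            return False, self.tokens_per_minute - self.windows[window]
        self.windows[window] += tokens
        return True, self.tokens_per_minute - self.windows[window]

limiter = TokenLimiter(tokens_per_minute=500)
print(limiter.try_consume("sub-1", 400, now=0))   # (True, 100)
print(limiter.try_consume("sub-1", 200, now=30))  # (False, 100) -> would exceed quota
print(limiter.try_consume("sub-1", 200, now=60))  # (True, 300) -> new minute window
```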

Pattern 4: Content Safety


Filter harmful content and detect jailbreak attempts.
```xml
<policies>
    <inbound>
        <base />
        <set-backend-service backend-id="{backend-id}" />
        <!-- Block severity 4+ content, detect jailbreaks -->
        <llm-content-safety backend-id="content-safety-backend" shield-prompt="true">
            <categories output-type="EightSeverityLevels">
                <category name="Hate" threshold="4" />
                <category name="Sexual" threshold="4" />
                <category name="SelfHarm" threshold="4" />
                <category name="Violence" threshold="4" />
            </categories>
            <blocklists>
                <id>custom-blocklist</id>
            </blocklists>
        </llm-content-safety>
    </inbound>
</policies>
```

Options:

| Parameter | Range | Description |
|-----------|-------|-------------|
| `threshold` | 0-7 | 0 = safe, 7 = severe |
| `shield-prompt` | true/false | Detect jailbreak attempts |
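The category/threshold logic amounts to blocking when any analyzed severity reaches its configured threshold. A sketch of that decision (in production the severities come from the Azure AI Content Safety service; here they are supplied directly):

```python
# Thresholds mirror the policy above: block at severity 4 or higher per category.
THRESHOLDS = {"Hate": 4, "Sexual": 4, "SelfHarm": 4, "Violence": 4}

def should_block(severities: dict, thresholds: dict = THRESHOLDS) -> bool:
    """Block when any category's severity (0 = safe .. 7 = severe) meets its threshold."""
    return any(severities.get(cat, 0) >= limit for cat, limit in thresholds.items())

print(should_block({"Hate": 2, "Violence": 3}))  # False: all severities below 4
print(should_block({"Violence": 5}))             # True: 5 >= 4
```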

Pattern 5: Rate Limits for MCPs/OpenAPI Tools


Protect MCP servers and tools with request rate limiting.
```xml
<policies>
    <inbound>
        <base />
        <!-- 10 calls per 60 seconds per IP -->
        <rate-limit-by-key
            calls="10"
            renewal-period="60"
            counter-key="@(context.Request.IpAddress)"
            remaining-calls-variable-name="remainingCalls" />
    </inbound>
    <outbound>
        <set-header name="X-Rate-Limit-Remaining" exists-action="override">
            <value>@(context.Variables.GetValueOrDefault<int>("remainingCalls", 0).ToString())</value>
        </set-header>
        <base />
    </outbound>
</policies>
```
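On the consuming side, the `X-Rate-Limit-Remaining` header set in the outbound section lets clients throttle themselves before they start receiving 429s. A minimal sketch (the `reserve` margin is an arbitrary illustrative choice, not part of the policy):

```python
def calls_left(headers: dict) -> int:
    """Parse the remaining-call budget the gateway reports; treat absent/bad values as 0."""
    try:
        return int(headers.get("X-Rate-Limit-Remaining", "0"))
    except ValueError:
        return 0

def should_pause(headers: dict, reserve: int = 2) -> bool:
    """Back off when fewer than `reserve` calls remain in the renewal period."""
    return calls_left(headers) < reserve

print(should_pause({"X-Rate-Limit-Remaining": "7"}))  # False: plenty of budget
print(should_pause({"X-Rate-Limit-Remaining": "1"}))  # True: back off before a 429
```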

Pattern 6: Managed Identity Authentication


Secure backend access with managed identity instead of API keys.
```xml
<policies>
    <inbound>
        <base />
        <!-- Managed identity auth to Azure OpenAI -->
        <authentication-managed-identity
            resource="https://cognitiveservices.azure.com"
            output-token-variable-name="managed-id-access-token"
            ignore-error="false" />
        <set-header name="Authorization" exists-action="override">
            <value>@("Bearer " + (string)context.Variables["managed-id-access-token"])</value>
        </set-header>
        <set-backend-service backend-id="{backend-id}" />
        <!-- Emit token metrics for monitoring -->
        <azure-openai-emit-token-metric namespace="openai">
            <dimension name="Subscription ID" value="@(context.Subscription.Id)" />
            <dimension name="Client IP" value="@(context.Request.IpAddress)" />
            <dimension name="API ID" value="@(context.Api.Id)" />
        </azure-openai-emit-token-metric>
    </inbound>
</policies>
```

Pattern 7: Load Balancing with Retry


Distribute load across multiple backends with automatic failover.
```xml
<policies>
    <inbound>
        <base />
        <set-backend-service backend-id="{backend-pool-id}" />
    </inbound>
    <backend>
        <!-- Retry on 429 (rate limit) or 503 (service unavailable) -->
        <retry count="2" interval="0" first-fast-retry="true"
            condition="@(context.Response.StatusCode == 429 || context.Response.StatusCode == 503)">
            <set-backend-service backend-id="{backend-pool-id}" />
            <forward-request buffer-request-body="true" />
        </retry>
    </backend>
    <on-error>
        <when condition="@(context.Response.StatusCode == 503)">
            <return-response>
                <set-status code="503" reason="Service Unavailable" />
            </return-response>
        </when>
    </on-error>
</policies>
```
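The retry condition above (retry up to twice on 429 or 503, first retry immediately) can be sketched as a loop over a backend pool. The stubbed `send` function stands in for forwarding to whichever backend the pool selects; real APIM does all of this inside the gateway:

```python
RETRYABLE = {429, 503}

def forward_with_retry(send, request, retries=2):
    """Send once, then retry up to `retries` times while the status is 429/503."""
    status, body = send(request)
    for _ in range(retries):
        if status not in RETRYABLE:
            break
        status, body = send(request)  # the pool picks the next healthy backend
    return status, body

# Stub backend pool: first two attempts are throttled/unavailable, third succeeds.
responses = iter([(429, ""), (503, ""), (200, "ok")])
print(forward_with_retry(lambda req: next(responses), {}))  # (200, 'ok')
```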

Pattern 8: Add AI Foundry Model Behind Gateway


When the user asks to "add my model behind gateway", first discover the available models from Azure AI Foundry, then ask which model to add.

Step 1: Discover AI Foundry Projects and Available Models


```bash
# Set environment variables
accountName="<ai-foundry-resource-name>"
resourceGroupName="<resource-group>"

# List AI Foundry resources (AI Services accounts)
az cognitiveservices account list \
  --query "[?kind=='AIServices'].{name:name, resourceGroup:resourceGroup, location:location}" \
  -o table

# List available models in the AI Foundry resource
az cognitiveservices account list-models \
  -n $accountName \
  -g $resourceGroupName \
  | jq '.[] | { name: .name, format: .format, version: .version, sku: .skus[0].name, capacity: .skus[0].capacity.default }'

# List already deployed models
az cognitiveservices account deployment list \
  -n $accountName \
  -g $resourceGroupName
```

Step 2: Ask User Which Model to Add


After listing the available models, use the ask_user tool to present the models as choices and let the user select which model to add behind the gateway.
Example choices to present:
  • Model deployments from the discovered list
  • Include model name, format (provider), version, and SKU info

Step 3: Deploy the Model (if not already deployed)


```bash
# Deploy the selected model to AI Foundry
az cognitiveservices account deployment create \
  -n $accountName \
  -g $resourceGroupName \
  --deployment-name <model-name> \
  --model-name <model-name> \
  --model-version <version> \
  --model-format <format> \
  --sku-capacity 1 \
  --sku-name <sku>
```

Step 4: Configure APIM Backend for Selected Model


```bash
# Get the AI Foundry inference endpoint
ENDPOINT=$(az cognitiveservices account show \
  -n $accountName \
  -g $resourceGroupName \
  | jq -r '.properties.endpoints["Azure AI Model Inference API"]')

# Create APIM backend for the selected model
az apim backend create \
  --resource-group <apim-resource-group> \
  --service-name <apim-service-name> \
  --backend-id <model-deployment-name>-backend \
  --protocol http \
  --url "${ENDPOINT}"
```

Step 5: Create API and Apply Policies


```bash
# Import Azure OpenAI API specification
az apim api import \
  --resource-group <apim-resource-group> \
  --service-name <apim-service-name> \
  --path <model-deployment-name> \
  --specification-format OpenApiJson \
  --specification-url "https://raw.githubusercontent.com/Azure/azure-rest-api-specs/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference/stable/2024-02-01/inference.json"
```

Step 6: Grant APIM Access to AI Foundry


```bash
# Get APIM managed identity principal ID
APIM_PRINCIPAL_ID=$(az apim show \
  --name <apim-service-name> \
  --resource-group <apim-resource-group> \
  --query "identity.principalId" -o tsv)

# Get AI Foundry resource ID
AI_RESOURCE_ID=$(az cognitiveservices account show \
  -n $accountName \
  -g $resourceGroupName \
  --query "id" -o tsv)

# Assign Cognitive Services User role
az role assignment create \
  --assignee $APIM_PRINCIPAL_ID \
  --role "Cognitive Services User" \
  --scope $AI_RESOURCE_ID
```

Bicep Template for Backend Configuration


```bicep
param apimServiceName string
param backendId string
param aiFoundryEndpoint string
param modelDeploymentName string

resource apimService 'Microsoft.ApiManagement/service@2024-06-01-preview' existing = {
  name: apimServiceName
}

resource backend 'Microsoft.ApiManagement/service/backends@2024-06-01-preview' = {
  parent: apimService
  name: backendId
  properties: {
    protocol: 'http'
    url: '${aiFoundryEndpoint}openai/deployments/${modelDeploymentName}'
    credentials: {
      header: {}
    }
    tls: {
      validateCertificateChain: true
      validateCertificateName: true
    }
  }
}
```
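One detail worth noting: the backend `url` expression concatenates `aiFoundryEndpoint` and the deployment path directly, so it assumes the endpoint value ends with a trailing slash. A small helper that normalizes either form when scripting this outside Bicep (the hostname below is a placeholder, not a real resource):

```python
def backend_url(endpoint: str, deployment: str) -> str:
    """Join the AI Foundry endpoint and deployment path, tolerating a missing trailing slash."""
    return f"{endpoint.rstrip('/')}/openai/deployments/{deployment}"

print(backend_url("https://my-foundry.cognitiveservices.azure.com/", "gpt-4o"))
```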

Pattern 9: Import API from OpenAPI Specification


Add an API to the gateway from an OpenAPI/Swagger specification, either from a local file or web URL.

Step 1: Import API from Web URL


```bash
# Import API from a publicly accessible OpenAPI spec URL
az apim api import \
  --resource-group <apim-resource-group> \
  --service-name <apim-service-name> \
  --api-id <api-id> \
  --path <api-path> \
  --display-name "<API Display Name>" \
  --specification-format OpenApiJson \
  --specification-url "https://example.com/openapi.json"
```

Step 2: Import API from Local File


```bash
# Import API from a local OpenAPI spec file (JSON or YAML)
az apim api import \
  --resource-group <apim-resource-group> \
  --service-name <apim-service-name> \
  --api-id <api-id> \
  --path <api-path> \
  --display-name "<API Display Name>" \
  --specification-format OpenApi \
  --specification-path "./openapi.yaml"
```

Step 3: Configure Backend for the API


```bash
# Create backend pointing to your API server
az apim backend create \
  --resource-group <apim-resource-group> \
  --service-name <apim-service-name> \
  --backend-id <backend-id> \
  --protocol http \
  --url "https://your-api-server.com"

# Update API to use the backend
az apim api update \
  --resource-group <apim-resource-group> \
  --service-name <apim-service-name> \
  --api-id <api-id> \
  --set properties.serviceUrl="https://your-api-server.com"
```

Step 4: Apply Policies (Optional)


```xml
<policies>
    <inbound>
        <base />
        <set-backend-service backend-id="{backend-id}" />
        <!-- Add rate limiting -->
        <rate-limit-by-key
            calls="100"
            renewal-period="60"
            counter-key="@(context.Request.IpAddress)" />
    </inbound>
    <outbound>
        <base />
    </outbound>
</policies>
```

Supported Specification Formats


| Format | Value | File Extension |
|--------|-------|----------------|
| OpenAPI 3.x JSON | `OpenApiJson` | `.json` |
| OpenAPI 3.x YAML | `OpenApi` | `.yaml`, `.yml` |
| Swagger 2.0 JSON | `SwaggerJson` | `.json` |
| Swagger 2.0 (link) | `SwaggerLinkJson` | URL |
| WSDL | `Wsdl` | `.wsdl` |
| WADL | `Wadl` | `.wadl` |
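When scripting imports, the right `--specification-format` value can be derived from the spec's file extension. A convenience sketch based on the table above (note `.json` is ambiguous: Swagger 2.0 JSON would use `SwaggerJson` instead, so the mapping defaults to OpenAPI):

```python
import os

# Extension -> az apim --specification-format value (defaults .json to OpenAPI 3.x)
SPEC_FORMATS = {
    ".json": "OpenApiJson",  # use SwaggerJson instead for Swagger 2.0 files
    ".yaml": "OpenApi",
    ".yml": "OpenApi",
    ".wsdl": "Wsdl",
    ".wadl": "Wadl",
}

def spec_format(filename: str) -> str:
    """Map a spec file extension to an az apim --specification-format value."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in SPEC_FORMATS:
        raise ValueError(f"unsupported spec extension: {ext!r}")
    return SPEC_FORMATS[ext]

print(spec_format("./openapi.yaml"))  # OpenApi
print(spec_format("api.json"))        # OpenApiJson
```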

Pattern 10: Convert API to MCP Server


Convert existing APIM API operations into an MCP (Model Context Protocol) server, enabling AI agents to discover and use your APIs as tools.

Prerequisites


  • APIM instance with Basicv2 SKU or higher
  • Existing API imported into APIM
  • MCP feature enabled on APIM

Step 1: List Existing APIs in APIM


```bash
# List all APIs in APIM
az apim api list \
  --resource-group <apim-resource-group> \
  --service-name <apim-service-name> \
  --query "[].{id:name, displayName:displayName, path:path}" \
  -o table
```

Step 2: Ask User Which API to Convert


After listing the APIs, use the ask_user tool to let the user select which API to convert to an MCP server.

Step 3: List API Operations


```bash
# List all operations for the selected API
az apim api operation list \
  --resource-group <apim-resource-group> \
  --service-name <apim-service-name> \
  --api-id <api-id> \
  --query "[].{operationId:name, displayName:displayName, method:method, urlTemplate:urlTemplate}" \
  -o table
```

Step 4: Ask User Which Operations to Expose as MCP Tools


After listing the operations, use the ask_user tool to present the operations as choices. Let the user select which operations to expose as MCP tools. Users may want to expose all operations or only a subset.
Example choices to present:
  • All operations (convert entire API)
  • Individual operations from the discovered list
  • Include operation name, method, and URL template

Step 5: Enable MCP Server on APIM


```bash
# Enable MCP server capability (via ARM/Bicep or Portal)
# Note: MCP configuration is done via APIM policies and product configuration
```

Step 6: Configure MCP Endpoint for API


Create an MCP-compatible endpoint that exposes your API operations as tools:
```xml
<policies>
    <inbound>
        <base />
        <!-- MCP tools/list endpoint handler -->
        <choose>
            <when condition="@(context.Request.Url.Path.EndsWith(&quot;/mcp/tools/list&quot;))">
                <return-response>
                    <set-status code="200" reason="OK" />
                    <set-header name="Content-Type" exists-action="override">
                        <value>application/json</value>
                    </set-header>
                    <set-body>@{
                        var tools = new JArray();
                        // Define your API operations as MCP tools
                        tools.Add(new JObject(
                            new JProperty("name", "operation_name"),
                            new JProperty("description", "Description of what this operation does"),
                            new JProperty("inputSchema", new JObject(
                                new JProperty("type", "object"),
                                new JProperty("properties", new JObject(
                                    new JProperty("param1", new JObject(
                                        new JProperty("type", "string"),
                                        new JProperty("description", "Parameter description")
                                    ))
                                ))
                            ))
                        ));
                        return new JObject(new JProperty("tools", tools)).ToString();
                    }</set-body>
                </return-response>
            </when>
        </choose>
    </inbound>
</policies>
```
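The JSON the policy emits follows the MCP `tools/list` result shape: a `tools` array whose entries carry `name`, `description`, and a JSON Schema `inputSchema`. The same payload can be sketched in Python for testing outside the gateway (the tool name and parameter are placeholders, as in the policy):

```python
import json

def tools_list_response(tools):
    """Build an MCP tools/list result: {"tools": [{name, description, inputSchema}, ...]}."""
    return json.dumps({"tools": tools})

tool = {
    "name": "operation_name",
    "description": "Description of what this operation does",
    "inputSchema": {
        "type": "object",
        "properties": {
            "param1": {"type": "string", "description": "Parameter description"},
        },
    },
}
print(tools_list_response([tool]))
```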

Step 7: Bicep Template for MCP-Enabled API


```bicep
param apimServiceName string
param apiId string
param apiDisplayName string
param apiPath string
param backendUrl string

resource apimService 'Microsoft.ApiManagement/service@2024-06-01-preview' existing = {
  name: apimServiceName
}

resource api 'Microsoft.ApiManagement/service/apis@2024-06-01-preview' = {
  parent: apimService
  name: apiId
  properties: {
    displayName: apiDisplayName
    path: apiPath
    protocols: ['https']
    serviceUrl: backendUrl
    subscriptionRequired: true
    // MCP endpoints
    apiType: 'http'
  }
}

// MCP tools/list operation
resource mcpToolsListOperation 'Microsoft.ApiManagement/service/apis/operations@2024-06-01-preview' = {
  parent: api
  name: 'mcp-tools-list'
  properties: {
    displayName: 'MCP Tools List'
    method: 'POST'
    urlTemplate: '/mcp/tools/list'
    description: 'List available MCP tools'
  }
}

// MCP tools/call operation
resource mcpToolsCallOperation 'Microsoft.ApiManagement/service/apis/operations@2024-06-01-preview' = {
  parent: api
  name: 'mcp-tools-call'
  properties: {
    displayName: 'MCP Tools Call'
    method: 'POST'
    urlTemplate: '/mcp/tools/call'
    description: 'Call an MCP tool'
  }
}
```
bicep
param apimServiceName string
param apiId string
param apiDisplayName string
param apiPath string
param backendUrl string

resource apimService 'Microsoft.ApiManagement/service@2024-06-01-preview' existing = {
  name: apimServiceName
}

resource api 'Microsoft.ApiManagement/service/apis@2024-06-01-preview' = {
  parent: apimService
  name: apiId
  properties: {
    displayName: apiDisplayName
    path: apiPath
    protocols: ['https']
    serviceUrl: backendUrl
    subscriptionRequired: true
    // MCP端点
    apiType: 'http'
  }
}

// MCP tools/list操作
resource mcpToolsListOperation 'Microsoft.ApiManagement/service/apis/operations@2024-06-01-preview' = {
  parent: api
  name: 'mcp-tools-list'
  properties: {
    displayName: 'MCP工具列表'
    method: 'POST'
    urlTemplate: '/mcp/tools/list'
    description: '列出可用的MCP工具'
  }
}

// MCP tools/call操作
resource mcpToolsCallOperation 'Microsoft.ApiManagement/service/apis/operations@2024-06-01-preview' = {
  parent: api
  name: 'mcp-tools-call'
  properties: {
    displayName: '调用MCP工具'
    method: 'POST'
    urlTemplate: '/mcp/tools/call'
    description: '调用MCP工具'
  }
}
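The template above can be deployed with the Azure CLI. A minimal sketch, assuming the template is saved as `mcp-api.bicep`; the file name, resource group, and all parameter values below are placeholders for your environment, not part of the original template:

```shell
# Sketch: deploy the MCP-enabled API template. All names/values here are
# placeholders; substitute your own before running against Azure.
cat > mcp-api.params.json <<'EOF'
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "apimServiceName": { "value": "my-apim" },
    "apiId": { "value": "mcp-api" },
    "apiDisplayName": { "value": "MCP API" },
    "apiPath": { "value": "mcp-api" },
    "backendUrl": { "value": "https://backend.example.com" }
  }
}
EOF

RESOURCE_GROUP="my-ai-gateway-rg"   # placeholder: your AI Gateway resource group
if command -v az >/dev/null 2>&1; then
  az deployment group create \
    --resource-group "$RESOURCE_GROUP" \
    --template-file mcp-api.bicep \
    --parameters @mcp-api.params.json \
    || echo "deployment failed (requires az login and an existing resource group)"
else
  echo "az CLI not found; parameters written to mcp-api.params.json"
fi
```

Keeping parameters in a file makes the same template reusable across environments (dev/prod) by swapping the parameters file only.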

Step 8: Test MCP Endpoint

步骤8:测试MCP端点

bash
# Get APIM gateway URL
GATEWAY_URL=$(az apim show \
  --name <apim-service-name> \
  --resource-group <apim-resource-group> \
  --query "gatewayUrl" -o tsv)

# Test MCP tools/list endpoint
curl -X POST "${GATEWAY_URL}/<api-path>/mcp/tools/list" \
  -H "Content-Type: application/json" \
  -H "Ocp-Apim-Subscription-Key: <subscription-key>" \
  -d '{}'
bash
# 获取APIM网关URL
GATEWAY_URL=$(az apim show \
  --name <apim-service-name> \
  --resource-group <apim-resource-group> \
  --query "gatewayUrl" -o tsv)

# 测试MCP tools/list端点
curl -X POST "${GATEWAY_URL}/<api-path>/mcp/tools/list" \
  -H "Content-Type: application/json" \
  -H "Ocp-Apim-Subscription-Key: <订阅密钥>" \
  -d '{}'
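Once tools/list responds, the natural follow-up is tools/call. The request body below is illustrative only: the `get_weather` tool name and its arguments are assumptions, and the commented curl line requires a live gateway and subscription key.

```shell
# Illustrative tools/call request body. Tool name and arguments are
# assumptions; substitute a tool returned by your own tools/list response.
cat > call.json <<'EOF'
{
  "name": "get_weather",
  "arguments": { "location": "Seattle" }
}
EOF

# Against a live gateway (requires GATEWAY_URL and a subscription key):
# curl -X POST "${GATEWAY_URL}/<api-path>/mcp/tools/call" \
#   -H "Content-Type: application/json" \
#   -H "Ocp-Apim-Subscription-Key: <subscription-key>" \
#   -d @call.json

# Validate the JSON locally before sending it
python3 -m json.tool call.json
```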

MCP Tool Definition Schema

MCP工具定义Schema

When converting API operations to MCP tools, use this schema:
json
{
  "tools": [
    {
      "name": "get_weather",
      "description": "Get current weather for a location",
      "inputSchema": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "City name or coordinates"
          }
        },
        "required": ["location"]
      }
    }
  ]
}
将API操作转换为MCP工具时,请使用以下schema:
json
{
  "tools": [
    {
      "name": "get_weather",
      "description": "获取指定地点的当前天气",
      "inputSchema": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "城市名称或坐标"
          }
        },
        "required": ["location"]
      }
    }
  ]
}
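A quick local sanity check for tool definitions of this shape can catch broken schemas before they reach the gateway. This is a sketch; the `tools.json` file name and the specific checks are assumptions, not requirements of MCP itself.

```shell
# Check that each tool has a name, a description, and an object-typed
# inputSchema whose "required" fields actually exist in "properties".
cat > tools.json <<'EOF'
{
  "tools": [
    {
      "name": "get_weather",
      "description": "Get current weather for a location",
      "inputSchema": {
        "type": "object",
        "properties": {
          "location": { "type": "string", "description": "City name or coordinates" }
        },
        "required": ["location"]
      }
    }
  ]
}
EOF

python3 - <<'EOF'
import json

with open("tools.json") as f:
    tools = json.load(f)["tools"]

for tool in tools:
    assert tool["name"] and tool["description"], "name and description are required"
    schema = tool["inputSchema"]
    assert schema["type"] == "object", "inputSchema must be an object schema"
    missing = set(schema.get("required", [])) - set(schema.get("properties", {}))
    assert not missing, f"required fields missing from properties: {missing}"

print(f"{len(tools)} tool definition(s) valid")
EOF
```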

Reference

参考文档

Lab References (AI-Gateway Repo)

实验室参考(AI-Gateway仓库)

Essential Labs to Get Started:

| Scenario | Lab | Description |
| --- | --- | --- |
| Semantic Caching | semantic-caching | Cache similar prompts to reduce costs |
| Token Rate Limiting | token-rate-limiting | Limit tokens per minute |
| Content Safety | content-safety | Filter harmful content |
| Load Balancing | backend-pool-load-balancing | Distribute load across backends |
| MCP from API | mcp-from-api | Convert OpenAPI to MCP server |
| Zero to Production | zero-to-production | Complete production setup guide |

入门必备实验室:

| 场景 | 实验室 | 描述 |
| --- | --- | --- |
| 语义缓存 | semantic-caching | 缓存相似提示词以降低成本 |
| 令牌速率限制 | token-rate-limiting | 限制每分钟的令牌使用量 |
| 内容安全 | content-safety | 过滤有害内容 |
| 负载均衡 | backend-pool-load-balancing | 在多个后端之间分配负载 |
| 从API创建MCP | mcp-from-api | 将OpenAPI转换为MCP服务器 |
| 从0到生产 | zero-to-production | 完整的生产环境搭建指南 |

Quick Start Checklist

快速开始检查清单

Prerequisites

前提条件

  • Azure subscription created
  • Azure CLI installed and authenticated (
    az login
    )
  • Resource group created for AI Gateway resources
  • 已创建Azure订阅
  • 已安装Azure CLI并完成身份验证 (
    az login
    )
  • 已为AI网关资源创建资源组

Deployment

部署

  • Deploy APIM with Basicv2 SKU
  • Configure managed identity
  • Add backend for Azure OpenAI or AI Foundry
  • Apply policies (caching, rate limits, content safety)
  • 部署Basicv2 SKU的APIM实例
  • 配置托管身份
  • 为Azure OpenAI或AI Foundry添加后端
  • 应用策略(缓存、速率限制、内容安全)

Verification

验证

  • Test API endpoint through gateway
  • Verify token metrics in Application Insights
  • Check rate limiting headers in response
  • Validate content safety filtering
  • 通过网关测试API端点
  • 在Application Insights中验证令牌指标
  • 检查响应中的速率限制头
  • 验证内容安全过滤

Best Practices

最佳实践

| Practice | Description |
| --- | --- |
| Default to Basicv2 | Use Basicv2 SKU for cost/speed optimization |
| Use managed identity | Prefer managed identity over API keys for backend auth |
| Enable token metrics | Use azure-openai-emit-token-metric for cost tracking |
| Semantic caching | Cache similar prompts to reduce costs (60-80% savings possible) |
| Rate limit by key | Use subscription ID or IP for granular rate limiting |
| Content safety | Enable shield-prompt to detect jailbreak attempts |

| 实践 | 描述 |
| --- | --- |
| 默认使用Basicv2 | 使用Basicv2 SKU优化成本和速度 |
| 使用托管身份 | 优先使用托管身份而非API密钥进行后端认证 |
| 启用令牌指标 | 使用azure-openai-emit-token-metric进行成本跟踪 |
| 语义缓存 | 缓存相似提示词以降低成本(可节省60-80%) |
| 按密钥限制速率 | 使用订阅ID或IP实现细粒度速率限制 |
| 内容安全 | 启用shield-prompt以检测越狱攻击尝试 |
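For the token-metrics practice above, the inbound policy fragment looks roughly like this. This is a sketch based on APIM's documented azure-openai-emit-token-metric policy; the namespace value and the choice of dimensions are assumptions, so adjust them to match your monitoring setup.

```xml
<!-- Inbound policy: emit prompt/completion/total token counts as metrics,
     split by subscription and API for per-consumer cost tracking. -->
<azure-openai-emit-token-metric namespace="openai">
    <dimension name="Subscription ID" />
    <dimension name="API ID" />
</azure-openai-emit-token-metric>
```

The emitted metrics land in Application Insights, which is what the "Verify token metrics" step in the checklist above inspects.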

Troubleshooting

故障排除

| Issue | Symptom | Solution |
| --- | --- | --- |
| Slow APIM creation | Deployment takes 30+ minutes | Use Basicv2 SKU instead of Premium |
| Token limit exceeded | 429 response | Increase tokens-per-minute or add load balancing |
| Cache not working | No cache hits | Lower score-threshold (e.g., 0.7) |
| Content blocked | False positives | Increase category thresholds |
| Backend auth fails | 401 from Azure OpenAI | Assign Cognitive Services User role to APIM managed identity |
| Rate limit too strict | Legitimate requests blocked | Increase calls or renewal-period |

| 问题 | 症状 | 解决方案 |
| --- | --- | --- |
| APIM创建缓慢 | 部署耗时30分钟以上 | 使用Basicv2 SKU替代Premium |
| 令牌限制超出 | 429响应 | 提高tokens-per-minute或添加负载均衡 |
| 缓存不生效 | 无缓存命中 | 降低score-threshold(例如0.7) |
| 内容被误拦截 | 误报 | 提高分类阈值 |
| 后端认证失败 | Azure OpenAI返回401 | 为APIM托管身份分配认知服务用户角色 |
| 速率限制过严 | 合法请求被拦截 | 提高calls或renewal-period |

Additional Resources

额外资源