databricks-model-serving


Model Serving Endpoints

FIRST: Use the parent `databricks-core` skill for CLI basics, authentication, and profile selection.

Model Serving provides managed endpoints for serving LLMs, custom ML models, and external models as scalable REST APIs. Endpoints are identified by name (unique per workspace).

Endpoint Types

| Type | When to Use | Key Detail |
|---|---|---|
| Pay-per-token | Foundation Model APIs (Llama, DBRX, etc.) | Uses `system.ai.*` catalog models, simplest setup |
| Provisioned throughput | Dedicated GPU capacity | Guaranteed throughput, higher cost |
| Custom model | Your own MLflow models or containers | Deploy any model with an MLflow signature |

Endpoint Structure

```
Serving Endpoint (top-level, identified by NAME)
  ├── Config
  │     ├── Served Entities (model references + scaling config)
  │     └── Traffic Config (routing percentages across entities)
  ├── AI Gateway (rate limits, usage tracking)
  └── State (READY / NOT_READY, config_update status)
```

• Served Entities: Each entity references a model (from Unity Catalog or MLflow) with scaling parameters. Get the entity name from `served_entities[].name` in the `get` output; it is needed for the `build-logs` and `logs` commands.
• Traffic Config: Routes requests across served entities by percentage (for A/B testing, canary deployments).
• State: Endpoints transition from `NOT_READY` to `READY` after creation or a config update. Poll via `get` to check `state.ready`.
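The structure above maps directly onto the JSON returned by `get`. As a minimal illustration, this sketch pulls out `state.ready` and the served-entity names; the sample payload is hypothetical but shaped like the fields named above.

```python
import json

def endpoint_summary(get_output: str) -> dict:
    """Summarize a `serving-endpoints get` JSON payload: readiness plus
    the served-entity names needed by the build-logs/logs commands."""
    ep = json.loads(get_output)
    return {
        "ready": ep.get("state", {}).get("ready") == "READY",
        "entities": [e["name"] for e in ep.get("config", {}).get("served_entities", [])],
    }

# Illustrative sample shaped like the structure diagram above (not real output)
sample = json.dumps({
    "name": "my-endpoint",
    "state": {"ready": "READY", "config_update": "NOT_UPDATING"},
    "config": {
        "served_entities": [{"name": "my_model-1"}],
        "traffic_config": {"routes": [{"served_entity_name": "my_model-1",
                                       "traffic_percentage": 100}]},
    },
})

print(endpoint_summary(sample))  # {'ready': True, 'entities': ['my_model-1']}
```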

CLI Discovery — ALWAYS Do This First

Do NOT guess command syntax. Discover available commands and their usage dynamically:

```bash
# List all serving-endpoints subcommands
databricks serving-endpoints -h

# Get detailed usage for any subcommand (flags, args, JSON fields)
databricks serving-endpoints <subcommand> -h
```

Run `databricks serving-endpoints -h` before constructing any command. Run `databricks serving-endpoints <subcommand> -h` to discover exact flags, positional arguments, and JSON spec fields for that subcommand.

Create an Endpoint

Do NOT list endpoints before creating.

```bash
databricks serving-endpoints create <ENDPOINT_NAME> \
  --json '{
    "served_entities": [{
      "entity_name": "<MODEL_CATALOG_PATH>",
      "entity_version": "<VERSION>",
      "min_provisioned_throughput": 0,
      "max_provisioned_throughput": 0,
      "workload_size": "Small"
    }],
    "traffic_config": {
      "routes": [{
        "served_entity_name": "<ENTITY_NAME>",
        "traffic_percentage": 100
      }]
    }
  }' --profile <PROFILE>
```

• Discover available Foundation Models: check the `system.ai` catalog in Unity Catalog.
• Long-running operation; the CLI waits for completion by default. Use `--no-wait` to return immediately, then poll:

```bash
databricks serving-endpoints get <ENDPOINT_NAME> --profile <PROFILE>
# Check: state.ready == "READY"
```

• For provisioned throughput or custom model endpoints, run `databricks serving-endpoints create -h` to discover the required JSON fields for your endpoint type.
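The `--json` spec above can be assembled and sanity-checked programmatically before shelling out to the CLI. A minimal sketch, with placeholder model and entity names; the throughput fields are omitted here since they vary by endpoint type (see `create -h`):

```python
import json

def build_create_spec(entity_name, entity_version, served_name, workload_size="Small"):
    """Assemble the --json spec for a single-entity endpoint and verify
    that traffic percentages sum to 100 before calling `create`."""
    spec = {
        "served_entities": [{
            "entity_name": entity_name,      # e.g. a Unity Catalog model path
            "entity_version": entity_version,
            "workload_size": workload_size,
        }],
        "traffic_config": {
            "routes": [{"served_entity_name": served_name, "traffic_percentage": 100}],
        },
    }
    total = sum(r["traffic_percentage"] for r in spec["traffic_config"]["routes"])
    if total != 100:
        raise ValueError(f"traffic percentages must sum to 100, got {total}")
    return json.dumps(spec)

# Placeholder names filling the <MODEL_CATALOG_PATH>/<ENTITY_NAME> slots above
spec_json = build_create_spec("main.models.my_model", "1", "my_model-1")
```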

Query an Endpoint

```bash
databricks serving-endpoints query <ENDPOINT_NAME> \
  --json '{"messages": [{"role": "user", "content": "Hello, how are you?"}]}' \
  --profile <PROFILE>
```

• Use `--stream` for streaming responses.
• For non-chat endpoints (embeddings, custom models): use `get-open-api <ENDPOINT_NAME>` first to discover the request/response schema, then construct the appropriate JSON payload.
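Chat payloads like the one above are easy to build programmatically. A small sketch; the optional `stream` body field is an assumption about the raw REST equivalent of the CLI's `--stream` flag, not something confirmed here:

```python
import json

def chat_payload(user_content, system=None, stream=False):
    """Build the JSON body passed via --json for chat-style endpoints."""
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": user_content})
    body = {"messages": messages}
    if stream:
        body["stream"] = True  # assumption: REST analogue of the --stream flag
    return json.dumps(body)

payload = chat_payload("Hello, how are you?", system="Be concise.")
```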

Get Endpoint Schema (OpenAPI)

Returns the OpenAPI 3.1 JSON schema describing what each served model accepts and returns. Use this to understand an endpoint's input/output format before querying it.

```bash
databricks serving-endpoints get-open-api <ENDPOINT_NAME> --profile <PROFILE>
```

The schema shows paths per served model (e.g., `/served-models/<model-name>/invocations`) with full request/response definitions including parameter types, enums, and nullable fields.
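Once fetched, that schema can be mined for the per-model paths and the request fields each accepts. A sketch over a hand-built OpenAPI 3.1 fragment (the fragment is illustrative, not a real `get-open-api` response):

```python
import json

def invocation_paths(openapi_doc: dict) -> dict:
    """Map each served-model invocation path to the sorted list of
    request-body fields its JSON schema declares."""
    out = {}
    for path, ops in openapi_doc.get("paths", {}).items():
        schema = (ops.get("post", {})
                     .get("requestBody", {})
                     .get("content", {})
                     .get("application/json", {})
                     .get("schema", {}))
        out[path] = sorted(schema.get("properties", {}))
    return out

# Illustrative fragment shaped like an OpenAPI 3.1 document
doc = {
    "openapi": "3.1.0",
    "paths": {
        "/served-models/my_model-1/invocations": {
            "post": {"requestBody": {"content": {"application/json": {
                "schema": {"properties": {"messages": {}, "max_tokens": {}}}
            }}}}
        }
    },
}
print(invocation_paths(doc))
```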

Other Commands

Run `databricks serving-endpoints <subcommand> -h` for usage details.

| Task | Command | Notes |
|---|---|---|
| List all endpoints | `list` | |
| Get endpoint details | `get <NAME>` | Shows state, config, served entities |
| Delete endpoint | `delete <NAME>` | |
| Update served entities or traffic | `update-config <NAME> --json '...'` | Zero-downtime: old config serves until new is ready |
| Rate limits & usage tracking | `put-ai-gateway <NAME> --json '...'` | |
| Update tags | `patch <NAME> --json '...'` | |
| Build logs | `build-logs <NAME> <SERVED_MODEL>` | Get `SERVED_MODEL` from `get` output: `served_entities[].name` |
| Runtime logs | `logs <NAME> <SERVED_MODEL>` | |
| Metrics (Prometheus format) | `export-metrics <NAME>` | |
| Permissions | `get-permissions <ENDPOINT_ID>` | ⚠️ Uses endpoint ID (hex string), not name. Find ID via `get`. |
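`update-config` plus a traffic split is how the canary deployments mentioned earlier are done. A sketch that builds a stable/candidate split; the entity names are placeholders, and a traffic-only update body is an assumption here, so run `update-config -h` for the full field set:

```python
import json

def canary_spec(stable_entity, candidate_entity, candidate_pct):
    """Build an update-config --json body that routes candidate_pct percent
    of traffic to a new served entity and the rest to the stable one."""
    if not 0 < candidate_pct < 100:
        raise ValueError("candidate percentage must be between 1 and 99")
    return json.dumps({
        "traffic_config": {
            "routes": [
                {"served_entity_name": stable_entity,
                 "traffic_percentage": 100 - candidate_pct},
                {"served_entity_name": candidate_entity,
                 "traffic_percentage": candidate_pct},
            ]
        }
    })

# e.g. send 10% of traffic to the new model version
spec = canary_spec("my_model-1", "my_model-2", 10)
```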

What's Next

Integrate with a Databricks App

After creating a serving endpoint, wire it into a Databricks App.

Step 1: Check if the `serving` plugin is available in the AppKit template:

```bash
databricks apps manifest --profile <PROFILE>
```

If the output includes a `serving` plugin, scaffold with:

```bash
databricks apps init --name <APP_NAME> \
  --features serving \
  --set "serving.serving-endpoint.name=<ENDPOINT_NAME>" \
  --run none --profile <PROFILE>
```

Step 2: If there is no `serving` plugin, add the endpoint resource manually to an existing app's `databricks.yml`:

```yaml
resources:
  apps:
    my_app:
      resources:
        - name: my-model-endpoint
          serving_endpoint:
            name: <ENDPOINT_NAME>
            permission: CAN_QUERY
```

And inject the endpoint name as an environment variable in `app.yaml`:

```yaml
env:
  - name: SERVING_ENDPOINT
    valueFrom: serving-endpoint
```

Then add a tRPC route to call it from your app. For the full app integration pattern, use the `databricks-apps` skill and read the Model Serving Guide.
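Inside the app, the injected variable resolves to an endpoint name, which maps onto the serving REST invocation path. A sketch, assuming the standard `/serving-endpoints/<name>/invocations` route; the workspace host and endpoint names are placeholders:

```python
import os

def invocations_url(workspace_host, endpoint_name=None):
    """Build the REST invocation URL for an endpoint, falling back to the
    SERVING_ENDPOINT env var injected via app.yaml above."""
    name = endpoint_name or os.environ.get("SERVING_ENDPOINT")
    if not name:
        raise RuntimeError("no endpoint name given and SERVING_ENDPOINT is unset")
    return f"{workspace_host.rstrip('/')}/serving-endpoints/{name}/invocations"

url = invocations_url("https://my-workspace.cloud.databricks.com", "my-endpoint")
```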

Troubleshooting

| Error | Solution |
|---|---|
| `cannot configure default credentials` | Use the `--profile` flag or authenticate first |
| `PERMISSION_DENIED` | Check workspace permissions; for apps, ensure the `serving_endpoint` resource is declared with `CAN_QUERY` |
| Endpoint stuck in `NOT_READY` | Check `build-logs` for the served model (get the entity name from `get` output) |
| `RESOURCE_DOES_NOT_EXIST` | Verify the endpoint name with `list` |
| Query returns 404 | Endpoint may still be provisioning; check `state.ready` via `get` |
| `RATE_LIMIT_EXCEEDED` (429) | AI Gateway rate limit; check `put-ai-gateway` config or retry after backoff |
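For the 429 case, "retry after backoff" can be as simple as a capped exponential schedule. A sketch; the retry count, base, and cap are illustrative defaults, not Databricks-mandated values:

```python
import random

def backoff_delays(retries=5, base=1.0, cap=30.0, jitter=False):
    """Return a capped exponential backoff schedule (seconds) for
    retrying RATE_LIMIT_EXCEEDED responses."""
    delays = []
    for attempt in range(retries):
        d = min(cap, base * (2 ** attempt))
        if jitter:
            d = random.uniform(0, d)  # full jitter to avoid synchronized retries
        delays.append(d)
    return delays

print(backoff_delays())  # [1.0, 2.0, 4.0, 8.0, 16.0]
```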