databricks-model-serving


Model Serving Endpoints

FIRST: Use the parent `databricks-core` skill for CLI basics, authentication, and profile selection.

Model Serving provides managed endpoints for serving LLMs, custom ML models, and external models as scalable REST APIs. Endpoints are identified by name (unique per workspace).

Endpoint Types

| Type | When to Use | Key Detail |
|---|---|---|
| Pay-per-token | Foundation Model APIs (Llama, DBRX, etc.) | Uses `system.ai.*` catalog models, simplest setup |
| Provisioned throughput | Dedicated GPU capacity | Guaranteed throughput, higher cost |
| Custom model | Your own MLflow models or containers | Deploy any model with an MLflow signature |

Endpoint Structure

```
Serving Endpoint (top-level, identified by NAME)
  ├── Config
  │     ├── Served Entities (model references + scaling config)
  │     └── Traffic Config (routing percentages across entities)
  ├── AI Gateway (rate limits, usage tracking)
  └── State (READY / NOT_READY, config_update status)
```

• Served Entities: Each entity references a model (from Unity Catalog or MLflow) with scaling parameters. Get the entity name from `served_entities[].name` in the `get` output; it is needed for the `build-logs` and `logs` commands.
• Traffic Config: Routes requests across served entities by percentage (for A/B testing, canary deployments).
• State: Endpoints transition from `NOT_READY` to `READY` after creation or a config update. Poll via `get` to check `state.ready`.
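The structure above maps directly onto the JSON returned by `get`. As a minimal illustration, this sketch pulls out `state.ready` and the served-entity names; the sample payload is hypothetical but shaped like the fields named above.

```python
import json

def endpoint_summary(get_output: str) -> dict:
    """Summarize a `serving-endpoints get` JSON payload: readiness plus
    the served-entity names needed by the build-logs/logs commands."""
    ep = json.loads(get_output)
    return {
        "ready": ep.get("state", {}).get("ready") == "READY",
        "entities": [e["name"] for e in ep.get("config", {}).get("served_entities", [])],
    }

# Illustrative sample shaped like the structure diagram above (not real output)
sample = json.dumps({
    "name": "my-endpoint",
    "state": {"ready": "READY", "config_update": "NOT_UPDATING"},
    "config": {
        "served_entities": [{"name": "my_model-1"}],
        "traffic_config": {"routes": [{"served_entity_name": "my_model-1",
                                       "traffic_percentage": 100}]},
    },
})

print(endpoint_summary(sample))  # {'ready': True, 'entities': ['my_model-1']}
```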

CLI Discovery — ALWAYS Do This First

Do NOT guess command syntax. Discover available commands and their usage dynamically:

```bash
# List all serving-endpoints subcommands
databricks serving-endpoints -h

# Get detailed usage for any subcommand (flags, args, JSON fields)
databricks serving-endpoints <subcommand> -h
```

Run `databricks serving-endpoints -h` before constructing any command. Run `databricks serving-endpoints <subcommand> -h` to discover exact flags, positional arguments, and JSON spec fields for that subcommand.

Create an Endpoint

Do NOT list endpoints before creating.

```bash
databricks serving-endpoints create <ENDPOINT_NAME> \
  --json '{
    "served_entities": [{
      "entity_name": "<MODEL_CATALOG_PATH>",
      "entity_version": "<VERSION>",
      "min_provisioned_throughput": 0,
      "max_provisioned_throughput": 0,
      "workload_size": "Small"
    }],
    "traffic_config": {
      "routes": [{
        "served_entity_name": "<ENTITY_NAME>",
        "traffic_percentage": 100
      }]
    }
  }' --profile <PROFILE>
```

• Discover available Foundation Models: check the `system.ai` catalog in Unity Catalog.
• Long-running operation; the CLI waits for completion by default. Use `--no-wait` to return immediately, then poll:

```bash
databricks serving-endpoints get <ENDPOINT_NAME> --profile <PROFILE>
# Check: state.ready == "READY"
```

• For provisioned throughput or custom model endpoints, run `databricks serving-endpoints create -h` to discover the required JSON fields for your endpoint type.
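The `--json` spec above can be assembled and sanity-checked programmatically before shelling out to the CLI. A minimal sketch, with placeholder model and entity names; the throughput fields are omitted here since they vary by endpoint type (see `create -h`):

```python
import json

def build_create_spec(entity_name, entity_version, served_name, workload_size="Small"):
    """Assemble the --json spec for a single-entity endpoint and verify
    that traffic percentages sum to 100 before calling `create`."""
    spec = {
        "served_entities": [{
            "entity_name": entity_name,      # e.g. a Unity Catalog model path
            "entity_version": entity_version,
            "workload_size": workload_size,
        }],
        "traffic_config": {
            "routes": [{"served_entity_name": served_name, "traffic_percentage": 100}],
        },
    }
    total = sum(r["traffic_percentage"] for r in spec["traffic_config"]["routes"])
    if total != 100:
        raise ValueError(f"traffic percentages must sum to 100, got {total}")
    return json.dumps(spec)

# Placeholder names filling the <MODEL_CATALOG_PATH>/<ENTITY_NAME> slots above
spec_json = build_create_spec("main.models.my_model", "1", "my_model-1")
```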

Query an Endpoint

```bash
databricks serving-endpoints query <ENDPOINT_NAME> \
  --json '{"messages": [{"role": "user", "content": "Hello, how are you?"}]}' \
  --profile <PROFILE>
```

• Use `--stream` for streaming responses.
• For non-chat endpoints (embeddings, custom models): use `get-open-api <ENDPOINT_NAME>` first to discover the request/response schema, then construct the appropriate JSON payload.
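Chat payloads like the one above are easy to build programmatically. A small sketch; the optional `stream` body field is an assumption about the raw REST equivalent of the CLI's `--stream` flag, not something confirmed here:

```python
import json

def chat_payload(user_content, system=None, stream=False):
    """Build the JSON body passed via --json for chat-style endpoints."""
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": user_content})
    body = {"messages": messages}
    if stream:
        body["stream"] = True  # assumption: REST analogue of the --stream flag
    return json.dumps(body)

payload = chat_payload("Hello, how are you?", system="Be concise.")
```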

Get Endpoint Schema (OpenAPI)

Returns the OpenAPI 3.1 JSON schema describing what each served model accepts and returns. Use this to understand an endpoint's input/output format before querying it.

```bash
databricks serving-endpoints get-open-api <ENDPOINT_NAME> --profile <PROFILE>
```

The schema shows paths per served model (e.g., `/served-models/<model-name>/invocations`) with full request/response definitions including parameter types, enums, and nullable fields.
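Once fetched, that schema can be mined for the per-model paths and the request fields each accepts. A sketch over a hand-built OpenAPI 3.1 fragment (the fragment is illustrative, not a real `get-open-api` response):

```python
import json

def invocation_paths(openapi_doc: dict) -> dict:
    """Map each served-model invocation path to the sorted list of
    request-body fields its JSON schema declares."""
    out = {}
    for path, ops in openapi_doc.get("paths", {}).items():
        schema = (ops.get("post", {})
                     .get("requestBody", {})
                     .get("content", {})
                     .get("application/json", {})
                     .get("schema", {}))
        out[path] = sorted(schema.get("properties", {}))
    return out

# Illustrative fragment shaped like an OpenAPI 3.1 document
doc = {
    "openapi": "3.1.0",
    "paths": {
        "/served-models/my_model-1/invocations": {
            "post": {"requestBody": {"content": {"application/json": {
                "schema": {"properties": {"messages": {}, "max_tokens": {}}}
            }}}}
        }
    },
}
print(invocation_paths(doc))
```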

Other Commands

Run `databricks serving-endpoints <subcommand> -h` for usage details.

| Task | Command | Notes |
|---|---|---|
| List all endpoints | `list` | |
| Get endpoint details | `get <NAME>` | Shows state, config, served entities |
| Delete endpoint | `delete <NAME>` | |
| Update served entities or traffic | `update-config <NAME> --json '...'` | Zero-downtime: old config serves until new is ready |
| Rate limits & usage tracking | `put-ai-gateway <NAME> --json '...'` | |
| Update tags | `patch <NAME> --json '...'` | |
| Build logs | `build-logs <NAME> <SERVED_MODEL>` | Get `SERVED_MODEL` from `get` output: `served_entities[].name` |
| Runtime logs | `logs <NAME> <SERVED_MODEL>` | |
| Metrics (Prometheus format) | `export-metrics <NAME>` | |
| Permissions | `get-permissions <ENDPOINT_ID>` | ⚠️ Uses endpoint ID (hex string), not name. Find ID via `get`. |
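`update-config` plus a traffic split is how the canary deployments mentioned earlier are done. A sketch that builds a stable/candidate split; the entity names are placeholders, and a traffic-only update body is an assumption here, so run `update-config -h` for the full field set:

```python
import json

def canary_spec(stable_entity, candidate_entity, candidate_pct):
    """Build an update-config --json body that routes candidate_pct percent
    of traffic to a new served entity and the rest to the stable one."""
    if not 0 < candidate_pct < 100:
        raise ValueError("candidate percentage must be between 1 and 99")
    return json.dumps({
        "traffic_config": {
            "routes": [
                {"served_entity_name": stable_entity,
                 "traffic_percentage": 100 - candidate_pct},
                {"served_entity_name": candidate_entity,
                 "traffic_percentage": candidate_pct},
            ]
        }
    })

# e.g. send 10% of traffic to the new model version
spec = canary_spec("my_model-1", "my_model-2", 10)
```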

What's Next

Integrate with a Databricks App

After creating a serving endpoint, wire it into a Databricks App.

Step 1: Check if the `serving` plugin is available in the AppKit template:

```bash
databricks apps manifest --profile <PROFILE>
```

If the output includes a `serving` plugin, scaffold with:

```bash
databricks apps init --name <APP_NAME> \
  --features serving \
  --set "serving.serving-endpoint.name=<ENDPOINT_NAME>" \
  --run none --profile <PROFILE>
```

Step 2: If there is no `serving` plugin, add the endpoint resource manually to an existing app's `databricks.yml`:

```yaml
resources:
  apps:
    my_app:
      resources:
        - name: my-model-endpoint
          serving_endpoint:
            name: <ENDPOINT_NAME>
            permission: CAN_QUERY
```

And inject the endpoint name as an environment variable in `app.yaml`:

```yaml
env:
  - name: SERVING_ENDPOINT
    valueFrom: serving-endpoint
```

Then add a tRPC route to call it from your app. For the full app integration pattern, use the `databricks-apps` skill and read the Model Serving Guide.
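Inside the app, the injected variable resolves to an endpoint name, which maps onto the serving REST invocation path. A sketch, assuming the standard `/serving-endpoints/<name>/invocations` route; the workspace host and endpoint names are placeholders:

```python
import os

def invocations_url(workspace_host, endpoint_name=None):
    """Build the REST invocation URL for an endpoint, falling back to the
    SERVING_ENDPOINT env var injected via app.yaml above."""
    name = endpoint_name or os.environ.get("SERVING_ENDPOINT")
    if not name:
        raise RuntimeError("no endpoint name given and SERVING_ENDPOINT is unset")
    return f"{workspace_host.rstrip('/')}/serving-endpoints/{name}/invocations"

url = invocations_url("https://my-workspace.cloud.databricks.com", "my-endpoint")
```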

Troubleshooting

| Error | Solution |
|---|---|
| `cannot configure default credentials` | Use the `--profile` flag or authenticate first |
| `PERMISSION_DENIED` | Check workspace permissions; for apps, ensure the `serving_endpoint` resource is declared with `CAN_QUERY` |
| Endpoint stuck in `NOT_READY` | Check `build-logs` for the served model (get the entity name from `get` output) |
| `RESOURCE_DOES_NOT_EXIST` | Verify the endpoint name with `list` |
| Query returns 404 | Endpoint may still be provisioning; check `state.ready` via `get` |
| `RATE_LIMIT_EXCEEDED` (429) | AI Gateway rate limit; check `put-ai-gateway` config or retry after backoff |
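For the 429 case, "retry after backoff" can be as simple as a capped exponential schedule. A sketch; the retry count, base, and cap are illustrative defaults, not Databricks-mandated values:

```python
import random

def backoff_delays(retries=5, base=1.0, cap=30.0, jitter=False):
    """Return a capped exponential backoff schedule (seconds) for
    retrying RATE_LIMIT_EXCEEDED responses."""
    delays = []
    for attempt in range(retries):
        d = min(cap, base * (2 ** attempt))
        if jitter:
            d = random.uniform(0, d)  # full jitter to avoid synchronized retries
        delays.append(d)
    return delays

print(backoff_delays())  # [1.0, 2.0, 4.0, 8.0, 16.0]
```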