databricks-model-serving
Model Serving Endpoints
FIRST: Use the parent skill, `databricks-core`, for CLI basics, authentication, and profile selection.

Model Serving provides managed endpoints for serving LLMs, custom ML models, and external models as scalable REST APIs. Endpoints are identified by name (unique per workspace).
Endpoint Types
| Type | When to Use | Key Detail |
|---|---|---|
| Pay-per-token | Foundation Model APIs (Llama, DBRX, etc.) | Pre-provisioned per workspace; billed per token |
| Provisioned throughput | Dedicated GPU capacity | Guaranteed throughput, higher cost |
| Custom model | Your own MLflow models or containers | Deploy any model with an MLflow signature |
Endpoint Structure
```
Serving Endpoint (top-level, identified by NAME)
├── Config
│   ├── Served Entities (model references + scaling config)
│   └── Traffic Config (routing percentages across entities)
├── AI Gateway (rate limits, usage tracking)
└── State (READY / NOT_READY, config_update status)
```

- Served Entities: Each entity references a model (from Unity Catalog or MLflow) with scaling parameters. Get the entity name from `served_entities[].name` in the `get` output — needed for the `logs` and `build-logs` commands.
- Traffic Config: Routes requests across served entities by percentage (for A/B testing, canary deployments).
- State: Endpoints transition `NOT_READY` → `READY` after creation or config update. Poll via `get` to check `state.ready`.
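The fields called out above can be pulled straight from the `get` JSON output. A minimal sketch, assuming a hypothetical, heavily trimmed response shape (real responses carry many more keys):

```python
import json

# Hypothetical, trimmed `databricks serving-endpoints get <NAME>` response,
# limited to the fields discussed above.
raw = '''
{
  "name": "my-endpoint",
  "state": {"ready": "READY", "config_update": "NOT_UPDATING"},
  "config": {
    "served_entities": [{"name": "my_model-1"}],
    "traffic_config": {
      "routes": [{"served_entity_name": "my_model-1", "traffic_percentage": 100}]
    }
  }
}
'''
endpoint = json.loads(raw)

# Entity names feed the logs / build-logs commands.
entity_names = [e["name"] for e in endpoint["config"]["served_entities"]]
is_ready = endpoint["state"]["ready"] == "READY"
print(entity_names, is_ready)
```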
CLI Discovery — ALWAYS Do This First
Do NOT guess command syntax. Discover available commands and their usage dynamically:

```bash
# List all serving-endpoints subcommands
databricks serving-endpoints -h

# Get detailed usage for any subcommand (flags, args, JSON fields)
databricks serving-endpoints <subcommand> -h
```

Run `databricks serving-endpoints -h` before constructing any command. Run `databricks serving-endpoints <subcommand> -h` to discover exact flags, positional arguments, and JSON spec fields for that subcommand.
Create an Endpoint
Do NOT list endpoints before creating.

```bash
databricks serving-endpoints create <ENDPOINT_NAME> \
  --json '{
    "served_entities": [{
      "entity_name": "<MODEL_CATALOG_PATH>",
      "entity_version": "<VERSION>",
      "min_provisioned_throughput": 0,
      "max_provisioned_throughput": 0,
      "workload_size": "Small"
    }],
    "traffic_config": {
      "routes": [{
        "served_entity_name": "<ENTITY_NAME>",
        "traffic_percentage": 100
      }]
    }
  }' --profile <PROFILE>
```

- Discover available Foundation Models: check the `system.ai` catalog in Unity Catalog.
- Long-running operation; the CLI waits for completion by default. Use `--no-wait` to return immediately, then poll:

  ```bash
  databricks serving-endpoints get <ENDPOINT_NAME> --profile <PROFILE>
  # Check: state.ready == "READY"
  ```

- For provisioned throughput or custom model endpoints, run `databricks serving-endpoints create -h` to discover the required JSON fields for your endpoint type.
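The `--json` spec above is plain JSON, so it can be assembled and sanity-checked before shelling out to the CLI. A sketch with a hypothetical helper; the model path, version, and entity name below are placeholders, not real catalog entries:

```python
import json

def build_create_spec(entity_name, version, served_name, workload_size="Small"):
    """Assemble the create --json spec shown above. A sketch, not an official helper."""
    spec = {
        "served_entities": [{
            "entity_name": entity_name,
            "entity_version": version,
            "workload_size": workload_size,
        }],
        "traffic_config": {
            "routes": [{"served_entity_name": served_name, "traffic_percentage": 100}],
        },
    }
    # Traffic percentages across all routes must total 100.
    total = sum(r["traffic_percentage"] for r in spec["traffic_config"]["routes"])
    assert total == 100, f"routes sum to {total}, expected 100"
    return json.dumps(spec)

print(build_create_spec("<MODEL_CATALOG_PATH>", "1", "<ENTITY_NAME>"))
```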
Query an Endpoint
```bash
databricks serving-endpoints query <ENDPOINT_NAME> \
  --json '{"messages": [{"role": "user", "content": "Hello, how are you?"}]}' \
  --profile <PROFILE>
```

- Use `--stream` for streaming responses.
- For non-chat endpoints (embeddings, custom models): use `get-open-api <ENDPOINT_NAME>` first to discover the request/response schema, then construct the appropriate JSON payload.
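Payload construction differs by endpoint type. A sketch: the chat shape matches the query example above, while the embeddings shape is an assumption that should be confirmed against `get-open-api` output first:

```python
import json

def chat_payload(prompt):
    # Chat-style endpoints take an OpenAI-compatible "messages" array.
    return json.dumps({"messages": [{"role": "user", "content": prompt}]})

def embeddings_payload(texts):
    # Assumed shape for embedding endpoints; verify via get-open-api.
    return json.dumps({"input": texts})

print(chat_payload("Hello, how are you?"))
print(embeddings_payload(["first document", "second document"]))
```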
Get Endpoint Schema (OpenAPI)
Returns the OpenAPI 3.1 JSON schema describing what each served model accepts and returns. Use this to understand an endpoint's input/output format before querying it.

```bash
databricks serving-endpoints get-open-api <ENDPOINT_NAME> --profile <PROFILE>
```

The schema shows paths per served model (e.g., `/served-models/<model-name>/invocations`) with full request/response definitions including parameter types, enums, and nullable fields.
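Once fetched, the schema is ordinary OpenAPI JSON and can be walked programmatically. A sketch against a hypothetical, heavily trimmed response:

```python
import json

# Hypothetical, trimmed get-open-api output for one served model.
schema = json.loads('''
{
  "openapi": "3.1.0",
  "paths": {
    "/served-models/my_model-1/invocations": {
      "post": {"requestBody": {"content": {"application/json": {}}}}
    }
  }
}
''')

# List each served model's invocation path and its HTTP methods.
for path, ops in schema["paths"].items():
    print(path, sorted(ops))
```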
Other Commands
Run `databricks serving-endpoints <subcommand> -h` for usage details.

| Task | Command | Notes |
|---|---|---|
| List all endpoints | `list` | |
| Get endpoint details | `get <NAME>` | Shows state, config, served entities |
| Delete endpoint | `delete <NAME>` | |
| Update served entities or traffic | `update-config <NAME>` | Zero-downtime: old config serves until new is ready |
| Rate limits & usage tracking | `put-ai-gateway <NAME>` | |
| Update tags | `patch <NAME>` | |
| Build logs | `build-logs <NAME> <SERVED_MODEL_NAME>` | Get the served model name from `get` output |
| Runtime logs | `logs <NAME> <SERVED_MODEL_NAME>` | |
| Metrics (Prometheus format) | `export-metrics <NAME>` | |
| Permissions | `get-permissions` / `set-permissions` | ⚠️ Uses endpoint ID (hex string), not name. Find ID via `get` |
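For the canary deployments mentioned earlier, the traffic spec passed to `update-config` can be generated the same way as at create time. A sketch; the entity names are placeholders:

```python
import json

def canary_routes(stable_entity, canary_entity, canary_pct):
    """Split traffic for a canary rollout; percentages must sum to 100."""
    routes = [
        {"served_entity_name": stable_entity, "traffic_percentage": 100 - canary_pct},
        {"served_entity_name": canary_entity, "traffic_percentage": canary_pct},
    ]
    assert sum(r["traffic_percentage"] for r in routes) == 100
    return json.dumps({"traffic_config": {"routes": routes}})

# Send 10% of traffic to the new served entity.
print(canary_routes("my_model-1", "my_model-2", 10))
```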
What's Next

Integrate with a Databricks App
After creating a serving endpoint, wire it into a Databricks App.

Step 1 — Check if the `serving` plugin is available in the AppKit template:

```bash
databricks apps manifest --profile <PROFILE>
```

If the output includes a `serving` plugin, scaffold with:

```bash
databricks apps init --name <APP_NAME> \
  --features serving \
  --set "serving.serving-endpoint.name=<ENDPOINT_NAME>" \
  --run none --profile <PROFILE>
```

Step 2 — If there is no `serving` plugin, add the endpoint resource manually to an existing app's `databricks.yml`:

```yaml
resources:
  apps:
    my_app:
      resources:
        - name: my-model-endpoint
          serving_endpoint:
            name: <ENDPOINT_NAME>
            permission: CAN_QUERY
```

And inject the endpoint name as an environment variable in `app.yaml`:

```yaml
env:
  - name: SERVING_ENDPOINT
    valueFrom: serving-endpoint
```

Then add a tRPC route to call it from your app. For the full app integration pattern, use the `databricks-apps` skill and read the Model Serving Guide.
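Inside the app, the injected value is an ordinary environment variable. A minimal sketch; the workspace host is a placeholder, and the URL follows the standard serving REST path convention:

```python
import os

# Stand in for the value app.yaml injects at runtime (demo placeholder).
os.environ.setdefault("SERVING_ENDPOINT", "<ENDPOINT_NAME>")
endpoint = os.environ["SERVING_ENDPOINT"]

# Invocation URL per the serving REST path convention; host is a placeholder.
url = f"https://<WORKSPACE_HOST>/serving-endpoints/{endpoint}/invocations"
print(url)
```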
Troubleshooting
| Error | Solution |
|---|---|
| Invalid JSON spec on `create` | Use `databricks serving-endpoints create -h` to discover the required fields |
| Permission denied | Check workspace permissions; for apps, ensure `CAN_QUERY` is granted on the endpoint |
| Endpoint stuck in `NOT_READY` | Check `build-logs` for the served model |
| Endpoint not found | Verify endpoint name with `list` |
| Query returns 404 | Endpoint may still be provisioning; check `state.ready` via `get` |
| 429 rate-limit errors | AI Gateway rate limit; check the endpoint's AI Gateway config |