databricks-serverless-migration


Serverless Compute Migration


FIRST: Use the parent `databricks-core` skill for CLI basics, authentication, and profile selection.

Analyze existing Databricks code for serverless compute compatibility and guide migration from classic clusters. The skill follows a 4-step migration lifecycle: Ingest the workload → Analyze for compatibility → Test via A/B comparison → Validate and iterate.

When to Use This Skill


  • Migrating notebooks, jobs, or pipelines from classic compute to serverless
  • Checking if existing code is serverless-compatible
  • Writing new code that targets serverless compute
  • Troubleshooting serverless-specific errors after migration
  • Choosing between Performance-Optimized and Standard mode

Understanding Migration Blockers


Migration blockers fall into three categories. Focus your effort on category 2 — that's where this skill helps most.
| Category | Description | Action |
| --- | --- | --- |
| 1. Feature expanding | Databricks is actively expanding support (e.g., SparkML, custom JDBC) | Use the workaround now and revisit later |
| 2. Code/config change needed | Your code uses patterns that need updating for serverless (e.g., RDDs, DBFS, streaming triggers) | This skill helps here — it detects these patterns and provides fixes |
| 3. Classic-only | Workload requires capabilities not available on serverless (e.g., root OS access, R language) | Keep on classic compute |

Decision Tree: Is My Workload Ready?


Workload → Check language
├── R code → Category 3: keep on classic
├── Scala notebook cells → Category 2: port to PySpark/SQL or compile as JAR
├── Python / SQL → Continue
    ├── Uses RDD APIs? → Category 2: rewrite to DataFrame API (see fixes below)
    ├── Uses DBFS paths? → Category 2: migrate to UC Volumes
    ├── Uses Hive Metastore? → Category 2: migrate to Unity Catalog (or use HMS Federation)
    ├── Uses df.cache/persist? → Category 1: remove and materialize to Delta (native support coming soon)
    ├── Uses streaming?
    │   ├── ProcessingTime trigger → Category 2: use AvailableNow or migrate to SDP
    │   ├── Continuous trigger → Category 2: use SDP continuous mode
    │   ├── No trigger specified → Category 2: add explicit .trigger(availableNow=True)
    │   └── AvailableNow / Once → Ready ✓
    ├── Uses init scripts? → Category 2: use Environments
    ├── Uses VPC peering? → Category 2: use NCCs / Private Link
    ├── Uses unsupported Spark configs? → Category 2: remove (serverless auto-tunes)
    ├── Uses custom JDBC drivers? → Category 2: use Lakehouse Federation or built-in JDBC
    ├── Uses Docker containers? → Category 3: use Environments for libs, or keep on classic
    └── All clear → Ready for serverless ✓

Migration Workflow


Step 1: Ingest — Gather Workload Context


Confirm the migration target is serverless compute. This skill is purpose-built for classic → serverless migrations. The checks, fixes, and workflow all target the serverless compute architecture (Spark Connect, Environments, NCCs). If the user wants to upgrade between classic DBR versions instead, this skill does not apply — classic DBR upgrades have a different compatibility surface and should follow the standard DBR upgrade guide.
Collect the full picture of what needs to migrate to serverless:
  • Read the user's notebook/script files
  • Identify the classic cluster configuration (instance type, DBR version, Spark configs, init scripts, libraries)
  • Note the networking setup (VPC peering, instance profiles, mounts)
  • Understand the workload type: batch job, streaming, interactive notebook, pipeline
  • Determine the target: the output is always a serverless compute configuration, not a classic cluster with a newer DBR
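A minimal sketch of this context gathering with the Databricks Python SDK (the job ID is a placeholder, and the field access assumes a notebook-task job defined with job clusters):

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # profile selection per the databricks-core skill
job = w.jobs.get(job_id=123)  # placeholder job ID

# Capture the classic cluster configuration that has no serverless equivalent.
for cluster in job.settings.job_clusters or []:
    spec = cluster.new_cluster
    print(cluster.job_cluster_key, spec.spark_version, spec.node_type_id,
          spec.num_workers, spec.spark_conf, spec.init_scripts)

# Collect the notebooks that Step 2 will scan.
for task in job.settings.tasks or []:
    if task.notebook_task:
        print("notebook to scan:", task.notebook_task.notebook_path)
```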

Step 2: Analyze — Scan for Serverless Readiness


Read notebooks before running them — do not rely on failed job runs to discover issues. A pre-run scan surfaces incompatibilities faster than iterating on error traces, and many serverless failures (hardcoded catalog references, init scripts, missing dependencies) are easy to spot statically but expensive to debug after a failed run.
Before creating or running any test job:
  1. Read every notebook and source file referenced by the job
  2. Scan for all hardcoded catalog/schema references (e.g., `spark.table("main.schema.table")`, `spark.sql("... FROM main...")`, `catalog = "main"`)
  3. Check for dependency patterns: init scripts, local wheel files, custom install functions, `%pip install` lines
  4. Locate any `requirements.txt` or equivalent and resolve the full dependency set
  5. Flag OS-level installs (`apt install`, `yum install`) for conversion or escalation
Scan the code for patterns that are incompatible with the serverless compute architecture. These checks are serverless-specific — most of these patterns work fine on classic compute regardless of DBR version. For each issue found, report:
  • Category: Which of the 3 blocker categories it falls into
  • Severity: Blocker (must fix for serverless) / Warning (should fix) / Info (awareness)
  • Pattern: What was detected and where
  • Fix: Specific remediation targeting serverless compute
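A minimal sketch of such a static scan, assuming one regex per pattern (the checks shown are a small illustrative subset of the tables below):

```python
import re

# Illustrative subset of the full pattern catalog below.
CHECKS = [
    ("rdd_parallelize", r"sc\.parallelize\(", "Blocker", "Rewrite with spark.createDataFrame(...)"),
    ("dbfs_path", r"dbfs:/|/dbfs/", "Blocker", "Migrate to /Volumes/... (UC Volumes)"),
    ("processing_time_trigger", r"trigger\(processingTime", "Blocker", "Use .trigger(availableNow=True)"),
    ("df_cache", r"\.cache\(\)|\.persist\(\)", "Warning", "Remove; materialize to Delta if expensive"),
]

def scan_source(path: str) -> list[dict]:
    """Return one finding per (line, pattern) hit, with location and suggested fix."""
    findings = []
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            for pattern_id, regex, severity, fix in CHECKS:
                if re.search(regex, line):
                    findings.append({"pattern": pattern_id, "severity": severity,
                                     "fix": fix, "location": f"{path}:{lineno}"})
    return findings
```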
Category A: Unsupported APIs

| Pattern | Severity | Fix |
| --- | --- | --- |
| `sc.parallelize(data)` | Blocker | `spark.createDataFrame([(x,) for x in data], ["value"])` |
| `rdd.map(fn)` | Blocker | `df.select(F.col("value") * 2)` or `df.withColumn(...)` |
| `rdd.filter(fn)` | Blocker | `df.filter(F.col("value") > 3)` |
| `rdd.reduce(fn)` | Blocker | `df.agg(F.sum("col")).collect()[0][0]` |
| `rdd.flatMap(fn)` | Blocker | `df.select(F.explode(F.split(col, " ")))` |
| `rdd.groupByKey()` | Blocker | `df.groupBy("key").agg(F.collect_list("value"))` |
| `rdd.mapPartitions(fn)` | Blocker | `df.groupBy(F.spark_partition_id()).applyInPandas(fn, schema)` |
| `sc.textFile(path)` | Blocker | `spark.read.text(path)` |
| `sc.wholeTextFiles(path)` | Blocker | `spark.read.format("binaryFile").load(path)` |
| `sc.broadcast(data)` | Blocker | `from pyspark.sql.functions import broadcast; df.join(broadcast(lookup_df), key)` |
| `sc.accumulator(init)` | Blocker | `df.agg(F.sum("col"))` or `df.count()` |
| `spark.sparkContext` | Blocker | Use `spark` (SparkSession) directly |
| `SparkContext.getOrCreate()` | Blocker | Not supported — raises `RuntimeError: Only remote Spark sessions using Databricks Connect are supported`. Replace with `spark.createDataFrame()` or `spark.range()` for data setup. |
| `sqlContext.sql(query)` | Blocker | `spark.sql(query)` |
| `sc.hadoopConfiguration.set(...)` | Blocker | Use UC external locations — no credential configs needed |
| `df.cache()` / `df.persist()` | Warning | Remove caching calls. For expensive intermediate results, materialize to a Delta table. Native support coming soon. |
| `df.checkpoint()` | Warning | Write to a Delta table instead |
| `spark.catalog.cacheTable(t)` / `CACHE TABLE` | Warning | Remove — not needed on serverless |
| `%scala` cells in notebook | Blocker | Port to PySpark/SQL or compile as a JAR for job tasks |
| `%r` cells in notebook | Blocker | No serverless equivalent — keep on classic or port to PySpark |
| Hive variable syntax `${var}` | Warning | Use `DECLARE VARIABLE` / `SET VARIABLE` (SQL) or Python f-strings |
| `CREATE GLOBAL TEMPORARY VIEW` | Blocker | Use `CREATE OR REPLACE TEMPORARY VIEW` — the `global_temp` database doesn't exist on serverless |
| `global_temp.` prefix in queries | Warning | Remove the prefix — session-scoped temp views are accessible without a qualifier |
Category B: Data Access

| Pattern | Severity | Fix |
| --- | --- | --- |
| `dbfs:/` or `/dbfs/` paths (persistent data) | Blocker | Replace with `/Volumes/<your_catalog>/schema/volume/path` |
| `dbfs:/tmp/`, `/dbfs/tmp/`, paths with `cache` / `scratch` / `temp` | Warning | Use `/tmp/` or `/local_disk0/tmp/` (local driver disk) — do not use Volumes for temp files due to performance |
| `file:///dbfs/` FUSE mount paths | Warning | Replace persistent paths with `/Volumes/...`; replace temp paths with `/local_disk0/tmp/` |
| `dbutils.fs.mount(...)` | Blocker | Create a UC external location + external volume |
| `hive_metastore.db.table` | Warning | Migrate to UC or use HMS Federation: `CREATE FOREIGN CATALOG ... USING CONNECTION hms_connection` |
| `CREATE DATABASE` / `CREATE SCHEMA` without `USE CATALOG` or a 3-level name | Blocker | Prepend `spark.sql("USE CATALOG <your_catalog>")` at notebook start before any CREATE statements. Detect the target catalog from existing table references, or ask the user. |
| IAM instance profile references | Warning | Use UC external locations + storage credentials |
| Hive SerDe tables | Blocker | Migrate to Delta tables in UC |
Category C: Streaming

| Pattern | Severity | Fix |
| --- | --- | --- |
| `.trigger(processingTime=...)` | Blocker | `.trigger(availableNow=True)` + set `maxFilesPerTrigger` or `maxBytesPerTrigger` to prevent OOM |
| `.trigger(continuous=...)` | Blocker | Migrate to SDP continuous mode |
| No `.trigger()` call on writeStream | Blocker | Must add `.trigger(availableNow=True)` — Spark defaults to `ProcessingTime("0 seconds")`, which is not supported |
| Kafka source | Info | Works with AvailableNow; use `maxOffsetsPerTrigger` to control batch size |
| Auto Loader | Info | Works; use `cloudFiles.maxFilesPerTrigger` (note the `cloudFiles.` prefix) |
Category D: Configuration

| Pattern | Severity | Fix |
| --- | --- | --- |
| Unsupported `spark.conf.set(...)` | Warning | Remove — only 6 configs are supported: `spark.sql.shuffle.partitions`, `spark.sql.session.timeZone`, `spark.sql.ansi.enabled`, `spark.sql.files.maxPartitionBytes`, `spark.sql.legacy.timeParserPolicy`, `spark.databricks.execution.timeout`. Serverless auto-tunes everything else. |
| Init scripts | Blocker | Use Environments: add dependencies via the notebook Environment panel or `requirements.txt`. Pin specific versions. |
| Cluster policies | Info | Use budget policies for cost attribution |
| Docker containers | Blocker | Use Environments for library management. Keep on classic only if Docker is needed for OS-level customization. |
| `%run ./relative/path` or `%run ../path` | Warning | Relative `%run` paths may not resolve correctly in serverless job tasks. Fix: (1) inline the referenced notebook's code if <500 lines (preferred), or (2) convert to `dbutils.notebook.run("<absolute_workspace_path>", timeout)` with an absolute path. Found in ~19% of repos. |
| `os.environ["VAR"]` (system/custom env vars) | Warning | Use `os.environ.get()` with a fallback, `spark.version` for Spark info, or `dbutils.widgets` for custom vars |
| `SET hivevar:` / `${hivevar:...}` (Hive variable substitution) | Blocker | Use SQL session variables: `DECLARE OR REPLACE VARIABLE name = value` (DBR 14.1+) |
| Environment variables (in init scripts) | Warning | Use `dbutils.widgets` or job parameters |
| Explicit executor count/memory configs | Info | Remove — serverless auto-scales and auto-tunes |
Category E: Libraries

| Pattern | Severity | Fix |
| --- | --- | --- |
| JAR libraries in notebooks | Blocker | Compile as a JAR job task (Scala 2.13, JDK 17, environment version 4+) |
| Maven coordinates | Blocker | Replace with PyPI packages in Environments |
| `%pip install` without version pins | Warning | Pin versions: `%pip install numpy==2.2.2 pandas==2.2.3` |
| Custom Spark data sources (v1/v2 JARs) | Blocker | Use Lakehouse Federation, Lakeflow Connect, or PySpark custom data sources |
| LZO format files | Blocker | Convert to Parquet or Delta |
Category F: Networking

| Pattern | Severity | Fix |
| --- | --- | --- |
| VPC peering configuration | Blocker | Create NCCs, get stable IPs, and allowlist them on resource firewalls. Same-region S3 access works without changes. |
| Direct S3/ADLS access without UC | Warning | Use UC external locations |
Category G: Sizing & Debugging
PatternSeverityFix
Large driver memory configsInfoServerless REPL default is 8GB (high-memory option for 16GB+ via Environments)
Spark UI referencesInfoUse Query Profile instead: click "See performance" under cell output

Required Output: Serverless Environment Specification


The migration output MUST include a Serverless Environment specification alongside the migrated code. Generate it by:
  1. Scanning all `import` statements and `%pip install` lines to detect required packages
  2. Extracting init-script `pip install` commands from the job configuration
  3. Producing a JSON block suitable for the Jobs API `environments` field:
```json
{
  "environment_key": "Default",
  "spec": {
    "client": "2",
    "dependencies": ["mlflow==2.12.1", "scikit-learn==1.3.0", "xgboost==2.0.3"]
  }
}
```
Important: ML runtime libraries (mlflow, scikit-learn, hyperopt, xgboost, tensorflow, torch, etc.) are NOT pre-installed on serverless compute. They MUST be listed explicitly in the environment spec `dependencies`. The ML runtime is NOT available on serverless — always use Serverless Environments with explicit package dependencies instead.
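A rough sketch of steps 1–3, assuming dependencies can be recovered from `%pip install` lines alone (mapping bare `import` names to PyPI packages is workload-specific and omitted here):

```python
import json
import re

def build_environment_spec(notebook_source: str) -> dict:
    # Step 1: collect pinned packages from %pip install lines
    # (exported notebooks prefix magics with "# MAGIC").
    deps: list[str] = []
    for line in notebook_source.splitlines():
        m = re.search(r"%pip install\s+(.+)", line)
        if m:
            deps.extend(m.group(1).split())
    # Step 2 (init-script pip installs) would extend `deps` the same way.
    # Step 3: emit the Jobs API environments entry.
    return {
        "environment_key": "Default",
        "spec": {"client": "2", "dependencies": sorted(set(deps))},
    }

print(json.dumps(build_environment_spec("# MAGIC %pip install mlflow==2.12.1"), indent=2))
```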

Step 3: Test — Two-Branch Strategy


Use separate branches for testing and production to keep test-only workarounds out of the code that ships. The test branch is a safe sandbox for experimentation; the production branch contains only changes that production actually needs.
| Aspect | Test branch | Production branch |
| --- | --- | --- |
| Name pattern | `serverless-test-{job_name}-{timestamp}` | `serverless-prod-{job_name}` |
| Base branch | Any working branch | Must be master |
| Purpose | Verify serverless compatibility | Deploy to production |
| Test-only workarounds | Yes (catalog overrides, sampled data, date limits) | No |
| Compatibility fixes | Yes (discover them here) | Yes (apply the validated ones) |
| Job config changes | Yes (for the test job) | Yes (for the prod job) |
| Catalog | Test catalog | Production catalog |
| PR required | No | Yes |
| Merged to master | No | Yes |
Test branch (`serverless-test-{job_name}-{timestamp}`): temporary, no PR needed.
  1. Create a branch from your current working branch
  2. Set up test data: create sampled copies of upstream tables in a test catalog using job lineage (see test data setup below)
  3. Parameterize the catalog so the notebook works with both test and production data (see catalog parameterization pattern below)
  4. Apply all compatibility fixes discovered in Step 2
  5. Create a serverless test job and run it
  6. If it fails, get the error output, debug, fix, and retry
  7. Document which changes are test workarounds vs. real compatibility fixes
Production branch (`serverless-prod-{job_name}`): PR required, created from master.
  1. Create a new branch from master (NOT from the test branch)
  2. Apply ONLY the real compatibility fixes — no test workarounds
  3. Apply job config changes (see job config transformation below)
  4. Commit and create a PR

Test Data Setup


When the job reads from production tables, do not point the test job at production data. Instead, create sampled copies of upstream tables in a dedicated test catalog and run the test job against those.
The recommended pattern:
  1. Resolve the job's upstream tables from its lineage (or from a static scan of the notebook)
  2. For each upstream table, run `CREATE TABLE IF NOT EXISTS <test_catalog>.<schema>.<table> AS SELECT * FROM <prod_catalog>.<schema>.<table> LIMIT N` (typical N: 10–1000 rows)
  3. Keep the schema names identical to production — only the catalog changes
  4. Make the operation idempotent: skip tables that already exist, so the setup step is safe to re-run
  5. Require a running SQL warehouse and `CREATE TABLE` permission on the test catalog

With schema names preserved, the same notebook code runs in both environments — only the `catalog` widget value changes.
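A minimal sketch of the sampling step as it might run in a notebook (`spark` is the ambient session; the table list, catalog names, and N are placeholders):

```python
# Upstream tables resolved from lineage or a static scan (placeholders).
upstream_tables = ["sales.orders", "sales.customers"]
prod_catalog, test_catalog, n_rows = "main", "test_catalog", 1000

for table in upstream_tables:
    schema = table.split(".")[0]
    # Keep schema names identical to production; only the catalog differs.
    spark.sql(f"CREATE SCHEMA IF NOT EXISTS {test_catalog}.{schema}")
    # IF NOT EXISTS skips existing tables, so the setup is safe to re-run.
    spark.sql(
        f"CREATE TABLE IF NOT EXISTS {test_catalog}.{table} "
        f"AS SELECT * FROM {prod_catalog}.{table} LIMIT {n_rows}"
    )
```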

Decision Tree: Should This Change Go to Production?


| Change type | Production? | Reason |
| --- | --- | --- |
| Remove incompatible Spark configs | Yes | Serverless compatibility fix |
| Update library versions | Yes | Serverless compatibility fix |
| Replace DBFS paths with UC Volumes | Yes | Serverless compatibility fix |
| Remove init scripts, add Environments | Yes | Serverless compatibility fix |
| Fix hardcoded cluster settings | Yes | Serverless compatibility fix |
| Catalog override to test catalog | No | Test workaround only |
| Empty DataFrame handling for missing test data | No | Test workaround only |
| Date range limiting for faster tests | No | Test workaround only |
Simple test: Would production fail without this change on serverless? If yes → include. If no → test branch only.

A/B Comparison


After both branches are ready, compare outputs:
```python
# Compare outputs between classic and serverless runs
classic_df = spark.read.table("main.output.classic_results")
serverless_df = spark.read.table("main.output.serverless_results")

assert classic_df.count() == serverless_df.count(), "Row count mismatch"
assert classic_df.schema == serverless_df.schema, "Schema mismatch"
diff = classic_df.exceptAll(serverless_df)
assert diff.count() == 0, f"Found {diff.count()} differing rows"
```

**Temporary bridge configs**: If the serverless run fails, you may temporarily set supported Spark configs (like `spark.sql.shuffle.partitions`) to bridge gaps. Mark these as temporary — remove them once the workload stabilizes.

Step 4: Validate — Confirm and Monitor


Once the A/B comparison passes:
  1. Merge the production branch PR
  2. Switch the production job to serverless compute
  3. Monitor cost via system tables (`system.billing.usage`) and budget policies
  4. Remove any temporary bridge configurations
  5. Set up budget alerts for cost visibility
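A sketch of the cost check in step 3 (the job ID is a placeholder; column names follow the documented `system.billing.usage` schema):

```python
# Daily DBU consumption for the migrated job over the last week.
usage = spark.sql("""
    SELECT usage_date, sku_name, SUM(usage_quantity) AS dbus
    FROM system.billing.usage
    WHERE usage_metadata.job_id = '123'          -- placeholder job ID
      AND usage_date >= current_date() - INTERVAL 7 DAYS
    GROUP BY usage_date, sku_name
    ORDER BY usage_date
""")
usage.show()
```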

Migration Deliverables


At the end of a successful migration run, surface these artifacts so the user can verify the work and inspect the results:
| Deliverable | What it is | Why it matters |
| --- | --- | --- |
| Test branch name/URL | The `serverless-test-{job_name}-{timestamp}` branch with all compatibility fixes and test workarounds | Lets the user see what changed during experimentation, including test-only adjustments |
| Production branch name/URL | The `serverless-prod-{job_name}` branch containing only the validated compatibility fixes | This is what ships — the user reviews and merges the PR from here |
| Test job ID and run URL | The serverless test job that validated the migration | Proves the notebook runs successfully on serverless against sampled data |
| Classic vs serverless comparison | A/B result summary (row counts, schema check, row-level diff) | Confidence that serverless output matches classic output |
| Serverless environment spec | The `environments` JSON block (client version + pinned dependencies) | Ready to paste into the production job config |
| Change summary | List of what went to production vs. test-only (with reasons) | Audit trail for the PR reviewer |
If any deliverable is missing, the migration is incomplete — do not mark it as done.

Stopping Conditions


Do not attempt workarounds for these — surface them to the user and stop:
  • Permission failures on source tables, the test catalog, or the workspace
  • Category 3 blockers (R code, custom Spark data source JARs, features that require classic compute)
  • SQL warehouse or test catalog not available
  • Repeated failures (typically 5+) with no new information in the error trace — generate a failure report instead (see Failure Reporting Protocol)
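One way the last condition can be wired into the test loop (a sketch; `run_test_job`, `apply_fix`, and `write_failure_report` are hypothetical helpers):

```python
MAX_RETRIES = 5  # typical retry budget before stopping

def migrate_with_retries(job) -> None:
    last_error = None
    for attempt in range(1, MAX_RETRIES + 1):
        result = run_test_job(job)       # hypothetical: run the serverless test job
        if result.success:
            return
        if result.error == last_error:
            break                        # no new information in the error — stop early
        last_error = result.error
        apply_fix(job, result.error)     # hypothetical: apply a known fix for the error
    write_failure_report(job, last_error)  # see Failure Reporting Protocol
```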

Quick Fixes Reference


Replace DBFS paths with UC Volumes


```python
# BEFORE (classic)
df = spark.read.csv("dbfs:/mnt/datalake/sales/data.csv", header=True)
df.write.parquet("dbfs:/mnt/output/results")

# AFTER (serverless)
df = spark.read.csv("/Volumes/main/sales/raw_data/data.csv", header=True)
df.write.parquet("/Volumes/main/analytics/output/results")

# Replace mounts with external volumes (SQL):
#   CREATE EXTERNAL VOLUME main.data.raw_files LOCATION 's3://my-bucket/data/';
# Then use: /Volumes/main/data/raw_files/

# Pandas paths too:
# BEFORE: pd.read_csv("/dbfs/mnt/data/file.csv")
# AFTER:  pd.read_csv("/Volumes/main/data/volume/file.csv")
```

Replace RDD operations with DataFrames


```python
from pyspark.sql import functions as F

# parallelize + map
# BEFORE:
rdd = sc.parallelize([1, 2, 3])
result = rdd.map(lambda x: x * 2).collect()
# AFTER:
df = spark.createDataFrame([(1,), (2,), (3,)], ["value"])
result = df.select((F.col("value") * 2).alias("value")).collect()

# flatMap (word splitting)
# BEFORE:
words = sc.parallelize(["hello world"]).flatMap(lambda l: l.split(" ")).collect()
# AFTER:
df = spark.createDataFrame([("hello world",)], ["line"])
words = df.select(F.explode(F.split("line", " ")).alias("word")).collect()

# groupByKey
# BEFORE:
rdd = sc.parallelize([("a", 1), ("b", 2), ("a", 3)])
grouped = rdd.groupByKey().mapValues(list).collect()
# AFTER:
df = spark.createDataFrame([("a", 1), ("b", 2), ("a", 3)], ["key", "value"])
grouped = df.groupBy("key").agg(F.collect_list("value").alias("values")).collect()

# mapPartitions → applyInPandas
# BEFORE:
def process_partition(iterator):
    yield sum(iterator)

result = sc.parallelize(range(100), 4).mapPartitions(process_partition).collect()
# AFTER:
import pandas as pd

def process_group(pdf: pd.DataFrame) -> pd.DataFrame:
    return pd.DataFrame({"total": [pdf["id"].sum()]})

result = (spark.range(100).repartition(4)
          .groupBy(F.spark_partition_id())
          .applyInPandas(process_group, schema="total long")
          .collect())

# textFile
# BEFORE: rdd = sc.textFile("/mnt/data/file.txt")
# AFTER:
df = spark.read.text("/Volumes/catalog/schema/volume/file.txt")

# wholeTextFiles
# BEFORE: rdd = sc.wholeTextFiles("/mnt/data/dir/")
# AFTER:
df = spark.read.format("binaryFile").load("/Volumes/catalog/schema/volume/dir/")
```

Fix streaming triggers


```python
# CRITICAL: Omitting .trigger() defaults to ProcessingTime(0) — not supported on serverless

# BEFORE (fails on serverless — no trigger = ProcessingTime default):
query = df.writeStream.format("delta").outputMode("append").start(path)

# BEFORE (fails — explicit ProcessingTime):
query = df.writeStream.trigger(processingTime="10 seconds").start(path)

# AFTER (serverless compatible):
query = (df.writeStream
         .format("delta")
         .outputMode("append")
         .trigger(availableNow=True)
         .option("checkpointLocation", "/Volumes/main/data/checkpoints/stream1")
         .start("/Volumes/main/data/output/stream1"))
query.awaitTermination()

# With OOM prevention (recommended for large sources):
query = (spark.readStream.format("delta")
         .option("maxFilesPerTrigger", 100)    # Delta/Parquet sources
         .option("maxBytesPerTrigger", "10g")  # Limit data per micro-batch
         .load(input_path)
         .writeStream
         .trigger(availableNow=True)
         .option("checkpointLocation", checkpoint_path)
         .start(output_path))

# Kafka: use maxOffsetsPerTrigger
query = (spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("subscribe", "topic1")
         .option("maxOffsetsPerTrigger", 100000)  # Kafka-specific
         .load()
         .writeStream.trigger(availableNow=True).start(output_path))

# Auto Loader: use cloudFiles.maxFilesPerTrigger (note the prefix)
query = (spark.readStream.format("cloudFiles")
         .option("cloudFiles.format", "json")
         .option("cloudFiles.maxFilesPerTrigger", 1000)  # cloudFiles. prefix
         .load(landing_path)
         .writeStream.trigger(availableNow=True).start(output_path))
```

Remove caching


```python
# BEFORE (classic):
df = spark.read.parquet(path)
df.cache()
df.count()  # materialize cache
result1 = df.filter("status = 'active'")
result2 = df.groupBy("region").agg(F.sum("revenue"))

# AFTER (serverless — remove .cache(); native support coming soon):
df = spark.read.parquet(path)
result1 = df.filter("status = 'active'")
result2 = df.groupBy("region").agg(F.sum("revenue"))

# For truly expensive intermediate results, materialize to Delta:
expensive_df.write.format("delta").mode("overwrite").saveAsTable("main.scratch.intermediate")
result = spark.table("main.scratch.intermediate")

# SQL equivalent:
# BEFORE: CACHE TABLE my_table
# AFTER:  (just remove the CACHE TABLE statement)
```

Other quick fixes


| Pattern | Fix | Full example |
| --- | --- | --- |
| `sc.broadcast` / `sc.accumulator` / `sqlContext.sql` | Use SparkSession equivalents: `broadcast()` join, `df.agg()`, `spark.sql()` | Code Patterns |
| Init scripts | Move to the Environment panel or `requirements.txt`. Do NOT install PySpark. Pin versions. | Code Patterns |
| Hive Metastore tables | Use HMS Federation as a bridge (`CREATE FOREIGN CATALOG`) or migrate directly (`CREATE TABLE ... AS SELECT`) | Code Patterns |
| Custom JDBC JARs | Use Lakehouse Federation (`CREATE CONNECTION ... TYPE POSTGRESQL`) or built-in JDBC (works on serverless) | Code Patterns |
| Spark UI debugging | Use Query Profile: click "See performance" under the cell output, or `df.explain(True)` | Code Patterns |

Detect serverless at runtime


```python
import os

is_serverless = os.getenv("IS_SERVERLESS", "").lower() == "true"
```

Transform job config from classic to serverless


Remove `job_clusters`/`new_cluster`, add `environments` with the serverless spec, replace `job_cluster_key` with `environment_key`, and remove `init_scripts`. See the Configuration Guide for the full before/after JSON and environment version mapping.
Environment version mapping (match to the DBR version the workload was on):

| Classic DBR | Serverless `spec.client` | Python |
| --- | --- | --- |
| 13.x, 14.x | "1" | 3.10 |
| 15.x | "2" | 3.11 |
| 16.x+ | "3" | 3.12 |
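The same mapping as a helper function (a sketch; version strings outside the table above fall through to the final case):

```python
def serverless_client_for_dbr(dbr_version: str) -> str:
    """Pick the serverless environment client version for a classic DBR version."""
    major = int(dbr_version.split(".")[0])
    if major <= 14:
        return "1"  # Python 3.10
    if major == 15:
        return "2"  # Python 3.11
    return "3"      # Python 3.12 (16.x+)
```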

Job Definition Migration


When migrating a job, the job configuration JSON must be transformed alongside the notebook code. The agent should perform all of the following:

Init scripts to Serverless Environments: Detect `init_scripts` in the job JSON. Extract all `pip install` commands and convert them to Environment `dependencies`. For OS-level packages (`apt install` / `yum install`) that have pip equivalents (e.g., `apt install python3-opencv` becomes `opencv-python`), convert them. Flag OS-level packages without pip equivalents as serverless-incompatible (Category 3).

Cluster libraries (Maven/JAR) to Environment or Volumes: Maven coordinates for Python-wrapping JARs should be replaced with their PyPI equivalent in the Environment spec. Custom JARs on DBFS need to be moved to `/Volumes/<your_catalog>/schema/volume/` and referenced there. Custom Spark data source JARs (v1/v2) are a Category 3 blocker — flag them for classic retention.

job_clusters to serverless compute: Remove `job_clusters`/`new_cluster` blocks entirely. Add an `environments` array with the serverless spec. Replace `job_cluster_key` in each task with `environment_key`. Remove `init_scripts`, `num_workers`, `node_type_id`, `spark_version`. See the Configuration Guide for a complete before/after example; a sketch of this transformation follows this list.

spark_conf migration: Scan all `spark.conf.set(...)` calls in the notebook and `spark_conf` entries in the job JSON. For each:
  • Supported (keep): `spark.sql.shuffle.partitions`, `spark.sql.session.timeZone`, `spark.sql.ansi.enabled`, `spark.sql.files.maxPartitionBytes`, `spark.sql.legacy.timeParserPolicy`, `spark.databricks.execution.timeout`
  • Auto-tuned (remove with a comment): AQE configs, Delta auto-compact, executor/driver sizing, parallelism configs
  • Credential configs (remove): `fs.s3a.*`, `fs.azure.*` — replaced by UC external locations
  • Add a code comment at each removal explaining why: `# Removed: auto-tuned on serverless` or `# Removed: use UC external locations instead`
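A sketch of the `job_clusters` → serverless transformation on the job JSON (the dependency list comes from the environment spec step; `SUPPORTED_CONFS` mirrors the keep-list above):

```python
SUPPORTED_CONFS = {
    "spark.sql.shuffle.partitions", "spark.sql.session.timeZone",
    "spark.sql.ansi.enabled", "spark.sql.files.maxPartitionBytes",
    "spark.sql.legacy.timeParserPolicy", "spark.databricks.execution.timeout",
}

def keep_conf(key: str) -> bool:
    """Whether a spark.conf.set(...) call survives the migration."""
    return key in SUPPORTED_CONFS

def to_serverless(job_json: dict, dependencies: list[str]) -> dict:
    job = dict(job_json)
    job.pop("job_clusters", None)  # classic cluster definitions go away entirely
    job["environments"] = [{
        "environment_key": "Default",
        "spec": {"client": "2", "dependencies": dependencies},
    }]
    for task in job.get("tasks", []):
        task.pop("new_cluster", None)
        task.pop("job_cluster_key", None)
        task["environment_key"] = "Default"
    return job
```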

Parameterize catalogs for testing


```python
dbutils.widgets.text("catalog", "main")  # Default to production
catalog = dbutils.widgets.get("catalog")
df = spark.table(f"{catalog}.sales.orders")
```

Pass `catalog="test_catalog"` as a job parameter during testing.


See [Configuration Guide](references/configuration-guide.md) for mock table catalog mapping and test job creation patterns.


Debug failed serverless runs


Always get the actual error with `w.jobs.get_run_output(run_id=...)` before guessing. Common errors:

| Error | Fix |
| --- | --- |
| `INFINITE_STREAMING_TRIGGER_NOT_SUPPORTED` | Add `.trigger(availableNow=True)` |
| `UNRESOLVED_COLUMN` | Temp view name collision — use unique names |
| `TABLE_OR_VIEW_NOT_FOUND` | DBFS/HMS table not accessible — migrate to UC |
| `Py4JError: ... is not available` | SparkContext/RDD used — rewrite to the DataFrame API |
| Package installation timeout | Pin versions; do NOT install PySpark as a dependency |
| `ModuleNotFoundError: No module named 'mlflow'` | Add it to the environment spec `dependencies` — the ML runtime is NOT available on serverless |
| `SparkContext.getOrCreate() is NOT supported` / `RuntimeError: Only remote Spark sessions` | Replace with `spark.createDataFrame()` or `spark.range()` |
| `UC_FILE_SCHEME_FOR_TABLE_CREATION_NOT_SUPPORTED` | Use managed tables or `/Volumes/...` paths |
| `PERMISSION_DENIED: CREATE SCHEMA on Catalog 'main'` | Add `spark.sql("USE CATALOG <your_catalog>")` before CREATE statements |
| `DATA_SOURCE_NOT_FOUND: Failed to find data source` | Category 3 blocker — a custom JAR data source needs classic compute |
| `SyntaxError` after migration | Ensure comments are inside MAGIC blocks, not straddling cell delimiters |

See the Configuration Guide for the full error reference and SDK code examples.
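A sketch of pulling the real error with the Databricks SDK (the run ID is a placeholder; for multi-task jobs the output is fetched per task run):

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

run = w.jobs.get_run(run_id=456)  # placeholder: the failed test run
for task in run.tasks or []:
    output = w.jobs.get_run_output(run_id=task.run_id)
    if output.error:
        print(task.task_key, "->", output.error)
        print(output.error_trace)
```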

Performance Mode Selection


| Criteria | Performance-Optimized | Standard |
| --- | --- | --- |
| Startup time | <50 seconds | 4–6 minutes |
| Cost | Higher | Significantly lower |
| Available for | Notebooks, Jobs, SDP | Jobs and SDP only |
| Best for | Interactive work, dev, time-sensitive runs | Batch ETL, scheduled pipelines |
| Default | Yes (UI and API) | Must be explicitly selected |
Standard mode is NOT available for notebooks. Notebooks always use Performance-Optimized.

Serverless Defaults to Know


| Setting | Value |
| --- | --- |
| REPL VM memory | 8GB default (high-memory option available) |
| Max executors | 32 (Premium), 64 (Enterprise) — raise via support |
| Supported Spark configs | 6 only (see Category D above) |
| Debugging | Query Profile (no Spark UI) |
| ANSI SQL | Enabled by default (configurable) |

Failure Reporting Protocol


When migration fails irrecoverably, generate a structured failure report to help improve the skill. This applies when:
  • All retry attempts are exhausted (typically 5)
  • An unknown pattern is encountered that isn't in the compatibility checks
  • A fix was applied but didn't resolve the underlying issue
  • The workload hits a Category 3 blocker the user wasn't aware of

When to generate a report


Generate a report at the end of a migration attempt if any of:
  • `retry_count >= max_retries` and the final status is FAILED
  • A pattern was detected but no fix is available in the skill
  • The user explicitly requests a failure report (`/migration-report`)

How to generate


Write a JSON file to `~/.databricks-migration-skill/reports/failure-<ISO-timestamp>.json`. Create the directory if it doesn't exist.

Schema (strictly follow — no free-text code or identifiers):
```json
{
  "report_version": "1.0",
  "report_id": "<uuid-v4>",
  "skill_version": "<from SKILL.md frontmatter metadata.version>",
  "timestamp": "<ISO 8601 UTC>",
  "failure_phase": "analyze | migrate | test | validate",
  "detected_patterns": [
    {"category": "A", "pattern_id": "rdd_parallelize", "count": 3}
  ],
  "attempted_fixes": [
    {"pattern_id": "rdd_parallelize", "fix_applied": "<fix_id>", "attempt_number": 1, "outcome": "failed"}
  ],
  "final_error_category": "unknown_api | missing_library | data_access | permission | custom_data_source | other",
  "final_error_signature": "<SHA256 of top 3 stack frames, NOT the frames themselves>",
  "retry_count": 5,
  "total_duration_seconds": 245,
  "notebook_characteristics": {
    "lines_of_code": 180,
    "language": "python | sql | scala | r",
    "uses_streaming": false,
    "uses_ml_libraries": true,
    "databricks_runtime_source": "<DBR version only, no cluster identifiers>"
  }
}
```
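A sketch of the report writer under these constraints (phase, category, skill version, and duration are hardcoded placeholders; the caller supplies pattern IDs, fix attempts, raw stack frames, and the characteristics dict):

```python
import hashlib
import json
import os
import uuid
from datetime import datetime, timezone

def write_failure_report(detected_patterns, attempted_fixes, stack_frames,
                         notebook_characteristics):
    report = {
        "report_version": "1.0",
        "report_id": str(uuid.uuid4()),
        "skill_version": "0.0.0",             # placeholder: read from SKILL.md frontmatter
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "failure_phase": "test",              # placeholder
        "detected_patterns": detected_patterns,
        "attempted_fixes": attempted_fixes,
        "final_error_category": "other",      # placeholder
        # Hash the top 3 stack frames so no error text leaves the machine.
        "final_error_signature": hashlib.sha256(
            "\n".join(stack_frames[:3]).encode()
        ).hexdigest(),
        "retry_count": len(attempted_fixes),
        "total_duration_seconds": 0,          # placeholder
        "notebook_characteristics": notebook_characteristics,
    }
    out_dir = os.path.expanduser("~/.databricks-migration-skill/reports")
    os.makedirs(out_dir, exist_ok=True)
    stamp = report["timestamp"].replace(":", "-")  # filesystem-safe timestamp
    path = os.path.join(out_dir, f"failure-{stamp}.json")
    with open(path, "w") as f:
        json.dump(report, f, indent=2)
    return path
```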

What the report MUST NOT contain


This is a hard requirement — the report must be safe to share publicly on GitHub Issues:
  • No code content — only pattern IDs from this skill's catalog (e.g., `rdd_parallelize`), never actual code snippets
  • No file paths — no notebook names, directory paths, or workspace URLs
  • No error message text — only the error category enum and a hashed signature
  • No identifiers — no table names, column names, catalog names, schema names, user emails, workspace IDs, or customer names
  • No credentials — no secret scope names, API keys, or connection strings
  • No data descriptions — no column value samples, row counts tied to specific tables, or data shape details beyond the `notebook_characteristics` fields

After generating the report


Tell the user:
```
Migration failed after <N> attempts. A failure report has been generated at:

  ~/.databricks-migration-skill/reports/failure-<timestamp>.json

This report contains anonymized diagnostic data (detected patterns, error categories, retry count) and no code content or PII. You can:

1. Review the JSON to confirm no sensitive information is present
2. Share it via GitHub Issue to help improve the skill:
   https://github.com/databricks/databricks-agent-skills/issues/new?template=migration-feedback.md

Submission is optional and opt-in. We use reports to prioritize new patterns and fix detection gaps.
```
Never transmit the report automatically. The user owns their data and must review before sharing.

Reference Guides


For detailed workarounds and code examples beyond the quick fixes above:
  • Compatibility Checks — Full pattern detection table with all 40+ checks
  • Streaming Migration — Trigger migration, SDP continuous mode, continuous jobs
  • Networking and Security — VPC peering to NCCs, Private Link, firewall setup
  • Code Patterns — Complete before/after code examples for every migration pattern
  • Configuration Guide — Supported Spark configs, Environments setup, budget policies

Documentation
