databricks-serverless-migration
Serverless Compute Migration
FIRST: Use the parent skill `databricks-core` for CLI basics, authentication, and profile selection.
Analyze existing Databricks code for serverless compute compatibility and guide migration from classic clusters. The skill follows a 4-step migration lifecycle: Ingest the workload → Analyze for compatibility → Test via A/B comparison → Validate and iterate.
When to Use This Skill
- Migrating notebooks, jobs, or pipelines from classic compute to serverless
- Checking if existing code is serverless-compatible
- Writing new code that targets serverless compute
- Troubleshooting serverless-specific errors after migration
- Choosing between Performance-Optimized and Standard mode
Understanding Migration Blockers
Migration blockers fall into three categories. Focus your effort on category 2 — that's where this skill helps most.
| Category | Description | Action |
|---|---|---|
| 1. Feature expanding | Databricks is actively expanding support (e.g., SparkML, custom JDBC) | Use the workaround now and revisit later |
| 2. Code/config change needed | Your code uses patterns that need updating for serverless (e.g., RDDs, DBFS, streaming triggers) | This skill helps here — it detects these patterns and provides fixes |
| 3. Classic-only | Workload requires capabilities not available on serverless (e.g., root OS access, R language) | Keep on classic compute |
Decision Tree: Is My Workload Ready?
Workload → Check language
├── R code → Category 3: keep on classic
├── Scala notebook cells → Category 2: port to PySpark/SQL or compile as JAR
├── Python / SQL → Continue
├── Uses RDD APIs? → Category 2: rewrite to DataFrame API (see fixes below)
├── Uses DBFS paths? → Category 2: migrate to UC Volumes
├── Uses Hive Metastore? → Category 2: migrate to Unity Catalog (or use HMS Federation)
├── Uses df.cache/persist? → Category 1: remove and materialize to Delta (native support coming soon)
├── Uses streaming?
│ ├── ProcessingTime trigger → Category 2: use AvailableNow or migrate to SDP
│ ├── Continuous trigger → Category 2: use SDP continuous mode
│ ├── No trigger specified → Category 2: add explicit .trigger(availableNow=True)
│ └── AvailableNow / Once → Ready ✓
├── Uses init scripts? → Category 2: use Environments
├── Uses VPC peering? → Category 2: use NCCs / Private Link
├── Uses unsupported Spark configs? → Category 2: remove (serverless auto-tunes)
├── Uses custom JDBC drivers? → Category 2: use Lakehouse Federation or built-in JDBC
├── Uses Docker containers? → Category 3: use Environments for libs, or keep on classic
└── All clear → Ready for serverless ✓
Migration Workflow
Step 1: Ingest — Gather Workload Context
Confirm the migration target is serverless compute. This skill is purpose-built for classic → serverless migrations. The checks, fixes, and workflow all target the serverless compute architecture (Spark Connect, Environments, NCCs). If the user wants to upgrade between classic DBR versions instead, this skill does not apply — classic DBR upgrades have a different compatibility surface and should follow the standard DBR upgrade guide.
Collect the full picture of what needs to migrate to serverless:
- Read the user's notebook/script files
- Identify the classic cluster configuration (instance type, DBR version, Spark configs, init scripts, libraries)
- Note the networking setup (VPC peering, instance profiles, mounts)
- Understand the workload type: batch job, streaming, interactive notebook, pipeline
- Determine the target: the output is always a serverless compute configuration, not a classic cluster with a newer DBR
Step 2: Analyze — Scan for Serverless Readiness
Read notebooks before running them — do not rely on failed job runs to discover issues. A pre-run scan surfaces incompatibilities faster than iterating on error traces, and many serverless failures (hardcoded catalog references, init scripts, missing dependencies) are easy to spot statically but expensive to debug after a failed run.
Before creating or running any test job:
- Read every notebook and source file referenced by the job
- Scan for all hardcoded catalog/schema references (e.g., `catalog = "main"`, `spark.table("main.schema.table")`, `spark.sql("... FROM main...")`)
- Check for dependency patterns: init scripts, local wheel files, custom install functions, `%pip install` lines
- Locate any `requirements.txt` or equivalent and resolve the full dependency set
- Flag OS-level installs (`apt install`, `yum install`) for conversion or escalation (a minimal scan sketch follows this list)
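A minimal sketch of such a pre-run scan, assuming notebooks are exported as `.py` source under a `src/` directory; the path, check names, and regexes are illustrative simplifications, not part of the skill:

```python
import re
from pathlib import Path

# Illustrative subset of the serverless-readiness checks described below
CHECKS = {
    "hardcoded_catalog": re.compile(r'(spark\.table|spark\.sql)\(\s*["\'][^"\']*\bmain\.'),
    "dbfs_path": re.compile(r"dbfs:/|/dbfs/"),
    "pip_install": re.compile(r"%pip\s+install\s+(.+)"),
    "rdd_api": re.compile(r"\bsc\.(parallelize|textFile|wholeTextFiles)\b"),
    "os_install": re.compile(r"\b(apt|apt-get|yum)\s+install\b"),
}

def scan_file(path: Path) -> list[tuple[str, int, str]]:
    """Return (check_name, line_number, line) for every hit in one source file."""
    hits = []
    for lineno, line in enumerate(path.read_text().splitlines(), start=1):
        for name, pattern in CHECKS.items():
            if pattern.search(line):
                hits.append((name, lineno, line.strip()))
    return hits

# Report findings for every exported notebook in the repo
for nb in Path("src").rglob("*.py"):
    for check, lineno, line in scan_file(nb):
        print(f"{nb}:{lineno}: [{check}] {line}")
```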
Scan the code for patterns that are incompatible with the serverless compute architecture. These checks are serverless-specific — most of these patterns work fine on classic compute regardless of DBR version. For each issue found, report:
- Category: Which of the 3 blocker categories it falls into
- Severity: Blocker (must fix for serverless) / Warning (should fix) / Info (awareness)
- Pattern: What was detected and where
- Fix: Specific remediation targeting serverless compute
Category A: Unsupported APIs
| Pattern | Severity | Fix |
|---|---|---|
| Blocker | |
| Blocker | |
| Blocker | |
| Blocker | |
| Blocker | |
| Blocker | |
| Blocker | |
| Blocker | |
| Blocker | |
| Blocker | |
| Blocker | |
| Blocker | Use |
| Blocker | Not supported — raises |
| Blocker | |
| Blocker | Use UC external locations — no credential configs needed |
| Warning | Remove caching calls. For expensive intermediate results, materialize to a Delta table. Native support coming soon. |
| Warning | Write to Delta table instead |
| Warning | Remove — not needed on serverless |
| Blocker | Port to PySpark/SQL or compile as JAR for job tasks |
| Blocker | No serverless equivalent — keep on classic or port to PySpark |
Hive variable syntax | Warning | Use |
| Blocker | Use |
| Warning | Remove prefix — session-scoped temp views are accessible without qualifier |
Category B: Data Access
| Pattern | Severity | Fix |
|---|---|---|
| Blocker | Replace with |
| Warning | Use |
| Warning | Replace persistent paths with |
| Blocker | Create UC external location + external volume |
| Warning | Migrate to UC or use HMS Federation: |
| Blocker | Prepend |
| IAM instance profile references | Warning | Use UC external locations + storage credentials |
| Hive SerDe tables | Blocker | Migrate to Delta tables in UC |
Category C: Streaming
| Pattern | Severity | Fix |
|---|---|---|
| Blocker | |
| Blocker | Migrate to SDP continuous mode |
No | Blocker | Must add |
| Kafka source | Info | Works with AvailableNow; use |
| Auto Loader | Info | Works; use |
Category D: Configuration
| Pattern | Severity | Fix |
|---|---|---|
Unsupported | Warning | Remove — only 6 configs supported: |
| Init scripts | Blocker | Use Environments: add dependencies via notebook Environment panel or |
| Cluster policies | Info | Use budget policies for cost attribution |
| Docker containers | Blocker | Use Environments for library management. Keep on classic only if Docker is needed for OS-level customization. |
| Warning | Relative |
| Warning | Use |
| Blocker | Use SQL session variables: |
| Environment variables (in init scripts) | Warning | Use |
| Explicit executor count/memory configs | Info | Remove — serverless auto-scales and auto-tunes |
Category E: Libraries
| Pattern | Severity | Fix |
|---|---|---|
| JAR libraries in notebooks | Blocker | Compile as JAR job task (Scala 2.13, JDK 17, env version 4+) |
| Maven coordinates | Blocker | Replace with PyPI packages in Environments |
| Warning | Pin versions: |
| Custom Spark data sources (v1/v2 JARs) | Blocker | Use Lakehouse Federation, Lakeflow Connect, or PySpark custom data sources |
| LZO format files | Blocker | Convert to Parquet or Delta |
Category F: Networking
| Pattern | Severity | Fix |
|---|---|---|
| VPC peering configuration | Blocker | Create NCCs, get stable IPs, allowlist on resource firewalls. S3 same-region access works without changes. |
| Direct S3/ADLS access without UC | Warning | Use UC external locations |
Category G: Sizing & Debugging
| Pattern | Severity | Fix |
|---|---|---|
| Large driver memory configs | Info | Serverless REPL default is 8GB (high-memory option for 16GB+ via Environments) |
| Spark UI references | Info | Use Query Profile instead: click "See performance" under cell output |
Required Output: Serverless Environment Specification
The migration output MUST include a Serverless Environment specification alongside migrated code. Generate this by:
- Scanning all `import` statements and `%pip install` lines to detect required packages
- Extracting init script `pip install` commands from the job configuration
- Producing a JSON block suitable for the Jobs API `environments` field:

```json
{
  "environment_key": "Default",
  "spec": {
    "client": "2",
    "dependencies": ["mlflow==2.12.1", "scikit-learn==1.3.0", "xgboost==2.0.3"]
  }
}
```

Important: ML runtime libraries (mlflow, scikit-learn, hyperopt, xgboost, tensorflow, torch, etc.) are NOT pre-installed on serverless compute. They MUST be listed explicitly in the environment spec `dependencies`. ML runtime is NOT available on serverless — always use Serverless Environments with explicit package dependencies instead.
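A minimal sketch of that generation step, assuming the notebook has been exported as plain Python source; the file path, stdlib exclusion list, and import-to-package mapping are simplifications you would extend in practice:

```python
import json
import re
from pathlib import Path

source = Path("notebook_source.py").read_text()  # placeholder path to exported notebook source

# %pip install lines carry exact requirement strings — keep them verbatim
deps = [req for line in re.findall(r"%pip\s+install\s+(.+)", source) for req in line.split()]

# Top-level imports hint at extra packages; import names may need mapping to
# PyPI distribution names (e.g. "sklearn" -> "scikit-learn")
imports = set(re.findall(r"^(?:import|from)\s+(\w+)", source, flags=re.MULTILINE))
stdlib_or_builtin = {"os", "re", "json", "sys", "datetime", "pathlib", "pyspark"}  # never declare pyspark
deps += sorted(imports - stdlib_or_builtin)

env_spec = {
    "environment_key": "Default",
    "spec": {"client": "2", "dependencies": sorted(set(deps))},
}
print(json.dumps(env_spec, indent=2))
```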
Step 3: Test — Two-Branch Strategy
Use separate branches for testing and production to keep test-only workarounds out of the code that ships. The test branch is a safe sandbox for experimentation; the production branch contains only changes that production actually needs.
| Aspect | Test branch | Production branch |
|---|---|---|
| Name pattern | `serverless-test-{job_name}-{timestamp}` | `serverless-prod-{job_name}` |
| Base branch | Any working branch | Must be master |
| Purpose | Verify serverless compatibility | Deploy to production |
| Test-only workarounds | Yes (catalog overrides, sampled data, date limits) | No |
| Compatibility fixes | Yes (discover them here) | Yes (apply the validated ones) |
| Job config changes | Yes (for the test job) | Yes (for the prod job) |
| Catalog | Test catalog | Production catalog |
| PR required | No | Yes |
| Merged to master | No | Yes |
Test branch (`serverless-test-{job_name}-{timestamp}`): Temporary, no PR needed.
- Create a branch from your current working branch
- Set up test data: create sampled copies of upstream tables in a test catalog using job lineage (see test data setup below)
- Parameterize the catalog so the notebook works with both test and production data (see catalog parameterization pattern below)
- Apply all compatibility fixes discovered in Step 2
- Create a serverless test job and run it
- If it fails, get the error output, debug, fix, and retry
- Document which changes are test workarounds vs. real compatibility fixes
Production branch (`serverless-prod-{job_name}`): PR required, created from master.
- Create a new branch from master (NOT from the test branch)
- Apply ONLY the real compatibility fixes — no test workarounds
- Apply job config changes (see job config transformation below)
- Commit and create a PR
Test Data Setup
When the job reads from production tables, do not point the test job at production data. Instead, create sampled copies of upstream tables in a dedicated test catalog and run the test job against those.
The recommended pattern:
- Resolve the job's upstream tables from its lineage (or from a static scan of the notebook)
- For each upstream table, run `CREATE TABLE IF NOT EXISTS <test_catalog>.<schema>.<table> AS SELECT * FROM <prod_catalog>.<schema>.<table> LIMIT N` (typical N: 10–1000 rows)
- Keep the schema names identical to production — only the catalog changes
- Make the operation idempotent: skip tables that already exist, so the setup step is safe to re-run
- Require a running SQL warehouse and `CREATE TABLE` permission on the test catalog
With schema names preserved, the same notebook code runs in both environments — only the `catalog` widget value changes. A minimal setup sketch follows.
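The sketch below assumes a notebook or SQL warehouse context where `spark` is available and the caller has `CREATE TABLE` on the test catalog; the catalog names and table list are placeholders:

```python
prod_catalog, test_catalog, sample_rows = "main", "test_catalog", 1000
upstream_tables = ["sales.orders", "sales.customers"]  # from job lineage or a static scan

for table in upstream_tables:
    schema = table.split(".")[0]
    spark.sql(f"CREATE SCHEMA IF NOT EXISTS {test_catalog}.{schema}")
    # IF NOT EXISTS skips tables that already exist, so the step is safe to re-run
    spark.sql(
        f"CREATE TABLE IF NOT EXISTS {test_catalog}.{table} "
        f"AS SELECT * FROM {prod_catalog}.{table} LIMIT {sample_rows}"
    )
```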
Decision Tree: Should This Change Go to Production?
| Change type | Production? | Reason |
|---|---|---|
| Remove incompatible Spark configs | Yes | Serverless compatibility fix |
| Update library versions | Yes | Serverless compatibility fix |
| Replace DBFS paths with UC Volumes | Yes | Serverless compatibility fix |
| Remove init scripts, add Environments | Yes | Serverless compatibility fix |
| Fix hardcoded cluster settings | Yes | Serverless compatibility fix |
| Catalog override to test catalog | No | Test workaround only |
| Empty DataFrame handling for missing test data | No | Test workaround only |
| Date range limiting for faster tests | No | Test workaround only |
Simple test: Would production fail without this change on serverless? If yes → include. If no → test branch only.
A/B Comparison
After both branches are ready, compare outputs:

```python
# Compare outputs between classic and serverless runs
classic_df = spark.read.table("main.output.classic_results")
serverless_df = spark.read.table("main.output.serverless_results")
assert classic_df.count() == serverless_df.count(), "Row count mismatch"
assert classic_df.schema == serverless_df.schema, "Schema mismatch"
diff = classic_df.exceptAll(serverless_df)
assert diff.count() == 0, f"Found {diff.count()} differing rows"
```

**Temporary bridge configs**: If the serverless run fails, you may temporarily set supported Spark configs (like `spark.sql.shuffle.partitions`) to bridge gaps. Mark these as temporary — remove once the workload stabilizes.
Step 4: Validate — Confirm and Monitor
Once the A/B comparison passes:
- Merge the production branch PR
- Switch the production job to serverless compute
- Monitor cost via system tables (`system.billing.usage`) and budget policies (see the query sketch after this list)
- Remove any temporary bridge configurations
- Set up budget alerts for cost visibility
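A hedged query sketch against the billing system table; the `usage_metadata.job_id` filter and exact column names should be verified against the current system table schema in your workspace:

```python
job_id = "123456789012345"  # the migrated job's ID (placeholder)
spark.sql(f"""
    SELECT usage_date, sku_name, SUM(usage_quantity) AS dbus
    FROM system.billing.usage
    WHERE usage_metadata.job_id = '{job_id}'
      AND usage_date >= current_date() - INTERVAL 30 DAYS
    GROUP BY usage_date, sku_name
    ORDER BY usage_date
""").show()
```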
Migration Deliverables
At the end of a successful migration run, surface these artifacts so the user can verify the work and inspect the results:
| Deliverable | What it is | Why it matters |
|---|---|---|
| Test branch name/URL | The `serverless-test-*` branch with all compatibility fixes plus test-only workarounds | Lets the user see what changed during experimentation, including test-only adjustments |
| Production branch name/URL | The `serverless-prod-*` branch with only the validated compatibility fixes | This is what ships — the user reviews and merges the PR from here |
| Test job ID and run URL | The serverless test job that validated the migration | Proves the notebook runs successfully on serverless against sampled data |
| Classic vs serverless comparison | A/B result summary (row counts, schema check, row-level diff) | Confidence that serverless output matches classic output |
| Serverless environment spec | The | Ready to paste into the production job config |
| Change summary | List of what went to production vs test-only (with reasons) | Audit trail for the PR reviewer |
If any deliverable is missing, the migration is incomplete — do not mark it as done.
Stopping Conditions
Do not attempt workarounds for these — surface them to the user and stop:
- Permission failures on source tables, the test catalog, or the workspace
- Category 3 blockers (R code, custom Spark data source JARs, features that require classic compute)
- SQL warehouse or test catalog not available
- Repeated failures (typically 5+) with no new information in the error trace — generate a failure report instead (see Failure Reporting Protocol)
Quick Fixes Reference
Replace DBFS paths with UC Volumes
```python
# BEFORE (classic)
df = spark.read.csv("dbfs:/mnt/datalake/sales/data.csv", header=True)
df.write.parquet("dbfs:/mnt/output/results")

# AFTER (serverless)
df = spark.read.csv("/Volumes/main/sales/raw_data/data.csv", header=True)
df.write.parquet("/Volumes/main/analytics/output/results")

# Replace mounts with external volumes (SQL):
#   CREATE EXTERNAL VOLUME main.data.raw_files LOCATION 's3://my-bucket/data/';
# Then use: /Volumes/main/data/raw_files/

# Pandas paths too:
# BEFORE: pd.read_csv("/dbfs/mnt/data/file.csv")
# AFTER:  pd.read_csv("/Volumes/main/data/volume/file.csv")
```
Replace RDD operations with DataFrames
```python
from pyspark.sql import functions as F

# parallelize + map
# BEFORE:
rdd = sc.parallelize([1, 2, 3])
result = rdd.map(lambda x: x * 2).collect()
# AFTER:
df = spark.createDataFrame([(1,), (2,), (3,)], ["value"])
result = df.select((F.col("value") * 2).alias("value")).collect()

# flatMap (word splitting)
# BEFORE:
words = sc.parallelize(["hello world"]).flatMap(lambda l: l.split(" ")).collect()
# AFTER:
df = spark.createDataFrame([("hello world",)], ["line"])
words = df.select(F.explode(F.split("line", " ")).alias("word")).collect()

# groupByKey
# BEFORE:
rdd = sc.parallelize([("a", 1), ("b", 2), ("a", 3)])
grouped = rdd.groupByKey().mapValues(list).collect()
# AFTER:
df = spark.createDataFrame([("a", 1), ("b", 2), ("a", 3)], ["key", "value"])
grouped = df.groupBy("key").agg(F.collect_list("value").alias("values")).collect()

# mapPartitions → applyInPandas
# BEFORE:
def process_partition(iterator):
    yield sum(iterator)
result = sc.parallelize(range(100), 4).mapPartitions(process_partition).collect()
# AFTER:
import pandas as pd
def process_group(pdf: pd.DataFrame) -> pd.DataFrame:
    return pd.DataFrame({"total": [pdf["id"].sum()]})
result = (spark.range(100).repartition(4)
    .groupBy(F.spark_partition_id())
    .applyInPandas(process_group, schema="total long")
    .collect())

# textFile
# BEFORE: rdd = sc.textFile("/mnt/data/file.txt")
# AFTER:  df = spark.read.text("/Volumes/catalog/schema/volume/file.txt")

# wholeTextFiles
# BEFORE: rdd = sc.wholeTextFiles("/mnt/data/dir/")
# AFTER:  df = spark.read.format("binaryFile").load("/Volumes/catalog/schema/volume/dir/")
```
Fix streaming triggers
```python
# CRITICAL: Omitting .trigger() defaults to ProcessingTime(0) — not supported on serverless

# BEFORE (fails on serverless — no trigger = ProcessingTime default):
query = df.writeStream.format("delta").outputMode("append").start(path)

# BEFORE (fails — explicit ProcessingTime):
query = df.writeStream.trigger(processingTime="10 seconds").start(path)

# AFTER (serverless compatible):
query = (df.writeStream
    .format("delta")
    .outputMode("append")
    .trigger(availableNow=True)
    .option("checkpointLocation", "/Volumes/main/data/checkpoints/stream1")
    .start("/Volumes/main/data/output/stream1"))
query.awaitTermination()

# With OOM prevention (recommended for large sources):
query = (spark.readStream.format("delta")
    .option("maxFilesPerTrigger", 100)       # Delta/Parquet sources
    .option("maxBytesPerTrigger", "10g")     # Limit data per micro-batch
    .load(input_path)
    .writeStream
    .trigger(availableNow=True)
    .option("checkpointLocation", checkpoint_path)
    .start(output_path))

# Kafka: use maxOffsetsPerTrigger
query = (spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "topic1")
    .option("maxOffsetsPerTrigger", 100000)  # Kafka-specific
    .load()
    .writeStream.trigger(availableNow=True).start(output_path))

# Auto Loader: use cloudFiles.maxFilesPerTrigger (note the prefix)
query = (spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.maxFilesPerTrigger", 1000)  # cloudFiles. prefix
    .load(landing_path)
    .writeStream.trigger(availableNow=True).start(output_path))
```
Remove caching
```python
# BEFORE (classic):
df = spark.read.parquet(path)
df.cache()
df.count()  # materialize cache
result1 = df.filter("status = 'active'")
result2 = df.groupBy("region").agg(F.sum("revenue"))

# AFTER (serverless — remove .cache(); native support coming soon):
df = spark.read.parquet(path)
result1 = df.filter("status = 'active'")
result2 = df.groupBy("region").agg(F.sum("revenue"))

# For truly expensive intermediate results, materialize to Delta:
expensive_df.write.format("delta").mode("overwrite").saveAsTable("main.scratch.intermediate")
result = spark.table("main.scratch.intermediate")

# SQL equivalent:
# BEFORE: CACHE TABLE my_table
# AFTER:  (just remove the CACHE TABLE statement)
```
Other quick fixes
| Pattern | Fix | Full example |
|---|---|---|
| Use SparkSession equivalents: | Code Patterns |
| Init scripts | Move to Environment panel or | Code Patterns |
| Hive Metastore tables | Use HMS Federation as bridge ( | Code Patterns |
| Custom JDBC JARs | Use Lakehouse Federation ( | Code Patterns |
| Spark UI debugging | Use Query Profile: click "See performance" under cell output, or | Code Patterns |
Detect serverless at runtime
```python
import os
is_serverless = os.getenv("IS_SERVERLESS", "").lower() == "true"
```
Transform job config from classic to serverless
Remove `job_clusters`/`new_cluster`, add `environments` with the serverless spec, replace `job_cluster_key` with `environment_key`, remove `init_scripts`. See Configuration Guide for full before/after JSON and environment version mapping.
Environment version mapping (match to the DBR version the workload was on):
| Classic DBR | Serverless | Python |
|---|---|---|
| 13.x, 14.x | | 3.10 |
| 15.x | | 3.11 |
| 16.x+ | | 3.12 |
Job Definition Migration
When migrating a job, the job configuration JSON must be transformed alongside notebook code. The agent should perform all of the following:
Init scripts to Serverless Environments: Detect `init_scripts` in the job JSON. Extract all `pip install` commands and convert them to Environment `dependencies`. For OS-level packages (`apt install`/`yum install`) that have pip equivalents (e.g., `apt install python3-opencv` becomes `opencv-python`), convert them. Flag OS-level packages without pip equivalents as serverless-incompatible (Category 3).
Cluster libraries (Maven/JAR) to Environment or Volumes: Maven coordinates for Python-wrapping JARs should be replaced with their PyPI equivalent in the Environment spec. Custom JARs on DBFS need to be moved to `/Volumes/<your_catalog>/schema/volume/` and referenced there. Custom Spark data source JARs (v1/v2) are a Category 3 blocker — flag them for classic retention.
`job_clusters` to serverless compute: Remove `job_clusters`/`new_cluster` blocks entirely. Add an `environments` array with the serverless spec. Replace `job_cluster_key` in each task with `environment_key`. Remove `init_scripts`, `num_workers`, `node_type_id`, `spark_version`. See Configuration Guide for a complete before/after example.
`spark_conf` migration: Scan all `spark.conf.set(...)` calls in the notebook and `spark_conf` entries in the job JSON. For each:
- Supported (keep): `spark.sql.shuffle.partitions`, `spark.sql.session.timeZone`, `spark.sql.ansi.enabled`, `spark.sql.files.maxPartitionBytes`, `spark.sql.legacy.timeParserPolicy`, `spark.databricks.execution.timeout`
- Auto-tuned (remove with comment): AQE configs, Delta auto-compact, executor/driver sizing, parallelism configs
- Credential configs (remove): `fs.s3a.*`, `fs.azure.*` — replaced by UC external locations
- Add a code comment at each removal explaining why: `# Removed: auto-tuned on serverless` or `# Removed: use UC external locations instead`
A minimal triage sketch for the `spark_conf` step follows.
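The sketch below buckets `spark_conf` entries into keep/remove groups; the supported-config list comes from the bullets above, while the credential-prefix check is a simplification:

```python
SUPPORTED = {
    "spark.sql.shuffle.partitions",
    "spark.sql.session.timeZone",
    "spark.sql.ansi.enabled",
    "spark.sql.files.maxPartitionBytes",
    "spark.sql.legacy.timeParserPolicy",
    "spark.databricks.execution.timeout",
}

def triage_spark_conf(spark_conf: dict) -> dict:
    """Split classic-job spark_conf entries into keep / remove buckets."""
    buckets = {"keep": {}, "remove_credentials": {}, "remove_auto_tuned": {}}
    for key, value in spark_conf.items():
        if key in SUPPORTED:
            buckets["keep"][key] = value
        elif key.startswith(("fs.s3a.", "fs.azure.")):
            buckets["remove_credentials"][key] = value   # use UC external locations instead
        else:
            buckets["remove_auto_tuned"][key] = value    # serverless auto-tunes these
    return buckets

print(triage_spark_conf({
    "spark.sql.shuffle.partitions": "200",
    "spark.sql.adaptive.enabled": "true",
    "fs.s3a.access.key": "<redacted>",
}))
```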
Parameterize catalogs for testing
```python
dbutils.widgets.text("catalog", "main")  # Default to production
catalog = dbutils.widgets.get("catalog")
df = spark.table(f"{catalog}.sales.orders")

# Pass catalog="test_catalog" as a job parameter during testing
```
See [Configuration Guide](references/configuration-guide.md) for mock table catalog mapping and test job creation patterns.
Debug failed serverless runs
Always get the actual error with `w.jobs.get_run_output(run_id=...)` before guessing. Common errors:
| Error | Fix |
|---|---|
| Add |
| Temp view name collision — use unique names |
| DBFS/HMS table not accessible — migrate to UC |
| SparkContext/RDD used — rewrite to DataFrame |
| Package installation timeout | Pin versions; do NOT install PySpark as a dependency |
| Add to environment spec |
| Replace with |
| Use managed tables or |
| Add |
| Category 3 blocker — custom JAR data source needs classic compute |
| Ensure comments are inside MAGIC blocks, not straddling cell delimiters |
See Configuration Guide for the full error reference and SDK code examples.
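As a hedged example of pulling that error with the Python SDK (assumes `databricks-sdk` is installed, authentication is configured per the parent skill, and `run_id` holds the ID of the failed serverless test run):

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
run = w.jobs.get_run(run_id=run_id)              # run_id: ID of the failed serverless run
for task in run.tasks or []:
    output = w.jobs.get_run_output(run_id=task.run_id)
    if output.error:
        print(f"task={task.task_key}: {output.error}")
        if output.error_trace:
            print(output.error_trace)
```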
Performance Mode Selection
| Criteria | Performance-Optimized | Standard |
|---|---|---|
| Startup time | <50 seconds | 4-6 minutes |
| Cost | Higher | Significantly lower |
| Available for | Notebooks, Jobs, SDP | Jobs and SDP only |
| Best for | Interactive work, dev, time-sensitive | Batch ETL, scheduled pipelines |
| Default | Yes (UI and API) | Must be explicitly selected |
Standard mode is NOT available for notebooks. Notebooks always use Performance-Optimized.
Serverless Defaults to Know
| Setting | Value |
|---|---|
| REPL VM memory | 8GB default (high-memory option available) |
| Max executors | 32 (Premium), 64 (Enterprise) — raise via support |
| Supported Spark configs | 6 only (see Category D above) |
| Debugging | Query Profile (no Spark UI) |
| ANSI SQL | Enabled by default (configurable) |
Failure Reporting Protocol
When migration fails irrecoverably, generate a structured failure report to help improve the skill. This applies when:
- All retry attempts are exhausted (typically 5)
- An unknown pattern is encountered that isn't in the compatibility checks
- A fix was applied but didn't resolve the underlying issue
- The workload hits a Category 3 blocker the user wasn't aware of
When to generate a report
Generate a report at the end of a migration attempt if any of:
- `retry_count >= max_retries` and final status is FAILED
- A pattern was detected but no fix is available in the skill
- The user explicitly requests a failure report (`/migration-report`)
How to generate
Write a JSON file to `~/.databricks-migration-skill/reports/failure-<ISO-timestamp>.json`. Create the directory if it doesn't exist.
Schema (strictly follow — no free-text code or identifiers):
```json
{
"report_version": "1.0",
"report_id": "<uuid-v4>",
"skill_version": "<from SKILL.md frontmatter metadata.version>",
"timestamp": "<ISO 8601 UTC>",
"failure_phase": "analyze | migrate | test | validate",
"detected_patterns": [
{"category": "A", "pattern_id": "rdd_parallelize", "count": 3}
],
"attempted_fixes": [
{"pattern_id": "rdd_parallelize", "fix_applied": "<fix_id>", "attempt_number": 1, "outcome": "failed"}
],
"final_error_category": "unknown_api | missing_library | data_access | permission | custom_data_source | other",
"final_error_signature": "<SHA256 of top 3 stack frames, NOT the frames themselves>",
"retry_count": 5,
"total_duration_seconds": 245,
"notebook_characteristics": {
"lines_of_code": 180,
"language": "python | sql | scala | r",
"uses_streaming": false,
"uses_ml_libraries": true,
"databricks_runtime_source": "<DBR version only, no cluster identifiers>"
}
}
```
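A minimal sketch of writing a report that conforms to this schema; every value below is a placeholder the real flow would collect during the failed attempt:

```python
import hashlib
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path

top_frames = ["frame_1", "frame_2", "frame_3"]   # top 3 stack frames, hashed — never stored raw
timestamp = datetime.now(timezone.utc).isoformat()

report = {
    "report_version": "1.0",
    "report_id": str(uuid.uuid4()),
    "skill_version": "0.0.0",                    # read from SKILL.md frontmatter in practice
    "timestamp": timestamp,
    "failure_phase": "test",
    "detected_patterns": [{"category": "A", "pattern_id": "rdd_parallelize", "count": 3}],
    "attempted_fixes": [],
    "final_error_category": "unknown_api",
    "final_error_signature": hashlib.sha256("\n".join(top_frames).encode()).hexdigest(),
    "retry_count": 5,
    "total_duration_seconds": 245,
    "notebook_characteristics": {
        "lines_of_code": 180,
        "language": "python",
        "uses_streaming": False,
        "uses_ml_libraries": True,
        "databricks_runtime_source": "14.3",
    },
}

out_dir = Path.home() / ".databricks-migration-skill" / "reports"
out_dir.mkdir(parents=True, exist_ok=True)
(out_dir / f"failure-{timestamp}.json").write_text(json.dumps(report, indent=2))
```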
What the report MUST NOT contain
This is a hard requirement — the report must be safe to share publicly on GitHub Issues:
- No code content — only pattern IDs from this skill's catalog (e.g., `rdd_parallelize`), never actual code snippets
- No file paths — no notebook names, directory paths, or workspace URLs
- No error message text — only the error category enum and a hashed signature
- No identifiers — no table names, column names, catalog names, schema names, user emails, workspace IDs, or customer names
- No credentials — no secret scope names, API keys, or connection strings
- No data descriptions — no column value samples, row counts tied to specific tables, or data shape details beyond the `notebook_characteristics` fields
After generating the report
Tell the user:
Migration failed after <N> attempts. A failure report has been generated at:
~/.databricks-migration-skill/reports/failure-<timestamp>.json
This report contains anonymized diagnostic data (detected patterns, error categories, retry count) and no code content or PII. You can:
1. Review the JSON to confirm no sensitive information is present
2. Share it via GitHub Issue to help improve the skill:
https://github.com/databricks/databricks-agent-skills/issues/new?template=migration-feedback.md
Submission is optional and opt-in. We use reports to prioritize new patterns and fix detection gaps.
Never transmit the report automatically. The user owns their data and must review before sharing.
Reference Guides
For detailed workarounds and code examples beyond the quick fixes above:
- Compatibility Checks — Full pattern detection table with all 40+ checks
- Streaming Migration — Trigger migration, SDP continuous mode, continuous jobs
- Networking and Security — VPC peering to NCCs, Private Link, firewall setup
- Code Patterns — Complete before/after code examples for every migration pattern
- Configuration Guide — Supported Spark configs, Environments setup, budget policies
Documentation
- Serverless compute overview: https://docs.databricks.com/en/compute/serverless/
- Migration guide: https://docs.databricks.com/en/compute/serverless/migration
- Limitations: https://docs.databricks.com/en/compute/serverless/limitations
- Best practices: https://docs.databricks.com/en/compute/serverless/best-practices
- Serverless notebooks: https://docs.databricks.com/en/compute/serverless/notebooks
- Serverless jobs: https://docs.databricks.com/en/jobs/run-serverless-jobs
- Serverless SDP: https://docs.databricks.com/en/ldp/serverless
- Spark Connect vs. classic: https://docs.databricks.com/en/spark/connect-vs-classic
- Unity Catalog upgrade: https://docs.databricks.com/en/data-governance/unity-catalog/upgrade/
- Supported Spark configs: https://docs.databricks.com/en/spark/conf#serverless