spark-authoring-cli

Update Check — ONCE PER SESSION (mandatory). The first time this skill is used in a session, run the check-updates skill before proceeding.
  • GitHub Copilot CLI / VS Code: invoke the check-updates skill.
  • Claude Code / Cowork / Cursor / Windsurf / Codex: compare local vs remote package.json version (a hedged sketch follows this list).
  • Skip if the check was already performed earlier in this session.
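A minimal sketch of the package.json comparison, assuming the skill was installed from a Git repo; the local path, org, and repo below are hypothetical placeholders, not part of this skill:

```bash
# Hypothetical install path and repo URL; substitute your actual locations.
local_version=$(jq -r '.version' ~/.skills/spark-authoring-cli/package.json)
remote_version=$(curl -s "https://raw.githubusercontent.com/<org>/<repo>/main/package.json" | jq -r '.version')
if [ "$local_version" != "$remote_version" ]; then
  echo "Update available: $local_version -> $remote_version"
fi
```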
CRITICAL NOTES
  1. To find workspace details (including its ID) from a workspace name: list all workspaces, then filter with JMESPath
  2. To find item details (including its ID) from a workspace ID, item type, and item name: list all items of that type in the workspace, then filter with JMESPath (a sketch of both lookups follows)
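A minimal sketch of both lookups using `az rest` with a JMESPath `--query`; the workspace and notebook display names are placeholders:

```bash
# Find a workspace ID by display name (name is a placeholder).
workspace_id=$(az rest --method get --resource "https://api.fabric.microsoft.com" \
  --url "https://api.fabric.microsoft.com/v1/workspaces" \
  --query "value[?displayName=='DataEng-Dev'].id | [0]" --output tsv)

# Find an item ID by type + name within that workspace.
notebook_id=$(az rest --method get --resource "https://api.fabric.microsoft.com" \
  --url "https://api.fabric.microsoft.com/v1/workspaces/$workspace_id/items?type=Notebook" \
  --query "value[?displayName=='MyNotebook'].id | [0]" --output tsv)
```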

Spark Authoring — CLI Skill


Table of Contents


| Task | Reference | Notes |
| --- | --- | --- |
| RULES — Read these first, follow them always | SKILL.md § RULES | MUST read — 3 rules for this skill |
| Finding Workspaces and Items in Fabric | COMMON-CLI.md § Finding Workspaces and Items in Fabric | Mandatory — READ link first [needed for finding workspace id by its name, or item id by its name, item type, and workspace id] |
| Fabric Topology & Key Concepts | COMMON-CORE.md § Fabric Topology & Key Concepts | |
| Environment URLs | COMMON-CORE.md § Environment URLs | |
| Authentication & Token Acquisition | COMMON-CORE.md § Authentication & Token Acquisition | Wrong audience = 401; read before any auth issue |
| Core Control-Plane REST APIs | COMMON-CORE.md § Core Control-Plane REST APIs | |
| Pagination | COMMON-CORE.md § Pagination | |
| Long-Running Operations (LRO) | COMMON-CORE.md § Long-Running Operations (LRO) | |
| Rate Limiting & Throttling | COMMON-CORE.md § Rate Limiting & Throttling | |
| OneLake Data Access | COMMON-CORE.md § OneLake Data Access | Requires `storage.azure.com` token, not Fabric token |
| Definition Envelope | ITEM-DEFINITIONS-CORE.md § Definition Envelope | Definition payload structure |
| Per-Item-Type Definitions | ITEM-DEFINITIONS-CORE.md § Per-Item-Type Definitions | Support matrix, decoded content, part paths — REST specs, CLI recipes |
| Job Execution | COMMON-CORE.md § Job Execution | |
| Capacity Management | COMMON-CORE.md § Capacity Management | |
| Gotchas & Troubleshooting | COMMON-CORE.md § Gotchas & Troubleshooting | |
| Best Practices | COMMON-CORE.md § Best Practices | |
| Tool Selection Rationale | COMMON-CLI.md § Tool Selection Rationale | |
| Authentication Recipes | COMMON-CLI.md § Authentication Recipes | `az login` flows and token acquisition |
| Fabric Control-Plane API via `az rest` | COMMON-CLI.md § Fabric Control-Plane API via az rest | Always pass `--resource https://api.fabric.microsoft.com` or `az rest` fails |
| Pagination Pattern | COMMON-CLI.md § Pagination Pattern | |
| Long-Running Operations (LRO) Pattern | COMMON-CLI.md § Long-Running Operations (LRO) Pattern | |
| OneLake Data Access via `curl` | COMMON-CLI.md § OneLake Data Access via curl | Use `curl`, not `az rest` (different token audience) |
| SQL / TDS Data-Plane Access | COMMON-CLI.md § SQL / TDS Data-Plane Access | |
| Job Execution (CLI) | COMMON-CLI.md § Job Execution | |
| Job Scheduling | COMMON-CLI.md § Job Scheduling | URL is `/jobs/{jobType}/schedules`; `endDateTime` required |
| OneLake Shortcuts | COMMON-CLI.md § OneLake Shortcuts | |
| Capacity Management (CLI) | COMMON-CLI.md § Capacity Management | |
| Composite Recipes | COMMON-CLI.md § Composite Recipes | |
| Gotchas & Troubleshooting (CLI-Specific) | COMMON-CLI.md § Gotchas & Troubleshooting (CLI-Specific) | `az rest` audience, shell escaping, token expiry |
| Quick Reference: `az rest` Template | COMMON-CLI.md § Quick Reference: az rest Template | |
| Quick Reference: Token Audience / CLI Tool Matrix | COMMON-CLI.md § Quick Reference: Token Audience ↔ CLI Tool Matrix | Which `--resource` + tool for each service |
| Relationship to SPARK-CONSUMPTION-CORE.md | SPARK-AUTHORING-CORE.md § Relationship to SPARK-CONSUMPTION-CORE.md | |
| Data Engineering Authoring Capability Matrix | SPARK-AUTHORING-CORE.md § Data Engineering Authoring Capability Matrix | |
| Lakehouse Management | SPARK-AUTHORING-CORE.md § Lakehouse Management | |
| Notebook Management | SPARK-AUTHORING-CORE.md § Notebook Management | |
| Notebook Execution & Job Management | SPARK-AUTHORING-CORE.md § Notebook Execution & Job Management | |
| CI/CD & Automation Patterns | SPARK-AUTHORING-CORE.md § CI/CD & Automation Patterns | |
| Infrastructure-as-Code | SPARK-AUTHORING-CORE.md § Infrastructure-as-Code | |
| Performance Optimization & Resource Management | SPARK-AUTHORING-CORE.md § Performance Optimization & Resource Management | |
| Authoring Gotchas and Troubleshooting | SPARK-AUTHORING-CORE.md § Authoring Gotchas and Troubleshooting | |
| Quick Reference: Authoring Decision Guide | SPARK-AUTHORING-CORE.md § Quick Reference: Authoring Decision Guide | |
| Recommended Patterns (Data Engineering) | data-engineering-patterns.md § Recommended patterns | |
| Data Ingestion Principles | data-engineering-patterns.md § Data Ingestion Principles | |
| Transformation Patterns | data-engineering-patterns.md § Transformation Patterns | |
| Delta Lake Best Practices | data-engineering-patterns.md § Delta Lake Best Practices | |
| Quality Assurance Strategies | data-engineering-patterns.md § Quality Assurance Strategies | |
| Recommended Patterns (Development Workflow) | development-workflow.md § Recommended patterns | |
| Notebook Lifecycle | development-workflow.md § Notebook Lifecycle | |
| Parameterization Patterns | development-workflow.md § Parameterization Patterns | |
| Variable Library (notebook + pipeline usage) | development-workflow.md § Method 4: Variable Library | `getLibrary()` + dot notation in notebooks; `libraryVariables` + `@pipeline().libraryVariables` in pipelines |
| Variable Library Definition | ITEM-DEFINITIONS-CORE.md § VariableLibrary | Definition parts, decoded content, types, pipeline mappings, gotchas |
| Local Testing Strategy | development-workflow.md § Local Testing Strategy | |
| Debugging Patterns | development-workflow.md § Debugging Patterns | |
| Recommended Patterns (Infrastructure) | infrastructure-orchestration.md § Recommended patterns | |
| Workspace Provisioning Principles | infrastructure-orchestration.md § Workspace Provisioning Principles | |
| Lakehouse Configuration Guidance | infrastructure-orchestration.md § Lakehouse Configuration Guidance | |
| Pipeline Design Patterns | infrastructure-orchestration.md § Pipeline Design Patterns | |
| CI/CD Integration Strategy | infrastructure-orchestration.md § CI/CD Integration Strategy | |
| Notebook API — Which Endpoint to Use | notebook-api-operations.md § Quick Decision | Start here for remote notebook edits — getDefinition vs updateDefinition |
| Notebook Modification Workflow | notebook-api-operations.md § Workflow | Five-step flow: retrieve, decode, modify, encode, upload |
| Notebook API Error Reference | notebook-api-operations.md § Error Reference | 411, 400 (updateMetadata), 401, 403 explained |
| Notebook API Gotchas | notebook-api-operations.md § Gotchas | `/result` suffix, empty body, `\n` per-line rule, `format=ipynb` |
| Default Lakehouse Binding | notebook-api-operations.md § Default Lakehouse Binding | `.ipynb` metadata vs `.py` `# METADATA` block; discover IDs dynamically |
| Public URL Data Ingestion | notebook-api-operations.md § Public URL Data Ingestion | Use real source URL, stage into `Files/`, then read with Spark |
| getDefinition (read notebook content) | notebook-api-operations.md § Step 1 — Retrieve Notebook Content | LRO flow, `?format=ipynb`, empty body (`--body '{}'`) requirement |
| Decode Base64 Notebook Payload | notebook-api-operations.md § Step 2 — Decode the Notebook Content | Extract payload, base64 decode, ipynb JSON structure |
| Modify Notebook Cells | notebook-api-operations.md § Step 3 — Modify the Notebook Content | Find cell, insert/replace lines, `\n` per-line rule |
| updateDefinition (write notebook content) | notebook-api-operations.md § Step 4 — Re-encode and Upload | Re-encode, upload, LRO poll, updateMetadata flag pitfall |
| Verify Notebook Update (Optional) | notebook-api-operations.md § Step 5 — Verify the Update | Skip unless you suspect a silent failure — `Succeeded` from updateDefinition is sufficient (see Rule 2) |
| Notebook API Error Reference | notebook-api-operations.md § Error Reference | 411, 400 (updateMetadata), 401, 403 explained |
| Notebook API End-to-End Script | notebook-api-operations.md § Complete End-to-End Script | Full bash: get → decode → modify → encode → update → verify |
| Quick Start Examples | SKILL.md § Quick Start Examples | Minimal examples for common operations |


Must/Prefer/Avoid


MUST DO


  • Check for recent jobs BEFORE creating new notebook runs — Query job instances from the last 5 minutes; if a recent job exists, monitor it instead of creating a duplicate
  • Capture the job instance ID immediately after POST — Store the job ID before any other operations to enable proper monitoring
  • Verify workspace capacity assignment before operations — The workspace must have a capacity assigned and active
  • When the user provides a public data URL, follow the Public URL Data Ingestion policy — keep detailed behavior in the linked resource section to avoid drift/duplication
  • Format notebook cells correctly — Each line in a cell's source array MUST end with `\n` to prevent code merging (see the sketch after this list)
  • Use the correct Livy session body format — Send FLAT JSON with `name`, `driverMemory`, `driverCores`, `executorMemory`, `executorCores`. Do NOT wrap it in `{"payload": ...}` or send only `{"kind": "pyspark"}` — that causes HTTP 500. Use valid memory values (28g, 56g, 112g, 224g). See the Create Livy Session example below and SPARK-CONSUMPTION-CORE.md.
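A minimal sketch of the per-line `\n` rule, assuming an ipynb-style code cell; the cell content itself is illustrative:

```python
# Each source line must carry its own trailing "\n";
# without it, consecutive lines merge into a single statement on upload.
cell = {
    "cell_type": "code",
    "source": [
        "df = spark.read.table('bronze.events')\n",  # correct: newline per line
        "df.show()",                                 # final line may omit it
    ],
    "metadata": {},
    "outputs": [],
    "execution_count": None,
}
```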

PREFER


  • Poll job status at proper intervals — 10-30 seconds between polls; time out after a reasonable duration (e.g., 30 minutes); a polling sketch follows this list
  • Check job history when the POST response is unreadable — If POST returns "No Content" or an unreadable response, query recent jobs (last 1 minute) before retrying
  • Use the Starter Pool for development — Development/testing workloads should use `useStarterPool: true`
  • Use a Workspace Pool for production — Production workloads need consistent performance with `useWorkspacePool: true`
  • Enable lakehouse schemas during creation — Set `creationPayload.enableSchemas: true` for better table organization
  • Implement idempotency checks — Prevent duplicate operations by checking existing state first
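A minimal polling sketch, assuming `$job_url` holds the job-instance URL captured from the POST response's Location header (an assumption; see SPARK-AUTHORING-CORE.md Job Monitoring for the authoritative pattern):

```bash
# Poll every 15s, give up after 30 minutes (120 attempts).
for i in $(seq 1 120); do
  status=$(az rest --method get --resource "https://api.fabric.microsoft.com" \
    --url "$job_url" --query "status" --output tsv)
  echo "Poll $i: $status"
  case "$status" in
    Completed|Failed|Cancelled) break ;;
  esac
  sleep 15
done
```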

AVOID


  • Never retry POST with the same parameters — If you have a job ID, use only GET to check status; don't create duplicate job instances
  • Don't skip capacity verification — Operations will fail if the workspace capacity is paused or unassigned
  • Avoid immediate POST retries on failures — Check for existing/active jobs first to prevent duplicates
  • Don't create new runs while monitoring an existing job — One job at a time; wait for completion before submitting new runs
  • Don't hardcode workspace/lakehouse IDs — Discover them dynamically via item listing or catalog search APIs


RULES — Read these first, follow them always


Rule 1 — Validate prerequisites before operations. Verify the workspace has a capacity assigned (see COMMON-CORE.md Create Workspace and Capacity Management) and that resource IDs exist before attempting operations.
Rule 2 — Trust updateDefinition success. A `Succeeded` poll result from `updateDefinition` is sufficient confirmation that content and lakehouse bindings persisted. Do NOT call `getDefinition` after every upload — it is an async LRO that adds significant latency. Only use `getDefinition` for its intended purpose: reading current notebook content before making modifications.
Rule 3 — Prevent duplicate jobs and monitor execution properly. Before submitting a new notebook run, ALWAYS check for recent job instances first (last 5 minutes); a sketch of that check follows these rules. If a recent job exists, monitor it instead of creating a duplicate. After submission, capture the job instance ID immediately and poll status; never retry the POST. See SPARK-AUTHORING-CORE.md Job Monitoring for patterns.
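A minimal sketch of the Rule 3 pre-check against the item job-instances endpoint; the `startTimeUtc` field name and the local cutoff computation are assumptions to verify against COMMON-CORE.md Job Execution:

```bash
# List job instances for the notebook and keep any started in the last 5 minutes.
cutoff=$(date -u -d '5 minutes ago' '+%Y-%m-%dT%H:%M:%SZ')  # GNU date; use `date -u -v-5M` on macOS
recent=$(az rest --method get --resource "https://api.fabric.microsoft.com" \
  --url "https://api.fabric.microsoft.com/v1/workspaces/$workspace_id/items/$notebook_id/jobs/instances" \
  --query "value[?startTimeUtc >= '$cutoff'].{id:id,status:status}" --output json)
echo "$recent"  # if non-empty, monitor the existing job instead of POSTing a new run
```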


Quick Start Examples


For detailed patterns, authentication, and comprehensive API usage, see:
  • COMMON-CORE.md — Fabric REST API patterns, authentication, item discovery
  • COMMON-CLI.md — `az rest` usage, environment detection, token acquisition
  • SPARK-AUTHORING-CORE.md — Notebook deployment, lakehouse creation, job execution

Below are minimal quick-start examples. *Always reference the COMMON-\* files for production use.*

Create Workspace & Lakehouse


```bash
# See COMMON-CORE.md Environment URLs and SPARK-AUTHORING-CORE.md for full patterns
cat > /tmp/body.json << 'EOF'
{"displayName": "DataEng-Dev"}
EOF
workspace_id=$(az rest --method post --resource "https://api.fabric.microsoft.com" \
  --url "https://api.fabric.microsoft.com/v1/workspaces" \
  --body @/tmp/body.json --query "id" --output tsv)

cat > /tmp/body.json << 'EOF'
{"displayName": "DevLakehouse", "type": "Lakehouse", "creationPayload": {"enableSchemas": true}}
EOF
lakehouse_id=$(az rest --method post --resource "https://api.fabric.microsoft.com" \
  --url "https://api.fabric.microsoft.com/v1/workspaces/$workspace_id/items" \
  --body @/tmp/body.json --query "id" --output tsv)
```
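Per the MUST DO list, verify capacity before operating on the new workspace. A minimal sketch, assuming the Core Workspaces `assignToCapacity` endpoint and `capacityId` response field; `$capacity_id` is a placeholder:

```bash
# A new workspace has no capacity until one is assigned; most item operations fail without it.
az rest --method post --resource "https://api.fabric.microsoft.com" \
  --url "https://api.fabric.microsoft.com/v1/workspaces/$workspace_id/assignToCapacity" \
  --body "{\"capacityId\": \"$capacity_id\"}"

# Confirm the assignment took effect.
az rest --method get --resource "https://api.fabric.microsoft.com" \
  --url "https://api.fabric.microsoft.com/v1/workspaces/$workspace_id" \
  --query "capacityId" --output tsv
```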

Organize Lakehouse Tables with Schemas


```python
# See SPARK-AUTHORING-CORE.md Lakehouse Schema Organization for table organization patterns
# Create schemas for medallion architecture
spark.sql("CREATE SCHEMA IF NOT EXISTS bronze")
spark.sql("CREATE SCHEMA IF NOT EXISTS silver")
spark.sql("CREATE SCHEMA IF NOT EXISTS gold")
```
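Once the schemas exist, tables are addressed as schema.table. A minimal usage sketch; the DataFrame `df` and the table name are illustrative:

```python
# Write a Delta table into the silver schema of the default lakehouse.
df.write.format("delta").mode("overwrite").saveAsTable("silver.customers")
spark.sql("SELECT COUNT(*) FROM silver.customers").show()
```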

Create Livy Session


```bash
# See SPARK-CONSUMPTION-CORE.md for Livy session configuration and management
# IMPORTANT: Body MUST be flat JSON with memory/cores — do NOT wrap in {"payload": ...}
cat > /tmp/body.json << 'EOF'
{"name": "dev-session", "driverMemory": "56g", "driverCores": 8, "executorMemory": "56g", "executorCores": 8, "conf": {"spark.dynamicAllocation.enabled": "true", "spark.fabric.pool.name": "Starter Pool"}}
EOF
az rest --method post --resource "https://api.fabric.microsoft.com" \
  --url "https://api.fabric.microsoft.com/v1/workspaces/$workspace_id/lakehouses/$lakehouse_id/livyapi/versions/2023-12-01/sessions" \
  --body @/tmp/body.json
```

> **Livy Session Body — Common Mistakes**
> - ❌ `{"payload": {"kind": "pyspark"}}` → HTTP 500 (wrong wrapper, missing required fields)
> - ❌ `{"kind": "pyspark"}` → HTTP 500 (missing `driverMemory`, `executorMemory`, etc.)
> - ✅ Flat JSON with `name`, `driverMemory`, `driverCores`, `executorMemory`, `executorCores` (and optionally `conf` with Starter Pool)
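After the session is running, code is submitted as Livy statements. A hedged sketch, assuming Fabric's Livy endpoint follows the standard Livy statements protocol and `$session_id` was captured from the create response; verify the exact paths in SPARK-CONSUMPTION-CORE.md:

```bash
# Submit a PySpark statement to the running session (standard Livy protocol; an assumption here).
livy_base="https://api.fabric.microsoft.com/v1/workspaces/$workspace_id/lakehouses/$lakehouse_id/livyapi/versions/2023-12-01"
cat > /tmp/stmt.json << 'EOF'
{"code": "spark.range(10).count()", "kind": "pyspark"}
EOF
az rest --method post --resource "https://api.fabric.microsoft.com" \
  --url "$livy_base/sessions/$session_id/statements" \
  --body @/tmp/stmt.json
```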

Spark Performance Configs


For detailed workload-specific configurations, see data-engineering-patterns.md Delta Lake Best Practices.
Quick reference:
  • Write-heavy (Bronze): Disable V-Order, enable autoCompact
  • Balanced (Silver): Enable V-Order, adaptive execution
  • Read-heavy (Gold): Vectorized reads, optimal parallelism
See data-engineering-patterns.md for complete config tables.
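The original quick-reference snippet did not survive extraction; below is a hedged reconstruction of typical session configs for each tier. The exact property names (especially the V-Order and autoCompact keys) are assumptions; confirm them against data-engineering-patterns.md:

```python
# Write-heavy (Bronze): V-Order off, autoCompact on (key names assumed)
spark.conf.set("spark.sql.parquet.vorder.enabled", "false")
spark.conf.set("spark.databricks.delta.autoCompact.enabled", "true")

# Balanced (Silver): V-Order on, adaptive query execution
spark.conf.set("spark.sql.parquet.vorder.enabled", "true")
spark.conf.set("spark.sql.adaptive.enabled", "true")

# Read-heavy (Gold): vectorized reads, tuned parallelism
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "true")
spark.conf.set("spark.sql.shuffle.partitions", "200")  # tune to cluster core count
```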

Variable Library in Notebooks


Use a Variable Library to centralize lakehouse names, workspace IDs, and feature flags.

```python
# ✅ CORRECT — getLibrary() + dot notation
lib = notebookutils.variableLibrary.getLibrary("MyConfig")
lakehouse_name = lib.lakehouse_name
enable_logging = lib.enable_logging  # returns string "true"/"false"

# Boolean: compare as a string (bool("false") is True in Python!)
if enable_logging.lower() == "true":
    print("Logging enabled")

# ❌ WRONG — .get() does not exist, causes runtime failure
notebookutils.variableLibrary.get("MyConfig", "lakehouse_name")
```


---

**Focus**: Essential CLI patterns for Spark/data engineering development with intelligent routing to specialized resources. For comprehensive patterns, always reference COMMON-* files and resource documents.
