spark-authoring-cli

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

Update Check — ONCE PER SESSION (mandatory) The first time this skill is used in a session, run the check-updates skill before proceeding.
GitHub Copilot CLI / VS Code: invoke the
check-updates
skill.
Claude Code / Cowork / Cursor / Windsurf / Codex: compare local vs remote package.json version.

Skip if the check was already performed earlier in this session.

CRITICAL NOTES

To find the workspace details (including its ID) from workspace name: list all workspaces and, then, use JMESPath filtering

To find the item details (including its ID) from workspace ID, item type, and item name: list all items of that type in that workspace and, then, use JMESPath filtering

更新检查 — 每个会话仅需一次（必填） 在会话中首次使用此技能时，请先运行check-updates技能，再进行后续操作。
GitHub Copilot CLI / VS Code：调用
check-updates
技能。
Claude Code / Cowork / Cursor / Windsurf / Codex：对比本地与远程package.json版本。

若本次会话中已提前完成检查，可跳过此步骤。

重要说明

从工作区名称查找工作区详情（包括其ID）：列出所有工作区，然后使用JMESPath进行筛选

从工作区ID、项目类型和项目名称查找项目详情（包括其ID）：列出该工作区中对应类型的所有项目，然后使用JMESPath进行筛选

Spark Authoring — CLI Skill

Spark 开发 — CLI技能

Task	Reference	Notes
RULES — Read these first, follow them always	SKILL.md § RULES	MUST read — 3 rules for this skill
Finding Workspaces and Items in Fabric	COMMON-CLI.md § Finding Workspaces and Items in Fabric	Mandatory — READ link first [needed for finding workspace id by its name or item id by its name, item type, and workspace id]
Fabric Topology & Key Concepts	COMMON-CORE.md § Fabric Topology & Key Concepts
Environment URLs	COMMON-CORE.md § Environment URLs
Authentication & Token Acquisition	COMMON-CORE.md § Authentication & Token Acquisition	Wrong audience = 401; read before any auth issue
Core Control-Plane REST APIs	COMMON-CORE.md § Core Control-Plane REST APIs
Pagination	COMMON-CORE.md § Pagination
Long-Running Operations (LRO)	COMMON-CORE.md § Long-Running Operations (LRO)
Rate Limiting & Throttling	COMMON-CORE.md § Rate Limiting & Throttling
OneLake Data Access	COMMON-CORE.md § OneLake Data Access	Requires `storage.azure.com` token, not Fabric token
Definition Envelope	ITEM-DEFINITIONS-CORE.md § Definition Envelope	Definition payload structure
Per-Item-Type Definitions	ITEM-DEFINITIONS-CORE.md § Per-Item-Type Definitions	Support matrix, decoded content, part paths — REST specs, CLI recipes
Job Execution	COMMON-CORE.md § Job Execution
Capacity Management	COMMON-CORE.md § Capacity Management
Gotchas & Troubleshooting	COMMON-CORE.md § Gotchas & Troubleshooting
Best Practices	COMMON-CORE.md § Best Practices
Tool Selection Rationale	COMMON-CLI.md § Tool Selection Rationale
Authentication Recipes	COMMON-CLI.md § Authentication Recipes	`az login` flows and token acquisition
Fabric Control-Plane API via `az rest`	COMMON-CLI.md § Fabric Control-Plane API via az rest	Always pass `--resource https://api.fabric.microsoft.com` or `az rest` fails
Pagination Pattern	COMMON-CLI.md § Pagination Pattern
Long-Running Operations (LRO) Pattern	COMMON-CLI.md § Long-Running Operations (LRO) Pattern
OneLake Data Access via `curl`	COMMON-CLI.md § OneLake Data Access via curl	Use `curl` not `az rest` (different token audience)
SQL / TDS Data-Plane Access	COMMON-CLI.md § SQL / TDS Data-Plane Access
Job Execution (CLI)	COMMON-CLI.md § Job Execution
Job Scheduling	COMMON-CLI.md § Job Scheduling	URL is `/jobs/{jobType}/schedules` ; `endDateTime` required
OneLake Shortcuts	COMMON-CLI.md § OneLake Shortcuts
Capacity Management (CLI)	COMMON-CLI.md § Capacity Management
Composite Recipes	COMMON-CLI.md § Composite Recipes
Gotchas & Troubleshooting (CLI-Specific)	COMMON-CLI.md § Gotchas & Troubleshooting (CLI-Specific)	`az rest` audience, shell escaping, token expiry
Quick Reference: `az rest` Template	COMMON-CLI.md § Quick Reference: az rest Template
Quick Reference: Token Audience / CLI Tool Matrix	COMMON-CLI.md § Quick Reference: Token Audience ↔ CLI Tool Matrix	Which `--resource` + tool for each service
Relationship to SPARK-CONSUMPTION-CORE.md	SPARK-AUTHORING-CORE.md § Relationship to SPARK-CONSUMPTION-CORE.md
Data Engineering Authoring Capability Matrix	SPARK-AUTHORING-CORE.md § Data Engineering Authoring Capability Matrix
Lakehouse Management	SPARK-AUTHORING-CORE.md § Lakehouse Management
Notebook Management	SPARK-AUTHORING-CORE.md § Notebook Management
Notebook Execution & Job Management	SPARK-AUTHORING-CORE.md § Notebook Execution & Job Management
CI/CD & Automation Patterns	SPARK-AUTHORING-CORE.md § CI/CD & Automation Patterns
Infrastructure-as-Code	SPARK-AUTHORING-CORE.md § Infrastructure-as-Code
Performance Optimization & Resource Management	SPARK-AUTHORING-CORE.md § Performance Optimization & Resource Management
Authoring Gotchas and Troubleshooting	SPARK-AUTHORING-CORE.md § Authoring Gotchas and Troubleshooting
Quick Reference: Authoring Decision Guide	SPARK-AUTHORING-CORE.md § Quick Reference: Authoring Decision Guide
Recommended Patterns (Data Engineering)	data-engineering-patterns.md § Recommended patterns
Data Ingestion Principles	data-engineering-patterns.md § Data Ingestion Principles
Transformation Patterns	data-engineering-patterns.md § Transformation Patterns
Delta Lake Best Practices	data-engineering-patterns.md § Delta Lake Best Practices
Quality Assurance Strategies	data-engineering-patterns.md § Quality Assurance Strategies
Recommended Patterns (Development Workflow)	development-workflow.md § Recommended patterns
Notebook Lifecycle	development-workflow.md § Notebook Lifecycle
Parameterization Patterns	development-workflow.md § Parameterization Patterns
Variable Library (notebook + pipeline usage)	development-workflow.md § Method 4: Variable Library	`getLibrary()` + dot notation in notebooks; `libraryVariables` + `@pipeline().libraryVariables` in pipelines
Variable Library Definition	ITEM-DEFINITIONS-CORE.md § VariableLibrary	Definition parts, decoded content, types, pipeline mappings, gotchas
Local Testing Strategy	development-workflow.md § Local Testing Strategy
Debugging Patterns	development-workflow.md § Debugging Patterns
Recommended Patterns (Infrastructure)	infrastructure-orchestration.md § Recommended patterns
Workspace Provisioning Principles	infrastructure-orchestration.md § Workspace Provisioning Principles
Lakehouse Configuration Guidance	infrastructure-orchestration.md § Lakehouse Configuration Guidance
Pipeline Design Patterns	infrastructure-orchestration.md § Pipeline Design Patterns
CI/CD Integration Strategy	infrastructure-orchestration.md § CI/CD Integration Strategy
Notebook API — Which Endpoint to Use	notebook-api-operations.md § Quick Decision	Start here for remote notebook edits — getDefinition vs updateDefinition
Notebook Modification Workflow	notebook-api-operations.md § Workflow	Five-step flow: retrieve, decode, modify, encode, upload
Notebook API Error Reference	notebook-api-operations.md § Error Reference	411, 400 (updateMetadata), 401, 403 explained
Notebook API Gotchas	notebook-api-operations.md § Gotchas	`/result` suffix, empty body, `\n` per-line rule, `format=ipynb`
Default Lakehouse Binding	notebook-api-operations.md § Default Lakehouse Binding	`.ipynb` metadata vs `.py` `# METADATA` block; discover IDs dynamically
Public URL Data Ingestion	notebook-api-operations.md § Public URL Data Ingestion	Use real source URL, stage into `Files/` , then read with Spark
getDefinition (read notebook content)	notebook-api-operations.md § Step 1 — Retrieve Notebook Content	LRO flow, `?format=ipynb` , empty body ( `--body '{}'` ) requirement
Decode Base64 Notebook Payload	notebook-api-operations.md § Step 2 — Decode the Notebook Content	Extract payload, base64 decode, ipynb JSON structure
Modify Notebook Cells	notebook-api-operations.md § Step 3 — Modify the Notebook Content	Find cell, insert/replace lines, `\n` per-line rule
updateDefinition (write notebook content)	notebook-api-operations.md § Step 4 — Re-encode and Upload	Re-encode, upload, LRO poll, updateMetadata flag pitfall
Verify Notebook Update (Optional)	notebook-api-operations.md § Step 5 — Verify the Update	Skip unless you suspect a silent failure — `Succeeded` from updateDefinition is sufficient (see Rule 2)
Notebook API Error Reference	notebook-api-operations.md § Error Reference	411, 400 (updateMetadata), 401, 403 explained
Notebook API End-to-End Script	notebook-api-operations.md § Complete End-to-End Script	Full bash: get → decode → modify → encode → update → verify
Quick Start Examples	SKILL.md § Quick Start Examples	Minimal examples for common operations

任务	参考文档	说明
规则 — 请首先阅读并始终遵守	SKILL.md § RULES	必须阅读 — 本技能的3条规则
在Fabric中查找工作区和项目	COMMON-CLI.md § 在Fabric中查找工作区和项目	必填 — 请先阅读链接内容 [根据名称查找工作区ID，或根据名称、项目类型和工作区ID查找项目ID时需用到]
Fabric拓扑结构与核心概念	COMMON-CORE.md § Fabric拓扑结构与核心概念
环境URL	COMMON-CORE.md § 环境URL
认证与令牌获取	COMMON-CORE.md § 认证与令牌获取	受众错误会导致401；遇到认证问题前请先阅读
核心控制平面REST API	COMMON-CORE.md § 核心控制平面REST API
分页	COMMON-CORE.md § 分页
长期运行操作（LRO）	COMMON-CORE.md § 长期运行操作（LRO）
速率限制与限流	COMMON-CORE.md § 速率限制与限流
OneLake数据访问	COMMON-CORE.md § OneLake数据访问	需要 `storage.azure.com` 令牌，而非Fabric令牌
定义信封	ITEM-DEFINITIONS-CORE.md § 定义信封	定义负载结构
按项目类型分类的定义	ITEM-DEFINITIONS-CORE.md § 按项目类型分类的定义	支持矩阵、解码内容、部分路径 — REST规范、CLI示例
任务执行	COMMON-CORE.md § 任务执行
容量管理	COMMON-CORE.md § 容量管理
常见问题与故障排除	COMMON-CORE.md § 常见问题与故障排除
最佳实践	COMMON-CORE.md § 最佳实践
工具选择依据	COMMON-CLI.md § 工具选择依据
认证示例	COMMON-CLI.md § 认证示例	`az login` 流程与令牌获取
通过 `az rest` 调用Fabric控制平面API	COMMON-CLI.md § 通过az rest调用Fabric控制平面API	必须传递 `--resource https://api.fabric.microsoft.com` ，否则 `az rest` 会失败
分页模式	COMMON-CLI.md § 分页模式
长期运行操作（LRO）模式	COMMON-CLI.md § 长期运行操作（LRO）模式
通过 `curl` 访问OneLake数据	COMMON-CLI.md § 通过curl访问OneLake数据	使用 `curl` 而非 `az rest` （令牌受众不同）
SQL / TDS数据平面访问	COMMON-CLI.md § SQL / TDS数据平面访问
任务执行（CLI）	COMMON-CLI.md § 任务执行
任务调度	COMMON-CLI.md § 任务调度	URL为 `/jobs/{jobType}/schedules` ；必须提供 `endDateTime`
OneLake快捷方式	COMMON-CLI.md § OneLake快捷方式
容量管理（CLI）	COMMON-CLI.md § 容量管理
复合示例	COMMON-CLI.md § 复合示例
常见问题与故障排除（CLI专属）	COMMON-CLI.md § 常见问题与故障排除（CLI专属）	`az rest` 受众、转义字符、令牌过期
快速参考： `az rest` 模板	COMMON-CLI.md § 快速参考：az rest模板
快速参考：令牌受众/CLI工具矩阵	COMMON-CLI.md § 快速参考：令牌受众↔CLI工具矩阵	各服务对应的 `--resource` 参数与工具
与SPARK-CONSUMPTION-CORE.md的关联	SPARK-AUTHORING-CORE.md § 与SPARK-CONSUMPTION-CORE.md的关联
数据工程开发能力矩阵	SPARK-AUTHORING-CORE.md § 数据工程开发能力矩阵
Lakehouse管理	SPARK-AUTHORING-CORE.md § Lakehouse管理
Notebook管理	SPARK-AUTHORING-CORE.md § Notebook管理
Notebook执行与任务管理	SPARK-AUTHORING-CORE.md § Notebook执行与任务管理
CI/CD与自动化模式	SPARK-AUTHORING-CORE.md § CI/CD与自动化模式
基础设施即代码	SPARK-AUTHORING-CORE.md § 基础设施即代码
性能优化与资源管理	SPARK-AUTHORING-CORE.md § 性能优化与资源管理
开发常见问题与故障排除	SPARK-AUTHORING-CORE.md § 开发常见问题与故障排除
快速参考：开发决策指南	SPARK-AUTHORING-CORE.md § 快速参考：开发决策指南
推荐模式（数据工程）	data-engineering-patterns.md § 推荐模式
数据导入原则	data-engineering-patterns.md § 数据导入原则
转换模式	data-engineering-patterns.md § 转换模式
Delta Lake最佳实践	data-engineering-patterns.md § Delta Lake最佳实践
质量保障策略	data-engineering-patterns.md § 质量保障策略
推荐模式（开发工作流）	development-workflow.md § 推荐模式
Notebook生命周期	development-workflow.md § Notebook生命周期
参数化模式	development-workflow.md § 参数化模式
变量库（Notebook + 管道使用）	development-workflow.md § 方法4：变量库	Notebook中使用 `getLibrary()` + 点标记法；管道中使用 `libraryVariables` + `@pipeline().libraryVariables`
变量库定义	ITEM-DEFINITIONS-CORE.md § VariableLibrary	定义部分、解码内容、类型、管道映射、常见问题
本地测试策略	development-workflow.md § 本地测试策略
调试模式	development-workflow.md § 调试模式
推荐模式（基础设施）	infrastructure-orchestration.md § 推荐模式
工作区配置原则	infrastructure-orchestration.md § 工作区配置原则
Lakehouse配置指南	infrastructure-orchestration.md § Lakehouse配置指南
管道设计模式	infrastructure-orchestration.md § 管道设计模式
CI/CD集成策略	infrastructure-orchestration.md § CI/CD集成策略
Notebook API — 选择哪个端点	notebook-api-operations.md § 快速决策	远程编辑Notebook请从此处开始 — getDefinition与updateDefinition对比
Notebook修改工作流	notebook-api-operations.md § 工作流	五步流程：检索、解码、修改、编码、上传
Notebook API错误参考	notebook-api-operations.md § 错误参考	解释411、400（updateMetadata）、401、403错误
Notebook API常见问题	notebook-api-operations.md § 常见问题	`/result` 后缀、空请求体、每行 `\n` 规则、 `format=ipynb`
默认Lakehouse绑定	notebook-api-operations.md § 默认Lakehouse绑定	`.ipynb` 元数据与 `.py` 文件中的 `# METADATA` 块；动态发现ID
公共URL数据导入	notebook-api-operations.md § 公共URL数据导入（Spark）	使用真实源URL，暂存至 `Files/` ，再通过Spark读取
getDefinition（读取Notebook内容）	notebook-api-operations.md § 步骤1 — 检索Notebook内容	LRO流程、 `?format=ipynb` 、需传递空请求体( `--body '{}'` )
解码Base64格式的Notebook负载	notebook-api-operations.md § 步骤2 — 解码Notebook内容	提取负载、Base64解码、ipynb JSON结构
修改Notebook单元格	notebook-api-operations.md § 步骤3 — 修改Notebook内容	查找单元格、插入/替换行、每行 `\n` 规则
updateDefinition（写入Notebook内容）	notebook-api-operations.md § 步骤4 — 重新编码并上传	重新编码、上传、轮询LRO状态、updateMetadata标志陷阱
验证Notebook更新（可选）	notebook-api-operations.md § 步骤5 — 验证更新	除非怀疑静默失败，否则可跳过 — updateDefinition返回的 `Succeeded` 已足够（请参见规则2）
Notebook API错误参考	notebook-api-operations.md § 错误参考	解释411、400（updateMetadata）、401、403错误
Notebook API端到端脚本	notebook-api-operations.md § 完整端到端脚本	完整bash脚本：获取→解码→修改→编码→更新→验证
快速入门示例	SKILL.md § 快速入门示例	常见操作的最简示例

Must/Prefer/Avoid

必须执行/推荐执行/避免执行

MUST DO

必须执行

Check for recent jobs BEFORE creating new notebook runs — Query job instances from last 5 minutes; if recent job exists, monitor it instead of creating duplicate
Capture job instance ID immediately after POST — Store job ID before any other operations to enable proper monitoring
Verify workspace capacity assignment before operations — Workspace must have capacity assigned and active
When user provides a public data URL, follow the Public URL Data Ingestion policy — keep detailed behavior in the linked resource section to avoid drift/duplication
Format notebook cells correctly — Each line in cell source array MUST end with
```
\n
```
to prevent code merging
Use correct Livy session body format — Send a FLAT JSON with
```
name
```
,
```
driverMemory
```
,
```
driverCores
```
,
```
executorMemory
```
,
```
executorCores
```
. Do NOT wrap in
```
{"payload": ...}
```
or send only
```
{"kind": "pyspark"}
```
— that causes HTTP 500. Use valid memory values (28g, 56g, 112g, 224g). See Create Livy Session example below and SPARK-CONSUMPTION-CORE.md.

创建新Notebook运行前检查近期任务 — 查询过去5分钟内的任务实例；若存在近期任务，请监控该任务而非创建重复任务
POST请求后立即捕获任务实例ID — 在执行任何其他操作前存储任务ID，以便进行正确监控
操作前验证工作区容量分配 — 工作区必须已分配且激活容量
当用户提供公共数据URL时，请遵循公共URL数据导入策略 — 详细操作请参考链接资源部分，避免流程偏差或重复
正确格式化Notebook单元格 — 单元格源数组中的每一行必须以
```
\n
```
结尾，防止代码合并
使用正确的Livy会话请求体格式 — 发送包含
```
name
```
、
```
driverMemory
```
、
```
driverCores
```
、
```
executorMemory
```
、
```
executorCores
```
的扁平JSON。请勿包裹在
```
{"payload": ...}
```
中，也不要仅发送
```
{"kind": "pyspark"}
```
— 这会导致HTTP 500错误。请使用有效的内存值（28g、56g、112g、224g）。请查看下方的创建Livy会话示例及SPARK-CONSUMPTION-CORE.md文档。

PREFER

AVOID

避免执行

Never retry POST with same parameters — If you have a job ID, only use GET to check status; don't create duplicate job instances
Don't skip capacity verification — Operations will fail if workspace capacity is paused or unassigned
Avoid immediate POST retries on failures — Check for existing/active jobs first to prevent duplicates
Don't create new runs if monitoring existing job — One job at a time; wait for completion before submitting new runs
Don't hardcode workspace/lakehouse IDs — Discover dynamically via item listing or catalog search APIs

切勿使用相同参数重试POST请求 — 若已有任务ID，仅使用GET请求检查状态；请勿创建重复任务实例
不要跳过容量验证 — 若工作区容量已暂停或未分配，操作会失败
失败后避免立即重试POST请求 — 先检查是否存在/活跃任务，防止重复创建
监控现有任务时请勿创建新运行实例 — 一次仅运行一个任务；等待任务完成后再提交新任务
不要硬编码工作区/Lakehouse ID — 通过项目列表或目录搜索API动态发现

RULES — Read these first, follow them always

规则 — 请首先阅读并始终遵守

Rule 1 — Validate prerequisites before operations. Verify workspace has capacity assigned (see COMMON-CORE.md Create Workspace and Capacity Management) and resource IDs exist before attempting operations.
Rule 2 — Trust updateDefinition success. A
Succeeded
poll result from
updateDefinition
is sufficient confirmation that content and lakehouse bindings persisted. Do NOT call
getDefinition
after every upload — it is an async LRO that adds significant latency. Only use
getDefinition
for its intended purpose: reading current notebook content before making modifications.
Rule 3 — Prevent duplicate jobs and monitor execution properly. Before submitting new notebook run, ALWAYS check for recent job instances first (last 5 minutes). If recent job exists, monitor it instead of creating duplicate. After submission, capture job instance ID immediately and poll status - never retry POST. See SPARK-AUTHORING-CORE.md Job Monitoring for patterns.

规则1 — 操作前验证前置条件。 验证工作区已分配容量（请参见COMMON-CORE.md中的创建工作区和容量管理部分），且资源ID已存在，再执行操作。
规则2 — 信任updateDefinition的成功结果。
updateDefinition
返回的轮询结果
Succeeded
已足够确认内容和Lakehouse绑定已持久化。请勿每次上传后都调用
getDefinition
— 这是异步LRO操作，会增加显著延迟。仅在修改前读取当前Notebook内容时，才使用
getDefinition
的预期用途。
规则3 — 防止重复任务并正确监控执行状态。 提交新Notebook运行前，务必先检查最近5分钟内的任务实例。若存在近期任务，请监控该任务而非创建重复任务。提交后立即捕获任务实例ID并轮询状态 — 切勿重试POST请求。请参见SPARK-AUTHORING-CORE.md中的任务监控模式。

Quick Start Examples

快速入门示例

For detailed patterns, authentication, and comprehensive API usage, see:

COMMON-CORE.md — Fabric REST API patterns, authentication, item discovery
COMMON-CLI.md —
```
az rest
```
usage, environment detection, token acquisition
SPARK-AUTHORING-CORE.md — Notebook deployment, lakehouse creation, job execution

Below are minimal quick-start examples. Always reference the COMMON- files for production use.*

如需详细模式、认证方式和全面API用法，请查看：

COMMON-CORE.md — Fabric REST API模式、认证、项目发现
COMMON-CLI.md —
```
az rest
```
用法、环境检测、令牌获取
SPARK-AUTHORING-CORE.md — Notebook部署、Lakehouse创建、任务执行

以下是最简快速入门示例。*生产环境使用请务必参考COMMON-系列文档。

Create Workspace & Lakehouse

创建工作区与Lakehouse

bash

undefined

bash

undefined

See COMMON-CORE.md Environment URLs and SPARK-AUTHORING-CORE.md for full patterns

完整模式请参见COMMON-CORE.md环境URL和SPARK-AUTHORING-CORE.md

cat > /tmp/body.json << 'EOF' {"displayName": "DataEng-Dev"} EOF workspace_id=$(az rest --method post --resource "https://api.fabric.microsoft.com"
--url "https://api.fabric.microsoft.com/v1/workspaces"
--body @/tmp/body.json --query "id" --output tsv)

cat > /tmp/body.json << 'EOF' {"displayName": "DevLakehouse", "type": "Lakehouse", "creationPayload": {"enableSchemas": true}} EOF lakehouse_id=$(az rest --method post --resource "https://api.fabric.microsoft.com"
--url "https://api.fabric.microsoft.com/v1/workspaces/$workspace_id/items"
--body @/tmp/body.json --query "id" --output tsv)

undefined

undefined

Organize Lakehouse Tables with Schemas

使用架构组织Lakehouse表

python

undefined

python

undefined

See SPARK-AUTHORING-CORE.md Lakehouse Schema Organization for table organization patterns

表组织模式请参见SPARK-AUTHORING-CORE.md中的Lakehouse架构组织部分

Create schemas for medallion architecture

为medallion架构创建模式

spark.sql("CREATE SCHEMA IF NOT EXISTS bronze") spark.sql("CREATE SCHEMA IF NOT EXISTS silver") spark.sql("CREATE SCHEMA IF NOT EXISTS gold")

undefined

spark.sql("CREATE SCHEMA IF NOT EXISTS bronze") spark.sql("CREATE SCHEMA IF NOT EXISTS silver") spark.sql("CREATE SCHEMA IF NOT EXISTS gold")

undefined

Create Livy Session

创建Livy会话

bash

undefined

bash

undefined

See SPARK-CONSUMPTION-CORE.md for Livy session configuration and management

Livy会话配置与管理请参见SPARK-CONSUMPTION-CORE.md

IMPORTANT: Body MUST be flat JSON with memory/cores — do NOT wrap in {"payload": ...}

重要提示：请求体必须是包含内存/核心参数的扁平JSON — 请勿包裹在{"payload": ...}中

cat > /tmp/body.json << 'EOF' {"name": "dev-session", "driverMemory": "56g", "driverCores": 8, "executorMemory": "56g", "executorCores": 8, "conf": {"spark.dynamicAllocation.enabled": "true", "spark.fabric.pool.name": "Starter Pool"}} EOF az rest --method post --resource "https://api.fabric.microsoft.com"
--url "https://api.fabric.microsoft.com/v1/workspaces/$workspace_id/lakehouses/$lakehouse_id/livyapi/versions/2023-12-01/sessions"
--body @/tmp/body.json


> **Livy Session Body — Common Mistakes**
> - ❌ `{"payload": {"kind": "pyspark"}}` → HTTP 500 (wrong wrapper, missing required fields)
> - ❌ `{"kind": "pyspark"}` → HTTP 500 (missing `driverMemory`, `executorMemory`, etc.)
> - ✅ Flat JSON with `name`, `driverMemory`, `driverCores`, `executorMemory`, `executorCores` (and optionally `conf` with Starter Pool)


> **Livy会话请求体 — 常见错误**
> - ❌ `{"payload": {"kind": "pyspark"}}` → HTTP 500（错误的包裹方式，缺少必填字段）
> - ❌ `{"kind": "pyspark"}` → HTTP 500（缺少`driverMemory`、`executorMemory`等字段）
> - ✅ 包含`name`、`driverMemory`、`driverCores`、`executorMemory`、`executorCores`的扁平JSON（可选添加包含Starter Pool的`conf`）

Spark Performance Configs

Spark性能配置

For detailed workload-specific configurations, see data-engineering-patterns.md Delta Lake Best Practices.

Quick reference:

python

undefined

如需针对特定工作负载的详细配置，请参见data-engineering-patterns.md中的Delta Lake最佳实践。

快速参考：

python

undefined

Write-heavy (Bronze): Disable V-Order, enable autoCompact

写入密集型（Bronze层）：禁用V-Order，启用autoCompact

Balanced (Silver): Enable V-Order, adaptive execution

均衡型（Silver层）：启用V-Order，自适应执行

Read-heavy (Gold): Vectorized reads, optimal parallelism

读取密集型（Gold层）：向量化读取，优化并行度

See data-engineering-patterns.md for complete config tables

完整配置表请参见data-engineering-patterns.md

undefined

undefined

Variable Library in Notebooks

Notebook中的变量库

Use a Variable Library to centralize lakehouse names, workspace IDs, and feature flags.

python

undefined

使用变量库集中管理Lakehouse名称、工作区ID和功能标志。

python

undefined

✅ CORRECT — getLibrary() + dot notation

✅ 正确用法 — getLibrary() + 点标记法

lib = notebookutils.variableLibrary.getLibrary("MyConfig") lakehouse_name = lib.lakehouse_name enable_logging = lib.enable_logging # returns string "true"/"false"

lib = notebookutils.variableLibrary.getLibrary("MyConfig") lakehouse_name = lib.lakehouse_name enable_logging = lib.enable_logging # 返回字符串"true"/"false"

Boolean: compare as string (bool("false") is True in Python!)

布尔值判断：需与字符串比较（Python中bool("false")为True！）

if enable_logging.lower() == "true": print("Logging enabled")

❌ WRONG — .get() does not exist, causes runtime failure

❌ 错误用法 — .get()方法不存在，会导致运行时失败

notebookutils.variableLibrary.get("MyConfig", "lakehouse_name")


---

**Focus**: Essential CLI patterns for Spark/data engineering development with intelligent routing to specialized resources. For comprehensive patterns, always reference COMMON-* files and resource documents.


---

**核心内容**：用于Spark/数据工程开发的关键CLI模式，可智能路由至专用资源。如需全面模式，请始终参考COMMON-*系列文档和资源文档。

spark-authoring-cli

Original

Translation

Spark Authoring — CLI Skill

Spark 开发 — CLI技能

Table of Contents

目录

Must/Prefer/Avoid

必须执行/推荐执行/避免执行

MUST DO

必须执行

PREFER

推荐执行

AVOID

避免执行

RULES — Read these first, follow them always

规则 — 请首先阅读并始终遵守

Quick Start Examples

快速入门示例

Create Workspace & Lakehouse

创建工作区与Lakehouse

See COMMON-CORE.md Environment URLs and SPARK-AUTHORING-CORE.md for full patterns

完整模式请参见COMMON-CORE.md环境URL和SPARK-AUTHORING-CORE.md

Organize Lakehouse Tables with Schemas

使用架构组织Lakehouse表

See SPARK-AUTHORING-CORE.md Lakehouse Schema Organization for table organization patterns

表组织模式请参见SPARK-AUTHORING-CORE.md中的Lakehouse架构组织部分

Create schemas for medallion architecture

为medallion架构创建模式

Create Livy Session

创建Livy会话

See SPARK-CONSUMPTION-CORE.md for Livy session configuration and management

Livy会话配置与管理请参见SPARK-CONSUMPTION-CORE.md

IMPORTANT: Body MUST be flat JSON with memory/cores — do NOT wrap in {"payload": ...}

重要提示：请求体必须是包含内存/核心参数的扁平JSON — 请勿包裹在{"payload": ...}中

Spark Performance Configs

Spark性能配置

Write-heavy (Bronze): Disable V-Order, enable autoCompact

写入密集型（Bronze层）：禁用V-Order，启用autoCompact

Balanced (Silver): Enable V-Order, adaptive execution

均衡型（Silver层）：启用V-Order，自适应执行

Read-heavy (Gold): Vectorized reads, optimal parallelism

读取密集型（Gold层）：向量化读取，优化并行度

See data-engineering-patterns.md for complete config tables

完整配置表请参见data-engineering-patterns.md

Variable Library in Notebooks

Notebook中的变量库

✅ CORRECT — getLibrary() + dot notation

✅ 正确用法 — getLibrary() + 点标记法

Boolean: compare as string (bool("false") is True in Python!)

布尔值判断：需与字符串比较（Python中bool("false")为True！）

❌ WRONG — .get() does not exist, causes runtime failure

❌ 错误用法 — .get()方法不存在，会导致运行时失败

notebookutils.variableLibrary.get("MyConfig", "lakehouse_name")

notebookutils.variableLibrary.get("MyConfig", "lakehouse_name")