spark-authoring-cli
Update Check — ONCE PER SESSION (mandatory)
The first time this skill is used in a session, run the check-updates skill before proceeding.
- GitHub Copilot CLI / VS Code: invoke the `check-updates` skill.
- Claude Code / Cowork / Cursor / Windsurf / Codex: compare local vs remote package.json version.
- Skip if the check was already performed earlier in this session.
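For environments that compare versions manually, a minimal sketch (the local path and remote raw URL are placeholders, not the skill's actual locations):

```bash
# Placeholder locations — substitute the skill's actual install path and repository URL.
local_version=$(jq -r '.version' ./package.json)
remote_version=$(curl -fsSL "https://raw.githubusercontent.com/<org>/<repo>/main/package.json" | jq -r '.version')
if [ "$local_version" != "$remote_version" ]; then
  echo "check-updates: $local_version -> $remote_version available"
fi
```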
CRITICAL NOTES
- To find workspace details (including its ID) from a workspace name: list all workspaces, then filter with JMESPath
- To find item details (including its ID) from a workspace ID, item type, and item name: list all items of that type in that workspace, then filter with JMESPath (see the sketch below)
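A minimal sketch of both lookups with `az rest` and a JMESPath `--query` (the workspace and notebook names are illustrative; list responses are assumed to wrap results in a `value` array — page through results if the list is long, see Pagination):

```bash
FABRIC="https://api.fabric.microsoft.com"

# Workspace ID from workspace name (name is illustrative)
workspace_id=$(az rest --method get --resource "$FABRIC" \
  --url "$FABRIC/v1/workspaces" \
  --query "value[?displayName=='DataEng-Dev'] | [0].id" --output tsv)

# Item ID from workspace ID + item type + item name (notebook name is illustrative)
notebook_id=$(az rest --method get --resource "$FABRIC" \
  --url "$FABRIC/v1/workspaces/$workspace_id/items?type=Notebook" \
  --query "value[?displayName=='daily_ingest'] | [0].id" --output tsv)
```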
Spark Authoring — CLI Skill
Table of Contents
| Task | Reference | Notes |
|---|---|---|
| RULES — Read these first, follow them always | SKILL.md § RULES | MUST read — 3 rules for this skill |
| Finding Workspaces and Items in Fabric | COMMON-CLI.md § Finding Workspaces and Items in Fabric | Mandatory — READ link first [needed for finding workspace id by its name or item id by its name, item type, and workspace id] |
| Fabric Topology & Key Concepts | COMMON-CORE.md § Fabric Topology & Key Concepts | |
| Environment URLs | COMMON-CORE.md § Environment URLs | |
| Authentication & Token Acquisition | COMMON-CORE.md § Authentication & Token Acquisition | Wrong audience = 401; read before any auth issue |
| Core Control-Plane REST APIs | COMMON-CORE.md § Core Control-Plane REST APIs | |
| Pagination | COMMON-CORE.md § Pagination | |
| Long-Running Operations (LRO) | COMMON-CORE.md § Long-Running Operations (LRO) | |
| Rate Limiting & Throttling | COMMON-CORE.md § Rate Limiting & Throttling | |
| OneLake Data Access | COMMON-CORE.md § OneLake Data Access | Requires a storage-audience token |
| Definition Envelope | ITEM-DEFINITIONS-CORE.md § Definition Envelope | Definition payload structure |
| Per-Item-Type Definitions | ITEM-DEFINITIONS-CORE.md § Per-Item-Type Definitions | Support matrix, decoded content, part paths — REST specs, CLI recipes |
| Job Execution | COMMON-CORE.md § Job Execution | |
| Capacity Management | COMMON-CORE.md § Capacity Management | |
| Gotchas & Troubleshooting | COMMON-CORE.md § Gotchas & Troubleshooting | |
| Best Practices | COMMON-CORE.md § Best Practices | |
| Tool Selection Rationale | COMMON-CLI.md § Tool Selection Rationale | |
| Authentication Recipes | COMMON-CLI.md § Authentication Recipes | |
| Fabric Control-Plane API via `az rest` | COMMON-CLI.md § Fabric Control-Plane API via az rest | Always pass `--resource https://api.fabric.microsoft.com` |
| Pagination Pattern | COMMON-CLI.md § Pagination Pattern | |
| Long-Running Operations (LRO) Pattern | COMMON-CLI.md § Long-Running Operations (LRO) Pattern | |
| OneLake Data Access via `curl` | COMMON-CLI.md § OneLake Data Access via curl | Use a storage-audience token |
| SQL / TDS Data-Plane Access | COMMON-CLI.md § SQL / TDS Data-Plane Access | |
| Job Execution (CLI) | COMMON-CLI.md § Job Execution | |
| Job Scheduling | COMMON-CLI.md § Job Scheduling | URL is |
| OneLake Shortcuts | COMMON-CLI.md § OneLake Shortcuts | |
| Capacity Management (CLI) | COMMON-CLI.md § Capacity Management | |
| Composite Recipes | COMMON-CLI.md § Composite Recipes | |
| Gotchas & Troubleshooting (CLI-Specific) | COMMON-CLI.md § Gotchas & Troubleshooting (CLI-Specific) | |
| Quick Reference: `az rest` Template | COMMON-CLI.md § Quick Reference: az rest Template | |
| Quick Reference: Token Audience ↔ CLI Tool Matrix | COMMON-CLI.md § Quick Reference: Token Audience ↔ CLI Tool Matrix | Which token audience goes with which CLI tool |
| Relationship to SPARK-CONSUMPTION-CORE.md | SPARK-AUTHORING-CORE.md § Relationship to SPARK-CONSUMPTION-CORE.md | |
| Data Engineering Authoring Capability Matrix | SPARK-AUTHORING-CORE.md § Data Engineering Authoring Capability Matrix | |
| Lakehouse Management | SPARK-AUTHORING-CORE.md § Lakehouse Management | |
| Notebook Management | SPARK-AUTHORING-CORE.md § Notebook Management | |
| Notebook Execution & Job Management | SPARK-AUTHORING-CORE.md § Notebook Execution & Job Management | |
| CI/CD & Automation Patterns | SPARK-AUTHORING-CORE.md § CI/CD & Automation Patterns | |
| Infrastructure-as-Code | SPARK-AUTHORING-CORE.md § Infrastructure-as-Code | |
| Performance Optimization & Resource Management | SPARK-AUTHORING-CORE.md § Performance Optimization & Resource Management | |
| Authoring Gotchas and Troubleshooting | SPARK-AUTHORING-CORE.md § Authoring Gotchas and Troubleshooting | |
| Quick Reference: Authoring Decision Guide | SPARK-AUTHORING-CORE.md § Quick Reference: Authoring Decision Guide | |
| Recommended Patterns (Data Engineering) | data-engineering-patterns.md § Recommended patterns | |
| Data Ingestion Principles | data-engineering-patterns.md § Data Ingestion Principles | |
| Transformation Patterns | data-engineering-patterns.md § Transformation Patterns | |
| Delta Lake Best Practices | data-engineering-patterns.md § Delta Lake Best Practices | |
| Quality Assurance Strategies | data-engineering-patterns.md § Quality Assurance Strategies | |
| Recommended Patterns (Development Workflow) | development-workflow.md § Recommended patterns | |
| Notebook Lifecycle | development-workflow.md § Notebook Lifecycle | |
| Parameterization Patterns | development-workflow.md § Parameterization Patterns | |
| Variable Library (notebook + pipeline usage) | development-workflow.md § Method 4: Variable Library | |
| Variable Library Definition | ITEM-DEFINITIONS-CORE.md § VariableLibrary | Definition parts, decoded content, types, pipeline mappings, gotchas |
| Local Testing Strategy | development-workflow.md § Local Testing Strategy | |
| Debugging Patterns | development-workflow.md § Debugging Patterns | |
| Recommended Patterns (Infrastructure) | infrastructure-orchestration.md § Recommended patterns | |
| Workspace Provisioning Principles | infrastructure-orchestration.md § Workspace Provisioning Principles | |
| Lakehouse Configuration Guidance | infrastructure-orchestration.md § Lakehouse Configuration Guidance | |
| Pipeline Design Patterns | infrastructure-orchestration.md § Pipeline Design Patterns | |
| CI/CD Integration Strategy | infrastructure-orchestration.md § CI/CD Integration Strategy | |
| Notebook API — Which Endpoint to Use | notebook-api-operations.md § Quick Decision | Start here for remote notebook edits — getDefinition vs updateDefinition |
| Notebook Modification Workflow | notebook-api-operations.md § Workflow | Five-step flow: retrieve, decode, modify, encode, upload |
| Notebook API Error Reference | notebook-api-operations.md § Error Reference | 411, 400 (updateMetadata), 401, 403 explained |
| Notebook API Gotchas | notebook-api-operations.md § Gotchas | |
| Default Lakehouse Binding | notebook-api-operations.md § Default Lakehouse Binding | |
| Public URL Data Ingestion | notebook-api-operations.md § Public URL Data Ingestion | Use real source URL, stage into |
| getDefinition (read notebook content) | notebook-api-operations.md § Step 1 — Retrieve Notebook Content | LRO flow, |
| Decode Base64 Notebook Payload | notebook-api-operations.md § Step 2 — Decode the Notebook Content | Extract payload, base64 decode, ipynb JSON structure |
| Modify Notebook Cells | notebook-api-operations.md § Step 3 — Modify the Notebook Content | Find cell, insert/replace lines, `\n` per line |
| updateDefinition (write notebook content) | notebook-api-operations.md § Step 4 — Re-encode and Upload | Re-encode, upload, LRO poll, updateMetadata flag pitfall |
| Verify Notebook Update (Optional) | notebook-api-operations.md § Step 5 — Verify the Update | Skip unless you suspect a silent failure — a `Succeeded` result from updateDefinition is sufficient |
| Notebook API Error Reference | notebook-api-operations.md § Error Reference | 411, 400 (updateMetadata), 401, 403 explained |
| Notebook API End-to-End Script | notebook-api-operations.md § Complete End-to-End Script | Full bash: get → decode → modify → encode → update → verify |
| Quick Start Examples | SKILL.md § Quick Start Examples | Minimal examples for common operations |
Must/Prefer/Avoid
MUST DO
- Check for recent jobs BEFORE creating new notebook runs — Query job instances from last 5 minutes; if recent job exists, monitor it instead of creating duplicate
- Capture job instance ID immediately after POST — Store job ID before any other operations to enable proper monitoring
- Verify workspace capacity assignment before operations — Workspace must have capacity assigned and active
- When user provides a public data URL, follow the Public URL Data Ingestion policy — keep detailed behavior in the linked resource section to avoid drift/duplication
- Format notebook cells correctly — Each line in a cell's source array MUST end with `\n` to prevent code merging (see the sketch after this list)
- Use correct Livy session body format — Send a FLAT JSON with `name`, `driverMemory`, `driverCores`, `executorMemory`, `executorCores`. Do NOT wrap it in `{"payload": ...}` or send only `{"kind": "pyspark"}` — that causes HTTP 500. Use valid memory values (28g, 56g, 112g, 224g). See the Create Livy Session example below and SPARK-CONSUMPTION-CORE.md.
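A minimal sketch of a correctly formatted code cell (the table name is illustrative); note the trailing `\n` on every entry in `source`:

```bash
# Every entry in "source" ends with \n — without it the runtime concatenates the
# lines into a single statement. The table name is illustrative.
cat > /tmp/cell.json << 'EOF'
{
  "cell_type": "code",
  "metadata": {},
  "execution_count": null,
  "outputs": [],
  "source": [
    "df = spark.read.table(\"bronze.orders\")\n",
    "df.show(5)\n"
  ]
}
EOF
```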
PREFER
- Poll job status with proper intervals — 10-30 seconds between polls; timeout after reasonable duration (e.g., 30 minutes)
- Check job history when POST response is unreadable — If POST returns "No Content" or unreadable response, query recent jobs (last 1 minute) before retrying
- Use Starter Pool for development — Development/testing workloads should use `useStarterPool: true`
- Use Workspace Pool for production — Production workloads need consistent performance with `useWorkspacePool: true`
- Enable lakehouse schemas during creation — Set `creationPayload.enableSchemas: true` for better table organization
- Implement idempotency checks — Prevent duplicate operations by checking existing state first (a minimal sketch follows this list)
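A hedged idempotency sketch using the items list endpoint with a JMESPath filter (item names are illustrative; the `?type=` filter and a single-page response are assumptions — page through results for large workspaces):

```bash
# Illustrative names; reuse an existing lakehouse instead of creating a duplicate.
FABRIC="https://api.fabric.microsoft.com"
existing_id=$(az rest --method get --resource "$FABRIC" \
  --url "$FABRIC/v1/workspaces/$workspace_id/items?type=Lakehouse" \
  --query "value[?displayName=='DevLakehouse'] | [0].id" --output tsv)

if [ -n "$existing_id" ]; then
  lakehouse_id="$existing_id"            # already exists — do not create again
else
  cat > /tmp/body.json << 'EOF'
{"displayName": "DevLakehouse", "type": "Lakehouse", "creationPayload": {"enableSchemas": true}}
EOF
  lakehouse_id=$(az rest --method post --resource "$FABRIC" \
    --url "$FABRIC/v1/workspaces/$workspace_id/items" \
    --body @/tmp/body.json --query "id" --output tsv)
fi
```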
AVOID
- Never retry POST with same parameters — If you have a job ID, only use GET to check status; don't create duplicate job instances
- Don't skip capacity verification — Operations will fail if workspace capacity is paused or unassigned
- Avoid immediate POST retries on failures — Check for existing/active jobs first to prevent duplicates
- Don't create new runs if monitoring existing job — One job at a time; wait for completion before submitting new runs
- Don't hardcode workspace/lakehouse IDs — Discover dynamically via item listing or catalog search APIs
RULES — Read these first, follow them always
**Rule 1 — Validate prerequisites before operations.** Verify the workspace has capacity assigned (see COMMON-CORE.md Create Workspace and Capacity Management) and that resource IDs exist before attempting operations.

**Rule 2 — Trust updateDefinition success.** A `Succeeded` poll result from `updateDefinition` is sufficient confirmation that content and lakehouse bindings persisted. Do NOT call `getDefinition` after every upload — it is an async LRO that adds significant latency. Only use `getDefinition` for its intended purpose: reading current notebook content before making modifications.

**Rule 3 — Prevent duplicate jobs and monitor execution properly.** Before submitting a new notebook run, ALWAYS check for recent job instances first (last 5 minutes). If a recent job exists, monitor it instead of creating a duplicate. After submission, capture the job instance ID immediately and poll status — never retry the POST. See SPARK-AUTHORING-CORE.md Job Monitoring for patterns. A minimal submit-and-poll sketch follows.
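A hedged end-to-end sketch of the Rule 3 flow (endpoint shapes and response field names such as `status` and `startTimeUtc` are assumptions to verify against SPARK-AUTHORING-CORE.md; `$workspace_id` and `$notebook_id` come from the lookups above):

```bash
# Assumes POST .../jobs/instances?jobType=RunNotebook and GET .../jobs/instances[/{id}]
# with status/startTimeUtc fields in the response — verify before relying on them.
FABRIC="https://api.fabric.microsoft.com"
JOBS="$FABRIC/v1/workspaces/$workspace_id/items/$notebook_id/jobs/instances"

# 1. Check for an in-flight run before submitting anything new.
job_id=$(az rest --method get --resource "$FABRIC" --url "$JOBS" \
  --query "value[?status=='InProgress'] | [0].id" --output tsv)

if [ -z "$job_id" ]; then
  # 2. Submit the run (add --body with executionData/parameters if needed) ...
  az rest --method post --resource "$FABRIC" --url "$JOBS?jobType=RunNotebook" --output none
  sleep 10
  # 3. ... and capture the newest job instance ID immediately.
  job_id=$(az rest --method get --resource "$FABRIC" --url "$JOBS" \
    --query "max_by(value, &startTimeUtc).id" --output tsv)
fi

# 4. Poll the captured instance every 30 seconds for up to ~30 minutes — never retry the POST.
for _ in $(seq 1 60); do
  status=$(az rest --method get --resource "$FABRIC" --url "$JOBS/$job_id" \
    --query "status" --output tsv)
  echo "job $job_id: $status"
  case "$status" in Completed|Failed|Cancelled) break ;; esac
  sleep 30
done
```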
Quick Start Examples
For detailed patterns, authentication, and comprehensive API usage, see:
- COMMON-CORE.md — Fabric REST API patterns, authentication, item discovery
- COMMON-CLI.md — `az rest` usage, environment detection, token acquisition
- SPARK-AUTHORING-CORE.md — Notebook deployment, lakehouse creation, job execution
Below are minimal quick-start examples. Always reference the COMMON-* files for production use.
Create Workspace & Lakehouse
```bash
# See COMMON-CORE.md Environment URLs and SPARK-AUTHORING-CORE.md for full patterns
cat > /tmp/body.json << 'EOF'
{"displayName": "DataEng-Dev"}
EOF
workspace_id=$(az rest --method post --resource "https://api.fabric.microsoft.com" \
  --url "https://api.fabric.microsoft.com/v1/workspaces" \
  --body @/tmp/body.json --query "id" --output tsv)

cat > /tmp/body.json << 'EOF'
{"displayName": "DevLakehouse", "type": "Lakehouse", "creationPayload": {"enableSchemas": true}}
EOF
lakehouse_id=$(az rest --method post --resource "https://api.fabric.microsoft.com" \
  --url "https://api.fabric.microsoft.com/v1/workspaces/$workspace_id/items" \
  --body @/tmp/body.json --query "id" --output tsv)
```
Organize Lakehouse Tables with Schemas
```python
# See SPARK-AUTHORING-CORE.md Lakehouse Schema Organization for table organization patterns
# Create schemas for medallion architecture
spark.sql("CREATE SCHEMA IF NOT EXISTS bronze")
spark.sql("CREATE SCHEMA IF NOT EXISTS silver")
spark.sql("CREATE SCHEMA IF NOT EXISTS gold")
```
Create Livy Session
```bash
# See SPARK-CONSUMPTION-CORE.md for Livy session configuration and management
# IMPORTANT: Body MUST be flat JSON with memory/cores — do NOT wrap in {"payload": ...}
cat > /tmp/body.json << 'EOF'
{"name": "dev-session", "driverMemory": "56g", "driverCores": 8, "executorMemory": "56g", "executorCores": 8, "conf": {"spark.dynamicAllocation.enabled": "true", "spark.fabric.pool.name": "Starter Pool"}}
EOF
az rest --method post --resource "https://api.fabric.microsoft.com" \
  --url "https://api.fabric.microsoft.com/v1/workspaces/$workspace_id/lakehouses/$lakehouse_id/livyapi/versions/2023-12-01/sessions" \
  --body @/tmp/body.json
```
> **Livy Session Body — Common Mistakes**
> - ❌ `{"payload": {"kind": "pyspark"}}` → HTTP 500 (wrong wrapper, missing required fields)
> - ❌ `{"kind": "pyspark"}` → HTTP 500 (missing `driverMemory`, `executorMemory`, etc.)
> - ✅ Flat JSON with `name`, `driverMemory`, `driverCores`, `executorMemory`, `executorCores` (and optionally `conf` with Starter Pool)
{"name": "dev-session", "driverMemory": "56g", "driverCores": 8, "executorMemory": "56g", "executorCores": 8, "conf": {"spark.dynamicAllocation.enabled": "true", "spark.fabric.pool.name": "Starter Pool"}}
EOF
az rest --method post --resource "https://api.fabric.microsoft.com"
--url "https://api.fabric.microsoft.com/v1/workspaces/$workspace_id/lakehouses/$lakehouse_id/livyapi/versions/2023-12-01/sessions"
--body @/tmp/body.json
--url "https://api.fabric.microsoft.com/v1/workspaces/$workspace_id/lakehouses/$lakehouse_id/livyapi/versions/2023-12-01/sessions"
--body @/tmp/body.json
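If you need to wait for the session before submitting work, a hedged follow-up sketch (assumes you captured the session ID from the create call above, e.g. by adding `--query "id" --output tsv`, and that the session resource exposes a Livy-style `state` field that reaches `idle` when ready):

```bash
# Assumption: $session_id captured from the create call; "state" follows Livy semantics.
LIVY="https://api.fabric.microsoft.com/v1/workspaces/$workspace_id/lakehouses/$lakehouse_id/livyapi/versions/2023-12-01"
until [ "$(az rest --method get --resource "https://api.fabric.microsoft.com" \
  --url "$LIVY/sessions/$session_id" --query "state" --output tsv)" = "idle" ]; do
  echo "waiting for Livy session $session_id ..."
  sleep 15
done
```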
Spark Performance Configs
For detailed workload-specific configurations, see data-engineering-patterns.md Delta Lake Best Practices.
Quick reference:
```python
# Write-heavy (Bronze): Disable V-Order, enable autoCompact
# Balanced (Silver): Enable V-Order, adaptive execution
# Read-heavy (Gold): Vectorized reads, optimal parallelism
# See data-engineering-patterns.md for complete config tables
```
Variable Library in Notebooks
Use a Variable Library to centralize lakehouse names, workspace IDs, and feature flags.
```python
# ✅ CORRECT — getLibrary() + dot notation
lib = notebookutils.variableLibrary.getLibrary("MyConfig")
lakehouse_name = lib.lakehouse_name
enable_logging = lib.enable_logging  # returns string "true"/"false"

# Boolean: compare as string (bool("false") is True in Python!)
if enable_logging.lower() == "true":
    print("Logging enabled")

# ❌ WRONG — .get() does not exist, causes runtime failure
# notebookutils.variableLibrary.get("MyConfig", "lakehouse_name")
```
---
**Focus**: Essential CLI patterns for Spark/data engineering development with intelligent routing to specialized resources. For comprehensive patterns, always reference COMMON-* files and resource documents.