# databricks-jobs
Develop and deploy Lakeflow Jobs on Databricks. Use when creating data engineering jobs with notebooks, Python wheels, or SQL tasks. Invoke BEFORE starting implementation.
Install: `npx skill4agent add databricks/databricks-agent-skills databricks-jobs`
## Lakeflow Jobs Development
FIRST: Use the parent `databricks` skill for CLI basics, authentication, profile selection, and data exploration commands.

Lakeflow Jobs are scheduled workflows that run notebooks, Python scripts, SQL queries, and other tasks on Databricks.
### Scaffolding a New Job Project
Use `databricks bundle init` with a config file to scaffold non-interactively. This creates a project in the `<project_name>/` directory:

```bash
databricks bundle init default-python --config-file <(echo '{"project_name": "my_job", "include_job": "yes", "include_pipeline": "no", "include_python": "yes", "serverless": "yes"}') --profile <PROFILE> < /dev/null
```

- `project_name`: letters, numbers, underscores only
After scaffolding, create `CLAUDE.md` and `AGENTS.md` in the project directory. These files are essential to provide agents with guidance on how to work with the project. Use this content:

```markdown
# Databricks Asset Bundles Project

This project uses Databricks Asset Bundles for deployment.

## Prerequisites

Install the Databricks CLI (>= v0.288.0) if not already installed:

- macOS: `brew tap databricks/tap && brew install databricks`
- Linux: `curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh`
- Windows: `winget install Databricks.DatabricksCLI`

Verify: `databricks -v`

## For AI Agents

Read the `databricks` skill for CLI basics, authentication, and deployment workflow.
Read the `databricks-jobs` skill for job-specific guidance.
If skills are not available, install them: `databricks experimental aitools skills install`
```

### Project Structure
```
my-job-project/
├── databricks.yml            # Bundle configuration
├── resources/
│   └── my_job.job.yml        # Job definition
├── src/
│   ├── my_notebook.ipynb     # Notebook tasks
│   └── my_module/            # Python wheel package
│       ├── __init__.py
│       └── main.py
├── tests/
│   └── test_main.py
└── pyproject.toml            # Python project config (if using wheels)
```

### Configuring Tasks
Edit `resources/<job_name>.job.yml` to configure tasks:

```yaml
resources:
  jobs:
    my_job:
      name: my_job
      tasks:
        - task_key: my_notebook
          notebook_task:
            notebook_path: ../src/my_notebook.ipynb
        - task_key: my_python
          depends_on:
            - task_key: my_notebook
          python_wheel_task:
            package_name: my_package
            entry_point: main
```

Task types: `notebook_task`, `python_wheel_task`, `spark_python_task`, `pipeline_task`, `sql_task`
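For `python_wheel_task`, the `entry_point` refers to a callable exposed in the wheel's packaging metadata (e.g. under `[project.scripts]` in `pyproject.toml`). A minimal sketch of what `src/my_module/main.py` could look like — the function body is illustrative, not part of the scaffold:

```python
# src/my_module/main.py -- illustrative sketch of a wheel entry point
import sys


def main() -> None:
    """Callable referenced by `entry_point: main` in the job YAML."""
    # Any task arguments are visible on the command line.
    print(f"Task started with arguments: {sys.argv[1:]}")
    # ...call into your transformation logic here...


if __name__ == "__main__":
    main()
```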
### Job Parameters

Parameters defined at job level are passed to ALL tasks (no need to repeat per task):
```yaml
resources:
  jobs:
    my_job:
      parameters:
        - name: catalog
          default: ${var.catalog}
        - name: schema
          default: ${var.schema}
```

Access parameters in notebooks with `dbutils.widgets.get("catalog")`.
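In a `python_wheel_task`, job parameters are delivered as command-line arguments rather than widgets. A hedged sketch, assuming they arrive as `--name=value` style flags (check the actual task arguments in your run output):

```python
# Hypothetical parameter parsing for a wheel entry point; assumes job
# parameters are forwarded as --name=value command-line flags.
import argparse


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--catalog", default="main")    # defaults are placeholders
    parser.add_argument("--schema", default="default")
    # Tolerate any extra flags the runtime may append.
    args, _unknown = parser.parse_known_args()
    print(f"Using {args.catalog}.{args.schema}")


if __name__ == "__main__":
    main()
```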
### Writing Notebook Code

```python
# Read parameters
catalog = dbutils.widgets.get("catalog")
schema = dbutils.widgets.get("schema")

# Read tables
df = spark.read.table(f"{catalog}.{schema}.my_table")

# SQL queries
result = spark.sql(f"SELECT * FROM {catalog}.{schema}.my_table LIMIT 10")

# Write output
df.write.mode("overwrite").saveAsTable(f"{catalog}.{schema}.output_table")
```
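When the notebook is run interactively, the widgets may not exist yet and `dbutils.widgets.get` will fail. A common pattern (a sketch, not part of the scaffold) is to declare each widget with a development default, so the notebook runs both standalone and as a job task:

```python
# Declare widgets with placeholder defaults ("dev_catalog"/"dev_schema" are
# made up for this sketch); values supplied by the job take precedence.
dbutils.widgets.text("catalog", "dev_catalog")
dbutils.widgets.text("schema", "dev_schema")

catalog = dbutils.widgets.get("catalog")
schema = dbutils.widgets.get("schema")
```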
### Scheduling

```yaml
resources:
  jobs:
    my_job:
      trigger:
        periodic:
          interval: 1
          unit: DAYS
```

Or with cron:

```yaml
schedule:
  quartz_cron_expression: "0 0 2 * * ?"
  timezone_id: "UTC"
```

### Multi-Task Jobs with Dependencies
```yaml
resources:
  jobs:
    my_pipeline_job:
      tasks:
        - task_key: extract
          notebook_task:
            notebook_path: ../src/extract.ipynb
        - task_key: transform
          depends_on:
            - task_key: extract
          notebook_task:
            notebook_path: ../src/transform.ipynb
        - task_key: load
          depends_on:
            - task_key: transform
          notebook_task:
            notebook_path: ../src/load.ipynb
```

### Unit Testing
Run unit tests locally:
```bash
uv run pytest
```
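Unit tests are easiest against pure functions factored out of your tasks. A self-contained sketch — `normalize_name` is a hypothetical stand-in for logic that would normally live in `src/my_module/`:

```python
# tests/test_transform.py -- hypothetical example; test your own functions.
def normalize_name(raw: str) -> str:
    """Normalize a raw table name: trim, lowercase, snake_case."""
    return raw.strip().lower().replace(" ", "_")


def test_normalize_name() -> None:
    assert normalize_name("  My Table ") == "my_table"
```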
### Development Workflow

- Validate: `databricks bundle validate --profile <profile>`
- Deploy: `databricks bundle deploy -t dev --profile <profile>`
- Run: `databricks bundle run <job_name> -t dev --profile <profile>`
- Check run status: `databricks jobs get-run --run-id <id> --profile <profile>`
### Documentation
- Lakeflow Jobs: https://docs.databricks.com/jobs
- Task types: https://docs.databricks.com/jobs/configure-task
- Databricks Asset Bundles: https://docs.databricks.com/dev-tools/bundles/examples