databricks-multi-env-setup

Databricks Multi-Environment Setup

Overview

Configure Databricks across development, staging, and production environments.

Prerequisites

  • Multiple Databricks workspaces (or single workspace with isolation)
  • Secret management solution (Databricks Secrets, Azure Key Vault, etc.)
  • CI/CD pipeline configured
  • Unity Catalog for cross-workspace governance

Environment Strategy

| Environment | Purpose | Workspace | Data | Permissions |
|---|---|---|---|---|
| Development | Local dev, experimentation | Shared dev workspace | Sample/synthetic | Broad access |
| Staging | Integration testing, UAT | Dedicated staging | Prod clone (masked) | Team access |
| Production | Live workloads | Dedicated prod | Real data | Restricted |

Instructions

Step 1: Asset Bundle Environment Configuration

databricks.yml

```yaml
bundle:
  name: data-platform

variables:
  catalog:
    description: Unity Catalog name
    default: dev_catalog
  schema_prefix:
    description: Schema prefix for isolation
    default: dev
  warehouse_size:
    description: SQL Warehouse size
    default: "2X-Small"
  cluster_size:
    description: Default cluster workers
    default: 1

include:
  - resources/*.yml

workspace:
  host: ${DATABRICKS_HOST}

targets:
  # Development - personal workspaces
  dev:
    default: true
    mode: development
    variables:
      catalog: dev_catalog
      schema_prefix: "${workspace.current_user.short_name}"
      warehouse_size: "2X-Small"
      cluster_size: 1
    workspace:
      root_path: /Users/${workspace.current_user.userName}/.bundle/${bundle.name}/dev

  # Staging - shared test environment
  staging:
    mode: development
    variables:
      catalog: staging_catalog
      schema_prefix: staging
      warehouse_size: "Small"
      cluster_size: 2
    workspace:
      host: ${DATABRICKS_HOST_STAGING}
      root_path: /Shared/.bundle/${bundle.name}/staging
    run_as:
      service_principal_name: ${STAGING_SERVICE_PRINCIPAL}
    permissions:
      - level: CAN_VIEW
        group_name: developers
      - level: CAN_MANAGE_RUN
        group_name: data-engineers

  # Production - locked down
  prod:
    mode: production
    variables:
      catalog: prod_catalog
      schema_prefix: prod
      warehouse_size: "Medium"
      cluster_size: 4
    workspace:
      host: ${DATABRICKS_HOST_PROD}
      root_path: /Shared/.bundle/${bundle.name}/prod
    run_as:
      service_principal_name: ${PROD_SERVICE_PRINCIPAL}
    permissions:
      - level: CAN_VIEW
        group_name: data-consumers
      - level: CAN_MANAGE_RUN
        group_name: data-engineers
      - level: CAN_MANAGE
        service_principal_name: ${PROD_SERVICE_PRINCIPAL}
```
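
With targets defined, each environment is a `-t` flag away. A minimal sketch wrapping the Databricks CLI (assumes the CLI is installed and authenticated against the target workspace):

```python
import subprocess

def deploy_bundle(target: str) -> None:
    """Validate, then deploy, the bundle to one target."""
    # Catches variable and schema errors before anything is uploaded
    subprocess.run(["databricks", "bundle", "validate", "-t", target], check=True)
    # Uploads code and creates/updates the target's resources
    subprocess.run(["databricks", "bundle", "deploy", "-t", target], check=True)

deploy_bundle("dev")  # staging/prod deploys typically run from CI instead
```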

Step 2: Environment-Specific Resources

resources/jobs.yml

```yaml
resources:
  jobs:
    etl_pipeline:
      name: "${bundle.name}-etl-${bundle.target}"

      # Environment-specific schedule
      schedule:
        quartz_cron_expression: >-
          ${bundle.target == "prod" ? "0 0 6 * * ?" :
            bundle.target == "staging" ? "0 0 8 * * ?" : null}
        timezone_id: "America/New_York"
        pause_status: ${bundle.target == "dev" ? "PAUSED" : "UNPAUSED"}

      # Environment-specific notifications
      email_notifications:
        on_failure: ${bundle.target == "prod" ?
          ["oncall@company.com", "pagerduty@company.pagerduty.com"] :
          ["team@company.com"]}

      # Environment-specific cluster sizing
      job_clusters:
        - job_cluster_key: etl_cluster
          new_cluster:
            spark_version: "14.3.x-scala2.12"
            node_type_id: >-
              ${bundle.target == "prod" ? "Standard_DS4_v2" : "Standard_DS3_v2"}
            num_workers: ${var.cluster_size}
            autoscale:
              min_workers: ${bundle.target == "prod" ? 2 : 1}
              max_workers: ${bundle.target == "prod" ? 10 : 4}

            # Spot instances for non-prod
            azure_attributes:
              availability: >-
                ${bundle.target == "prod" ? "ON_DEMAND_AZURE" : "SPOT_AZURE"}
              first_on_demand: 1
```
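
After a deploy, you can spot-check that the resolved settings match the target. A sketch using the Databricks Python SDK (the job name follows the `${bundle.name}-etl-${bundle.target}` pattern above, so `data-platform-etl-staging` here is the staging deployment):

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# List jobs matching the environment-qualified name and inspect their settings
for job in w.jobs.list(name="data-platform-etl-staging"):
    print("schedule:", job.settings.schedule)
    print("on_failure:", job.settings.email_notifications)
```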

Step 3: Unity Catalog Cross-Environment Setup

```sql
-- Create environment-specific catalogs
CREATE CATALOG IF NOT EXISTS dev_catalog;
CREATE CATALOG IF NOT EXISTS staging_catalog;
CREATE CATALOG IF NOT EXISTS prod_catalog;

-- Grant cross-environment read access for data lineage
GRANT USE CATALOG ON CATALOG prod_catalog TO `staging-service-principal`;
GRANT SELECT ON CATALOG prod_catalog TO `staging-service-principal`;

-- Set up data sharing between environments
CREATE SHARE IF NOT EXISTS prod_to_staging;
ALTER SHARE prod_to_staging ADD SCHEMA prod_catalog.reference;

-- Create recipient for staging workspace
-- (the ID is the target metastore's sharing identifier)
CREATE RECIPIENT IF NOT EXISTS staging_workspace
  USING ID 'staging-workspace-identity';

GRANT SELECT ON SHARE prod_to_staging TO RECIPIENT staging_workspace;
```
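
A quick way to confirm the grants took effect is `SHOW GRANTS` from a notebook. A sketch (assumes a Unity Catalog-enabled cluster where `spark` is in scope):

```python
# Effective grants on the production catalog
spark.sql("SHOW GRANTS ON CATALOG prod_catalog").show(truncate=False)

# And on the share created above
spark.sql("SHOW GRANTS ON SHARE prod_to_staging").show(truncate=False)
```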

Step 4: Secret Management by Environment

src/config/secrets.py

```python
import base64
import os

from databricks.sdk import WorkspaceClient


class EnvironmentSecrets:
    """Environment-aware secret management."""

    def __init__(self, environment: str | None = None):
        self.environment = environment or os.getenv("ENVIRONMENT", "dev")
        self.w = WorkspaceClient()
        self._secret_scope = f"{self.environment}-secrets"

    def get_secret(self, key: str) -> str:
        """Get a secret for the current environment."""
        # In notebooks, use dbutils:
        # return dbutils.secrets.get(scope=self._secret_scope, key=key)

        # Via the API (for testing); the REST API returns values base64-encoded
        response = self.w.secrets.get_secret(scope=self._secret_scope, key=key)
        return base64.b64decode(response.value).decode("utf-8")

    def get_database_url(self) -> str:
        """Get the environment-specific database URL."""
        return self.get_secret("database_url")

    def get_api_key(self, service: str) -> str:
        """Get the API key for a service."""
        return self.get_secret(f"{service}_api_key")
```

Usage in notebooks:

```python
secrets = EnvironmentSecrets()
db_url = secrets.get_database_url()
```
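
The class assumes one secret scope per environment (`dev-secrets`, `staging-secrets`, `prod-secrets`). A bootstrap sketch with the Python SDK (the error class name comes from databricks-sdk's `errors` module; adjust to your SDK version):

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.errors import ResourceAlreadyExists

w = WorkspaceClient()

# One scope per environment, matching the f"{environment}-secrets" convention
for env in ("dev", "staging", "prod"):
    try:
        w.secrets.create_scope(scope=f"{env}-secrets")
    except ResourceAlreadyExists:
        pass  # scope already provisioned

# Seed an environment-specific value (URL is a placeholder)
w.secrets.put_secret(
    scope="dev-secrets", key="database_url",
    string_value="jdbc:postgresql://dev-db:5432/app",
)
```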

Step 5: Environment Detection

src/config/environment.py

```python
import os
from dataclasses import dataclass


@dataclass
class EnvironmentConfig:
    """Environment-specific configuration."""
    name: str
    catalog: str
    schema_prefix: str
    is_production: bool
    debug_enabled: bool
    max_cluster_size: int


def detect_environment() -> EnvironmentConfig:
    """Detect the current environment from context."""
    # From environment variable (set by Asset Bundles)
    env = os.getenv("ENVIRONMENT", "dev")

    # Or from Databricks tags:
    # spark.conf.get("spark.databricks.tags.Environment")

    configs = {
        "dev": EnvironmentConfig(
            name="dev",
            catalog="dev_catalog",
            schema_prefix="dev",
            is_production=False,
            debug_enabled=True,
            max_cluster_size=4,
        ),
        "staging": EnvironmentConfig(
            name="staging",
            catalog="staging_catalog",
            schema_prefix="staging",
            is_production=False,
            debug_enabled=True,
            max_cluster_size=8,
        ),
        "prod": EnvironmentConfig(
            name="prod",
            catalog="prod_catalog",
            schema_prefix="prod",
            is_production=True,
            debug_enabled=False,
            max_cluster_size=20,
        ),
    }

    return configs.get(env, configs["dev"])
```

Usage:

```python
env = detect_environment()
full_table_name = f"{env.catalog}.{env.schema_prefix}_sales.orders"
```
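
Since environment detection drives catalog and schema resolution, it is worth a unit test. A minimal sketch using pytest's `monkeypatch` fixture (test path and import path are illustrative):

```python
# tests/test_environment.py (illustrative path)
from src.config.environment import detect_environment

def test_detects_prod(monkeypatch):
    monkeypatch.setenv("ENVIRONMENT", "prod")
    env = detect_environment()
    assert env.is_production
    assert env.catalog == "prod_catalog"

def test_unknown_env_falls_back_to_dev(monkeypatch):
    monkeypatch.setenv("ENVIRONMENT", "qa")  # not in the config map
    assert detect_environment().name == "dev"
```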

Step 6: Environment Promotion Pipeline

scripts/promote_to_prod.py

```python
import argparse
import subprocess


def promote_to_production(
    version_tag: str,
    dry_run: bool = True,
) -> dict:
    """Promote code from staging to production.

    Steps:
    1. Verify staging tests passed
    2. Tag release in git
    3. Deploy to production
    4. Run smoke tests
    5. Enable schedules (handled by the bundle's pause_status on deploy)
    """
    results = {"steps": []}

    # 1. Verify staging tests
    print("Verifying staging tests...")
    staging_result = subprocess.run(
        ["databricks", "bundle", "run", "-t", "staging", "integration-tests"],
        capture_output=True,
    )
    if staging_result.returncode != 0:
        raise Exception("Staging tests failed")
    results["steps"].append({"stage": "verify_staging", "status": "passed"})

    # 2. Tag release
    print(f"Tagging release {version_tag}...")
    if not dry_run:
        subprocess.run(["git", "tag", version_tag], check=True)
        subprocess.run(["git", "push", "origin", version_tag], check=True)
    results["steps"].append(
        {"stage": "tag_release", "status": "done" if not dry_run else "skipped"}
    )

    # 3. Deploy to production
    print("Deploying to production...")
    if not dry_run:
        subprocess.run(["databricks", "bundle", "deploy", "-t", "prod"], check=True)
    results["steps"].append(
        {"stage": "deploy_prod", "status": "done" if not dry_run else "skipped"}
    )

    # 4. Run smoke tests
    print("Running smoke tests...")
    if not dry_run:
        subprocess.run(
            ["databricks", "bundle", "run", "-t", "prod", "smoke-tests"], check=True
        )
    results["steps"].append(
        {"stage": "smoke_tests", "status": "done" if not dry_run else "skipped"}
    )

    return results


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--version", required=True)
    parser.add_argument("--dry-run", action="store_true")
    args = parser.parse_args()

    result = promote_to_production(args.version, args.dry_run)
    print(result)
```
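
Running with `--dry-run` rehearses the promotion: staging tests still execute, but tagging and deployment are skipped. For example:

```python
# Rehearse the promotion; "v1.4.0" is an illustrative tag
result = promote_to_production("v1.4.0", dry_run=True)
for step in result["steps"]:
    print(f'{step["stage"]}: {step["status"]}')
```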

Output

  • Multi-environment Asset Bundle configuration
  • Environment-specific job settings
  • Cross-environment Unity Catalog setup
  • Secret management by environment
  • Promotion pipeline

Error Handling

| Issue | Cause | Solution |
|---|---|---|
| Wrong environment | Missing env var | Check the ENVIRONMENT variable |
| Secret not found | Wrong scope | Verify the scope name matches the environment |
| Permission denied | Missing grants | Add Unity Catalog grants |
| Config mismatch | Target override issue | Check bundle target syntax |
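
For the first two rows, a quick self-check saves debugging time. A sketch (scope naming follows the `{env}-secrets` convention from Step 4):

```python
import os

from databricks.sdk import WorkspaceClient

env = os.getenv("ENVIRONMENT")
print("ENVIRONMENT =", env or "<unset, code will fall back to dev>")

# Verify the environment's secret scope actually exists
w = WorkspaceClient()
scope_names = [s.name for s in w.secrets.list_scopes()]
expected = f"{env or 'dev'}-secrets"
print(f"{expected}: {'found' if expected in scope_names else 'MISSING'}")
```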

Examples

Quick Environment Check

```python
env = detect_environment()
print(f"Running in {env.name} environment")
print(f"Using catalog: {env.catalog}")
print(f"Production mode: {env.is_production}")
```

Terraform Multi-Workspace

infrastructure/terraform/main.tf

```hcl
locals {
  environments = {
    dev = {
      workspace_name = "data-platform-dev"
      sku            = "premium"
    }
    staging = {
      workspace_name = "data-platform-staging"
      sku            = "premium"
    }
    prod = {
      workspace_name = "data-platform-prod"
      sku            = "premium"
    }
  }
}

resource "azurerm_databricks_workspace" "workspace" {
  for_each = local.environments

  name                = each.value.workspace_name
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  sku                 = each.value.sku

  tags = {
    environment = each.key
    managed_by  = "terraform"
  }
}
```

Resources

Next Steps

For observability setup, see databricks-observability.