Dagster + Databricks

Launch a Databricks job as a Dagster op.

About this integration

Looking to use Dagster as the orchestration layer for your Databricks analytics or AI workloads? The dagster-databricks package lets you:

  • Execute an op within a Databricks context on a cluster, such that the pyspark resource uses the cluster’s Spark instance.
  • Create an op that submits an external, configurable job to Databricks using the ‘Run Now’ API.


pip install dagster-databricks


from dagster import job
from dagster_databricks import create_databricks_job_op, databricks_client

# Configure an op that submits a Databricks job running a Python file on a new cluster.
sparkpi = create_databricks_job_op().configured(
    {
        "job": {
            "run_name": "SparkPi Python job",
            "new_cluster": {
                "spark_version": "7.3.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 2,
            },
            "spark_python_task": {"python_file": "dbfs:/docs/", "parameters": ["10"]},
        }
    },
    name="sparkpi",
)

# Supply the Databricks client resource that the op requires.
@job(
    resource_defs={
        "databricks_client": databricks_client.configured(
            {"host": "my.workspace.url", "token": "my.access.token"}
        )
    }
)
def do_stuff():
    sparkpi()
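
For context on the ‘Run Now’ API mentioned above, here is a minimal sketch of what such a request could look like at the HTTP level, using only the Python standard library. It is an illustration, not how dagster-databricks is implemented: the helper name build_run_now_request and the job ID 42 are hypothetical, and the sketch assumes the Databricks Jobs API 2.1 run-now endpoint with a personal access token sent as a bearer token. The request is built but never sent.

```python
import json
import urllib.request


def build_run_now_request(host, token, job_id, python_params):
    """Build (but do not send) a Jobs API 'Run Now' request.

    Hypothetical helper for illustration only; not part of dagster-databricks.
    """
    # Assumed payload: trigger an existing job and pass parameters to its Python task.
    body = {"job_id": job_id, "python_params": python_params}
    return urllib.request.Request(
        url=f"https://{host}/api/2.1/jobs/run-now",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder host and token, matching the example configuration above.
request = build_run_now_request("my.workspace.url", "my.access.token", 42, ["10"])
print(request.get_method(), request.full_url)
```

In a Dagster job you would normally let the op and the databricks_client resource handle this call, which keeps credentials in resource configuration rather than in op code.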

About Databricks

Founded by the creators of Apache Spark, Databricks develops a web-based platform for working with Spark that provides automated cluster management and IPython-style notebooks.