Dask | Dagster Integrations
Tap into PyData libraries and Dask parallelization

Dask-based executor for Dagster.

About this integration

This integration provides a dask_executor that runs Dagster jobs on a local or distributed Dask cluster. Work is distributed across the cluster at the level of individual execution steps. The executor therefore orchestrates the steps of a job across workers; it does not parallelize the computation inside any single step.

The executor takes the compiled execution plan and converts each execution step into a Dask Future, configured with the appropriate task dependencies so that steps run in the correct order. When the job is executed, these futures are generated and then awaited by the parent Dagster process.
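To make the step-to-future mapping concrete, here is a minimal sketch of the same idea. It uses the standard library's concurrent.futures as a stand-in for Dask futures, so it is not the dagster-dask implementation. The plan, the step names, and the helpers run_step, submit_plan, and execute are all hypothetical and exist only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical execution plan: step name -> names of its upstream steps.
# Assumption: the steps are listed in topological order.
PLAN = {
    "extract": [],
    "clean": ["extract"],
    "stats": ["clean"],
    "report": ["clean", "stats"],
}


def run_step(name, upstream_futures):
    # Wait for every upstream step, then "compute" this step.
    # Each step is a single unit of work; nothing inside it is parallelized.
    inputs = [future.result() for future in upstream_futures]
    return f"{name}({', '.join(inputs)})"


def submit_plan(plan, pool):
    # Create one future per step, wiring in the futures it depends on.
    futures = {}
    for name, deps in plan.items():
        upstream = [futures[dep] for dep in deps]
        futures[name] = pool.submit(run_step, name, upstream)
    return futures


def execute(plan, max_workers=4):
    # The parent process generates all futures, then awaits their results.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = submit_plan(plan, pool)
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    for step, value in execute(PLAN).items():
        print(step, "->", value)
```

In this sketch, independent steps can run concurrently on different workers, while a step such as report only proceeds once both clean and stats have finished. That is the sense in which the executor sequences steps rather than speeding up the work inside them.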

Installation

pip install dagster-dask

Example

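# Materialize your assets with Dask
# Read the docs on Executors to learn more: https://docs.dagster.io/deployment/executors
from dagster_dask import dask_executor
from dagster import define_asset_job

# Connect to an already-running Dask cluster.
# The address is a placeholder: use your own scheduler's host and port.
# (Dask schedulers listen on 8786 by default; 8787 is usually the dashboard.)
executor = dask_executor.configured({
    "cluster": {
        "existing": {
            "address": "dask_scheduler.dns_name:8787",
        }
    }
})

# With no selection given, the job targets all assets and runs them on the Dask executor.
local_dask_job = define_asset_job("local_dask_job", executor_def=executor)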

About Dask

Dask is a Python library for parallel computing. It makes it easy to scale Python libraries such as NumPy, pandas, and scikit-learn, using Dask DataFrames and parallel execution.