Dagster + Delta Lake

Integrate Delta Lake into your Dagster pipelines.

About this integration

Delta Lake is a great storage format for Dagster workflows. With this integration, you can use the Delta Lake I/O Manager to read and write your Dagster Software-Defined Assets (SDAs).

Here are some of the benefits that Delta Lake provides Dagster users:

  • Native PyArrow integration for lazy computation of large datasets
  • More efficient querying with file skipping via Z-ordering and liquid clustering
  • Built-in vacuuming to remove unnecessary files and versions
  • ACID transactions for reliable writes
  • Smooth versioning integration (versions can be used to trigger downstream updates)
  • Table statistics surfaced from file-level statistics

Installation

pip install dagster-deltalake
pip install dagster-deltalake-pandas
pip install dagster-deltalake-polars
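
Example

The following is a minimal sketch of wiring the Delta Lake I/O Manager into a Dagster project, assuming the pandas flavor and a local table path; the asset names, root URI, and schema are illustrative.

from dagster import Definitions, asset
import pandas as pd

from dagster_deltalake import LocalConfig
from dagster_deltalake_pandas import DeltaLakePandasIOManager


@asset
def raw_orders() -> pd.DataFrame:
    # The I/O manager writes each asset's return value to a Delta table
    return pd.DataFrame({"order_id": [1, 2, 3], "amount": [10.0, 20.0, 35.0]})


@asset
def large_orders(raw_orders: pd.DataFrame) -> pd.DataFrame:
    # Upstream assets are loaded back from their Delta tables as DataFrames
    return raw_orders[raw_orders["amount"] > 15.0]


defs = Definitions(
    assets=[raw_orders, large_orders],
    resources={
        "io_manager": DeltaLakePandasIOManager(
            root_uri="path/to/deltalake",   # directory (or object store URI) holding the tables
            storage_options=LocalConfig(),  # local filesystem; cloud storage configs also exist
            schema="demo",                  # tables are created under this schema
        )
    },
)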

About Delta Lake

Delta Lake is an open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive, as well as APIs for Scala, Java, Rust, and Python.

Community / Partner integration:

This integration was built and is maintained by a community user or a technology partner from outside of Dagster Labs.