Dagster + Delta Lake
Integrate Delta Lake into your Dagster pipelines.
About this integration
Delta Lake is a great storage format for Dagster workflows. With this integration, you can use the Delta Lake I/O Manager to read and write your Dagster Software-Defined Assets (SDAs).
Here are some of the benefits that Delta Lake provides Dagster users:
- Native PyArrow integration for lazy computation of large datasets
- More efficient querying with file skipping via Z Ordering and liquid clustering
- Built-in vacuuming to remove unnecessary files and versions
- ACID transactions for reliable writes
- Smooth versioning integration (versions can be used to trigger downstream updates)
- Surfacing table stats based on the file statistics
Installation
pip install dagster-deltalake
pip install dagster-deltalake-pandas
pip install dagster-deltalake-polars
About Delta Lake
Delta Lake is an open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive, and with APIs for Scala, Java, Rust, and Python.
Community / Partner integration:
This integration was built and is maintained by a community user or a technology partner outside Dagster Labs.