PySpark | Dagster Integrations
Dagster + PySpark

Scale up data processing by executing PySpark code within Dagster.

About this integration

The dagster-pyspark library provides a resource that gives your Dagster code access to a PySpark SparkSession, so you can execute PySpark code within Dagster.

Installation

pip install dagster-pyspark

Example

See the with_pyspark_emr example project.

About PySpark

PySpark is the Python API for Apache Spark, a distributed framework and set of libraries for real-time, large-scale data processing. PySpark allows you to create more scalable analyses and data pipelines.

The PySpark integration enables: