Dagster is a data orchestrator for machine learning, analytics, and ETL

Build pipelines of computations written in Spark, SQL, dbt, or any other framework.

Locally develop pipelines in-process, then flexibly deploy on Kubernetes or your custom infrastructure.

Unify your view of pipelines and the tables, ML models, and other assets they produce.

Develop and test on your laptop, deploy anywhere

With Dagster’s pluggable execution, the same pipeline can run in-process against your local file system, or on a distributed work queue against your production data lake. You can set up Dagster’s web interface in a minute on your laptop, or deploy it on-premises or in any cloud.
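
To make the idea of pluggable execution concrete, here is a minimal plain-Python sketch. It does not use Dagster’s API: the `Storage` interface, `LocalFileStorage`, `InMemoryStorage`, and `run_pipeline` are hypothetical names chosen for illustration. The point is only that the pipeline logic stays the same while the storage it runs against is swapped.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List, Protocol


class Storage(Protocol):
    """Hypothetical interface: where a pipeline reads and writes its data."""

    def write(self, key: str, rows: List[dict]) -> None: ...

    def read(self, key: str) -> List[dict]: ...


class LocalFileStorage:
    """Stores each dataset as a JSON file, e.g. for development on a laptop."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def write(self, key: str, rows: List[dict]) -> None:
        (self.root / f"{key}.json").write_text(json.dumps(rows))

    def read(self, key: str) -> List[dict]:
        return json.loads((self.root / f"{key}.json").read_text())


class InMemoryStorage:
    """Stand-in for a production store such as a data lake client (assumption)."""

    def __init__(self) -> None:
        self._data: Dict[str, List[dict]] = {}

    def write(self, key: str, rows: List[dict]) -> None:
        self._data[key] = list(rows)

    def read(self, key: str) -> List[dict]:
        return list(self._data[key])


def run_pipeline(storage: Storage) -> int:
    """The pipeline itself never mentions which storage backend it runs against."""
    storage.write("raw", [{"n": 1}, {"n": 2}, {"n": 3}])
    doubled = [{"n": row["n"] * 2} for row in storage.read("raw")]
    storage.write("doubled", doubled)
    return sum(row["n"] for row in storage.read("doubled"))


if __name__ == "__main__":
    # Same pipeline, two environments.
    with tempfile.TemporaryDirectory() as tmp:
        print(run_pipeline(LocalFileStorage(Path(tmp))))
    print(run_pipeline(InMemoryStorage()))
```

In a real deployment the second backend would talk to remote infrastructure; the in-memory class only keeps the sketch runnable anywhere.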

Model and type the data produced and consumed by each step

Dagster models data dependencies between steps in your orchestration graph and handles passing data between them. Optional typing on inputs and outputs helps catch bugs early.
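
The following plain-Python sketch illustrates what modeling typed data dependencies between steps can look like. It is not Dagster’s implementation or API: `run_graph` and the example steps are hypothetical, and the sketch assumes that a step’s parameter names are the names of the upstream steps it consumes. Annotations are optional, and only plain classes such as `int` or `list` are checked.

```python
import inspect
from typing import Any, Callable, Dict, Tuple


def _check(value: Any, expected: Any, where: str) -> None:
    """Raise TypeError if `expected` is a plain class and `value` is not an instance."""
    if expected is inspect.Parameter.empty:
        return  # no annotation: typing is optional
    if isinstance(expected, type) and not isinstance(value, expected):
        raise TypeError(
            f"{where}: expected {expected.__name__}, got {type(value).__name__}"
        )


def run_graph(steps: Dict[str, Callable[..., Any]]) -> Dict[str, Any]:
    """Run every step after its upstream steps, passing outputs along as inputs."""
    results: Dict[str, Any] = {}

    def resolve(name: str, stack: Tuple[str, ...] = ()) -> Any:
        if name in results:
            return results[name]
        if name in stack:
            raise ValueError(f"cycle detected at step {name!r}")
        fn = steps[name]
        sig = inspect.signature(fn)
        kwargs = {}
        for dep, param in sig.parameters.items():
            # Each parameter names an upstream step; compute it first.
            value = resolve(dep, stack + (name,))
            _check(value, param.annotation, f"input {dep!r} of step {name!r}")
            kwargs[dep] = value
        output = fn(**kwargs)
        _check(output, sig.return_annotation, f"output of step {name!r}")
        results[name] = output
        return output

    for step_name in steps:
        resolve(step_name)
    return results


def load_numbers() -> list:
    return [1, 2, 3]


def total(load_numbers: list) -> int:
    return sum(load_numbers)


def report(total: int) -> str:
    return f"total={total}"


if __name__ == "__main__":
    outputs = run_graph(
        {"load_numbers": load_numbers, "total": total, "report": report}
    )
    print(outputs["report"])
```

If a step returned a string where an `int` is declared, `run_graph` would raise a `TypeError` naming that step, which is the kind of early failure that optional typing is meant to surface.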

Link data to computations

Track what’s produced by your pipelines with Dagster’s Asset Manager, so you can understand how your data was generated and trace issues when it doesn’t look the way you expect.
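
As a rough illustration of linking data to the computations that produced it, the sketch below keeps a history of which run and step wrote each asset. It is plain Python, not the Asset Manager’s API: `Materialization`, `AssetCatalog`, and the example asset key, run IDs, and metadata are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Materialization:
    """One record of an asset being produced (hypothetical structure)."""

    asset_key: str  # e.g. a table or model name
    run_id: str     # which pipeline run produced it
    step: str       # which step inside that run
    metadata: Dict[str, object] = field(default_factory=dict)


class AssetCatalog:
    """Keeps the materialization history of every asset, oldest first."""

    def __init__(self) -> None:
        self._history: Dict[str, List[Materialization]] = {}

    def record(self, event: Materialization) -> None:
        self._history.setdefault(event.asset_key, []).append(event)

    def latest(self, asset_key: str) -> Materialization:
        return self._history[asset_key][-1]

    def history(self, asset_key: str) -> List[Materialization]:
        return list(self._history.get(asset_key, []))


if __name__ == "__main__":
    catalog = AssetCatalog()
    catalog.record(Materialization("users_table", "run-1", "load_users", {"rows": 100}))
    catalog.record(Materialization("users_table", "run-2", "load_users", {"rows": 0}))

    # The table looks wrong: find out which run and step last wrote it.
    last = catalog.latest("users_table")
    print(last.run_id, last.step, last.metadata)
```

With a record like this, a table that suddenly looks empty can be traced back to the specific run and step that last wrote it.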

Build a self-service data platform

Dagster helps platform teams build systems for data practitioners. Pipelines are built from shared, reusable, configurable data processing and infrastructure components. Dagster’s web interface lets anyone inspect these objects and discover how to use them.

Avoid dependency nightmares

Dagster’s repository model lets you isolate codebases, so that problems in one pipeline don’t bring down the rest. Each pipeline can have its own package dependencies and Python version. Pipelines also run in isolated processes, so errors in user code can’t take down the system itself.

Debug pipelines from a rich UI

Dagit, Dagster’s web interface, includes extensive tools for understanding the pipelines it orchestrates.

When inspecting a pipeline run, you can query the logs, find the most time-consuming tasks in a Gantt chart, and re-execute subsets of steps.

Dagster’s UI runs locally on your machine and can also be deployed to your production infrastructure for operational monitoring.

You’re in good company

Dagster is used to orchestrate data pipelines at some of our favorite companies.

Recent blog posts

Community Memo: Approachability Improvements

In the last two months, we've made a set of changes aimed at making Dagster more approachable: to smooth out its learning curve and reduce its boilerplate.

Incrementally Adopting Dagster at Mapbox

At Mapbox, we've adopted Dagster without breaking compatibility with our legacy Airflow systems -- and with huge gains in developer productivity.

Broad support for existing pipelines and deployments

Incrementally adopt Dagster by wrapping existing code into Dagster solids.