Dagster is a data orchestrator for machine learning, analytics, and ETL
Implement components in any tool, such as Pandas, Spark, SQL, or dbt.
Define your pipelines in terms of the data flow between reusable, logical components.
Test locally and run anywhere with a unified view of data pipelines and assets.
Develop and test on your laptop, deploy anywhere
With Dagster’s pluggable execution, the same pipeline can run in-process against your local file system, or on a distributed work queue against your production data lake. You can set up Dagster’s web interface in a minute on your laptop, or deploy it on-premises or in any cloud.
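To make the idea of pluggable execution concrete, here is a minimal plain-Python sketch. It does not use Dagster's API: the step functions, `PIPELINE`, `run_in_process`, and `run_on_pool` are hypothetical names, and a thread pool stands in for a distributed work queue. The point is only that the pipeline definition stays the same while the execution strategy is swapped.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# A step takes one integer and returns one integer (illustrative only).
Step = Callable[[int], int]


def add_one(x: int) -> int:
    return x + 1


def double(x: int) -> int:
    return x * 2


# The pipeline definition: an ordered list of steps, independent of
# how or where those steps are executed.
PIPELINE: List[Step] = [add_one, double]


def run_in_process(steps: List[Step], value: int) -> int:
    """Run every step directly in the current thread."""
    for step in steps:
        value = step(value)
    return value


def run_on_pool(steps: List[Step], value: int) -> int:
    """Submit each step to a worker pool and wait for its result.

    The thread pool is an assumed stand-in for a distributed work queue.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        for step in steps:
            value = pool.submit(step, value).result()
    return value


local_result = run_in_process(PIPELINE, 3)
pooled_result = run_on_pool(PIPELINE, 3)
```

Both runners consume the same `PIPELINE` object, so moving from a laptop to a worker pool changes only which runner is called.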
Model and type the data produced and consumed by each step
Dagster models data dependencies between steps in your orchestration graph and handles passing data between them. Optional typing on inputs and outputs helps catch bugs early.
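As a rough illustration of typed data dependencies, the plain-Python sketch below wires two steps together and checks the type of every value passed between them. It is not Dagster's API: `Step`, `run_graph`, and the example steps are hypothetical, and the steps are assumed to be listed in dependency order.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Step:
    name: str
    fn: Callable[..., Any]
    # Maps each parameter name to (upstream step name, expected type).
    inputs: Dict[str, Tuple[str, type]]
    output_type: type


def run_graph(steps: List[Step]) -> Dict[str, Any]:
    """Run steps in the order given, passing upstream outputs downstream.

    Assumes `steps` is already sorted so that every upstream step
    appears before the steps that depend on it.
    """
    results: Dict[str, Any] = {}
    for step in steps:
        kwargs = {}
        for param, (upstream, expected) in step.inputs.items():
            value = results[upstream]
            if not isinstance(value, expected):
                raise TypeError(
                    f"{step.name}.{param} expected {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
            kwargs[param] = value
        output = step.fn(**kwargs)
        if not isinstance(output, step.output_type):
            raise TypeError(
                f"{step.name} should return {step.output_type.__name__}, "
                f"got {type(output).__name__}"
            )
        results[step.name] = output
    return results


def load_rows() -> list:
    return [1, 2, 3]


def total(rows: list) -> int:
    return sum(rows)


graph = [
    Step("load_rows", load_rows, {}, list),
    Step("total", total, {"rows": ("load_rows", list)}, int),
]
results = run_graph(graph)
```

A step that returns the wrong type fails as soon as it runs, with an error naming that step, rather than surfacing later as a confusing error several steps downstream.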
Link data to computations
Track what’s produced by your pipelines with Dagster’s Asset Manager, so you can understand how your data was generated and trace issues when it doesn’t look how you expect.
Build a self-service data platform
Dagster helps platform teams build systems for data practitioners. Pipelines are built from shared, reusable, configurable data processing and infrastructure components. Dagster’s web interface lets anyone inspect these objects and discover how to use them.
Avoid dependency nightmares
Dagster’s repository model lets you isolate codebases, so that problems in one pipeline don’t bring down the rest. Each pipeline can have its own package dependencies and Python version. Pipelines run in isolated processes so user code issues can’t bring the system down.
Debug pipelines from a rich UI
Dagit, Dagster’s web interface, includes extensive tools for understanding the pipelines it orchestrates.
When inspecting a pipeline run, you can query the logs, find the most time-consuming tasks in a Gantt chart, and re-execute subsets of steps.
Dagster’s UI runs locally on your machine and can also be deployed to your production infrastructure for operational monitoring.
You’re in good company
Dagster is used to orchestrate data pipelines at some of our favorite companies.
Recent blog posts
Good Data at Good Eggs: Data observability with the asset catalog
What we’re aiming for with Dagster is a completely horizontal view of our data assets. Our analysts will be able to look up when a raw data ingest from Stitch occurred, when a dbt model ran, or when a plot was generated by a Jupyter notebook and posted in Slack, through a single portal — a single "pane of glass."
Broad support for existing pipelines and deployments
Incrementally adopt Dagster by wrapping existing code into Dagster solids.