Dagster is a data orchestrator for machine learning, analytics, and ETL.
Implement components in any tool, such as Pandas, Spark, SQL, or dbt.
Define your pipelines in terms of the data flow between reusable, logical components.
Test locally and run anywhere with a unified view of data pipelines and assets.
Develop and test on your laptop, deploy anywhere
With Dagster's pluggable execution, the same pipeline can run in-process against your local file system, or on a distributed work queue against your production data lake. You can set up Dagster's web interface on your laptop in a minute, or deploy it on-premises or in any cloud.
Model and type the data produced and consumed by each step
Dagster models data dependencies between steps in your orchestration graph and handles passing data between them. Optional typing on inputs and outputs helps catch bugs early.
Link data to computations
Track what's produced by your pipelines with Dagster's Asset Manager, so you can understand how your data was generated and trace issues when it doesn't look the way you expect.
Build a self-service data platform
Dagster helps platform teams build systems for data practitioners. Pipelines are built from shared, reusable, configurable data processing and infrastructure components. Dagster’s web interface lets anyone inspect these objects and discover how to use them.
Avoid dependency nightmares
Dagster's repository model lets you isolate codebases, so that problems in one pipeline don't affect the rest. Each pipeline can have its own package dependencies and Python version. Pipelines run in isolated processes, so user-code issues can't bring the system down.
Debug pipelines from a rich UI
Dagit, Dagster's web interface, offers rich facilities for understanding the pipelines it orchestrates.
When inspecting a pipeline run, you can query over logs, discover the most time-consuming steps via a Gantt chart, and re-execute subsets of steps.
Dagster’s UI runs locally on your machine and can also be deployed to your production infrastructure for operational monitoring.
You’re in good company
Dagster is used to orchestrate data pipelines at some of our favorite companies.
Recent blog posts
Good Data at Good Eggs: Correctness and reliability for data infrastructure
Dagster’s support for custom data types helped us achieve better correctness and reliability in our data ingest process, which meant less downstream breakage and better recovery when bad data made it through.
Broad support for existing pipelines and deployments
Incrementally adopt Dagster by wrapping existing code into Dagster solids.