Dagster is a data orchestrator for machine learning, analytics, and ETL.
Build pipelines of computations written in Spark, SQL, dbt, or any other framework.
Locally develop pipelines in-process, then flexibly deploy on Kubernetes or your custom infrastructure.
Unify your view of pipelines and the tables, ML models, and other assets they produce.
Develop and test on your laptop, deploy anywhere
With Dagster’s pluggable execution, the same pipeline can run in-process against your local file system, or on a distributed work queue against your production data lake. You can set up Dagster’s web interface in a minute on your laptop, or deploy it on-premises or in any cloud.
Model and type the data produced and consumed by each step
Dagster models data dependencies between steps in your orchestration graph and handles passing data between them. Optional typing on inputs and outputs helps catch bugs early.
Link data to computations
Track what’s produced by your pipelines with Dagster's Asset Manager, so you can understand how your data was generated and trace issues when it doesn’t look the way you expect.
Build a self-service data platform
Dagster helps platform teams build systems for data practitioners. Pipelines are built from shared, reusable, configurable data processing and infrastructure components. Dagster’s web interface lets anyone inspect these objects and discover how to use them.
Avoid dependency nightmares
Dagster’s repository model lets you isolate codebases, so problems in one pipeline don’t affect the others. Each pipeline can have its own package dependencies and Python version, and pipelines run in isolated processes, so user-code issues can't bring down the system.
Debug pipelines from a rich UI
Dagit, Dagster’s web interface, offers rich tools for understanding the pipelines it orchestrates.
When inspecting a pipeline run, you can search the logs, find the most time-consuming steps via a Gantt chart, and re-execute subsets of steps.
Dagster’s UI runs locally on your machine and can also be deployed to your production infrastructure for operational monitoring.
You’re in good company
Dagster is used to orchestrate data pipelines at some of our favorite companies.
Recent blog posts
Dagster 0.12.0: Into the Groove
In 0.12.0, we introduce pipeline failure sensors, solid-level retries, and more convenient testing APIs.
Broad support for existing pipelines and deployments
Incrementally adopt Dagster by wrapping existing code into Dagster solids.