August 1, 2023 • 4 minute read
Orchestrating dbt™ with Dagster
- Rex Ledesma (@_rexledesma)
- Sandy Ryza (@s_ryz)
dbt™ has become an industry standard for structuring SQL transformations within a warehouse. But teams using dbt still struggle with orchestration: scheduling their dbt models and running them in step with the other data assets in their data platforms.
We first presented Dagster’s dbt integration in October 2020. Since then, Dagster + dbt usage has ramped up, and now over half of Dagster users run dbt models as part of their data pipelines. In Dagster’s latest release, 1.4, we’ve put a heavy focus on the dbt integration, making it both more flexible and easier to get started with. With the latest improvements, we believe Dagster is far and away the best way to orchestrate dbt.
This isn’t because Dagster has more features than the alternatives, although it has a wide set of capabilities. It’s because Dagster’s core design principles fit naturally with dbt. The similarities between the way Dagster thinks about data pipelines and the way dbt thinks about them mean that Dagster can orchestrate dbt much more faithfully than other general-purpose orchestrators like Airflow.
At the same time, Dagster is able to compensate for dbt’s biggest limitations. dbt is rarely used in a vacuum: the data transformed using dbt needs to come from somewhere and go somewhere. When a data platform needs more than just dbt, Dagster is a better fit than dbt-specific orchestrators, like the job scheduling system inside dbt Cloud.
Why Dagster is the best way to orchestrate dbt
Looking to migrate off dbt Cloud? Check out the step-by-step migration guide.
Dagster and dbt share a mental model
One of the elements of dbt that makes it so intuitive and powerful is that it centers on data assets. When you build a data pipeline using dbt, each model you define is one of the tables or intermediate datasets that make up your pipeline.
From the very beginning, you’re thinking about the data products that your pipeline is there to support. Data lineage comes automatically because the references between your tables are part of how you define your data pipeline.
Like dbt, Dagster puts data assets at the center. Dagster pipelines are graphs of connected data assets.
This means that Dagster can understand a dbt project at a deep level. When you use Dagster’s dbt integration to load your dbt project, you get a faithful representation inside Dagster of your dbt models and the connections between them.
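To make the idea of "models and the connections between them" concrete, here is a minimal sketch that uses only the Python standard library. It is not Dagster's implementation: the `manifest` dictionary is a hypothetical, heavily trimmed stand-in for the `manifest.json` that dbt generates, and `build_asset_graph` is an illustrative helper name, not part of any library.

```python
from graphlib import TopologicalSorter

# Hypothetical, heavily trimmed stand-in for dbt's manifest.json.
# Project, source, and model names are invented for illustration.
manifest = {
    "nodes": {
        "model.shop.stg_orders": {
            "resource_type": "model",
            "depends_on": {"nodes": ["source.shop.raw.orders"]},
        },
        "model.shop.stg_customers": {
            "resource_type": "model",
            "depends_on": {"nodes": ["source.shop.raw.customers"]},
        },
        "model.shop.customer_orders": {
            "resource_type": "model",
            "depends_on": {
                "nodes": ["model.shop.stg_orders", "model.shop.stg_customers"]
            },
        },
    }
}


def build_asset_graph(manifest: dict) -> dict[str, set[str]]:
    """Map each dbt model to the set of upstream nodes it depends on."""
    return {
        unique_id: set(node["depends_on"]["nodes"])
        for unique_id, node in manifest["nodes"].items()
        if node["resource_type"] == "model"
    }


asset_graph = build_asset_graph(manifest)

# One valid execution order: every node appears after all of its upstreams.
execution_order = list(TopologicalSorter(asset_graph).static_order())
```

Because the dependencies are already declared in the dbt project, the lineage falls out of the project definition rather than needing to be restated in the orchestrator.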
Unlike other orchestrators, Dagster doesn’t need to run each dbt model in a separate task, an approach that incurs a lot of overhead. It can execute a full dbt project (or a sub-selection of it) with a single invocation of the dbt CLI while still providing observability at the level of individual models.
Automatically load your models as Dagster assets by importing your dbt project in Dagster Cloud.
Work beyond dbt: cross-technology, cross-team collaboration
Beyond understanding and enhancing your dbt work, Dagster provides the framework needed by teams working with dbt as part of a larger data platform.
A Dagster data asset can be a dbt model, but can equally be:
- A table ingested using a tool like Fivetran, Stitch, or Airbyte
- A machine learning model
- A dataset of images
- A file
You can compute Dagster assets using any Python code, running on any platform.
This means that you can build Dagster pipelines that connect the models in your dbt project to these other kinds of data assets, allowing you to orchestrate and track lineage across your entire organization.
For example, you might have a machine learning team that trains models using data that’s transformed using dbt. Dagster makes it easy to kick off ML training and inference based on changes to dbt models, and it can render the lineage between the ML models and the dbt models they depend on.
Dagster was built from the start for organization-wide collaboration and execution at enterprise scale. Its abstractions make it easy to scope your work to the parts of the pipeline that you own, and to zoom out to the entire asset graph spanning multiple data teams when you need to.
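As a rough illustration of the ML scenario described above, the sketch below walks a small cross-technology asset graph to find everything downstream of a changed dbt model. It uses plain Python rather than Dagster's API; the asset names and the `downstream_of` helper are hypothetical.

```python
from collections import deque

# Hypothetical asset graph: each asset lists its direct upstream assets.
upstreams = {
    "raw_orders": [],                           # table ingested by a tool like Fivetran
    "stg_orders": ["raw_orders"],               # dbt model
    "customer_orders": ["stg_orders"],          # dbt model
    "churn_training_set": ["customer_orders"],  # dataset computed in Python
    "churn_model": ["churn_training_set"],      # machine learning model
    "product_images": [],                       # unrelated dataset of images
}


def downstream_of(changed: str, upstreams: dict[str, list[str]]) -> list[str]:
    """Return every asset that transitively depends on `changed`, breadth-first."""
    # Invert the upstream edges so we can walk from an asset to its dependents.
    downstreams: dict[str, list[str]] = {name: [] for name in upstreams}
    for name, deps in upstreams.items():
        for dep in deps:
            downstreams[dep].append(name)

    seen = {changed}
    queue = deque([changed])
    ordered: list[str] = []
    while queue:
        for child in downstreams[queue.popleft()]:
            if child not in seen:
                seen.add(child)
                ordered.append(child)
                queue.append(child)
    return ordered


# A change to the stg_orders dbt model reaches the ML model two teams away.
affected = downstream_of("stg_orders", upstreams)
```

The point of the sketch is that once dbt models and non-dbt assets live in one graph, "what needs to be refreshed when this model changes" becomes a simple traversal.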
A full orchestration feature set
Dagster provides a greater depth of orchestration features when compared to dbt-specific tools like dbt Cloud. Orchestration has been Dagster’s primary job since its inception, and over that time, it’s grown to handle a very long tail of orchestration needs. Below is a sample:
- Flexible scheduling
- Observability
- Partitioning
- Alerting
Self-deployment will always be an option
We are strongly committed to building a true open-source solution without compromise. We have shared our high-level roadmap and have supportive investors who backed Elementl with this understanding.
Unlike with dbt Cloud, a proprietary cloud service with no open-source equivalent, you will always have the option to take your Dagster pipelines and deploy them on your own, including the full scheduler and UI, using the open-source project.
Why not orchestrate dbt with dbt Cloud?
We believe that dbt Cloud is a good enough solution for small analytics engineering teams who need basic scheduling of jobs and alerts without any integration needs.
However, as teams grow and their platforms become more complex, dbt Cloud stops being a viable solution. To collaborate across teams and across many technologies, a full-featured orchestrator is non-negotiable.
Only a true orchestrator has the context on your source systems needed to schedule downstream dbt models with confidence, along with flexible scheduling options that ensure your data is materialized in time to meet your SLAs.
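One way to picture source-aware scheduling is as a decision that combines "has the source changed?" with "is the model older than the SLA allows?". The sketch below is a simplified illustration under those assumptions, not Dagster's scheduling logic; `needs_run` and its parameters are hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def needs_run(
    source_updated_at: datetime,
    last_materialized_at: Optional[datetime],
    now: datetime,
    max_lag: timedelta,
) -> bool:
    """Decide whether a downstream dbt model should be rebuilt.

    Rebuild when the model has never been built, when its source has
    changed since the last build, or when the last build is older than
    the allowed lag (a stand-in for an SLA).
    """
    if last_materialized_at is None:
        return True
    if source_updated_at > last_materialized_at:
        return True
    return now - last_materialized_at > max_lag


# Illustrative timestamps: the model was last built two hours ago.
now = datetime(2023, 8, 1, 12, 0, tzinfo=timezone.utc)
last_run = now - timedelta(hours=2)
```

A plain cron schedule can only answer the third question. Answering the first two requires knowing about the upstream source, which is the context the paragraph above refers to.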
Feature | Dagster | dbt Cloud |
---|---|---|
Asset-aware | Yes | Yes |
Cron-based scheduling | Yes | Yes |
Source-aware | Yes | A little (source freshness tests) |
Full-featured orchestration | Yes | No |
Flexible scheduling options | Yes | No |
Native asset observability | Yes | No |
Partitioned data support | Yes | Yes (through incremental models) |
Powerful dynamic alerting | Yes | No |
Help guide our dbt work
The new features rolling out and those planned were guided and influenced by members of the Dagster community. We would love to have more perspectives in the mix. You can join us on Slack or pitch in on our GitHub issues to help guide our work and define the developer experience you would like to see. We hope to see you on our channels soon!
We're always happy to hear your feedback, so please reach out to us! If you have any questions, ask them in the Dagster community Slack or start a GitHub discussion. If you run into any bugs, let us know with a GitHub issue. And if you're interested in working with us, check out our open roles!
There is no affiliation between Elementl and dbt Labs.