Orchestrating dbt™ with Dagster | Dagster Blog

August 1, 2023 | 4 minute read

Orchestrating dbt™ with Dagster

Orchestrate dbt with Dagster’s popular dbt integration, now with major enhancements to supercharge your dbt models as part of your data pipeline.
Rex Ledesma (@_rexledesma)
Sandy Ryza (@s_ryz)

dbt™ has become an industry standard for structuring SQL transformations within a warehouse. But teams using dbt still struggle with orchestration: scheduling their dbt models and running them in step with the other data assets in their data platforms.

We first presented Dagster’s dbt integration in October 2020. Since then, Dagster + dbt usage has ramped up, and now over half of Dagster users run dbt models as part of their data pipelines. In Dagster’s latest release – 1.4 – we’ve put a heavy focus on Dagster’s dbt integration, making it both more flexible and easier to get started. With the latest improvements, we believe Dagster is far and away the best way to orchestrate dbt.

Neelesh Salian on twitter: Turns out the people building a better dbt experience might just end up being dagster

This isn’t because Dagster has more features than the alternatives, although it has a wide set of capabilities. It’s because Dagster’s core design principles fit naturally with dbt. The similarities between the way Dagster and dbt think about data pipelines mean that Dagster can orchestrate dbt much more faithfully than other general-purpose orchestrators like Airflow.

At the same time, Dagster is able to compensate for dbt’s biggest limitations. dbt is rarely used in a vacuum: the data transformed using dbt needs to come from somewhere and go somewhere. When a data platform needs more than just dbt, Dagster is a better fit than dbt-specific orchestrators, like the job scheduling system inside dbt Cloud.

Why Dagster is the best way to orchestrate dbt

Looking to migrate off dbt Cloud? Check out the step-by-step migration guide.


Dagster and dbt share a mental model

One of the elements of dbt that makes it so intuitive and powerful is that it centers on data assets. When you build a data pipeline using dbt, each model you define is one of the tables or intermediate datasets that make up your pipeline.

From the very beginning, you’re thinking about the data products that your pipeline is there to support. Data lineage comes automatically because the references between your tables are part of how you define your data pipeline.

Like dbt, Dagster puts data assets at the center. Dagster pipelines are graphs of connected data assets.

This means that Dagster can understand a dbt project at a really deep level. When you use Dagster’s dbt integration to load your dbt project into Dagster, you get a faithful representation of your dbt models and the connections between them, inside Dagster.

Unlike other orchestrators, Dagster doesn’t need to run each dbt model in a separate task, which incurs a lot of overhead. Dagster can execute a full dbt project (or project sub-selection) with a single invocation of the dbt CLI, but still provide observability at the level of individual models.
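To make the idea concrete, here is a minimal sketch of how per-model lineage can be read from dbt's own metadata while execution stays a single CLI call. It is not Dagster's loader; it only assumes the general shape of dbt's `manifest.json` (`nodes`, `resource_type`, `depends_on.nodes`). The `shop` project, its model names, and the helper functions are hypothetical, and the command is built but never executed.

```python
from graphlib import TopologicalSorter

# Hand-written stand-in for a dbt target/manifest.json (hypothetical project "shop").
manifest = {
    "nodes": {
        "model.shop.stg_orders": {
            "resource_type": "model",
            "depends_on": {"nodes": ["source.shop.raw.orders"]},
        },
        "model.shop.stg_customers": {
            "resource_type": "model",
            "depends_on": {"nodes": ["source.shop.raw.customers"]},
        },
        "model.shop.customer_orders": {
            "resource_type": "model",
            "depends_on": {
                "nodes": ["model.shop.stg_orders", "model.shop.stg_customers"]
            },
        },
        "test.shop.not_null_orders_id": {
            "resource_type": "test",
            "depends_on": {"nodes": ["model.shop.stg_orders"]},
        },
    }
}


def model_graph(manifest: dict) -> dict[str, set[str]]:
    """Map each dbt model to the upstream nodes it depends on (tests are skipped)."""
    return {
        unique_id: set(node["depends_on"]["nodes"])
        for unique_id, node in manifest["nodes"].items()
        if node["resource_type"] == "model"
    }


def dbt_build_args(unique_ids: list[str]) -> list[str]:
    """Build ONE dbt command for a sub-selection instead of one task per model."""
    names = sorted(uid.split(".")[-1] for uid in unique_ids)
    return ["dbt", "build", "--select", *names]


graph = model_graph(manifest)
# Upstream nodes come first, so lineage can be rendered model by model.
order = list(TopologicalSorter(graph).static_order())
args = dbt_build_args(["model.shop.stg_orders", "model.shop.customer_orders"])
```

The graph gives the orchestrator one node per model for lineage and status, while `args` shows that any number of selected models still collapses into a single dbt command.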

Automatically load your models as Dagster assets by importing your dbt project in Dagster Cloud.

Work beyond dbt: cross-technology, cross-team collaboration

Beyond understanding and enhancing your dbt work, Dagster provides the framework needed by teams working with dbt as part of a larger data platform.

A Dagster data asset can be a dbt model, but can equally be:

  • A table ingested using a tool like Fivetran, Stitch, or Airbyte
  • A machine learning model
  • A dataset of images
  • A file

You can compute Dagster assets using any Python code, running on any platform.

This means that you can build Dagster pipelines that connect the models in your dbt project to these other kinds of data assets, allowing you to orchestrate and track lineage across your entire organization.

For example, you might have a machine learning team that trains models using data that’s transformed using dbt. Dagster makes it easy to kick off ML training and inference based on changes to dbt models, and it can render the lineage between the ML models and the dbt models they depend on.

Dagster was built from the start for organization-wide collaboration and execution at enterprise scale. It has a set of abstractions that make it easy to scope what you’re doing to the parts of the pipeline that you own, but zoom out to the entire asset graph spanning multiple data teams when you need to.

A full orchestration feature set

Dagster provides a greater depth of orchestration features when compared to dbt-specific tools like dbt Cloud. Orchestration has been Dagster’s primary job since its inception, and over that time, it’s grown to handle a very long tail of orchestration needs. Below is a sample:

Flexible scheduling

To determine when to run your dbt models, you often need to rely on logic that’s specific to your use case. For example, you might have a particular way to check whether new source data has arrived or need to incorporate a specific business calendar into your scheduling. In Dagster, you can write arbitrary Python code that triggers runs of your dbt models to tailor your scheduling to your circumstances.
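As a rough illustration of the kind of custom logic described here, the sketch below decides whether to trigger a dbt run from a business calendar and a source-arrival check. It uses only the standard library and is not Dagster's API; in Dagster, a decision like this would typically live inside a schedule or sensor function. The holiday set, function names, and timestamps are all hypothetical.

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical business calendar: days on which no runs should be launched.
HOLIDAYS = {date(2023, 7, 4)}


def is_business_day(day: date) -> bool:
    """Monday to Friday, excluding the holidays listed above."""
    return day.weekday() < 5 and day not in HOLIDAYS


def should_run_dbt(
    today: date,
    source_loaded_at: datetime,
    last_run_at: Optional[datetime],
) -> bool:
    """Run only on business days, and only if the source has new data."""
    if not is_business_day(today):
        return False
    # No previous run means everything in the source counts as new.
    return last_run_at is None or source_loaded_at > last_run_at


loaded = datetime(2023, 8, 1, 6, 0)
previous_run = datetime(2023, 7, 31, 6, 30)

run_today = should_run_dbt(date(2023, 8, 1), loaded, previous_run)
run_on_holiday = should_run_dbt(date(2023, 7, 4), loaded, None)
run_without_new_data = should_run_dbt(date(2023, 8, 1), previous_run, previous_run)
```

Because the decision is ordinary Python, it can be unit tested on its own before it is wired into any scheduler.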

Observability

Dagster offers granular observability and operational tooling. For each dbt model in your pipeline, you can track when it fails, review every past run, and pick up where you left off after fixing problems.

Partitioning

Dagster has rich support for partitions. Partitioning allows you to update your data without dropping and recreating the entire table, while maintaining an interpretable record of your asset's status.
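The bookkeeping behind partitioning can be sketched in a few lines: split an asset into daily partition keys, record which ones have been materialized, and derive which ones still need work. This is a standard-library illustration of the concept, not Dagster's partition API, and the dates and the `materialized` set are made up for the example.

```python
from datetime import date, timedelta


def daily_partition_keys(start: date, end: date) -> list[str]:
    """One ISO-formatted key per day in the half-open range [start, end)."""
    days = (end - start).days
    return [(start + timedelta(days=i)).isoformat() for i in range(days)]


keys = daily_partition_keys(date(2023, 7, 29), date(2023, 8, 1))

# Hypothetical status record: the partitions that have already been built.
materialized = {"2023-07-29", "2023-07-31"}

# Only these partitions need to be (re)computed; the rest of the table is untouched.
missing = [key for key in keys if key not in materialized]
```

Tracking status per key is what makes it possible to backfill a single day rather than rebuild the whole table.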

Alerting

Dagster offers general-purpose alerting. You can run arbitrary Python logic whenever one of your runs fails.
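A minimal sketch of what "arbitrary Python logic on failure" can look like is shown below. The `RunResult` type and `notify_on_failure` function are hypothetical and are not part of Dagster; the `send` callable stands in for whatever channel you alert through (chat, email, paging).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RunResult:
    """Hypothetical summary of a finished run."""
    run_id: str
    success: bool
    error: str = ""


def notify_on_failure(result: RunResult, send: Callable[[str], None]) -> bool:
    """Call `send` with a message if the run failed; return whether an alert went out."""
    if result.success:
        return False
    send(f"Run {result.run_id} failed: {result.error}")
    return True


# Collect messages in a list instead of calling a real alerting service.
sent: list[str] = []
alerted = notify_on_failure(
    RunResult("abc123", False, "model stg_orders: relation not found"),
    sent.append,
)
skipped = notify_on_failure(RunResult("def456", True), sent.append)
```

Passing the sender in as a function keeps the alerting rule separate from the delivery mechanism, so the same rule can route to different channels.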

Self-deployment will always be an option

Dagster does have a commercial offering in Dagster Cloud, and it is an option that drives a lot of value for teams looking to offload operational concerns and tap into some features that, as Nick Schrock spelled out in "The Open Core Business Model", would not make sense to build into an open-source solution.

With that said, we are strongly committed to building a true open-source solution without compromise. We have shared our high-level roadmap and have supportive investors who backed Elementl with this understanding.

Unlike with dbt Cloud, a proprietary cloud service with no open-source equivalent, you will always have the option to take your Dagster pipelines and deploy them on your own using the open-source project, including the full scheduler and UI.

Why not orchestrate dbt with dbt Cloud?

We believe that dbt Cloud is a good enough solution for small analytics engineering teams who need basic scheduling of jobs and alerts without any integration needs.

However, dbt Cloud does not scale well as teams grow and organizations become more complex. To collaborate across teams and across many technologies, a full-featured orchestrator is non-negotiable.

Only true orchestrators can have the requisite context on your source systems in order to schedule downstream dbt models with confidence, with flexible scheduling options that ensure your data is materialized to meet your SLAs.

Feature | Dagster | dbt Cloud
Asset-aware | Yes | Yes
Cron-based scheduling | Yes | Yes
Source-aware | Yes | A little (source freshness tests)
Full-featured orchestration | Yes | No
Flexible scheduling options | Yes | No
Native asset observability | Yes | No
Partitioned data support | Yes | Yes (through incremental models)
Powerful dynamic alerting | Yes | No

Help guide our dbt work

The new features rolling out and those planned were guided and influenced by members of the Dagster community. We would love to have more perspectives in the mix. You can join us on Slack or pitch in on our GitHub issues to help guide our work and define the developer experience you would like to see. We hope to see you on our channels soon!


View the Aug 2nd, 2023 event in which the Dagster team reviewed the new functionality. Individual video chapters can be found here.


We're always happy to hear your feedback, so please reach out to us! If you have any questions, ask them in the Dagster community Slack (join here!) or start a GitHub discussion. If you run into any bugs, let us know with a GitHub issue. And if you're interested in working with us, check out our open roles!


dbt™, dbt Labs™, dbt Cloud™ and the dbt™ logo are all trademarks of dbt Labs™.
There is no affiliation between Elementl and dbt Labs.
