August 29, 2023 • 6 minute read •
Migrating off dbt Cloud™
By Tim Castillo (@tims_tangents) and Claire Lin
dbt Cloud™ is a managed service for dbt-core offered by its maintainers, dbt Labs™. It comes with a variety of features that make it easy to get started with dbt, such as a scheduler, a hosted IDE, and a documentation site.
If you're currently using dbt Cloud, you may be looking for alternatives. But how do you get started replacing the functionality that dbt Cloud provides?
Recently, many organizations are moving their data transformation tasks off dbt Cloud because its functionality no longer meets their needs. Here are some of the reasons why users might migrate off of dbt Cloud:
- Meet more demanding requirements: dbt Cloud satisfied their needs as an early data team, but now they have more data, SLAs, and complexity.
- Scale their dbt projects: Multiple teams have built large dbt projects that depend on each other.
- Unify their stack: They must also orchestrate their entire ELT/ETL pipeline.
If you’re interested in learning how to migrate from dbt Cloud, this blog post will show you how to use the free and open-source version of dbt-core and orchestrate it with Dagster. We believe that Dagster is the best way to orchestrate your dbt projects, and this post shows you how to get started running them in Dagster.
Here, we are expediting the migration process by using Dagster Cloud, but most things in this post can also be applied to running your dbt project on open-source Dagster.
We provide a summary sheet here.
What about those dbt Cloud exclusive features?
Besides the lightweight scheduler, there are other dbt Cloud features that you might be using. We’ll also provide resources on how you can use other tools in the community and ecosystem to:
- Set up your IDE for dbt development
- Host your documentation
- Get alerts after runs
- Create development, staging, and production environments
Move your dbt project to Dagster
dbt is one of the most commonly used tools with Dagster, and Dagster’s dbt integration leverages the best of both tools to run your data pipelines. We shared a recent update on Dagster's powerful features for orchestrating dbt in the blog post "Orchestrating dbt™ with Dagster."
In addition to the integration, utilities exist to add Dagster to your dbt project easily. Notably, Dagster Cloud makes it extra easy. You don’t need to have Python installed or understand how Dagster works when you’re getting started. It’s as simple as telling Dagster Cloud where your dbt project’s repository is.
For an in-depth walkthrough on how to move your dbt project to Dagster Cloud, watch the Loom below:
Importing your dbt project to Dagster Cloud
In the video above, we outline how to run your dbt project on Dagster Cloud quickly. Let’s dive into the details.
Dagster Cloud has a curated onboarding experience for new users. When signing up for an account, you’ll have a list of sample projects to get started.
Run this project on Dagster Cloud: free for 30 days.
Click on the Import a dbt project tab, and you’ll be guided through a four-step workflow.
- Connect Dagster Cloud to your GitHub or GitLab organization.
- Select the repository that your dbt project is in and grant access to Dagster Cloud to pull the code. Dagster Cloud creates a pull request to add a Dagster project to your repository and sets up a Branch Deployment.
- Add secrets that Dagster will need to run dbt against your data warehouse.
- Merge the pull request and run your dbt project in production.
In step 2, Dagster Cloud will create a Branch Deployment for you to test and verify that your dbt project is loaded correctly. Branch Deployments are temporary deployments of your project that exist until the pull request they’re associated with is merged. If you encounter any issues with Dagster loading your dbt project, you can tinker around with your project and configurations in the Branch Deployment until your dbt project is successfully imported.
If your dbt project depends on environment variables to access your data warehouse like our example project does, you should add each environment variable to your repository's secrets as well as Dagster Cloud's secrets. Also, the environment variables should be referenced in your CI/CD action.
After adding your environment variables and deploying to production, you should be able to run your dbt models in Dagster!
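Before triggering the first production run, it can help to confirm that every environment variable your dbt profile reads is actually set. The following is a minimal sketch of such a pre-flight check, not something Dagster Cloud generates for you. The variable names in `REQUIRED_ENV_VARS` and the helper `missing_env_vars` are hypothetical placeholders; substitute the names your own project uses.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Hypothetical names: replace them with the variables your dbt profile reads.
REQUIRED_ENV_VARS = (
    "DBT_WAREHOUSE_USER",
    "DBT_WAREHOUSE_PASSWORD",
    "DBT_WAREHOUSE_ACCOUNT",
)


def missing_env_vars(
    required: Iterable[str],
    environ: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the required variable names that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]
```

Calling `missing_env_vars(REQUIRED_ENV_VARS)` in a CI step or at the top of a local script returns an empty list when everything is in place, and otherwise lists exactly which secrets still need to be added.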
Running your dbt project
Now that you’ve created a new Dagster project and integrated your dbt project with it, you can run your data pipelines. Dagster and dbt both use the word materialization to describe how your assets (or dbt models) are run and persisted in storage. Therefore, we'll be materializing your dbt models in Dagster.
Go to your Dagster Cloud deployment. In the top right-hand corner above your asset graph, click Materialize all if you want to execute all models. Alternatively, hold ⌘ Command (or Ctrl) and click to select multiple dbt models. Once you have your selection, click the Materialize selected button.
After clicking that button, Dagster will start a run and notify you through a banner at the top of the screen. Click the link at the top to take you to Dagster’s detailed overview of your run’s progress and a bar representing dbt’s progress. Dagster combines the models run into one command to efficiently run dbt with minimal overhead and an experience consistent with running dbt locally.
You’ll also find Dagster’s structured and searchable logs on this page. These logs will show you the dbt command output you’re familiar with but enriched with Dagster's structured logging system. By using the search bar and filtering by categories, you can search for an individual model, how long a model took to finish, or which tests passed.
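To make the "one command" point concrete: selecting several models is conceptually similar to passing all of them to a single dbt invocation through `--select`, instead of starting dbt once per model. The sketch below only illustrates that idea and is not Dagster's actual implementation. The helper name `dbt_command_for`, the default `build` subcommand, and the model names are assumptions made for the example.

```python
from typing import List, Sequence


def dbt_command_for(models: Sequence[str], subcommand: str = "build") -> List[str]:
    """Build one dbt invocation that covers every selected model.

    dbt unions space-separated arguments to --select, so a single
    process can handle the whole selection.
    """
    if not models:
        # No explicit selection: dbt operates on the whole project.
        return ["dbt", subcommand]
    return ["dbt", subcommand, "--select", *models]


# Hypothetical model names, for illustration only.
print(" ".join(dbt_command_for(["stg_orders", "stg_customers", "orders"])))
# prints: dbt build --select stg_orders stg_customers orders
```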
Scheduling your dbt runs
After manually running your dbt project in Dagster, the next step to migrate from dbt Cloud is to schedule your project. Thankfully, the utility that generated your complementary Dagster project also created a simple schedule to run your dbt project.
Open your repository’s schedules.py file in your Dagster project. Like in dbt Cloud, you can create jobs to define cron schedules and use dbt’s selection syntax to choose which models to run. By default, this schedule is commented out, but you can uncomment it and push the changes to your repository's main branch to see the schedule in production.
Other functionality
Besides orchestrating dbt projects, dbt Cloud has other features that departing users may miss. Now that your dbt project is running in production with Dagster, let’s learn how to stick as close as possible to your existing workflows and processes.
Using an IDE that improves your dbt experience
dbt Cloud has a specialized IDE that is optimized for the dbt development workflow. This dbt Cloud UI lets you run dbt models and view documentation inline. It is great for onboarding data analysts new to dbt.
However, the dbt community has made amazing strides in developing alternatives, improving the dbt development experience for everyone.
First, we recommend using a free IDE like Visual Studio Code (VS Code). If you want to try this out, we recommend starting with GitHub Codespaces, which gives you a private Visual Studio Code IDE in the cloud for a generous amount of free usage.
VS Code has a feature called Dev Containers that will set up your IDE exactly to match a pre-defined configuration. Here are Dev Containers that make it easy to start developing dbt models in VS Code:
- https://github.com/panasenco/dbt-devcontainer-demo-template
- https://github.com/davidgasquez/dbt-devcontainer
- dbt Labs has an official tutorial on setting up a Codespace with a Dev Container
Once you’ve set up VS Code, you'll have unlocked extensions that will improve your developer experience and make writing code easier. In particular, the dbt community has released an extension called dbt Power User that curates the VS Code experience for dbt model development and replicates much of the functionality of dbt Cloud. After setting up dbt Power User, you can run your models, view your lineage, and push them to your repository all from the VS Code UI and never have to touch the terminal.
Automating your dbt documentation
One of dbt’s greatest features is the automated data documentation site that users get out of the box. dbt Cloud made it easy by hosting the docs on your behalf.
Luckily, Dagster helps make it easier to host your own dbt docs.
Around line 20 in your .github/workflows/deploy.yml file, you’ll see where Dagster loads your dbt project. You can add your own GitHub Action steps after that to perform more tasks, such as generating your docs site. Below is a sample code snippet that you can use as a reference to build your dbt docs site. The snippet will compile an index.html that can be directly uploaded without starting a process like dbt docs serve.
steps:
  - name: Build
    run: dbt docs generate --no-compile --empty-catalog
    shell: bash
  - name: Generate the docs
    run: |
      import json
      search_str = 'o=[i("manifest","manifest.json"+t),i("catalog","catalog.json"+t)]'
      with open('target/index.html', 'r') as f:
          content_index = f.read()
      with open('target/manifest.json', 'r') as f:
          json_manifest = json.loads(f.read())
      with open('target/catalog.json', 'r') as f:
          json_catalog = json.loads(f.read())
      with open('index.html', 'w') as f:
          new_str = "o=[{label: 'manifest', data: "+json.dumps(json_manifest)+"},{label: 'catalog', data: "+json.dumps(json_catalog)+"}]"
          new_content = content_index.replace(search_str, new_str)
          f.write(new_content)
    shell: python
In the current working directory, you’ll have an index.html that you can use to programmatically deploy your docs wherever you want, such as an AWS S3 bucket, GCS, Vercel, or Netlify. All of these services allow for basic authentication, so you can host your docs site securely.
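If you would like to try the same transformation outside of CI, the inline Python from the workflow snippet above can be written as a small standalone module. This is a sketch under a few assumptions: the function names `inline_docs` and `build_static_docs` are hypothetical, the search string is copied from the snippet above, and that string may not match the index.html produced by every dbt version. Unlike the workflow step, this version raises an error when the search string is missing instead of silently writing an unchanged file.

```python
import json
from pathlib import Path

# Loader expression that the workflow snippet replaces in dbt's index.html.
SEARCH_STR = 'o=[i("manifest","manifest.json"+t),i("catalog","catalog.json"+t)]'


def inline_docs(index_html: str, manifest: dict, catalog: dict) -> str:
    """Return index_html with the manifest and catalog embedded inline."""
    if SEARCH_STR not in index_html:
        # Assumption: the marker can differ between dbt versions.
        raise ValueError("loader expression not found in index.html")
    inlined = (
        "o=[{label: 'manifest', data: " + json.dumps(manifest)
        + "},{label: 'catalog', data: " + json.dumps(catalog) + "}]"
    )
    return index_html.replace(SEARCH_STR, inlined)


def build_static_docs(target_dir: str = "target", output: str = "index.html") -> None:
    """Read dbt's generated artifacts and write one self-contained HTML file."""
    target = Path(target_dir)
    html = (target / "index.html").read_text(encoding="utf-8")
    manifest = json.loads((target / "manifest.json").read_text(encoding="utf-8"))
    catalog = json.loads((target / "catalog.json").read_text(encoding="utf-8"))
    Path(output).write_text(inline_docs(html, manifest, catalog), encoding="utf-8")
```

After running `dbt docs generate` locally, calling `build_static_docs()` from the project root should leave an `index.html` next to your `target` directory that you can open directly in a browser.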
Once you’ve become more accustomed to your new orchestrator, you can embed the metadata from the docs site directly into your Dagster assets or radically speed up the performance of your dbt data documentation site.
Working in different environments
dbt Cloud enables people to work in three different types of environments:
- in development with the dbt Cloud IDE,
- in a temporary deployment for each pull request, and
- in production.
Dagster Cloud follows a similar model. Users can make changes and run Dagster locally with dagster dev. This is a full Dagster instance that contains all the functionality of Dagster, including the ability to run dbt models. Aside from running your dbt project locally, you can also use Dagster to do full code reviews and test that your dbt models integrate properly before pushing them to your repository.
After making a pull request, Dagster Cloud will create a temporary Dagster environment called a Branch Deployment. Exclusive to Dagster Cloud, this deployment can run jobs, test dbt models, and simulate what your pipeline will experience in production.
Once pull requests are merged, Dagster Cloud will automatically re-deploy your project for you. If you followed this guide and used Dagster Cloud’s dbt onboarding experience, every merged pull request for your dbt project will update your Dagster project automatically.
Branch deployments are a flexible and powerful environment for the developer experience. Once you’ve become more comfortable, you can learn how to:
- Test against production-scale data with Snowflake’s zero-copy clones
- Branch and merge the changes you make to your data with LakeFS
Alerting your stakeholders
A feature of dbt Cloud is the ability to send an email or Slack notification when a run finishes.
With Dagster Cloud, you can easily set up similar notifications, based on events in your pipeline. You can send alerts to Datadog, customize your email, refresh your BI tool’s dashboards, and more to keep stakeholders like data analysts in the loop when their data sources are updated.
Unify your stack
Now that your data modeling runs in Dagster, you can use this fully-featured orchestration tool to control your entire data pipeline, end-to-end. Aside from dbt, Dagster integrates with many other tools in the modern data stack, such as Fivetran, Meltano, Hightouch, databases, and more.
For example, dbt Cloud has been working on the dbt Semantic Layer. However, the Semantic Layer locks you into using dbt Cloud. With Dagster, you're not locked into a single solution and you can easily replace your solution depending on your needs. In the case of the Semantic Layer, community members have illustrated some prior art by integrating Cube with Dagster.
As a framework, Dagster fits with many of the tools in your data stack, so you are not locked into any solution. Refer to our integrations page or write your own to have your tooling orchestrated by Dagster.
Conclusion
By the end of this guide, you should have dbt running in production on Dagster Cloud. You also have the analytics engineering foundation needed to do the following:
- have an IDE developer experience with dbt-specific features in VS Code
- update your dbt docs on every deployment
- run Dagster locally and create branch deployments for pull requests
You can continue using Dagster Cloud to run your data projects as-is. You’re also welcome to take your newly-enriched repository and self-host Dagster on your own. However, once you’re ready, we recommend diving more into Dagster and learning what your organization can do with a dedicated data orchestrator to observe and optimize your pipelines for performance and lower data warehouse costs.
Additionally, here are some resources to help you get there:
- The official Dagster tutorial
- dbt (dagster-dbt) API documentation
- The in-depth tutorial on how to use Dagster and dbt together
- A reference of common use cases for Dagster and dbt, such as adding metadata to your dbt models or partitioning them
There are few dbt alternatives, but if your organization is looking to run the data transformation steps as part of a more fully-featured orchestrator, give Dagster a go.
We're always happy to hear your feedback, so please reach out to us! If you have any questions, ask them in the Dagster community Slack (join here!) or start a GitHub discussion. If you run into any bugs, let us know with a GitHub issue. And if you're interested in working with us, check out our open roles!
There is no affiliation between Elementl and dbt Labs.