In this blog post we will explore orchestrating the popular ingestion solution Meltano from inside Dagster.
By executing the commands from within Dagster, we get to take full advantage of the solution's other capabilities such as scheduling, dependency management, end-to-end testing, partitioning and more.
Meltano is one of dozens of integrations for Dagster, and the complete list can be found here.
Note: This tutorial was updated in May 2023 to include version updates.
It was last tested on Dagster version 1.2.6 and Meltano version 2.16.1.
Contents
- An introduction to Meltano
- Project overview
- Setting up a Meltano project
- Setting up Dagster
- Using the dagster-meltano library
- Three ways to run Meltano from Dagster
- References
An introduction to Meltano
The origins: Singer.io
Back in the late 2010s, a dozen companies and open source projects popped up aiming to solve the problem of ELt in the new world of SaaS: namely, how to easily ingest data from many SaaS sources into a centralized warehouse for analysis, typically to support a business analytics use case.
One popular open-source project for ELt was Singer, a specification with a simple premise: you could write any data extraction program to pull data from a source (say, a simple Python program using `requests`) and write any data loading program to push your data into a destination like MySQL or Redshift (or later Snowflake, Databricks, Azure Synapse, DuckDB…). As long as your data extraction program (called a ‘tap’) and your data loading program (called a ‘target’) could write/read in a serialized JSON format that met the Singer standard, you could pipe the data over ‘stdout’ from the `tap` to the `target` with a simple command.
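To make the tap/target handshake concrete, here is a minimal sketch in Python. The `run_tap` and `run_target` functions and the `issues` stream are hypothetical and written for illustration only; the SCHEMA, RECORD and STATE message types come from the Singer specification. An in-memory buffer stands in for the shell pipe.

```python
import io
import json


def run_tap(rows, out):
    """Hypothetical tap: write Singer messages to `out`, one JSON document per line."""
    schema = {
        "type": "SCHEMA",
        "stream": "issues",
        "schema": {"properties": {"id": {"type": "integer"}, "title": {"type": "string"}}},
        "key_properties": ["id"],
    }
    out.write(json.dumps(schema) + "\n")
    for row in rows:
        out.write(json.dumps({"type": "RECORD", "stream": "issues", "record": row}) + "\n")
    if rows:
        # A STATE message lets the next run resume where this one stopped.
        out.write(json.dumps({"type": "STATE", "value": {"issues": rows[-1]["id"]}}) + "\n")


def run_target(lines):
    """Hypothetical target: collect records per stream and remember the last STATE seen."""
    loaded, state = {}, None
    for line in lines:
        message = json.loads(line)
        if message["type"] == "RECORD":
            loaded.setdefault(message["stream"], []).append(message["record"])
        elif message["type"] == "STATE":
            state = message["value"]
    return loaded, state


# Stand-in for `tap | target`: the buffer plays the role of stdout/stdin.
pipe = io.StringIO()
run_tap([{"id": 1, "title": "First issue"}, {"id": 2, "title": "Second issue"}], pipe)
pipe.seek(0)
loaded, state = run_target(pipe)
```

A real tap would write to `sys.stdout` and a real target would read from `sys.stdin`, so the two programs can be joined with an ordinary shell pipe.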
The Singer.io framework provided some other capabilities, such as configuring a `catalog.json` for selecting what data to replicate, a `STATE` JSON map for persisting information between invocations of a tap, and a config file that contains the parameters needed to pull data from the source (such as credentials).
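As a rough illustration of how the config and the state map work together, the sketch below filters rows against a bookmark and returns an updated state. The function name, the `issues` stream, the `updated_at` field and the bookmark layout are all assumptions made for this example, not part of any particular tap.

```python
def extract_incrementally(source_rows, config, state):
    """Return rows newer than the stored bookmark, plus the state to persist.

    Timestamps are ISO 8601 strings in a single format, so plain string
    comparison orders them correctly.
    """
    bookmark = (
        state.get("bookmarks", {})
        .get("issues", {})
        .get("updated_at", config["start_date"])  # first run: fall back to the config
    )
    new_rows = [row for row in source_rows if row["updated_at"] > bookmark]
    if new_rows:
        bookmark = max(row["updated_at"] for row in new_rows)
    return new_rows, {"bookmarks": {"issues": {"updated_at": bookmark}}}


config = {"start_date": "2023-01-01T00:00:00Z"}  # credentials would also live here
rows = [
    {"id": 1, "updated_at": "2022-12-31T10:00:00Z"},
    {"id": 2, "updated_at": "2023-02-01T10:00:00Z"},
]

first_batch, new_state = extract_incrementally(rows, config, state={})
second_batch, _ = extract_incrementally(rows, config, state=new_state)
```

The first call returns only the row updated after `start_date`; the second call, fed the saved state, returns nothing because no row is newer than the bookmark.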
Once this standard was established, any member of the data community could submit a tap or target of their choice to the collective open-source library.
I wrote an introduction guide to using Singer a while back that you can find here.
Enter Meltano
The in-house data team at GitLab adopted the Singer spec, and built an internal framework for better managing their custom taps and targets. The project was branded Meltano, which became an open-source project. It launched publicly in 2018 and became independent of GitLab in 2021.
Building on the original Singer specification, Meltano added an SDK for building new integrations, a configuration wrapper, and an integrations Hub to support the community of Singer users. At the time of writing, the Meltano Hub offers over 550 integrations and the company is prepping the launch of its cloud service.
As such, Meltano is an open-source tool that Dagster users may well want to integrate into their pipelines.
Project overview
For simplicity, we will work off the Meltano tutorial example, which involves ingesting data from GitHub and storing it in a dockerized Postgres database. We will then add to this by orchestrating this pipeline with Dagster.
Upon completing the installation steps, your project files will look like the folder structure below. There are four key files to be aware of:

- Dagster’s `__init__.py` file (inside the Dagster project subfolder). This is where we will be making our main code changes for Dagster, but note that in a typical Dagster project, we would organize our code in a more structured fashion.
- Dagster’s `setup.py`, where we will specify our Python dependencies before installing Dagster. In this tutorial our only dependency is the `dagster-meltano` library.
- Meltano’s `.env` file, where any sensitive configuration values get stored.
- Meltano’s `meltano.yml` file. This is the main configuration file for the Meltano instance, and each `meltano config` command will make updates to this file.
Setting up a Meltano project (a bash cheatsheet)
To set up Meltano, you can either follow the four-part tutorial, or, if you would rather zip through that, you will find below the commands you need if working locally on Mac (or you can grab the bash script here). Both the tutorial and the script below should get you to the same place:
To use the script:
- Install and boot up Docker Desktop on Mac or install Docker on Linux
- Execute the set of commands listed below. The shell script defines a set of variables that I refer back to throughout the tutorial, including in some commands. You can use whichever values make sense for your project. The variables are:
- PROJECT: The top-level folder for our project.
- GITHUB_TOKEN: a Personal Access Token (Classic) with minimum access permissions.
- REPOS_TO_IMPORT: a list of GitHub repositories you want to pull data for.
- START_DATE: the date of the earliest data you want to extract.
- DOCKERCONTAINERNAME: a unique name for the Docker container.
- POSTGRES_USER: a new user for your Postgres database.
- POSTGRES_PASSWORD: the Postgres user's password.
- DATABASE: an arbitrary name for the database.
- ENVIRONMENT: our Meltano environment.
Bash
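As a stand-in for the script, the sketch below assembles the kind of commands the setup relies on from the variables listed above. It is written in Python to match the rest of the code in this post, and it only prints the commands rather than running them. All values are placeholders, the `meltano-project` folder name is taken from the path used later in this tutorial, and the `repositories` and `start_date` setting names assume the tap-github variant used in the Meltano tutorial. The target-postgres connection settings are left out because their names depend on the loader variant you install, and `GITHUB_TOKEN` is left out because it belongs in Meltano's `.env` file.

```python
import json
import shlex

# Placeholder values for the variables described above.
settings = {
    "PROJECT": "dag-melt",
    "REPOS_TO_IMPORT": ["dagster-io/dagster"],
    "START_DATE": "2023-01-01",
    "DOCKERCONTAINERNAME": "meltano_postgres",
    "POSTGRES_USER": "meltano",
    "POSTGRES_PASSWORD": "change-me",
    "DATABASE": "demo",
    "ENVIRONMENT": "dev",
}


def build_setup_commands(s):
    """Return the setup commands as argv lists, in the order they would be run."""
    env_flag = f"--environment={s['ENVIRONMENT']}"
    return [
        # Start a Postgres container using the official image's environment variables.
        ["docker", "run", "--name", s["DOCKERCONTAINERNAME"],
         "-e", f"POSTGRES_USER={s['POSTGRES_USER']}",
         "-e", f"POSTGRES_PASSWORD={s['POSTGRES_PASSWORD']}",
         "-e", f"POSTGRES_DB={s['DATABASE']}",
         "-p", "5432:5432", "-d", "postgres"],
        # Run inside the PROJECT folder; the remaining commands run inside meltano-project.
        ["meltano", "init", "meltano-project"],
        ["meltano", "add", "extractor", "tap-github"],
        ["meltano", env_flag, "config", "tap-github", "set",
         "repositories", json.dumps(s["REPOS_TO_IMPORT"])],
        ["meltano", env_flag, "config", "tap-github", "set", "start_date", s["START_DATE"]],
        ["meltano", "add", "loader", "target-postgres"],
        ["meltano", env_flag, "run", "tap-github", "target-postgres"],
    ]


commands = build_setup_commands(settings)
for argv in commands:
    print(shlex.join(argv))
```

Printing the commands first makes it easy to review them before pasting them into a terminal.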
Setting up Dagster
Now that we have a basic Meltano E(t)L process set up, let’s add Dagster to the mix.
From the commands above, we created our Meltano project in a folder called `~/dag-melt/`, so we will now create a folder for our Dagster instance at `~/dag-melt/dagster`. We will create a separate `venv` for Dagster, so we will `deactivate` first in case the previous `venv` is still active.
Bash
Next we will install Dagster. In most cases this will simply involve
Bash
…but you should refer to https://docs.dagster.io/getting-started/install and follow the most recent instructions.
Once installed, we will scaffold a blank dagster project for demo purposes:
Bash
Add the `dagster-meltano` library as a required install item in the Dagster project's `setup.py`:
Python
We can now install our dependencies and launch Dagster:
Bash
This should start the Dagster instance at http://localhost:3000
Admittedly, it looks a bit empty right now, but it's up and running.

A blank Dagster instance
Using the dagster-meltano library
Now that we have both Meltano and Dagster up and running, we can get to the good stuff: how to execute Meltano commands straight from Dagster. Let's explore some of the options.
Run config
When initiating a run in Dagster, we can pass along configuration variables at run time such as the location of the Meltano project. Look for the 'Launchpad' tab after clicking on the job name in the left nav.
YAML
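A minimal sketch of such a run config, built as a Python dictionary and printed as JSON. The `resources.meltano.config.project_dir` key path is an assumption based on the dagster-meltano README, and the path is a placeholder. Since JSON is also valid YAML, the printed output can be pasted into the Launchpad editor.

```python
import json


def meltano_run_config(project_dir):
    """Build a run config that points the `meltano` resource at a Meltano project."""
    return {"resources": {"meltano": {"config": {"project_dir": project_dir}}}}


# Placeholder path: use the root folder of your own Meltano project.
run_config = meltano_run_config("/path/to/dag-melt/meltano-project")
print(json.dumps(run_config, indent=2))
```

The same dictionary could also be supplied as `run_config` when launching a job from Python instead of the UI.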
If you fail to specify this, you will run into the error “`meltano run` must be run inside a Meltano project.”
Side note: Injecting Env Variables
Meltano stores any env variables in a local `.env` file in the root of the Meltano project folder.
You can, however, pass such configuration variables along at runtime from the Dagster Launchpad as follows:
YAML
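The sketch below shows one plausible shape for that configuration, again as a Python dictionary. Two things here are assumptions drawn from the dagster-meltano README rather than from this post: that the op accepts an `env` mapping in its config, and that the op name is the Meltano command with dashes and spaces replaced by underscores. The helper names and the token value are hypothetical. The `TAP_GITHUB_AUTH_TOKEN` name follows Meltano's `<PLUGIN>_<SETTING>` convention for environment variables.

```python
import json
import re


def op_name_for(command):
    """Assumed naming rule: non-alphanumeric characters in the command become underscores."""
    return re.sub(r"[^A-Za-z0-9_]", "_", command)


def env_run_config(command, env):
    """Build a run config that passes environment variables to a single Meltano op."""
    return {"ops": {op_name_for(command): {"config": {"env": env}}}}


run_config = env_run_config(
    "tap-github target-postgres",
    {"TAP_GITHUB_AUTH_TOKEN": "<your-token>"},  # placeholder value
)
print(json.dumps(run_config, indent=2))
```

If the op name shown in the Dagster UI differs from what `op_name_for` produces, use the name from the UI.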
Three ways to run Meltano from Dagster
There are several techniques for triggering a Meltano run using the integration.
Note that Meltano tracks the STATE (for incremental replication), and subsequent invocations will not duplicate the import unless you explicitly ask it to by overriding it. You can do a full refresh (to ignore existing state) using `meltano run tap target --full-refresh`. You can also use `meltano state clear <state_id>` to delete the existing state as documented here.
Option 1: import the jobs defined in Meltano
Our first option is to import the jobs already defined in Meltano. Earlier, during the Meltano setup, we created a job for our GitHub->Postgres pipeline with the command
Bash
So we can now import that job (along with any other defined jobs).
Edit the Dagster project's `__init__.py` file as follows, replacing the path with the one on your machine:
Python
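A minimal sketch of what this file could contain, following the pattern in the dagster-meltano README. The path is a placeholder and `meltano_repository` is a name chosen for this example. The sketch assumes `load_jobs_from_meltano_project` takes the project root as its argument and that the `dagster-meltano` package is installed. The file needs a real Meltano project at that path, because the jobs are read from it when the module loads.

```python
from dagster import repository
from dagster_meltano import load_jobs_from_meltano_project

# Placeholder path: replace with the root of your own Meltano project.
MELTANO_PROJECT_DIR = "/path/to/dag-melt/meltano-project"

# Reads the jobs defined in meltano.yml and turns them into Dagster jobs.
meltano_jobs = load_jobs_from_meltano_project(MELTANO_PROJECT_DIR)


@repository
def meltano_repository():
    """Expose every job found in the Meltano project to Dagster."""
    return [meltano_jobs]
```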
For example, the path for me is `/Users/frasermarlow/dag-melt/meltano-project`.
Note that, since we are providing the path, this job requires no configuration. You can refresh the Dagster project, then click on the `Launchpad` tab, and click 'Launch Run'.
Option 2: issue a meltano run command
Now edit the file `__init__.py` and replace the file contents with the following:
Bash
Again, refresh the Dagster project, click on the job, click on Launchpad, add the configuration for the run as detailed in the "Run config" section above, and then click 'Launch run'.
Option 3: issue a meltano run command for a Meltano job
This is very similar to option 2: if you have a job that has been defined in Meltano, you can simply run `meltano_run_op("my-meltano-job-name")()`.
As the job executes you will see the Meltano command run:
Running other Meltano commands
Now that we have demonstrated how to trigger a basic run, we can look at how to run any other Meltano command, for example to change the project's configuration. You can do so programmatically from Dagster using the `meltano_command_op()` function.
The `meltano_resource` will access the Meltano project location, prepend the `meltano` reference, and execute the command:
Bash
In conclusion
We hope this guide will be helpful to anybody looking to tap into Meltano's capabilities as part of a Dagster-managed project. This guide covered the basics of getting a Meltano project running, and we encourage you to investigate further, as there are many more capabilities under the hood.
References
The Meltano intro tutorial
The Dagster utility on the Meltano Hub
The Quantile README for the dagster-meltano library