Orchestrate Meltano Jobs with Dagster

Meltano provides 550 connectors and tools, all of which can be configured and orchestrated straight from Dagster.

In this blog post we will explore orchestrating the popular ingestion solution Meltano from inside Dagster.

By executing Meltano commands from within Dagster, we get to take full advantage of Dagster's other capabilities, such as scheduling, dependency management, end-to-end testing, partitioning and more.

Meltano is one of dozens of integrations for Dagster, and the complete list can be found here.

Note: This tutorial was updated in May 2023 to include version updates.
It was last tested on Dagster version 1.2.6 and Meltano version 2.16.1.

An introduction to Meltano

The origins: Singer.io

Back in the late 2010s, a dozen companies and open source projects popped up aiming to solve the problem of ELt in the new world of SaaS - namely, how to easily ingest data from a dozen SaaS sources into a centralized warehouse for analysis, typically to support a Business Analytics use case.

One popular open-source project for ELt was Singer, a specification with a simple premise: you could write any data extraction program to pull data from a source (say, a simple Python program using requests) and write any data loading program to push your data into a destination like MySQL or Redshift (or later Snowflake, Databricks, Azure Synapse, DuckDB…). As long as your data extraction program (called a ‘tap’) and your data loading program (called a ‘target’) could write/read a serialized JSON format that met the Singer standard, you could pipe the data over stdout from the tap to the target with a simple command.
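To make the tap/target contract more concrete, here is a minimal Python sketch. The SCHEMA, RECORD and STATE message types come from the Singer specification; the stream name, the fields, and the in-memory "pipe" are hypothetical stand-ins. A real tap and target would be separate programs connected with a shell pipe.

```python
import io
import json


def run_tap(out):
    """Toy tap: write Singer-style messages, one JSON object per line."""
    messages = [
        {
            "type": "SCHEMA",
            "stream": "users",  # hypothetical stream name
            "key_properties": ["id"],
            "schema": {
                "properties": {
                    "id": {"type": "integer"},
                    "name": {"type": "string"},
                }
            },
        },
        {"type": "RECORD", "stream": "users", "record": {"id": 1, "name": "Ada"}},
        {"type": "RECORD", "stream": "users", "record": {"id": 2, "name": "Grace"}},
        {"type": "STATE", "value": {"users": 2}},
    ]
    for message in messages:
        out.write(json.dumps(message) + "\n")


def run_target(lines):
    """Toy target: collect records per stream and remember the last STATE."""
    loaded, state = {}, None
    for line in lines:
        message = json.loads(line)
        if message["type"] == "RECORD":
            loaded.setdefault(message["stream"], []).append(message["record"])
        elif message["type"] == "STATE":
            state = message["value"]
    return loaded, state


# An in-memory buffer stands in for the shell pipe between tap and target.
pipe = io.StringIO()
run_tap(pipe)
pipe.seek(0)
loaded, state = run_target(pipe)
print(loaded, state)
```

Because both sides only agree on the line-delimited JSON messages, any tap can in principle be paired with any target.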

The Singer.io framework provided some other capabilities, such as a JSON catalog file for selecting what data to replicate, a STATE JSON map for persisting information between invocations of a tap, and a config file containing the parameters needed to pull data from the source (such as credentials).
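As an illustration of why the STATE map matters, the sketch below shows one way a tap could use it for incremental replication. The `bookmark` key, the `updated_at` field and the sample rows are all hypothetical; Singer leaves the shape of the state up to each tap.

```python
import json


def extract_incremental(rows, state, bookmark_key="updated_at"):
    """Return rows newer than the saved bookmark, plus the state to persist."""
    last_seen = state.get("bookmark", "")
    # ISO-formatted date strings sort chronologically, so string comparison works.
    new_rows = [row for row in rows if row[bookmark_key] > last_seen]
    if new_rows:
        last_seen = max(row[bookmark_key] for row in new_rows)
    return new_rows, {"bookmark": last_seen}


source_rows = [
    {"id": 1, "updated_at": "2023-01-01"},
    {"id": 2, "updated_at": "2023-02-01"},
    {"id": 3, "updated_at": "2023-03-01"},
]

# First invocation: no state yet, so everything is extracted.
first_batch, state = extract_incremental(source_rows, state={})

# The state is persisted as JSON between invocations.
state = json.loads(json.dumps(state))

# Second invocation: nothing is newer than the bookmark, so nothing is re-extracted.
second_batch, state = extract_incremental(source_rows, state)
print(len(first_batch), len(second_batch), state)
```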

Once this standard was established, any member of the data community could submit a tap or target of their choice to the collective open-source library.

I wrote an introduction guide to using Singer a while back that you can find here.

Enter Meltano

The in-house data team at GitLab adopted the Singer spec, and built an internal framework for better managing their custom taps and targets. The project was branded Meltano, which became an open-source project. It launched publicly in 2018 and became independent of GitLab in 2021.

Building on the original Singer specification, Meltano added an SDK for building new integrations, a configuration wrapper, and an integrations Hub to support the community of Singer users. At the time of writing, the Meltano Hub offers over 550 integrations and the company is prepping the launch of its cloud service.

As such, Meltano is an open-source tool that Dagster users may well want to integrate into their pipelines.

Project overview

For simplicity, we will work off the Meltano tutorial example, which involves ingesting data from GitHub and storing it in a dockerized Postgres database.  We will then add to this by orchestrating this pipeline with Dagster.

Upon completing the installation steps, your project files will look like the folder structure below. There are four key files to be aware of:

  1. Dagster’s __init__.py file (inside the Dagster project subfolder). This is where we will be making our main code changes for Dagster, but note that in a typical Dagster project, we would organize our code in a more structured fashion.
  2. Dagster’s setup.py, where we will specify our Python dependencies before installing Dagster. In this tutorial our only dependency is the dagster-meltano library.
  3. Meltano’s .env file, where any sensitive configuration values get stored.
  4. Meltano’s meltano.yml file. This is the main configuration file for the Meltano instance, and each `meltano config` command will make updates to this file.

Setting up a Meltano project (a bash cheatsheet)

To set up Meltano, you can either follow the four-part tutorial, or, if you would rather zip through that, you will find below the commands you need if working locally on Mac (or you can grab the bash script here).  Both the tutorial and the script below should get you to the same place:

To use the script:

  1. Install and boot up Docker Desktop on Mac, or install Docker on Linux.
  2. Execute the set of commands listed below. Note that I provided a set of variables in the shell script, and I will refer back to them throughout the tutorial, including in some commands. You can use whichever values make sense for your project. The variables are:
     - PROJECT: the top-level folder for our project.
     - GITHUB_TOKEN: a Personal Access Token (Classic) with minimum access permissions.
     - REPOS_TO_IMPORT: a list of GitHub repositories you want to pull data for.
     - START_DATE: the date of the earliest data you want to extract.
     - DOCKERCONTAINERNAME: a unique name for the Docker container.
     - POSTGRES_USER: a new user for your Postgres database.
     - POSTGRES_PASSWORD: the Postgres user's password.
     - DATABASE: an arbitrary name for the database.
     - ENVIRONMENT: our Meltano environment.

Bash
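As a rough illustration of how the database-related variables fit together, the Python sketch below assembles a `docker run` command for the Postgres container. All values are placeholders, and the flags are the standard options of the Docker CLI and the official postgres image. This is an assumption about what such a setup step looks like, not a copy of the script referenced above.

```python
import shlex

# Placeholder values for the variables described above.
settings = {
    "DOCKERCONTAINERNAME": "dag-melt-postgres",
    "POSTGRES_USER": "meltano",
    "POSTGRES_PASSWORD": "change-me",
    "DATABASE": "github_data",
}


def postgres_docker_command(cfg, host_port=5432):
    """Build the argv for starting a Postgres container in the background."""
    return [
        "docker", "run", "--detach",
        "--name", cfg["DOCKERCONTAINERNAME"],
        "--publish", f"{host_port}:5432",
        "--env", f"POSTGRES_USER={cfg['POSTGRES_USER']}",
        "--env", f"POSTGRES_PASSWORD={cfg['POSTGRES_PASSWORD']}",
        "--env", f"POSTGRES_DB={cfg['DATABASE']}",
        "postgres",
    ]


command = postgres_docker_command(settings)
print(shlex.join(command))
# To actually start the container: subprocess.run(command, check=True)
```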

Setting up Dagster

Now that we have a basic Meltano E(t)L process set up, let’s add Dagster to the mix.

From the commands above, we created our Meltano project in a folder called ~/dag-melt/ so we will now create a folder for our Dagster instance at ~/dag-melt/dagster. We will create a separate venv for Dagster, so we will deactivate first just in case you have the previous venv still active.

Bash

Next we will install Dagster.  In most cases this will simply involve

Bash

…but you should refer to https://docs.dagster.io/getting-started/install and follow the most recent instructions.

Once installed, we will scaffold a blank dagster project for demo purposes:

Bash

Add the dagster-meltano library as a required install item in the Dagster project setup.py:

Python

We can now install our dependencies and launch Dagster:

Bash

This should start the Dagster instance at http://localhost:3000

Admittedly, it looks a bit empty right now, but it's up and running.

       A blank Dagster instance    

Using the Dagster-Meltano library

Now that we have both Meltano and Dagster up and running, we can get to the good stuff: executing Meltano commands straight from Dagster. Let's explore some of the options.

Run config

When initiating a run in Dagster, we can pass along configuration variables at run time such as the location of the Meltano project.  Look for the 'Launchpad' tab after clicking on the job name in the left nav.

YAML
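The run config described above can also be written as a Python dictionary, which is handy if you want to generate it or launch runs programmatically. This sketch assumes the dagster-meltano resource is registered under the key `meltano` and accepts a `project_dir` setting; the path is a placeholder for your own.

```python
import json


def meltano_run_config(project_dir):
    """Build a Dagster run config pointing the Meltano resource at a project."""
    return {
        "resources": {
            "meltano": {  # assumed resource key
                "config": {"project_dir": project_dir},  # assumed setting name
            }
        }
    }


run_config = meltano_run_config("/path/to/dag-melt/meltano-project")
# The printed JSON is also valid YAML, so it can be pasted into the Launchpad.
print(json.dumps(run_config, indent=2))
```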

If you fail to specify this, you will run into the error “meltano run must be run inside a Meltano project.”

Side note: Injecting Env Variables

Meltano stores any env variables in a local .env file in the root of the Meltano project folder.

You can, however, pass such configuration variables along at runtime from the Dagster Launchpad as follows:

YAML
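Expressed in Python, injecting environment variables amounts to adding an `env` mapping to the config of the relevant op. The sketch below assumes that layout, as described in the dagster-meltano README. The op name `tap_github_target_postgres` and the variable shown are hypothetical examples, so check the op names in your own Launchpad.

```python
import copy


def with_op_env(run_config, op_name, env):
    """Return a copy of run_config with env vars set for a single op."""
    merged = copy.deepcopy(run_config)
    op_config = (
        merged.setdefault("ops", {})
        .setdefault(op_name, {})
        .setdefault("config", {})
    )
    op_config.setdefault("env", {}).update(env)
    return merged


base_config = {
    "resources": {"meltano": {"config": {"project_dir": "/path/to/meltano-project"}}}
}

run_config = with_op_env(
    base_config,
    "tap_github_target_postgres",  # hypothetical op name
    {"TAP_GITHUB_START_DATE": "2023-01-01"},  # hypothetical override
)
print(run_config["ops"])
```

Working on a deep copy keeps the base config reusable across jobs that need different overrides.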

Three ways to run Meltano from Dagster

There are several techniques for triggering a Meltano run using the integration.

Note that Meltano tracks the STATE (for incremental replication), so subsequent invocations will not re-import the same data unless you explicitly override the state. You can do a full refresh (ignoring existing state) using meltano run tap target --full-refresh. You can also use meltano state clear <state_id> to delete the existing state as documented here.

Option 1: import the jobs defined in Meltano.

Our first option is to import jobs that already exist in the Meltano project. Earlier, during the Meltano setup, we created a job for our GitHub->Postgres pipeline with the command

Bash

So we can now import that job (along with any other defined jobs).

Edit the Dagster project's __init__.py file as follows, replacing the path with the one on your machine:

Python

       For example, the path for me is /Users/frasermarlow/dag-melt/meltano-project.

Note that, since we are providing the path, this job requires no configuration.  You can refresh the Dagster project, then click on the Launchpad tab, and click 'Launch Run'.

Option 2: issue a meltano run command.

Now edit the file __init__.py and replace the file contents with the following:

Python

Again, refresh the Dagster project, click on the job, click on Launchpad, add the configuration for the run as detailed in the "Run config" section above, and then click 'Launch Run'.

Option 3: issue a meltano run command for a Meltano job.

This is very similar to option 2: if you have a job that has been defined in Meltano, you can simply run meltano_run_op("my-meltano-job-name")().

As the job executes you will see the Meltano command run:

Running other Meltano commands

Now that we have demonstrated how to trigger a basic run, we can look at how to make other configuration changes in Meltano. You can make any changes programmatically from Dagster using the meltano_command_op() function.

The meltano_resource will access the Meltano project location, prepend the meltano reference, and execute the command:

Bash
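Conceptually, the behavior described above boils down to turning a command string into a CLI call that runs inside the project folder. The sketch below illustrates that idea with the standard library only. It is not the dagster-meltano implementation, and the helper name, the path, and the example command are assumptions.

```python
import shlex


def build_meltano_invocation(project_dir, command, meltano_bin="meltano"):
    """Return the argv and working directory for a Meltano CLI call."""
    # Prepend the executable, then split the command the way a shell would.
    argv = [meltano_bin, *shlex.split(command)]
    return argv, project_dir


argv, cwd = build_meltano_invocation(
    "/path/to/dag-melt/meltano-project",  # placeholder project location
    "config target-postgres list",
)
print(argv, cwd)
# To execute for real: subprocess.run(argv, cwd=cwd, check=True)
```

Running the command with the project folder as the working directory is what avoids the "must be run inside a Meltano project" error mentioned earlier.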

In conclusion

We hope this guide will be helpful to anybody looking to tap into Meltano's capabilities as part of a Dagster-managed project. This guide covered the basics of getting a Meltano project running, and we encourage you to investigate further, as there are many more capabilities under the hood.

References

The Meltano intro tutorial
The Dagster utility on the Meltano Hub
The Quantile README for the dagster-meltano library
