Expanding the Dagster Embedded ELT Ecosystem with dltHub for Data Ingestion

April 5, 2024

We now have an officially supported dlt integration.

Today, we’re excited to introduce our officially supported dlt integration. Last October, we introduced the first integration in the Embedded ELT family, leveraging Sling to easily ingest and replicate data between systems.

The excitement around the Sling integration has been motivating. At nearly 1,000 downloads a day, it is clear that the community needs ELT integrations in the Dagster platform.

Download statistics: https://pypistats.org/packages/dagster-embedded-elt

In the RFC: Community Input for the Dagster Embedded ELT GitHub discussion, we polled Dagster community members on which ELT integrations were most desired, and a strong demand for a dlt integration became clear. Some users were already running dlt in production, and they were integral in shaping the requirements for fleshing out the officially supported integration. We look forward to continued collaboration and enhancements to the dlt integration over time.

While Sling works fantastically for replicating data between databases and filesystems and for ingesting files, adding dlt to the Embedded ELT library greatly expands its reach. The dlt library fills the gap of ingesting data from endpoints, APIs, and disparate systems. Plus, with the flexibility of defining your own sources in Python, the possibilities are nearly limitless.

Overview

Data load tool (dlt) is an open-source Python library for building pipelines that ingest organic data sources into well-structured datasets. The dlt ecosystem and community are growing rapidly, with a large number of verified sources and destinations. Notable sources that ingest data from APIs include Salesforce, HubSpot, and Stripe, but dlt also supports ingesting data from databases. On the destination side, dlt supports filesystems like S3, databases like PostgreSQL, and data warehouses like MotherDuck, Snowflake, and BigQuery.

One of the magical things about dlt is that the developer does not need to define the logic for transforming ingested data into a structure that matches the destination. You simply yield Python objects from your pipeline, and the transformation happens under the hood. This alone can drastically improve developer velocity when building pipelines.
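To make this concrete, here is a minimal, self-contained sketch of a standalone dlt pipeline (the users resource, pipeline name, and DuckDB destination are illustrative choices, not part of the GitHub example later in this post):

import dlt

@dlt.resource(name="users")
def users():
    # Yield plain Python objects; dlt infers the schema and
    # normalizes them into destination tables automatically.
    yield {"id": 1, "name": "Ada"}
    yield {"id": 2, "name": "Grace"}

pipeline = dlt.pipeline(
    pipeline_name="example_pipeline",
    destination="duckdb",
    dataset_name="raw",
)
load_info = pipeline.run(users())
print(load_info)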

Both dlt and Dagster offer Pythonic, decorator-based approaches to developing pipelines. For example, when building a pipeline in dlt you define a @source composed of @resources. These concepts map quite nicely to Dagster, allowing us to translate those resource objects into Dagster assets!
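As an illustrative sketch of those decorators (the source and resource names here are hypothetical):

import dlt

@dlt.resource
def issues():
    # Each dlt resource becomes one Dagster asset in the integration.
    yield {"id": 1, "title": "Fix flaky test"}

@dlt.resource
def pull_requests():
    yield {"id": 42, "title": "Add dlt integration"}

@dlt.source
def github_activity():
    # A source groups related resources together.
    return issues, pull_requests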

Our integration is built on top of the concept of multi-assets. The dlt pipeline and source are defined as one would traditionally use them; however, instead of running the pipeline directly, they are passed as parameters into the @dlt_assets decorator. The resources are extracted from the dlt source and converted into Dagster assets. Then, from within the body of our decorated function, we can trigger the pipeline run, yielding materialized results. Let’s walk through a quick example of building a pipeline that ingests GitHub issues and pull requests using the official GitHub verified source.

By initializing the pipeline, we are able to pull the ingestion code for collecting data from GitHub directly into our codebase.

$ mkdir dlt_sources && cd dlt_sources
$ dlt init github snowflake

Without writing any of the ingestion code ourselves, we can leverage it from our Dagster project by creating a @dlt_assets definition and passing in the github_reactions source that we import from the initialized code.

from dagster import AssetExecutionContext, Definitions
from dagster_embedded_elt.dlt import DagsterDltResource, dlt_assets
from dlt import pipeline
from dlt_sources.github import github_reactions

@dlt_assets(
    # The dlt source whose resources are converted into Dagster assets
    dlt_source=github_reactions("dagster-io", "dagster"),
    # The dlt pipeline that loads the source into Snowflake
    dlt_pipeline=pipeline(
        pipeline_name="github_issues",
        dataset_name="github",
        destination="snowflake",
    ),
    name="github",
    group_name="github",
)
def dagster_github_reactions(context: AssetExecutionContext, dlt: DagsterDltResource):
    # Run the dlt pipeline, yielding a materialization for each asset
    yield from dlt.run(context=context)

And just like that, we were able to define a simple pipeline for collecting data from GitHub in nearly no time at all.
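To make these assets loadable by Dagster, register them in a Definitions object along with the DagsterDltResource that the decorated function receives. A minimal sketch, continuing from the snippet above:

defs = Definitions(
    assets=[dagster_github_reactions],
    # Provided to the asset function as the `dlt` parameter
    resources={"dlt": DagsterDltResource()},
)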

For more advanced scenarios, the dlt integration follows the implementation architecture of the dbt and Sling integrations, in which a translator can be defined to adjust the parameters of the generated assets. For more information, reference the API documentation.
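As a sketch of what a custom translator can look like (assuming the DagsterDltTranslator class and dagster_dlt_translator parameter described in the API documentation; exact method names may vary by version), here is a hypothetical translator that prefixes every generated asset key:

from dagster import AssetExecutionContext, AssetKey
from dagster_embedded_elt.dlt import (
    DagsterDltResource,
    DagsterDltTranslator,
    dlt_assets,
)

class PrefixedDltTranslator(DagsterDltTranslator):
    def get_asset_key(self, resource) -> AssetKey:
        # Prefix every generated asset key with "github"
        return AssetKey(["github", resource.name])

@dlt_assets(
    dlt_source=github_reactions("dagster-io", "dagster"),  # as imported earlier
    dlt_pipeline=pipeline(
        pipeline_name="github_issues",
        dataset_name="github",
        destination="snowflake",
    ),
    dagster_dlt_translator=PrefixedDltTranslator(),
)
def prefixed_github_assets(context: AssetExecutionContext, dlt: DagsterDltResource):
    yield from dlt.run(context=context)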

While dlt provides a fantastic framework for building ingestion pipelines, Dagster provides a robust framework for building and orchestrating fault-tolerant jobs. By combining the two, you can build data ingestion pipelines from a wide variety of sources with an intuitive developer experience.

Conclusion

We’re excited to join forces with dltHub by providing an integration that pairs Dagster with their engineering and ecosystem. If you find yourself writing a new source for dlt while working with Dagster, please consider contributing it upstream!

Community members are crucial in proving the value of an integration, and our members, along with those from dlt, went above and beyond to provide early feedback and testing for this integration. We are very grateful for their involvement and look forward to further collaboration. We’re always looking for feedback and contributions; you’re welcome to join the conversation in GitHub discussions.

Finally, if you’re looking to get up and running with Dagster right away, consider exploring Dagster Cloud, and you’ll have a production-ready orchestrator in minutes.

