Community Showcase Part 2

June 30, 2026

Some of the most interesting Dagster projects come from the community. This post highlights creative community-built applications.

Some of our favorite Dagster use cases are the ones we never could have predicted.

People in the community are using Dagster to explore public datasets, monitor infrastructure, automate research workflows, build internal tools, and experiment with entirely new kinds of data applications. Some projects are deeply technical, some are wonderfully niche, and all of them reflect the creativity of the people building with Dagster.

This post highlights a few community-built projects that caught our attention, along with the stories behind them: why their creators picked Dagster and what the experience of building them was like.

Working on something fun with Dagster? We’d love to hear about it.

Edwin Weber

LinkedIn | GitHub | Email

Tell us a little about yourself and what you work on

I am Edwin Weber, an independent data engineer based in the Netherlands. I strongly believe in metadata-driven data engineering automation and have frequently built that kind of 'framework' in my career. I am mainly active on the 'back-end' of data projects: extracting data from source systems, data modeling, writing transformations, managing orchestration, tuning SQL queries, and so on.

How did you first discover Dagster?

I first encountered Dagster when I read the book 'DuckDB in Action' by Manning Publications. The book mentioned Dagster as a tool for orchestrating data pipelines.

Then I chose to use dbt for my data transformations and read that Dagster has great integration with dbt, including discovering the dependencies automatically. This made the choice for my data orchestration tool clear: Dagster.

What project have you been building with Dagster?

I wanted a pet project to actually build something with the tools I kept reading about, often called the Modern Data Stack. I forked a project that resembled my setup (https://github.com/bgarcevic/danish-democracy-data) and added my own ideas to it.

The basic idea is a complete data engineering project on a small budget, using only open source tools and as few moving parts to maintain as possible. It is available here: https://github.com/edwinweber/dbt_duckdb_demo_public

The stack I used for this project is:

  • Orchestration: Dagster
  • IDE: Visual Studio Code
  • Programming language: Python
  • Data extraction: dlt
  • Data transformation: dbt
  • Data storage: DuckDB and Delta tables
  • Compute: DuckDB
  • Visualization: Metabase
  • Infrastructure: Docker containers on local machine for development, on a Hetzner cloud server for 'production'.

What it does: it extracts data from a public API, transforms it with dbt, and loads it into DuckDB and Delta tables. The data is about meetings and votes on subjects in the Danish parliament. The data follows a path through the medallion layers:

  • Bronze: JSON data extracted from the API, with DuckDB views on top of it
  • Silver: fully historized data, derived via hash-based CDC detection. Available as tables in DuckDB and Delta tables in the chosen file storage (local or Microsoft Fabric OneLake)
  • Gold: a dimensional model for analysis, available as views in DuckDB and Delta tables in the chosen file storage (local or Microsoft Fabric OneLake)
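
To make the hash-based change detection mentioned for the Silver layer more concrete, here is a minimal sketch of the general technique. It is not taken from the project: the function names (`row_hash`, `detect_changes`), the `id` key, and the sample rows are all assumptions made for illustration. The idea is to store one hash per business key and compare it with the hash of each newly extracted record.

```python
import hashlib
import json


def row_hash(row: dict) -> str:
    """Hash a record's content; sort_keys makes the hash independent of key order."""
    payload = json.dumps(row, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def detect_changes(previous: dict, incoming: list, key: str) -> dict:
    """Classify incoming rows against the hashes of the last known versions.

    `previous` maps a business key to the hash stored for that key.
    """
    changes = {"inserted": [], "updated": [], "deleted": [], "unchanged": []}
    seen = set()
    for row in incoming:
        k = row[key]
        seen.add(k)
        if k not in previous:
            changes["inserted"].append(k)
        elif previous[k] != row_hash(row):
            changes["updated"].append(k)
        else:
            changes["unchanged"].append(k)
    # Keys that were known before but are missing from this extract.
    changes["deleted"] = sorted(k for k in previous if k not in seen)
    return changes


# Hypothetical sample data: two extracts of the same (made-up) endpoint.
yesterday = [{"id": 1, "title": "Budget"}, {"id": 2, "title": "Energy"}]
today = [{"id": 1, "title": "Budget 2026"}, {"id": 3, "title": "Health"}]

previous = {r["id"]: row_hash(r) for r in yesterday}
changes = detect_changes(previous, today, key="id")
```

In a historized table, an "updated" key would typically close the current version and open a new one, while "unchanged" rows need no write at all.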

For visualization, I use Metabase, which connects to the DuckDB instance and allows me to create dashboards and explore the data about the parliament, but also the data from Dagster (the SQLite database where it stores the metadata about the runs, assets, materializations etc).

What was the hardest or most interesting problem you solved?

I had to stop the Metabase container when the dbt part of the project was running to avoid locking conflicts in DuckDB.

The solution was to create two Dagster assets, one to stop the Metabase container and one to start it again after the dbt part is done, and to add those 'assets' automatically to every Dagster job. Assets are very generic things, not limited to data processing.

import subprocess
from typing import Iterable

from dagster import AssetKey, AssetsDefinition, asset


@asset(
    name="stop_metabase_asset",
    description="Stops Metabase before pipeline runs.",
)
def stop_metabase_asset():
    # check=True raises if the script exits non-zero, which fails the asset.
    subprocess.run(["./stop_metabase_and_wait.sh"], check=True)


def build_start_metabase_asset(upstream_asset_keys: Iterable[AssetKey]) -> AssetsDefinition:
    # De-duplicate and sort the keys so the dependency list is deterministic.
    unique_keys = sorted(set(upstream_asset_keys), key=lambda key: key.to_user_string())

    @asset(
        name="start_metabase_asset",
        deps=unique_keys,
        description="Starts Metabase after pipeline runs.",
    )
    def start_metabase_asset():
        subprocess.run(["./start_metabase_and_wait.sh"], check=True)

    return start_metabase_asset
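
The ordering this achieves can be sketched without Dagster at all. The example below is an illustration only, using the standard library's `graphlib`: the snippet above does not show how the stop asset is placed ahead of the rest, so this sketch assumes every asset in a job is made to depend on it. The helper `wrap_with_metabase` and the asset names in `job_assets` are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, Iterable, Set

STOP = "stop_metabase_asset"
START = "start_metabase_asset"


def wrap_with_metabase(asset_deps: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Return a graph (node -> predecessors) where every asset waits for STOP
    and START waits for every asset."""
    # Collect every asset, including ones that only appear as a dependency.
    all_assets = set(asset_deps)
    for deps in asset_deps.values():
        all_assets.update(deps)

    wrapped: Dict[str, Set[str]] = {STOP: set()}
    for name in all_assets:
        wrapped[name] = set(asset_deps.get(name, ())) | {STOP}
    wrapped[START] = set(all_assets)
    return wrapped


# Hypothetical job: a simple bronze -> silver -> gold chain.
job_assets = {
    "silver_votes": ["bronze_votes"],
    "gold_fact_votes": ["silver_votes"],
}
order = list(TopologicalSorter(wrap_with_metabase(job_assets)).static_order())
```

Whatever the shape of the job, the stop step has no predecessors and the start step has all other steps as predecessors, so Metabase is down for exactly the window in which DuckDB is being written.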

Why was Dagster a good fit for this project?

Dagster was a great fit for my project. It is pure Python, integrates well with my dbt project, and the UI is really nice to work with. I had no experience with Dagster at all, but I was able to get started really quickly, and the documentation is really good.

What advice would you give someone starting with Dagster?

Read the documentation, follow some tutorials and just start building. If you also use dbt, make sure to use the dbt integration, so you do not have to define dependencies twice.
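
For readers wondering where those dependencies come from: dbt records them in its `manifest.json` artifact, under each node's `depends_on.nodes` list, and the Dagster dbt integration reads that manifest for you. The sketch below only illustrates that structure with a tiny hand-written stand-in; the model names and the helper `model_dependencies` are hypothetical, and a real project would load `target/manifest.json` instead.

```python
from typing import Dict, List

# Hand-written stand-in for a small part of dbt's manifest.json.
manifest = {
    "nodes": {
        "model.demo.silver_votes": {
            "depends_on": {"nodes": ["source.demo.bronze.votes"]},
        },
        "model.demo.gold_dim_member": {
            "depends_on": {"nodes": ["source.demo.bronze.members"]},
        },
        "model.demo.gold_fact_votes": {
            "depends_on": {
                "nodes": ["model.demo.silver_votes", "model.demo.gold_dim_member"]
            },
        },
    }
}


def model_dependencies(manifest: dict) -> Dict[str, List[str]]:
    """Map each node's unique_id to the sorted list of nodes it depends on."""
    return {
        unique_id: sorted(node.get("depends_on", {}).get("nodes", []))
        for unique_id, node in manifest["nodes"].items()
    }


deps = model_dependencies(manifest)
```

Because this information already exists in the dbt project, declaring the same edges again in the orchestrator would only create a second copy to keep in sync.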

The UI is actually great. If you need even more runtime data in a 'DevOps'-dashboard, it is worth it to use the underlying SQLite database for reporting. I defined views in DuckDB on top of that database for that purpose.
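
As a rough idea of what such a reporting view could look like, here is a sketch that uses Python's built-in `sqlite3` module and an in-memory stand-in table. The table layout (`runs` with `pipeline_name`, `status`, `start_time`, `end_time`) and the view name `job_health` are assumptions for illustration, not a documented schema, and the setup described above defines its views in DuckDB rather than in SQLite itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Stand-in for the run metadata table; column names are assumptions.
    CREATE TABLE runs (
        run_id TEXT,
        pipeline_name TEXT,
        status TEXT,
        start_time REAL,
        end_time REAL
    );
    INSERT INTO runs VALUES
        ('a', 'daily_job', 'SUCCESS', 0.0, 60.0),
        ('b', 'daily_job', 'FAILURE', 100.0, 130.0),
        ('c', 'weekly_job', 'SUCCESS', 0.0, 300.0);

    -- One row per job: run count, failure count, average duration in seconds.
    CREATE VIEW job_health AS
    SELECT pipeline_name AS job,
           COUNT(*) AS runs,
           SUM(CASE WHEN status = 'FAILURE' THEN 1 ELSE 0 END) AS failures,
           AVG(end_time - start_time) AS avg_seconds
    FROM runs
    GROUP BY pipeline_name;
    """
)

rows = conn.execute("SELECT * FROM job_health ORDER BY job").fetchall()
conn.close()
```

A dashboard tool can then read the view like any other table, without needing to know how the underlying metadata is stored.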

Parag Ekbote

LinkedIn | GitHub | Hugging Face

Tell us a little about yourself and what you work on

I'm currently in my final year of an undergraduate degree focused on Artificial Intelligence and Data Science. Alongside my studies, I'm also a Technical Reviewer for Manning Publications and a guest speaker, which has given me the opportunity to work closely with technical content and stay current with emerging tools and best practices.

A lot of my work revolves around machine learning, data engineering, and the Hugging Face ecosystem. I enjoy building tools that make it easier for developers and researchers to work with data at scale. Open source has been a huge part of my journey because it allows me to learn from experienced contributors while building software that can have a real impact on the community.

How did you first discover Dagster?

I first discovered Dagster while working with Hugging Face Datasets. I was looking for ways to expand the reach and usability of datasets hosted on the Hugging Face Hub, and I realized that data orchestration could play an important role in making dataset ingestion, transformation and monitoring more reliable and reusable.

What initially attracted me to Dagster was its strong open-source community and exceptionally clear documentation. Since this was one of my first major experiences working with a dedicated data engineering and orchestration tool, having thorough guides and examples made a huge difference. As I explored further, I found that Dagster's asset-based approach aligned naturally with the way I was thinking about datasets and data pipelines. That alignment eventually led me from being a user to becoming a contributor, helping to connect Hugging Face and Dagster more seamlessly.

What project have you been building with Dagster?

I built an open-source Python library that integrates Dagster with Hugging Face Datasets. The goal was to make it easy for users to load any dataset from the Hugging Face Hub, process it using Dagster, and then push the resulting dataset back to the Hub.

The implementation relied heavily on Dagster's asset system and decorators, allowing users to define dataset workflows with relatively little boilerplate. I iterated on the library extensively, continuously testing different dataset configurations and processing workflows until I was able to reliably handle datasets in the way users would expect. Throughout the process, I also received valuable feedback from members of the Dagster community, including Colton Padden, which helped improve both the usability and design of the integration.

What was the hardest or most interesting problem you solved?

One of the most challenging and interesting problems was adding streaming support to the library. Hugging Face datasets can be extremely large. Supporting those workflows required me to think carefully about how data would move through Dagster's execution model.

To make this work, I implemented a custom IO manager that could properly handle streamed datasets while still fitting within Dagster's asset architecture.
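
One way to picture the problem: a streamed dataset is an iterator, so it cannot simply be saved once and reloaded by every downstream step. The sketch below is not the library's implementation. It is a plain-Python stand-in (the class `StreamingStore` and the asset names are hypothetical) that borrows the `handle_output` / `load_input` method names of a Dagster IO manager, but takes an asset name where Dagster would pass a context object. Instead of storing data, it stores a factory that can reopen the stream on demand.

```python
import itertools
from typing import Callable, Dict, Iterable, Iterator


class StreamingStore:
    """Stores a recipe for producing a stream instead of the data itself."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], Iterable[dict]]] = {}

    def handle_output(self, asset_name: str, factory: Callable[[], Iterable[dict]]) -> None:
        # Nothing is read here; we only remember how to open the stream.
        self._factories[asset_name] = factory

    def load_input(self, asset_name: str) -> Iterator[dict]:
        # Each call opens a fresh iterator, so several consumers can read it.
        return iter(self._factories[asset_name]())


def raw_reviews() -> Iterator[dict]:
    """Made-up source that yields rows lazily, standing in for a streamed dataset."""
    for i in range(1_000_000):
        yield {"id": i, "text": f"review {i}"}


def upper_reviews(store: StreamingStore) -> Callable[[], Iterator[dict]]:
    """Downstream step: transforms rows one at a time as they are requested."""
    def factory() -> Iterator[dict]:
        for row in store.load_input("raw_reviews"):
            yield {**row, "text": row["text"].upper()}
    return factory


store = StreamingStore()
store.handle_output("raw_reviews", raw_reviews)
store.handle_output("upper_reviews", upper_reviews(store))

# Only two rows are ever produced, even though the source has a million.
first_two = list(itertools.islice(store.load_input("upper_reviews"), 2))
```

The trade-off in this design is that nothing is cached: every consumer re-runs the upstream stream, which keeps memory flat at the cost of repeated work.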

Another challenge was metadata management. Hugging Face datasets contain valuable metadata that users often want to preserve and publish alongside their processed datasets.

So, I developed a module that automatically extracts metadata from publicly available dataset information while also allowing users to define and attach their own custom metadata. This metadata can then be pushed back to the Hugging Face Hub along with the transformed dataset, creating a more complete and reproducible workflow.

Why was Dagster a good fit for this project?

Dagster was a great fit primarily because of its asset-based architecture. Hugging Face datasets naturally map to assets, so the mental model felt intuitive from the beginning. Instead of thinking about pipelines as a series of disconnected tasks, I could model datasets as first-class objects and focus on how they evolved throughout the workflow.

I also appreciated Dagster's ecosystem of integrations and its philosophy around extensibility. Many orchestration tools require users to manage external integrations themselves or maintain them separately from the core project. With Dagster, there is a strong ecosystem of maintained integrations and a community that actively supports new connectors and use cases.

What advice would you give someone starting with Dagster?

My biggest piece of advice is to start small and learn through building. After that, spend time with the documentation. Dagster has some of the clearest documentation and usage guides I've worked with, and many common questions are already covered with practical examples. Rather than trying to learn everything at once, start with a small project, experiment with assets, and gradually explore more advanced concepts.

I'd also encourage people to engage with the community. I received valuable feedback through the Hugging Face community, which helped shape the project significantly. Building with Dagster often feels much more productive than maintaining a collection of custom Python scripts.
