The Case for Dagster: Moving Beyond Airflow in the Modern Data Stack™

April 23, 2025
How we think about data orchestration needs to fundamentally change, and Dagster represents that shift in thinking.

In the world of data orchestration, we've seen countless tools rise and fall. Some stick around long enough to become the legacy systems we all love to complain about. Airflow has had a good run. It's been the default choice for many teams, and for good reason - it solved a real problem when it was created. But just as we've moved beyond Hadoop clusters to cloud data warehouses, it's time to critically examine whether Airflow is still the right tool for modern data platforms.

The Black Box Problem

At its core, Airflow is a task-based orchestrator. You define a series of tasks, string them together with dependencies, and Airflow runs them in the right order. What happens inside each task? Airflow has no idea; it's a black box. A task successfully exited with code 0? Great! What data did it produce? How many rows? What columns? What quality?
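To make the black box concrete, here's a toy sketch in plain Python (not Airflow's actual API) of the contract a task-based orchestrator has with its tasks: run a command, observe the exit code, and nothing more.

```python
import subprocess
import sys

def run_task(cmd):
    """A task-based orchestrator's view of the world: run a command
    and record only whether it exited cleanly."""
    result = subprocess.run(cmd)
    # The orchestrator learns exactly one thing: the exit code.
    # Row counts, schemas, and data quality are invisible to it.
    return result.returncode == 0

# "Success" here means only that the process exited with code 0 --
# it says nothing about what data the task actually produced.
ok = run_task([sys.executable, "-c", "print('loaded 0 rows')"])
```

A task that writes zero rows and a task that writes a billion look identical from this vantage point, which is exactly the limitation.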

This task-centric approach made sense when Airflow was created. However, as our data platforms have grown more complex and critical to business operations, this model has become increasingly limiting.

In Dagster, we flip the script. Instead of focusing on tasks, we focus on what you actually care about: the tables, files, models, and notebooks that make up your data platform. By making these data assets first-class citizens, we gain a representation of your entire data ecosystem in a beautiful graph. You get lineage between your actual data assets, not just between opaque tasks.
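The asset-centric idea can be sketched in plain Python. This is not Dagster's real API, and the asset names (`raw_orders`, `daily_revenue`) are invented for illustration - but it shows the shape of the shift: each function *is* a named data asset, dependencies are declared between assets, and lineage falls out of the graph for free.

```python
# Toy asset registry: maps asset name -> producing function and upstream deps.
ASSETS = {}

def asset(deps=()):
    """Register a function as a named data asset with upstream dependencies."""
    def register(fn):
        ASSETS[fn.__name__] = {"fn": fn, "deps": tuple(deps)}
        return fn
    return register

@asset()
def raw_orders():
    return [{"id": 1, "amount": 120}, {"id": 2, "amount": 80}]

@asset(deps=("raw_orders",))
def daily_revenue(raw_orders):
    return sum(row["amount"] for row in raw_orders)

def materialize(name):
    """Recursively materialize an asset's upstream deps, then the asset itself."""
    spec = ASSETS[name]
    inputs = [materialize(dep) for dep in spec["deps"]]
    return spec["fn"](*inputs)

def lineage(name):
    """Asset-level lineage: which assets feed this one?"""
    return {dep: lineage(dep) for dep in ASSETS[name]["deps"]}
```

Here `lineage("daily_revenue")` tells you the revenue table depends on `raw_orders` - a statement about your data, not about opaque task IDs.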

But Airflow 3 Is Adding Data-Centric Features!

Yes, it only took them three years to catch up to where Dagster was in 2021. It's great to see the Airflow team validating what Dagster has believed from day one: that data is at the core of your data pipelines.

But here's the thing about playing catch-up: by the time you've implemented yesterday's innovation, your competitor has already moved on to tomorrow's. While Airflow is busy implementing basic data-centric orchestration, Dagster has been building on that foundation to enable rich data quality assertions, column-level lineage, cost management, and a unified data catalog.
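Data quality assertions only make sense once the orchestrator knows about data, not just tasks. Here's a toy stdlib-Python sketch of the idea (Dagster's actual check mechanism differs; this just illustrates what it means to assert on a materialized asset rather than on an exit code):

```python
def check_asset(rows, required_columns, min_rows=1):
    """A toy data-quality assertion over a materialized asset:
    verify the row count and required columns, returning any problems."""
    problems = []
    if len(rows) < min_rows:
        problems.append(f"expected at least {min_rows} rows, got {len(rows)}")
    for i, row in enumerate(rows):
        missing = sorted(set(required_columns) - set(row))
        if missing:
            problems.append(f"row {i} missing columns: {missing}")
    return problems

good = [{"id": 1, "amount": 120}]
bad = [{"id": 2}]  # missing the "amount" column
```

A task-centric orchestrator has nowhere to hang a check like this, because it never sees the rows in the first place.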

Data teams need to be more thoughtful about allocating their limited engineering resources. Do you want to invest in a platform that's perpetually three years behind or one that's defining the future of data orchestration?

The Platform Engineer's Dilemma

As I wrote recently about the rise of data platform engineers, the fundamental challenge facing data teams today isn't just building individual pipelines - it's building scalable platforms that enable self-service for their various consumers.

This is where Dagster truly shines. By modeling the data assets that make up your platform, Dagster provides a framework allowing downstream consumers to build and maintain their pipelines without deep orchestration expertise. It's the difference between building bespoke pipelines and building a platform that enables others.

Being Right Isn't Enough

Being right isn't enough—you need to be effective. Airflow may be "right" in the sense that it can technically orchestrate your pipelines, but is it the most effective tool for the job?

Consider what effectiveness looks like for a modern data platform:

  1. Developer productivity - Can your team iterate quickly without fighting their tools?
  2. Observability - When something breaks, how quickly can you identify and resolve it?
  3. Self-service - Can data consumers answer their questions without engineering involvement?
  4. Resource optimization - Are you getting the most value from your cloud spend?
  5. Data quality - Can you trust the data that powers your business decisions?

Airflow struggles with all these dimensions because its task-centric model doesn't map to how modern data teams work. Dagster's asset-centric approach, on the other hand, aligns perfectly with these needs.

Making the Switch

The reluctance to switch orchestrators is understandable. Migration is never fun, and the "devil you know" argument has merit. But here's what I've observed from teams that have made the switch:

  1. Migrating to Dagster is often easier than upgrading Airflow or managing multiple disparate instances
  2. The productivity gains are immediate and substantial
  3. The unified visibility into your data ecosystem pays dividends in reduced debugging time
  4. The ability to selectively run parts of your pipeline based on data assets rather than tasks is a game-changer

You don't have to rip and replace overnight. Dagster can integrate with your existing Airflow instances, providing a global view of lineage and orchestration across your entire platform. This lets you incrementally migrate at your own pace, starting with new pipelines and gradually moving existing ones as it makes sense.
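The selective-execution point from the list above is worth making concrete. When the orchestrator knows the asset graph, "refresh this table" translates into "run exactly its upstream chain, in order, and nothing else." A minimal sketch in plain Python (asset names invented for illustration):

```python
def upstream_selection(target, deps):
    """Given a mapping {asset: [upstream assets]}, return the assets that
    must run, in dependency order, to refresh `target` -- the idea behind
    selecting work by data asset rather than by task."""
    order = []
    seen = set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in deps.get(name, []):
            visit(dep)
        order.append(name)

    visit(target)
    return order

deps = {
    "daily_revenue": ["orders"],
    "orders": ["raw_events"],
    "weekly_report": ["daily_revenue"],
}
# Refreshing daily_revenue runs only its upstream chain,
# leaving weekly_report untouched.
plan = upstream_selection("daily_revenue", deps)
# plan -> ["raw_events", "orders", "daily_revenue"]
```

With task-centric selection you'd be reasoning about task IDs and hoping they map onto the tables you care about; here the selection is stated in terms of the data itself.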

The Choice Is Yours

If you're choosing an orchestrator today, you have two paths:

  1. Go with the company that takes three years to catch up to what others are building today
  2. Choose the orchestrator that's setting the stage for where the future is going

I know which one I'd pick. But then again, I've seen enough data platform evolutions to know that sometimes you have to let go of the familiar to embrace something better.

If you're ready to build for the future of data - a future where your teams are more productive, your platform is more observable, and your business gets more value from its data - it's time to take a serious look at Dagster.

