Dagster Deep Dive Recap: Building a True Data Platform

September 6, 2024
Move past the MDS and build a data platform for observability, cost-efficiency, and top-tier orchestration.

As data engineering evolves, companies are running into problems with the modern data stack: observability, orchestration, and cost. To solve these problems, our recent Dagster Deep Dive, led by [Pedram Navid](https://www.linkedin.com/in/pedramnavid/), explored how to build a true data platform that goes beyond the modern data stack.

If you missed the live event, don’t worry: we’ve embedded the recording below.

>    Building a True Data Platform: Beyond the Modern Data Stack (A Dagster Deep Dive)  

Highlights

We covered the following during the deep dive:

The Unmet Promise of the Modern Data Stack

While the modern data stack has improved upon previous tools, it has also introduced new problems:

  • No observability across multiple tools
  • Limited orchestration beyond basic scheduling
  • High cost and vendor lock-in

The Data Platform Engineer

We introduced the data platform engineer as the role that’s been created to solve these problems.

This role is about managing complex data infrastructure, with engineers building platforms that serve the needs of stakeholders. They enable consumers to build pipelines without having to learn complex languages, making data more accessible to a broader set of users.

And they’re moving from building individual pipelines to building frameworks and services that support the entire data ecosystem within an organization.

This is a big change and evolution in data engineering.

What Makes a Good Data Platform?

We discussed why a true data platform is the foundation of a modern data-driven company, and which characteristics a platform needs to meet the changing needs of companies and their data teams:

  • Scalability and Maintainability: It should grow with your company’s data maturity
  • High-Quality Governance: Data testing, fact assertion, alerting
  • Data Observability and Insights: Stakeholders should be able to see the state of the data and dependencies
  • Software Development Lifecycle Integration: Testing, version control, branching
  • Support for Heterogeneous Use Cases: Different languages and tools
  • Declarative Workflows: Flexible, easy to understand process definitions

Takeaways

Here's a recap of the main takeaways from the deep dive:

  • The modern data stack is an improvement over legacy systems but has introduced new problems in data engineering: fragmented observability, limited orchestration, cost.
  • The data platform engineer role has emerged to solve these problems.
  • A good data platform should be scalable and maintainable, with high-quality governance and built-in data observability and insights.
  • A data platform should also have:
    • Software development lifecycle integration
    • Support for heterogeneous use cases (different languages and tools)
    • Declarative workflows
  • Prefer code-based solutions over no-code or low-code tools for complex data engineering tasks.
  • Dagster can help build such a unified data platform with features like code locations and asset checks for data quality and governance.

Q&A

Near the end of the deep dive, Pedram answered questions from our audience. Here are those questions and answers:

  1. What are Dagster’s features for data governance, especially data access management?
    • The roadmap for data access management features isn’t set, but we’re considering role-based access control (RBAC). It’s possible for future development, but the timeline is unknown.
  2. Are column-level lineage and data catalog in Dagster Plus or open source?
    • Column-level lineage is a Dagster+ feature, not open source.
  3. How to schedule assets from multiple code locations?
    • Keep each code location separate as much as possible. For dependencies between code locations, you can use an asset sensor as input into the pipeline. You can also look into declarative automation, which allows you to specify asset materialization at a specific cadence without getting too granular about the details.
  4. How to move from an existing Airflow setup to Dagster?
    • We’re working on a solution for this exact problem. Reach out on Slack and we’ll connect you with someone who can help.
  5. What’s Dagster’s view on machine learning processes or LLM Ops?
    • We use Dagster internally for LLM Ops, basically as data ops. With LLMs, data quality and metadata become important. Dagster has built-in integrations like OpenAI, so you can observe token consumption through Dagster Insights. You can also emit custom metadata to track metrics like token count over time or LLM response quality.
  6. Can I use Dagster Plus and then move to my own server with the open source option?
    • Yes, that’s possible. But if you become dependent on Dagster+ only features, you’ll need to remove those when you move. The core code itself isn’t locked in.
  7. Can I access the data catalog programmatically and tag datasets for filtering?
    • Tagging and filtering by tags in the catalog is supported. For programmatic access to the data catalog, ask in the Slack channel for more info.
  8. Can I use the Great Expectations plugin for data assets?
    • Yes, you can use Great Expectations or Dagster’s built-in asset checks. It depends on whether you need the more complex features Great Expectations offers.
  9. What plugins or integrations would be helpful for a Financial Service Company?
    • Financial service companies face similar data engineering challenges as other industries. Data replication is key, so tools like Airbyte for replicating data across databases to your data warehouse are useful. Remember, Dagster is Python-based, so you can use any Python library, even without a specific built-in integration.

Conclusion

Building a true data platform is more than just using modern data stack tools. It means thinking about scalability, governance, and observability. Dagster addresses those needs with features that tackle the challenges of today’s data engineering.

Watch the on-demand webinar above to learn how to build a data platform with Dagster. Stay tuned for more, or catch up on past deep dives to learn more about data platform engineering, Dagster, and data engineering best practices and tools.

Have feedback or questions? Start a discussion in Slack or GitHub.

Interested in working with us? View our open roles.

Want more content like this? Follow us on LinkedIn.
