Dagster Deep Dive Recap: Orchestrating Flexible Compute for ML with Dagster and Modal


September 27, 2024

Learn how to use Dagster and Modal to automate and streamline your machine learning model training and data processing.

Machine learning requires scalable and flexible infrastructure to handle heavy computing tasks like model training and data processing. For many teams, the challenge is managing this infrastructure without getting bogged down in complex configuration, such as writing Kubernetes YAML or managing GPU operators.

In our most recent Dagster Deep Dive, led by Colton Padden (Developer Advocate at Dagster) and Charles Frye (AI Engineer at Modal), we explored how using Dagster and Modal together helps automate and streamline these processes, making them more developer-friendly and scalable.

In case you missed it (or just want to watch it over again), I’ve embedded the video below.

Orchestrating Flexible Compute for ML with Dagster and Modal (A Dagster Deep Dive)

Highlights

The deep dive covered different ways of using Dagster and Modal for machine learning workflows. Here’s a rundown of the main points Colton and Charles discussed:

Orchestration with Dagster

Colton began the demo by talking about Dagster’s ability to orchestrate ML pipelines. He highlighted key Dagster features like asset-based workflows for managing data dependencies, partitioned assets for handling time-series data, and sensors for triggering runs based on external events, such as the release of a new podcast episode.
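To make the sensor idea concrete, here is a minimal sketch of the "check the feed, remember where you left off" pattern in plain Python. It is not Dagster's API and not the code from the demo: the feed contents, the `new_episodes` function, and the choice of a timestamp string as the cursor are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical feed content; a real sensor would fetch this over HTTP.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Example Podcast</title>
  <item><guid>ep-3</guid><title>Episode 3</title>
    <pubDate>Wed, 18 Sep 2024 09:00:00 GMT</pubDate></item>
  <item><guid>ep-2</guid><title>Episode 2</title>
    <pubDate>Wed, 11 Sep 2024 09:00:00 GMT</pubDate></item>
  <item><guid>ep-1</guid><title>Episode 1</title>
    <pubDate>Wed, 04 Sep 2024 09:00:00 GMT</pubDate></item>
</channel></rss>"""


def new_episodes(feed_xml, cursor):
    """Return (guids of episodes newer than cursor, updated cursor).

    The cursor is the ISO timestamp of the newest episode already seen,
    or None on the first evaluation.
    """
    episodes = []
    for item in ET.fromstring(feed_xml).iter("item"):
        published = parsedate_to_datetime(item.findtext("pubDate"))
        episodes.append((published.isoformat(), item.findtext("guid")))
    episodes.sort()  # oldest first
    fresh = [e for e in episodes if cursor is None or e[0] > cursor]
    # Only advance the cursor when something new was found.
    next_cursor = fresh[-1][0] if fresh else cursor
    return [guid for _, guid in fresh], next_cursor
```

On the first call (cursor `None`) every episode counts as new. Passing the returned cursor back in on the next call yields nothing until the feed gains a newer item, which is the behavior you want from a trigger that should fire once per episode.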

Colton finished up this segment by talking about how Dagster can integrate with different compute environments through Dagster Pipes.

Scalable Infrastructure with Modal

Charles followed up by showing Modal’s scaling capabilities and explaining how it can easily parallelize workloads. He demonstrated this by splitting podcast audio and transcribing multiple segments simultaneously. Modal also offers serverless execution: Charles showed how it automatically allocates the resources a job needs, including GPUs for accelerated machine learning tasks.
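The split-and-transcribe pattern can be sketched locally with Python's standard library. This is only an analogy: a thread pool stands in for Modal's remote workers, Modal's own API is not shown, and `transcribe_segment` is a hypothetical placeholder rather than a real Whisper call. The 30-second segment length is also an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_segments(duration_s, segment_s):
    """Return (start, end) offsets in seconds covering the whole recording."""
    return [
        (start, min(start + segment_s, duration_s))
        for start in range(0, duration_s, segment_s)
    ]


def transcribe_segment(segment):
    """Placeholder for a speech-to-text call on one audio segment."""
    start, end = segment
    return f"[transcript {start}-{end}s]"


def transcribe_episode(duration_s, segment_s=30, max_workers=8):
    """Transcribe all segments concurrently and join them in order."""
    segments = split_into_segments(duration_s, segment_s)
    # map() preserves input order, so the pieces join back in sequence.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pieces = list(pool.map(transcribe_segment, segments))
    return " ".join(pieces)
```

Because each segment is independent, the work fans out cleanly. The only coordination needed is putting the transcripts back in their original order at the end.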

Demo: A Podcast Summary Application

Colton and Charles then combined Dagster and Modal to automate podcast summarization.

[Figure: An overview of the demo project, a podcast summarizer built with Modal and Dagster.]

The system was designed to fetch new podcast episodes using RSS feeds, download and store audio files in cloud storage, transcribe audio segments using the Whisper model, summarize the transcript with OpenAI’s language models, and email concise podcast summaries to users.
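One way to picture these steps is as a small dependency graph, which is roughly how an asset-based orchestrator reasons about them. The sketch below uses Python's standard `graphlib` rather than Dagster itself, and the step names are made up for illustration. They are not the asset names from the demo.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on. Names are illustrative.
PIPELINE = {
    "fetch_feed": set(),
    "download_audio": {"fetch_feed"},
    "transcribe": {"download_audio"},
    "summarize": {"transcribe"},
    "send_email": {"summarize"},
}


def execution_order(graph):
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(graph).static_order())


def downstream_of(graph, step):
    """Return `step` plus everything that depends on it, in run order."""
    order = execution_order(graph)
    affected = {step}
    # Walking in dependency order lets staleness propagate step by step.
    for name in order:
        if graph[name] & affected:
            affected.add(name)
    return [name for name in order if name in affected]
```

With this structure, `downstream_of(PIPELINE, "transcribe")` selects only the transcription, summary, and email steps, so a failed transcription can be retried without fetching the feed or downloading the audio again.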

[Figure: The code and file structure for the podcast summary demo.]

The demo showed how Dagster’s orchestration and Modal’s scalable compute complement each other.

Takeaways

Here’s the TL;DR of the insights from the deep dive.

  • Dagster’s orchestration and Modal’s scalable infrastructure give you a strong solution for machine learning pipelines, particularly for tasks needing parallelism like audio transcription and large-scale data processing.
  • Modal offers flexibility without complexity, giving you a way to scale GPU workloads without needing to deal with complex Kubernetes configurations so you can focus on development.
  • Being able to rerun specific assets or pipelines through Dagster’s UI while relying on Modal’s support for parallel workloads helps you run production-ready machine learning systems.
  • Dagster’s developer-friendly features (partitions, sensors, and cursors) simplify pipeline orchestration for engineers, especially when dealing with data sources like RSS feeds.

Conclusion

With Dagster and Modal, teams can streamline their machine learning pipelines while reducing infrastructure complexity. By simplifying pipeline orchestration and providing auto-scaling compute, the two tools let developers focus on building and refining applications instead of managing infrastructure.

If you’re interested in using this approach in your own projects:

  • Watch the video above to see the full conversation.
  • Connect with the Dagster and Modal communities in Slack to learn more and meet other developers tackling similar challenges.
  • Visit our platform page for more information about Dagster, or start a free trial and begin creating projects.

Stay tuned for the next deep dive!

Have feedback or questions? Start a discussion in Slack or GitHub.
