Dagster Deep Dive Recap: Orchestrating Flexible Compute for ML with Dagster and Modal


September 27, 2024

Learn how to use Dagster and Modal to automate and streamline your machine learning model training and data processing.

Machine learning requires scalable and flexible infrastructure to handle heavy computing tasks like model training and data processing. In many teams, the challenge comes when trying to manage this infrastructure without getting slammed with complex configurations, like writing Kubernetes YAML or managing GPU operators.

In our most recent Dagster Deep Dive, led by Colton Padden (Developer Advocate at Dagster) and Charles Frye (AI Engineer at Modal), we jumped into how using Dagster and Modal together helps automate and streamline these processes so that they become more developer-friendly and scalable.

In case you missed it (or just want to watch it over again), I’ve embedded the video below.

Orchestrating Flexible Compute for ML with Dagster and Modal (A Dagster Deep Dive)

Highlights

The deep dive covered different ways of using Dagster and Modal for machine learning workflows. Here’s a rundown of the main points Colton and Charles discussed:

Orchestration with Dagster

Colton began the demo by talking about Dagster’s ability to orchestrate ML pipelines. He highlighted key Dagster features like asset-based workflows for managing data dependencies, partitioned assets for handling time-series data, and sensors for triggering runs based on external events, such as a new episode of a podcast.

Colton finished up this segment by talking about how Dagster could integrate with different compute environments through Dagster Pipes.
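The essence of the Pipes idea is that the orchestrator launches an external process and the process streams structured messages (logs, materialization metadata) back to it. The sketch below imitates that contract with a subprocess and JSON lines over stdout; the real protocol uses the `dagster-pipes` package and dedicated message channels rather than raw stdout, so treat this purely as a simplified illustration.

```python
import json
import subprocess
import sys

# The "external" workload: it does its work wherever it runs (Modal, a
# Databricks job, a plain subprocess) and reports structured messages back,
# one JSON object per line. The message shapes here are illustrative.
CHILD = r"""
import json
print(json.dumps({"type": "log", "message": "transcribing segment..."}))
print(json.dumps({"type": "report_asset_materialization",
                  "metadata": {"duration_s": 42}}))
"""

def run_external(child_source: str) -> list[dict]:
    """Launch the external process and collect its structured messages."""
    proc = subprocess.run(
        [sys.executable, "-c", child_source],
        capture_output=True, text=True, check=True,
    )
    return [json.loads(line) for line in proc.stdout.splitlines() if line.strip()]

for msg in run_external(CHILD):
    print(msg["type"])
```

The payoff of this separation is that the heavy compute can live in any environment while Dagster still records logs and asset metadata as if the work ran locally.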

Scalable Infrastructure with Modal

Charles followed up by showing Modal’s scaling capabilities and explaining how it can easily parallelize workloads. He demonstrated this by splitting podcast audio and transcribing multiple segments simultaneously. Modal also has serverless execution, which Charles demonstrated by showing how it automatically allocates necessary resources, including GPU support for accelerated machine learning tasks.
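The fan-out Charles demonstrated is, structurally, a map over audio segments. Modal expresses it as roughly `transcribe_segment.map(segments)`, running each call in its own (optionally GPU-backed) container. The stdlib sketch below shows the same pattern with local threads and a stubbed transcription function — the function body and file names are placeholders, not the demo's actual code.

```python
from concurrent.futures import ThreadPoolExecutor

def transcribe_segment(segment: str) -> str:
    # Stand-in for a Whisper call; in the demo this runs remotely on Modal,
    # which allocates a container (and GPU, if requested) per invocation.
    return f"transcript of {segment}"

segments = [f"chunk-{i}.wav" for i in range(4)]

# Locally we fan out over threads; Modal fans out over serverless
# containers that scale to zero when the work is done.
with ThreadPoolExecutor() as pool:
    transcripts = list(pool.map(transcribe_segment, segments))

print(transcripts[0])  # transcript of chunk-0.wav
```

The key point is that the calling code barely changes between "run locally" and "run on a fleet of containers" — the parallelism is in the map, not in infrastructure configuration.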

Demo: A Podcast Summary Application

Colton and Charles then combined Dagster and Modal to automate podcast summarization.

An overview of the demo project, a podcast summarizer.

The system was designed to:

  • fetch new podcast episodes using RSS feeds,
  • download and store audio files in cloud storage,
  • transcribe audio segments using the Whisper model,
  • summarize the transcript with OpenAI’s language models, and
  • email concise podcast summaries to users.
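Stripped of frameworks, the demo's data flow is a short chain of steps. The sketch below stubs each stage with a placeholder function (all names and return values here are hypothetical); in the actual demo each stage is a Dagster asset, with the transcription stage dispatched to Modal for parallel, GPU-backed execution.

```python
def fetch_new_episodes(feed_url: str) -> list[str]:
    # Stub: in the demo, a Dagster sensor polls the RSS feed.
    return ["ep-42"]

def download_audio(episode_id: str) -> str:
    # Stub: the demo stores audio in cloud object storage.
    return f"s3://podcasts/{episode_id}.mp3"

def transcribe(audio_uri: str) -> str:
    # Stub: the demo runs Whisper on Modal, fanned out over segments.
    return f"transcript of {audio_uri}"

def summarize(transcript: str) -> str:
    # Stub: the demo calls an OpenAI language model.
    return f"summary: {transcript[:20]}..."

def email_summary(summary: str, to: str) -> str:
    # Stub: the demo emails the summary to subscribers.
    return f"sent to {to}: {summary}"

def pipeline(feed_url: str, recipient: str) -> list[str]:
    """Run the whole chain for every new episode in the feed."""
    results = []
    for ep in fetch_new_episodes(feed_url):
        audio = download_audio(ep)
        summary = summarize(transcribe(audio))
        results.append(email_summary(summary, recipient))
    return results

print(pipeline("https://example.com/feed.xml", "listener@example.com"))
```

Modeling each stage as a separate asset is what makes Dagster's rerun-a-single-asset workflow possible: a failed summarization can be retried without re-downloading or re-transcribing the audio.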

The code and file structure for the podcast summary demo.

The demo effectively showed the combined strength of Dagster’s orchestration and Modal’s scalable computing power.

Takeaways

Here’s the TL;DR of the insights from the deep dive.

  • Dagster’s orchestration and Modal’s scalable infrastructure give you a strong solution for machine learning pipelines, particularly for tasks needing parallelism like audio transcription and large-scale data processing.
  • Modal offers flexibility without complexity, giving you a way to scale GPU workloads without needing to deal with complex Kubernetes configurations so you can focus on development.
  • Being able to rerun specific assets or pipelines through Dagster’s UI while relying on Modal’s support for parallel workloads lets you handle production-ready machine learning systems.
  • Dagster’s developer-friendly features (partitions, sensors, and cursors) let you orchestrate and simplify pipelines for engineers, especially when dealing with data sources like RSS feeds.

Conclusion

With Dagster and Modal, teams can streamline and optimize their machine learning pipelines while reducing infrastructure complexity. Beyond simplifying pipeline orchestration and providing auto-scaling compute, the two tools let developers focus on building and refining applications instead of tediously managing infrastructure.

If you’re interested in using this approach in your own projects:

  • Watch the video above to see the full conversation.
  • Join the Dagster and Modal communities in Slack to learn more and connect with other developers tackling similar challenges.
  • Visit our platform page for more information about Dagster, or start a free trial and begin creating projects.

Stay tuned for the next deep dive!

