Sandy Ryza, lead engineer on the Dagster project, joins Tobias Macey on The Machine Learning Podcast.
Building a machine learning model once can be done in an ad-hoc manner, but if you ever want to update it and serve it in production, you need a way to repeat a complex sequence of operations. Dagster is an orchestration engine that understands the data it is manipulating, so you can move beyond coarse task-based representations of your dependencies. In this episode, Sandy Ryza explains how his background in machine learning has informed his work on the Dagster project, and the foundational principles it is built on to allow for collaboration across data engineering and machine learning concerns.
- How did you get involved in machine learning?
- Can you start by sharing a definition of "orchestration" in the context of machine learning projects?
- What is your assessment of the state of the orchestration ecosystem as it pertains to ML?
- Modeling cycles and managing experiment iterations in the execution graph
- How to balance flexibility with repeatability
- What are the most interesting, innovative, or unexpected ways that you have seen orchestration implemented/applied for machine learning?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on orchestration of ML workflows?
- When is Dagster the wrong choice?
- What do you have planned for the future of ML support in Dagster?
We're always happy to hear your feedback, so please reach out to us! If you have any questions, ask them in the Dagster community Slack (join here!) or start a GitHub discussion. If you run into any bugs, let us know with a GitHub issue. And if you're interested in working with us, check out our open roles!