December 14, 2022
Troubleshooting Productionalized Notebooks using Dagster and Noteable
By Jamie DeMaria
In an earlier blog post, "Running data science notebooks with Dagster," I shared the details of our Noteable integration.
Since then, we gave a presentation at PyData NYC 2022. You can now walk through the workshop with us here, as we demonstrate how to build and debug a data pipeline using Noteable and Dagster on Gitpod.
Why run notebooks on Noteable & Dagster:
Data engineers lose a lot of time troubleshooting long-running pipelines, and they know only too well the frustration of a minor error costing hours of work. In this practical tutorial, we demonstrate an approach that dramatically shortens testing cycles and reduces the number of reruns required, boosting developer and practitioner productivity and reducing frustration on the team.
A productionalized notebook integrated with an orchestration platform strikes a good balance between reproducibility, flexibility, and clearly expressed intent, in a form that others can consume quickly. This tutorial is aimed at both data scientists and data engineers. The setup makes it easy to take Jupyter notebooks from exploration to production, and even easier to debug them and maintain quality over time.
This tutorial will show how you can achieve:
- Time savings when initiating jobs: users can seamlessly turn an exploratory workflow created in a Noteable notebook into a scheduled production workflow in Dagster.
- Time and cost savings when debugging failed runs: users can immediately open a live, running notebook at the point of failure, with all in-memory state preserved. This saves users' time and reduces compute costs, because debugging does not require re-executing the earlier steps of the workflow.
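The debugging benefit described above is easiest to see in miniature. The sketch below is a conceptual, standard-library-only illustration, not Noteable's or Dagster's implementation: `LiveSession`, its methods, and the `load`, `transform`, and `fixed_transform` cells are hypothetical names. It treats each notebook cell as a function over a shared namespace (standing in for kernel memory), stops at the first failing cell while keeping that namespace intact, and then resumes from the failed cell once it is fixed, so earlier and potentially expensive cells are never re-run.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

# Hypothetical "cell": a function that reads and writes a shared namespace,
# standing in for a notebook cell that mutates live kernel state.
Cell = Callable[[Dict[str, Any]], None]


@dataclass
class LiveSession:
    """Conceptual stand-in for a live notebook kernel (not a real Noteable API)."""
    cells: List[Cell]
    namespace: Dict[str, Any] = field(default_factory=dict)
    next_cell: int = 0   # index of the first cell that has not completed yet
    executions: int = 0  # total cell executions, to show the work saved

    def run(self) -> Optional[Exception]:
        """Run cells from `next_cell` on; on failure, stop and keep all state."""
        while self.next_cell < len(self.cells):
            self.executions += 1
            try:
                self.cells[self.next_cell](self.namespace)
            except Exception as exc:  # namespace is left intact for inspection
                return exc
            self.next_cell += 1
        return None

    def patch_and_resume(self, fixed_cell: Cell) -> Optional[Exception]:
        """Swap in a fixed version of the failing cell and continue from it."""
        self.cells[self.next_cell] = fixed_cell
        return self.run()


def load(ns: Dict[str, Any]) -> None:
    ns["rows"] = list(range(1, 6))  # imagine an expensive query here


def transform(ns: Dict[str, Any]) -> None:
    ns["mean"] = sum(ns["rows"]) / ns["row_count"]  # bug: "row_count" is never set


def fixed_transform(ns: Dict[str, Any]) -> None:
    ns["mean"] = sum(ns["rows"]) / len(ns["rows"])


session = LiveSession([load, transform])
err = session.run()
print(type(err).__name__, session.next_cell, session.namespace["rows"])
# KeyError 1 [1, 2, 3, 4, 5]
session.patch_and_resume(fixed_transform)
print(session.namespace["mean"], session.executions)
# 3.0 3
```

In this toy run, recovering from the failure costs a single extra cell execution (three in total) instead of restarting the workflow from its first step.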
We're always happy to hear your feedback, so please reach out to us! If you have any questions, ask them in the Dagster community Slack (join here!) or start a GitHub discussion. If you run into any bugs, let us know with a GitHub issue. And if you're interested in working with us, check out our open roles!