# About Dagstermill (Dagster's Papermill integration)
Fast iteration, the literate combination of arbitrary code with markdown blocks, and inline plotting make notebooks an indispensable tool for data science. The Dagstermill package makes it straightforward to run notebooks using Dagster tooling and to integrate them into data jobs alongside heterogeneous ops: for instance, Spark jobs, SQL statements run against a data warehouse, or arbitrary Python code.
Dagstermill lets you:
- Run notebooks as ops in heterogeneous data jobs with minimal changes to notebook code
- Define data dependencies to flow inputs and outputs between notebooks, and between notebooks and other ops
- Use Dagster resources, and the Dagster config system, from inside notebooks
- Aggregate notebook logs with logs from other Dagster ops
- Yield custom materializations and other Dagster events from your notebook code
Our goal is to make it unnecessary to go through a tedious "productionization" process in which code developed in notebooks must be translated into some other (less readable and interpretable) format before it can be integrated into production workflows. Instead, notebooks can be used as ops directly, with minimal, incremental metadata declarations that integrate them into jobs which may also contain arbitrary heterogeneous ops.