Welcome to our latest major Dagster release: 1.2: Formation.
While this release contains a number of important incremental updates, we would like to focus on two in particular: enhanced partitioned asset support and the introduction of Pythonic config and resources.
Partitioned data asset and backfill support
This release includes major additions to Dagster’s support for partitioned Software-defined Assets.
As a data orchestrator, Dagster strives to model the relationship between computation and data at a deep level. To this end, Dagster has comprehensive and flexible support for modeling partitioned data assets and data pipelines. It handles all the complex interactions between partitions, assets, computations, and time.
Here are new developments rolling out with our 1.2 release for partitioned data assets:
- Dynamic Asset Partitions (Experimental): Sometimes you don't know the set of partitions ahead of time when you're defining your assets. For example, maybe you want to add a new partition every time a new data file lands in a directory, or every time you want to experiment with a new set of hyperparameters. In these cases, you can use a `DynamicPartitionsDefinition`, which lets you add and remove partitions at runtime.
- The updated asset graph in the UI now displays the number of materialized, missing, and failed partitions for each partitioned asset.
- Asset partitions can now depend on earlier time partitions of the same asset by using `TimeWindowPartitionMapping`. Backfills and the asset reconciliation sensor respect these dependencies when requesting runs [example provided here].
- `TimeWindowPartitionMapping` now accepts `start_offset` and `end_offset` arguments that allow specifying that time partitions depend on earlier or later time partitions of upstream assets [check out the docs].
We are also enhancing Dagster’s backfill capabilities.
- Dagster now allows backfills that target assets with different partitions, such as a daily asset which rolls up into a weekly asset, as long as the root assets in the selection are partitioned similarly.
- You can now choose to pass a range of asset partitions to a single run rather than launching a backfill with a run per partition [instructions].
In addition, we are bringing the work on partitioned assets to our integrations with data warehouses. Check out the integrations section below.
Pythonic Config and Resources
In Dagster 1.2, we are rolling out the first stage of Pythonic Config and Resources.
User-defined values are provided to Dagster jobs or Software-defined Assets at runtime through a configuration API.
The new Pythonic configuration APIs released in 1.2 allow Dagster developers to provide such parameters to assets and jobs in a more streamlined and reliable fashion.
Under the hood, these config models utilize Pydantic, a popular Python library for data validation and serialization, and therefore should feel familiar to many Python developers.
- During execution, the passed config values are accessed within the op or asset using the `config` parameter, which is reserved specifically for this purpose.
- The new API supports complex config schemas, such as a list of files, nested schemas, or union types.
The resource page surfaces useful resource metadata, highlighting values sourced from environment variables, and makes it easier to tell at a glance which external services your Dagster instance is configured to interact with.
Integrations
As reported on the Dagster blog recently, we continue to make investments in Dagster's library of integrations. We have updated the Snowflake, DuckDB, and BigQuery integrations, adding partition support to the I/O managers. The updates we announced in that blog post are now live in 1.2.
- Weights and Biases - orchestrate your MLOps pipelines and maintain ML assets with Dagster. [read the docs]
- Snowflake + PySpark - store and load PySpark DataFrames as Snowflake tables using the `snowflake_pyspark_io_manager`. [read the docs]
- Google BigQuery - store and load Pandas and PySpark DataFrames as BigQuery tables using the `bigquery_pyspark_io_manager`. [read the docs]
- Airflow - The updated `dagster-airflow` integration makes migration from Airflow to Dagster much easier. Refer to the docs for an in-depth migration guide, or hear from some of the companies that have made the transition.
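To illustrate the partition support mentioned above, here is a sketch using the DuckDB integration (this assumes `dagster-duckdb-pandas` is installed; the asset, table column, and database path are illustrative):

```python
import pandas as pd

from dagster import DailyPartitionsDefinition, Definitions, asset
from dagster_duckdb_pandas import duckdb_pandas_io_manager


# The `partition_expr` metadata tells the I/O manager which column maps
# partitions to rows, so each run writes only its own partition's slice.
@asset(
    partitions_def=DailyPartitionsDefinition(start_date="2023-01-01"),
    metadata={"partition_expr": "date"},
)
def daily_orders(context) -> pd.DataFrame:
    return pd.DataFrame({"date": [context.partition_key], "orders": [0]})


defs = Definitions(
    assets=[daily_orders],
    resources={
        "io_manager": duckdb_pandas_io_manager.configured(
            {"database": "analytics.duckdb"}
        )
    },
)
```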
New guides and tutorials
Alongside the release of these new features, 1.2 sees the release of several companion guides:
- “Asset versioning and caching”: Why spend time re-materializing an asset if the result is going to be the same? Build memoizable graphs of assets to speed up the developer workflow and save computational resources.
- “Automating your pipelines”: Dagster offers several ways to automate pipelines. This guide helps you select the right approach for your project.
- “Project structure best practices guide”: This guide recommends some best practices for structuring your projects and will be most useful for teams starting to scale up their Dagster implementation.
- “Dagster Dev”: Following this smaller (but very popular) update, we are pleased to add a full guide to the `dagster dev` command, which launches a full local Dagster deployment with a single command.
- “Intro to Software-defined Assets”: a walkthrough of the basics of creating, maintaining, and testing assets in Dagster.
Here is a shout-out to all contributors from 1.1.0 to 1.1.21 - Dagster would not be what it is without your help.
We're always happy to hear your feedback, so please reach out to us! If you have any questions, ask them in the Dagster community Slack (join here!) or start a GitHub discussion. If you run into any bugs, let us know with a GitHub issue. And if you're interested in working with us, check out our open roles!