March 1, 2022 · 2 minute read

Dagster 0.14.0: Never Felt Like This Before

Mollie Pettit

We’re thrilled to release version 0.14.0 of Dagster. This version introduces a much more mature version of software-defined assets, new integrations, a new homepage for Dagit, and a wide set of other features and improvements.

Software-Defined Assets

Software-defined assets, which were a seed in Dagster 0.13, have come into full bloom.

Software-defined assets offer a new, declarative approach to data orchestration that orients around assets rather than tasks. Built on top of Dagster’s core APIs, they enable users to explicitly declare the tables, ML models, and datasets that they want to exist and to tightly link those assets to the computations that generate their contents. This results in a reconciliation-based approach to orchestration, adds a new dimension to data observability, and helps make Python a native citizen of the Modern Data Stack.

0.14.0 includes a much more mature set of software-defined asset APIs, partitioned assets, a revamped asset details page in Dagit, a cross-repository asset graph view in Dagit, Dagster types on assets, structured metadata on assets, and the ability to materialize ad-hoc selections of assets without defining jobs. Users can expect the APIs to only undergo minor changes before being declared fully stable in Dagster’s next major release.

For a full introduction to software-defined assets, read Introducing Software-Defined Assets.

Integrations

We’re continuing to add integrations to make it easier for you to connect the tools of your choice with Dagster.

Dagster-Airbyte

A new Airbyte integration (dagster-airbyte) allows you to kick off and monitor Airbyte syncs from within Dagster. This contribution includes a resource implementation as well as a pre-built op for this purpose, and we’ve extended this library to support software-defined asset use cases as well. Read more about this integration in our blog post Dagster-Airbyte Integration, or Airbyte's Orchestrate data ingestion and transformation pipelines with Dagster post.

Dagster-Pandera

A new Pandera integration (dagster-pandera) allows you to use Pandera’s validation library to wrap dataframe schemas in Dagster types, enabling runtime data validation of Pandas dataframes in Dagster ops/assets. Additionally, Dagit displays Pandera schema information using a new TableSchema API. Read more about this integration in Observability in Dagster: Table Schema API and Pandera Integration.

Increased operational maturity

New Dagit Homepage: Factory Floor View

Dagit has a new homepage, dubbed the “factory floor” view, that provides an overview of recent activity across all of your jobs. From it, you can monitor the status of each job’s latest run or quickly re-execute a job. The new timeline view reports the status of all recent runs in a convenient Gantt chart.

Auto-start Sensors and Schedules

Before this release, whenever a new schedule or sensor was added to a Dagster repository, it needed to be turned on in Dagit before it started submitting runs. This manual step was particularly onerous in cases where users were dynamically creating schedules and sensors from some other data source.

Starting in Dagster 0.14.0, sensors and schedules can be defined with a default_status parameter. If this parameter is set to RUNNING, the sensor or schedule will start running as soon as it is loaded in your workspace, with no manual step required.

ECSRunLauncher

The ECSRunLauncher is no longer considered experimental. You can bootstrap your own Dagster deployment on ECS using our docker compose example or you can use it in conjunction with a managed Dagster Cloud deployment!

And More

Metadata on Dagster Types

Dagster Types can now have attached metadata. This was added to support the display of rich schema information for Dagster Types in Dagit. The pilot case is the attachment of TableSchema objects to Dagster Types via TableSchemaMetadata. A Dagster Type with a TableSchema will have the schema rendered in Dagit.

Op Events Without Yield

Previously, if you wanted to record AssetMaterialization, ExpectationResult and AssetObservation events, you needed to turn your op or IO manager method into a generator and emit the events with yield statements. This caused a variety of small frictions, including making it difficult to annotate ops with Python type annotations and making ops more awkward to test.

Dagster now also supports logging AssetMaterialization, ExpectationResult and AssetObservation events via OpExecutionContext.log_event.

Op Selection in Subgraphs

Op selection now supports selecting ops inside subgraphs. For example, to select an op my_op inside a subgraph my_graph, you can now specify the query as my_graph.my_op. This is supported in both Dagit and Python APIs.

Asset Observations

AssetMaterializations signify that a Dagster op or IO manager has mutated or created an asset, and allow recording metadata about the asset at the time the mutation occurs. On many occasions, a Dagster op will read or run a quality check on an asset without mutating it. The new AssetObservation event enables recording metadata about an asset without indicating that the asset has been updated.

This allows the Dagster asset catalog to represent a richer log of information about each asset: not just a history of the occasions it was updated, but also a history of occasions when code reading it observed something noteworthy about it. It allows building up a history of data quality checks from the perspective of the asset’s stakeholders, not just its owners.

Wrapping up

We're always happy to hear your feedback, so please reach out to us! If you have any questions, ask them in the Dagster community Slack (join here!) or start a Github discussion. If you run into any bugs, let us know with a Github issue. And if you're interested in working with us, check out our open roles!
