Dagster + GCP

Integrate with GCP's cloud capabilities: BigQuery, Dataproc, GCS, File Manager.

About this integration

The dagster-gcp integration allows you to orchestrate GCP resources from a Dagster pipeline.

Tap into:

  • BigQuery (enterprise data warehouse)
  • Dataproc (managed open source tools and frameworks for data lake modernization, ETL, and data science)
  • GCS (cloud object storage)


pip install dagster-gcp


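from dagster import asset, with_resources
from dagster_gcp.gcs import gcs_pickle_io_manager, gcs_resource
import pandas as pd

@asset
def asset1():
    return pd.DataFrame()

@asset
def asset2(asset1):
    # asset1 is loaded from GCS by the IO manager and passed in as a DataFrame
    return asset1[:5]

# Bind the GCS-backed IO manager and the GCS resource it requires to both assets
assets = with_resources(
    [asset1, asset2],
    resource_defs={
        "io_manager": gcs_pickle_io_manager.configured(
            {"gcs_bucket": "my-cool-bucket", "gcs_prefix": "my-cool-prefix"}
        ),
        "gcs": gcs_resource,
    },
)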

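To make the role of the IO manager in the example above more concrete, here is a minimal plain-Python sketch of the general idea: each asset's return value is pickled and written under a key in a bucket, and downstream assets read it back. This is an illustration only, not the dagster-gcp implementation. `InMemoryBucket` and `SketchPickleIOManager` are hypothetical names, a dict stands in for the GCS bucket, the `<prefix>/<asset name>` key layout is an assumption, and plain lists replace the DataFrames so that the sketch needs only the standard library.

```python
import pickle


class InMemoryBucket:
    """Hypothetical stand-in for a GCS bucket: maps object keys to bytes."""

    def __init__(self):
        self.blobs = {}


class SketchPickleIOManager:
    """Illustrative only; not the dagster-gcp implementation."""

    def __init__(self, bucket, prefix):
        self.bucket = bucket
        self.prefix = prefix

    def _key(self, asset_name):
        # Assumed key layout for this sketch: "<prefix>/<asset name>".
        return f"{self.prefix}/{asset_name}"

    def handle_output(self, asset_name, obj):
        # Serialize the asset's return value and "upload" it to the bucket.
        self.bucket.blobs[self._key(asset_name)] = pickle.dumps(obj)

    def load_input(self, asset_name):
        # "Download" the upstream asset's bytes and deserialize them.
        return pickle.loads(self.bucket.blobs[self._key(asset_name)])


def asset1():
    return list(range(10))


def asset2(asset1):
    return asset1[:5]


bucket = InMemoryBucket()
io_manager = SketchPickleIOManager(bucket, prefix="my-cool-prefix")

# Materialize asset1, then feed its stored value into asset2.
io_manager.handle_output("asset1", asset1())
io_manager.handle_output("asset2", asset2(io_manager.load_input("asset1")))

print(sorted(bucket.blobs))
print(io_manager.load_input("asset2"))
```

In the real integration, Dagster calls the IO manager for you when assets are materialized, so the asset functions contain no storage code at all.
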
About Google Cloud Platform

Google Cloud Platform (GCP) is a cloud environment favored by data teams working on ML/AI, in part because of its data science tooling, AI infrastructure, and ML frameworks such as AutoML.

Dagster shines in its ability to let data teams train and then productionize ML models with minimal disruption to their production pipelines.

As a result, Dagster and GCP are a winning combination for many data teams dealing with more complex pipelines, such as those that productionize ML models.