Dagster + GCP

Integrate with GCP's cloud capabilities: BigQuery, Dataproc, GCS, and File Manager.

About this integration

The dagster-gcp integration allows you to orchestrate GCP resources from a Dagster pipeline.

Tap into:

  • BigQuery (enterprise data warehouse)
  • Dataproc (managed open-source tools and frameworks for data lake modernization, ETL, and data science)
  • GCS (cloud object storage)

Installation

pip install dagster-gcp

Example

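import pandas as pd
from dagster import asset, with_resources
from dagster_gcp.gcs import gcs_pickle_io_manager, gcs_resource

@asset
def asset1():
    # Build an example DataFrame (empty here as a placeholder)
    return pd.DataFrame()

@asset
def asset2(asset1):
    # Dagster loads asset1's pickled output from GCS and passes it in
    return asset1[:5]

# Attach resources: the IO manager pickles each asset's output into
# my-cool-bucket under my-cool-prefix, using the "gcs" resource as its client
assets = with_resources(
    [asset1, asset2],
    resource_defs={
        "io_manager": gcs_pickle_io_manager.configured(
            {"gcs_bucket": "my-cool-bucket", "gcs_prefix": "my-cool-prefix"}
        ),
        "gcs": gcs_resource,
    },
)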

About Google Cloud Platform

Google Cloud Platform (GCP) is a cloud environment favored by data teams working on ML/AI. Its data science tooling, AI infrastructure, and ML frameworks such as AutoML make it a popular platform for this work.

Dagster shines in its ability to let data teams train and then productionize ML models with minimal disruption to their production pipelines.

As a result, Dagster and GCP make a winning combination for many data teams dealing with more complex pipelines, such as those that productionize ML models.