Google Cloud Platform | Dagster Integrations
Dagster + GCP

Integrate with GCP's cloud capabilities: BigQuery, Dataproc, GCS, and File Manager.

About this integration

The dagster-gcp integration allows you to orchestrate GCP resources from a Dagster pipeline.

Tap into:

  • BigQuery (enterprise data warehouse)
  • Dataproc (managed open source tools and frameworks for data lake modernization, ETL, and data science)
  • GCS (cloud object storage)

Installation

pip install dagster-gcp

Example

from dagster import Definitions, asset
from dagster_gcp.gcs import GCSPickleIOManager, GCSResource
import pandas as pd

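@asset
def asset1():
    # The returned DataFrame is pickled and written to GCS by the IO manager.
    return pd.DataFrame()

@asset
def asset2(asset1):
    # asset1 is loaded back from GCS; keep only its first five rows.
    return asset1[:5]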


defs = Definitions(
    assets=[asset1, asset2],
    resources={
        "io_manager": GCSPickleIOManager(
            gcs_bucket="my-cool-bucket",
            gcs_prefix="my-cool-prefix",
            gcs=GCSResource()
        ),
    },
)
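
To make the role of the IO manager in the example above more concrete, here is a minimal, library-free sketch of the general idea behind a pickle-based IO manager: each asset's return value is serialized and stored under a key built from a prefix and the asset name, and downstream assets load it back from there. This is an illustration only, not dagster-gcp's implementation. The in-memory `fake_bucket` dictionary stands in for a GCS bucket, and the names `object_path`, `handle_output`, `load_input`, and the `prefix/asset_name` key layout are all assumptions made for this sketch.

```python
import pickle

# Hypothetical stand-in for a GCS bucket: maps object paths to raw bytes.
fake_bucket = {}


def object_path(prefix, asset_name):
    """Build a storage key for an asset (layout assumed for illustration)."""
    return f"{prefix}/{asset_name}"


def handle_output(prefix, asset_name, value):
    """Pickle an asset's return value and store it in the fake bucket."""
    fake_bucket[object_path(prefix, asset_name)] = pickle.dumps(value)


def load_input(prefix, asset_name):
    """Fetch and unpickle a previously stored asset value."""
    return pickle.loads(fake_bucket[object_path(prefix, asset_name)])


# "asset1" produces a value, which is stored on its behalf.
handle_output("my-cool-prefix", "asset1", list(range(10)))

# "asset2" receives the stored upstream value and keeps the first five items.
upstream = load_input("my-cool-prefix", "asset1")
handle_output("my-cool-prefix", "asset2", upstream[:5])
```

The point of this design is that asset functions only return and receive plain Python objects. Where and how those objects are persisted is configured once, in the resources passed to Definitions.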

About Google Cloud Platform

Google Cloud Platform (GCP) is a cloud environment favored by data teams working on ML/AI, thanks to its data science tooling, AI infrastructure, and ML frameworks such as AutoML.

Dagster shines in its ability to let data teams train and then productionize ML models with minimal disruption to their production pipelines.

As a result, Dagster and GCP are a winning combination for many data teams dealing with more complex pipelines, such as those that productionize ML models.