Dagster + GCP
Integrate with GCP's cloud capabilities: BigQuery, Dataproc, GCS, File Manager.
About this integration
The dagster-gcp integration allows you to orchestrate GCP resources from a Dagster pipeline.
Tap into:
- BigQuery (Enterprise data warehouse)
- Dataproc (managed open source tools and frameworks for data lake modernization, ETL, and data science)
- GCS (Cloud object storage)
Installation
pip install dagster-gcp
Example
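import pandas as pd
from dagster import asset, with_resources
from dagster_gcp.gcs import gcs_pickle_io_manager, gcs_resource


@asset
def asset1():
    # Upstream asset: its return value is pickled and stored in GCS.
    return pd.DataFrame()


@asset
def asset2(asset1):
    # The parameter name matches the upstream asset, so Dagster loads
    # asset1's stored output and passes it in here.
    return asset1[:5]


# Bind the GCS-backed IO manager, and the "gcs" resource it requires, to both assets.
assets = with_resources(
    [asset1, asset2],
    resource_defs={
        "io_manager": gcs_pickle_io_manager.configured(
            {"gcs_bucket": "my-cool-bucket", "gcs_prefix": "my-cool-prefix"}
        ),
        "gcs": gcs_resource,
    },
)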
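To make the role of the IO manager in the example above more concrete, here is a minimal sketch of the idea behind a pickle-based IO manager: each asset's output is pickled and written under a prefix, and a downstream asset's input is read back from the same location. This is not dagster-gcp's implementation. InMemoryBucket and PickleIOManagerSketch are hypothetical names, a plain dictionary stands in for the GCS bucket, and the key layout is an assumption made for illustration only.

```python
import pickle


class InMemoryBucket:
    """Hypothetical stand-in for a GCS bucket: maps object keys to bytes."""

    def __init__(self):
        self.blobs = {}


class PickleIOManagerSketch:
    """Illustrative only: pickles outputs under a prefix and loads them back."""

    def __init__(self, bucket, prefix):
        self.bucket = bucket
        self.prefix = prefix

    def _key(self, asset_name):
        # Assumed key layout for this sketch; the real library may differ.
        return f"{self.prefix}/{asset_name}"

    def handle_output(self, asset_name, obj):
        # Serialize the asset's return value and store it in the bucket.
        self.bucket.blobs[self._key(asset_name)] = pickle.dumps(obj)

    def load_input(self, asset_name):
        # Fetch and deserialize an upstream asset's stored value.
        return pickle.loads(self.bucket.blobs[self._key(asset_name)])


bucket = InMemoryBucket()
io = PickleIOManagerSketch(bucket, "my-cool-prefix")

# asset1 produces a value, which is stored.
io.handle_output("asset1", list(range(10)))

# asset2 loads asset1's stored value, keeps the first five items, and stores the result.
upstream = io.load_input("asset1")
io.handle_output("asset2", upstream[:5])
```

In the real integration this bookkeeping is handled for you: the asset functions only return values and declare upstream assets as parameters, while the configured IO manager decides where and how the data is persisted.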
About Google Cloud Platform
Google Cloud Platform (GCP) is a cloud environment favored by data teams working on ML/AI, thanks to its data science tooling, AI infrastructure, and ML frameworks such as AutoML.
Dagster shines in its ability to let data teams train and then productionize ML models with minimal disruption to their production pipelines.
As a result, Dagster and GCP are a winning combination for many data teams dealing with more complex pipelines, such as those that productionize ML models.