Dagster + GCP
Integrate with GCP's cloud capabilities: BigQuery, Dataproc, GCS, File Manager.
About this integration
The dagster-gcp integration allows you to orchestrate GCP resources from a Dagster pipeline.
Tap into:
- BigQuery (Enterprise data warehouse)
- Dataproc (managed open source tools and frameworks for data lake modernization, ETL, and data science)
- GCS (Cloud object storage)
Installation
pip install dagster-gcp
Example
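import pandas as pd
from dagster import Definitions, asset
from dagster_gcp.gcs import GCSPickleIOManager, GCSResource


@asset
def asset1():
    # Upstream asset: returns an empty DataFrame as a placeholder.
    return pd.DataFrame()


@asset
def asset2(asset1):
    # Downstream asset: keeps the first five rows of asset1.
    return asset1[:5]


# Asset outputs are pickled and stored in GCS by the IO manager.
defs = Definitions(
    assets=[asset1, asset2],
    resources={
        "io_manager": GCSPickleIOManager(
            gcs_bucket="my-cool-bucket",
            gcs_prefix="my-cool-prefix",
            gcs=GCSResource(),
        ),
    },
)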
About Google Cloud Platform
Google Cloud Platform (GCP) is a cloud environment favored by data teams working on ML/AI, thanks to its data science tooling, AI infrastructure, and ML frameworks such as AutoML.
Dagster shines in its ability to let data teams train and then productionize ML models with minimal disruption to their production pipelines.
As a result, Dagster and GCP are a winning combination for many data teams dealing with more complex pipelines, such as productionizing ML models.