Using AWS with Dagster
Utilities for interfacing with AWS: S3, ECS, EMR, CloudWatch, Secrets Manager, and Redshift.
About this integration
The Dagster-AWS integration lets you bring key AWS services into your data pipelines:
- S3 (File storage)
- ECS (Elastic Container Service)
- Redshift (Data warehousing; see the sketch after this list)
- EMR (Petabyte-scale data processing: run and scale Apache Spark, Hive, Presto, and other big data workloads)
- CloudWatch (Application and infrastructure monitoring)
- Secrets Manager (Manage, retrieve, and rotate database credentials, API keys, and other secrets)
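For a quick sense of how these services plug into a pipeline, here is a minimal sketch of querying Redshift from an op with dagster_aws.redshift's redshift_resource; the cluster endpoint, credentials, and query are placeholder values, not part of this integration's documentation:
# Query Redshift from an op (endpoint and credentials below are placeholders)
from dagster import job, op
from dagster_aws.redshift import redshift_resource

@op(required_resource_keys={"redshift"})
def example_redshift_op(context):
    # execute_query runs SQL against the configured cluster and returns the rows
    return context.resources.redshift.execute_query("SELECT 1", fetch_results=True)

@job(
    resource_defs={
        "redshift": redshift_resource.configured(
            {
                "host": "my-cluster.abc123.us-east-1.redshift.amazonaws.com",
                "port": 5439,
                "user": "dagster",
                "password": "dagster-password",
                "database": "dev",
            }
        )
    }
)
def example_redshift_job():
    example_redshift_op()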
Installation
pip install dagster-aws
Examples
# Store your software-defined assets in S3
from dagster import asset, repository, with_resources
from dagster_aws.s3 import s3_pickle_io_manager, s3_resource
import pandas as pd


@asset
def asset1():
    return pd.DataFrame()


@asset
def asset2(asset1):
    # Downstream asset: keep the first five rows of asset1
    return asset1[:5]


@repository
def repo():
    return with_resources(
        [asset1, asset2],
        resource_defs={
            "io_manager": s3_pickle_io_manager.configured(
                {"s3_bucket": "my-cool-bucket", "s3_prefix": "my-cool-prefix"}
            ),
            "s3": s3_resource,
        },
    )
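The s3_resource used above also hands ops a boto3 S3 client, so you can work with the bucket directly. A minimal sketch, reusing the placeholder bucket and prefix names from the example above:
# List objects under a prefix using the boto3 client provided by s3_resource
from dagster import job, op
from dagster_aws.s3 import s3_resource

@op(required_resource_keys={"s3"})
def list_bucket_contents(context):
    # context.resources.s3 is a boto3 S3 client
    response = context.resources.s3.list_objects_v2(
        Bucket="my-cool-bucket", Prefix="my-cool-prefix"
    )
    return [obj["Key"] for obj in response.get("Contents", [])]

@job(resource_defs={"s3": s3_resource})
def inspect_bucket_job():
    list_bucket_contents()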
About Amazon Web Services
AWS provides on-demand cloud computing platforms and APIs to individuals, companies, and governments on a metered, pay-as-you-go basis. Whether you're looking for compute power, database storage, content delivery, or other functionality, AWS has the services to help you build sophisticated applications with increased flexibility, scalability, and reliability.