Using AWS with Dagster

Utilities for interfacing with AWS: S3, ECS, EMR, CloudWatch, SecretsManager, and Redshift.

About this integration

The Dagster-AWS integration allows you to seamlessly integrate key AWS services into data pipelines:

  • S3 (File storage)
  • ECS (Amazon Elastic Container Service, for running containerized workloads)
  • Redshift (Data warehousing)
  • EMR (Petabyte-scale data processing with Apache Spark, Hive, Presto, and other big data workloads)
  • CloudWatch (Application and infrastructure monitoring)
  • SecretsManager (Manage, retrieve, and rotate database credentials, API keys, and other secrets)

Installation

pip install dagster-aws

Examples

# Store your software-defined assets in S3

from dagster import asset, repository, with_resources
from dagster_aws.s3 import s3_pickle_io_manager, s3_resource
import pandas as pd

@asset
def asset1():
    return pd.DataFrame()

@asset
def asset2(asset1):
    # `asset1` is the upstream DataFrame, loaded back from S3 by the IO manager
    return asset1[:5]

@repository
def repo():
    return with_resources(
        [asset1, asset2],
        resource_defs={
            "io_manager": s3_pickle_io_manager.configured(
                {"s3_bucket": "my-cool-bucket", "s3_prefix": "my-cool-prefix"}
            ),
            "s3": s3_resource,
        },
    )
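
To make the role of the IO manager in the example above more concrete, here is a minimal, standard-library-only sketch of the general idea: each asset's return value is pickled and stored under a key, and downstream assets get their inputs by unpickling the upstream object. This is an illustration under stated assumptions, not the dagster-aws implementation. The `InMemoryPickleStore` class, its method names, and the `prefix/asset_name` key layout are hypothetical, and a plain dict stands in for the S3 bucket.

```python
import pickle


class InMemoryPickleStore:
    """Hypothetical stand-in for an S3 bucket: maps object keys to bytes."""

    def __init__(self, prefix):
        self.prefix = prefix
        self.objects = {}

    def _key(self, asset_name):
        # Assumed key layout, chosen for illustration only.
        return f"{self.prefix}/{asset_name}"

    def handle_output(self, asset_name, value):
        # Serialize the asset's return value and store it under its key.
        self.objects[self._key(asset_name)] = pickle.dumps(value)

    def load_input(self, asset_name):
        # Fetch and deserialize an upstream asset's stored value.
        return pickle.loads(self.objects[self._key(asset_name)])


store = InMemoryPickleStore(prefix="my-cool-prefix")

# "asset1" materializes: its return value is pickled and stored.
store.handle_output("asset1", list(range(10)))

# "asset2" runs: its upstream input is loaded back, then its own output is stored.
upstream = store.load_input("asset1")
store.handle_output("asset2", upstream[:5])

print(sorted(store.objects))
print(store.load_input("asset2"))
```

In the Dagster example, this storing and loading happens implicitly: naming `asset1` as a parameter of `asset2` is what tells the framework to load the upstream value through the configured IO manager.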

About Amazon Web Services

AWS provides on-demand cloud computing platforms and APIs to individuals, companies, and governments, on a metered pay-as-you-go basis. Whether you're looking for compute power, database storage, content delivery, or other functionality, AWS has the services to help you build sophisticated applications with increased flexibility, scalability and reliability.