Amazon Web Services | Dagster Integrations
Using AWS with Dagster

Utilities for interfacing with AWS: S3, ECS, EMR, CloudWatch, SecretsManager, and Redshift.

About this integration

The Dagster-AWS integration lets you use the following AWS services in your data pipelines:

  • S3 (File storage)
  • ECS (Amazon Elastic Container Service)
  • Redshift (Data warehousing)
  • EMR (Petabyte-scale data processing: run and scale Apache Spark, Hive, Presto, and other big data workloads)
  • CloudWatch (Application and infrastructure monitoring)
  • SecretsManager (Manage, retrieve, and rotate database credentials, API keys, and other secrets)

Installation

pip install dagster-aws

Examples

# Store your software-defined assets in S3

from dagster import Definitions, asset
from dagster_aws.s3 import S3PickleIOManager, S3Resource
import pandas as pd

@asset
def asset1():
    return pd.DataFrame()

@asset
def asset2(asset1):
    # asset1 is the upstream DataFrame, loaded by the I/O manager
    return asset1[:5]

defs = Definitions(
    assets=[asset1, asset2],
    resources={
        "io_manager": S3PickleIOManager(
            s3_bucket="my-cool-bucket",
            s3_prefix="my-cool-prefix",
            s3_resource=S3Resource(),
        )
    },
)
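
To make the role of the I/O manager in the example above more concrete, here is a minimal, standard-library-only sketch of the general pattern a pickle-based I/O manager follows: serialize each asset's return value, write it under a key derived from a prefix, and read it back when a downstream asset needs it. The names `FakeBucket` and `PickleStore` and the `<prefix>/<asset name>` key layout are assumptions made for illustration. This is not the dagster-aws implementation, and the real `S3PickleIOManager` may lay out its keys differently.

```python
import pickle


class FakeBucket:
    """In-memory stand-in for an S3 bucket (hypothetical, illustration only)."""

    def __init__(self):
        self.objects = {}

    def put_object(self, key, body):
        self.objects[key] = body

    def get_object(self, key):
        return self.objects[key]


class PickleStore:
    """Hypothetical sketch of a pickle-based I/O manager; not dagster-aws code."""

    def __init__(self, bucket, prefix):
        self.bucket = bucket
        self.prefix = prefix

    def _key(self, asset_name):
        # Assumed key layout: "<prefix>/<asset name>".
        return f"{self.prefix}/{asset_name}"

    def handle_output(self, asset_name, value):
        # Serialize the asset's return value and write it to the bucket.
        self.bucket.put_object(self._key(asset_name), pickle.dumps(value))

    def load_input(self, asset_name):
        # Read the stored bytes and deserialize them for a downstream asset.
        return pickle.loads(self.bucket.get_object(self._key(asset_name)))


bucket = FakeBucket()
store = PickleStore(bucket, prefix="my-cool-prefix")

# "asset1" produces a value, which is pickled and stored.
store.handle_output("asset1", list(range(10)))

# "asset2" depends on asset1: load the upstream value, then store the result.
upstream = store.load_input("asset1")
store.handle_output("asset2", upstream[:5])

print(sorted(bucket.objects))
print(store.load_input("asset2"))
```

In the Dagster example, the framework performs the equivalent store and load steps for you: `asset2` simply declares `asset1` as a parameter, and the configured I/O manager supplies the stored value.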

About Amazon Web Services

AWS provides on-demand cloud computing platforms and APIs to individuals, companies, and governments, on a metered pay-as-you-go basis. Whether you're looking for compute power, database storage, content delivery, or other functionality, AWS has the services to help you build sophisticated applications with increased flexibility, scalability and reliability.