Dagster Integration: Using AWS Glue with Dagster
About this Integration
The dagster-aws integration library provides the PipesGlueClient resource, enabling you to launch AWS Glue jobs directly from Dagster assets and ops. This integration allows you to pass parameters to Glue code while Dagster receives real-time events, such as logs, asset checks, and asset materializations, from the initiated jobs. Only minimal code changes are required on the job side, so the integration is easy to adopt.
Installation
pip install dagster-aws
Examples
import boto3
from dagster import AssetExecutionContext, Definitions, asset
from dagster_aws.pipes import (
    PipesGlueClient,
    PipesS3ContextInjector,
    PipesS3MessageReader,
)


@asset
def glue_pipes_asset(
    context: AssetExecutionContext, pipes_glue_client: PipesGlueClient
):
    # Launch the Glue job and turn the events it reports into a materialization result.
    return pipes_glue_client.run(
        context=context,
        job_name="Example Job",
        arguments={"some_parameter_value": "1"},
    ).get_materialize_result()


defs = Definitions(
    assets=[glue_pipes_asset],
    resources={
        "pipes_glue_client": PipesGlueClient(
            client=boto3.client("glue"),
            # The Dagster context is passed to the job through this S3 bucket.
            context_injector=PipesS3ContextInjector(
                client=boto3.client("s3"),
                bucket="my-bucket",
            ),
            # Messages written by the job are read back from the same bucket.
            message_reader=PipesS3MessageReader(
                client=boto3.client("s3"), bucket="my-bucket"
            ),
        )
    },
)
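The example above covers only the Dagster side. For the job side, the following is a minimal sketch of what the Glue script might look like, assuming the dagster-pipes package is available in the Glue job environment (for example through the job's `--additional-python-modules` parameter) and that the job's role can read and write the S3 bucket configured above. The helper `report_results` and its `row_count` value are hypothetical names used for illustration; they are not part of the integration.

```python
def report_results(pipes, row_count: int) -> None:
    """Send a log line and a materialization event back to Dagster."""
    pipes.log.info(f"Processed {row_count} rows in AWS Glue")
    pipes.report_asset_materialization(
        metadata={"row_count": {"raw_value": row_count, "type": "int"}},
    )


def main() -> None:
    # Imported inside main() so that report_results can be exercised locally
    # without boto3 or dagster-pipes installed.
    import boto3
    from dagster_pipes import (
        PipesCliArgsParamsLoader,
        PipesS3ContextLoader,
        PipesS3MessageWriter,
        open_dagster_pipes,
    )

    s3 = boto3.client("s3")
    with open_dagster_pipes(
        # Reads the context that PipesS3ContextInjector wrote to S3.
        context_loader=PipesS3ContextLoader(s3),
        # Writes messages to S3 for PipesS3MessageReader to pick up.
        message_writer=PipesS3MessageWriter(s3),
        # Reads the Pipes bootstrap parameters from the job's command-line arguments.
        params_loader=PipesCliArgsParamsLoader(),
    ) as pipes:
        # Placeholder for the actual Glue processing; 0 is a dummy value.
        report_results(pipes, row_count=0)
```

In the actual Glue script, `main()` would be called at the end of the file. Keeping the reporting logic in a separate function that accepts the `pipes` object makes it possible to check that logic with a stub before deploying the job.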
About AWS Glue
AWS Glue is a fully managed, serverless cloud service for discovering, preparing, and integrating data for analytics, machine learning, and application development. It supports a wide range of data sources and formats and integrates with other AWS services. AWS Glue provides the tools to create, run, and manage ETL (extract, transform, load) jobs, which makes complex data workflows easier to handle. Because it is serverless, there is no infrastructure to provision, and jobs scale with the workload, which suits data engineers and analysts who need to process and prepare data efficiently.