Dagster Integration:
Get the right tools for running Dagster with Azure Data Lake Storage Gen 2 (ADLS2)
About this integration
Dagster helps you use Azure Storage Accounts as part of your data pipelines. Azure Data Lake Storage Gen 2 (ADLS2) is the primary focus, but utilities for Azure Blob Storage are also provided.
Installation
pip install dagster-azure
Examples
import pandas as pd

from dagster import Definitions, asset
from dagster_azure.adls2 import ADLS2Resource, ADLS2SASToken


@asset
def example_adls2_asset(adls2: ADLS2Resource):
    # Build a small DataFrame and serialize it to CSV.
    df = pd.DataFrame({"column1": [1, 2, 3], "column2": ["A", "B", "C"]})
    csv_data = df.to_csv(index=False)

    # Upload the CSV to the given file system (container) and path.
    file_client = adls2.adls2_client.get_file_client(
        "my-file-system", "path/to/my_dataframe.csv"
    )
    file_client.upload_data(csv_data, overwrite=True)


defs = Definitions(
    assets=[example_adls2_asset],
    resources={
        "adls2": ADLS2Resource(
            storage_account="my_storage_account",
            credential=ADLS2SASToken(token="my_sas_token"),
        )
    },
)
In this example, ADLS2Resource is used directly instead of adls2_resource; the configuration is passed to ADLS2Resource during its instantiation.
About Azure Data Lake Storage Gen 2 (ADLS2)
Azure Data Lake Storage Gen 2 (ADLS2) is a set of capabilities for big data analytics built on Azure Blob Storage. It combines the scalability, cost-effectiveness, security, and rich capabilities of Blob Storage with a high-performance file system that is built for analytics and is compatible with the Hadoop Distributed File System (HDFS). This makes it an ideal choice for data lakes and big data analytics.
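Because ADLS2 is HDFS-compatible, files in a storage account are also addressable through the Hadoop abfss:// URI scheme that many analytics engines understand. A sketch of the URI shape, using the hypothetical placeholder names from the example above:

```python
# Hypothetical placeholders from the example above; a real storage
# account name would follow Azure's naming rules.
account = "my_storage_account"
filesystem = "my-file-system"
path = "path/to/my_dataframe.csv"

# abfss://<filesystem>@<account>.dfs.core.windows.net/<path>
abfss_uri = f"abfss://{filesystem}@{account}.dfs.core.windows.net/{path}"
print(abfss_uri)
```

An engine such as Spark could then read the file the asset wrote by pointing at this URI, independently of Dagster.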