Dagster vs. Azure Data Factory

Azure offers a visual workflow tool for creating data pipelines as part of its ecosystem. Why should you opt for Dagster instead?

Try Dagster+ free for 30 days. No credit card required.

The key differences between Dagster and Azure Data Factory:

Azure Data Factory is a drag-and-drop data integration tool that lets you ingest and transform data from different sources.

Drag-and-drop GUI first, with limited programmatic support
Used to move data between different Azure services
Runs manually, on a schedule, or through a limited set of configurable events

Dagster is specifically designed for data engineers.

Defines critical data assets in code
Takes a declarative approach that makes data engineering teams far more productive
Provides local development, deep integrations with the modern data stack, and scheduling built around stakeholder SLAs
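To make "data assets in code" concrete, here is a toy, standard-library-only model of the idea. In Dagster, the parameter names of an `@asset`-decorated function declare which upstream assets it depends on; the `asset` decorator, `ASSETS` registry, and `materialize_all` helper below are hypothetical stand-ins written for illustration, not Dagster's API.

```python
import inspect
from graphlib import TopologicalSorter
from typing import Any, Callable

# Hypothetical registry mapping each asset name to the function that computes it.
ASSETS: dict[str, Callable[..., Any]] = {}


def asset(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register fn as an asset; its parameter names name its upstream assets."""
    ASSETS[fn.__name__] = fn
    return fn


def materialize_all() -> dict[str, Any]:
    """Compute every registered asset after all of its upstream assets."""
    # Dependency graph: asset name -> set of upstream asset names.
    graph = {name: set(inspect.signature(fn).parameters) for name, fn in ASSETS.items()}
    values: dict[str, Any] = {}
    # static_order() yields each node only after all of its predecessors.
    for name in TopologicalSorter(graph).static_order():
        values[name] = ASSETS[name](**{dep: values[dep] for dep in graph[name]})
    return values


@asset
def raw_orders() -> list[dict]:
    return [{"id": 1, "amount": 30.0}, {"id": 2, "amount": -5.0}, {"id": 3, "amount": 12.5}]


@asset
def valid_orders(raw_orders: list[dict]) -> list[dict]:
    # Declares a dependency on raw_orders simply by naming it as a parameter.
    return [order for order in raw_orders if order["amount"] > 0]


@asset
def revenue(valid_orders: list[dict]) -> float:
    return sum(order["amount"] for order in valid_orders)


if __name__ == "__main__":
    print(materialize_all()["revenue"])  # prints 42.5
```

The point of the declarative style is that no one writes the execution order by hand: it is derived from the dependencies each asset declares.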

Software Development Life Cycle and Developer Experience

In Azure Data Factory, data ingestion, transformation, and control flow are defined in the web interface and saved as JSON.

Easy to get started, but impossible to customize beyond the pre-defined "building blocks".
Integrated with source control, but testing requires manual action, PRs are hard to read, and there is no local development.

In Dagster, data transformation logic, resource integrations, DAGs, and pipeline automation are all defined and versioned in code.

Developers can define, review, test, and version every aspect of the data platform locally
Code PRs are easy to digest and test
No limits on what transformations, control flows, or source and destination systems can be used, plus support for dynamic programming.
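Because transformation logic lives in ordinary Python functions, it can be unit-tested locally and in CI like any other code. The sketch below assumes a hypothetical `valid_orders` transformation and uses pytest-style test functions (run with `pytest`). It deliberately omits any Dagster decorator so it needs only the standard library; it illustrates the testing workflow, not Dagster's own testing utilities.

```python
def valid_orders(raw_orders: list[dict]) -> list[dict]:
    """Hypothetical asset body: keep only orders with a positive amount."""
    return [order for order in raw_orders if order["amount"] > 0]


# pytest-style unit tests: plain functions whose names start with `test_`.
def test_drops_non_positive_amounts() -> None:
    raw = [
        {"id": 1, "amount": 30.0},
        {"id": 2, "amount": 0.0},
        {"id": 3, "amount": -1.0},
    ]
    assert valid_orders(raw) == [{"id": 1, "amount": 30.0}]


def test_empty_input_yields_empty_output() -> None:
    assert valid_orders([]) == []
```

A PR that changes `valid_orders` would then show a small Python diff plus its tests, rather than a change to generated pipeline JSON.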

How ‘data aware’ are these systems?

Azure Data Factory is a pipeline-first system. Datasets are secondary, and dataset lineage requires integrations with other Azure tools.

Provides limited data lineage.
New data assets must be wedged into existing pipelines.
Dependencies across pipelines are not explicit.

With Dagster, data assets are first-class citizens.

Full dataset lineage.
Clear real-time status of each dataset.
Pipeline schedules based on data freshness SLAs (e.g. hourly, daily, or when upstream dependencies update).
Cross-pipeline dependencies are shown to support multi-team data mesh use cases.
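The lineage and freshness ideas above can be sketched with a toy, standard-library-only model. The asset graph, the `downstream_of` and `needs_refresh` helpers, and the refresh rule ("older than the allowed lag, or an upstream asset is newer") are all hypothetical simplifications for illustration; Dagster tracks lineage and freshness itself and its actual policies are richer than this.

```python
from datetime import datetime, timedelta

# Hypothetical asset graph: each asset maps to the set of assets it reads from.
UPSTREAM: dict[str, set[str]] = {
    "raw_orders": set(),
    "raw_customers": set(),
    "valid_orders": {"raw_orders"},
    "customer_revenue": {"valid_orders", "raw_customers"},
}


def downstream_of(asset: str) -> set[str]:
    """Return every asset that transitively depends on `asset` (its downstream lineage)."""
    found: set[str] = set()
    frontier = [asset]
    while frontier:
        current = frontier.pop()
        for name, deps in UPSTREAM.items():
            if current in deps and name not in found:
                found.add(name)
                frontier.append(name)
    return found


def needs_refresh(
    asset: str,
    last_updated: dict[str, datetime],
    now: datetime,
    max_lag: timedelta,
) -> bool:
    """Simplified rule: refresh if the asset is older than its SLA allows,
    or if any of its direct upstream assets was updated after it."""
    if now - last_updated[asset] > max_lag:
        return True
    return any(last_updated[dep] > last_updated[asset] for dep in UPSTREAM[asset])
```

For example, `downstream_of("raw_orders")` returns `{"valid_orders", "customer_revenue"}`, which is the set of assets affected when `raw_orders` changes, even if they belong to different teams' pipelines.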

To summarize the main differences between Azure Data Factory and Dagster:

Criterion	Azure Data Factory	Dagster
Goal of the solution	Cloud ETL service to help ingest data into Azure.	Help data engineers define and manage critical data assets.
Run Python code reliably and provide flexibility for complex programming tasks	Code-free platform; pre-built transformations and control flows; limited scheduling options; limited source control with manual testing.	Python function decorators create DAGs of assets; fully custom transformations and conditional execution, with no black boxes; local development, readable PRs, and fully automated CI/CD, including unit tests; retries, run queues, and parallelization; custom logging and metadata; dynamic programming.
Data assets	Pipeline first, datasets second.	Asset-centric framework: global asset lineage, even across jobs and teams; partitions and backfills; data SLAs.
Automation	Tumbling window schedules, cron schedules with limitations, and some Azure-based event-driven runs.	In Python code: fully custom schedules; fully custom sensors (event-driven); data SLAs.
Integrations	Built around 80+ data ingestion services.	Asset-first integrations for common data tools.
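Partitions and backfills can be illustrated with a toy date-range model: a daily-partitioned asset has one partition per day, and a backfill materializes the partitions in a range that are still missing. The helper names below are hypothetical and standard-library only; Dagster provides its own partitioning (for example `DailyPartitionsDefinition`) and manages backfills itself, which this sketch does not reproduce.

```python
from datetime import date, timedelta


def daily_partition_keys(start: date, end: date) -> list[str]:
    """Return one "YYYY-MM-DD" key per day from start (inclusive) to end (exclusive)."""
    keys: list[str] = []
    day = start
    while day < end:
        keys.append(day.isoformat())
        day += timedelta(days=1)
    return keys


def backfill_plan(start: date, end: date, materialized: set[str]) -> list[str]:
    """Return the partition keys in [start, end) that have not been materialized yet."""
    return [key for key in daily_partition_keys(start, end) if key not in materialized]


if __name__ == "__main__":
    done = {"2024-01-31"}
    print(backfill_plan(date(2024, 1, 30), date(2024, 2, 2), done))
    # prints ['2024-01-30', '2024-02-01']
```

Keying partitions by date is what lets a backfill target only the missing days instead of rerunning the whole history.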

Community

Dagster has a growing community of forward-thinking engineers who see the value of our differentiated approach. The Dagster engineering team is directly involved in supporting both open-source and Dagster+ users.

Interested in an objective third-party perspective? Join the Dagster Slack and talk with current users.

Dagster+ for Enterprise

Looking for unlimited deployments, advanced RBAC, and SAML-based SSO, all on a SOC 2 certified platform? Contact the Dagster Labs sales team today to discuss your requirements.
