The big differences between Dagster and Data Factory:
Azure Data Factory is a drag-and-drop data integration tool that lets you ingest and transform data from different sources.
- Drag-and-drop GUI first, with limited programmatic support
- Used to move data between different Azure services
- Runs manually, on a schedule, or through a limited set of configurable events
Dagster is specifically designed for data engineers.
- Define critical data assets in code
- Use a declarative approach to make your data engineering team far more productive
- Provides local development, deep integrations with the modern data stack, and scheduling built around stakeholder SLAs.
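To make "define data assets in code" concrete, here is a toy, standard-library-only model of the idea. It is not Dagster's API: the `asset` decorator, the `ASSETS` registry, the `materialize_all` helper, and the order data are all hypothetical stand-ins. It does mirror one real Dagster convention: with Dagster's `@asset` decorator, a function parameter named after another asset declares a dependency on that asset, so the DAG falls out of ordinary function signatures.

```python
import inspect
from graphlib import TopologicalSorter

# Hypothetical toy registry (not Dagster's): asset name -> function.
ASSETS = {}

def asset(fn):
    """Toy stand-in for an asset decorator: register fn under its own name."""
    ASSETS[fn.__name__] = fn
    return fn

def materialize_all():
    """Compute every registered asset once, upstream before downstream."""
    # Each asset depends on the assets named by its parameters.
    graph = {name: set(inspect.signature(fn).parameters) for name, fn in ASSETS.items()}
    values = {}
    # static_order() yields every node after all of its predecessors.
    for name in TopologicalSorter(graph).static_order():
        inputs = {dep: values[dep] for dep in graph[name]}
        values[name] = ASSETS[name](**inputs)
    return values

@asset
def raw_orders():
    # Made-up sample data standing in for an ingestion step.
    return [{"id": 1, "amount": 30}, {"id": 2, "amount": 12}]

@asset
def large_orders(raw_orders):
    return [o for o in raw_orders if o["amount"] >= 20]

@asset
def large_order_count(large_orders):
    return len(large_orders)

print(materialize_all()["large_order_count"])  # prints 1
```

The declarative part is that nothing wires the steps together by hand: adding a new asset means writing one more decorated function, and its position in the graph follows from its parameters.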
Software Development Life Cycle and Developer Experience
In Azure Data Factory, data ingestion, transformation, and control flow are defined in the web interface and saved as JSON.
- Easy to get started, but impossible to customize beyond the pre-defined "building blocks".
- Integrated with source control, but testing requires manual action, PRs (diffs of the saved JSON) are hard to read, and there is no local development.
In Dagster, the data transformation logic, resource integrations, DAGs, and pipeline automation are all defined and versioned in code.
- Developers can define, review, test, and version every aspect of the data platform locally
- Code PRs are easy to digest and test
- No limits on what transformations, control flows, or source and destination systems can be used, plus support for generating pipelines dynamically in code.
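As a rough illustration of what code-defined, programmatically generated pipelines allow, the sketch below uses a hypothetical factory function to build one cleaning step per source table from a config dict, then checks one step with a plain assertion. Everything here (`make_table_asset`, `SOURCE_TABLES`, the minimum-row rule) is an invented plain-Python example, not Dagster code; in Dagster, the same factory pattern is commonly used to return `@asset`-decorated functions.

```python
from typing import Callable

def make_table_asset(table: str, min_rows: int) -> Callable[[list], list]:
    """Hypothetical factory: build one cleaning step for a source table."""
    def clean(rows: list) -> list:
        # Drop empty records, then fail loudly if too few rows survive.
        kept = [r for r in rows if r]
        if len(kept) < min_rows:
            raise ValueError(f"{table}: expected at least {min_rows} rows, got {len(kept)}")
        return kept
    clean.__name__ = f"clean_{table}"
    return clean

# Generate steps from configuration instead of copy-pasting them.
SOURCE_TABLES = {"customers": 2, "orders": 1}
CLEANERS = {table: make_table_asset(table, n) for table, n in SOURCE_TABLES.items()}

# Each step is an ordinary function, so it can be tested locally
# and reviewed in a normal code PR.
def test_clean_orders():
    assert CLEANERS["orders"]([{"id": 1}, {}]) == [{"id": 1}]

test_clean_orders()
```

Adding a table then becomes a one-line config change that shows up as a small, readable diff.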
How ‘data aware’ are these systems?
Azure Data Factory is a pipeline-first system. Datasets are secondary, and dataset lineage requires integrations with other Azure tools.
- Provides limited data lineage.
- New data assets must be wedged into existing pipelines.
- Dependencies across pipelines are not explicit.
With Dagster, data assets are first-class citizens.
- Full dataset lineage.
- Clear real-time status of each dataset.
- Pipeline schedules based on data freshness SLAs (e.g. hourly, daily, when upstream dependencies update).
- Cross-pipeline dependencies are shown to support multi-team data mesh use cases.
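The lineage and freshness bullets above can be modeled in a few lines of plain Python. This is a toy sketch, not how Dagster computes either: the asset names, timestamps, and lag limits are invented, and "stale" here simply means an asset either exceeded its maximum allowed lag or has an upstream asset that was materialized after it.

```python
from datetime import datetime, timedelta

# Hypothetical asset graph: each asset lists the assets it reads from.
DEPS = {
    "raw_orders": [],
    "daily_revenue": ["raw_orders"],
    "exec_dashboard": ["daily_revenue"],
}

def lineage(asset: str) -> set[str]:
    """Return every asset upstream of `asset`, however many hops away."""
    seen: set[str] = set()
    stack = list(DEPS[asset])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(DEPS[dep])
    return seen

def stale(asset, last_run, max_lag, now):
    """Stale if the asset missed its freshness SLA, or if an upstream
    asset was materialized more recently than it was."""
    too_old = now - last_run[asset] > max_lag[asset]
    upstream_newer = any(last_run[d] > last_run[asset] for d in DEPS[asset])
    return too_old or upstream_newer

# Invented example state.
now = datetime(2024, 1, 2, 12, 0)
last_run = {
    "raw_orders": datetime(2024, 1, 2, 11, 30),
    "daily_revenue": datetime(2024, 1, 2, 6, 0),
    "exec_dashboard": datetime(2024, 1, 2, 7, 0),
}
max_lag = {
    "raw_orders": timedelta(hours=1),
    "daily_revenue": timedelta(days=1),
    "exec_dashboard": timedelta(days=1),
}

print(sorted(lineage("exec_dashboard")))  # ['daily_revenue', 'raw_orders']
print([a for a in DEPS if stale(a, last_run, max_lag, now)])  # ['daily_revenue']
```

Here `daily_revenue` is well within its one-day SLA but is still flagged, because `raw_orders` was refreshed after it: this is the "when upstream dependencies update" trigger, which only works when the system knows the dependency graph.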
To summarize the main differences between Azure Data Factory and Dagster:
| | Azure Data Factory | Dagster |
|---|---|---|
| Goal of the solution | Cloud ETL service to help ingest data into Azure. | Help data engineers define and manage critical data assets. |
| Run Python code reliably and provide flexibility for complex programming tasks | Azure Data Factory is a code-free platform. | Python function decorators create DAGs of assets. |
| Data assets | Pipeline first, datasets second. | Asset-centric framework. |
| Scheduling | Tumbling window schedules, cron schedules with limitations, some Azure-based event-driven runs. | In Python code. |
| Integrations | Built around 80+ data ingestion services. | Asset-first integrations for common data tools. |
Dagster has a growing community of forward-thinking engineers who see the value of our differentiated approach. The Dagster engineering team is directly involved in supporting both open-source and Dagster Cloud users.
Interested in getting an objective third-party perspective? Join the Dagster Slack and interact with current users.