The big differences between Dagster and Step Functions:
AWS Step Functions is a serverless orchestration service that lets you integrate AWS Lambda functions and other AWS services to build business-critical applications.
- General-purpose workflow runner for orchestrating AWS services
- Used to manage infrastructure, AWS services, or applications
- Built visually or defined in the Amazon States Language, AWS's JSON-based workflow format
- Calls out to other services to execute user-defined code
- Runs either manually or via CloudWatch triggers
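For reference, a minimal Step Functions workflow in the Amazon States Language might look like the sketch below (the state names and Lambda ARNs are placeholders):

```json
{
  "Comment": "A hypothetical two-step ETL flow",
  "StartAt": "ExtractData",
  "States": {
    "ExtractData": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:extract-data",
      "Next": "TransformData"
    },
    "TransformData": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:transform-data",
      "End": true
    }
  }
}
```

Each `Task` state hands off to a Lambda function; the workflow's control flow lives entirely in this JSON document, separate from the Python code it invokes.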
Dagster is specifically designed for data engineers.
- Defines critical data assets in code
- Uses a declarative approach to make your data engineering team far more productive
- Provides local development, deep integrations with the modern data stack, and scheduling built around stakeholder SLAs
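The asset-centric model is easy to sketch in plain Python. The toy code below is not Dagster's actual API (Dagster's equivalent is its `@asset` decorator), but it illustrates the core idea: a function's parameter names declare the upstream assets it depends on, which is enough to derive a DAG.

```python
# Toy sketch of the asset-based model (NOT Dagster's real API):
# a decorator registers each function as a named "asset", and the
# function's parameter names declare its upstream dependencies.
ASSETS = {}

def asset(fn):
    """Register a function as a named data asset (hypothetical helper)."""
    ASSETS[fn.__name__] = fn
    return fn

def materialize(name, cache=None):
    """Compute an asset and, recursively, everything upstream of it."""
    cache = {} if cache is None else cache
    if name not in cache:
        fn = ASSETS[name]
        deps = fn.__code__.co_varnames[: fn.__code__.co_argcount]
        cache[name] = fn(*(materialize(d, cache) for d in deps))
    return cache[name]

@asset
def raw_orders():
    return [{"sku": "a", "qty": 2}, {"sku": "b", "qty": 1}]

@asset
def total_quantity(raw_orders):
    return sum(o["qty"] for o in raw_orders)
```

Calling `materialize("total_quantity")` computes `raw_orders` first and then the downstream asset, so the dependency graph falls out of ordinary function signatures rather than separate configuration.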
Software Development Life Cycle and Developer Experience
With Step Functions, the data transformation logic is represented in Python via Lambda functions.
- The rest of the data platform (resource integrations, DAGs, and automation) is handled separately outside of code
- Hard to run and test the data platform locally
- Data processing code must be kept in sync with all other service configurations
- Painful to write manually; most developers resort to drag-and-drop visual development, increasing the risk of code drifting from configuration. This limits what types of DAGs, resources, and integrations can be used.
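For context, the Python side of such a pipeline is typically a Lambda handler like the sketch below (the function and field names are illustrative); everything around it, such as triggers, state machine wiring, and IAM, lives in separate configuration:

```python
import json

def handler(event, context):
    """Hypothetical Lambda handler: cleans records passed in by a
    Step Functions Task state and returns them for the next state."""
    records = event.get("records", [])
    cleaned = [
        {"id": r["id"], "amount": float(r["amount"])}
        for r in records
        if r.get("amount") is not None
    ]
    return {"statusCode": 200, "body": json.dumps(cleaned)}
```

Because the handler only sees the `event` payload it is given, keeping it in sync with the JSON workflow definition and the upstream services that produce that payload is a manual exercise.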
The data transformation logic, resource integrations, DAGs, and pipeline automation are all defined and versioned in code.
- Developers can define, review, test, and version every aspect of the data platform locally
- Code PRs can include both changes to ETL logic and the definition of what warehouse the ETL runs on
- Use Python for every aspect of the data platform, including unit testing, type checking, shared utility code, mock resources, and dynamic programming.
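Because everything is Python, pipeline logic can be unit tested against a mock resource. A sketch of that pattern (the warehouse client and its methods are invented for illustration):

```python
# Sketch: the same transformation code runs against a real warehouse
# client in production and a mock in tests (names are hypothetical).
class MockWarehouse:
    def __init__(self):
        self.tables = {}

    def write(self, table, rows):
        self.tables.setdefault(table, []).extend(rows)

    def read(self, table):
        return self.tables.get(table, [])

def load_active_users(warehouse, users):
    """Transformation under test: keep only active users."""
    active = [u for u in users if u["active"]]
    warehouse.write("active_users", active)
    return len(active)

def test_load_active_users():
    wh = MockWarehouse()
    n = load_active_users(wh, [{"name": "ada", "active": True},
                               {"name": "bob", "active": False}])
    assert n == 1
    assert wh.read("active_users") == [{"name": "ada", "active": True}]
```

The transformation never knows whether it was handed a real client or a mock, so the same code path is exercised locally, in CI, and in production.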
How ‘data aware’ are these systems?
AWS Step Functions are not aware of the datasets they create.
- Do not provide any form of data lineage
- New data assets must be wedged into existing pipelines
- Dependencies across pipelines are not explicit.
With Dagster, data assets are first class citizens.
- Full dataset lineage
- Clear real-time status of each dataset
- Pipeline schedules based on data freshness SLAs (i.e. hourly, daily, when upstream dependencies update, etc.)
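A freshness-based policy reduces to a simple rule: an asset is stale if it has not been refreshed within its SLA window, or if any upstream asset changed after it did. A toy sketch of that check (this is an illustration of the concept, not Dagster's freshness-policy API):

```python
from datetime import datetime, timedelta

def is_stale(last_updated, upstream_updated, max_lag=timedelta(hours=1)):
    """An asset is stale if it hasn't been refreshed within its SLA
    window, or if any upstream asset changed after it last ran."""
    if datetime.now() - last_updated > max_lag:
        return True
    return any(up > last_updated for up in upstream_updated)
```

Scheduling on this kind of rule, rather than on a fixed cron expression, is what lets pipelines run exactly when stakeholder SLAs demand it.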
To summarize the main differences between AWS Step Functions and Dagster:
| | AWS Step Functions | Dagster |
| --- | --- | --- |
| Goal of the solution | Help coordinate AWS services. | Help data engineers define and manage critical data assets. |
| Run Python code reliably, with flexibility for complex programming tasks | Step Functions defined in JSON call Python code in Lambdas. | Python function decorators create DAGs of assets. |
| Data asset awareness | No data asset abstraction. | Full data asset lineage and real-time status. |
| Scheduling | Schedules in CloudWatch with event triggers. | Defined in Python code. |
| Software development lifecycle | Blocks built around AWS services. | Best-in-class local development. |
Dagster has a growing community of forward-thinking engineers who see the value of our differentiated approach. The Dagster engineering team is directly involved in supporting both open-source and Dagster Cloud users.
Interested in getting an objective third-party perspective? Join the Dagster Slack and interact with current users.