About dagster-ray
The community-supported dagster-ray
package allows orchestrating distributed Ray compute from Dagster pipelines. This integration includes RayRunLauncher, ray_executor, RayIOManager, and PipesRayJobClient for distributed computing workflows.
The integration provides multiple options for running Dagster code on Ray clusters, from executing individual ops as Ray jobs to storing intermediate values in Ray's object store.
Installation
pip install dagster dagster-ray
# For KubeRay support:
pip install dagster 'dagster-ray[kuberay]'
Example
from dagster import job, op, asset, Definitions
from dagster_ray import ray_executor, RayIOManager
@op(
tags={
"dagster-ray/config": {
"num_cpus": 8,
"num_gpus": 2,
"runtime_env": {"pip": {"packages": ["torch"]}},
}
}
)
def my_ray_op():
import torch
# your expensive computation here
result = ...
return result
@asset(io_manager_key="ray_io_manager")
def ray_asset() -> int:
return 42
@job(executor_def=ray_executor.configured({"ray": {"num_cpus": 1}}))
def my_ray_job():
return my_ray_op()
defs = Definitions(
assets=[ray_asset],
jobs=[my_ray_job],
resources={
"ray_io_manager": RayIOManager(),
}
)
About Ray
Ray is a unified framework for scaling AI and Python applications. It provides a simple, universal API for building distributed applications, enabling you to parallelize single machine code with minimal code changes. Ray is used for distributed computing, machine learning, and hyperparameter tuning at scale.