Dagster + Ray

Scale your Python workloads with Ray's distributed computing framework directly from Dagster.

About dagster-ray

The community-supported dagster-ray package allows orchestrating distributed Ray compute from Dagster pipelines. This integration includes RayRunLauncher, ray_executor, RayIOManager, and PipesRayJobClient for distributed computing workflows.

The integration provides multiple options for running Dagster code on Ray clusters, from executing individual ops as Ray jobs to storing intermediate values in Ray's object store.

Installation

pip install dagster dagster-ray
# For KubeRay support:
pip install dagster 'dagster-ray[kuberay]'

Example

from dagster import job, op, asset, Definitions
from dagster_ray import ray_executor, RayIOManager

@op(
    tags={
        "dagster-ray/config": {
            "num_cpus": 8,
            "num_gpus": 2,
            "runtime_env": {"pip": {"packages": ["torch"]}},
        }
    }
)
def my_ray_op():
    import torch
    # your expensive computation here
    result = ...
    return result

@asset(io_manager_key="ray_io_manager")
def ray_asset() -> int:
    return 42

@job(executor_def=ray_executor.configured({"ray": {"num_cpus": 1}}))
def my_ray_job():
    return my_ray_op()

defs = Definitions(
    assets=[ray_asset],
    jobs=[my_ray_job],
    resources={
        "ray_io_manager": RayIOManager(),
    }
)

About Ray

Ray is a unified framework for scaling AI and Python applications. It provides a simple, universal API for building distributed applications, enabling you to parallelize single machine code with minimal code changes. Ray is used for distributed computing, machine learning, and hyperparameter tuning at scale.