Dagster + Ray

Scale your Python workloads with Ray's distributed computing framework directly from Dagster.

About dagster-ray

The community-supported dagster-ray package allows orchestrating distributed Ray compute from Dagster pipelines. This integration includes RayRunLauncher, ray_executor, RayIOManager, and PipesRayJobClient for distributed computing workflows.

`dagster-ray` enables working with distributed Ray compute from Dagster pipelines, combining Dagster's excellent orchestration capabilities and Ray's distributed computing power together.

Learn more in `dagster-ray` docs

Installation

pip install dagster dagster-ray
# For KubeRay support:
pip install dagster 'dagster-ray[kuberay]'

Example

from dagster import job, op, asset, Definitions
from dagster_ray import ray_executor, RayIOManager

@op(
    tags={
        "dagster-ray/config": {
            "num_cpus": 8,
            "num_gpus": 2,
            "runtime_env": {"pip": {"packages": ["torch"]}},
        }
    }
)
def my_ray_op():
    import torch
    # your expensive computation here
    result = ...
    return result

@asset(io_manager_key="ray_io_manager")
def ray_asset() -> int:
    return 42

@job(executor_def=ray_executor.configured({"ray": {"num_cpus": 1}}))
def my_ray_job():
    return my_ray_op()

defs = Definitions(
    assets=[ray_asset],
    jobs=[my_ray_job],
    resources={
        "ray_io_manager": RayIOManager(),
    }
)

About Ray

Ray is a unified framework for scaling AI and Python applications. It provides a simple, universal API for building distributed applications, enabling you to parallelize single machine code with minimal code changes. Ray is used for distributed computing, machine learning, and hyperparameter tuning at scale.