Routing LLM prompts with Dagster and Not Diamond

February 14, 2025

Learn how LLM routing with Not Diamond can improve accuracy and reduce costs in your AI workflows

When developing an AI application, you've likely encountered these common challenges:

  • Which model is best suited for my specific needs?
  • How can I be certain my application is functioning as intended?

The inherent non-determinism of Large Language Models (LLMs) and the numerous options available make it essential to thoroughly evaluate and test each model to ensure production-ready performance. However, even the most effective model overall might struggle with certain edge cases, while other models might offer significant cost savings without compromising performance for most inputs.

A promising solution to this dilemma is to employ an LLM router, which can learn to dynamically select the optimal LLM for each query, thereby enhancing accuracy by up to 25% and reducing both inference costs and latency by a factor of 10.

How model routing works

When analyzing any dataset distribution, it's uncommon to find a single model which consistently outperforms all others across every possible query. Model routing addresses this by creating a "meta-model" that integrates multiple models and intelligently determines when to utilize each LLM. This approach not only surpasses the performance of any individual model but also reduces costs and latency by strategically employing smaller, more economical models when performance won't be compromised.
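
To make this concrete, here's a deliberately simplified sketch of the idea (the quality scorer and cost figures below are hypothetical placeholders, not Not Diamond's actual algorithm): route each prompt to the cheapest model expected to clear a quality bar, falling back to the strongest model otherwise.

# Illustrative toy router: the scorer and costs are hypothetical stand-ins
# for a learned router like Not Diamond's.

MODELS = [
    # (model name, relative cost per call), cheapest first
    ("openai/gpt-4o-mini", 1.0),
    ("openai/gpt-4o", 17.0),
]


def predict_quality(model: str, prompt: str) -> float:
    """Placeholder for a learned scorer estimating response quality."""
    # A real router trains this on evaluation data; we hard-code a heuristic.
    if model == "openai/gpt-4o":
        return 0.9
    return 0.8 if len(prompt) < 500 else 0.6


def route(prompt: str, quality_threshold: float = 0.75) -> str:
    # Pick the cheapest model expected to clear the quality bar.
    for model, _cost in MODELS:
        if predict_quality(model, prompt) >= quality_threshold:
            return model
    # Otherwise fall back to the strongest (most expensive) model.
    return MODELS[-1][0]


print(route("Summarize these three short book reviews."))  # openai/gpt-4o-mini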

Routing in a Dagster pipeline

You can now add LLM routing to your Dagster pipelines with Not Diamond by instantiating a NotDiamondResource. Specifically,

  • create an asset with your desired routing options and the prompt in question,
  • pass the recommended model to the LLM provider of your choosing,
  • then materialize that asset as needed.

Let's walk through a simple pipeline example.

First, we'll define an asset representing a hypothetical dataset called book_review_data providing a list of book reviews for our favorite book, "Cat's Cradle" by Kurt Vonnegut.

import time

import dagster as dg
import dagster_notdiamond as nd
import dagster_openai as oai


@dg.asset(kinds={"python"})
def book_review_data(context: dg.AssetExecutionContext) -> dict:
    data = {
        "title": "Cat's Cradle",
        "author": "Kurt Vonnegut",
        "genre": "Science Fiction",
        "publicationYear": 1963,
        "reviews": [
            {
                "reviewer": "John Doe",
                "rating": 4.5,
                "content": "A thought-provoking satire on science and religion. Vonnegut's wit shines through.",
            },
            {
                "reviewer": "Jane Smith",
                "rating": 5,
                "content": "An imaginative and darkly humorous exploration of humanity's follies. A must-read!",
            },
            {
                "reviewer": "Alice Johnson",
                "rating": 3.5,
                "content": "Intriguing premise but felt a bit disjointed at times. Still enjoyable.",
            },
        ],
    }
    context.add_output_metadata(metadata={"num_reviews": len(data.get("reviews", []))})
    return data

Our goal is to generate a summary of these reviews in a cost-effective way through LLM routing.

So we'll define another asset, book_reviews_summary, which uses the NotDiamondResource for model routing and the OpenAIResource for the completion. We invoke the model_select method from our Not Diamond resource, passing in our summarization prompt, the subset of models we want to consider, and tradeoff="cost" to optimize for cost savings.

This call returns a session ID and a best_llm recommendation, which we can then pass to OpenAI in our call to the chat.completions.create method.

@dg.asset(
    kinds={"openai", "notdiamond"}, automation_condition=dg.AutomationCondition.eager()
)
def book_reviews_summary(
    context: dg.AssetExecutionContext,
    notdiamond: nd.NotDiamondResource,
    openai: oai.OpenAIResource,
    book_review_data: dict,
) -> dg.MaterializeResult:
    prompt = f"""
    Given the book reviews for {book_review_data["title"]}, provide a detailed summary:

    {'|'.join([r['content'] for r in book_review_data["reviews"]])}
    """

    with notdiamond.get_client(context) as client:
        start = time.time()
        # Ask Not Diamond which of the candidate models to use for this
        # prompt, optimizing the quality/cost tradeoff for cost savings.
        session_id, best_llm = client.model_select(
            model=["openai/gpt-4o", "openai/gpt-4o-mini"],
            tradeoff="cost",
            messages=[
                {"role": "system", "content": "You are an expert in literature"},
                {"role": "user", "content": prompt},
            ],
        )
        duration = time.time() - start

    with openai.get_client(context) as client:
        # Run the completion against the model Not Diamond recommended.
        chat_completion = client.chat.completions.create(
            model=best_llm.model,
            messages=[{"role": "user", "content": prompt}],
        )

    summary = chat_completion.choices[0].message.content or ""

    return dg.MaterializeResult(
        metadata={
            "nd_session_id": session_id,
            "nd_best_llm_model": best_llm.model,
            "nd_best_llm_provider": best_llm.provider,
            "nd_routing_latency": duration,
            "summary": dg.MetadataValue.md(summary),
        }
    )

Finally, we return a MaterializeResult with the metadata from our call to both Not Diamond and OpenAI.

Once we materialize these assets, we can see the summary of our book reviews, along with the other associated metadata:

  • nd_session_id, which we can use in subsequent routing requests,
  • nd_best_llm_model and nd_best_llm_provider, as recommended by Not Diamond, and
  • nd_routing_latency, the time (in seconds) taken to fulfill the routing request.
(Screenshot: Dagster global asset lineage)

Here we demonstrated how to perform model routing, but it's worth noting that Not Diamond also supports automatic routing through its model gateway! You can find an example of that in our recent deep dive presentation, and in the community-integrations repository.
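
If you'd rather not manage the completion call yourself, a gateway-style request looks roughly like the sketch below, which uses the standalone notdiamond Python SDK rather than the Dagster resource. Treat the exact signature and return values as assumptions and check Not Diamond's docs; the sketch also assumes your Not Diamond and provider API keys are set in the environment.

# Rough sketch of automatic routing via the standalone notdiamond SDK
# (not the Dagster resource). Assumes NOTDIAMOND_API_KEY and provider keys
# are set in the environment; see Not Diamond's docs for the exact interface.
from notdiamond import NotDiamond

client = NotDiamond()

result, session_id, provider = client.chat.completions.create(
    messages=[
        {"role": "system", "content": "You are an expert in literature"},
        {"role": "user", "content": "Summarize these book reviews: ..."},
    ],
    model=["openai/gpt-4o", "openai/gpt-4o-mini"],
)

print(provider.model)  # the model Not Diamond routed to
print(result.content)  # the completion itself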

Routing prompts in complex workflows

You can expand this pipeline in various ways:

  • Try submitting dynamic prompts to Not Diamond from previous pipeline nodes or even your data (see the sketch after this list),
  • Explore the model gateway to automatically route across LLM providers: OpenAI, Anthropic, Gemini, and more!
  • Combine both of the above with Dagster Pipes to build a fully automated workflow which leverages generative AI for your custom business logic.
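
For the first of these, a minimal sketch might look like the following. The review_prompt and routed_model asset names are hypothetical, but the resource and model_select usage mirror the pipeline above.

import dagster as dg
import dagster_notdiamond as nd


@dg.asset(kinds={"python"})
def review_prompt(book_review_data: dict) -> str:
    # Build the prompt dynamically from upstream data instead of hard-coding it.
    reviews = " | ".join(r["content"] for r in book_review_data["reviews"])
    return f"Summarize the reviews for {book_review_data['title']}: {reviews}"


@dg.asset(kinds={"notdiamond"})
def routed_model(
    context: dg.AssetExecutionContext,
    notdiamond: nd.NotDiamondResource,
    review_prompt: str,
) -> str:
    with notdiamond.get_client(context) as client:
        _session_id, best_llm = client.model_select(
            model=["openai/gpt-4o", "openai/gpt-4o-mini"],
            tradeoff="cost",
            messages=[{"role": "user", "content": review_prompt}],
        )
    # Downstream assets can pass this recommendation to any provider client.
    return best_llm.model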

Conclusion

To try out this example or add Not Diamond’s state-of-the-art routing to your Dagster pipelines, sign up to Not Diamond to get your API key and read the docs to learn more about adding LLM routing to your AI workflows.

We're always happy to hear your feedback, so please reach out to us! If you have any questions, ask them in the Dagster community Slack (join here!) or start a GitHub discussion. If you run into any bugs, let us know with a GitHub issue. And if you're interested in working with us, check out our open roles!
