February 14, 2025 • 3 minute read
Routing LLM prompts with Dagster and Not Diamond
By Colton Padden (@colton)
![Routing LLM prompts with Dagster and Not Diamond](/posts/routing-llms-with-not-diamond/cover.png)
When developing an AI application, you've likely encountered these common challenges:
- Which model is best suited for my specific needs?
- How can I be certain my application is functioning as intended?
The inherent non-determinism of Large Language Models (LLMs) and the numerous options available make it essential to thoroughly evaluate and test each model to ensure production-ready performance. However, even the most effective model overall might struggle with certain edge cases, while other models might offer significant cost savings without compromising performance for most inputs.
A promising solution to this dilemma is to employ an LLM router, which can learn to dynamically select the optimal LLM for each query, thereby enhancing accuracy by up to 25% and reducing both inference costs and latency by a factor of 10.
How model routing works
When analyzing any dataset distribution, it's uncommon to find a single model which consistently outperforms all others across every possible query. Model routing addresses this by creating a "meta-model" that integrates multiple models and intelligently determines when to utilize each LLM. This approach not only surpasses the performance of any individual model but also reduces costs and latency by strategically employing smaller, more economical models when performance won't be compromised.
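The routing idea can be sketched in plain Python. Everything below is illustrative: the model names, prices, quality scores, and the difficulty heuristic are made-up assumptions, whereas a production router like Not Diamond learns its quality predictions from data.

```python
# A toy "meta-model" router: pick the cheapest model whose predicted
# quality for this query clears a threshold. All numbers are made up.

MODELS = [
    # (name, assumed cost per 1M tokens in dollars)
    ("small-model", 0.15),
    ("large-model", 5.00),
]

def predict_quality(model: str, query: str) -> float:
    """Stand-in for a learned quality predictor (illustrative heuristic)."""
    hard = len(query.split()) > 20 or "prove" in query.lower()
    if model == "large-model":
        return 0.95
    return 0.60 if hard else 0.90

def route(query: str, quality_threshold: float = 0.85) -> str:
    # Models are sorted cheapest-first, so the first one that clears the
    # threshold minimizes cost without sacrificing predicted quality.
    for name, _cost in sorted(MODELS, key=lambda m: m[1]):
        if predict_quality(name, query) >= quality_threshold:
            return name
    return MODELS[-1][0]  # fall back to the most capable model

print(route("Summarize this review."))  # easy query -> small-model
print(route("Prove the convergence of this estimator."))  # hard -> large-model
```

The key design point is that the router only pays for the large model when the cheap model is predicted to fall short, which is exactly the cost/quality tradeoff a learned router exploits at scale.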
Routing in a Dagster pipeline
You can now add LLM routing to your Dagster pipelines with Not Diamond by instantiating a `NotDiamondResource`. Specifically, you:

- create an asset with your desired routing options and the prompt in question,
- pass the recommended model to the LLM provider of your choosing,
- then materialize that asset as needed.
Let's walk through a simple pipeline example.
First, we'll define an asset representing a hypothetical dataset called `book_review_data`, providing a list of book reviews for our favorite book, "Cat's Cradle" by Kurt Vonnegut.
```python
import time

import dagster as dg
import dagster_notdiamond as nd
import dagster_openai as oai


@dg.asset(kinds={"python"})
def book_review_data(context: dg.AssetExecutionContext) -> dict:
    data = {
        "title": "Cat's Cradle",
        "author": "Kurt Vonnegut",
        "genre": "Science Fiction",
        "publicationYear": 1963,
        "reviews": [
            {
                "reviewer": "John Doe",
                "rating": 4.5,
                "content": "A thought-provoking satire on science and religion. Vonnegut's wit shines through.",
            },
            {
                "reviewer": "Jane Smith",
                "rating": 5,
                "content": "An imaginative and darkly humorous exploration of humanity's follies. A must-read!",
            },
            {
                "reviewer": "Alice Johnson",
                "rating": 3.5,
                "content": "Intriguing premise but felt a bit disjointed at times. Still enjoyable.",
            },
        ],
    }
    context.add_output_metadata(metadata={"num_reviews": len(data.get("reviews", []))})
    return data
```
Our goal is to generate a summary of these reviews in a cost-effective way through LLM routing.
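A quick back-of-envelope calculation shows why cost-optimized routing matters at scale. The per-token prices and the routing split below are illustrative assumptions, not current provider rates:

```python
# Back-of-envelope comparison: routing vs. always using the largest model.
# Prices (dollars per 1M input tokens) and the 80/20 split are assumptions.

PRICE_PER_M_TOKENS = {"gpt-4o": 2.50, "gpt-4o-mini": 0.15}

def blended_cost(tokens_m: float, share_to_mini: float) -> float:
    """Dollar cost of tokens_m million tokens when a router sends
    share_to_mini of the traffic to the cheaper model."""
    return tokens_m * (
        share_to_mini * PRICE_PER_M_TOKENS["gpt-4o-mini"]
        + (1 - share_to_mini) * PRICE_PER_M_TOKENS["gpt-4o"]
    )

always_large = blended_cost(100, 0.0)  # 100M tokens, no routing
routed = blended_cost(100, 0.8)        # router sends 80% to gpt-4o-mini
print(f"${always_large:.2f} vs ${routed:.2f}")  # -> $250.00 vs $62.00
```

If the router can safely send most summarization-style prompts to the smaller model, the blended cost drops by roughly 4x in this hypothetical split.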
So we'll define another asset, `book_reviews_summary`, which uses the `NotDiamondResource` for model routing and the `OpenAIResource` for completion. We invoke the `model_select` method on our Not Diamond resource, passing in our summarization prompt, the subset of models we want to consider, and `tradeoff="cost"` to optimize for cost savings.

This call returns a `session_id` and a `best_llm`, whose model we then pass to OpenAI's `chat.completions.create` method.

Finally, we return a `MaterializeResult` with the metadata from our calls to both Not Diamond and OpenAI.
```python
@dg.asset(
    kinds={"openai", "notdiamond"},
    automation_condition=dg.AutomationCondition.eager(),
)
def book_reviews_summary(
    context: dg.AssetExecutionContext,
    notdiamond: nd.NotDiamondResource,
    openai: oai.OpenAIResource,
    book_review_data: dict,
) -> dg.MaterializeResult:
    prompt = f"""
    Given the book reviews for {book_review_data["title"]}, provide a detailed summary:

    {'|'.join([r['content'] for r in book_review_data["reviews"]])}
    """

    with notdiamond.get_client(context) as client:
        start = time.time()
        session_id, best_llm = client.model_select(
            model=["openai/gpt-4o", "openai/gpt-4o-mini"],
            tradeoff="cost",
            messages=[
                {"role": "system", "content": "You are an expert in literature"},
                {"role": "user", "content": prompt},
            ],
        )
        duration = time.time() - start

    with openai.get_client(context) as client:
        chat_completion = client.chat.completions.create(
            model=best_llm.model,
            messages=[{"role": "user", "content": prompt}],
        )

    summary = chat_completion.choices[0].message.content or ""

    return dg.MaterializeResult(
        metadata={
            "nd_session_id": session_id,
            "nd_best_llm_model": best_llm.model,
            "nd_best_llm_provider": best_llm.provider,
            "nd_routing_latency": duration,
            "summary": dg.MetadataValue.md(summary),
        }
    )
```
Once we materialize these assets, we can see the summary of our book reviews along with the other associated metadata:

- `session_id`, which we can use in subsequent routing requests,
- `best_llm`, as recommended by Not Diamond, and
- `routing_latency`, the time (in seconds) taken to fulfill the request.
Here we demonstrated how to perform model routing, but it's worth noting that Not Diamond also supports automatic routing through their model gateway! You can find an example of that in our recent deep-dive presentation, and in the community-integration repository.
Routing prompts in complex workflows
You can expand this pipeline in various ways:
- Try submitting dynamic prompts to Not Diamond from previous pipeline nodes or even your data,
- Explore the model gateway to automatically route across LLM providers: OpenAI, Anthropic, Gemini, and more!
- Combine both of the above with Dagster Pipes to build a fully automated workflow that leverages generative AI for your custom business logic.
Conclusion
To try out this example, or to add Not Diamond's state-of-the-art routing to your own Dagster pipelines, sign up for Not Diamond to get your API key, and read the docs to learn more about adding LLM routing to your AI workflows.