February 14, 2025 • 3 minute read
Routing LLM prompts with Dagster and Not Diamond
By Colton Padden (@colton)
![Routing LLM prompts with Dagster and Not Diamond](/posts/routing-llms-with-not-diamond/cover.png)
When developing an AI application, you've likely encountered these common challenges:
- Which model is best suited for my specific needs?
- How can I be certain my application is functioning as intended?
The inherent non-determinism of Large Language Models (LLMs) and the numerous options available make it essential to thoroughly evaluate and test each model to ensure production-ready performance. However, even the most effective model overall might struggle with certain edge cases, while other models might offer significant cost savings without compromising performance for most inputs.
A promising solution to this dilemma is to employ an LLM router, which can learn to dynamically select the optimal LLM for each query, thereby enhancing accuracy by up to 25% and reducing both inference costs and latency by a factor of 10.
How model routing works
When analyzing any dataset distribution, it's uncommon to find a single model which consistently outperforms all others across every possible query. Model routing addresses this by creating a "meta-model" that integrates multiple models and intelligently determines when to utilize each LLM. This approach not only surpasses the performance of any individual model but also reduces costs and latency by strategically employing smaller, more economical models when performance won't be compromised.
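The routing idea can be sketched in plain Python. Everything below is illustrative: the model names, prices, quality scores, and the difficulty heuristic are made-up assumptions, whereas a production router like Not Diamond learns its quality predictions from data.

```python
# A toy "meta-model" router: pick the cheapest model whose predicted
# quality for this query clears a threshold. All numbers are made up.

MODELS = [
    # (name, assumed cost per 1M tokens in dollars)
    ("small-model", 0.15),
    ("large-model", 5.00),
]

def predict_quality(model: str, query: str) -> float:
    """Stand-in for a learned quality predictor (illustrative heuristic)."""
    hard = len(query.split()) > 20 or "prove" in query.lower()
    if model == "large-model":
        return 0.95
    return 0.60 if hard else 0.90

def route(query: str, quality_threshold: float = 0.85) -> str:
    # Models are sorted cheapest-first, so the first one that clears the
    # threshold minimizes cost without sacrificing predicted quality.
    for name, _cost in sorted(MODELS, key=lambda m: m[1]):
        if predict_quality(name, query) >= quality_threshold:
            return name
    return MODELS[-1][0]  # fall back to the most capable model

print(route("Summarize this review."))  # easy query -> small-model
print(route("Prove the convergence of this estimator."))  # hard -> large-model
```

The key design point is that the router only pays for the large model when the cheap model is predicted to fall short, which is exactly the cost/quality tradeoff a learned router exploits at scale.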
Routing in a Dagster pipeline
You can now add LLM routing to your Dagster pipelines with Not Diamond by instantiating a `NotDiamondResource`. Specifically, you:

- create an asset with your desired routing options and the prompt in question,
- pass the recommended model to the LLM provider of your choosing,
- then materialize that asset as needed.
Let's walk through a simple pipeline example.
First, we'll define an asset representing a hypothetical dataset called `book_review_data`, providing a list of book reviews for our favorite book, "Cat's Cradle" by Kurt Vonnegut.
```python
import time

import dagster as dg
import dagster_notdiamond as nd
import dagster_openai as oai


@dg.asset(kinds={"python"})
def book_review_data(context: dg.AssetExecutionContext) -> dict:
    data = {
        "title": "Cat's Cradle",
        "author": "Kurt Vonnegut",
        "genre": "Science Fiction",
        "publicationYear": 1963,
        "reviews": [
            {
                "reviewer": "John Doe",
                "rating": 4.5,
                "content": "A thought-provoking satire on science and religion. Vonnegut's wit shines through.",
            },
            {
                "reviewer": "Jane Smith",
                "rating": 5,
                "content": "An imaginative and darkly humorous exploration of humanity's follies. A must-read!",
            },
            {
                "reviewer": "Alice Johnson",
                "rating": 3.5,
                "content": "Intriguing premise but felt a bit disjointed at times. Still enjoyable.",
            },
        ],
    }
    context.add_output_metadata(metadata={"num_reviews": len(data.get("reviews", []))})
    return data
```
Our goal is to generate a summary of these reviews in a cost-effective way through LLM routing.
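A quick back-of-envelope calculation shows why cost-optimized routing matters at scale. The per-token prices and the routing split below are illustrative assumptions, not current provider rates:

```python
# Back-of-envelope comparison: routing vs. always using the largest model.
# Prices (dollars per 1M input tokens) and the 80/20 split are assumptions.

PRICE_PER_M_TOKENS = {"gpt-4o": 2.50, "gpt-4o-mini": 0.15}

def blended_cost(tokens_m: float, share_to_mini: float) -> float:
    """Dollar cost of tokens_m million tokens when a router sends
    share_to_mini of the traffic to the cheaper model."""
    return tokens_m * (
        share_to_mini * PRICE_PER_M_TOKENS["gpt-4o-mini"]
        + (1 - share_to_mini) * PRICE_PER_M_TOKENS["gpt-4o"]
    )

always_large = blended_cost(100, 0.0)  # 100M tokens, no routing
routed = blended_cost(100, 0.8)        # router sends 80% to gpt-4o-mini
print(f"${always_large:.2f} vs ${routed:.2f}")  # -> $250.00 vs $62.00
```

If the router can safely send most summarization-style prompts to the smaller model, the blended cost drops by roughly 4x in this hypothetical split.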
So we'll define another asset, `book_reviews_summary`, which uses the `NotDiamondResource` for model routing and the `OpenAIResource` for completion. We invoke the `model_select` method on our Not Diamond resource, passing in our summarization prompt, the subset of models we want to consider, and `tradeoff="cost"` to optimize for cost savings.

This call returns a `session_id` and a `best_llm`, whose model we then pass to OpenAI's `chat.completions.create` method.

Finally, we return a `MaterializeResult` with the metadata from our calls to both Not Diamond and OpenAI.
```python
@dg.asset(
    kinds={"openai", "notdiamond"},
    automation_condition=dg.AutomationCondition.eager(),
)
def book_reviews_summary(
    context: dg.AssetExecutionContext,
    notdiamond: nd.NotDiamondResource,
    openai: oai.OpenAIResource,
    book_review_data: dict,
) -> dg.MaterializeResult:
    prompt = f"""
    Given the book reviews for {book_review_data["title"]}, provide a detailed summary:

    {'|'.join([r['content'] for r in book_review_data["reviews"]])}
    """

    with notdiamond.get_client(context) as client:
        start = time.time()
        session_id, best_llm = client.model_select(
            model=["openai/gpt-4o", "openai/gpt-4o-mini"],
            tradeoff="cost",
            messages=[
                {"role": "system", "content": "You are an expert in literature"},
                {"role": "user", "content": prompt},
            ],
        )
        duration = time.time() - start

    with openai.get_client(context) as client:
        chat_completion = client.chat.completions.create(
            model=best_llm.model,
            messages=[{"role": "user", "content": prompt}],
        )

    summary = chat_completion.choices[0].message.content or ""

    return dg.MaterializeResult(
        metadata={
            "nd_session_id": session_id,
            "nd_best_llm_model": best_llm.model,
            "nd_best_llm_provider": best_llm.provider,
            "nd_routing_latency": duration,
            "summary": dg.MetadataValue.md(summary),
        }
    )
```
Once we materialize these assets, we can see the summary of our book reviews along with the other associated metadata:

- `session_id`, which we can use in subsequent routing requests,
- `best_llm`, as recommended by Not Diamond, and
- `routing_latency`, the time (in seconds) taken to fulfill the request.
Here we demonstrated how to perform model routing, but it's worth noting that Not Diamond also supports automatic routing through their model gateway! You can find an example of that in our recent deep-dive presentation, and in the community-integration repository.
Routing prompts in complex workflows
You can expand this pipeline in various ways:
- Try submitting dynamic prompts to Not Diamond from previous pipeline nodes or even your data,
- Explore the model gateway to automatically route across LLM providers: OpenAI, Anthropic, Gemini, and more!
- Combine both of the above with Dagster Pipes to build a fully automated workflow that leverages generative AI for your custom business logic.
Conclusion
To try out this example, or to add Not Diamond's state-of-the-art routing to your own Dagster pipelines, sign up for Not Diamond to get your API key, and read the docs to learn more about adding LLM routing to your AI workflows.