Integrate OpenAI calls into your pipelines | Dagster Blog

March 11, 20242 minute read

Integrate OpenAI calls into your pipelines

The new dagster-openai integration lets you tap into the power of LLMs in a cost-efficient way.
Yuhan Luo
Name
Yuhan Luo
Handle
@yuhan
Maxime Armstrong
Name
Maxime Armstrong
Handle
@maxime

We are pleased to announce a new integration that will allow data practitioners to easily include OpenAI API calls as part of their data pipelines. Besides generating AI responses, the integration provides insights that let you optimize your API calls and credit consumption.

Building Generative AI Steps into Your Data Pipelines

There are many potential uses of an OpenAI API call in a data pipeline. Allow me to quote an expert on the matter: ChatGPT:

While we can't yet picture all of the possible use cases of a generative AI step in a data pipeline, here are some scenarios that seem valuable:

  • Submit a large document to OpenAI's API and request a summary of the document.
  • Submit a customer testimonial and request a standardized classification for sentiment analysis
  • Submit foreign language text and request a local translation

Here at Dagster Labs, we've used dagster-openai to summarize the category and generate learning summaries from GitHub issues and discussions. Our pipeline handles complex support requests. It provides a first-stab answer to user questions (speeding up our support team's response times), auto-categorizes the issue, and generates learning summaries on a weekly basis.

Keeping Your Costs Under Control

While generative AI offers a broad spectrum of capabilities, managing costs is essential. Dagster Labs is committed to providing the necessary tools to build your pipeline for optimal cost efficiency and performance. To this end, we introduce the OpenAIResource alongside the with_usage_metadata function from our library, ensuring uniform resource utilization across our platform.

Both Dagster Cloud and open-source users can take advantage of these features to monitor and optimize their data pipelines. For Dagster Cloud users, this functionality is seamlessly integrated with Dagster Insights, providing an enhanced experience with additional analytical capabilities. Meanwhile, open-source users can also leverage these tools and log their metadata, which they can then visualize as a metadata plot directly in the UI.

This unified approach ensures all users can effectively control their costs while maximizing the benefits of generative AI.

A screenshot of the Dagster UI showing the OpenAI API consumption graph.
A screenshot of the Dagster Cloud "Insights" UI showing the OpenAI API consumption graph.
Explore the guide here

Read more filed under
Blog post category for Blog Post. Blog Post