Blog
New Dagster Integration: Include OpenAI Calls Into Your Data Pipelines

New Dagster Integration: Include OpenAI Calls Into Your Data Pipelines

The new dagster-openai integration lets you tap into the power of LLMs in a cost-efficient way.

New Dagster Integration: Include OpenAI Calls Into Your Data Pipelines

We are pleased to announce a new integration that will allow data practitioners to easily include OpenAI API calls as part of their data pipelines.  Besides generating AI responses, the integration provides insights that let you optimize your API calls and credit consumption.

Building Generative AI Steps into Your Data Pipelines

There are many potential uses of an OpenAI API call in a data pipeline.  Allow me to quote an expert on the matter: ChatGPT:

OpenAI's API, including services powered by GPT (like ChatGPT), Codex, and DALL·E, has found a wide range of applications across various industries. Below are some of the most popular use cases adopted by large corporations:

Top Use Cases of OpenAI's API:

  1. Customer Service Automation
    – Deploying AI chatbots to handle customer queries, provide 24/7 support, and reduce response times.
  2. Content Generation
    – Writing articles, social media posts, product descriptions, and marketing copy at scale.
  3. Data Analysis and Insights
    – Processing large datasets, extracting key insights, and generating summaries or reports.
  4. Language Translation and Localization
    – Translating content and adapting it to different languages and cultural contexts.
  5. Personalized Recommendations
    – Suggesting products, content, or services based on user preferences and behavior.
  6. Educational Tools and Tutoring
    – Assisting learners with explanations, practice problems, and personalized study help.
  7. Document Automation and Summarization
    – Automating repetitive document tasks and summarizing long-form content quickly.
  8. Creative Design and Art
    – Using tools like DALL·E to generate images, design concepts, and visual assets.
  9. Financial Analysis and Forecasting
    – Assisting with interpreting financial data, generating forecasts, and risk assessment.

While we can't yet picture all of the possible use cases of a generative AI step in a data pipeline, here are some scenarios that seem valuable:

  • Submit a large document to OpenAI's API and request a summary of the document.
  • Submit a customer testimonial and request a standardized classification for sentiment analysis
  • Submit foreign language text and request a local translation

Here at Dagster Labs, we've used dagster-openai to summarize the category and generate learning summaries from GitHub issues and discussions.  Our pipeline handles complex support requests. It provides a first-stab answer to user questions (speeding up our support team's response times), auto-categorizes the issue, and generates learning summaries on a weekly basis.

Keeping Your Costs Under Control

While generative AI offers a broad spectrum of capabilities, managing costs is essential. Dagster Labs is committed to providing the necessary tools to build your pipeline for optimal cost efficiency and performance. To this end, we introduce the OpenAIResource alongside the with_usage_metadata function from our library, ensuring uniform resource utilization across our platform.

Both Dagster Cloud and open-source users can take advantage of these features to monitor and optimize their data pipelines. For Dagster Cloud users, this functionality is seamlessly integrated with Dagster Insights, providing an enhanced experience with additional analytical capabilities. Meanwhile, open-source users can also leverage these tools and log their metadata, which they can then visualize as a metadata plot directly in the UI.  

This unified approach ensures all users can effectively control their costs while maximizing the benefits of generative AI.

A screenshot of the Dagster UI showing the OpenAI API consumption graph.
   A screenshot of the Dagster Cloud "Insights" UI showing the OpenAI API consumption graph.  

           Explore the guide here        

We're always happy to hear your feedback, so please reach out to us! If you have any questions, ask them in the Dagster community Slack (join here!) or start a Github discussion. If you run into any bugs, let us know with a Github issue. And if you're interested in working with us, check out our open roles!

Follow us:

Dagster Newsletter

Get updates delivered to your inbox

Latest writings

The latest news, technologies, and resources from our team.

Code Location Best Practices

June 12, 2025

Code Location Best Practices

How to organize your code locations for clarity, maintainability, and reuse.

Connect 211's Small Team, Big Impact: Building a Community Resource Data Platform That Serves Millions

June 10, 2025

Connect 211's Small Team, Big Impact: Building a Community Resource Data Platform That Serves Millions

Data orchestration is our primary business, so Dagster has been a total game changer for us.‍

Big Cartel Brought Fragmented Data into a Unified Control Plane with Dagster

June 3, 2025

Big Cartel Brought Fragmented Data into a Unified Control Plane with Dagster

Within six months, Big Cartel went from "waiting for dashboards to break" to proactive monitoring through their custom "Data Firehose," eliminated inconsistent business metrics that varied "depending on the day you asked," and built a foundation that scales from internal analytics to customer-facing data products.