Blog
New Dagster Integration: Include OpenAI Calls Into Your Data Pipelines

New Dagster Integration: Include OpenAI Calls Into Your Data Pipelines

March 11, 2024
New Dagster Integration: Include OpenAI Calls Into Your Data Pipelines

The new dagster-openai integration lets you tap into the power of LLMs in a cost-efficient way.

We are pleased to announce a new integration that will allow data practitioners to easily include OpenAI API calls as part of their data pipelines.  Besides generating AI responses, the integration provides insights that let you optimize your API calls and credit consumption.

Building Generative AI Steps into Your Data Pipelines

There are many potential uses of an OpenAI API call in a data pipeline.  Allow me to quote an expert on the matter: ChatGPT:

OpenAI's API, including services powered by GPT (like ChatGPT), Codex, and DALL·E, has found a wide range of applications across various industries. Below are some of the most popular use cases adopted by large corporations:

Top Use Cases of OpenAI's API:

  1. Customer Service Automation
    – Deploying AI chatbots to handle customer queries, provide 24/7 support, and reduce response times.
  2. Content Generation
    – Writing articles, social media posts, product descriptions, and marketing copy at scale.
  3. Data Analysis and Insights
    – Processing large datasets, extracting key insights, and generating summaries or reports.
  4. Language Translation and Localization
    – Translating content and adapting it to different languages and cultural contexts.
  5. Personalized Recommendations
    – Suggesting products, content, or services based on user preferences and behavior.
  6. Educational Tools and Tutoring
    – Assisting learners with explanations, practice problems, and personalized study help.
  7. Document Automation and Summarization
    – Automating repetitive document tasks and summarizing long-form content quickly.
  8. Creative Design and Art
    – Using tools like DALL·E to generate images, design concepts, and visual assets.
  9. Financial Analysis and Forecasting
    – Assisting with interpreting financial data, generating forecasts, and risk assessment.

While we can't yet picture all of the possible use cases of a generative AI step in a data pipeline, here are some scenarios that seem valuable:

  • Submit a large document to OpenAI's API and request a summary of the document.
  • Submit a customer testimonial and request a standardized classification for sentiment analysis
  • Submit foreign language text and request a local translation

Here at Dagster Labs, we've used dagster-openai to summarize the category and generate learning summaries from GitHub issues and discussions.  Our pipeline handles complex support requests. It provides a first-stab answer to user questions (speeding up our support team's response times), auto-categorizes the issue, and generates learning summaries on a weekly basis.

Keeping Your Costs Under Control

While generative AI offers a broad spectrum of capabilities, managing costs is essential. Dagster Labs is committed to providing the necessary tools to build your pipeline for optimal cost efficiency and performance. To this end, we introduce the OpenAIResource alongside the with_usage_metadata function from our library, ensuring uniform resource utilization across our platform.

Both Dagster Cloud and open-source users can take advantage of these features to monitor and optimize their data pipelines. For Dagster Cloud users, this functionality is seamlessly integrated with Dagster Insights, providing an enhanced experience with additional analytical capabilities. Meanwhile, open-source users can also leverage these tools and log their metadata, which they can then visualize as a metadata plot directly in the UI.  

This unified approach ensures all users can effectively control their costs while maximizing the benefits of generative AI.

A screenshot of the Dagster UI showing the OpenAI API consumption graph.
   A screenshot of the Dagster Cloud "Insights" UI showing the OpenAI API consumption graph.  

           Explore the guide here        

We're always happy to hear your feedback, so please reach out to us! If you have any questions, ask them in the Dagster community Slack (join here!) or start a Github discussion. If you run into any bugs, let us know with a Github issue. And if you're interested in working with us, check out our open roles!

Dagster Newsletter

Get updates delivered to your inbox

Latest writings

The latest news, technologies, and resources from our team.

dbt Fusion Support Comes to Dagster

August 22, 2025

dbt Fusion Support Comes to Dagster

Learn how to use the beta dbt Fusion engine in your Dagster pipelines, and the technical details of how support was added

What CoPilot Won’t Teach You About Python (Part 2)

August 20, 2025

What CoPilot Won’t Teach You About Python (Part 2)

Explore another set of powerful yet overlooked Python features—from overload and cached_property to contextvars and ExitStack

Dagster’s MCP Server

August 8, 2025

Dagster’s MCP Server

We are announcing the release of our MCP server, enabling AI assistants like Cursor to seamlessly integrate with Dagster projects through Model Context Protocol, unlocking composable workflows across your entire data stack.

No items found.