AI Reference Architectures

January 24, 2025
A guide to some common AI architecture patterns with Dagster

You likely have an AI application you want to get off the ground. Given how quickly the space is evolving, it can be difficult to know where to start. Below are some common patterns that can help.

These patterns are a toolbox of ways to solve AI problems. They are helpful in different situations, and you may use some more often than others (you may never pre-train a model from scratch yourself). They can also be combined; for example, you can pair RAG with a fine-tuned model.

Prompt Engineering

Overview

Prompt engineering is the practice of designing input prompts to guide the behavior of an LLM. Users can get more accurate and task-specific outputs by structuring prompts for specific use cases without knowing anything about the underlying model.

Benefits

  • Development Speed. Prompt engineering requires the least amount of engineering. As a result, it is very easy to iterate and experiment at low cost.
  • Minimal Infrastructure. Unlike RAG, which requires an external storage layer, or pretraining, which requires extensive GPUs for compute, prompt engineering is self-contained.

Architecture

Prompt engineering diagram

A user asks a question. The information from the question is parsed and injected into a predefined prompt. The prompt adds context or specific rules to help craft a more detailed response from the model.

When the prompt has been constructed, the full prompt is sent to the LLM, which then provides an answer.
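To make the flow concrete, here is a minimal sketch of prompt construction in plain Python. The template text and function name are illustrative only, not a specific Dagster or LangChain API:

```python
# A predefined prompt template (hypothetical wording) that adds context
# and rules around whatever question the user asks.
PROMPT_TEMPLATE = (
    "You are a helpful assistant. Follow the rules below when answering.\n"
    "Rules: answer concisely and include links for further reading.\n\n"
    "Question: {question}\n"
)

def build_prompt(question: str) -> str:
    """Inject the user's question into the predefined prompt template."""
    return PROMPT_TEMPLATE.format(question=question)

# The full prompt is what gets sent to the LLM.
full_prompt = build_prompt("What is a series?")
```

The same template can be reused for every question, which is what makes prompt engineering so cheap to iterate on.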

Example Prompt Engineering with Dagster

Imagine you are designing an application to help users answer questions about Pandas. You want the answers to be grounded in the context of Pandas and to supply useful supplemental information.

Prompt engineering example Dagster DAG
  • Question. A user inputs a question. With Dagster, these can be set at execution time with run configuration, so you can reuse the same DAG to answer multiple questions. In this case, the user asks the ambiguous question, “What is a series?”, but we know they are interested in Pandas.
  • Prompt. A prompt is needed to generate a helpful answer grounded in Pandas. We will use LangChain to help with the prompt engineering process. LangChain makes it easy to develop and use prompt templates, and we may iterate on the template as we calibrate an appropriate response from the LLM. Our template may look like this:

        system_template = """
        In the context of the Python framework Pandas, answer the question in 1 or 2 sentences.
        Also include two links for further relevant information.

        Question: {question}
        """

  • Inference. The prompt and input can now be combined and passed to the LLM. Again, we will use LangChain, but we also need an underlying model, such as OpenAI’s gpt-4o-mini, to answer the question. Now we can see that even a somewhat unclear question returns a relevant answer in the format we feel is most instructive for the user:

In Pandas, a Series is a one-dimensional labeled array capable of holding any data type, similar to a column in a spreadsheet or a database table. Each element in a Series is associated with an index, which allows for efficient data access and manipulation.

For further information, you can check the following links:
- Pandas Series Documentation
- Introduction to Pandas Series

RAG

Overview

Retrieval augmented generation (RAG) enhances an LLM’s performance by integrating external knowledge. With this approach, a retrieval system fetches relevant data from an external storage layer (often a vector database). The retrieved data is then used as context for the LLM, which can provide more accurate and informed responses.

Benefits

  • Dynamic. Allows data to be updated separately without the need to retrain the model.
  • Domain-specific contextualization. Provide traceable sources and citations for responses and reduce hallucinations.
  • Cost efficiency. Knowledge is offloaded to the external storage layer rather than stored in model parameters, so smaller, cheaper models can be used.

Architecture

RAG diagram

Source data is converted into high-dimensional vectors (embeddings) that represent the semantics of the content. These embeddings are added to an index within a vector database. Vector databases function like traditional databases but allow queries based on similarity measures, which is ideal for content retrieval.

After the vector database is populated, relevant information can be combined with an LLM to answer a prompt. The answer generated will be grounded in the domain-specific context of the data contained within the vector database.
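Under the hood, retrieval ranks stored embeddings by similarity to the query's embedding. The sketch below uses toy three-dimensional vectors in place of real embeddings and a plain dict in place of a vector database; all names and values are made up for illustration:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy "vector database": documents keyed to made-up embeddings.
index = {
    "slack: deploy failed on friday": [0.9, 0.1, 0.0],
    "github: fix flaky integration test": [0.1, 0.9, 0.2],
}

def retrieve(query_embedding, k=1):
    """Return the k documents most similar to the query embedding."""
    ranked = sorted(index, key=lambda doc: cosine(query_embedding, index[doc]), reverse=True)
    return ranked[:k]

# Pretend this is the embedding of "why did the deploy fail?"
context = retrieve([0.85, 0.15, 0.05])
prompt = f"Answer using this context:\n{context[0]}\n\nQuestion: why did the deploy fail?"
```

A real setup would compute embeddings with a model and query a vector database, but the shape of the operation is the same: embed, rank by similarity, and inject the top results into the prompt.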

Example RAG with Dagster

Imagine you want to provide answers specific to your organization. These answers should be grounded in the constantly occurring discussions in Slack and GitHub.

RAG example Dagster DAG
  • Ingestion. Data from Slack and GitHub is extracted, either by writing your own code or by leveraging frameworks such as dlt and Dagster’s embedded ELT. The cadence of extraction can follow a traditional schedule or be event-based using sensors, such as triggering when new GitHub issues are created.
  • Processing. After the data has been extracted, embeddings can be created. Dagster integrates easily with AI tools such as LangChain and offers native support for the OpenAI Python client. Using these tools, data can be split appropriately, and embeddings can be generated.
  • Vector Database. Once the embeddings are created, data is uploaded to the vector index. Dagster can manage the index creation and orchestrate the addition of new embeddings. As the application scales, Dagster can run processes concurrently.
  • Inference. With the vector database populated with Slack and GitHub information, one or more AI applications can now rely on that database as it is continuously refreshed.
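The Processing step above splits source text into pieces before embedding. Here is a minimal chunking sketch, assuming fixed-size character windows with a small overlap (real pipelines often split on tokens or semantic boundaries instead; the function name and defaults are illustrative):

```python
def chunk_text(text: str, max_chars: int = 200, overlap: int = 20) -> list[str]:
    """Split text into overlapping fixed-size chunks before embedding.

    The overlap keeps sentences that straddle a boundary present in
    both neighboring chunks, so retrieval doesn't lose them.
    """
    step = max_chars - overlap
    return [text[start : start + max_chars] for start in range(0, len(text), step)]

chunks = chunk_text("some long document text " * 40)
```

Each chunk is then embedded and uploaded to the vector index as its own record.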

Fine-Tuning

Overview

Fine-tuning is the process of taking an existing pre-trained LLM and adapting it to a specific task using a small dataset. When a model is fine-tuned, its weights are only adjusted slightly (or, with parameter-efficient methods, only a small subset of weights is updated).

Benefits

  • Customization. Tailors the model's behavior to align with specific goals, voice, or application requirements.
  • Cost efficiency. Allows for token savings due to shorter prompts. Fine-tuning also allows for less computationally expensive models to be used while still offering the performance of more expensive alternatives.
  • Output control. Have total control of the format and style of a model’s output.

Architecture

Fine-tune diagram

Data is collected and split into two samples, one for training the final model and one for validation. These samples do not have to be large; a few dozen examples are usually enough.

Next, each sample is converted into prompts that mimic the desired interaction with the model. These prompts can be grounded in specific knowledge, conversation style (such as answering cheerfully), or both.

After creating the sample prompts, they can be provided as inputs for a fine-tuning job. This may take a while, depending on the number of samples and the underlying model used to train. After the fine-tuning job is completed, a new model will be produced that can be queried.
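The split described above can be sketched as a seeded shuffle over a list of examples. The helper name and validation fraction here are illustrative:

```python
import random

def split_examples(examples, validation_fraction=0.2, seed=42):
    """Shuffle deterministically, then split into training and validation sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]

# A few dozen examples are usually enough for fine-tuning.
examples = [f"example-{i}" for i in range(50)]
train, val = split_examples(examples)
```

Seeding the shuffle keeps the split reproducible across runs, which matters when comparing fine-tuned model versions.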

Example Fine-Tuning with Dagster

Imagine you want to provide answers to questions asked in your GitHub repository. You want these answers to match the tone of the answers already in the repo and be consistent with the style users are familiar with.

Fine-tune example Dagster DAG
  • Ingestion. Data from GitHub can be extracted using several tools you are already familiar with. This data can also be categorized into discrete periods using partitions, which might help understand how answers have changed over time.
  • Processing. The data can then be sampled and split across a training and validation set. Using this data, we can craft the prompts. This might involve combining the data from GitHub with some additional prompt engineering. The format of the prompts will depend on what is used to fine-tune the model. If using OpenAI, the conversations might look like this:
    {"messages": [
        {"role": "system", "content": "You answer GitHub issues in an upbeat and helpful way to ensure user success."},
        {"role": "user", "content": data["github_question"]},
        {"role": "assistant", "content": data["github_answer"], "weight": 1}
    ]}
  • Fine-Tuning. After the prompts are generated, they can be uploaded to the storage layer in OpenAI. The Dagster OpenAI resource makes this easy and can pass the file IDs to the fine-tuning job endpoint, generating a model specific to this use case. A Dagster asset check can also be tied to the new model to ensure it performs as expected.
  • Inference. The new model is ready to be used. If it ever needs to be retrained, the assets can be re-executed to pull in additional GitHub issues and generate additional prompts.
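Building those training records can be sketched as follows, assuming OpenAI's chat-format JSONL for fine-tuning data (one JSON object per line; the helper names are hypothetical):

```python
import json

SYSTEM = "You answer GitHub issues in an upbeat and helpful way to ensure user success."

def to_chat_example(question: str, answer: str) -> dict:
    """Build one chat-format training example from a GitHub question/answer pair."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer, "weight": 1},
        ]
    }

def write_jsonl(pairs, path):
    """Write (question, answer) pairs as one JSON object per line."""
    with open(path, "w") as f:
        for question, answer in pairs:
            f.write(json.dumps(to_chat_example(question, answer)) + "\n")
```

The resulting file is what gets uploaded to OpenAI's storage layer and passed to the fine-tuning job.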

Pretraining

Overview

Pretraining a model involves training it on a large volume of data without using any prior weights from an existing model. This results in a model that can be used or further fine-tuned.

This is generally the most intensive AI pattern in terms of data, work, and computational resources. It is less common for organizations to need to pretrain their own model; most instead rely on some combination of prompt engineering, RAG, and fine-tuning.

Benefits

  • Avoid biases. Starting from scratch can ensure that your model has no unintended biases from existing models.
  • Unique data. Pretraining makes sense if your data is unique and specific to your use case, and unlikely to be included in the knowledge base of open, more generalized LLMs.
  • Different Languages. Models for languages not well represented by general models may benefit from pretraining. For example, if a model is trained only on English text, even with fine-tuning or RAG, it may struggle to give correct answers in other languages.

Architecture

Pretrain diagram

There is no single way to pre-train a model, and there are many tradeoffs to be made between cost and performance. Training smaller models generally starts with a large amount of unstructured text. This might be books, articles, wikis, or anything with extended text examples.

Next, you will need to turn the text into tokens. The tokens represent the individual fragments of the overall text. These tokens then need to be converted into their corresponding IDs, which will be used for training.

Tokenization process
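A toy sketch of the token-to-ID mapping, using whitespace tokenization in place of the subword tokenizers (such as BPE) that real models use; the corpus and helper names are illustrative:

```python
def build_vocab(corpus):
    """Map each unique token to an integer ID (sorted for determinism)."""
    tokens = sorted({tok for text in corpus for tok in text.split()})
    return {tok: i for i, tok in enumerate(tokens)}

def encode(text, vocab):
    """Convert text into the sequence of token IDs used for training."""
    return [vocab[tok] for tok in text.split()]

corpus = ["let x = 1", "print x"]
vocab = build_vocab(corpus)
ids = encode("print x", vocab)
```

Real tokenizers split words into subword fragments and handle unseen text gracefully, but the output is the same shape: sequences of integer IDs fed into training.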

This data will then be combined with model weights. There are several ways to handle weights, such as using random weights (the most expensive option) or, more commonly, using the weights of an existing model.

Data and weights can now be combined in training. Depending on the number of parameters, this can take several weeks or more, potentially costing hundreds of thousands of dollars. Training uses much more memory than inference and requires expensive GPUs. This will produce a decoder-only model that can generate the next token (word) in a sequence.

Example Pretraining with Dagster

Imagine you have written a new programming language and would like to create a model that can assist users in developing with it.

Pretrain example Dagster DAG
  • Ingestion. A large amount of data needs to be collected. We can imagine ingesting the entirety of the GitHub repo (though that may not be enough). After the data is collected, it needs to be cleaned. This includes deduplication, removing typos, filtering out languages unrelated to the language you wish to train your model on, and other steps that may bias the data. Asset checks can be included to help ensure data quality.
  • Processing. The unstructured data needs to be split into tokens and IDs. To help speed up the process, this work can be partitioned over several concurrent assets. After the unstructured data has been mapped to IDs, it can be shaped and converted into a format such as a Hugging Face Dataset for training.
  • Model Init. In addition to our own data, we may use the weights of an existing model. This will be much more cost-effective than generating random weights. We will have to determine an appropriate existing model and whether we want to use its existing number of layers, upscale a smaller existing model (add the layers), or downscale (remove the layers) a larger model.
  • Training. The most expensive step is training the model. Depending on the number of parameters, this step may take considerable time. We can use Hugging Face to execute our training job and partition the checkpoints of our model within Dagster. This will produce a model that can generate the next token in our custom programming language. We will want to include some model evaluations or asset checks to ensure it behaves as expected.
  • Inference. The model can now be used, though it may need to be fine-tuned or combined with RAG to give more context-aware answers.

Have feedback or questions? Start a discussion in Slack or GitHub.
