
Retrieval Augmented Generation (RAG) for LLMs

What is Retrieval Augmented Generation (RAG) for LLMs?

Retrieval-augmented generation (RAG) for large language models (LLMs) improves prediction quality by using an external datastore at inference time to build a richer prompt that combines context, history, and recent or relevant knowledge. RAG LLMs can outperform LLMs without retrieval by a large margin with far fewer parameters, they can update their knowledge simply by replacing their retrieval corpora, and they can provide citations that let users easily verify and evaluate predictions.

The most common systems used to provide external data to RAG LLMs are vector databases and feature stores. RAG for LLMs works because of the ability of LLMs to perform in-context learning.

RAG for LLMs

RAG integrates information retrieval (or searching) into LLM text generation. It uses the user’s prompt to retrieve external “context” information from a data store, and then includes that information alongside the original prompt to build a richer prompt containing context that would otherwise not have been available to the LLM (a minimal sketch follows the list below). Some examples of context information used by RAG include:

  • real-time context (the weather, your location, etc);
  • user-specific information (orders the user has made at this website, actions the user has taken on the website, the user’s status, etc);
  • relevant factual information (documents not included in the LLM’s training data - either because they are private or they were updated after the LLM was trained).
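As a rough illustration, the sketch below assembles such a richer prompt in Python. The helpers retrieve_documents, lookup_user_features, and generate are hypothetical stand-ins for a vector database query, a feature store lookup, and an LLM completion call; here they return placeholder values so the example runs on its own.

```python
# Minimal sketch of RAG prompt assembly. All helpers below are hypothetical
# placeholders standing in for real retrieval and generation components.

def retrieve_documents(query: str, k: int = 3) -> list[str]:
    # Placeholder for a vector database similarity search.
    return ["Doc: Central banks raised interest rates three times this year."][:k]

def lookup_user_features(user_id: str) -> dict:
    # Placeholder for a feature store / operational database lookup.
    return {"user_id": user_id, "status": "premium"}

def generate(prompt: str) -> str:
    # Placeholder for a call to an LLM completion API.
    return f"(LLM answer based on a {len(prompt)}-character prompt)"

def build_rag_prompt(user_prompt: str, user_id: str) -> str:
    docs = retrieve_documents(user_prompt)      # relevant factual information
    profile = lookup_user_features(user_id)     # user-specific information
    realtime = {"date": "2024-05-01"}           # real-time context (placeholder)
    return (
        "Context documents:\n" + "\n".join(docs) + "\n\n"
        f"User profile: {profile}\n"
        f"Current date: {realtime['date']}\n\n"
        f"User question: {user_prompt}"
    )

print(generate(build_rag_prompt("How do rising interest rates affect my portfolio?", "user-42")))
```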

Why is there a need for RAG LLMs?

Pre-trained LLMs (foundation models) do not learn over time, often hallucinate, and may leak private data from their training corpus. To overcome these limitations, there has been growing interest in retrieval-augmented generation, which incorporates a vector database and/or feature store alongside the LLM to provide context for prompts; such systems are known as RAG LLMs.

What data systems are used for RAG LLMs?

Vector databases are used to retrieve relevant documents using similarity search. They can be standalone services or embedded within the LLM application (e.g., the Chroma embedded vector database). When structured (tabular) data is needed, an operational data store, such as a feature store, is typically used. Popular examples are the Weaviate vector database and the Hopsworks feature store, both of which provide time-unlimited free tiers.
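As a rough illustration, a similarity search against an embedded Chroma instance might look like the sketch below; the collection name and document contents are made-up examples.

```python
# Minimal similarity-search sketch with the embedded Chroma vector database.
# Collection name and documents are illustrative only.
import chromadb

client = chromadb.Client()  # in-memory, embedded instance
collection = client.create_collection(name="financial_docs")

# Index a few documents; Chroma embeds them with its default embedding function.
collection.add(
    ids=["doc1", "doc2"],
    documents=[
        "Central banks raised interest rates three times this year.",
        "Commercial property prices fell as borrowing costs increased.",
    ],
)

# Retrieve the document most similar to the user's prompt.
results = collection.query(
    query_texts=["How do rising interest rates affect my investments?"],
    n_results=1,
)
print(results["documents"][0])
```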

How does RAG compare with Fine-Tuning?

Fine-tuning takes a pre-trained LLM and further trains the model on a smaller dataset, often with data not previously used to train the LLM, to improve the LLM’s performance for a particular task. 

LLMs can be extended with both RAG and Fine-Tuning

In the above figure, we can see that fine-tuning is appropriate when you want to customize an LLM to perform well in a particular domain using private data. For example, you can fine-tune an LLM to become better at producing Python programs by further training it on high-quality Python source code.
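As a rough sketch of what that further training can look like, the example below fine-tunes a small causal language model with the Hugging Face transformers Trainer; the base model, data file, and hyperparameters are illustrative assumptions, not recommendations.

```python
# Rough sketch of fine-tuning a small causal LM on a text file of Python source.
# Model name, file name, and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_model = "gpt2"  # stand-in for whichever base LLM you fine-tune
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model)

# "python_corpus.txt" is a placeholder file containing the Python code to train on.
dataset = load_dataset("text", data_files={"train": "python_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-llm", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```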

In contrast, you should use RAG when you are able to augment your LLM prompt with data that was not known to your LLM at the time of training, such as real-time data, personal (user) data, or context information useful for the prompt.

Challenges with RAG

One of the challenges in using a vector database is the approximate nature of similarity search - how do you know the document you retrieve is actually relevant to the prompt?

LLM Question-Answering System using RAG (feature store and vector database) as well as Fine-Tuning

In contrast, using structured data in RAG, as in the example above, is quite straightforward. In this figure, the user has logged onto the website, so we have the user’s ID. With that ID, we can retrieve the customer’s recent investments as a list of strings and prepend them to the prompt (a sketch follows below). Looking up a document in the vector database that is relevant to the user’s concern about “rising interest rates affects property” is more challenging. If you query the vector database with “rising interest rates affects”, it has a good chance of returning a relevant document. But if you include “property” in the query, it may return a document about property, which is not the context we want when trying to steer the conversation towards other investment opportunities.
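A hedged sketch of the structured-data path is shown below; get_recent_investments is a hypothetical lookup that, in a Hopsworks deployment, would typically read a feature view keyed on the user’s ID, and the investment strings are placeholder values.

```python
# Sketch of prepending a user's recent investments to the prompt.
# get_recent_investments is a hypothetical feature-store lookup returning
# placeholder values.

def get_recent_investments(user_id: str) -> list[str]:
    # In practice: read a feature view keyed on user_id from the feature store.
    return ["2024-01: REIT index fund", "2023-11: 2-year government bonds"]

def build_prompt(user_id: str, user_message: str) -> str:
    investments = "\n".join(f"- {i}" for i in get_recent_investments(user_id))
    return (
        "The customer recently made the following investments:\n"
        f"{investments}\n\n"
        f"Customer message: {user_message}\n"
        "Suggest suitable alternative investment opportunities."
    )

print(build_prompt("user-42", "I'm worried rising interest rates will affect property."))
```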
