In this Microsoft Azure OpenAI blog post we will try to explain the concept of RAG Retrieval Augmented Generation).
When working with Azure OpenAI assistant and chat completion, when we send prompts, the input is generated from a large dataset of information that is not always needed.
In many use cases, we need to limit and “ground” the LLM’s output.
A good example is where a corporate department needs to use an AI tool to provide a corporate service to employees but needs to limit the output to a limited dataset.
In cases like that, a RAG provides AI capabilities with “grounded” information that is 100% accurate and based on relevant data.
To use RAG with Azure OpenAI we need to use Azure AI search. The search capabilities index the data that can be in the form of text, images, PDF files and also databases (currently in Preview).
The AI search creates search indexes that allow an LLM model like GPT-4 to provide data intelligence.
The main idea behind RAG is to create an AI assistant that provides answers based on private data.
To create an AI service that uses RAG we need to use two different services, a GPT model and a search service like Azure AI search.
How RAG Works
RAG operates by attaching a retrieval component to a generative model. When the model receives a query:
1. Retrieval Step: The model searches a large corpus of documents to find passages that are most relevant to the query.
2. Generation Step: The generative model then uses both the query and the retrieved documents as inputs to generate a coherent and contextually relevant response.
This two-step process helps the model produce responses that are not only informed by the training data but also by the most current and relevant information available in the external text corpus.