
What is retrieval augmented generation (RAG) [examples included]


Artificial intelligence has come a long way, especially with the introduction of large language models (LLMs). These models have opened up new possibilities in natural language processing, powering tools from automatic content creators to chatbots. But as good as they are, these models often struggle with accuracy and relevance, which can be a big problem when details matter.

That's where retrieval-augmented generation (RAG) comes in. RAG builds on existing models like GPT by adding the ability to pull in information from external sources like databases and articles during the text generation process. This means the AI isn't just guessing; it's using real, reliable information to form its responses.

This approach is changing the game by bridging the gap between generating text and using real-world knowledge. In this article, we'll take a closer look at how RAG works, its applications, and how it could improve our interactions with AI systems, making them more useful and reliable.

What is retrieval augmented generation (RAG)?

Retrieval augmented generation (RAG) combines the advanced text-generation capabilities of GPT and other large language models with information retrieval functions to provide precise and contextually relevant information. This approach improves language models' ability to understand and process user queries by integrating the latest and most relevant data.

Large language models are remarkably good at many natural language processing tasks. Sometimes their output is accurate, to the point, and exactly what the user needs. But quite often, it isn't.

You've probably asked ChatGPT a question and felt that something was off about the answer, no matter how confident the model sounded. Then you checked the information yourself and found that GPT had actually "lied." This behavior of large language models is called hallucination. Let's think about why it happens.

General-purpose language models are pre-trained on vast amounts of data from everywhere. But that doesn't mean the model knows the answer to every question. General LLMs fall short when a task calls for up-to-date information, domain-specific context, or fact-checking. That's why they're called general-purpose, and why they need supporting techniques to become more versatile.


In 2020, Meta researchers published a paper introducing one such "assisting" technique – retrieval augmented generation (RAG). At its core, RAG is an innovative technique that merges the capabilities of natural language generation (NLG) and information retrieval (IR).

The fundamental idea behind RAG is to bridge the gap between the vast knowledge in general-purpose language models and the need for precise, contextually accurate, and up-to-date information. While general LLMs are powerful, they are not infallible, especially in scenarios that demand real-time data, domain-specific expertise, or fact-checking.

How does retrieval augmented generation (RAG) work?

RAG is about feeding language models the information they need. Instead of querying the LLM directly (as with a general-purpose model), we first retrieve accurate data from a well-maintained knowledge library and then use that context to generate the answer. When the user sends a query (question), the retriever uses vector embeddings (numerical representations) to find the most relevant documents. Once the needed information is pulled from the vector database, it is passed to the language model along with the question, and the grounded answer is returned to the user. This largely reduces the possibility of hallucinations and lets us update the model's knowledge without retraining it, which is a costly process. Here's a very simple diagram that shows the process.

[Figure: RAG system components]

RAG brings together four key components:

  • Embedding model: This is where documents are turned into vectors, or numerical representations, which make it easier for the system to manage and compare large amounts of text data.
  • Retriever: Think of this as the search engine within RAG. It uses the embedding model to process a question and fetch the most relevant document vectors that match the query.
  • Reranker (optional): This component takes things a step further by evaluating the retrieved documents to determine how relevant they are to the question at hand, providing a relevance score for each one.
  • Language model: Finally, this part of the system takes the top documents provided by the retriever or reranker, along with the original question, and crafts a precise answer.
[Figure: How RAG works]
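
To make the flow concrete, here is a minimal sketch of these components working together in Python. The bag-of-words "embedding," the in-memory index, the example documents, and the llm() placeholder are all assumptions made for illustration; a real system would use a proper embedding model, a vector database, and an actual LLM call.

```python
# Minimal RAG sketch: embed documents, retrieve the closest one, and build an
# augmented prompt for a language model. embed() and llm() are deliberately
# simplistic stand-ins for a real embedding model and a real LLM API call.
import math
from collections import Counter

def embed(text):
    # Toy "embedding": a bag-of-words count vector. A real system would call
    # an embedding model here.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# 1. Embedding model: turn the knowledge library into vectors once, up front.
documents = [
    "Refunds are issued within 14 days of purchase.",
    "Spark jobs can be deployed on a cluster for large-scale data processing.",
]
index = [(doc, embed(doc)) for doc in documents]

# 2. Retriever: fetch the document vectors most similar to the query.
def retrieve(query, k=1):
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

# 3. An optional reranker would re-score the retrieved documents here.

# 4. Language model: answer using the retrieved context. llm() is a placeholder
#    for a call to whatever model you actually use.
def llm(prompt):
    return "[the LLM's answer, grounded in the prompt below]\n" + prompt

def answer(query):
    context = "\n".join(retrieve(query))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return llm(prompt)

print(answer("How long do refunds take?"))
```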

In practical terms, RAG is especially valuable in applications that require up-to-date and contextually accurate content. It bridges the gap between general language models and external knowledge sources, paving the way for improved content generation, question-answering, personalized recommendations, and more.

RAG use cases

Retrieval augmented generation, the new hot topic in large language models, is used in many LLM applications. Let's look at a few cases covered in SuperAnnotate's webinar with Databricks.


Databricks LLM-powered chatbot

During the webinar, we explored how Databricks is pioneering the use of large language models (LLMs) in creating advanced documentation chatbots. These bots are designed to simplify the search for information by providing direct access to relevant documents.

Intelligent document retrieval

The chatbot serves as a dynamic assistant, offering immediate responses to user queries about various features, such as deploying Spark for data processing. With RAG, the chatbot efficiently pulls the appropriate document from the Spark knowledge repository in response to a question. This strategy ensures that users obtain accurate and pertinent documentation, facilitating an effective and user-friendly learning experience.

Personalized user experience with enhanced language models

Databricks' use case extends into personalized information retrieval, harnessing the full potential of LLMs. By doing so, the system not only provides general documentation but also adapts its responses to fit the user's specific needs, paving the way for a revolution in user support interactions.

Evaluating the effectiveness of LLMs

The main challenge discussed during the webinar was evaluating LLM effectiveness. Assessing these models is difficult due to the subjective nature of testing and the diverse range of user experiences. Despite these challenges, it remains crucial to maintain consistent and standardized evaluation practices. Collecting comprehensive feedback from customer interactions is essential to refine and validate the model's performance – and SuperAnnotate helped Databricks achieve this.

[Figure: LLM evaluation]

SuperAnnotate's role in streamlining RAG evaluations

The collaboration between Databricks and SuperAnnotate has introduced a fresh angle to LLM evaluation. SuperAnnotate assists Databricks in standardization – cutting down the time and costs associated with LLM evaluations.

Deploying LLMs as initial evaluators delegates routine judgment tasks to generative AI, reserving more complex decision-making for human experts. Instead of humans annotating data to evaluate LLM results, AI handles that first pass. This relatively new approach is called reinforcement learning from AI feedback (RLAIF), an alternative to the well-known reinforcement learning from human feedback (RLHF). It promotes a more effective distribution of tasks, ensuring that human intellect is applied to the more complex and nuanced cases, and underscores a strategic collaboration where generative AI and human expertise work together to achieve superior evaluation standards across LLM use cases.
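
As a rough illustration of the "LLM as first-pass evaluator" idea (not Databricks' or SuperAnnotate's actual pipeline), the sketch below scores an answer with a judge prompt and escalates weak cases to a human reviewer. The prompt wording, the score threshold, and the call_llm() stub are assumptions made for the example.

```python
# Sketch of an LLM acting as a first-pass evaluator: the model scores each
# answer, and only low-scoring cases are escalated to human reviewers.
# call_llm() is a hypothetical stub for whatever LLM API you use.

JUDGE_PROMPT = (
    "Rate how well the answer addresses the question, from 1 (poor) to 5 (excellent).\n"
    "Question: {question}\nAnswer: {answer}\nRespond with a single integer."
)

def call_llm(prompt):
    # Placeholder: a real system would send the prompt to an LLM here.
    return "4"

def triage(question, answer, threshold=3):
    score = int(call_llm(JUDGE_PROMPT.format(question=question, answer=answer)))
    needs_human = score < threshold  # weak answers go to human experts
    return score, needs_human

score, needs_human = triage("What is RAG?", "RAG combines retrieval with generation.")
print(score, "needs human review" if needs_human else "auto-accepted")
```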

Agentic RAG

Agentic AI and LLM agents have been around for a few months, and the idea is simple: AI systems that don’t just give answers but actively assist with tasks, adapt to new information, and work independently when needed. The challenge, though, is making these systems reliable and up-to-date, especially in areas where the information changes all the time.

RAG was one of the first approaches to tackle this problem. It combines two things: the ability to fetch real-time, relevant data (retrieval) and the power to generate responses using that data (generation). As soon as agentic AI became a focus, RAG quickly stood out as a natural fit. It gave AI systems the ability to stay current and respond with information that made sense for the situation.

What are RAG agents?

RAG agents are AI tools designed to do more than retrieve and generate; they're built to carry out specific tasks. Think of them as goal-oriented assistants that know where to find the information they need and how to use it. Instead of generic answers, they're tailored for real-world situations.

For example:

  • A RAG agent in customer support doesn’t only tell you the refund policy; it finds the exact details for your specific order.
  • In healthcare, a RAG agent doesn’t only summarize medical studies; it pulls the most relevant research based on a patient’s case.

So, while a plain LLM-based RAG setup only answers questions, RAG agents fit into workflows and make decisions based on fresh, relevant data.
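
For intuition, here is a stripped-down sketch of the customer-support example above: the agent retrieves order-specific data and acts on it instead of quoting a generic policy. The order records, helper names, and decision logic are invented for this example, and the LLM call that would normally phrase the final reply is omitted for brevity.

```python
# Illustrative customer-support RAG "agent": retrieve order-specific data,
# then decide what to do with it. All data and helpers here are hypothetical.

ORDERS = {"A1001": {"status": "delivered", "refund_eligible": True}}

def retrieve_order(order_id):
    # Retrieval step: in a real agent this might hit a vector store,
    # a database, or an internal API.
    return ORDERS.get(order_id)

def refund_agent(order_id):
    order = retrieve_order(order_id)
    if order is None:
        return f"I couldn't find order {order_id}; could you double-check the ID?"
    # Decision step: use the retrieved, order-specific data to choose an action.
    if order["refund_eligible"]:
        return f"Order {order_id} qualifies for a refund, so I've started the process."
    return f"Order {order_id} isn't eligible for a refund under the current policy."

print(refund_agent("A1001"))
```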

RAG agent frameworks

Some well-known agentic RAG frameworks include DB GPT, Quadrant Rag Eval, MetaGPT, Ragapp, GPT RAG by Azure, IBM Granite, Langflow, and AgentGPT; their GitHub popularity is shown below.

[Figure: RAG agent frameworks]

RAG vs fine-tuning

Retrieval augmented generation and LLM fine-tuning, although they share similar objectives, are two different techniques for optimizing large language model performance. Let's discuss the differences and understand when to choose RAG vs. fine-tuning.

The chart below gives a detailed comparison between RAG and fine-tuning based on different criteria.

[Figure: RAG vs. fine-tuning comparison]

Fine-tuning involves additional training stages that expose a large language model to new datasets, refining its performance for particular functions or knowledge areas. This specificity means that while the model becomes more adept in certain scenarios, it may not maintain its effectiveness across unrelated tasks.

In contrast, RAG empowers LLMs and LLM agents by dynamically enriching them with updated, relevant information from external databases. This boosts the model's ability to answer questions with timely, pertinent, and context-aware responses. While this sounds appealing, there is a trade-off: increased computational demands and possibly longer response times due to the added complexity of integrating fresh information.

One particular advantage RAG has over fine-tuning lies in information management. Traditional fine-tuning embeds data into the model's architecture, essentially 'hardwiring' the knowledge, which prevents easy modification. On the other hand, vector storage used in RAG systems permits continuous updates, including the removal or revision of data, ensuring the model remains current and accurate.
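
To make that contrast concrete, here is a toy illustration of why RAG's knowledge is easy to revise: updating what the system "knows" is just an insert, overwrite, or delete in the index, whereas a fine-tuned model would need another training run. The in-memory store and the pricing documents below are stand-ins invented for the example.

```python
# Toy in-memory "vector store": revising what the system knows is just an
# insert, overwrite, or delete; no retraining run is needed. A real deployment
# would use an actual vector database with proper embeddings.

vector_store = {}  # doc_id -> document text (embeddings omitted for brevity)

def upsert(doc_id, text):
    # Add a new document or revise an existing one in place.
    vector_store[doc_id] = text

def delete(doc_id):
    # Remove stale or incorrect knowledge entirely.
    vector_store.pop(doc_id, None)

upsert("pricing", "The Pro plan costs $20/month.")   # initial knowledge
upsert("pricing", "The Pro plan costs $25/month.")   # price changed: overwrite it
delete("pricing")                                    # product retired: remove it
```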

It's worth mentioning that RAG and fine-tuning can also be used together to improve LLM performance. In particular, if a component of a RAG system underperforms, fine-tuning can be used to address that weakness. This is especially useful when you want your model to excel at a specific task.

RAG vs semantic search

Another technique used to enhance large language model performance is semantic search. Unlike traditional search methods that rely heavily on keyword matching, semantic search delves into the contextual meaning of the terms used in a query, offering a more nuanced and precise retrieval of information.

Let's consider the limitations of basic search functionality using an everyday scenario. If someone uses a generative AI system to find information about apple cultivation areas, the system might typically look for instances where the words "apple" and "cultivation" appear in its database. Because keyword search matches words literally rather than by meaning, this could lead to a mixture of relevant and irrelevant results, like documents about Apple products or cultivation practices unrelated to apples. Plus, it might overlook articles on regions known for apple farming if they don't include the exact phrase the user searched for.

Semantic search improves upon this by grasping the essence behind a user's inquiry. It understands that the user is interested in locations where apples grow, not in general agricultural methods or the company Apple. By interpreting the query's intent and the contextual meaning within source material, semantic search can accurately pinpoint information that matches the user's actual needs. In the context of RAG, semantic search acts as a sophisticated lens, focusing the LLM's broad capabilities on finding and utilizing the most relevant data to answer a question. It filters information through a layer of comprehension, ensuring that the AI system's generative responses aren’t only accurate but also contextually grounded and informative.
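
A small sketch of the difference, assuming the sentence-transformers library (any embedding model would do): keyword matching scores documents by shared words and favors the Apple-the-company text, while semantic search compares meaning through embeddings and should favor the orchard text. The example documents and model name are illustrative choices, not requirements.

```python
# Keyword matching vs. semantic matching for the "apple cultivation" query.
# Requires the sentence-transformers package.
from sentence_transformers import SentenceTransformer, util

query = "apple cultivation areas"
docs = [
    "Apple opened new retail areas for its latest product launch.",        # irrelevant
    "Orchards in Washington and Poland grow most of the world's apples.",  # relevant
]

# Keyword view: count shared words. The Apple-the-company document wins here
# because it literally contains "apple" and "areas".
def keyword_score(q, doc):
    return len(set(q.lower().split()) & set(doc.lower().split()))

print([keyword_score(query, d) for d in docs])  # e.g. [2, 0]

# Semantic view: compare meaning via embeddings. The orchard document should
# now score higher even though it never uses the exact query words.
model = SentenceTransformer("all-MiniLM-L6-v2")
scores = util.cos_sim(model.encode([query]), model.encode(docs))
print(scores)
```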

RAG business value

It's no secret that most enterprises today are considering integrating language models into their business operations. Retrieval augmented generation is changing the way businesses handle information and customer queries. By integrating the retrieval of specific information with the generative capabilities of language models, RAG provides precise, context-rich answers to complex questions. This integration brings value to businesses in several ways.

Accurate information: RAG ensures a high degree of accuracy in responses. Since the system first retrieves information from a reliable database before generating an answer, it minimizes the risk of providing incorrect or irrelevant information. This can be particularly beneficial for customer service platforms, where accurate information is crucial for maintaining customer trust and satisfaction.

Resource efficiency: RAG enhances the efficiency of information retrieval, saving time for both employees and customers. Instead of sifting through databases or documents manually, users receive instant access to the information they need. This rapid delivery of knowledge not only improves the user experience but also frees up employee time for other critical tasks.

Knowledge efficiency: RAG matches responses with the most up-to-date information and relevant documentation, so businesses can maintain a high standard of information dissemination. This is vital in fields like tech and finance, where outdated information can lead to significant errors or compliance issues.

Wrapping up

The collaboration of vast language models like GPT with retrieval techniques represents a significant stride toward more intelligent, aware, and helpful generative AI. With RAG, we're dealing with a system that understands context, digs out relevant, up-to-date information, and presents it in a cohesive manner. Being one of the most significant and promising techniques for making LLMs more efficient, the practical uses of RAG are just beginning to be tapped into, with future developments set to enhance its applications even further.
