

RAG Framework

Large Language Models (LLMs) are trained on vast volumes of data and use billions of parameters to generate original output for tasks like answering questions, translating languages, and completing sentences.

Problem Statement

However, LLMs have a knowledge constraint: their knowledge extends only up to their training cut-off date, after which they have no new information. Consequently, LLMs cannot make use of the latest information. In addition, the training corpus of an LLM does not contain private, non-public knowledge. Therefore, LLMs cannot answer questions specific to an enterprise's proprietary data.

RAG extends the already powerful capabilities of LLMs to specific domains or an organization's internal knowledge base, all without the need to retrain the model.

Why was RAG needed?

Let's say our goal is to build bots that can answer user questions in various contexts by cross-referencing authoritative knowledge sources. Unfortunately, the nature of LLM technology introduces unpredictability into LLM responses. Additionally, LLM training data is static, which imposes a cut-off date on the model's knowledge.

You can think of the Large Language Model as an over-enthusiastic new employee who refuses to stay informed about current events but will always answer every question with absolute confidence. Unfortunately, such an attitude can negatively impact user trust and is not something you want your chatbots to emulate!

RAG is one approach to solving some of these challenges. It grounds the LLM's responses in relevant information retrieved from authoritative, predetermined knowledge sources.

What is RAG?

RAG is a method that combines additional data with a language model's input to improve its output, while leaving the user's original question intact.

This supplemental data can come from an organization’s database or an external, updated source.

The language model then processes the merged information and incorporates factual data from the knowledge base into its response. This technique is particularly useful when the latest data must be integrated into the model's answers.
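As a concrete illustration, here is a minimal sketch of how retrieved passages might be merged with a question into one model input. The `build_prompt` helper and the passages are hypothetical, not part of any particular library; real RAG systems assemble prompts in many ways.

```python
def build_prompt(question: str, passages: list[str]) -> str:
    """Merge retrieved passages with the user's question into one prompt.

    Hypothetical helper for illustration only.
    """
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Example usage with made-up retrieved passages
print(build_prompt(
    "What is our refund policy?",
    ["Refunds are issued within 14 days of purchase.",
     "Digital goods are non-refundable once downloaded."],
))
```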

Benefits of RAG

  • User Trust: RAG allows the LLM to present accurate information with source attribution. The output can include citations or references to sources. Users can also look up source documents themselves if they require further clarification or more detail. This can increase trust and confidence in your generative AI solution.

  • Latest information: RAG allows developers to provide the latest research, statistics, or news to the generative models. They can use RAG to connect the LLM directly to live social media feeds, news sites, or other frequently updated information sources. The LLM can then provide the latest information to the users.

  • More control over output: With RAG, developers can test and improve their chat applications more efficiently. They can control and change the LLM's information sources to adapt to changing requirements or cross-functional usage. Developers can also restrict retrieval of sensitive information to appropriate authorization levels and ensure the LLM generates appropriate responses.

RAG Steps

  • Documents from the knowledge base are converted into embedding vectors using an embedding model
  • These embeddings are saved in a vector database
  • At query time, the user input is embedded with the same model, and the vector database runs a similarity search to find the related content
  • The question plus the retrieved context form the final prompt, which is sent to the LLM (see the sketch below)
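Below is a minimal, self-contained sketch of these steps in Python. The tiny `embed` function is a stand-in assumption (a real system would use a trained embedding model), retrieval is brute-force cosine similarity over an in-memory array rather than a real vector database, and the final LLM call is left to the reader.

```python
import numpy as np

# Stand-in embedding model (assumption): hashes words into a fixed-size
# unit vector. A real pipeline would use a trained embedding model.
def embed(text: str, dim: int = 64) -> np.ndarray:
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# Steps 1-2: embed the knowledge-base documents and keep them as an index
documents = [
    "The refund window is 14 days from the date of purchase.",
    "Support is available Monday to Friday, 9am to 5pm.",
    "Enterprise plans include a dedicated account manager.",
]
doc_vectors = np.stack([embed(d) for d in documents])

# Step 3: embed the user question and run a similarity search
question = "How long do I have to request a refund?"
scores = doc_vectors @ embed(question)  # cosine similarity (unit vectors)
top_doc = documents[int(np.argmax(scores))]

# Step 4: question + retrieved context become the final prompt for the LLM
prompt = f"Context: {top_doc}\n\nQuestion: {question}\nAnswer:"
print(prompt)  # pass this prompt to an LLM of your choice
```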

Vector Databases

A vector database is specifically designed to operate on embedding vectors. As the popularity of LLMs and generative AI has grown recently, so has the use of embeddings to encode unstructured data. Vector databases have emerged as an effective solution for enterprises to deliver and scale these use cases.

What is a Vector DB?

Vector databases are specialized databases that store data as high-dimensional vectors and their original content. They offer the capabilities of both vector indexes and traditional databases, such as optimized storage, scalability, flexibility, and query language support. They allow users to find and retrieve similar or relevant data based on their semantic or contextual meaning.

Vector databases can help RAG models quickly find the most similar documents or passages to a given query and use them as additional context for the LLM.
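As an illustration, here is a minimal sketch of storing and querying documents with one open-source vector database, Chroma (assuming the chromadb package is installed; the collection name and documents are made up):

```python
import chromadb

# In-memory client; persistent clients are also available
client = chromadb.Client()
collection = client.create_collection(name="knowledge_base")

# The database embeds these documents with its default embedding model
collection.add(
    ids=["doc1", "doc2"],
    documents=[
        "The refund window is 14 days from purchase.",
        "Support hours are 9am to 5pm on weekdays.",
    ],
)

# Similarity search: find the document closest to the query's embedding
results = collection.query(
    query_texts=["How long do I have to get a refund?"],
    n_results=1,
)
print(results["documents"][0][0])  # most similar passage, used as LLM context
```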

Vector Index

A vector index is a data structure in a vector database designed to make processing more efficient, and it is particularly suited to the high-dimensional vector data encountered with LLMs. Its function is to streamline search and retrieval within the database.

By implementing a vector index, the system can conduct quick similarity searches, identifying the vectors that most closely match a given input vector.

How do you build a vector index from embeddings?

To build vector indexes for your embeddings, there are many options, such as exact or approximate nearest-neighbor algorithms (e.g., HNSW or IVF), different distance metrics (e.g., cosine or Euclidean), and various compression techniques (e.g., quantization or pruning).
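To make the exact-versus-approximate distinction concrete, here is a sketch using the open-source FAISS library (assuming faiss-cpu and numpy are installed); the dimensionality and the random data are made up for illustration.

```python
import numpy as np
import faiss  # pip install faiss-cpu

d = 64                                   # embedding dimensionality (made up)
xb = np.random.random((10_000, d)).astype("float32")  # database vectors
xq = np.random.random((5, d)).astype("float32")       # query vectors

# Exact nearest-neighbor search: brute-force L2 over all vectors
flat = faiss.IndexFlatL2(d)
flat.add(xb)
D_exact, I_exact = flat.search(xq, 3)

# Approximate search: IVF partitions vectors into nlist clusters and
# probes only a few of them per query (faster, slightly less accurate)
nlist = 100
quantizer = faiss.IndexFlatL2(d)
ivf = faiss.IndexIVFFlat(quantizer, d, nlist)
ivf.train(xb)                            # learn the cluster centroids
ivf.add(xb)
ivf.nprobe = 8                           # clusters inspected per query
D_approx, I_approx = ivf.search(xq, 3)

print(I_exact[0], I_approx[0])           # neighbor ids from each index
```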

A vector search is used to find the most relevant documents or passages to the query based on the similarity between the query vector and the document vectors in the index.

Similarity measures are mathematical methods that compare two vectors and compute a distance value between them. This distance indicates how similar or dissimilar the two vectors are in terms of their semantic meaning.
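For example, the two most common measures can be computed directly with numpy (a small illustrative sketch with made-up vectors):

```python
import numpy as np

a = np.array([0.2, 0.9, 0.1])
b = np.array([0.3, 0.8, 0.0])

# Cosine similarity: 1.0 means identical direction, 0.0 means orthogonal
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Euclidean distance: 0.0 means identical vectors; larger means less similar
euclidean = np.linalg.norm(a - b)

print(f"cosine similarity: {cosine:.3f}, euclidean distance: {euclidean:.3f}")
```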
