Topic 7: What is the LongRAG framework?

We discuss the limitations of traditional RAG in the era of long-context LLMs, explore the intuition behind the LongRAG framework that addresses them, and share a list of resources for further learning.

Retrieval-Augmented Generation (RAG) has become increasingly popular in recent years as a way to enhance large language models (LLMs) with external knowledge.

We have covered RAG-related topics in several previous articles, which are, so far, among our most-read; this demonstrates the ongoing interest in the topic.

However, traditional RAG systems, developed around 2020, were designed when LLMs were severely limited in their ability to handle long contexts. This led to a design in which retrievers worked with short text units, typically 100-word Wikipedia paragraphs, and had to search through massive corpora to find relevant information.
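To get a feel for the scale involved, here is a rough back-of-envelope sketch. The ~21M-passage figure is the standard DPR-style Wikipedia split used by early RAG retrievers; the 4K-token unit size is the kind of long retrieval unit LongRAG (introduced below) proposes; the words-per-token ratio is an assumed heuristic, so treat the output as illustrative only.

```python
# Back-of-envelope: how retrieval-unit size changes the corpus the
# retriever must search. All figures are approximate/illustrative.
short_units = 21_000_000         # ~100-word Wikipedia passages (DPR-style corpus)
words_per_short_unit = 100
tokens_per_long_unit = 4_000     # a long retrieval unit, LongRAG-style
words_per_token = 0.75           # rough English words-per-token heuristic (assumption)

total_words = short_units * words_per_short_unit
long_units = total_words / (tokens_per_long_unit * words_per_token)
print(f"{long_units:,.0f} long units vs. {short_units:,} short ones")
# -> ~700,000 long units: a corpus roughly 30x smaller to search
```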

The landscape of language models has changed dramatically since then. In 2023 and 2024, we've seen the emergence of LLMs capable of handling much longer contexts, with some models able to process up to 128,000 tokens or even 1 million tokens (as with Google's Gemini 1.5 Pro). This significant increase in context length capabilities has opened up new possibilities for RAG systems.

Enter the LongRAG framework. By revisiting the fundamental design choices of RAG in light of these advancements, LongRAG offers a promising direction for boosting RAG performance with long-context LLMs. Let’s dive in!

In today’s episode, we will cover:

  • Original RAG and its working process

  • Intuition behind LongRAG

  • How LongRAG works: the architecture

  • LongRAG advantages

  • Bonus: Resources

Original RAG and its working process

RAG enables LLMs to work with previously unseen data without requiring fine-tuning. Additionally, knowledge in natural-language form can be offloaded entirely from the LLM's parametric memory to an external corpus, accessed through a separate retrieval component.

RAG working process (a minimal code sketch follows the list):

  • Query encoder: It encodes a user query into a numerical representation suitable for searching through a database of text passages or documents.

  • Retriever: It searches an external database of indexed documents using the vector produced by the query encoder. The retriever identifies the top-K most relevant documents based on the selected search algorithm.

  • Generator: The large language model conditions on the documents selected by the retriever and the input query to generate the output.
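Putting the three components together, here is a minimal, self-contained sketch of the pipeline. The bag-of-words encoder and the toy corpus are stand-ins for illustration; a real system uses a trained dense bi-encoder (e.g., DPR) with an approximate nearest-neighbor index, and sends the final prompt to an LLM.

```python
import math
from collections import Counter

def encode(text: str) -> Counter:
    """Toy bag-of-words encoder. Real RAG systems use a trained
    bi-encoder (e.g., DPR) that maps text to dense embeddings."""
    return Counter(text.lower().split())

def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * v[tok] for tok, count in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Retriever: return the top-K documents most similar to the query."""
    q = encode(query)
    return sorted(corpus, key=lambda doc: cosine(q, encode(doc)), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Generator input: the LLM conditions on retrieved docs + the query."""
    context = "\n".join(f"- {d}" for d in docs)
    return f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

corpus = [
    "Traditional RAG retrieves short 100-word passages from a massive corpus.",
    "LongRAG groups documents into long retrieval units for long-context LLMs.",
    "Paris is the capital of France.",
]
query = "How does traditional RAG retrieve information?"
prompt = build_prompt(query, retrieve(query, corpus))
print(prompt)  # in a real pipeline, this prompt is sent to the LLM
```

The design choice LongRAG revisits lives in the `corpus` variable: traditional RAG fills it with millions of short passages, whereas LongRAG fills it with far fewer, much longer retrieval units.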

Intuition behind LongRAG

The rest of this article, with detailed explanations and a curated library of relevant resources, is available to our Premium users only →

 Thank you for reading! Share this article with three friends and get a 1-month subscription free! 🤍
