Turing Post
Topic 22: What is HtmlRAG, Multimodal RAG and Agentic RAG?

We explore in detail three RAG methods that address the limitations of the original RAG and align with the trends of the new year

Alyona Vert.
January 08, 2025

Retrieval-Augmented Generation (RAG) is a topic that never gets old and keeps expanding to enhance LLM functionality. For those who are not so familiar with RAG: this method empowers models with external knowledge by retrieving the information you actually need from external sources. Today, we’ll dive into three approaches that go further than traditional RAG and overcome issues such as the quality of retrieved data, the accuracy of answers, and poor performance in specific domains. As multimodality and agentic systems are among the main AI focuses of 2025, we’ll explore the following types of RAG: 1) HtmlRAG, which works directly with the HTML version of text; 2) Multimodal RAG, which can retrieve image information; and 3) Agentic RAG, which incorporates agentic capabilities into the RAG technique. So let’s explore!

In today’s episode, we will cover:

Traditional RAG limitations
What is HtmlRAG?
- The core idea behind HtmlRAG
- How does HtmlRAG work?
- Limitations
What is Multimodal RAG?
- The main idea of Multimodal RAG
- How does Multimodal RAG work?
- What about Multimodal RAG performance?
What is Agentic RAG?
- What does Agentic RAG address?
- How does Agentic RAG work?
- What is good about Agentic RAG?
- Limitations
Conclusion
Bonus: Resources to dive deeper

Traditional RAG limitations

RAG systems combine a retrieval mechanism with a generative AI model to provide more accurate and contextually relevant responses. However, like any technique, RAG has several limitations, such as:

Dependency on the quality of retrieved information: The effectiveness of the response relies heavily on the quality, relevance and bias of the documents retrieved. If the retrieval step fails, the generated output could be incorrect.
Standard RAG can’t retrieve information that is not plain text, such as HTML content, images, and videos.
Mismatch between retrieval and query: The system may fail to align the user's query with the right context in the retrieved documents.
Standard RAG has difficulty searching across and retrieving from multiple sources, or working with complex document structures.
Scalability and latency issues: Searching through a large knowledge base can introduce latency, especially if the retrieval system isn't optimized.
RAG systems might underperform in highly specialized domains where context and nuance are critical.
Computational resources: Handling large datasets for retrieval can be computationally expensive, requiring significant storage and processing power.
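
To make the dependency on retrieval quality and the query mismatch more concrete, here is a minimal sketch of the retrieve-then-generate flow. It is an illustration only, not how any production RAG system is built: the bag-of-words scoring, the tiny DOCS list, and the helper names (tokenize, cosine, retrieve, build_prompt) are all assumptions made for this example, and the final call to an LLM is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, top_k=2):
    """Return up to top_k documents that share words with the query, best first."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for score, d in scored[:top_k] if score > 0]


def build_prompt(query, passages):
    """Assemble the augmented prompt that would be passed to a generative model."""
    context = "\n".join(f"- {p}" for p in passages) or "- (nothing retrieved)"
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Hypothetical toy knowledge base, paraphrasing the intro of this article.
DOCS = [
    "HtmlRAG works directly with the HTML version of text.",
    "Multimodal RAG can retrieve image information.",
    "Agentic RAG incorporates agentic capabilities in the RAG technique.",
]

if __name__ == "__main__":
    question = "Which RAG variant can retrieve images?"
    print(build_prompt(question, retrieve(question, DOCS, top_k=1)))
```

Everything the generator sees comes from retrieve, so a query whose wording does not overlap with the knowledge base returns nothing and the model is left without grounding. Even in this toy version, the word "images" in the question does not match "image" in the document, which is a small example of the query and retrieval mismatch described above.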

Researchers have developed various upgraded RAG systems and methods to overcome these issues. The types of RAG we are going to discuss mostly address the quality and diversity of retrieved information and the mismatch between the query and the retrieved context. Ladies and gentlemen, meet HtmlRAG, Multimodal RAG, and Agentic RAG.

What is HtmlRAG?

You can read this article for free on our page on Hugging Face. Follow us there ;)
