
Topic 14: What are DoRA, QLoRA and QDoRA?

We compare three fine-tuning methods – DoRA, QLoRA, and QDoRA – each designed to improve model performance and memory efficiency in a different way.

When specialized AI models emerge, such as coding assistants or models tailored to specific domains, fine-tuning is crucial for boosting their performance on the target task. Low-Rank Adaptation (LoRA), which we previously discussed, is one such method, but it doesn't always deliver the best results. To address its limitations, advanced methods like Weight-Decomposed Low-Rank Adaptation (DoRA), QLoRA, and QDoRA have been developed, offering enhanced performance. What are these methods, and how do they improve upon LoRA? Let’s explore!

In today’s episode, we will cover:

  • What’s wrong with LoRA?

  • Here comes Weight-Decomposed Low-Rank Adaptation (DoRA)

  • How does DoRA work?

  • The benefits of DoRA

  • How good are DoRA’s results?

  • What is QLoRA, and why should you use it?

  • How does QLoRA work?

  • Results of QLoRA

  • How does QDoRA enhance QLoRA?

  • Conclusion

  • Bonus: Resources

What’s wrong with LoRA?

To tailor general models for specific tasks, we need to fine-tune them. This process usually involves retraining all the model’s parameters, but as models get bigger, this becomes more expensive and resource-intensive.

To solve this, Parameter-Efficient Fine-Tuning (PEFT) methods were developed, and one of them is LoRA (Low-Rank Adaptation). LoRA modifies fewer parameters for efficient fine-tuning while keeping the model architecture the same. However, it often doesn’t perform as well as full fine-tuning (FT) where all parameters are retrained. But why? 
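To make the parameter savings concrete, here is a minimal NumPy sketch of the LoRA idea: the pretrained weight stays frozen, and only a rank-r update B @ A is trained. The matrix sizes and initialization scale are illustrative, not taken from any particular implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, r = 64, 64, 4                      # layer dims; low rank r << min(d, k)
W = rng.standard_normal((d, k))          # frozen pretrained weight

# LoRA: learn only a rank-r update delta_W = B @ A
A = rng.standard_normal((r, k)) * 0.01   # trainable
B = np.zeros((d, r))                     # trainable; zero init so delta_W starts at 0
W_adapted = W + B @ A                    # effective weight used at inference

# Trainable parameters drop from d*k to r*(d + k)
full_params = d * k
lora_params = r * (d + k)
print(full_params, lora_params)          # 4096 vs. 512 in this toy setup
```

With B initialized to zero, the adapted weight starts out identical to the pretrained one, so fine-tuning begins from the base model's behavior.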

NVIDIA and HKUST researchers, inspired by the concept of Weight Normalization, compared LoRA and FT. Weight Normalization separates the weight matrix into two components – magnitude (how large the change is) and direction (where the change is happening in the parameter space). This separation helps better understand how each method updates the model’s weights, allowing a more detailed comparison of the flexibility and precision in adjustments made by LoRA and FT.
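The decomposition itself is simple to state in code. A sketch of the column-wise split used in Weight Normalization, with a toy matrix (the shapes here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((8, 5))    # toy weight matrix

# Column-wise decomposition: magnitude vector m, unit-norm direction matrix V
m = np.linalg.norm(W, axis=0)      # shape (5,): per-column L2 magnitude
V = W / m                          # each column of V has unit norm

# Reconstruction: W = magnitude * direction
W_rebuilt = V * m
print(np.allclose(W, W_rebuilt))                       # exact reconstruction
print(np.allclose(np.linalg.norm(V, axis=0), 1.0))     # directions are unit-length
```

Because the split is lossless, any weight update can be read off as "how much did each column's magnitude change" versus "how much did its direction rotate" – exactly the lens the researchers applied.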

They discovered that these two methods update the model in different ways:

  • LoRA updates the model proportionally, changing both magnitude and direction consistently. 

  • Full fine-tuning, on the other hand, shows more complex behavior. It can make subtle changes in direction while making larger changes in magnitude or vice versa. This flexibility allows FT to adapt more precisely to tasks. 

As LoRA lacks this flexibility, it may not always be able to make the same precise adjustments as FT. But full fine-tuning still requires retraining all parameters, which is computationally intensive.
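The comparison above can be quantified: given a pretrained weight matrix and its fine-tuned counterpart, measure the average change in column magnitude and the average rotation of column direction. A hedged sketch (the function name and exact metrics are illustrative, in the spirit of the DoRA paper's analysis, not its exact code):

```python
import numpy as np

def magnitude_direction_change(W0, W1):
    """Column-wise comparison of two weight matrices:
    delta_M = mean absolute change in column magnitude,
    delta_D = mean (1 - cosine similarity) between column directions."""
    m0 = np.linalg.norm(W0, axis=0)
    m1 = np.linalg.norm(W1, axis=0)
    delta_M = np.mean(np.abs(m1 - m0))
    cos = np.sum((W0 / m0) * (W1 / m1), axis=0)
    delta_D = np.mean(1.0 - cos)
    return delta_M, delta_D

rng = np.random.default_rng(2)
W0 = rng.standard_normal((16, 8))

# A pure rescaling changes magnitude but leaves direction untouched:
dM, dD = magnitude_direction_change(W0, 2.0 * W0)
print(dM, dD)  # dM > 0, dD ~ 0
```

Plotting delta_D against delta_M across layers and training steps is what revealed the pattern: LoRA's points fall roughly on a line (proportional updates), while full fine-tuning's scatter more freely.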

Here comes Weight-Decomposed Low-Rank Adaptation (DoRA)

The rest of this article, with detailed explanations and a library of relevant resources, is available to our Premium users only.
