Can Less Be More? Exploring PEFT for LLMs
#10 Advanced Generative AI: Introduction to Parameter-Efficient Fine-Tuning
Introduction
The recent advent of open-source large language models such as Llama 2, Gemma, and BLOOM has democratized access to a wealth of knowledge, unlocking immense potential to create AI-powered applications. This development has piqued the interest of companies and individuals keen to train or fine-tune these large language models (LLMs) for their specific use cases. However, this endeavour comes with its own set of challenges. Training these models typically demands substantial computational resources and vast amounts of data to achieve high performance, which can be a significant obstacle for many.
In response to this challenge, specialized training techniques have been devised to empower individuals and organizations to fine-tune and run inference on LLMs using local machines, at minimal or no cost. These techniques are collectively known as Parameter-Efficient Fine-Tuning (PEFT) techniques. In this blog, we will delve deeper into the intricacies of PEFT and explore how these techniques are revolutionizing the way we interact with large language models, making them more accessible and feasible for a wide range of users.
Motivation behind PEFT
Large language models, which consist of billions of parameters, are pre-trained on vast datasets to detect and learn complex patterns. Adapting such a model to a new task, conventionally known as full fine-tuning, involves retraining the entire neural network on a new dataset. This approach presents several challenges, including the substantial computational resources required for training and the time it takes to complete the process. Additionally, there is a risk of catastrophic forgetting, where the neural network may lose the patterns it learned during previous training.
Catastrophic forgetting occurs when a model loses its ability to perform previously learned tasks as it adapts to new ones. Specifically, it happens when a model's weights, optimized for earlier tasks, are substantially overwritten during the training process for new tasks, resulting in a decline in the model's performance on the old tasks.
The motivation behind PEFT is to mitigate the challenges associated with traditional fine-tuning by focusing on adjusting only a small subset of the pre-trained model's parameters.
What is PEFT?
Parameter-efficient fine-tuning (PEFT) techniques have been developed to address these issues. PEFT is a set of specialized techniques designed to perform training and inference on large language models while consuming significantly fewer resources. The rationale is that most of the pre-trained LLM's knowledge about language and the real world is already captured in the pre-trained parameters. Therefore, PEFT works on modifying a small subset of parameters that are specific to the new task and dataset, making the fine-tuning process more efficient and less prone to catastrophic forgetting.
PEFT offers several advantages over traditional fine-tuning, making it a more efficient and versatile approach:
Reduced Training Burden: PEFT significantly lowers the computational cost of training. Compared to traditional methods, it requires less data and fewer resources, making it more accessible for various applications.
Preserving Knowledge: PEFT tackles the challenge of catastrophic forgetting. By freezing most of the pre-trained LLM parameters, it ensures the model retains its general knowledge base while learning a new specific task.
Adaptability with Limited Data: Even with small datasets for a specific task, PEFT can be surprisingly effective. This is because the model leverages the strong foundation of knowledge already learned during pre-training.
Fast Switching Between Tasks: Unlike traditional fine-tuning, PEFT only fine-tunes a small set of parameters for each task. These "PEFT weights" are separate and easily swapped. This allows you to train PEFT models for different tasks independently. Then, you can switch functionalities on the same pre-trained LLM by swapping the relevant PEFT weights – a much faster and more efficient process.
Overall, PEFT makes LLMs more accessible and efficient by reducing training costs, overcoming data limitations, and enabling smooth switching between tasks.
Types of PEFT Techniques
There's no one-size-fits-all approach to PEFT. Different techniques are suited for different tasks and data situations. Here's a breakdown of popular techniques available:
Adapter Modules: These are small neural network layers inserted between the layers of a pre-trained model. During fine-tuning, only the adapter parameters are updated, while the original model parameters remain unchanged. Adapters reduce the number of parameters that need to be trained, making the process more efficient.
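To make the idea concrete, here is a minimal NumPy sketch of a single adapter: a down-projection, a nonlinearity, and an up-projection wrapped in a residual connection. The dimensions and zero initialization of the up-projection are illustrative assumptions, not taken from any particular paper or library.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, bottleneck = 16, 4  # hypothetical hidden and bottleneck sizes

# Stand-in for the output of a frozen pre-trained transformer sub-layer.
hidden = rng.standard_normal((2, d_model))

# Adapter parameters: the ONLY weights that would be trained.
W_down = rng.standard_normal((d_model, bottleneck)) * 0.01
W_up = np.zeros((bottleneck, d_model))  # zero init: adapter starts as identity

def adapter(h):
    # Residual connection around a small ReLU bottleneck.
    return h + np.maximum(h @ W_down, 0.0) @ W_up

out = adapter(hidden)

trainable = W_down.size + W_up.size  # 2 * 16 * 4 parameters
full_layer = d_model * d_model       # a full d x d layer for comparison
```

Because `W_up` starts at zero, the adapter is initially a no-op, so inserting it cannot degrade the pre-trained model before training begins; training then nudges only the small bottleneck weights.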
Prompt Tuning: This technique involves prepending a small set of trainable embedding vectors, often called soft prompts, to the model's input. The prompts are optimized during fine-tuning, guiding the model to generate desired outputs without altering the pre-trained parameters.
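The mechanics can be sketched in a few lines of NumPy: trainable prompt vectors are concatenated in front of the frozen token embeddings before the (frozen) model processes the sequence. The sizes here are arbitrary toy values.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_prompt, seq_len = 8, 4, 6  # hypothetical toy sizes

# Frozen input embeddings for a tokenised sequence.
token_embeds = rng.standard_normal((seq_len, d_model))

# Trainable soft-prompt vectors -- the only parameters updated in prompt tuning.
prompt_embeds = rng.standard_normal((n_prompt, d_model)) * 0.1

# Prepend the soft prompts; the frozen LLM then consumes the combined sequence.
model_input = np.concatenate([prompt_embeds, token_embeds], axis=0)
```

Note that the original token embeddings pass through untouched; only the prepended rows carry gradients during fine-tuning.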
Low-Rank Adaptation: In this approach, the updates to the pre-trained model's weight matrices are expressed as a low-rank factorization, i.e., the product of two small matrices added to the frozen weights. Only these low-rank factors are updated during fine-tuning, greatly reducing the number of trainable parameters.
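A minimal NumPy sketch of the low-rank idea: the frozen weight `W` is augmented with a rank-`r` update `B @ A`, and only `A` and `B` are trained. The zero initialization of `B` (so the update starts at zero) mirrors common practice, but the shapes and scales here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 2  # hypothetical: full dimension 16, rank 2

W = rng.standard_normal((d, d))         # frozen pre-trained weight matrix
A = rng.standard_normal((r, d)) * 0.01  # trainable low-rank factor
B = np.zeros((d, r))                    # trainable; zero init => no update yet

def lora_forward(x):
    # Effective weight is W + B @ A; W itself is never modified.
    return x @ (W + B @ A).T

x = rng.standard_normal((3, d))
y = lora_forward(x)

trainable = A.size + B.size  # 2*16 + 16*2 = 64, vs. 256 for full W
```

For a d x d matrix, the trainable parameter count drops from d² to 2·r·d, and the learned factors can even be merged into `W` after training so inference pays no extra cost.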
BitFit: This technique involves fine-tuning only the bias parameters of the pre-trained model while keeping the rest of the parameters frozen. BitFit is based on the observation that bias parameters can have a significant impact on model performance with minimal computational overhead.
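A toy NumPy sketch of the BitFit idea, using a single linear layer and a hand-written gradient step (the "gradient" here is a placeholder, not computed from a real loss):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 16, 16

W = rng.standard_normal((d_out, d_in))  # frozen weight matrix
b = np.zeros(d_out)                     # trainable bias: the only update target

def layer(x):
    return x @ W.T + b

# A toy optimizer step: the bias moves, the weights never do.
lr, grad_b = 0.1, np.ones(d_out)        # placeholder gradient
b = b - lr * grad_b

trainable = b.size      # 16 bias terms
total = W.size + b.size # 272 parameters overall
```

Even in this toy layer, the trainable fraction is under 6% of the parameters; in a full transformer the bias terms are a far smaller share still.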
Sparse Fine-Tuning: Sparse fine-tuning selectively updates only a small fraction of the model's parameters, chosen based on certain criteria such as their importance or sensitivity to the task at hand. This sparsity reduces the computational burden of fine-tuning.
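One simple way to realise this is with a binary mask over the parameters. The sketch below uses gradient magnitude as a stand-in importance score and updates only the top 10% of entries; real methods use more principled scores, so treat this purely as an illustration of the masking mechanics.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))      # pre-trained weights
grad = rng.standard_normal(W.shape)  # placeholder gradient

# Keep only the top 10% of parameters by (toy) importance -- here, gradient
# magnitude -- and zero out the update everywhere else.
k = int(0.1 * W.size)                             # 6 of 64 entries
threshold = np.sort(np.abs(grad).ravel())[-k]     # k-th largest magnitude
mask = np.abs(grad) >= threshold

W_new = W - 0.01 * grad * mask  # masked SGD step: only k entries change

changed = int((W_new != W).sum())
```

The mask can be fixed once before training or recomputed periodically; either way, the optimizer state and gradient traffic shrink along with the update.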
Layer Freezing: In this technique, only a subset of the layers in the pre-trained model is fine-tuned, while the rest are kept frozen. This reduces the number of parameters that need to be updated and can also prevent overfitting.
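A toy NumPy sketch of layer freezing, with a stack of weight matrices standing in for transformer blocks and a flag per layer marking which ones may be updated (the layer count and choice of which layer to unfreeze are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy 4-layer stack of weight matrices standing in for transformer blocks.
layers = [rng.standard_normal((8, 8)) for _ in range(4)]

# Freeze the first three layers; fine-tune only the last one.
trainable_flags = [False, False, False, True]

def sgd_step(layers, grads, lr=0.01):
    # Apply the update only where the layer is marked trainable.
    return [W - lr * g if train else W
            for W, g, train in zip(layers, grads, trainable_flags)]

grads = [np.ones((8, 8)) for _ in layers]  # placeholder gradients
updated = sgd_step(layers, grads)

n_changed = sum(not np.array_equal(a, b) for a, b in zip(layers, updated))
```

In deep-learning frameworks this is typically expressed by disabling gradients on the frozen layers, which also skips their backward computation and optimizer state entirely.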
These techniques represent a powerful toolbox for practitioners to leverage when fine-tuning LLMs in a parameter-efficient manner. Future blog posts will explore these techniques in greater detail, providing code-based implementations and in-depth technical discussions.
Conclusion
We've only scratched the surface of the immense power and capabilities of modern large language models (LLMs) and how we can efficiently harness this wealth of information and knowledge through Parameter-Efficient Fine-Tuning (PEFT) techniques. Stay tuned for our upcoming release, where we will delve deeper into a more comprehensive discussion, analysis, and code-based exploration of the key PEFT techniques mentioned above. Get ready to unravel the exciting world of generative AI and unlock the power of LLMs for your projects.