GIFT-SW: Gaussian noise Injected Fine-Tuning of Salient Weights for LLMs
Abstract
Parameter Efficient Fine-Tuning (PEFT) methods have gained popularity and democratized the usage of Large Language Models (LLMs). Recent studies have shown that a small subset of weights significantly impacts performance. Based on this observation, we introduce a novel PEFT method, called Gaussian noise Injected Fine-Tuning of Salient Weights (GIFT-SW). Our method updates only salient columns, while injecting Gaussian noise into non-salient ones. To identify these columns, we developed a generalized sensitivity metric that extends and unifies metrics from previous studies. Experiments with LLaMA models demonstrate that GIFT-SW outperforms full fine-tuning and modern PEFT methods under the same computational budget. Moreover, GIFT-SW offers practical advantages for recovering the performance of models subjected to mixed-precision quantization while keeping salient weights in full precision.
Community
A co-author here. Recently, we've developed another Parameter-Efficient Fine-Tuning (PEFT) method.
We're still working on the project and would appreciate any feedback. What do you think about the idea, and how can we improve it?
Short Summary of Our Work:
Subsample of Columns: We fine-tune a subset of weight columns rather than a low-rank matrix, since this is more practical, and we show that it yields comparable results.
Sensitivity Metric: We propose a metric for selecting the sensitive (salient) columns.
Noise Injection: We inject Gaussian noise into the non-salient weights during fine-tuning and show that quality improves. (A rough code sketch of the whole procedure follows this list.)
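To make the idea concrete, here is a minimal PyTorch sketch, not our actual implementation: the per-column score is only an illustrative OBD/AWQ-style proxy for the paper's generalized sensitivity metric, and the names `GIFTSWLinear`, `pick_salient_columns`, `act_sq_mean`, and `noise_std` are made up for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def pick_salient_columns(weight: torch.Tensor, act_sq_mean: torch.Tensor, k: int) -> torch.Tensor:
    """Illustrative per-column sensitivity score (an OBD/AWQ-style proxy,
    NOT the paper's exact generalized metric): score_j = E[x_j^2] * sum_i w_ij^2."""
    score = act_sq_mean * (weight ** 2).sum(dim=0)  # (in_features,)
    return torch.topk(score, k).indices


class GIFTSWLinear(nn.Module):
    """Sketch of the idea: only the salient input columns of a frozen linear layer
    stay trainable; Gaussian noise is injected into the remaining frozen columns
    during training."""

    def __init__(self, linear: nn.Linear, salient_idx: torch.Tensor, noise_std: float = 1e-3):
        super().__init__()
        W = linear.weight.detach().clone()               # (out_features, in_features)
        self.register_buffer("frozen_weight", W)          # non-salient part stays frozen
        self.register_buffer("salient_idx", salient_idx)
        self.salient_weight = nn.Parameter(W[:, salient_idx].clone())  # trainable columns
        self.bias = linear.bias                            # LLaMA-style layers usually have no bias
        self.noise_std = noise_std

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        W = self.frozen_weight.clone()
        if self.training and self.noise_std > 0:
            noise = torch.randn_like(W) * self.noise_std   # fixed std here; the paper ties
            noise[:, self.salient_idx] = 0.0               # the scale to the quantization setup
            W = W + noise                                  # noise only on non-salient columns
        W[:, self.salient_idx] = self.salient_weight       # splice in the trainable columns
        return F.linear(x, W, self.bias)
```

A layer would be wrapped roughly like `GIFTSWLinear(layer, pick_salient_columns(layer.weight, act_sq_mean, k=64))`, with `act_sq_mean` estimated from a small calibration set.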
Results:
Performance: Our method outperforms LoRA, DoRA, and full fine-tuning under the same token budget.
Quantization Compatibility: We tested our method together with quantization, and it works well. All non-salient weights can be safely quantized, and the model can then be fine-tuned by updating only the salient weights (see the sketch below).
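For the quantization setting, here is a minimal sketch of "quantize everything except the salient columns". It uses simulated symmetric per-column quantization; `fake_quantize` and `quantize_non_salient` are illustrative names, not our pipeline.

```python
import torch
import torch.nn as nn


def fake_quantize(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Simulated symmetric per-column quantization (illustrative only)."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().amax(dim=0, keepdim=True).clamp(min=1e-8) / qmax
    return torch.round(w / scale).clamp(-qmax - 1, qmax) * scale


def quantize_non_salient(linear: nn.Linear, salient_idx: torch.Tensor, bits: int = 4) -> nn.Linear:
    """Quantize every column except the salient ones; salient columns stay in
    full precision and would be the only weights updated during fine-tuning."""
    W = linear.weight.data
    Wq = fake_quantize(W, bits=bits)
    Wq[:, salient_idx] = W[:, salient_idx]   # keep salient columns in full precision
    linear.weight.data = Wq
    return linear
```

The salient columns could then be made trainable by wrapping the layer as in the earlier sketch.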
Interesting Findings:
After fine-tuning, LLaMA2 (7B and 13B) achieved the same quality as the TÜLU-V2 model (link to paper), which is essentially the same LLaMA2 but trained more extensively. (Table in the comments.)
Potential Continuations:
Compatibility with Other Compression Methods: Our method can be combined with other compression techniques, such as pruning and decomposition; only the metric for choosing salient weights needs a slight adjustment.
Noise Procedure Improvement: The noise-injection procedure can likely be improved further and given a theoretical justification.
We'd love to hear your thoughts and suggestions!
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization (2024)
- Accurate and Efficient Fine-Tuning of Quantized Large Language Models Through Optimal Balance (2024)
- LRQ: Optimizing Post-Training Quantization for Large Language Models by Learning Low-Rank Weight-Scaling Matrices (2024)
- LoRA-GA: Low-Rank Adaptation with Gradient Approximation (2024)
- LeanQuant: Accurate Large Language Model Quantization with Loss-Error-Aware Grid (2024)