Fine-tune 7B models on free-tier Colab hardware using Unsloth 🦥
Unsloth is a framework for fine-tuning language models that claims 0% loss in accuracy because it uses no approximation methods. It offers trainers for both supervised fine-tuning (SFT) and direct preference optimization (DPO) that can speed up fine-tuning by up to 5x.
This is achieved by adding LoRA adapters, so only 1 to 10% of the total parameters need to be trained. You can export the LoRA adapter on its own or merge it into the base model at 16-bit to get a full fine-tuned model. The resulting model is ready for use in vLLM for faster inference.
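To get a feel for how small the trainable portion is, here is a rough back-of-the-envelope sketch. It is not Unsloth code: the layer shapes are my assumption of the Mistral-7B architecture, the base parameter count is approximate, and the helper name `lora_trainable_params` is made up for illustration. A LoRA adapter on a linear layer of shape (d_in, d_out) adds rank * (d_in + d_out) weights.

```python
# Assumed Mistral-7B shapes (not read from the notebook; illustration only).
HIDDEN = 4096
KV_DIM = 1024          # 8 key/value heads x 128 head dim
INTERMEDIATE = 14336
NUM_LAYERS = 32
BASE_PARAMS = 7.24e9   # approximate total parameter count of the base model

# (in_features, out_features) of each linear layer assumed to get an adapter.
TARGET_MODULES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_trainable_params(rank: int) -> int:
    """Count LoRA weights: each adapted layer adds rank * (d_in + d_out)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in TARGET_MODULES.values())
    return per_layer * NUM_LAYERS


for rank in (16, 64, 256):
    n = lora_trainable_params(rank)
    share = 100 * n / BASE_PARAMS
    print(f"rank={rank:>3}: {n:,} trainable params ({share:.2f}% of base)")
```

Under these assumptions the share depends heavily on the rank you pick: rank 16 comes out below 1% of the base model, while larger ranks such as 64 or 256 land inside the 1 to 10% range.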
Additionally, Hugging Face has integrated Unsloth into its documentation for DPO training and reported 18.6% performance gains on a T4.
This sets a new standard for fine-tuning large language models. If you would like to explore this approach yourself, I have provided a notebook, "AutoSloth", where you can fine-tune using either SFT or DPO. It uploads the result to HF with a prefilled Unsloth README 🦥 and a Q8_0 quantization.
The SFT example is set up for free-tier usage, while the DPO example is set up for an A100. The DPO example can be altered to work on a T4, but I wanted to include more than one example.
Colab Stats during training:
+ Model: unsloth/mistral-7b-bnb-4bit
+ Dataset: yahma/alpaca-cleaned
+ Batch size: 2
+ Gradient accumulation steps: 4
+ System RAM: 8.5 / 51.0 GB
+ VRAM (T4): 13.6 / 15.0 GB
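For anyone tuning these settings, a quick sketch of what the numbers above imply. I am assuming the gradient steps figure refers to gradient accumulation steps, so the optimizer sees batch size times accumulation steps per update. The variable names are just illustrative.

```python
# Values copied from the Colab stats above.
per_device_batch_size = 2
gradient_accumulation_steps = 4  # assumption: what the "gradient steps" stat means

# Samples contributing to each optimizer update.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps

vram_used_gb, vram_total_gb = 13.6, 15.0
ram_used_gb, ram_total_gb = 8.5, 51.0


def utilization(used: float, total: float) -> float:
    """Return usage as a percentage of the total."""
    return 100 * used / total


vram_headroom_gb = vram_total_gb - vram_used_gb

print(f"Effective batch size: {effective_batch_size}")
print(f"VRAM utilization: {utilization(vram_used_gb, vram_total_gb):.1f}%")
print(f"VRAM headroom: {vram_headroom_gb:.1f} GB")
print(f"System RAM utilization: {utilization(ram_used_gb, ram_total_gb):.1f}%")
```

With so little VRAM headroom on the T4, raising the accumulation steps is likely a safer way to grow the effective batch than raising the per-device batch size.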
Resources:
🦥Unsloth: https://github.com/unslothai/unsloth
🦥AutoSloth: https://colab.research.google.com/drive/1Zo0sVEb2lqdsUm9dy2PTzGySxdF9CNkc?usp=sharing
🤗HF-Unsloth-docs: https://huggingface.co/docs/trl/main/en/dpo_trainer#accelerate-dpo-fine-tuning-using-unsloth
🤗HF-Unsloth Blog Post: https://huggingface.co/blog/unsloth-trl