---
license: other
---

# Model Card for llama-7b-hf-28q_4bit-128g_WVU

## Model Description

`llama-7b-hf-28q_4bit-128g_WVU` is a 7-billion-parameter model based on the Llama architecture.
The first 28 decoder layers are quantized with the [`gptq`](https://github.com/qwopqwop200/GPTQ-for-LLaMa) method, using 4-bit precision and a group size of 128.
The last 4 decoder layers (1/8 of the decoder layers) and `lm_head` were then fine-tuned for 1 epoch on the [wizard_vicuna_70k_unfiltered dataset](https://huggingface.co/datasets/ehartford/wizard_vicuna_70k_unfiltered).
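
As a rough illustration of the split described above, the following sketch sorts parameters by name into the frozen (quantized) part and the fine-tuned part. It assumes Hugging Face-style parameter names (`model.layers.<i>.…`, `lm_head.weight`) and 32 decoder layers in total. The helper `is_trainable` and the constants are hypothetical and are not taken from the training code used for this model.

```python
import re

NUM_LAYERS = 32        # decoder layers in the 7B model (assumption)
NUM_QUANTIZED = 28     # the first 28 layers are quantized and kept frozen

# Matches names such as "model.layers.30.self_attn.q_proj.weight".
_LAYER_RE = re.compile(r"^model\.layers\.(\d+)\.")


def is_trainable(param_name: str) -> bool:
    """Return True for parameters in the last 4 decoder layers or lm_head."""
    if param_name.startswith("lm_head."):
        return True
    match = _LAYER_RE.match(param_name)
    if match is None:
        # Embeddings, final norm, etc. are treated as frozen in this sketch
        # (an assumption; the model card does not say how they were handled).
        return False
    return int(match.group(1)) >= NUM_QUANTIZED


trainable_layers = [
    i for i in range(NUM_LAYERS)
    if is_trainable(f"model.layers.{i}.self_attn.q_proj.weight")
]
print(trainable_layers)
```

With a PyTorch model, a helper like this could be applied by setting `param.requires_grad = is_trainable(name)` while iterating over `model.named_parameters()`.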

## Note

Quantization reduces memory usage, but the quantized weights differ from the original parameters.
Fine-tuning only the last few layers lowers the memory required for training, but it could lead to minor performance degradation.
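
To make the "differences in the parameters" concrete, here is a minimal sketch of 4-bit quantization over one group of 128 weights. It uses plain round-to-nearest quantization, which is simpler than GPTQ and is not the method used for this model; the function names and the random weights are purely illustrative.

```python
import random


def quantize_group(weights, bits=4):
    """Asymmetric round-to-nearest quantization of one group of weights."""
    levels = 2 ** bits - 1                       # 15 steps for 4 bits
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / levels or 1.0      # fall back to 1.0 if all weights are equal
    codes = [round((w - w_min) / scale) for w in weights]
    return codes, scale, w_min


def dequantize_group(codes, scale, w_min):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale + w_min for c in codes]


random.seed(0)
group_size = 128                                 # matches the "128g" in the model name
weights = [random.gauss(0.0, 0.02) for _ in range(group_size)]

codes, scale, w_min = quantize_group(weights)
restored = dequantize_group(codes, scale, w_min)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(f"max abs error: {max_error:.6f} (scale = {scale:.6f})")
```

Each group stores 128 four-bit codes plus one scale and one offset, and every restored weight is off by at most half a quantization step. This residual error is what the fine-tuned layers have to compensate for.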

Several alternatives exist for fine-tuning and quantizing Llama models. The method used here, quantizing most layers and then fine-tuning the last few, is designed to account for errors introduced during quantization (which can sometimes produce unexpected answers): the last few layers are fine-tuned with both the quantization error and the dataset taken into account.

Other methods may yield better performance. For instance:

1. Fine-tune the entire model for `X` epochs
2. Quantize the first `K` layers
3. Fine-tune the remaining layers for `Y` epochs

However, fine-tuning the entire model requires considerable resources (for example, 4 GPUs with 80GB VRAM are required for 7B LLaMa), so this model omits the first step of the method described above, and it still works.
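
A back-of-the-envelope estimate shows why skipping full fine-tuning saves so much memory. The sketch below counts parameters for the whole model versus the last 4 decoder layers plus `lm_head`. The model dimensions and the 16 bytes per trainable parameter (fp32 weight, gradient, and two Adam moments) are assumptions, and activations and the frozen quantized weights are ignored, so this is not the accounting behind the GPU figure quoted above.

```python
HIDDEN = 4096           # hidden size of the 7B model (assumption)
INTERMEDIATE = 11008    # MLP inner size (assumption)
VOCAB = 32000           # vocabulary size (assumption)
NUM_LAYERS = 32         # decoder layers (assumption)
BYTES_PER_PARAM = 16    # fp32 weight + gradient + two Adam moments (assumption)


def layer_params() -> int:
    """Parameter count of one decoder layer."""
    attention = 4 * HIDDEN * HIDDEN        # q, k, v, o projections
    mlp = 3 * HIDDEN * INTERMEDIATE        # gate, up, down projections
    norms = 2 * HIDDEN                     # two RMSNorm weight vectors
    return attention + mlp + norms


def training_gib(num_params: int) -> float:
    """Approximate training memory in GiB for the given trainable parameters."""
    return num_params * BYTES_PER_PARAM / 2**30


lm_head = VOCAB * HIDDEN
embed = VOCAB * HIDDEN
full = NUM_LAYERS * layer_params() + lm_head + embed + HIDDEN   # + final norm
partial = 4 * layer_params() + lm_head                          # last 4 layers + lm_head

print(f"full model      : {full / 1e9:.2f}B params, ~{training_gib(full):.0f} GiB")
print(f"last 4 + lm_head: {partial / 1e9:.2f}B params, ~{training_gib(partial):.0f} GiB")
```

Under these assumptions the trainable part is roughly one seventh of the full parameter count, which is the main reason the partial scheme needs far fewer GPU resources.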

## Using the Model

To load the model, a custom `LlamaForCausalLM` is required.
You can find the quantized Llama implementation [here](https://github.com/LearnItAnyway/quantized_llama).

## References

1. Meta - LLaMA
2. [WizardLM](https://github.com/nlpxucan/WizardLM)
3. [GPTQ for LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa)
4. [Wizard Vicuna Unfiltered Dataset](https://huggingface.co/datasets/ehartford/wizard_vicuna_70k_unfiltered)
5. Various unlisted but great works, research, and projects.