spedrox-sac committed
Commit 1da0c45
1 Parent(s): a08825c

Update README.md

Files changed (1):
  1. README.md +37 -1

README.md CHANGED
@@ -14,4 +14,40 @@ tags:
 
 ## This model is quantized from [Meta-Llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B).
 
- ## Model : 4bit model of fp16
+ # Quantized Llama 3.2-1B
+
+ This repository contains a quantized version of the Llama 3.2-1B model, optimized for a reduced memory footprint and faster inference.
+
+ ## Quantization Details
+
+ The model has been quantized with GPTQ, a post-training quantization method for generative pre-trained transformers, using the following parameters:
+
+ - **Quantization method:** GPTQ
+ - **Number of bits:** 4
+ - **Calibration dataset:** c4
+
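For reference, a quantization run with these parameters could be sketched roughly as follows, assuming the `GPTQConfig` API from Transformers (the function name `quantize_llama_4bit` is hypothetical, and this is not necessarily the exact command used to produce this checkpoint):

```python
def quantize_llama_4bit(output_dir: str):
    """Quantize the base model to 4-bit GPTQ, calibrating on c4.

    Requires a GPU, the auto-gptq backend, and access to the base
    checkpoint; imports are kept inside the function because the
    dependencies are heavy.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

    model_id = "meta-llama/Llama-3.2-1B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # bits=4 and dataset="c4" mirror the parameters listed above.
    gptq_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

    # Loading with a quantization_config triggers calibration + quantization.
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=gptq_config, device_map="auto"
    )

    # Persist the quantized weights and tokenizer for later reloading.
    model.save_pretrained(output_dir)
    tokenizer.save_pretrained(output_dir)
```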
+ ## Usage
+
+ To use the quantized model, load it with the `load_quantized_model` function from the `optimum.gptq` module, replacing `save_folder` with the path to the directory where the quantized model is saved.
+
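The loading step might look like the following sketch, assuming the `load_quantized_model` API from Optimum together with Accelerate's `init_empty_weights` (the helper name `load_llama_gptq` is hypothetical; actually running it requires the quantized weights on disk, access to the base checkpoint, and a GPU):

```python
def load_llama_gptq(save_folder: str):
    """Load the 4-bit GPTQ checkpoint from `save_folder`."""
    import torch
    from accelerate import init_empty_weights
    from transformers import AutoModelForCausalLM
    from optimum.gptq import load_quantized_model

    # Build an empty (meta-device) copy of the base model so no full
    # fp16 weights are materialized in memory.
    with init_empty_weights():
        empty_model = AutoModelForCausalLM.from_pretrained(
            "meta-llama/Llama-3.2-1B", torch_dtype=torch.float16
        )
    empty_model.tie_weights()

    # Attach the quantized weights from disk, dispatching layers
    # across the available devices.
    return load_quantized_model(
        empty_model, save_folder=save_folder, device_map="auto"
    )
```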
+ ## Requirements
+
+ - Python 3.8 or higher
+ - PyTorch 2.0 or higher
+ - Transformers
+ - Optimum
+ - Accelerate
+ - Bitsandbytes
+ - Auto-GPTQ
+
+ You can install these dependencies using pip:
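A typical install command, with package names assumed from the requirements list above (exact pinned versions are not given in this README):

```shell
pip install torch transformers optimum accelerate bitsandbytes auto-gptq
```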
+
+ ## Disclaimer
+
+ This quantized model is provided for research and experimentation purposes. While quantization can significantly reduce model size and improve inference speed, it may also result in a slight decrease in accuracy compared to the original model.
+
+ ## Acknowledgements
+
+ - Meta AI for releasing the Llama 3.2-1B model.
+ - The authors of the GPTQ quantization method.
+ - The Hugging Face team for providing the tools and resources for model sharing and deployment.