spedrox-sac committed
Commit 1da0c45
1 Parent(s): a08825c

Update README.md

Files changed (1):
  1. README.md +37 -1

README.md CHANGED
@@ -14,4 +14,40 @@ tags:
 
 ## This model is quantized from [Meta-Llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B).
 
- ## Model : 4bit model of fp16
+ # Quantized Llama 3.2-1B
+
+ This repository contains a quantized version of the Llama 3.2-1B model, optimized for a reduced memory footprint and faster inference.
+
+ ## Quantization Details
+
+ The model has been quantized with GPTQ, a post-training quantization method for generative pre-trained transformers, using the following parameters:
+
+ - **Quantization method:** GPTQ
+ - **Number of bits:** 4
+ - **Calibration dataset:** c4
+
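For reference, a quantization run with these parameters could be sketched roughly as follows, assuming the `GPTQConfig` API from Transformers (the function name `quantize_llama_4bit` is hypothetical, and this is not necessarily the exact command used to produce this checkpoint):

```python
def quantize_llama_4bit(output_dir: str):
    """Quantize the base model to 4-bit GPTQ, calibrating on c4.

    Requires a GPU, the auto-gptq backend, and access to the base
    checkpoint; imports are kept inside the function because the
    dependencies are heavy.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

    model_id = "meta-llama/Llama-3.2-1B"
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # bits=4 and dataset="c4" mirror the parameters listed above.
    gptq_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

    # Loading with a quantization_config triggers calibration + quantization.
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=gptq_config, device_map="auto"
    )

    # Persist the quantized weights and tokenizer for later reloading.
    model.save_pretrained(output_dir)
    tokenizer.save_pretrained(output_dir)
```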
+ ## Usage
+
+ To use the quantized model, load it with the `load_quantized_model` function from the `optimum.gptq` module, replacing `save_folder` with the path to the directory where the quantized model is saved.
+
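The loading step might look like the following sketch, assuming the `load_quantized_model` API from Optimum together with Accelerate's `init_empty_weights` (the helper name `load_llama_gptq` is hypothetical; actually running it requires the quantized weights on disk, access to the base checkpoint, and a GPU):

```python
def load_llama_gptq(save_folder: str):
    """Load the 4-bit GPTQ checkpoint from `save_folder`."""
    import torch
    from accelerate import init_empty_weights
    from transformers import AutoModelForCausalLM
    from optimum.gptq import load_quantized_model

    # Build an empty (meta-device) copy of the base model so no full
    # fp16 weights are materialized in memory.
    with init_empty_weights():
        empty_model = AutoModelForCausalLM.from_pretrained(
            "meta-llama/Llama-3.2-1B", torch_dtype=torch.float16
        )
    empty_model.tie_weights()

    # Attach the quantized weights from disk, dispatching layers
    # across the available devices.
    return load_quantized_model(
        empty_model, save_folder=save_folder, device_map="auto"
    )
```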
+ ## Requirements
+
+ - Python 3.8 or higher
+ - PyTorch 2.0 or higher
+ - Transformers
+ - Optimum
+ - Accelerate
+ - Bitsandbytes
+ - Auto-GPTQ
+
+ You can install these dependencies using pip:
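A typical install command, with package names assumed from the requirements list above (exact pinned versions are not given in this README):

```shell
pip install torch transformers optimum accelerate bitsandbytes auto-gptq
```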
+
+ ## Disclaimer
+
+ This quantized model is provided for research and experimentation purposes. While quantization can significantly reduce model size and improve inference speed, it may also result in a slight decrease in accuracy compared to the original model.
+
+ ## Acknowledgements
+
+ - Meta AI for releasing the Llama 3.2-1B model.
+ - The authors of the GPTQ quantization method.
+ - The Hugging Face team for providing the tools and resources for model sharing and deployment.