This model is quantized from meta-llama/Llama-3.2-1B.
Quantized Llama 3.2-1B
This repository contains a quantized version of the Llama 3.2-1B model, optimized for reduced memory footprint and faster inference.
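As a rough illustration of the reduced memory footprint, weight storage can be estimated from the parameter count and the number of bits per weight. The parameter count below (about 1.23 billion) is an assumption based on the base model's published size. The estimate ignores GPTQ's per-group scales and zero points, and any layers that stay in higher precision (typically the embeddings), so a real 4-bit checkpoint will be somewhat larger than the figure printed here.

```python
def estimate_weight_bytes(num_params: int, bits_per_weight: float) -> float:
    """Approximate storage for the weights alone, in bytes."""
    return num_params * bits_per_weight / 8


# Assumed (approximate) parameter count for Llama 3.2-1B.
NUM_PARAMS = 1_230_000_000

fp16_gb = estimate_weight_bytes(NUM_PARAMS, 16) / 1e9
int4_gb = estimate_weight_bytes(NUM_PARAMS, 4) / 1e9

print(f"fp16 weights:  ~{fp16_gb:.2f} GB")
print(f"4-bit weights: ~{int4_gb:.2f} GB (lower bound on checkpoint size)")
```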
Quantization Details
The model was quantized with GPTQ, a post-training quantization method for generative pre-trained transformers, using the following parameters:
- Quantization method: GPTQ
- Number of bits: 4
- Dataset used for calibration: c4
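The parameters above map onto Optimum's GPTQQuantizer roughly as follows. This is a sketch of how such a quantization run could look, not necessarily the exact script used for this repository. The function name quantize_llama, the float16 dtype, and the save_folder argument are assumptions for illustration. Imports sit inside the function so that defining it does not load any heavy libraries or download the model.

```python
def quantize_llama(save_folder: str, base_model_id: str = "meta-llama/Llama-3.2-1B"):
    """Quantize the base model to 4-bit GPTQ with c4 calibration (illustrative)."""
    import torch
    from optimum.gptq import GPTQQuantizer
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    model = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.float16)

    # bits=4 and dataset="c4" mirror the parameters listed above;
    # every other option is left at the library default.
    quantizer = GPTQQuantizer(bits=4, dataset="c4")
    quantized_model = quantizer.quantize_model(model, tokenizer)

    # Write the quantized weights and the quantization config to save_folder.
    quantizer.save(quantized_model, save_folder)
    return quantized_model
```

Calibration on c4 downloads part of the dataset and is usually run on a GPU, so expect the call to take a while.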
Usage
To use the quantized model, load it with the load_quantized_model function from the optimum.gptq library.
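A minimal loading sketch, following the pattern shown in the Optimum documentation: build an empty copy of the base architecture, then let load_quantized_model fill it with the quantized weights. The wrapper name load_quantized_llama and the base_model_id default are assumptions for illustration. Access to the base model's configuration on the Hub is also assumed.

```python
def load_quantized_llama(save_folder: str, base_model_id: str = "meta-llama/Llama-3.2-1B"):
    """Load GPTQ-quantized weights from save_folder into the base architecture."""
    import torch
    from accelerate import init_empty_weights
    from optimum.gptq import load_quantized_model
    from transformers import AutoModelForCausalLM

    # Build the architecture without allocating real weight tensors.
    with init_empty_weights():
        empty_model = AutoModelForCausalLM.from_pretrained(
            base_model_id, torch_dtype=torch.float16
        )
    empty_model.tie_weights()

    # Fill the empty model with the quantized weights stored in save_folder.
    return load_quantized_model(empty_model, save_folder=save_folder, device_map="auto")


# Example call (the path is a placeholder):
# model = load_quantized_llama("path/to/save_folder")
```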
Make sure to replace save_folder with the path to the directory where the quantized model is saved.
Requirements
- Python 3.8 or higher
- PyTorch 2.0 or higher
- Transformers
- Optimum
- Accelerate
- Bitsandbytes
- Auto-GPTQ
You can install these dependencies using pip.
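Before loading the model, it can help to confirm which of the listed packages are already present. The sketch below uses only the standard library. The PyPI distribution names in REQUIRED are assumed to correspond to the requirements listed above, and missing_packages is a hypothetical helper name.

```python
from importlib import metadata

# Assumed PyPI distribution names for the requirements listed above.
REQUIRED = ["torch", "transformers", "optimum", "accelerate", "bitsandbytes", "auto-gptq"]


def missing_packages(names):
    """Return the distribution names from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing; install with: pip install " + " ".join(missing))
    else:
        print("All listed dependencies are installed.")
```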
Disclaimer
This quantized model is provided for research and experimentation purposes. While quantization can significantly reduce model size and improve inference speed, it may also result in a slight decrease in accuracy compared to the original model.
Acknowledgements
- Meta AI for releasing the Llama 3.2-1B model.
- The authors of the GPTQ quantization method.
- The Hugging Face team for providing the tools and resources for model sharing and deployment.
Model tree for spedrox-sac/Llama-3.2-1B_quantized
Base model: meta-llama/Llama-3.2-1B