This model is quantized from meta-llama/Llama-3.2-1B.

Quantized Llama 3.2-1B

This repository contains a quantized version of the Llama 3.2-1B model, optimized for reduced memory footprint and faster inference.

Quantization Details

The model has been quantized using GPTQ, a post-training quantization method for generative pre-trained transformers, with the following parameters:

  • Quantization method: GPTQ
  • Number of bits: 4
  • Dataset used for calibration: c4
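For readers who want to reproduce a similar setup, the parameters above map onto Optimum's GPTQQuantizer roughly as follows. This is a minimal sketch, not the exact script used for this repository: the function name quantize_with_gptq and its arguments are illustrative, all other quantizer settings are left at their library defaults (an assumption), and running it requires a GPU plus access to the base model.

```python
def quantize_with_gptq(base_model_id, save_folder, bits=4, dataset="c4"):
    """Quantize a causal LM with GPTQ and save the result (illustrative sketch)."""
    # Heavy imports live inside the function so that importing this file stays cheap.
    import torch
    from optimum.gptq import GPTQQuantizer
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    model = AutoModelForCausalLM.from_pretrained(
        base_model_id, torch_dtype=torch.float16
    )

    # bits=4 and dataset="c4" mirror the parameters listed above;
    # every other setting is left at the library default (assumption).
    quantizer = GPTQQuantizer(bits=bits, dataset=dataset)
    quantized_model = quantizer.quantize_model(model, tokenizer)

    # Writes the quantized weights and the quantization config to save_folder.
    quantizer.save(quantized_model, save_folder)
    return quantized_model
```

Calling quantize_with_gptq("meta-llama/Llama-3.2-1B", "./quantized") would download the base model, calibrate on c4, and write the result to the hypothetical ./quantized directory.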

Usage

To use the quantized model, load it with the load_quantized_model function from the optimum.gptq module, passing the path to the directory where the quantized model is saved as save_folder.
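A loading sketch along these lines may help. It follows the pattern from the Optimum documentation: build an empty copy of the architecture, then let load_quantized_model fill it from save_folder. The wrapper name load_gptq_model, the default base model ID, and device_map="auto" are assumptions for illustration, and fetching the base config may require access to the gated meta-llama repository.

```python
def load_gptq_model(save_folder, base_model_id="meta-llama/Llama-3.2-1B",
                    device_map="auto"):
    """Load a GPTQ-quantized checkpoint from save_folder (illustrative sketch)."""
    # Heavy imports live inside the function so that importing this file stays cheap.
    import torch
    from accelerate import init_empty_weights
    from optimum.gptq import load_quantized_model
    from transformers import AutoConfig, AutoModelForCausalLM

    # Only the architecture config is fetched here, not the full-precision weights.
    config = AutoConfig.from_pretrained(base_model_id)

    # Instantiate the model without allocating memory for its parameters.
    with init_empty_weights():
        empty_model = AutoModelForCausalLM.from_config(
            config, torch_dtype=torch.float16
        )
    empty_model.tie_weights()

    # Replace the linear layers with quantized ones and load the saved weights.
    return load_quantized_model(
        empty_model, save_folder=save_folder, device_map=device_map
    )
```

The returned model can then be used like any other Transformers causal LM, for example with a tokenizer loaded from the base model and a call to model.generate.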

Requirements

  • Python 3.8 or higher
  • PyTorch 2.0 or higher
  • Transformers
  • Optimum
  • Accelerate
  • Bitsandbytes
  • Auto-GPTQ

You can install these dependencies using pip, for example: pip install torch transformers optimum accelerate bitsandbytes auto-gptq
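To confirm which of the packages listed above are present in the current environment, a small standard-library check such as the following can be used. The PyPI distribution names in REQUIRED (for example torch for PyTorch and auto-gptq for Auto-GPTQ) are the usual ones, and the helper name installed_versions is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version

# PyPI distribution names for the requirements listed above.
REQUIRED = ["torch", "transformers", "optimum", "accelerate",
            "bitsandbytes", "auto-gptq"]


def installed_versions(packages=REQUIRED):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```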

Disclaimer

This quantized model is provided for research and experimentation purposes. While quantization can significantly reduce model size and improve inference speed, it may also result in a slight decrease in accuracy compared to the original model.

Acknowledgements

  • Meta AI for releasing the Llama 3.2-1B model.
  • The authors of the GPTQ quantization method.
  • The Hugging Face team for providing the tools and resources for model sharing and deployment.