---
quantized_by: spedrox-sac
license: mit
pipeline_tag: text-generation
base_model:
  - meta-llama/Llama-3.2-1B
language:
  - en
tags:
  - text-generation
  - text-model
  - quantized_model
---

# Quantized Llama 3.2-1B

This model is quantized from meta-llama/Llama-3.2-1B.

This repository contains a quantized version of the Llama 3.2-1B model, optimized for reduced memory footprint and faster inference.

## Quantization Details

The model has been quantized using GPTQ (Generative Pre-trained Transformer Quantization) with the following parameters:

- Quantization method: GPTQ
- Number of bits: 4
- Dataset used for calibration: c4
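
For reference, a quantization along these lines can be reproduced with optimum's `GPTQQuantizer`. This is a minimal sketch: the bit width and calibration dataset match the parameters above, while `model_seqlen` and the save path are illustrative assumptions, not the exact settings used for this repository.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.gptq import GPTQQuantizer

model_id = "meta-llama/Llama-3.2-1B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

# 4-bit GPTQ with c4 as the calibration set, matching the card above.
# model_seqlen is an assumed value, not necessarily what was used here.
quantizer = GPTQQuantizer(bits=4, dataset="c4", model_seqlen=2048)
quantized_model = quantizer.quantize_model(model, tokenizer)

# Persist the quantized weights so they can be reloaded later.
quantizer.save(quantized_model, "save_folder")
```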

## Usage

To use the quantized model, load it with the `load_quantized_model` function from the `optimum.gptq` library, as sketched below. Make sure to replace `save_folder` with the path to the directory where the quantized model is saved.
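
A minimal sketch following optimum's documented loading pattern. The empty-weights initialization and `device_map="auto"` are assumptions drawn from that pattern, and `save_folder` is the placeholder path mentioned above.

```python
import torch
from accelerate import init_empty_weights
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.gptq import load_quantized_model

save_folder = "save_folder"  # replace with the directory holding the quantized model

# Build an empty (meta-device) copy of the base architecture to receive
# the quantized weights, then load them from disk.
with init_empty_weights():
    empty_model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-3.2-1B", torch_dtype=torch.float16
    )
empty_model.tie_weights()

quantized_model = load_quantized_model(
    empty_model, save_folder=save_folder, device_map="auto"
)

# Quick generation check.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
inputs = tokenizer("The capital of France is", return_tensors="pt").to(
    quantized_model.device
)
print(tokenizer.decode(quantized_model.generate(**inputs, max_new_tokens=20)[0]))
```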

## Requirements

- Python 3.8 or higher
- PyTorch 2.0 or higher
- Transformers
- Optimum
- Accelerate
- bitsandbytes
- AutoGPTQ

You can install these dependencies using pip:
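
```bash
pip install torch transformers optimum accelerate bitsandbytes auto-gptq
```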

## Disclaimer

This quantized model is provided for research and experimentation purposes. While quantization can significantly reduce model size and improve inference speed, it may also result in a slight decrease in accuracy compared to the original model.

## Acknowledgements

- Meta AI for releasing the Llama 3.2-1B model.
- The authors of the GPTQ quantization method.
- The Hugging Face team for providing the tools and resources for model sharing and deployment.