---
quantized_by: spedrox-sac
license: mit
pipeline_tag: text-generation
base_model:
- meta-llama/Llama-3.2-1B
language:
- en
tags:
- text-generation
- text-model
- quantized_model
---

# Quantized Llama 3.2-1B

This repository contains a quantized version of [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B), optimized for a reduced memory footprint and faster inference.

## Quantization Details

The model was quantized with GPTQ (Generative Pre-trained Transformer Quantization) using the following parameters; a reproduction sketch follows the list:

- **Quantization method:** GPTQ
- **Number of bits:** 4
- **Dataset used for calibration:** c4
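
For reference, a quantization run with these parameters can be reproduced with Optimum's `GPTQQuantizer`. This is a minimal sketch: `group_size`, `model_seqlen`, and the output path are illustrative assumptions, not confirmed settings for this checkpoint.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.gptq import GPTQQuantizer

model_id = "meta-llama/Llama-3.2-1B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

# 4-bit GPTQ with the c4 calibration set, per the parameters above.
# group_size and model_seqlen are assumed values, not confirmed settings
# for this particular checkpoint.
quantizer = GPTQQuantizer(bits=4, dataset="c4", group_size=128, model_seqlen=2048)
quantized_model = quantizer.quantize_model(model, tokenizer)

quantizer.save(quantized_model, "llama-3.2-1b-gptq")  # output folder is arbitrary
```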

## Usage

To use the quantized model, load it with the `load_quantized_model` function from Optimum's `optimum.gptq` module, as in the sketch below. Make sure to replace `save_folder` with the path to the directory where the quantized model is saved.
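
A minimal loading example, assuming the repository layout produced by `GPTQQuantizer.save` above:

```python
import torch
from accelerate import init_empty_weights
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.gptq import load_quantized_model

# Build an empty (meta-device) copy of the base model so no
# full-precision weights are allocated before loading.
with init_empty_weights():
    empty_model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-3.2-1B", torch_dtype=torch.float16
    )
empty_model.tie_weights()

save_folder = "path/to/quantized/model"  # replace with your local path
model = load_quantized_model(empty_model, save_folder=save_folder, device_map="auto")

# Quick generation check; the tokenizer comes from the base model.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```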

## Requirements

- Python 3.8 or higher
- PyTorch 2.0 or higher
- Transformers
- Optimum
- Accelerate
- Bitsandbytes
- Auto-GPTQ

You can install these dependencies with pip, e.g. `pip install torch transformers optimum accelerate bitsandbytes auto-gptq`.

## Disclaimer

This quantized model is provided for research and experimentation purposes. Quantization significantly reduces model size and can speed up inference, but it may cause a slight drop in accuracy relative to the original full-precision model.

## Acknowledgements

- Meta AI for releasing the Llama 3.2-1B model.
- The authors of the GPTQ quantization method.
- The Hugging Face team for providing the tools and resources for model sharing and deployment.