---
quantized_by: spedrox-sac
license: mit
pipeline_tag: text-generation
base_model:
- meta-llama/Llama-3.2-1B
language:
- en
tags:
- text-generation
- text-model
- quantized_model
---

# Quantized Llama 3.2-1B

This repository contains a quantized version of [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B), optimized for a reduced memory footprint and faster inference.

## Quantization Details

The model was quantized using GPTQ (Generative Pre-trained Transformer Quantization) with the following parameters:

- **Quantization method:** GPTQ
- **Number of bits:** 4
- **Calibration dataset:** c4
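
For reference, here is a minimal sketch of how a run with these parameters might look using `optimum`'s `GPTQQuantizer`. The `model_seqlen` value and the `save_folder` path are illustrative assumptions, not the exact settings used for this repository:

```python
# Sketch of 4-bit GPTQ quantization with optimum; not the exact script used
# for this repository. model_seqlen and the save path are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.gptq import GPTQQuantizer

model_id = "meta-llama/Llama-3.2-1B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

# 4-bit quantization calibrated on c4, matching the parameters listed above.
quantizer = GPTQQuantizer(bits=4, dataset="c4", model_seqlen=2048)
quantized_model = quantizer.quantize_model(model, tokenizer)

# Persist the quantized weights and quantization config to disk.
quantizer.save(quantized_model, "save_folder")
```
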
## Usage

Load the quantized model with the `load_quantized_model` function from `optimum.gptq`, as sketched below. Make sure to replace `save_folder` with the path to the directory where the quantized model is saved.
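
A minimal loading sketch based on the `optimum.gptq` API; the base-model ID is used to build an empty-weight skeleton, and the prompt is purely illustrative:

```python
# Load the 4-bit GPTQ weights from "save_folder" into an empty-weight
# skeleton of the base model, then run a short generation as a smoke test.
import torch
from accelerate import init_empty_weights
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.gptq import load_quantized_model

with init_empty_weights():
    empty_model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-3.2-1B", torch_dtype=torch.float16
    )
empty_model.tie_weights()

model = load_quantized_model(empty_model, save_folder="save_folder", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

# Illustrative prompt; any text-generation call works the same way.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
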
## Requirements

- Python 3.8 or higher
- PyTorch 2.0 or higher
- Transformers
- Optimum
- Accelerate
- bitsandbytes
- AutoGPTQ

You can install these dependencies using pip.
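
For example (package names as published on PyPI; pin or adjust versions as needed):

```bash
pip install torch transformers optimum accelerate bitsandbytes auto-gptq
```
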
## Disclaimer

This quantized model is provided for research and experimentation purposes. While quantization can significantly reduce model size and improve inference speed, it may also result in a slight decrease in accuracy compared to the original model.

## Acknowledgements

- Meta AI for releasing the Llama 3.2-1B model.
- The authors of the GPTQ quantization method.
- The Hugging Face team for providing the tools and resources for model sharing and deployment.