metadata
quantized_by: spedrox-sac
license: mit
pipeline_tag: text-generation
base_model:
  - meta-llama/Llama-3.2-1B
language:
  - en
tags:
  - text-generation
  - text-model
  - quantized_model

This model is quantized from meta-llama/Llama-3.2-1B.

Quantized Llama 3.2-1B

This repository contains a quantized version of the Llama 3.2-1B model, optimized for reduced memory footprint and faster inference.

Quantization Details

The model has been quantized using GPTQ, a post-training quantization method for generative pre-trained transformers, with the following parameters:

  • Quantization method: GPTQ
  • Number of bits: 4
  • Dataset used for calibration: c4
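For illustration, the parameters above could be passed to Optimum's `GPTQQuantizer` roughly as follows. This is a minimal sketch, not the exact script used to produce this checkpoint: the function name `quantize_llama` is hypothetical, and every setting not listed above (group size, activation ordering, and so on) is assumed to be the library default.

```python
def quantize_llama(save_folder, model_id="meta-llama/Llama-3.2-1B"):
    """Quantize a causal LM to 4-bit GPTQ using c4 for calibration (sketch)."""
    # Heavy imports are deferred so this module can be imported
    # even when the GPU stack is not installed.
    import torch
    from optimum.gptq import GPTQQuantizer
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

    # Only bits and dataset come from this model card;
    # all other arguments are left at their defaults (an assumption).
    quantizer = GPTQQuantizer(bits=4, dataset="c4")
    quantized_model = quantizer.quantize_model(model, tokenizer)

    # Write the quantized weights and quantization config to save_folder.
    quantizer.save(quantized_model, save_folder)
    return quantized_model
```

Running this requires access to the base model and typically a CUDA-capable GPU; calibration on c4 downloads part of the dataset.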

Usage

To use the quantized model, load it with the load_quantized_model function from the optimum.gptq module. Replace save_folder with the path to the directory where the quantized model is saved.
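A minimal sketch of that loading step, following the pattern in the Optimum documentation, might look like this. The wrapper name `load_gptq_model` and the choice to take the tokenizer from the base model are assumptions made for illustration; `save_folder` is the directory mentioned above.

```python
def load_gptq_model(save_folder, base_model_id="meta-llama/Llama-3.2-1B"):
    """Load GPTQ-quantized weights from save_folder into an empty model (sketch)."""
    # Heavy imports are deferred so this module can be imported
    # even when the GPU stack is not installed.
    import torch
    from accelerate import init_empty_weights
    from optimum.gptq import load_quantized_model
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Build the model architecture without allocating real weights.
    with init_empty_weights():
        empty_model = AutoModelForCausalLM.from_pretrained(
            base_model_id, torch_dtype=torch.float16
        )
    empty_model.tie_weights()

    # Fill the empty model with the quantized weights stored in save_folder.
    model = load_quantized_model(
        empty_model, save_folder=save_folder, device_map="auto"
    )

    # Assumption: the tokenizer is unchanged from the base model.
    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    return model, tokenizer
```

Calling `load_gptq_model("path/to/save_folder")` returns the model and tokenizer, which can then be used with `model.generate` as with any other Transformers causal language model.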

Requirements

  • Python 3.8 or higher
  • PyTorch 2.0 or higher
  • Transformers
  • Optimum
  • Accelerate
  • Bitsandbytes
  • Auto-GPTQ

You can install these dependencies using pip.
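As a quick check that the environment is ready, the sketch below reports which of the listed packages are installed. The PyPI distribution names in `REQUIRED` are assumptions based on the list above, and the helper name `check_requirements` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed PyPI names for the requirements listed above. Install with:
#   pip install torch transformers optimum accelerate bitsandbytes auto-gptq
REQUIRED = [
    "torch",
    "transformers",
    "optimum",
    "accelerate",
    "bitsandbytes",
    "auto-gptq",
]


def check_requirements(packages=REQUIRED):
    """Map each package name to its installed version, or None if missing."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, installed in check_requirements().items():
        print(f"{name}: {installed or 'not installed'}")
```

Any package reported as not installed can be added with pip before loading the model.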

Disclaimer

This quantized model is provided for research and experimentation purposes. While quantization can significantly reduce model size and improve inference speed, it may also result in a slight decrease in accuracy compared to the original model.

Acknowledgements

  • Meta AI for releasing the Llama 3.2-1B model.
  • The authors of the GPTQ quantization method.
  • The Hugging Face team for providing the tools and resources for model sharing and deployment.