---
quantized_by: spedrox-sac
license: mit
pipeline_tag: text-generation
base_model:
  - meta-llama/Llama-3.2-1B
language:
  - en
tags:
  - text-generation
  - text-model
  - quantized_model
---

# Quantized Llama 3.2-1B

This model is quantized from meta-llama/Llama-3.2-1B.

This repository contains a quantized version of the Llama 3.2-1B model, optimized for reduced memory footprint and faster inference.

## Quantization Details

The model has been quantized using GPTQ (Generative Pre-trained Transformer Quantization) with the following parameters:

- Quantization method: GPTQ
- Number of bits: 4
- Dataset used for calibration: c4
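
For reference, a quantization along these lines can be reproduced with optimum's `GPTQQuantizer`. This is a minimal sketch: the bit width and calibration dataset match the parameters above, while `model_seqlen` and the save path are illustrative assumptions, not the exact settings used for this repository.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.gptq import GPTQQuantizer

model_id = "meta-llama/Llama-3.2-1B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

# 4-bit GPTQ with c4 as the calibration set, matching the card above.
# model_seqlen is an assumed value, not necessarily what was used here.
quantizer = GPTQQuantizer(bits=4, dataset="c4", model_seqlen=2048)
quantized_model = quantizer.quantize_model(model, tokenizer)

# Persist the quantized weights so they can be reloaded later.
quantizer.save(quantized_model, "save_folder")
```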

## Usage

To use the quantized model, load it with the `load_quantized_model` function from the `optimum.gptq` library, as sketched below. Make sure to replace `save_folder` with the path to the directory where the quantized model is saved.
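
A minimal sketch following optimum's documented loading pattern. The empty-weights initialization and `device_map="auto"` are assumptions drawn from that pattern, and `save_folder` is the placeholder path mentioned above.

```python
import torch
from accelerate import init_empty_weights
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.gptq import load_quantized_model

save_folder = "save_folder"  # replace with the directory holding the quantized model

# Build an empty (meta-device) copy of the base architecture to receive
# the quantized weights, then load them from disk.
with init_empty_weights():
    empty_model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-3.2-1B", torch_dtype=torch.float16
    )
empty_model.tie_weights()

quantized_model = load_quantized_model(
    empty_model, save_folder=save_folder, device_map="auto"
)

# Quick generation check.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
inputs = tokenizer("The capital of France is", return_tensors="pt").to(
    quantized_model.device
)
print(tokenizer.decode(quantized_model.generate(**inputs, max_new_tokens=20)[0]))
```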

## Requirements

- Python 3.8 or higher
- PyTorch 2.0 or higher
- Transformers
- Optimum
- Accelerate
- bitsandbytes
- AutoGPTQ

You can install these dependencies using pip:
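
```bash
pip install torch transformers optimum accelerate bitsandbytes auto-gptq
```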

## Disclaimer

This quantized model is provided for research and experimentation purposes. While quantization can significantly reduce model size and improve inference speed, it may also result in a slight decrease in accuracy compared to the original model.

## Acknowledgements

- Meta AI for releasing the Llama 3.2-1B model.
- The authors of the GPTQ quantization method.
- The Hugging Face team for providing the tools and resources for model sharing and deployment.