Model Information

Quantized version of EleutherAI/pythia-1b-deduped, with torch.float32 used as the data type during quantization tuning.

  • 4 bits (INT4)
  • group size = 128
  • Symmetric quantization
  • Method: AutoRound (weight-only quantization, WOQ)
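
To make the settings above concrete, here is a minimal sketch of symmetric, group-wise INT4 quantization in plain Python. It shows only the storage scheme (one scale per group of 128 weights, signed 4-bit integers, no zero point) with simple round-to-nearest. It is not AutoRound itself, which additionally tunes the rounding and clipping. The function names are hypothetical and chosen for illustration.

```python
def quantize_group_sym(weights, bits=4):
    """Symmetric round-to-nearest quantization of one group of weights."""
    qmax = 2 ** (bits - 1) - 1       # 7 for INT4
    qmin = -(2 ** (bits - 1))        # -8 for INT4
    max_abs = max(abs(w) for w in weights)
    # One scale per group; fall back to 1.0 for an all-zero group.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    return q, scale


def dequantize_group(q, scale):
    """Recover approximate float weights from integers and the group scale."""
    return [v * scale for v in q]


def quantize_row(row, group_size=128, bits=4):
    """Split a weight row into groups and quantize each group independently."""
    return [
        quantize_group_sym(row[start:start + group_size], bits)
        for start in range(0, len(row), group_size)
    ]


if __name__ == "__main__":
    import random

    random.seed(0)
    row = [random.uniform(-1.0, 1.0) for _ in range(256)]
    groups = quantize_row(row)
    restored = [x for q, s in groups for x in dequantize_group(q, s)]
    worst = max(abs(a - b) for a, b in zip(row, restored))
    print(f"groups: {len(groups)}, worst absolute error: {worst:.4f}")
```

With this scheme the reconstruction error of each weight is at most half of its group's scale, which is why smaller groups (more scales) generally cost more memory but lose less accuracy.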

Fast and low-memory inference: 2-3X speedup, with a slight accuracy drop at W4G128 (4-bit weights, group size 128).
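
As a rough illustration of the memory side of this claim, the sketch below estimates the storage cost of a W4G128 weight matrix. It assumes one FP16 scale per group and ignores zero points, packing overhead, and any layers left unquantized, so the numbers are an approximation rather than a measurement of this model. The 2048 x 2048 matrix is a hypothetical example.

```python
def effective_bits_per_weight(bits=4, group_size=128, scale_bits=16):
    """Bits stored per weight: the quantized value plus its share of the group scale."""
    return bits + scale_bits / group_size


def packed_size_bytes(n_weights, bits=4, group_size=128, scale_bits=16):
    """Approximate storage for n_weights quantized weights, in bytes."""
    return n_weights * effective_bits_per_weight(bits, group_size, scale_bits) / 8


if __name__ == "__main__":
    n_weights = 2048 * 2048          # hypothetical square weight matrix
    fp32_bytes = n_weights * 4       # 32 bits per weight
    int4_bytes = packed_size_bytes(n_weights)
    print(f"effective bits per weight: {effective_bits_per_weight()}")
    print(f"FP32: {fp32_bytes} bytes, W4G128: {int4_bytes:.0f} bytes")
    print(f"reduction: {fp32_bytes / int4_bytes:.2f}x")
```

Under these assumptions each weight costs 4.125 bits instead of 32. The inference speedup quoted above is a separate matter that depends on the CPU kernels used.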

Quantization framework: Intel AutoRound

Note: this INT4 version of pythia-1b-deduped has been quantized to run inference on CPU.

Replication Recipe

Step 1 Install Requirements

I suggest installing the requirements into a dedicated Python virtualenv or a conda environment.

python -m pip install <package> --upgrade
  • accelerate==1.0.1
  • auto_gptq==0.7.1
  • neural_compressor==3.1
  • torch==2.3.0+cpu
  • torchaudio==2.5.0+cpu
  • torchvision==0.18.0+cpu
  • transformers==4.45.2
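
After installing, it can be handy to confirm that the environment matches the pins above. The following is a small sketch using only the standard library; the helper name check_pins is hypothetical, and the pinned versions are copied from the list above.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the requirements list above.
PINNED = {
    "accelerate": "1.0.1",
    "auto_gptq": "0.7.1",
    "neural_compressor": "3.1",
    "torch": "2.3.0+cpu",
    "torchaudio": "2.5.0+cpu",
    "torchvision": "0.18.0+cpu",
    "transformers": "4.45.2",
}


def check_pins(pins):
    """Return {package: (wanted, found, matches)}; found is None if not installed."""
    report = {}
    for name, wanted in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None
        report[name] = (wanted, found, found == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, found, ok) in check_pins(PINNED).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: wanted {wanted}, found {found} [{status}]")
```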

Step 2 Build and install Intel AutoRound from source

python -m pip install git+https://github.com/intel/auto-round.git

Step 3 Script for Quantization

  from auto_round import AutoRound
  from transformers import AutoModelForCausalLM, AutoTokenizer

  # Load the full-precision base model and its tokenizer
  model_name = "EleutherAI/pythia-1b-deduped"
  model = AutoModelForCausalLM.from_pretrained(model_name)
  tokenizer = AutoTokenizer.from_pretrained(model_name)

  # INT4 weights, group size 128, symmetric quantization
  bits, group_size, sym = 4, 128, True
  autoround = AutoRound(
      model, tokenizer, nsamples=128, iters=200, seqlen=512, batch_size=4,
      bits=bits, group_size=group_size, sym=sym,
  )
  autoround.quantize()

  # Save the quantized model in the auto_round format
  output_dir = "./AutoRound/EleutherAI_pythia-1b-deduped-autoround-int4-gs128-sym"
  autoround.save_quantized(output_dir, format='auto_round', inplace=True)

License

Apache 2.0 License

Disclaimer

This quantized model comes with no warranty. It has been developed only for research purposes.
