---
license: mit
datasets:
  - wikitext
---

pythia-6.9b, quantized to 4-bit with a group size of 128 using AutoGPTQ.

To use, first install AutoGPTQ:

```sh
pip install auto-gptq
```

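Then load the model and tokenizer from the Hugging Face Hub (this example assumes a CUDA GPU):

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name = "smpanaro/pythia-6.9b-AutoGPTQ-4bit-128g"
# Load the tokenizer from the original, unquantized model.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-6.9b")
model = AutoGPTQForCausalLM.from_quantized(model_name, device="cuda:0")

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```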
| Model | 4-bit Perplexity | 16-bit Perplexity | Delta (4-bit minus 16-bit) |
|---|--:|--:|--:|
| smpanaro/pythia-70m-AutoGPTQ-4bit-128g | 49.125 | - | - |
| smpanaro/pythia-160m-AutoGPTQ-4bit-128g | 33.4375 | 23.3024 | 10.1351 |
| smpanaro/pythia-410m-AutoGPTQ-4bit-128g | 21.4688 | 13.9838 | 7.485 |
| smpanaro/pythia-1b-AutoGPTQ-4bit-128g | 12.0391 | 11.6178 | 0.4213 |
| smpanaro/pythia-1.4b-AutoGPTQ-4bit-128g | 10.9609 | 10.4391 | 0.5218 |
| smpanaro/pythia-2.8b-AutoGPTQ-4bit-128g | 9.8281 | 9.0028 | 0.8253 |
| smpanaro/pythia-6.9b-AutoGPTQ-4bit-128g | 8.5078 | 8.2257 | 0.2821 |

WikiText perplexity, measured as described in the Hugging Face docs. Lower is better.
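Assuming "the Hugging Face docs" refers to the Transformers guide *Perplexity of fixed-length models*, that method slides a fixed-size context window over the tokenized test set with a stride, scores only the tokens that no earlier window has scored, and reports the exponential of the average negative log-likelihood. The sketch below reproduces just that bookkeeping in plain Python. `window_schedule` and `perplexity` are hypothetical helpers; in practice the per-window mean losses would come from the model (e.g. `outputs.loss` in Transformers). The window size and stride behind the numbers above are not stated in this card.

```python
import math
from typing import List, Tuple


def window_schedule(seq_len: int, max_length: int, stride: int) -> List[Tuple[int, int, int]]:
    """Return (begin, end, trg_len) for each strided evaluation window.

    Each window feeds tokens [begin, end) to the model, but only the last
    trg_len of them are used as targets, so every token is assigned to
    exactly one window. Assumes stride <= max_length.
    """
    windows = []
    prev_end = 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_length, seq_len)
        trg_len = end - prev_end  # tokens not already scored by an earlier window
        windows.append((begin, end, trg_len))
        prev_end = end
        if end == seq_len:
            break
    return windows


def perplexity(window_losses: List[Tuple[float, int]]) -> float:
    """exp of the token-weighted mean NLL, given (mean_loss, n_tokens) per window."""
    total_nll = sum(loss * n for loss, n in window_losses)
    total_tokens = sum(n for _, n in window_losses)
    return math.exp(total_nll / total_tokens)


if __name__ == "__main__":
    # Hypothetical sizes: 10 tokens, 4-token context window, stride 2.
    print(window_schedule(10, 4, 2))  # [(0, 4, 4), (2, 6, 2), (4, 8, 2), (6, 10, 2)]
    # A model that assigns probability 1/8 to every token has perplexity 8.
    print(perplexity([(math.log(8), 4), (math.log(8), 2)]))
```

A smaller stride gives each scored token more left context, which usually lowers the measured perplexity, at the cost of more forward passes.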