---
license: mit
datasets:
- wikitext
---
pythia-6.9b quantized to 4 bits with AutoGPTQ, using a group size of 128 (the `4bit-128g` suffix in the model name).
To use, first install AutoGPTQ:
pip install auto-gptq
Then load the model from the hub:
from auto_gptq import AutoGPTQForCausalLM
model_name = "smpanaro/pythia-6.9b-AutoGPTQ-4bit-128g"
model = AutoGPTQForCausalLM.from_quantized(model_name)
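Once loaded, the model can be used for generation like any causal LM. The following is a minimal sketch, not a tested recipe for this checkpoint. It assumes a CUDA GPU at `cuda:0` and that the tokenizer of the base model `EleutherAI/pythia-6.9b` is the right one to use (quantization changes the weights, not the vocabulary). The helper name `generate_text` and its defaults are hypothetical.

```python
MODEL_NAME = "smpanaro/pythia-6.9b-AutoGPTQ-4bit-128g"
# Assumption: the base model's tokenizer is used, since quantization
# changes the weights only, not the vocabulary.
TOKENIZER_NAME = "EleutherAI/pythia-6.9b"


def generate_text(prompt, max_new_tokens=32, device="cuda:0"):
    """Hypothetical helper: load the quantized model and complete `prompt`."""
    # Imported inside the function so the heavy dependencies are only
    # needed when text is actually generated.
    from transformers import AutoTokenizer
    from auto_gptq import AutoGPTQForCausalLM

    tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_NAME)
    model = AutoGPTQForCausalLM.from_quantized(MODEL_NAME, device=device)

    # Tokenize the prompt and move the tensors to the model's device.
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_text("The capital of France is")` downloads the weights on first use and reloads the model on every call, so for repeated generation it is better to load the tokenizer and model once and reuse them.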
| Model | 4-Bit Perplexity | 16-Bit Perplexity | Delta |
|---|---|---|---|
| smpanaro/pythia-70m-AutoGPTQ-4bit-128g | 49.125 | - | - |
| smpanaro/pythia-160m-AutoGPTQ-4bit-128g | 33.4375 | 23.3024 | 10.1351 |
| smpanaro/pythia-410m-AutoGPTQ-4bit-128g | 21.4688 | 13.9838 | 7.485 |
| smpanaro/pythia-1b-AutoGPTQ-4bit-128g | 12.0391 | 11.6178 | 0.4213 |
| smpanaro/pythia-1.4b-AutoGPTQ-4bit-128g | 10.9609 | 10.4391 | 0.5218 |
| smpanaro/pythia-2.8b-AutoGPTQ-4bit-128g | 9.8281 | 9.0028 | 0.8253 |
| smpanaro/pythia-6.9b-AutoGPTQ-4bit-128g | 8.5078 | 8.2257 | 0.2821 |
Wikitext perplexity, measured as described in the Hugging Face docs; lower is better. Delta is the 4-bit perplexity minus the 16-bit perplexity.
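For reference, perplexity is the exponential of the average negative log-likelihood per token. The sketch below shows that formula on made-up log-probabilities and recomputes the Delta column from the table values. It is plain Python for illustration only, not the evaluation script behind these numbers. The helper names `perplexity` and `delta` are hypothetical.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities."""
    # exp of the mean negative log-likelihood over all tokens.
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def delta(four_bit, sixteen_bit):
    """Perplexity increase caused by quantization, rounded to 4 decimals."""
    return round(four_bit - sixteen_bit, 4)


# (4-bit, 16-bit) perplexities copied from the table above.
RESULTS = {
    "pythia-160m": (33.4375, 23.3024),
    "pythia-410m": (21.4688, 13.9838),
    "pythia-1b": (12.0391, 11.6178),
    "pythia-1.4b": (10.9609, 10.4391),
    "pythia-2.8b": (9.8281, 9.0028),
    "pythia-6.9b": (8.5078, 8.2257),
}

for name, (four_bit, sixteen_bit) in RESULTS.items():
    d = delta(four_bit, sixteen_bit)
    # Relative increase puts models of different sizes on the same scale.
    print(f"{name}: delta={d:.4f} ({100 * d / sixteen_bit:.1f}% increase)")
```

A model that assigns probability 0.25 to every token has a perplexity of 4, which is a quick sanity check for the `perplexity` helper.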