Performance Drop due to quantization?
#34 by Teja-Gollapudi - opened
Hi,
Are there any benchmark comparisons for the quantized model vs. the full model?
I want to gauge the performance drop introduced by quantization.
Thank you!
Teja-Gollapudi changed discussion title from "Benchmark comparison" to "Performance Drop due to quantization?"
Did you manage to find any comparison?
Never got around to doing it.
4-bit quantization is roughly 95 percent as accurate as the full-precision model.
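If you want to check a figure like that yourself, here is a minimal sketch of the arithmetic, assuming you already have a benchmark score (or per-token negative log-likelihoods) for both models. The function names and the numbers are hypothetical placeholders, not measurements from this model.

```python
import math


def perplexity(token_nlls):
    """Perplexity from per-token negative log-likelihoods (natural log)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def retained_fraction(full_score, quant_score):
    """Fraction of the full-precision score that the quantized model keeps.

    Only meaningful for higher-is-better metrics such as accuracy.
    """
    return quant_score / full_score


# Hypothetical accuracy numbers, for illustration only.
full_acc = 0.60
quant_acc = 0.57
print(f"retained accuracy: {retained_fraction(full_acc, quant_acc):.1%}")

# Hypothetical per-token NLLs; lower perplexity is better,
# so here you would compare the increase rather than a retained fraction.
full_ppl = perplexity([1.9, 2.1, 2.0])
quant_ppl = perplexity([2.0, 2.2, 2.1])
print(f"perplexity increase: {quant_ppl / full_ppl - 1:.1%}")
```

Run both models on the same evaluation set with the same prompts, otherwise the ratio does not tell you much.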
I found a couple of subreddits discussing that topic: