This is a 4-bit quant of https://huggingface.co/Aeala/GPT4-x-AlpacaDente2-30b
My secret sauce:
- Using commit 3c16fd9 of 0cc4m's GPTQ fork
- Using PTB as the calibration dataset
- Act-order, true-sequential, percdamp 0.1 (the GPTQ default percdamp is 0.01)
- No groupsize
- Runs with CUDA; Triton is not required.
- Quantization was done on a Google Colab runtime with the 'Premium GPU' and 'High Memory' options.
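The settings above correspond roughly to a GPTQ-for-LLaMa-style invocation. This is a hedged sketch, not the exact command used: the script name, model path, and output filename are assumptions, and only the flags named in the list above are taken from this card.

```sh
# Reconstructed command shape (GPTQ-for-LLaMa-style CLI).
# Script name, model id, and output filename are assumptions.
python llama.py Aeala/GPT4-x-AlpacaDente2-30b ptb \
    --wbits 4 \
    --act-order \
    --true-sequential \
    --percdamp 0.1 \
    --save_safetensors gpt4-x-alpacadente2-30b-4bit.safetensors
```

Note that no `--groupsize` flag is passed, matching the "no groupsize" setting above.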
Benchmark results (perplexity; lower is better)
Model | C4 | WikiText2 | PTB |
---|---|---|---|
This Quant | 7.326207160949707 | 4.957101345062256 | 24.941526412963867 |
Aeala's Quant here | x.xxxxxx | x.xxxxxx | x.xxxxxx |
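For context on how to read the table: each number is a perplexity score, i.e. the exponential of the model's mean per-token negative log-likelihood over the named evaluation set. A minimal sketch of that relationship (the loss values here are made up for illustration, not taken from this run):

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token, in nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# Hypothetical per-token cross-entropy losses; real values come from
# running the quantized model over C4, WikiText2, or PTB.
losses = [2.1, 1.8, 2.4, 2.0]
print(perplexity(losses))
```

A lower perplexity means the model assigns higher probability to the held-out text, so smaller numbers in every column are better.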