askmyteapot/GPT4-x-AlpacaDente2-30b-4bit

	## This is a 4bit quant of https://huggingface.co/Aeala/GPT4-x-AlpacaDente2-30b



	# My secret sauce:
	* Using commit <a href="https://github.com/0cc4m/GPTQ-for-LLaMa/tree/3c16fd9c7946ebe85df8d951cb742adbc1966ec7">3c16fd9</a> of 0cc4m's GPTQ fork
	* Using PTB as the calibration dataset
	* Act-order, True-sequential, percdamp 0.1
	(<i>the default percdamp is 0.01</i>)
	* No groupsize
	* Will run with CUDA, does not need triton.
	* Quant completed on a Google Colab instance with the 'Premium GPU' and 'High Memory' options.
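
For context on the `percdamp` setting above: in GPTQ implementations this value is typically a fraction of the mean Hessian diagonal that is added back to the diagonal before the matrix is inverted, which keeps the inversion numerically stable. The snippet below is only a minimal pure-Python sketch of that one step. The `dampen_hessian` name and the toy 2x2 matrix are illustrative assumptions, not code from 0cc4m's fork.

```python
def dampen_hessian(hessian, percdamp=0.01):
    """Return a copy of a square matrix with damping added to its diagonal.

    The damping term is percdamp times the mean of the diagonal entries.
    Hypothetical helper for illustration only.
    """
    n = len(hessian)
    mean_diag = sum(hessian[i][i] for i in range(n)) / n
    damp = percdamp * mean_diag
    return [
        [hessian[i][j] + (damp if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    toy = [[2.0, 1.0], [1.0, 4.0]]  # made-up values, not from a real model
    print("default  (0.01):", dampen_hessian(toy, percdamp=0.01))
    print("this quant (0.1):", dampen_hessian(toy, percdamp=0.1))
```

With percdamp 0.1 the added term is ten times larger than with the default 0.01, so the diagonal is regularised more strongly. Off-diagonal entries are left unchanged.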

	## Benchmark results

	|<b>Model</b>|<b>C4</b>|<b>WikiText2</b>|<b>PTB</b>|
	|:---:|---|---|---|
	|Aeala's FP16|7.05504846572876|4.662261962890625|24.547462463378906|
	|This Quant|7.326207160949707|4.957101345062256|24.941526412963867|
	|Aeala's Quant <a href="https://huggingface.co/Aeala/GPT4-x-AlpacaDente2-30b/resolve/main/4bit.safetensors">here</a>|7.332120418548584|5.016242980957031|25.576189041137695|
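
To make the table easier to compare, here is a small sketch that computes how far each quant sits above the FP16 baseline on each dataset. It assumes the figures are perplexity-style scores where lower is better, which this README does not state explicitly. The numbers are copied from the table; the `relative_increase` helper is just an illustrative name.

```python
# Scores copied verbatim from the benchmark table.
RESULTS = {
    "Aeala's FP16": {
        "C4": 7.05504846572876,
        "WikiText2": 4.662261962890625,
        "PTB": 24.547462463378906,
    },
    "This Quant": {
        "C4": 7.326207160949707,
        "WikiText2": 4.957101345062256,
        "PTB": 24.941526412963867,
    },
    "Aeala's Quant": {
        "C4": 7.332120418548584,
        "WikiText2": 5.016242980957031,
        "PTB": 25.576189041137695,
    },
}
BASELINE = "Aeala's FP16"
DATASETS = ("C4", "WikiText2", "PTB")


def relative_increase(model, dataset, baseline=BASELINE):
    """Fractional increase of a model's score over the baseline score."""
    base = RESULTS[baseline][dataset]
    return (RESULTS[model][dataset] - base) / base


if __name__ == "__main__":
    for model in RESULTS:
        if model == BASELINE:
            continue
        for dataset in DATASETS:
            delta = relative_increase(model, dataset)
            print(f"{model:14s} {dataset:10s} +{delta:.2%}")
```

Under that lower-is-better assumption, this quant stays closer to the FP16 scores than Aeala's quant on all three datasets.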