---
license: llama2
---

Another EXL2 quantization of AlpinDale's https://huggingface.co/alpindale/goliath-120b, this one at 2.64 bits per weight (BPW), made with exllamav2's new experimental quantization method.

Pippa Llama 2 Chat was used as the calibration dataset.

Can be run on two RTX 3090s with 24 GB of VRAM each.
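
For reference, a minimal loading sketch with exllamav2's Python API and a manual GPU split is shown below. The model path is a placeholder, and the `[18, 24]` split mirrors the figures in this card; adjust both to your own setup.

```python
# Sketch: load the EXL2 weights across two GPUs with exllamav2.
# Assumes exllamav2 is installed and the weights are downloaded locally;
# model_dir is a placeholder path.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/models/goliath-120b-exl2"  # placeholder path
config.prepare()
config.max_seq_len = 4096

model = ExLlamaV2(config)
model.load(gpu_split=[18, 24])  # GB to allocate per GPU, as in the split below

tokenizer = ExLlamaV2Tokenizer(config)
cache = ExLlamaV2Cache(model)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8

print(generator.generate_simple("Once upon a time", settings, num_tokens=128))
```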

These figures were measured on Windows, which adds some overhead, so they should be close enough for estimating your own usage.

```yaml
2.64BPW @ 4096 ctx (empty context):
GPU Split: 18/24
GPU1: 19.8/24 GB
GPU2: 21.9/24 GB
Speed: ~10 tk/s
```
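
As a sanity check on the numbers above: the quantized weights alone take roughly parameter count × BPW / 8 bytes, and the gap between that and the ~41.7 GB total reported is the KV cache plus framework and OS overhead. A quick estimate, assuming the nominal 120B parameter count (the exact count differs slightly):

```python
def weight_footprint_gib(num_params: float, bpw: float) -> float:
    """Approximate VRAM taken by the quantized weights alone, in GiB."""
    return num_params * bpw / 8 / 1024**3

# Nominal 120B parameters at 2.64 bits per weight
print(f"{weight_footprint_gib(120e9, 2.64):.1f} GiB")  # ~36.9 GiB for weights
```

That leaves a few GB of the reported 19.8 + 21.9 GB usage for the 4096-token cache and overhead, which is consistent with the figures above.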