---
license: llama2
language:
- en
pipeline_tag: conversational
---

Another EXL2 quantization of AlpinDale's [goliath-120b](https://huggingface.co/alpindale/goliath-120b), this one at 2.37BPW.

A [2.64BPW version](https://huggingface.co/LavaPlanet/Goliath120B-exl2-2.64bpw) is also available.

PIPPA llama2 Chat was used as the calibration dataset.

Can be run on two RTX 3090s with 24GB VRAM each. Allowing for Windows overhead, the following figures should be close enough to estimate your own usage:

```yaml
2.37BPW @ 4096 ctx:
  GPU split: 16/24
  empty ctx:
    GPU1: 17.4/24 GB
    GPU2: 19.5/24 GB
    speed: ~11 tk/s
  3000+ ctx:
    speed: ~8-12 tk/s
```
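As a rough sanity check on the hardware requirement, the weight footprint at 2.37 bits per weight can be estimated from the parameter count. This is a minimal sketch assuming ~120B parameters; the KV cache, activations, and per-GPU overhead all add on top, which is why the measured usage above exceeds the bare weight size:

```python
# Rough estimate of the quantized weight footprint.
# Ignores KV cache, activations, and framework overhead.
PARAMS = 120e9  # approximate parameter count of goliath-120b (assumption)
BPW = 2.37      # bits per weight for this quantization

weight_bytes = PARAMS * BPW / 8
weight_gib = weight_bytes / 2**30
print(f"~{weight_gib:.1f} GiB of weights")  # ≈ 33.1 GiB
```

At roughly 33 GiB of weights alone, the model cannot fit on a single 24GB card, but splits comfortably across two 3090s with headroom left for the context cache.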