
Add EXL2, INT8, and/or INT4 version of the model, PLEASE!

#21
by Abdelhak - opened

The model is too big to run for people with less than 24 GB of VRAM. Please make a quantized version of it.

Abdelhak changed discussion title from Add am EXL2, INT8, and/or INT4 of the model, PLEASE! to Add EXL2, INT8, and/or INT4 version of the model, PLEASE!

It is taking 60 GB of RAM for me and around 15 minutes to process each prompt when running on CPU. We really need a quantized version.
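Until an official quantized checkpoint lands, one possible stopgap is on-the-fly 4-bit quantization with bitsandbytes. This is a minimal sketch, untested on this model: `MODEL_ID` is a placeholder for this repo's id, and the `Auto*` classes assume the model loads through standard transformers APIs (a vision model may need its architecture-specific class):

```python
import torch
from transformers import AutoModelForCausalLM, AutoProcessor, BitsAndBytesConfig

MODEL_ID = "org/model"  # placeholder; substitute this repo's model id

# Quantize weights to 4-bit NF4 at load time; compute still runs in fp16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(  # may need the model-specific class
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",  # spread layers across available GPUs
)
processor = AutoProcessor.from_pretrained(MODEL_ID)
```

Weights in 4-bit take roughly a quarter of the fp16 footprint, which may be enough to fit under 24 GB, though whether the vision tower tolerates 4-bit quantization is an open question.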

exllamav2 doesn't support vision models, FWIW.
