Add EXL2, INT8, and/or INT4 version of the model, PLEASE!
#21 opened by Abdelhak
The model is too big to run for people with less than 24GB of memory. Please make a quantized version of it.
Abdelhak changed discussion title from "Add am EXL2, INT8, and/or INT4 of the model, PLEASE!" to "Add EXL2, INT8, and/or INT4 version of the model, PLEASE!"
It is taking 60GB of RAM for me, and around 15 minutes to process each prompt running on CPU. We really need a quantized version.
There is an NF4 version linked in this comment:
https://huggingface.co/mistralai/Pixtral-12B-2409/discussions/21#66f347780dc1833d4e484073
exllamav2 doesn't support vision models, FWIW.