Quant method
#1 by gcapnias - opened
Just a question,
Why only 3-bit and 5-bit quantized models? Usually, models start with 4-bit quantization.
I was looking to run the model under Ollama, and the 4-bit models are usually the ones used because they are lightweight.
George J.
We chose to share the Q5_K_M model because it provides better performance with a "small" difference in memory requirements, as well as the Q3 version, which is of lower quality but can run on lower-end GPUs.
If you are interested in a 4-bit version of the model, you can find an AWQ one here: https://huggingface.co/ilsp/Meltemi-7B-Instruct-v1-AWQ.
For Ollama, we have uploaded a 4-bit version here: https://ollama.com/ilsp/meltemi-instruct:q4.1
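In case it helps, here is a minimal sketch of how that 4-bit tag could be used from Python through the `ollama` client library; it assumes the `ollama` package is installed and an Ollama server is running locally, and the example prompt is purely illustrative:

```python
# Minimal sketch: chat with the 4-bit Meltemi quant through a local Ollama server.
# Assumes `pip install ollama` and a running Ollama daemon on the default port.
import ollama

MODEL = "ilsp/meltemi-instruct:q4.1"  # the 4-bit tag linked above

# Download the model if it is not already present locally.
ollama.pull(MODEL)

# Send a single chat turn and print the reply.
response = ollama.chat(
    model=MODEL,
    messages=[{"role": "user", "content": "Γεια σου! Who are you?"}],
)
print(response["message"]["content"])
```

The same tag can of course be used directly from the command line with `ollama run`.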
Great,
Thanks a lot!
soksof changed discussion status to closed