Quant method
#1 by gcapnias - opened
Just a question,
Why only 3-bit and 5-bit quantized models? Usually, models start with 4-bit quantization.
I was looking to run the model under Ollama, and the 4-bit models are usually the ones used because they are lightweight.
George J.
We chose to share the Q5_K_M model because it provides better performance with a "small" difference in memory requirements, as well as the Q3 version, which is of lower quality but can run on lower-end GPUs.
If you are interested in a 4-bit version of the model, you can find an AWQ one here: https://huggingface.co/ilsp/Meltemi-7B-Instruct-v1-AWQ.
For Ollama, we have uploaded a 4-bit version here: https://ollama.com/ilsp/meltemi-instruct:q4.1
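In case it helps, here is a minimal sketch of how that 4-bit tag could be used from Python through the `ollama` client library; it assumes the `ollama` package is installed and an Ollama server is running locally, and the example prompt is purely illustrative:

```python
# Minimal sketch: chat with the 4-bit Meltemi quant through a local Ollama server.
# Assumes `pip install ollama` and a running Ollama daemon on the default port.
import ollama

MODEL = "ilsp/meltemi-instruct:q4.1"  # the 4-bit tag linked above

# Download the model if it is not already present locally.
ollama.pull(MODEL)

# Send a single chat turn and print the reply.
response = ollama.chat(
    model=MODEL,
    messages=[{"role": "user", "content": "Γεια σου! Who are you?"}],
)
print(response["message"]["content"])
```

The same tag can of course be used directly from the command line with `ollama run`.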
Great,
Thanks a lot!
soksof changed discussion status to closed