GGML Quantize

#1
by thewh1teagle - opened

It would be great if you could offer quantized versions for the model.
Many users don't have enough GPU VRAM, so the model is too heavy and also runs slowly for them.
You can quantize the model just like the author of whisper.cpp does for every new release.
For instance, see this page and note the models with a q suffix.
https://huggingface.co/ggerganov/whisper.cpp/tree/main

See https://github.com/thewh1teagle/vibe/blob/main/docs/MODELS.md#prepare-your-own-models
and https://github.com/ggerganov/whisper.cpp?tab=readme-ov-file#quantization
for how to quantize.
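For reference, the workflow is roughly the following (a sketch based on the whisper.cpp README; the build steps, binary name, and model paths may differ between whisper.cpp versions, and `base.en` is just an example model):

```shell
# Clone and build whisper.cpp, including the quantize tool
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
make quantize

# Download a full-precision GGML model (example: base.en)
./models/download-ggml-model.sh base.en

# Quantize it, e.g. with the q5_0 method
./quantize models/ggml-base.en.bin models/ggml-base.en-q5_0.bin q5_0
```

The resulting `-q5_0` file is much smaller and faster to run, at a small accuracy cost; other methods like q4_0 or q8_0 trade size against quality differently.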
