Token Speeds for Q5_K_M?
#3, opened by Dreifort
I am trying out the ~22GB Q5_K_M model on a system with an RTX 3060 (12GB VRAM). What generation speed should I expect from a Q5 model that is roughly twice the size of my VRAM? I currently get 0.8 t/s.
Are there any ways to improve the speed without buying a better GPU?
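For a rough sense of scale, here is a back-of-the-envelope sketch. It assumes that token generation is memory-bandwidth bound, meaning each token reads all the weights once. It also assumes that about 10GB of the 12GB VRAM is usable for weights, with the rest going to context. The 360GB/s figure is the RTX 3060 12GB's spec bandwidth. The ~40GB/s for system RAM is a guess for dual-channel DDR4, and the 60-layer count is purely hypothetical. The sketch ignores compute, PCIe transfers and KV-cache reads, so treat its result as an optimistic upper bound, not a prediction.

```python
# Back-of-the-envelope estimate for a model split between GPU VRAM and system RAM.
# Assumption: generation is memory-bandwidth bound, so each token needs roughly
# one full read of the weights from wherever they live.

def estimate_tokens_per_second(model_gb, vram_budget_gb, gpu_bw_gbs, cpu_bw_gbs):
    """Optimistic tokens/s for a partially offloaded model (ignores compute and PCIe)."""
    gpu_gb = min(model_gb, vram_budget_gb)  # weights that fit in VRAM
    cpu_gb = model_gb - gpu_gb              # remainder is read from system RAM
    seconds_per_token = gpu_gb / gpu_bw_gbs + cpu_gb / cpu_bw_gbs
    return 1.0 / seconds_per_token


def layers_that_fit(model_gb, n_layers, vram_budget_gb):
    """How many equally sized layers fit in the VRAM budget (hypothetical even split)."""
    per_layer_gb = model_gb / n_layers
    return min(n_layers, int(vram_budget_gb // per_layer_gb))


if __name__ == "__main__":
    # Assumed numbers: 22 GB model, ~10 GB of VRAM for weights,
    # 360 GB/s GPU bandwidth, ~40 GB/s effective system RAM bandwidth,
    # and a hypothetical 60-layer model.
    tps = estimate_tokens_per_second(22, 10, 360, 40)
    print(f"optimistic estimate: {tps:.2f} t/s")
    print("layers that fit on the GPU:", layers_that_fit(22, 60, 10))
```

The RAM-resident part dominates here: 12GB read at ~40GB/s costs far more per token than 10GB at 360GB/s. So the share of layers kept on the GPU (for example via llama.cpp's `--n-gpu-layers`, if that is the runtime in use) and the system RAM bandwidth matter most.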
Thanks!