Token Speeds for Q5_K_M?

by Dreifort - opened May 3

I am trying out the ~22GB Q5_K_M model on a system with an RTX 3060 (12GB VRAM). What sort of speeds should I expect from a Q5 model that is roughly 2x my VRAM size? I currently get 0.8 t/s.

Also, any suggestions for improving the speed (without getting a better GPU)?
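
For context, here is a rough back-of-the-envelope sketch of the size mismatch. It only estimates what share of a ~22GB model could sit in 12GB of VRAM and how many layers that might correspond to. The 1.5GB reserved for context and overhead and the 60-layer count are placeholder assumptions for illustration, not measured values for this model, and the function names are hypothetical.

```python
# Rough estimate of how much of a quantized model fits in VRAM.
# All defaults are illustrative assumptions, not measurements.

def vram_fit_fraction(model_gb: float, vram_gb: float, reserved_gb: float = 1.5) -> float:
    """Fraction of the model weights that could sit in VRAM.

    reserved_gb is an assumed allowance for context/KV cache and overhead.
    """
    usable_gb = max(vram_gb - reserved_gb, 0.0)
    return min(usable_gb / model_gb, 1.0)


def layers_to_offload(n_layers: int, model_gb: float, vram_gb: float,
                      reserved_gb: float = 1.5) -> int:
    """Approximate number of layers that fit on the GPU, assuming equal-sized layers."""
    return int(n_layers * vram_fit_fraction(model_gb, vram_gb, reserved_gb))


if __name__ == "__main__":
    # 22 GB model, 12 GB card; 60 layers is a hypothetical layer count.
    fraction = vram_fit_fraction(22.0, 12.0)
    print(f"Share of weights that fits in VRAM: {fraction:.0%}")
    print(f"Layers to try offloading: {layers_to_offload(60, 22.0, 12.0)}")
```

Under these assumptions a bit under half of the weights would fit on the GPU, with the rest running from system RAM, so the real question is how much that split costs in t/s.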

Thanks!
