Which is better: IQ2_M or Q2_K?
#1 by Autumnlight - opened
I can only run those two, which one should I use?
that's a good chart to reference ^
if you can run both and all other things are equal, use Q2_K
Out of curiosity, is inference also slower if I had, let's say, 4x 3090 vs 2x 3090? Are higher quants slower?
Higher quants will be slower because they require more data to be moved through memory for every generated token
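To put rough numbers on the memory point, here is a back-of-the-envelope sketch. It assumes generation is purely memory-bandwidth bound, so every weight is read once per token, and it ignores the KV cache, compute, and multi-GPU overhead. The parameter count, bits-per-weight values, and bandwidth figure are illustrative placeholders, not measurements of any particular quant or card.

```python
# Rough upper bound on generation speed when decoding is limited by
# memory bandwidth: each token requires reading all weights once.
# All constants below are illustrative assumptions, not benchmarks.

def model_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in bytes."""
    return n_params * bits_per_weight / 8


def max_tokens_per_second(n_params: float, bits_per_weight: float,
                          bandwidth_bytes_per_s: float) -> float:
    """Bandwidth-bound ceiling: bytes per second / bytes read per token."""
    return bandwidth_bytes_per_s / model_bytes(n_params, bits_per_weight)


N_PARAMS = 70e9     # hypothetical 70B-parameter model
BANDWIDTH = 900e9   # hypothetical ~900 GB/s of GPU memory bandwidth

for bpw in (2.5, 4.5, 8.0):  # placeholder bits-per-weight values
    size_gb = model_bytes(N_PARAMS, bpw) / 1e9
    tps = max_tokens_per_second(N_PARAMS, bpw, BANDWIDTH)
    print(f"{bpw:.1f} bpw: ~{size_gb:.1f} GB, at most ~{tps:.1f} tok/s")
```

Real speeds will land below this ceiling, but the trend is the point: more bits per weight means more bytes to read per token, so fewer tokens per second on the same hardware.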
llama.cpp doesn't do amazing splitting across cards, but 4090s will be a good bit faster than 3090s
I see, thank you for the info. I think I'll stick with the 3090 though, since the price difference is like 650 vs 1500