Which is better: IQ2_M or Q2_K?

by Autumnlight - opened 5 days ago

5 days ago

I can only run those two. Which one should I use?

luzamu

5 days ago

Owner 4 days ago

That's a good chart to reference (see above).

If you can run both and all other things are equal, use Q2_K.

2 days ago

Out of curiosity, is inference also slower if I had, let's say, four 3090s vs. two 3090s? And are higher quants slower?

Owner 1 day ago

Higher quants will be slower because they require more data to be moved through memory for every generated token.
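A rough way to see why: if single-stream generation is limited by memory bandwidth, each new token needs (roughly) every weight read once, so the speed ceiling is bandwidth divided by model size. The sketch below is a back-of-the-envelope estimate only; the 70B parameter count, the ~900 GB/s bandwidth and the bits-per-weight values are hypothetical illustration inputs, not measured figures for IQ2_M, Q2_K or any specific card.

```python
def max_tokens_per_second(params_billion: float, bits_per_weight: float,
                          bandwidth_gb_per_s: float) -> float:
    """Rough ceiling on single-stream generation speed.

    Assumes every weight is read from VRAM once per generated token and that
    memory bandwidth is the only bottleneck (ignores compute, KV cache, overhead).
    """
    model_bytes = params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gb_per_s * 1e9 / model_bytes


# Hypothetical inputs for illustration only: a 70B-parameter model and ~900 GB/s
# of memory bandwidth, at a few illustrative bits-per-weight levels.
for label, bpw in [("~2-bit quant", 2.7), ("~4-bit quant", 4.5), ("~8-bit quant", 8.5)]:
    print(f"{label}: at most {max_tokens_per_second(70, bpw, 900):.1f} tokens/s")
```

Same model, same hardware: the bigger the quant, the more bytes per token, and the lower the ceiling.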

llama.cpp doesn't split models across cards especially well, but 4090s will be a good bit faster than 3090s.
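On the four-vs-two 3090s question: with a layer-wise split (which, as far as I know, is llama.cpp's default `--split-mode layer`), each GPU holds a slice of the layers and they run one after another for a single request. Under that assumption, adding more identical cards lets you fit a bigger model but doesn't raise the per-token speed ceiling. The sketch models only that idea; the 40 GB model size and the bandwidth numbers are hypothetical, and inter-GPU transfer overhead is ignored.

```python
def layer_split_time_per_token(model_gb: float, bandwidths_gb_per_s: list[float]) -> float:
    """Seconds per token when layers are split evenly across GPUs run in sequence.

    Assumes a single request, an even split of the weights, and that each GPU is
    limited only by reading its own share of the weights (transfer overhead ignored).
    """
    share_gb = model_gb / len(bandwidths_gb_per_s)
    # GPUs work sequentially: GPU k starts only after GPU k-1 hands over its activations.
    return sum(share_gb / bw for bw in bandwidths_gb_per_s)


# Hypothetical 40 GB model on identical cards with ~900 GB/s each.
for n in (2, 4):
    t = layer_split_time_per_token(40, [900] * n)
    print(f"{n} GPUs: ~{1 / t:.1f} tokens/s ceiling")
```

Both configurations print the same ceiling; only faster individual cards (higher per-card throughput) move it.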

1 day ago

I see, thank you for the info. I think I'll stick with 3090s, though, since the price difference is roughly 650 vs. 1500.
