Thanks!
This is the first time I have been able to load a 34b model on my budget 3060! With 12 GB of VRAM, the 2.55-bit variant mostly loads on my GPU, with a little spilling over into the CPU at 2048 context.
Indeed, this is the best AI model I've used so far which also fits on a single 3090. I'm using the 5_0-bpw-h8-evol-ins variant. Thanks from me too.
From what I've seen, I think the quality of the 2.55-bit 34b exceeds comparable 6-bit or 8-bit 13b models, but that's just my own subjective opinion. 34b models like this one are usable at 2 bpw, but the replies take a while, so it's probably not the sweet spot for 12 GB of VRAM. It's fun to use on occasion, though, because of the higher-quality responses.
For the most part, I'm using 4 bpw 13b models for 4k context, 4.65 bpw 13b models for 3k context, and 3 bpw 20b models for ~2k context.
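For anyone else squeezing a big quant into 12 GB, this is roughly what the GPU/CPU split looks like if the model is a GGUF loaded through llama-cpp-python. The thread doesn't say which loader is actually in use, and the path, layer count, and context size below are placeholders, so treat it as a sketch rather than a recipe:

```python
# Sketch: partially offload a large quantized model to a 12 GB GPU,
# assuming a GGUF file and llama-cpp-python. All values are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="path/to/34b-model.gguf",  # placeholder path
    n_gpu_layers=40,  # as many layers as fit in VRAM; the rest stay on the CPU
    n_ctx=2048,       # small context keeps the KV cache from eating the remaining VRAM
)

out = llm("Write a Python function that reverses a string.", max_tokens=128)
print(out["choices"][0]["text"])
```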
@latimar, can you or someone else explain why the perplexity scores are worse on the "5_0-bpw-h8-evol-ins" model versus the "5_0-bpw-h8" model?
I would assume fine-tuning the model would improve the scores?
Also, in my personal, non-scientific test, I gave both LLMs a coding challenge, and the "5_0-bpw-h8-evol-ins" model gave a better response than the "5_0-bpw-h8" model. So anecdotally, "5_0-bpw-h8-evol-ins" is the better-performing model for me, despite the worse PPL score.
@Hisma, 5_0-bpw-h8-evol-ins was converted using a different calibration dataset: evol-instruct instead of wikitext. It has a worse ppl score on wikitext, yes, but its coding abilities are actually better than 5_0-bpw-h8's. A better metric for comparing the different quants would be the HumanEval score, or at least the ppl score on the evol-instruct dataset.
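In case it helps to reproduce that kind of comparison, here is a rough sketch of measuring ppl on an arbitrary text file (e.g. an evol-instruct dump instead of wikitext). The paths are placeholders, and exl2 quants like these would normally be evaluated with ExLlamaV2's own scripts rather than plain transformers; this only illustrates the calculation itself:

```python
# Sketch: perplexity of a causal LM over a custom text file, so two quants
# can be compared on the same data. Paths below are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "path/to/model"              # placeholder
DATA_FILE = "evol_instruct_sample.txt"    # placeholder
CTX = 2048                                # evaluation window size

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

ids = tokenizer(open(DATA_FILE, encoding="utf-8").read(), return_tensors="pt").input_ids

nll_sum, token_count = 0.0, 0
with torch.no_grad():
    for start in range(0, ids.size(1) - 1, CTX):
        chunk = ids[:, start : start + CTX].to(model.device)
        if chunk.size(1) < 2:
            break
        # Passing labels=inputs makes the model return the mean NLL of the
        # (internally shifted) next-token predictions for this window.
        loss = model(chunk, labels=chunk).loss
        n_pred = chunk.size(1) - 1
        nll_sum += loss.item() * n_pred
        token_count += n_pred

print(f"ppl: {torch.exp(torch.tensor(nll_sum / token_count)).item():.3f}")
```

Running the same script over wikitext and over an evol-instruct sample for both quants gives numbers that are directly comparable for the intended use case.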
Got it, thank you. It would have been useful to include the HumanEval scores with these models too, like you did with your supercoder models. But regardless, I can definitely confirm there is noticeably better coding performance with 5_0-bpw-h8-evol-ins, so based on what you're saying this all makes sense. Thank you for explaining!