EXL2 quants

#3 · opened by Cytho

2.5bpw to fit in 48GB of VRAM would be great

> 2.5bpw to fit in 48GB of VRAM would be great

Someone might beat me to it but I'll see what I can do

NeverSleep org

> 2.5bpw to fit in 48GB of VRAM would be great

> Someone might beat me to it but I'll see what I can do

A 2.75bpw exl2 in 48GB of VRAM can apparently fit 12k context.
I'm interested too, but I don't have the compute to do it right now, hah

Okay, I put the batter in the oven. It's doing measurements now. I usually only make high bpw quants for my own use that I share, and I am too lazy to edit my script, so the lm_head is gonna be big. Hopefully, it won't make it too chonky. If it does I can make another once the measurements file is done. I'm guessing about 5 hours till the first one shows up on my profile. Gonna spit out a 2.5, 2.75, 3.0, 3.25, 3.5, 3.75, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5 and 8. I'll prune later.
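
For anyone curious what that batch actually looks like, here's a rough sketch of the flow with exllamav2's `convert.py`: a single measurement pass, then one quant per bpw target reusing the same measurement file. The paths, the 8-bit head setting, and the bpw list are just illustrative; this is not BigHuggyD's actual script.

```python
# Sketch of batching EXL2 quants with exllamav2's convert.py.
# Paths and the head-bits value are illustrative assumptions.
import subprocess

MODEL_DIR = "NeverSleep_Lumimaid-v0.2-123B"   # unquantized HF model (example path)
WORK_DIR = "work"                             # scratch dir for convert.py
MEASUREMENT = "measurement.json"              # reused by every quant
BPW_TARGETS = [2.5, 2.75, 3.0, 3.25, 3.5, 3.75, 4.0,
               4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0]

# Pass 1: the slow "measurements" step (per-layer quantization error).
subprocess.run([
    "python", "convert.py",
    "-i", MODEL_DIR,
    "-o", WORK_DIR,
    "-om", MEASUREMENT,
], check=True)

# Pass 2: produce each bpw variant from the same measurement file.
# (Real runs may need the work dir cleaned between jobs.)
for bpw in BPW_TARGETS:
    subprocess.run([
        "python", "convert.py",
        "-i", MODEL_DIR,
        "-o", WORK_DIR,
        "-m", MEASUREMENT,
        "-b", str(bpw),
        "-hb", "8",                              # large lm_head, as mentioned above
        "-cf", f"{MODEL_DIR}_exl2_{bpw}bpw_h8",  # compiled output folder
    ], check=True)
```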

NeverSleep org

> Okay, I put the batter in the oven. It's doing measurements now. I usually only make high bpw quants for my own use that I share, and I am too lazy to edit my script, so the lm_head is gonna be big. Hopefully, it won't make it too chonky. If it does I can make another once the measurements file is done. I'm guessing about 5 hours till the first one shows up on my profile. Gonna spit out a 2.5, 2.75, 3.0, 3.25, 3.5, 3.75, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5 and 8. I'll prune later.

Fucking hero

> 2.5bpw to fit in 48GB of VRAM would be great

Ok, took waaay longer than I thought, but the 2.5bpw is up on my profile.

I tested it on a 48GB GPU and it was spitting out lyrics with 16k context and no cache offloaded to RAM.

NeverSleep org

Thank you!

> Thank you!

You bet! You keep doing the big stuff, and I'm glad to help where I can.

@BigHuggyD Any ETA on the 2.75? I run the 2.75 of Mistral Large with 32k context, so I'm curious if I could run this as well. Hope it's not bigger than the original model.

@Adzeiros it's already up? Done cooking 2.5, 2.75, 3.0 and 3.25

https://huggingface.co/BigHuggyD/NeverSleep_Lumimaid-v0.2-123B_exl2_2.75bpw_h8

Lumimaid has some extra girth compared to Largestral so it might be bigger.

4.25bpw fits 32k+ context at 72GB. 4.5bpw fits only 10k.

Nobody has made a 4.25 yet for this model.
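
As a rough sanity check on those fit numbers: EXL2 weight size is roughly parameter count × bpw / 8 bytes, and whatever VRAM is left after the weights goes to the KV cache and overhead. The snippet below is just that back-of-envelope arithmetic (assuming ~123B parameters), not a measurement.

```python
# Back-of-envelope VRAM estimate for a ~123B-parameter model at various EXL2 bpw.
# Weights only: real usage adds KV cache, activations, and per-GPU overhead,
# so treat these as lower bounds rather than measurements.
PARAMS = 123e9  # approximate parameter count (assumption)

for bpw in (2.5, 2.75, 3.25, 4.25, 4.5, 5.0):
    weight_gb = PARAMS * bpw / 8 / 1e9   # bits per weight -> bytes -> GB
    print(f"{bpw:>4} bpw ≈ {weight_gb:5.1f} GB of weights")

# e.g. 2.75 bpw ≈ 42.3 GB, which is why it squeezes into 48 GB with ~12-16k context,
# while 4.5 bpw ≈ 69.2 GB leaves little room for cache on 72 GB.
```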

NeverSleep org

> @Adzeiros it's already up? Done cooking 2.5, 2.75, 3.0 and 3.25
>
> https://huggingface.co/BigHuggyD/NeverSleep_Lumimaid-v0.2-123B_exl2_2.75bpw_h8

Grabbing 2.75 asap

> 4.25bpw fits 32k+ context at 72GB. 4.5bpw fits only 10k.
>
> Nobody has made a 4.25 yet for this model.

Is that with a quantized cache or an unquantized one? I could add a 4.25 to the to-do list, but I have to wait for this batch to finish.

It's using Q4 cache, no CFG.

No rush, just letting people know what will fit if they don't have 4x24.
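
For anyone wondering what that Q4-cache setup looks like in code, here's a rough sketch using exllamav2's Python API. The model path and 16k context are just examples pulled from this thread; treat it as a sketch rather than a tested config for a specific GPU layout.

```python
# Sketch: load an EXL2 quant with a quantized (Q4) KV cache in exllamav2,
# splitting layers across whatever GPUs are available.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache_Q4, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

config = ExLlamaV2Config("NeverSleep_Lumimaid-v0.2-123B_exl2_2.75bpw_h8")  # example path
model = ExLlamaV2(config)

# Q4 cache instead of FP16 to save VRAM; 16k context as mentioned above.
cache = ExLlamaV2Cache_Q4(model, max_seq_len=16384, lazy=True)
model.load_autosplit(cache)   # auto-split layers across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)
print(generator.generate(prompt="Write me some lyrics.", max_new_tokens=64))
```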

> It's using Q4 cache, no CFG.
>
> No rush, just letting people know what will fit if they don't have 4x24.

The first batch is finally done. I have added 4.25 to the queue. Based on how long the other quants have taken it should be completed and uploaded in about 5 hours.

(screenshot attached)

NeverSleep org

Thanks a lot for this!
