EXL2 quants

#3 · opened by Cytho

2.5bpw to fit in 48GB of VRAM would be great

> 2.5bpw to fit in 48GB of VRAM would be great

Someone might beat me to it but I'll see what I can do

NeverSleep org

> 2.5bpw to fit in 48GB of VRAM would be great

> Someone might beat me to it but I'll see what I can do

A 2.75bpw exl2 in 48GB of VRAM can apparently fit 12k context.
I'm interested too, but I don't have the compute to do it right now, hah

Okay, I put the batter in the oven. It's doing measurements now. I usually only make high bpw quants for my own use that I share, and I am too lazy to edit my script, so the lm_head is gonna be big. Hopefully, it won't make it too chonky. If it does I can make another once the measurements file is done. I'm guessing about 5 hours till the first one shows up on my profile. Gonna spit out a 2.5, 2.75, 3.0, 3.25, 3.5, 3.75, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5 and 8. I'll prune later.
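
For anyone curious what that batch actually looks like, here's a rough sketch of the flow with exllamav2's `convert.py`: a single measurement pass, then one quant per bpw target reusing the same measurement file. The paths, the 8-bit head setting, and the bpw list are just illustrative; this is not BigHuggyD's actual script.

```python
# Sketch of batching EXL2 quants with exllamav2's convert.py.
# Paths and the head-bits value are illustrative assumptions.
import subprocess

MODEL_DIR = "NeverSleep_Lumimaid-v0.2-123B"   # unquantized HF model (example path)
WORK_DIR = "work"                             # scratch dir for convert.py
MEASUREMENT = "measurement.json"              # reused by every quant
BPW_TARGETS = [2.5, 2.75, 3.0, 3.25, 3.5, 3.75, 4.0,
               4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0]

# Pass 1: the slow "measurements" step (per-layer quantization error).
subprocess.run([
    "python", "convert.py",
    "-i", MODEL_DIR,
    "-o", WORK_DIR,
    "-om", MEASUREMENT,
], check=True)

# Pass 2: produce each bpw variant from the same measurement file.
# (Real runs may need the work dir cleaned between jobs.)
for bpw in BPW_TARGETS:
    subprocess.run([
        "python", "convert.py",
        "-i", MODEL_DIR,
        "-o", WORK_DIR,
        "-m", MEASUREMENT,
        "-b", str(bpw),
        "-hb", "8",                              # large lm_head, as mentioned above
        "-cf", f"{MODEL_DIR}_exl2_{bpw}bpw_h8",  # compiled output folder
    ], check=True)
```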

NeverSleep org

> Okay, I put the batter in the oven. It's doing measurements now. I usually only make high bpw quants for my own use that I share, and I am too lazy to edit my script, so the lm_head is gonna be big. Hopefully, it won't make it too chonky. If it does I can make another once the measurements file is done. I'm guessing about 5 hours till the first one shows up on my profile. Gonna spit out a 2.5, 2.75, 3.0, 3.25, 3.5, 3.75, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5 and 8. I'll prune later.

Fucking hero

> 2.5bpw to fit in 48GB of VRAM would be great

Ok, took waaay longer than I thought, but the 2.5bpw is up on my profile.

I tested it on a 48GB GPU and it was spitting out lyrics with 16k context and no cache offloaded to RAM.

NeverSleep org

Thank you!

> Thank you!

You bet! You keep doing the big stuff, and I'm glad to help where I can.

@BigHuggyD Any ETA on the 2.75? I run the 2.75 of Mistral Large with 32k context, so I'm curious if I could run this as well. Hope it's not bigger than the original model.

@Adzeiros it's already up? Done cooking 2.5, 2.75, 3.0 and 3.25

https://huggingface.co/BigHuggyD/NeverSleep_Lumimaid-v0.2-123B_exl2_2.75bpw_h8

Lumimaid has some extra girth compared to Largestral so it might be bigger.

4.25bpw fits 32k+ context at 72GB. 4.5bpw fits only 10k.

Nobody has made a 4.25 yet for this model.
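
As a rough sanity check on those fit numbers: EXL2 weight size is roughly parameter count × bpw / 8 bytes, and whatever VRAM is left after the weights goes to the KV cache and overhead. The snippet below is just that back-of-envelope arithmetic (assuming ~123B parameters), not a measurement.

```python
# Back-of-envelope VRAM estimate for a ~123B-parameter model at various EXL2 bpw.
# Weights only: real usage adds KV cache, activations, and per-GPU overhead,
# so treat these as lower bounds rather than measurements.
PARAMS = 123e9  # approximate parameter count (assumption)

for bpw in (2.5, 2.75, 3.25, 4.25, 4.5, 5.0):
    weight_gb = PARAMS * bpw / 8 / 1e9   # bits per weight -> bytes -> GB
    print(f"{bpw:>4} bpw ≈ {weight_gb:5.1f} GB of weights")

# e.g. 2.75 bpw ≈ 42.3 GB, which is why it squeezes into 48 GB with ~12-16k context,
# while 4.5 bpw ≈ 69.2 GB leaves little room for cache on 72 GB.
```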

NeverSleep org

> @Adzeiros it's already up? Done cooking 2.5, 2.75, 3.0 and 3.25
>
> https://huggingface.co/BigHuggyD/NeverSleep_Lumimaid-v0.2-123B_exl2_2.75bpw_h8

Grabbing 2.75 asap

> 4.25bpw fits 32k+ context at 72GB. 4.5bpw fits only 10k.
>
> Nobody has made a 4.25 yet for this model.

Is that with a quantized cache or an unquantized one? I could add a 4.25 to the to-do list, but I have to wait for this batch to finish.

It's using Q4 cache, no CFG.

No rush, just letting people know what will fit if they don't have 4x24.
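
For anyone wondering what that Q4-cache setup looks like in code, here's a rough sketch using exllamav2's Python API. The model path and 16k context are just examples pulled from this thread; treat it as a sketch rather than a tested config for a specific GPU layout.

```python
# Sketch: load an EXL2 quant with a quantized (Q4) KV cache in exllamav2,
# splitting layers across whatever GPUs are available.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache_Q4, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

config = ExLlamaV2Config("NeverSleep_Lumimaid-v0.2-123B_exl2_2.75bpw_h8")  # example path
model = ExLlamaV2(config)

# Q4 cache instead of FP16 to save VRAM; 16k context as mentioned above.
cache = ExLlamaV2Cache_Q4(model, max_seq_len=16384, lazy=True)
model.load_autosplit(cache)   # auto-split layers across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)
print(generator.generate(prompt="Write me some lyrics.", max_new_tokens=64))
```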

> It's using Q4 cache, no CFG.
>
> No rush, just letting people know what will fit if they don't have 4x24.

The first batch is finally done. I have added 4.25 to the queue. Based on how long the other quants have taken it should be completed and uploaded in about 5 hours.

(screenshot attached)

NeverSleep org

Thanks a lot for this!
