My alternative quantizations.

#16
by ZeroWw - opened

These are my own quantizations (updated almost daily).

The difference from normal quantizations is that I quantize the output and embedding tensors to f16,
and the other tensors to q5_k, q6_k, or q8_0.
This produces models that are only slightly degraded, or not degraded at all, while also being smaller.
They run at about 3-6 t/s on CPU only using llama.cpp,
and obviously faster on machines with potent GPUs.
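The recipe above can be reproduced with llama.cpp's `llama-quantize` tool, which supports per-tensor type overrides. A minimal sketch — the file names here are placeholders, and q6_k is just one of the base types mentioned above:

```shell
# Sketch: mixed-precision quantization with llama.cpp's llama-quantize.
# Keep the output and token-embedding tensors at f16,
# and quantize all remaining tensors to q6_k.
# "model.f16.gguf" / "model.q6_k.gguf" are placeholder file names.
./llama-quantize \
  --output-tensor-type f16 \
  --token-embedding-type f16 \
  model.f16.gguf model.q6_k.gguf q6_k
```

Swap `q6_k` for `q5_k` or `q8_0` to trade size against quality.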

Bro, your copy-pasted message doesn't work: the first link goes to a 404 page. Please show some commitment by keeping your copied-and-pasted message up to date, to prove you're not just spam. That said, thanks anyway for your quantized models <3

"Bro", I don't know why you say that. The link works. I just tested it.

As if I didn't see the * edited * label above the message. 😂😂

My answer would have been "thank you" if you hadn't mentioned spam.
And anyway, it's the last link that matters.

ZeroWw changed discussion status to closed
ZeroWw changed discussion status to open

I said "spam" not to denigrate your work; on the contrary, I thank you for what you do. I simply wanted to point it out, since I see you on every model and you always use the same message with broken links, which looks like typical suspicious spam behaviour, especially since even the model cards of the models you quantize are always copied and pasted. Mine was meant as a note to possibly improve your work. See ya :)

Sincerely, I copied and pasted the links by hand, and I do my quantizations on all models I find interesting for one reason or another.
Thank you for pointing out the mistake (it was in the original README, which is the header I copy and paste).
It was just the word "spam" that triggered me.

Thanks anyway.

> It was just the "spam" word that triggered me.

Yeah, sorry about that, my bad. I used that word without considering that it might offend.
