Trained on ternary bits?

#3
by LLMToaster - opened

Was it trained with only three integer values (-1, 0, +1) from the start? Or is it quantized from a full-precision model, i.e., was the full model compressed into this? If it's compressed from the full model, i.e., not trained from scratch with ternary weights, doesn't that affect the quality of its responses? 😕

Why is it 16-bit on Hugging Face when downloaded? Doesn't that adversely affect generation quality and speed?
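For context on what "ternary" usually means here: models in the BitNet b1.58 family constrain each weight to {-1, 0, +1} plus a per-tensor scale, often via an "absmean" rounding recipe. A minimal sketch of that idea (the exact scheme Falcon3-1.58bit uses is an assumption on my part, and this ignores packing, so it doesn't explain the on-disk format by itself):

```python
import numpy as np

def ternary_quantize(w, eps=1e-5):
    # Absmean-style ternary quantization (BitNet b1.58 recipe; whether
    # Falcon3-1.58bit uses exactly this is an assumption):
    # scale by the mean absolute weight, then round each weight to the
    # nearest value in {-1, 0, +1}.
    scale = np.abs(w).mean() + eps
    q = np.clip(np.round(w / scale), -1, 1)
    return q, scale  # dequantized weight is approximately q * scale

w = np.array([0.4, -0.03, 1.2, -0.9])
q, scale = ternary_quantize(w)
# q is ternary: array([ 1.,  0.,  1., -1.])
```

Note that even when the effective weights are ternary, a checkpoint can still be stored as 16-bit tensors (the ternary values just happen to be representable there); the actual memory and speed win only shows up once the weights are packed into a low-bit format like the GGUF files.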

I believe what you are looking for is this: tiiuae/Falcon3-10B-Instruct-1.58bit-GGUF
I haven't tested either of these quantized models, but I think this one was trained in ternary and is supposed to be better than its GGUF counterpart. @ybelkada right?

The full model is, at least for text-to-text tasks, about as good as gpt-4o-mini. Try the full model if you can.
