Trained on ternary bits?

#3
by LLMToaster - opened

Was it trained with only three integer values (-1, 0, +1) from the start? Or is it quantized from a full-precision model, i.e., was the full model compressed into this? If it's compressed from the full model, i.e., not trained from scratch with ternary weights, doesn't that affect the quality of its responses? 😕

Why is it 16-bit on Hugging Face when downloaded? Doesn't that adversely affect generation quality and speed?
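For context on what "ternary" usually means here: models in the BitNet b1.58 family constrain each weight to {-1, 0, +1} plus a per-tensor scale, often via an "absmean" rounding recipe. A minimal sketch of that idea (the exact scheme Falcon3-1.58bit uses is an assumption on my part, and this ignores packing, so it doesn't explain the on-disk format by itself):

```python
import numpy as np

def ternary_quantize(w, eps=1e-5):
    # Absmean-style ternary quantization (BitNet b1.58 recipe; whether
    # Falcon3-1.58bit uses exactly this is an assumption):
    # scale by the mean absolute weight, then round each weight to the
    # nearest value in {-1, 0, +1}.
    scale = np.abs(w).mean() + eps
    q = np.clip(np.round(w / scale), -1, 1)
    return q, scale  # dequantized weight is approximately q * scale

w = np.array([0.4, -0.03, 1.2, -0.9])
q, scale = ternary_quantize(w)
# q is ternary: array([ 1.,  0.,  1., -1.])
```

Note that even when the effective weights are ternary, a checkpoint can still be stored as 16-bit tensors (the ternary values just happen to be representable there); the actual memory and speed win only shows up once the weights are packed into a low-bit format like the GGUF files.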

I believe what you are looking for is this: tiiuae/Falcon3-10B-Instruct-1.58bit-GGUF
I haven't tested either of these quantized models, but I think this one was trained in ternary and is supposed to be better than its GGUF counterpart. @ybelkada right?

The full model is, at least for text-to-text tasks, about as good as gpt-4o-mini. Try the full model if you can.
