IQ3_M quant busted, IQ3_XXS okay

#1
by anxcat - opened

Hey, I grabbed IQ3_M, and inference with it (in Ooba) just produces an endless string of zeroes, no coherent output. I then downloaded IQ3_XXS, tested it in the same environment, and it works perfectly, so it may just be that quant.

Edit: Oof, disregard this, sorry. It turns out this model needs the BOS token disabled when used in the Notebook for some reason. That's why it appeared to work with one quant but not the other. Both quants actually work fine if I disable the checkbox that prepends the BOS string. Sorry for wasting your time; I'd delete this, but HF doesn't allow it for some reason.

anxcat changed discussion status to closed
anxcat changed discussion status to open
anxcat changed discussion status to closed

No worries! Good to leave it up in case someone runs into similar issues anyway ;D

anxcat changed discussion status to open

I'm getting garbage output on latest koboldcpp, IQ4_XS

Does koboldcpp have an equivalent of this setting from Oobabooga?

[screenshot: the "Add the bos_token to the beginning" checkbox in Ooba's parameter settings]

In Ooba, unchecking this fixed it for me. I was getting random alphanumeric garbage output initially, but after unchecking it everything works fine. So see if koboldcpp has an option to disable the BOS token; it appears to be the presence of that token that breaks things.

Regular Mistral Large does not need this, btw; it seems to be a quirk of Tess.
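
If your backend doesn't expose a checkbox for this, here's a rough sketch of the same workaround using llama-cpp-python: tokenize the prompt yourself with add_bos=False so nothing prepends the broken BOS token. The model path and sampling values below are just placeholders, not anything from this thread:

```python
from llama_cpp import Llama

llm = Llama(model_path="Tess-3-Mistral-Large-2-123B-IQ3_M.gguf", n_ctx=4096)

prompt = "Write a short haiku about autumn."
# Tokenize manually with add_bos=False so the backend never prepends
# the model's (mislabeled) <|im_start|> BOS token.
tokens = llm.tokenize(prompt.encode("utf-8"), add_bos=False)

out = []
for tok in llm.generate(tokens, temp=0.7, top_p=0.95):
    if tok == llm.token_eos():
        break
    out.append(tok)
    if len(out) >= 200:  # simple length cap, since generate() is open-ended
        break

print(llm.detokenize(out).decode("utf-8", errors="ignore"))
```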

I'm having the same issue with the Q5_K_M quant in LM Studio 0.2.31, and I can't find an equivalent setting to disable the BOS string in LM Studio.

In ooba, the "Add the bos_token to the beginning" checkbox is only available under Parameters if I switch the loader to "llamacpp_HF", but this model cannot load using that loader... it only loads with the llama.cpp loader, and the llama.cpp parameter settings do not have this checkbox. Am I missing something?

Yeah, you have to convert it to HF format first, but that can be done quickly and easily from within ooba itself on the model page, using the "llamacpp_HF_creator" tab on the right-hand side:

[screenshot: the llamacpp_HF_creator tab on ooba's model page, with a GGUF selected and the original model's URL pasted in]

As in my screenshot, select your GGUF and then paste in the URL of the original unquantized model, which in this case is: https://huggingface.co/migtissera/Tess-3-Mistral-Large-2-123B

Then click Submit, and that's it: it will auto-convert the model to HF format by downloading the required config files to a new subfolder and moving the GGUF into it. It should only take a few seconds. Then refresh your model list or restart ooba to see the converted model.
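
For anyone curious what that tab actually does under the hood, something like this replicates it by hand with huggingface_hub. The exact file list ooba grabs is my guess, and the paths are placeholders:

```python
import shutil
from pathlib import Path
from huggingface_hub import hf_hub_download

repo = "migtissera/Tess-3-Mistral-Large-2-123B"
gguf = Path("models/Tess-3-Mistral-Large-2-123B-IQ3_M.gguf")  # placeholder path
out_dir = gguf.parent / gguf.stem
out_dir.mkdir(exist_ok=True)

# Pull the tokenizer/config files the llamacpp_HF loader needs from the
# original unquantized repo into the new subfolder.
for fname in ("config.json", "generation_config.json",
              "tokenizer_config.json", "tokenizer.json",
              "special_tokens_map.json"):
    hf_hub_download(repo_id=repo, filename=fname, local_dir=out_dir)

# The GGUF has to live alongside the downloaded configs.
shutil.move(str(gguf), str(out_dir / gguf.name))
```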

Thanks for that guidance, it was very easy and got things working in Notebook mode. Appreciate it.

I think the issue arises from the fact that migtissera set the BOS token to <|im_start|>, which then confuses the hell out of llamacpp.

I've already talked to him, he'll use the proper <s> in the future :D
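
If you want to confirm this on your own download, the gguf pip package can read the metadata directly. A quick sketch (the path is a placeholder):

```python
from gguf import GGUFReader

reader = GGUFReader("Tess-3-Mistral-Large-2-123B-IQ3_M.gguf")  # placeholder path

# Scalar fields keep their value in the last part; string-array elements
# are indexed through field.data.
bos_id = int(reader.fields["tokenizer.ggml.bos_token_id"].parts[-1][0])
tokens = reader.fields["tokenizer.ggml.tokens"]
bos_token = bytes(tokens.parts[tokens.data[bos_id]]).decode("utf-8")

print(f"BOS id {bos_id} -> {bos_token!r}")  # a healthy Mistral GGUF prints '<s>'
```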

Thanks @bartowski! Is there any indication of when a fixed Tess model will be released? I would love to use it on Kobold, but it doesn't support any BOS token fixes.

I don't know if he has any intention to re-train it with the proper BOS token, but maybe I can see if just changing it in the JSON will work... my only concern is that it may degrade output.
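
If anyone wants to experiment with that, here's a minimal sketch of the JSON change against a llamacpp_HF-converted folder (path is a placeholder, and no promises about output quality, which is exactly the concern above):

```python
import json
from pathlib import Path

# Placeholder path to the folder the llamacpp_HF creator made.
cfg_path = Path("models/Tess-3-Mistral-Large-2-123B-IQ3_M/tokenizer_config.json")

cfg = json.loads(cfg_path.read_text())
print("current bos_token:", cfg.get("bos_token"))  # expected: <|im_start|>
cfg["bos_token"] = "<s>"  # the proper Mistral BOS token
cfg_path.write_text(json.dumps(cfg, indent=2))
```

That only patches the HF-format config, not the GGUF itself; the gguf package also ships a gguf-new-metadata script that can rewrite a GGUF's own metadata, which might help Kobold users, though I haven't tried it on this model.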
