IQ3_M quant busted, IQ3_XXS okay

#1
by anxcat - opened

Hey, I grabbed IQ3_M, and inference with it (in Ooba) just produces an endless string of zeroes, no coherent output. I then downloaded IQ3_XXS, tested it in the same environment, and it works perfectly, so it may just be that quant.

Edit: Oof, disregard this, sorry. It turns out this model needs the BOS token disabled when used in the Notebook for some reason. That's why it appeared to work with one quant but not the other. Both quants actually work fine if I disable the checkbox that prepends the BOS string. Sorry for wasting your time; I'd delete this, but HF doesn't allow it for some reason.

anxcat changed discussion status to closed
anxcat changed discussion status to open
anxcat changed discussion status to closed

No worries! Good to leave it up in case someone runs into similar issues anyway ;D

anxcat changed discussion status to open

I'm getting garbage output on latest koboldcpp, IQ4_XS

Does koboldcpp have an equivalent of this setting from Oobabooga?

[screenshot: the "Add the bos_token to the beginning" checkbox in Ooba's parameter settings]

In Ooba, unchecking this fixed it for me. I was getting random alphanumeric garbage output initially, but after unchecking it everything works fine. So see if koboldcpp has an option to disable the BOS token; it appears to be the presence of that token that breaks things.

Regular Mistral Large does not need this, btw; it seems to be a quirk of Tess.
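
If your backend doesn't expose a checkbox for this, here's a rough sketch of the same workaround using llama-cpp-python: tokenize the prompt yourself with add_bos=False so nothing prepends the broken BOS token. The model path and sampling values below are just placeholders, not anything from this thread:

```python
from llama_cpp import Llama

llm = Llama(model_path="Tess-3-Mistral-Large-2-123B-IQ3_M.gguf", n_ctx=4096)

prompt = "Write a short haiku about autumn."
# Tokenize manually with add_bos=False so the backend never prepends
# the model's (mislabeled) <|im_start|> BOS token.
tokens = llm.tokenize(prompt.encode("utf-8"), add_bos=False)

out = []
for tok in llm.generate(tokens, temp=0.7, top_p=0.95):
    if tok == llm.token_eos():
        break
    out.append(tok)
    if len(out) >= 200:  # simple length cap, since generate() is open-ended
        break

print(llm.detokenize(out).decode("utf-8", errors="ignore"))
```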

I'm having the same issue with the Q5_K_M quant in LM Studio 0.2.31, and I can't find an equivalent setting to disable the BOS string in LM Studio.

In ooba, the "Add the bos_token to the beginning" checkbox is only available under Parameters if I switch the loader to "llamacpp_HF", but this model cannot load using that loader... it only loads with the llama.cpp loader, and the llama.cpp parameter settings do not have this checkbox. Am I missing something?

Yeah, you have to convert it to HF format first, but that can be done quickly and easily from within ooba itself on the model page, using the "llamacpp_HF_creator" tab on the right-hand side:

[screenshot: the llamacpp_HF_creator tab on ooba's model page, with a GGUF selected and the original model's URL pasted in]

As in my screenshot, select your GGUF and then paste in the URL of the original unquantized model, which in this case is: https://huggingface.co/migtissera/Tess-3-Mistral-Large-2-123B

Then click Submit, and that's it: it will auto-convert the model to HF format by downloading the required config files to a new subfolder and moving the GGUF into it. It should only take a few seconds. Then refresh your model list or restart ooba to see the converted model.
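
For anyone curious what that tab actually does under the hood, something like this replicates it by hand with huggingface_hub. The exact file list ooba grabs is my guess, and the paths are placeholders:

```python
import shutil
from pathlib import Path
from huggingface_hub import hf_hub_download

repo = "migtissera/Tess-3-Mistral-Large-2-123B"
gguf = Path("models/Tess-3-Mistral-Large-2-123B-IQ3_M.gguf")  # placeholder path
out_dir = gguf.parent / gguf.stem
out_dir.mkdir(exist_ok=True)

# Pull the tokenizer/config files the llamacpp_HF loader needs from the
# original unquantized repo into the new subfolder.
for fname in ("config.json", "generation_config.json",
              "tokenizer_config.json", "tokenizer.json",
              "special_tokens_map.json"):
    hf_hub_download(repo_id=repo, filename=fname, local_dir=out_dir)

# The GGUF has to live alongside the downloaded configs.
shutil.move(str(gguf), str(out_dir / gguf.name))
```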

Thanks for that guidance, it was very easy and got things working in Notebook mode. Appreciate it.

I think the issue arises from the fact that migtissera set the BOS token to <|im_start|>, which then confuses the hell out of llamacpp.

I've already talked to him, he'll use the proper <s> in the future :D
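
If you want to confirm this on your own download, the gguf pip package can read the metadata directly. A quick sketch (the path is a placeholder):

```python
from gguf import GGUFReader

reader = GGUFReader("Tess-3-Mistral-Large-2-123B-IQ3_M.gguf")  # placeholder path

# Scalar fields keep their value in the last part; string-array elements
# are indexed through field.data.
bos_id = int(reader.fields["tokenizer.ggml.bos_token_id"].parts[-1][0])
tokens = reader.fields["tokenizer.ggml.tokens"]
bos_token = bytes(tokens.parts[tokens.data[bos_id]]).decode("utf-8")

print(f"BOS id {bos_id} -> {bos_token!r}")  # a healthy Mistral GGUF prints '<s>'
```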

Thanks @bartowski! Is there any indication of when a fixed Tess model will be released? I would love to use it on Kobold, but it doesn't support any BOS token fixes.

I don't know if he has any intention to re-train it with the proper BOS token, but maybe I can see if just changing it in the JSON will work... my only concern is that it may degrade output.
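
If anyone wants to experiment with that, here's a minimal sketch of the JSON change against a llamacpp_HF-converted folder (path is a placeholder, and no promises about output quality, which is exactly the concern above):

```python
import json
from pathlib import Path

# Placeholder path to the folder the llamacpp_HF creator made.
cfg_path = Path("models/Tess-3-Mistral-Large-2-123B-IQ3_M/tokenizer_config.json")

cfg = json.loads(cfg_path.read_text())
print("current bos_token:", cfg.get("bos_token"))  # expected: <|im_start|>
cfg["bos_token"] = "<s>"  # the proper Mistral BOS token
cfg_path.write_text(json.dumps(cfg, indent=2))
```

That only patches the HF-format config, not the GGUF itself; the gguf package also ships a gguf-new-metadata script that can rewrite a GGUF's own metadata, which might help Kobold users, though I haven't tried it on this model.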
