General discussion.
This is considered a highly experimental model.
Feedback for the authors is welcome if you can test it.
@Lewdiculous seeing an error with mmproj. Pretty much every model I try from you, including Puppy, will output the full 500 tokens, and the output turns to junk after 150-200 tokens. The only model I'm not seeing this issue with is Poppy Porpoise 0.72. I'm wondering: when you do the quants now, are you still changing the config files? I suspect there is some error manifesting in multimodal due to a bad stopping string/EOT token.
I do replace the configs with the llama-bpe configs; I'll try using your original configs instead.
Oh man, don't worry about it, I was just checking to see if there was any difference between the Poppy and Puppy quants. I already tried a straight quant last night and it was borked, but I waited to see yours before drawing a conclusion. I'll talk to Nitral when he gets off work to see if there's anything I've missed.
That image... now I want to try this one too...
@Nitral-AI All good, just for reference, in Poppy Porpoise 0.72 I also used the llama-bpe configs.
The model name and the image are so good though. I hope Puppy-chan makes a comeback.
@Lewdiculous found the issue but no word on the cause:
Somehow my EOS token is wrong; I'll take a look at the files to see if I can fix it easily.
I've checked the configs and they are identical. Downloading repo now to see if manually changing tokenizer_config.json will fix it.
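In case anyone wants to reproduce the check, it's roughly something like this; the model folder name is just a placeholder, point it at the downloaded repo:

```python
# Rough sketch: print what each HF config file says the EOS token is.
# The model path is a placeholder for the downloaded repo folder.
import json
from pathlib import Path

model_dir = Path("models/your-model")  # hypothetical local path

for name in ("config.json", "generation_config.json", "tokenizer_config.json"):
    path = model_dir / name
    if not path.exists():
        continue
    data = json.loads(path.read_text())
    # config/generation_config store an id, tokenizer_config stores the token string
    print(name, "->", data.get("eos_token_id", data.get("eos_token")))
```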
Alright. If that does it, you can catbox me the configs so the next run is proper.
@Lewdiculous Do you remember how we had to delete tokenizer.model? Did you also have to add the new Llama 3 config files to get conversion working? I did, and I've pretty much narrowed it down to that being the point of issue. Still downloading the last file because I missed it in the pull, but I'll know in a few minutes.
I resolved the EOS token mismatch, but it did not solve the issue.
Did you also have to add the new Llama 3 config files to get conversion working?
Not necessarily; I think it was possible without replacing them, basically using your files as they are. The issue was that leftover file.
But this makes sense to me:
I ended up generating a new Q4_K_M with the corrected EOS, but it didn't seem to fix the issue. I would ask you to gen one (because I feel like I'm doing it wrong), but I'm not sure it would fix anything. Nitral is testing with some other mmproj now and it's having the same issue. I don't really understand why this is happening, as I made no significant changes to anything.
Hopefully the model is stopping for GGUF users then :kek:
@Lewdiculous maybe you could link me to a doc or something that explains the proper way to quant. I followed a llama.cpp issue, but it was just convert and quant, and as I understand it there should be another step now?
To be fair, all they gave us was the PR, but things are explained there at least.
There's no need to set the vocab type manually anymore.
I added some guidance to the script page; see the Llama-3 warning.
You should be good just following this.
During the model download portion, if you want to replace configs, you can do so at that time in the models/{model} folder and the rest of the process should continue. Use the lossless version.
https://huggingface.co/FantasiaFoundry/GGUF-Quantization-Script
Are you performing all steps manually?
Honestly, if the configs are already good, just download the model, convert it to a BF16 GGUF using the hf-gguf script, then convert that to the quants you want.
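For reference, the manual path I mean looks roughly like this. It's only a sketch; script and binary names shift between llama.cpp versions (e.g. quantize vs llama-quantize), so treat the exact paths and flags as assumptions:

```python
# Hedged sketch of the manual flow: HF model -> lossless BF16 GGUF -> quants.
# Assumes a local llama.cpp checkout; adjust script/binary names and flags to your version.
import subprocess

model_dir = "models/your-model"             # hypothetical downloaded model folder
bf16_gguf = "models/your-model-bf16.gguf"
quant_out = "models/your-model-Q4_K_M.gguf"

# 1) Convert the HF safetensors to a lossless BF16 GGUF
subprocess.run(
    ["python", "convert-hf-to-gguf.py", model_dir,
     "--outtype", "bf16", "--outfile", bf16_gguf],
    check=True,
)

# 2) Quantize the BF16 GGUF to the target quant type
subprocess.run(["./quantize", bf16_gguf, quant_out, "Q4_K_M"], check=True)
```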
There's the hf-gguf update script to download the configs, but you're saying those are wrong?
@Nitral-AI - The thing is, they changed the config in the Instruct model but the llama "documentation" - PR - says to use the llama-bpe configs fetched by the ...update.py script.
I'd figure if things changed the script would reflect the new configs if they are necessary for proper functionality.
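To be concrete about what I mean by using the llama-bpe configs: before converting, I just overwrite the model's tokenizer files with the ones the update script fetched, roughly like the sketch below. The folder layout and file names here are assumptions about my local setup, not something spelled out in the PR:

```python
# Hedged sketch: copy the fetched llama-bpe tokenizer files over the model's own,
# then drop the leftover tokenizer.model that broke conversion earlier.
# All paths and file names are assumptions about the local layout.
import shutil
from pathlib import Path

src = Path("llama.cpp/models/tokenizers/llama-bpe")  # fetched by the update script
dst = Path("models/your-model")                      # model about to be converted

for name in ("tokenizer.json", "tokenizer_config.json"):
    if (src / name).exists():
        shutil.copy2(src / name, dst / name)

(dst / "tokenizer.model").unlink(missing_ok=True)
```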
I asked about that, but it's fine if y'all prefer the included/upstream configs; it doesn't matter for the process, just something I need to be aware of, and I'll do it accordingly. I don't expect either way to cause issues.
But I also don't want to put out broken quants, so I'm in to get things right. Though so far there haven't been tokenizer/context formatting issues reported.
I found that it doesn't make a difference whether I run your quant or the one where I fixed the EOS on Puppy; it handles text fine either way. The only issue is when running it with the mmproj. So go ahead and leave it up. I've been running your imat quant for two days now with no issues in the text output.
Hid my previous comment; it seemed a bit snappy, which was not my intention. Tested both config setups with exl2 and I'm seeing practically identical stopping behavior, given the tokens act in a similar manner. Maybe it doesn't matter, but I haven't done any kind of extensive long-context testing beyond just checking stopping behavior.