Wrong prompt format in tokenizer_config.json?
The chat_template specified in tokenizer_config.json is ChatML, but apparently this model uses the (weird) GPT4 Correct prompt format. Please clarify which prompt format/chat template is correct, state it on the model card, and make sure tokenizer_config.json carries the matching template. Thank you!
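For anyone else comparing, here's a minimal sketch of the two formats in question (token names follow the public ChatML and OpenChat conventions, not anything taken from this repo's config):

```python
# ChatML turn structure (what tokenizer_config.json currently declares):
chatml_prompt = (
    "<|im_start|>user\n"
    "Hello!<|im_end|>\n"
    "<|im_start|>assistant\n"
)

# GPT4 Correct turn structure (OpenChat-style, used by the source models):
gpt4_correct_prompt = (
    "GPT4 Correct User: Hello!<|end_of_turn|>"
    "GPT4 Correct Assistant:"
)
```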
@mlabonne What's the actual chat template? In your tokenizer_config.json, chat_template is set to ChatML, but the models this merge is built from use the GPT4 Correct prompt format. How do you prompt it properly?
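In case it helps, this is how I've been checking what the stored template actually renders (a minimal sketch; the repo id is a placeholder, substitute the real model path):

```python
from transformers import AutoTokenizer

# Placeholder repo id for illustration only.
tokenizer = AutoTokenizer.from_pretrained("mlabonne/model-name")

messages = [{"role": "user", "content": "What format do you expect?"}]

# Renders the Jinja chat_template stored in tokenizer_config.json,
# so the output shows exactly which format the repo currently ships.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```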
I used TheBloke's GGUF because the HF version crashed with the error message "RuntimeError: CUDA error: device-side assert triggered". Is that a known issue or just a problem on my end?
Yeah, I managed to make it work with ChatML without any issues, but results seem to depend on your setup, since there's no pre-defined chat template. As you said, this is a merge of several models that use the GPT4 Correct prompt format, but its special tokens (like <|end_of_turn|>) are not implemented in this tokenizer. I tried a few configurations and I'm opting for a modified GPT4 Correct prompt format with a different eos token. I believe it's the best solution, but I haven't tested it thoroughly. The CUDA error is also fixed.
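Something along these lines, to illustrate what I mean (a sketch only: GPT4 Correct-style turns closed with the tokenizer's own eos token instead of <|end_of_turn|>; the template actually shipped in tokenizer_config.json is authoritative, and the repo id is a placeholder):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mlabonne/model-name")  # placeholder id

# GPT4 Correct-style turns, but each turn ends with the tokenizer's own
# eos_token rather than <|end_of_turn|>, since that token isn't in this
# merge's vocabulary.
tokenizer.chat_template = (
    "{{ bos_token }}"
    "{% for message in messages %}"
    "{{ 'GPT4 Correct ' + message['role'].title() + ': ' + message['content'] + eos_token }}"
    "{% endfor %}"
    "{% if add_generation_prompt %}{{ 'GPT4 Correct Assistant:' }}{% endif %}"
)
```

The point of the eos swap is that generation stops at a token the model can actually emit, instead of waiting for an <|end_of_turn|> that never appears.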