General discussion.
WARNING: GGUF versions might be broken.
Welp, are they really? 👀
I am not sure. The original versions were badly broken in many tools (I tried llama-cpp-python and ooba): even the <|im_start|> and <|im_end|> special tokens were not tokenized correctly.
I thought it might have been because tokenizer.json was missing from the original, so I added it and re-ran the conversion. I still have to do more testing to be sure.
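If anyone wants to reproduce the check, this is roughly what I mean by "not tokenized correctly" (a minimal sketch using recent llama-cpp-python; the model path is just a placeholder):

from llama_cpp import Llama

# Placeholder path -- point it at whichever GGUF you want to test.
llm = Llama(model_path="opus-v1.2-7b.Q8_0.gguf", verbose=False)

# With special=True, a healthy tokenizer maps <|im_start|> to a single
# special-token id; a broken one splits it into several plain-text tokens.
tokens = llm.tokenize(b"<|im_start|>", add_bos=False, special=True)
print(tokens)  # expect a single id, e.g. [32000] on the 7B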
Would love to hear from someone who tried both the fp16 and the GGUF whether it seems broken :D
I can't really run the full F16, but I'll try the Imatrix GGUF quants and see if anything seems badly broken.
I think it's still broken.
The output of the full model looks okay, but I am getting this type of gibberish with the Q8_0 GGUF:
as she looked down at her with her arms still behind her head as she spoke again with her smile still present as she spoke again with her eyes still closed as she spoke again with her arms still behind her head as she spoke again with her eyes still closed as she spoke again with her arms still behind her head as she spoke again with her eyes still closed as she spoke again with her arms still behind her head as she spoke again with her eyes still closed as she spoke again with her arms still behind her head
@jirka642 Can you confirm this is also the case with my quants?
I didn't seem to experience that somehow.
https://huggingface.co/Lewdiculous/opus-v1.2-7b-GGUF-IQ-Imatrix
Sorry, ignore that. After testing your quants, the quants from this repo suddenly started working too...
I didn't change any parameters or the prompt, so I don't know what the issue was.
I think for this model it's very important to use the correct prompt format preset; it seems very sensitive to it.
Hi @jirka642 -- that might also be down to the sampling params; I use the following:
temperature: 0.8 (or less)
min_p: 0.05 (or a bit more)
frequency_penalty, presence_penalty: 0.1
repetition_penalty: 1.1
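For reference, here is how those map onto recent llama-cpp-python (just a sketch; note that llama.cpp calls repetition_penalty "repeat_penalty", and the prompt below is a bare stand-in, not the recommended system message):

from llama_cpp import Llama

llm = Llama(model_path="opus-v1.2-7b.Q8_0.gguf", verbose=False)  # placeholder path

# Stand-in prompt -- see the docs for the real system message.
prompt = "<|im_start|>system\nYou are a writer.<|im_end|>\n<|im_start|>text\n"

out = llm.create_completion(
    prompt,
    max_tokens=256,
    temperature=0.8,        # or less
    min_p=0.05,             # or a bit more
    frequency_penalty=0.1,
    presence_penalty=0.1,
    repeat_penalty=1.1,     # repetition_penalty in other UIs
)
print(out["choices"][0]["text"])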
And as @Lewdiculous said, the model might be sensitive to the prompt template -- at a minimum it needs the correct ChatML+text format (where the assistant role is replaced with the text role -- more in the docs) and the first lines of the system message.
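Roughly, the template skeleton looks like this (the system message below is just a placeholder -- take the exact one from the docs, since the first lines matter):

<|im_start|>system
(system message from the docs goes here)
<|im_end|>
<|im_start|>user
(your instructions / story input)
<|im_end|>
<|im_start|>text
(the model writes here -- the text role replaces the usual assistant role)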
Some software might tokenize incorrectly, but this is more the case for the Yi-based 34B model (since Yi has nonstandard tokenizer settings).
For the 34B, the correct tokenization for "<|im_start|>system\nHello!" would be:
# Common software bugs here: a BOS at the start (token id 1), and "system" getting tokenized as `▁system` with token id 1328.
# Yi models should have no BOS and no `▁` in this case.
['<|im_start|>', 'system', '\n', 'Hello', '!']
[6, 10707, 144, 25102, 99]
For the 7B, the correct tokenization for "<|im_start|>system\nHello!" would be:
['<|im_start|>', '▁system', '<0x0A>', 'Hello', '!']
[32000, 1587, 13, 16230, 28808]
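If you want to double-check your stack, you can compare against the reference tokenizer, e.g. with transformers (a sketch -- the repo id here is an assumption, substitute the one you actually use):

from transformers import AutoTokenizer

# Assumed repo id for the original 7B weights -- adjust to your setup.
tok = AutoTokenizer.from_pretrained("dreamgen/opus-v1.2-7b")

# add_special_tokens=False so we only look at how the string itself is split
# (no automatic BOS), matching the expected ids above.
ids = tok("<|im_start|>system\nHello!", add_special_tokens=False).input_ids
print(ids)
assert ids == [32000, 1587, 13, 16230, 28808], "tokenization differs from the reference"

For the 34B, swap in the 34B repo id and compare against the Yi ids listed above.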