Tokenizer of Yi DARE v5
It's required once again, as it was for v7.
And it works.
Thanks.
Note that the tokenizer from the v5 merge is just a copy of Yi's tokenizer, IIRC. This was before I was even aware of mergekit's union tokenizer merge.
It might not work quite right, as some tokens (like the ChatML tokens) are missing.
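If it helps, here's a quick way to check which of the ChatML tokens actually made it into a tokenizer. This is just a minimal sketch with transformers; the model path is a placeholder.

```python
# Minimal sketch: check whether the ChatML special tokens exist in a tokenizer.
# "path/to/merged-model" is a placeholder, not a real repo.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("path/to/merged-model", trust_remote_code=True)
vocab = tok.get_vocab()  # includes added/special tokens

for t in ("<|im_start|>", "<|im_end|>"):
    if t in vocab:
        print(f"{t}: present (id {vocab[t]})")
    else:
        print(f"{t}: missing")
```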
Well, you integrated Bagel, and it works with the same tokenizer despite all the prompt formats it offers.
Here are Jon's views on ChatML:
"ChatML (sort of)
I don't really understand the point of having special tokens for <|im_start|> and <|im_end|>, because in practice they just act as BOS and EOS tokens (but, please correct me if I'm wrong).
So, instead of:
{bos}<|im_start|>{role}
{text}
<|im_end|>{eos}
I just changed it to:
{bos}{role}
{text}
{eos}
If you really want to use <|im_start|> and <|im_end|>, just update your tokenizer_config.json to use <|im_start|> instead of <s> and <|im_end|> instead of </s> when tokenizing. And if you still don't like what I've done to this chat-ml-ish format, feel free to cry into your pillow or fork the code and do a new fine-tune."
https://huggingface.co/jondurbin/bagel-dpo-34b-v0.2#chatml-sort-of
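To make the difference concrete, here's a small sketch of the two layouts side by side. This is just my reading of the quoted text, not Bagel's actual templating code, and the BOS/EOS strings are the Llama/Yi defaults.

```python
# Sketch of the two prompt layouts quoted above (my reading of the quote,
# not Bagel's actual templating code). BOS/EOS are the Llama/Yi defaults.
BOS, EOS = "<s>", "</s>"

def chatml_turn(role: str, text: str) -> str:
    # Standard ChatML-style turn, using the dedicated special tokens.
    return f"{BOS}<|im_start|>{role}\n{text}\n<|im_end|>{EOS}"

def bagel_turn(role: str, text: str) -> str:
    # Jon's simplified variant: drop the special tokens and let BOS/EOS
    # do the same job on their own.
    return f"{BOS}{role}\n{text}\n{EOS}"

print(chatml_turn("user", "Hello"))
print(bagel_turn("user", "Hello"))
```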
Also, here are a few tests I made. Could the calibration you use for your quants really be affecting the wikitext and PTB perplexity this much?
Hmmm, you are not the first person to report something off with the quantization. I test all the merges myself in ooba with 4-bit bitsandbytes, and the perplexity of the raw weights is good.
I ran exllamav2's own perplexity test on wikitext just to rule out ooba:
v8-exl2-4bpw-fiction: 6.2723
v8-exl2-31bpw-fiction: 8203.8706
v8-exl2-26bpw-fiction: 77592.0066
v7-exl2-31bpw-fiction: 9097.3480
Oof. Yeah something is wrong with the lower quants, possibly all of them.
Here's perplexity on the actual .parquet file I quantized with:
v8-exl2-31bpw-fiction: 14.1167
v8-exl2-26bpw-fiction: 21.5868
Still catastrophic, albeit not hilariously catastrophic like wikitext.
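For anyone who wants to sanity-check numbers like these independently on the raw merged weights, here's a minimal sliding-window perplexity sketch with transformers. It is not the exact test ooba or exllamav2 run (the exl2 quants themselves need exllamav2's own scripts), and the model path, window size, and stride are placeholders, so absolute values won't match the ones above.

```python
# Minimal sliding-window perplexity sketch with transformers, for the raw weights.
# Not the exact test ooba or exllamav2 run; model path, window size and stride
# are placeholders.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/raw-merge"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
ids = tok(text, return_tensors="pt").input_ids

max_len, stride = 2048, 512
nlls, prev_end = [], 0
for begin in range(0, ids.size(1), stride):
    end = min(begin + max_len, ids.size(1))
    trg_len = end - prev_end                 # only score the tokens new to this window
    input_ids = ids[:, begin:end].to(model.device)
    target_ids = input_ids.clone()
    target_ids[:, :-trg_len] = -100          # mask the overlapping context
    with torch.no_grad():
        loss = model(input_ids, labels=target_ids).loss
    nlls.append(loss * trg_len)
    prev_end = end
    if end == ids.size(1):
        break

print("perplexity:", torch.exp(torch.stack(nlls).sum() / prev_end).item())
```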
I guess I will take the 3.1bpw and lower quants down? TBH I didn't really notice they were broken because I have only been testing 4bpw locally.
@turboderp Do you have any idea what's going on here? The quantization commands I used are:
python /home/alpha/AI/exllamav2/convert.py --in_dir /home/alpha/FastModels/v8/v8 -o /home/alpha/FastModels/scratch -om /home/alpha/FastModels/v8meas.json --cal_dataset /home/alpha/Documents/stories.parquet -ml 32768 -mr 8 -ss 4096 -b 4.0 -hb 6 -nr
python /home/alpha/AI/exllamav2/convert.py --in_dir /home/alpha/FastModels/v8/v8 -o /home/alpha/FastModels/scratch -m /home/alpha/FastModels/v8meas.json --cal_dataset /home/alpha/Documents/stories.parquet -l 12288 -r 26 -ml 32768 -mr 8 -ss 4096 -b 4.0 -hb 6 -cf /home/alpha/FastModels/v8-exl2-4bpw-fiction -nr
The measurements file is here: https://huggingface.co/brucethemoose/Yi-34B-200K-DARE-megamerge-v8/blob/main/v8meas.json
Anecdotally, I noticed exllama seemed to be allocating 2.2bpw to almost everything except a few layers in the middle.
Glad my little tests could help. I wish ooba included HellaSwag like llama.cpp does (I guess it's not just about including the text file, lol).
As for your defective quants, check LoneStriker's; his 3bpw exl2 quant works as intended.
And otherwise, great job on your merges!
"Oof. Yeah something is wrong with the lower quants, possibly all of them."
You should try using a calibration dataset composed of random tokens. I'm not joking; it's probably the best solution to this:
https://github.com/ggerganov/llama.cpp/discussions/5006
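If anyone wants to try that here, a random-token calibration parquet is quick to generate. This is a rough sketch: the tokenizer path, row count, and row length are placeholders, and I'm assuming convert.py reads a "text" column from the parquet, which is worth double-checking. Decoding and re-tokenizing won't reproduce the exact same IDs, but the data stays effectively random.

```python
# Rough sketch: build a calibration parquet out of random token IDs decoded back
# to text. Tokenizer path, row count and row length are placeholders, and the
# "text" column name is an assumption about what convert.py expects.
import numpy as np
import pandas as pd
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("path/to/base-model")
vocab_size = len(tok)

rows, row_len = 128, 2048
rng = np.random.default_rng(0)
texts = [
    tok.decode(rng.integers(0, vocab_size, size=row_len).tolist(),
               skip_special_tokens=True)
    for _ in range(rows)
]

pd.DataFrame({"text": texts}).to_parquet("random_calibration.parquet")
```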
Yeah, I'm already in that thread; I'm the one who tested with exl2.
The jury is still out, but it's very interesting. I will do more testing later.