Tokenizer of Yi DARE v5
It's required once again, as it was for v7.
And it works.
Thanks.
Note that the tokenizer from the v5 merge is just a copy of Yi's tokenizer, IIRC. This was before I was even aware of mergekit's union tokenizer merge.
It might not work quite right, as some tokens (like the ChatML tokens) are missing.
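If it helps, here's a quick way to check which of the ChatML tokens actually made it into a tokenizer. This is just a minimal sketch with transformers; the model path is a placeholder.

```python
# Minimal sketch: check whether the ChatML special tokens exist in a tokenizer.
# "path/to/merged-model" is a placeholder, not a real repo.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("path/to/merged-model", trust_remote_code=True)
vocab = tok.get_vocab()  # includes added/special tokens

for t in ("<|im_start|>", "<|im_end|>"):
    if t in vocab:
        print(f"{t}: present (id {vocab[t]})")
    else:
        print(f"{t}: missing")
```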
Well, you integrated Bagel, and it works with the same tokenizer despite all the prompt formats it offers.
Here are Jon's views on ChatML:
"ChatML (sort of)
I don't really understand the point of having special tokens for <|im_start|> and <|im_end|>, because in practice they just act as BOS and EOS tokens (but, please correct me if I'm wrong).
So, instead of:
{bos}<|im_start|>{role}
{text}
<|im_end|>{eos}
I just changed it to:
{bos}{role}
{text}
{eos}
If you really want to use <|im_start|> and <|im_end|>, just update your tokenizer_config.json to use <|im_start|> instead of <s> and <|im_end|> instead of </s> when tokenizing. And if you still don't like what I've done to this chat-ml-ish format, feel free to cry into your pillow or fork the code and do a new fine-tune."
https://huggingface.co/jondurbin/bagel-dpo-34b-v0.2#chatml-sort-of
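To make the difference concrete, here's a small sketch of the two layouts side by side. This is just my reading of the quoted text, not Bagel's actual templating code, and the BOS/EOS strings are the Llama/Yi defaults.

```python
# Sketch of the two prompt layouts quoted above (my reading of the quote,
# not Bagel's actual templating code). BOS/EOS are the Llama/Yi defaults.
BOS, EOS = "<s>", "</s>"

def chatml_turn(role: str, text: str) -> str:
    # Standard ChatML-style turn, using the dedicated special tokens.
    return f"{BOS}<|im_start|>{role}\n{text}\n<|im_end|>{EOS}"

def bagel_turn(role: str, text: str) -> str:
    # Jon's simplified variant: drop the special tokens and let BOS/EOS
    # do the same job on their own.
    return f"{BOS}{role}\n{text}\n{EOS}"

print(chatml_turn("user", "Hello"))
print(bagel_turn("user", "Hello"))
```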
Also, here are a few tests I made. Could the calibration you use for your quants really be affecting the wikitext and PTB perplexity this much?
Hmmm, you are not the first person to report something off with the quantization. I test all the merges myself in ooba with 4-bit bitsandbytes, and the perplexity of the raw weights is good.
I ran exllamav2's own perplexity test on wikitext just to rule out ooba:
v8-exl2-4bpw-fiction: 6.2723
v8-exl2-31bpw-fiction: 8203.8706
v8-exl2-26bpw-fiction: 77592.0066
v7-exl2-31bpw-fiction: 9097.3480
Oof. Yeah something is wrong with the lower quants, possibly all of them.
Here's perplexity on the actual .parquet file I quantized with:
v8-exl2-31bpw-fiction: 14.1167
v8-exl2-26bpw-fiction: 21.5868
Still catastrophic, albeit not hilariously catastrophic like wikitext.
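For anyone who wants to sanity-check numbers like these independently on the raw merged weights, here's a minimal sliding-window perplexity sketch with transformers. It is not the exact test ooba or exllamav2 run (the exl2 quants themselves need exllamav2's own scripts), and the model path, window size, and stride are placeholders, so absolute values won't match the ones above.

```python
# Minimal sliding-window perplexity sketch with transformers, for the raw weights.
# Not the exact test ooba or exllamav2 run; model path, window size and stride
# are placeholders.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/raw-merge"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
ids = tok(text, return_tensors="pt").input_ids

max_len, stride = 2048, 512
nlls, prev_end = [], 0
for begin in range(0, ids.size(1), stride):
    end = min(begin + max_len, ids.size(1))
    trg_len = end - prev_end                 # only score the tokens new to this window
    input_ids = ids[:, begin:end].to(model.device)
    target_ids = input_ids.clone()
    target_ids[:, :-trg_len] = -100          # mask the overlapping context
    with torch.no_grad():
        loss = model(input_ids, labels=target_ids).loss
    nlls.append(loss * trg_len)
    prev_end = end
    if end == ids.size(1):
        break

print("perplexity:", torch.exp(torch.stack(nlls).sum() / prev_end).item())
```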
I guess I will take the 3.1bpw and lower quants down? TBH I didn't really notice they were broken because I have only been testing 4bpw locally.
@turboderp Do you have any idea what's going on here? The quantization commands I used are:
python /home/alpha/AI/exllamav2/convert.py --in_dir /home/alpha/FastModels/v8/v8 -o /home/alpha/FastModels/scratch -om /home/alpha/FastModels/v8meas.json --cal_dataset /home/alpha/Documents/stories.parquet -ml 32768 -mr 8 -ss 4096 -b 4.0 -hb 6 -nr
python /home/alpha/AI/exllamav2/convert.py --in_dir /home/alpha/FastModels/v8/v8 -o /home/alpha/FastModels/scratch -m /home/alpha/FastModels/v8meas.json --cal_dataset /home/alpha/Documents/stories.parquet -l 12288 -r 26 -ml 32768 -mr 8 -ss 4096 -b 4.0 -hb 6 -cf /home/alpha/FastModels/v8-exl2-4bpw-fiction -nr
The measurements file is here: https://huggingface.co/brucethemoose/Yi-34B-200K-DARE-megamerge-v8/blob/main/v8meas.json
Anecdotally, I noticed exllama seemed to be allocating 2.2bpw to almost everything except a few layers in the middle.
Glad my little tests could help. I wish ooba included HellaSwag like llama.cpp does (I guess it's not just about including the text file, lol).
As for your defective quants, check LoneStriker's; his 3bpw exl2 quant works as intended.
And otherwise, great job on your merges!
"Oof. Yeah something is wrong with the lower quants, possibly all of them."
You should try using a calibration dataset composed of random tokens. I'm not joking; it's probably the best solution to this:
https://github.com/ggerganov/llama.cpp/discussions/5006
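If anyone wants to try that here, a random-token calibration parquet is quick to generate. This is a rough sketch: the tokenizer path, row count, and row length are placeholders, and I'm assuming convert.py reads a "text" column from the parquet, which is worth double-checking. Decoding and re-tokenizing won't reproduce the exact same IDs, but the data stays effectively random.

```python
# Rough sketch: build a calibration parquet out of random token IDs decoded back
# to text. Tokenizer path, row count and row length are placeholders, and the
# "text" column name is an assumption about what convert.py expects.
import numpy as np
import pandas as pd
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("path/to/base-model")
vocab_size = len(tok)

rows, row_len = 128, 2048
rng = np.random.default_rng(0)
texts = [
    tok.decode(rng.integers(0, vocab_size, size=row_len).tolist(),
               skip_special_tokens=True)
    for _ in range(rows)
]

pd.DataFrame({"text": texts}).to_parquet("random_calibration.parquet")
```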
Yeah, I'm already in that thread; I'm the one who tested with exl2.
The jury is still out, but it's very interesting. I will do more testing later.