Undi95/Llama-3-Unholy-8B-GGUF

It is weird. I see your GGUF has the right tokenizer.ggml.pre value (llama-bpe), yet when I look at it in the debugger the tokenization is broken in your version of the GGUF.
I don't have enough space to try the non-GGUF version with PyTorch to see whether the original unquantized version has the same issue.

Undi95

Owner May 4

It is weird. I see your GGUF has the right tokenizer.ggml.pre value (llama-bpe), yet when I look at it in the debugger the tokenization is broken in your version of the GGUF.
I don't have enough space to try the non-GGUF version with PyTorch to see whether the original unquantized version has the same issue.

No worries, I will check later

Noseu

May 4

Thanks!
Specifically, right now it's
string: '3333+777?'
input tokens: ['33':1644, '33':1644, '+':10, '777':15831, '?':30]

It should be
['333':8765, '3':18, '+':10, '777':15831, '?':30]
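
For anyone who wants to reproduce the comparison without loading a model, here is a rough sketch. It assumes the difference comes from the digit rule in the Llama 3 pre-tokenizer, which cuts runs of digits into chunks of at most three before BPE runs; that cause is my guess, not something confirmed in this thread. The regex below is a simplified stand-in for that one rule only, not the full llama-bpe pattern, and `split_digits` and `first_mismatch` are hypothetical helper names.

```python
import re

# Simplified stand-in for the digit rule only (an assumption, not the full
# llama-bpe pattern): digit runs are cut left to right into chunks of at
# most three; every run of non-digit characters is kept as one piece.
DIGIT_CHUNKS = re.compile(r"\d{1,3}|\D+")


def split_digits(text):
    """Split text the way the simplified digit rule would."""
    return DIGIT_CHUNKS.findall(text)


def first_mismatch(observed, expected):
    """Return the index of the first differing token, or None if identical."""
    for i, (a, b) in enumerate(zip(observed, expected)):
        if a != b:
            return i
    if len(observed) != len(expected):
        return min(len(observed), len(expected))
    return None


# Token lists copied from the post above as (piece, id) pairs.
observed = [("33", 1644), ("33", 1644), ("+", 10), ("777", 15831), ("?", 30)]
expected = [("333", 8765), ("3", 18), ("+", 10), ("777", 15831), ("?", 30)]

pieces = split_digits("3333+777?")
print(pieces)                              # pieces from the simplified rule
print(first_mismatch(observed, expected))  # where the two lists diverge
```

If the pieces printed by the sketch match the "should be" list but not the GGUF output, that would be consistent with the pre-tokenizer split not being applied at load time, though it does not prove it.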
