Something isn't working for me.
Not sure if it's the same issue as with Aurora, but I'm getting this when converting, both to FP16 and BF16:
INFO:hf-to-gguf:Loading model: Puppy_Purpose_0.69
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 8192
INFO:hf-to-gguf:gguf: embedding length = 4096
INFO:hf-to-gguf:gguf: feed forward length = 14336
INFO:hf-to-gguf:gguf: head count = 32
INFO:hf-to-gguf:gguf: key-value head count = 8
INFO:hf-to-gguf:gguf: rope theta = 500000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: file type = 32
INFO:hf-to-gguf:Set model tokenizer
Traceback (most recent call last):
File "D:\conda\llama.cpp\convert-hf-to-gguf.py", line 2562, in <module>
main()
File "D:\conda\llama.cpp\convert-hf-to-gguf.py", line 2547, in main
model_instance.set_vocab()
File "D:\conda\llama.cpp\convert-hf-to-gguf.py", line 1288, in set_vocab
self._set_vocab_sentencepiece()
File "D:\conda\llama.cpp\convert-hf-to-gguf.py", line 583, in _set_vocab_sentencepiece
tokenizer.LoadFromFile(str(tokenizer_path))
File "C:\Users\User\scoop\apps\miniconda3\current\envs\conver\Lib\site-packages\sentencepiece\__init__.py", line 310, in LoadFromFile
return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Internal: C:\b\abs_f7cttiucvr\croot\sentencepiece_1684525347071\work\src\sentencepiece_processor.cc(1102) [model_proto->ParseFromArray(serialized.data(), serialized.size())]
---
INFO:hf-to-gguf:Loading model: Puppy_Purpose_0.69
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 8192
INFO:hf-to-gguf:gguf: embedding length = 4096
INFO:hf-to-gguf:gguf: feed forward length = 14336
INFO:hf-to-gguf:gguf: head count = 32
INFO:hf-to-gguf:gguf: key-value head count = 8
INFO:hf-to-gguf:gguf: rope theta = 500000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: file type = 1
INFO:hf-to-gguf:Set model tokenizer
Traceback (most recent call last):
File "D:\conda\llama.cpp\convert-hf-to-gguf.py", line 2562, in <module>
main()
File "D:\conda\llama.cpp\convert-hf-to-gguf.py", line 2547, in main
model_instance.set_vocab()
File "D:\conda\llama.cpp\convert-hf-to-gguf.py", line 1288, in set_vocab
self._set_vocab_sentencepiece()
File "D:\conda\llama.cpp\convert-hf-to-gguf.py", line 583, in _set_vocab_sentencepiece
tokenizer.LoadFromFile(str(tokenizer_path))
File "C:\Users\User\scoop\apps\miniconda3\current\envs\conver\Lib\site-packages\sentencepiece\__init__.py", line 310, in LoadFromFile
return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Internal: C:\b\abs_f7cttiucvr\croot\sentencepiece_1684525347071\work\src\sentencepiece_processor.cc(1102) [model_proto->ParseFromArray(serialized.data(), serialized.size())]
This is the exact error I get on EVERY model; I have no clue what causes it. @Lewdiculous
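For reference, the failure reproduces outside the converter too. A minimal check like this (the local path is just an example) throws the same ParseFromArray error, which points at the tokenizer.model file itself rather than at the script:

```python
# Minimal repro outside the converter -- the path below is just an
# example for a local checkout of the merged model.
import sentencepiece as spm

sp = spm.SentencePieceProcessor()
# Same call the converter makes; raises the same
# "model_proto->ParseFromArray" RuntimeError when tokenizer.model
# is not a valid sentencepiece protobuf.
sp.LoadFromFile("Puppy_Purpose_0.69/tokenizer.model")
print(sp.GetPieceSize())
```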
This model never saw local hardware; it was created entirely in the mergekit GUI on HF.
I just did cgato/L3-TheSpice-8b-v0.8.3 and that one doesn't have this issue, ugh.
Are some of the configs modified compared to what it's expecting? It looks like it's not sure which tokenizer to select (see the sketch below).
I am using the llama-bpe files fetched from convert-hf-to-gguf-update.py.
https://files.catbox.moe/45uddo.zip
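If it helps, my rough understanding (a sketch of the behavior we're seeing, not the converter's actual code) is that a tokenizer.model file takes priority over tokenizer.json, so a stray or incompatible sentencepiece file can shadow the correct Llama 3 BPE tokenizer:

```python
# Sketch of how the vocab source seems to get picked -- this is my
# reading of the behavior, not code from convert-hf-to-gguf.py.
from pathlib import Path

def guess_vocab_kind(model_dir: str) -> str:
    d = Path(model_dir)
    if (d / "tokenizer.model").exists():
        # A sentencepiece file wins, so Llama 3 repos that ship a
        # stale/incompatible one crash in LoadFromFile like above.
        return "sentencepiece (tokenizer.model)"
    if (d / "tokenizer.json").exists():
        return "bpe (tokenizer.json)"
    raise FileNotFoundError("no tokenizer files found")

print(guess_vocab_kind("Puppy_Purpose_0.69"))  # example path
```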
I remember also having issues with ResplendentAI/Aurora_l3_8B; not sure if it was exactly this, but...
@SolidSnacke have you run into this with some models?
Surprised that fetching the files now still gave the result on the left. Might be from the base model. Haven't had to check so far, tbh.
Honestly, I tried with both the llama-bpe config files and the original repo files and the result was the same.
Yeah, I've had that issue before as well when doing merges.
Wait so what did I do wrong? All of the constituent models had been quanted before...
I'll have to tinker around with this tomorrow when I have time after work.
Nothing that I can tell.
Working so far for me:
cgato/L3-TheSpice-8b-v0.8.3
NeverSleep/Llama-3-Lumimaid-8B-v0.1-OAS
As examples, not sure if it applies at this stage.
Both appear to be using the outdated version, like the left example you sent above.
Well then. We did try with both the new and the previous versions.
Maybe we wait for clarifications on this then:
https://github.com/ggerganov/llama.cpp/issues/7129
Converting to GGUF worked for me after deleting tokenizer.model. I can upload some quants if you want.
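In other words, something like this (paths and outtype are just examples; adjust to your setup):

```python
# Sketch of the workaround: drop the stray sentencepiece file so the
# converter falls back to the BPE tokenizer.json, then convert as usual.
import os
import subprocess

model_dir = "Puppy_Purpose_0.69"  # hypothetical local checkout

sp_file = os.path.join(model_dir, "tokenizer.model")
if os.path.exists(sp_file):
    os.remove(sp_file)

subprocess.run(
    ["python", "convert-hf-to-gguf.py", model_dir, "--outtype", "f16"],
    check=True,
)
```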
Eeh. IT ACTUALLY WORKS! Huge, nbeer! Do you want to upload Puppy_Purpose_0.69 then? :D But I don't mind doing it either.
No, go for it! Glad I could help :)
Turns out we're good to go. Hurray!
llm_load_print_meta: model size = 14.96 GiB (16.00 BPW)
llm_load_print_meta: general.name = Puppy_Purpose_0.69
llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token = 128009 '<|eot_id|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_print_meta: EOT token = 128009 '<|eot_id|>'
I see you've been cooking up many experiments; anything hot so far, in your opinion?
Update:
I might fall asleep before uploading; if that happens, it will come in the morning.
Update:
It's all uploaded.
Thank you so much, @nbeerbower. I've been trying to keep all the model files together, since sometimes NOT having them causes errors; I never thought including an official Meta Llama 3 file would break anything.
@Lewdiculous Not really, I just shit out a bunch of models on intuition and let others test them.
tbf, RP isn't my main goal and I don't want to use Llama in production for licensing reasons... but I've been focused on improving ChatML support and leaderboard performance (Llama 3 seems to really like learning languages).
@jeiku no prob m8, I spent hours this morning trying to get llama3 quants working and the tokenizer is definitely a huge pain in the ass lol