Rhea-72b-v0.5-Q6_K.gguf broken?
I cannot run Rhea-72b-v0.5-Q6_K.gguf on the latest oobabooga/text-generation-webui (git pull as of 3/26/2024).
I get the following:
Model metadata: {'tokenizer.ggml.padding_token_id': '151643', 'tokenizer.ggml.unknown_token_id': '151643', 'tokenizer.ggml.eos_token_id': '151643', 'general.quantization_version': '2', 'tokenizer.ggml.model': 'llama', 'general.architecture': 'llama', 'llama.rope.freq_base': '1000000.000000', 'llama.context_length': '32768', 'general.name': 'models', 'llama.vocab_size': '152064', 'general.file_type': '18', 'llama.embedding_length': '8192', 'llama.feed_forward_length': '24576', 'llama.attention.layer_norm_rms_epsilon': '0.000001', 'llama.rope.dimension_count': '128', 'tokenizer.ggml.bos_token_id': '151643', 'llama.attention.head_count': '64', 'llama.block_count': '80', 'llama.attention.head_count_kv': '64'}
Using fallback chat format: None
21:35:16-400166 INFO LOADER: "llama.cpp"
21:35:16-426248 INFO TRUNCATION LENGTH: 32768
21:35:16-426879 INFO INSTRUCTION TEMPLATE: "Alpaca"
21:35:16-427465 INFO Loaded the model in 52.12 seconds.
terminate called after throwing an instance of 'std::out_of_range'
what(): _Map_base::at
Aborted
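Side note for anyone debugging similar crashes: the metadata block above can be inspected without loading the model at all, using the `gguf` Python package that ships with llama.cpp (a sketch; the filename is assumed to be in the current directory):

```python
# pip install gguf
from gguf import GGUFReader

# Reads only the GGUF header/metadata, not the tensor data,
# so it is fast even for a 72B quant.
reader = GGUFReader("Rhea-72b-v0.5-Q6_K.gguf")
for field in reader.fields.values():
    print(field.name)
```

This is handy for spotting tokenizer mismatches (e.g. `tokenizer.ggml.model` vs the vocab size) before spending time on a full load.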
Downloading to test
Do you fail on loading or on generation? I was able to load it (OOMed when trying to generate lol, trying again now)
Seems they're all core dumping as soon as I try to generate. Hmm, that's super odd, I wonder what went wrong. I'll have to pull this and investigate. Sorry about all the bandwidth you wasted :')
@charltonh okay good news, figured it out, remaking them all now!
Cool deal! Can't wait to try it.
Please do a quick test first before putting them out there, but thank you in advance!
https://huggingface.co/bartowski/Rhea-72b-v0.5-GGUF
I tested Q2 and it worked. In hindsight I should have kept a bigger one to test as well, so if you don't want to try it, I'll update in the morning on whether that worked too.
I think the original model might have a broken tokenizer - had the same issue, will re-test.
@bartowski ah, it seems you have figured it out. Could you tell us what it was?
Tried with the bpe vocab, seems to work.
Yeah it was the bpe vocab that fixed it!
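For anyone else hitting the same `std::out_of_range` crash when re-quantizing this model: the fix described above can be applied by forcing the BPE vocab during conversion. A rough sketch, assuming the classic llama.cpp `convert.py` workflow (paths are placeholders; newer llama.cpp versions use `convert-hf-to-gguf.py`, which detects the tokenizer type automatically):

```shell
# Convert the HF checkpoint forcing the BPE vocab instead of the default
# SentencePiece one; the model uses a Qwen-style BPE tokenizer, and
# converting with the wrong vocab type leads to failed token lookups
# (std::out_of_range) at generation time.
python convert.py /path/to/Rhea-72b-v0.5 \
    --vocab-type bpe \
    --outfile Rhea-72b-v0.5-f16.gguf

# Then quantize as usual:
./quantize Rhea-72b-v0.5-f16.gguf Rhea-72b-v0.5-Q6_K.gguf Q6_K
```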
thx :)