The Q8 quant files are incorrect. They are the same files as the Q6 ones.
The hashes and file sizes are identical here on HuggingFace.
I also downloaded the Q8 files and confirmed in llama.cpp that they are actually Q6.
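For anyone who wants to reproduce the check, something like this is enough (the filenames below are placeholders, not the actual shard names in the repo):

```bash
# Identical digests mean the two uploads are byte-for-byte the same file
sha256sum model.Q6_K.gguf model.Q8_0.gguf

# llama.cpp also prints the tensor types when loading a model, e.g.
#   llama_model_loader: - type q6_K: ... tensors
# which is how a mislabeled "Q8" download shows up as Q6_K
```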
Hey there, thanks for reaching out. I can confirm the Q6_K was erroneously uploaded as the Q8_0. My apologies: on closer inspection, I made a mistake running the gguf-split script that I didn't catch until your comment (though truthfully, the identical file sizes should have clued me in before uploading).
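For context, the split step in question looks roughly like this (a sketch with placeholder filenames, not the exact invocation):

```bash
# Shard a large GGUF for upload; --split-max-tensors caps tensors per shard
./gguf-split --split --split-max-tensors 256 model.Q8_0.gguf model.Q8_0
# -> model.Q8_0-00001-of-0000N.gguf, model.Q8_0-00002-of-0000N.gguf, ...
```

One way this kind of mix-up happens is pointing the split at the wrong source GGUF, so the shards carry the Q8_0 name but Q6_K contents; comparing shard sizes against the source quant before uploading is a cheap guard.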
I've just quantized the Q8_0 again with the importance matrix and it's loading correctly now.
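In case it's useful, the re-quantization amounts to something like this (placeholder paths; the binary is `quantize` in llama.cpp builds of that vintage):

```bash
# Re-quantize from the full-precision GGUF, passing the importance matrix
./quantize --imatrix model.imatrix model.f16.gguf model.Q8_0.gguf Q8_0
```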
I'm removing the incorrect quants and uploading the proper Q8_0 to the repo right now in case you want to give it another chance. Apologies for any inconvenience, and I appreciate you letting me know.
No apologies needed. Your work is much appreciated.
I noticed that the model card references using this imatrix: https://huggingface.co/jukofyork/WizardLM-2-8x22B-imatrix
and that imatrix was updated 2 days ago (see https://github.com/ggerganov/llama.cpp/pull/7099), which was after your initial upload of these files 6-7 days ago.
Is the fresh Q8_0 quant using the older imatrix or the more recent one?
Great question - I just realized I hadn't yet uploaded the .imatrix file I created for these. Thank you for keeping me honest; I definitely need to be better about updating my repos with these files. The weighted quants here were created with an .imatrix file calculated from the Llama-3-70B-Instruct-Storywriter Q8_0 using groups_merged.txt over 88 chunks. If you want to take a look, here's the link to the file.
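For anyone who wants to reproduce that computation, it amounts to roughly the following (the output filename is a placeholder):

```bash
# Compute the importance matrix from the calibration text over 88 chunks
./imatrix -m Llama-3-70B-Instruct-Storywriter.Q8_0.gguf \
    -f groups_merged.txt -o storywriter.imatrix --chunks 88
```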
For reference, they were all created with llama.cpp build b2774 from about a week ago, including the newly re-uploaded Q8_0s, for consistency's sake. The model card link was only meant to credit where I found the process, but I can see how not having the actual file in this repo made things confusing.
Hopefully this clears things up. Let me know if there's anything else I can answer; I appreciate your support and feedback.