The Q8 quant files are incorrect. They are the same files as the Q6 ones.
The hashes and file sizes are identical here on HuggingFace.
I also downloaded the Q8 files and confirmed in llama.cpp that they are actually Q6.
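For anyone who wants to reproduce the check, something like this is enough (the filenames below are placeholders, not the actual shard names in the repo):

```bash
# Identical digests mean the two uploads are byte-for-byte the same file
sha256sum model.Q6_K.gguf model.Q8_0.gguf

# llama.cpp also prints the tensor types when loading a model, e.g.
#   llama_model_loader: - type q6_K: ... tensors
# which is how a mislabeled "Q8" download shows up as Q6_K
```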
Hey there, thanks for reaching out. I can confirm the Q6_K was erroneously uploaded as the Q8_0. My apologies: on closer inspection, I made a mistake running the gguf-split script that I didn't catch until your comment (though truthfully, the identical file sizes should have clued me in before uploading).
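For context, the split step in question looks roughly like this (a sketch with placeholder filenames, not the exact invocation):

```bash
# Shard a large GGUF for upload; --split-max-tensors caps tensors per shard
./gguf-split --split --split-max-tensors 256 model.Q8_0.gguf model.Q8_0
# -> model.Q8_0-00001-of-0000N.gguf, model.Q8_0-00002-of-0000N.gguf, ...
```

One way this kind of mix-up happens is pointing the split at the wrong source GGUF, so the shards carry the Q8_0 name but Q6_K contents; comparing shard sizes against the source quant before uploading is a cheap guard.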
I've just quantized the Q8_0 again with the importance matrix and it's loading correctly now.
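In case it's useful, the re-quantization amounts to something like this (placeholder paths; the binary is `quantize` in llama.cpp builds of that vintage):

```bash
# Re-quantize from the full-precision GGUF, passing the importance matrix
./quantize --imatrix model.imatrix model.f16.gguf model.Q8_0.gguf Q8_0
```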
I'm removing the incorrect quants and uploading the proper Q8_0 to the repo right now in case you want to give it another chance. Apologies for any inconvenience, and I appreciate you letting me know.
No apologies needed. Your work is much appreciated.
I noticed that the model card references using this imatrix: https://huggingface.co/jukofyork/WizardLM-2-8x22B-imatrix
and that imatrix was updated 2 days ago (see https://github.com/ggerganov/llama.cpp/pull/7099), which was after your initial upload of these files 6-7 days ago.
Is the fresh Q8_0 quant using the older imatrix or the more recent one?
Great question - I just realized I hadn't yet uploaded the .imatrix file I created for these. Thank you for keeping me honest; I definitely need to be better about updating my repos with these files. The weighted quants here were created with an .imatrix file calculated from the Llama-3-70B-Instruct-Storywriter Q8_0 using groups_merged.txt over 88 chunks. If you want to take a look, here's the link to the file.
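For anyone who wants to reproduce that computation, it amounts to roughly the following (the output filename is a placeholder):

```bash
# Compute the importance matrix from the calibration text over 88 chunks
./imatrix -m Llama-3-70B-Instruct-Storywriter.Q8_0.gguf \
    -f groups_merged.txt -o storywriter.imatrix --chunks 88
```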
For reference, they were all created with llama.cpp build b2774 from about a week ago, including the newly re-uploaded Q8_0s, for consistency's sake. The model card link was only meant to credit where I found the process, but I can see how not having the actual file in this repo made things confusing.
Hopefully this clears things up. Let me know if there's anything else I can answer; I appreciate your support and feedback.