Nexesenex/Mistral-Large-Instruct-2407-iMat-CQ-GGUF

Custom quants for MistralAI's Mistral Large v2 (123B)

IQ4_XXSR: basically IQ4_XS with attn_q in IQ3_S, attn_v in Q6_K, and token_embed in Q6_0. Yes, you read that correctly: Q6_0 is Ikawrakow's latest traditional quant, and it is not available in mainline Llama.cpp.
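
As a rough illustration of that recipe (not the actual quantization code), the per-tensor choices can be written down as a small lookup. This is only a sketch: `quant_type_for`, `BASE_TYPE`, and `OVERRIDES` are hypothetical names, and the patterns assume the usual GGUF tensor naming (`blk.<n>.attn_q.weight`, `blk.<n>.attn_v.weight`, `token_embd.weight`). Anything not overridden is left to whatever the stock IQ4_XS mix would pick for it.

```python
import re

# Hypothetical sketch of the IQ4_XXSR recipe as per-tensor overrides.
# "IQ4_XS" here stands for "whatever the stock IQ4_XS mix chooses",
# which is not necessarily IQ4_XS for every single tensor.
BASE_TYPE = "IQ4_XS"

# Patterns assume standard GGUF tensor names (an assumption, not taken from the card).
OVERRIDES = [
    (re.compile(r"^blk\.\d+\.attn_q\.weight$"), "IQ3_S"),
    (re.compile(r"^blk\.\d+\.attn_v\.weight$"), "Q6_K"),
    (re.compile(r"^token_embd\.weight$"), "Q6_0"),
]


def quant_type_for(tensor_name: str) -> str:
    """Return the quant type this sketch assigns to a tensor name."""
    for pattern, qtype in OVERRIDES:
        if pattern.match(tensor_name):
            return qtype
    return BASE_TYPE


if __name__ == "__main__":
    for name in ("token_embd.weight", "blk.0.attn_q.weight",
                 "blk.0.attn_v.weight", "blk.0.ffn_down.weight"):
        print(f"{name:24s} -> {quant_type_for(name)}")
```

The Q6_0 entry is the reason these files only load in IK_Llama.cpp and Croco.cpp.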

WARNING: Compatible with IK_Llama.cpp and Croco.cpp (my fork of the great KoboldCpp) only. I'll release an .exe soon, but it already works (at least on Windows) for those who can compile it themselves. https://github.com/Nexesenex/croco.cpp

Overall, maybe it's time for the Llama.cpp team to have a look at Ikawrakow's latest work and offer him terms of cooperation, so we can once again enjoy SOTA quants in Llama.cpp. https://github.com/ikawrakow/ik_llama.cpp

Because the situation is becoming grotesque: we are quantizing models en masse with non-SOTA quants while better ones are within reach. Thousands of terabytes of storage space, our compute, and our time are being wasted as a result.