---
license: mit
language:
- en
---

For the bandwidth limited ones <3

# GGUFs for [HanNayeoniee/LHK_DPO_v1](https://huggingface.co/HanNayeoniee/LHK_DPO_v1)

For a general idea of how the quantization level influences output quality, check any model card from TheBloke, or [see this table](https://docs.faraday.dev/models/choose-model#size-vs-perplexity-tradeoff). Note that those benchmarks were run on Llama models and are probably not recent. I also don't know how the MoE architecture influences those results, but you get the idea!

As for the model itself, I have only played with it for about 40 minutes so far (Q5_K_M, ChatML template, [TGWUI](https://github.com/oobabooga/text-generation-webui), rather short context size), but from what I saw, this model is really impressive 👏 Quite astonishing, even! [Edit: all quants are now tested and validated]

Coherence seems remarkably well maintained. To illustrate, [see this sequence of interactions](https://bin.0xfc.de/?3110c74187a4b1f6#9qZMtmnmqeTrVrsoUsf37a7H39uXJvizRcpFdCf2yokS) with the model.

[HanNayeoniee/LHK_DPO_v1](https://huggingface.co/HanNayeoniee/LHK_DPO_v1) was trained via Direct Preference Optimization (DPO) from [TomGrc/FusionNet_7Bx2_MoE_14B](https://huggingface.co/TomGrc/FusionNet_7Bx2_MoE_14B).

Thanks to the community, and sincere congrats to HanNayeoniee and TomGrc!
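
If you prefer running the GGUF outside of TGWUI, here is a minimal sketch using llama-cpp-python with the ChatML template. The filename, context size, and prompt below are placeholders, not part of this repo; substitute whichever quant file you actually download.

```python
# Minimal sketch: load a GGUF quant with llama-cpp-python and chat via the ChatML template.
# The model filename below is a placeholder -- replace it with the quant file you downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="LHK_DPO_v1.Q5_K_M.gguf",  # placeholder filename
    n_ctx=4096,           # a rather short context, as in the quick test described above
    n_gpu_layers=-1,      # offload all layers to GPU if available; set to 0 for CPU-only
    chat_format="chatml", # the model was tested with the ChatML prompt template
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me something interesting about mixture-of-experts models."},
    ],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```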