---
license: apache-2.0
language:
- en
tags:
- nsfw
- not-for-all-audiences
- roleplay
---

## InfinityKumon-2x7B

![InfinityKumon-2x7B](https://cdn.discordapp.com/attachments/843160171676565508/1222560876578476103/00000-3033963009.png?ex=6616a98b&is=6604348b&hm=6434f8a16f22a3515728ab38bf7230a01448b00e6136729d42d75ae0374e5802&)

GGUF - Imatrix quant of [InfinityKumon-2x7B](https://huggingface.co/R136a1/InfinityKumon-2x7B)

Another MoE merge of [Endevor/InfinityRP-v1-7B](https://huggingface.co/Endevor/InfinityRP-v1-7B) and [grimjim/kukulemon-7B](https://huggingface.co/grimjim/kukulemon-7B).

The reason? Because I like InfinityRP-v1-7B so much and wondered whether I could improve it even further by merging two great models into a MoE.
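
If you want to try a quant locally, a minimal sketch with llama-cpp-python could look like the following. The exact `.gguf` filename is an assumption; check the GGUF repo's file list for the quant you want.

```python
# Minimal sketch of loading one of these quants with llama-cpp-python.
# The .gguf filename below is hypothetical; pick a real one from the repo.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="R136a1/InfinityKumon-2x7B-GGUF",
    filename="InfinityKumon-2x7B.Q5_K_M.gguf",  # assumed filename
)

llm = Llama(model_path=model_path, n_ctx=4096)

# Alpaca-style prompt, one of the two formats listed below.
prompt = "### Instruction:\nGreet the user in character.\n\n### Response:\n"
out = llm(prompt, max_tokens=128, stop=["### Instruction:"])
print(out["choices"][0]["text"])
```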

## Perplexity

Measured using llama.cpp/perplexity with a private roleplay dataset.
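
For reference, a run like the ones below can be reproduced roughly as sketched here. The binary is named `perplexity` in older llama.cpp builds, and the dataset behind this table is private, so the text file here is a stand-in.

```python
# Sketch of a llama.cpp perplexity run; paths and filenames are assumptions.
import subprocess

subprocess.run(
    [
        "./llama-perplexity",                    # "./perplexity" in older builds
        "-m", "InfinityKumon-2x7B.Q5_K_M.gguf",  # hypothetical local model path
        "-f", "roleplay.txt",                    # stand-in plain-text corpus
    ],
    check=True,
)
```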

| Format | PPL |
| --- | --- |
| FP16 | 3.1748 +/- 0.11928 |
| Q8_0 | 3.1734 +/- 0.11935 |
| Q6_K | 3.1752 +/- 0.11899 |
| Q5_K_M | 3.1731 +/- 0.11892 |
| IQ4_NL | 3.1752 +/- 0.11943 |
| IQ3_M | 3.1773 +/- 0.11528 |
| Q2_K | 3.2309 +/- 0.11996 |

I don't really recommend using Q2_K based on the PPL; the other quants are fine.

### Prompt format:

Alpaca or ChatML
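
The card only names the formats, so here is a sketch of the usual community templates for both; treat these as the standard layouts rather than card-specific strings.

```python
# Standard Alpaca and ChatML prompt layouts (assumed, not taken from the card).
def alpaca(instruction: str) -> str:
    """Bare Alpaca-style template."""
    return f"### Instruction:\n{instruction}\n\n### Response:\n"

def chatml(system: str, user: str) -> str:
    """ChatML template with a system message."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
```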

Switch: [FP16](https://huggingface.co/R136a1/InfinityKumon-2x7B) - [GGUF](https://huggingface.co/R136a1/InfinityKumon-2x7B-GGUF)