Feedback
If my reddit account hadn't been suspended for some reason, I'd be recommending this model to everyone.
Right now I'm using the IQ2_M quant with 24k context and a 4-bit KV cache, and this is the best model I have tried for roleplay (I can't run Behemoth due to its size).
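For anyone wanting to try a similar setup, a launch command along these lines should give 24k context with a 4-bit KV cache; this is a minimal sketch assuming koboldcpp with CUDA, and the model filename and tensor split values are placeholders for your own:

```
python koboldcpp.py --model Endurance-100B.IQ2_M.gguf \
  --usecublas mmq --gpulayers 99 --tensor_split 24 12 \
  --contextsize 24576 --flashattention --quantkv 2
```

--quantkv 2 selects the 4-bit KV cache (it needs --flashattention enabled), and --tensor_split spreads the layers across the two cards.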
Back in L2 times I used 7B, 9B and 11B models via exllama. Then I tried Mixtral and was so fascinated by MoE capabilities and speed that I made a lot of frankenMoE merges in L3 times. Then for a month or two I stuck to the "Free API" CommandR+, and I sometimes use it even now.
Now I've bought myself a P40, and with my weird 36GB VRAM RTX 3060 + P40 setup I decided to try big models. Initially some from Sao, but his models were too focused on eRP (or maybe my settings were just bad). Then I tried Magnum 72B V2 and was amazed by its quality; when Magnum 72B V4 came out I stuck with it for some time before trying Nemotron 70B and its finetunes like Nautilus and Sunfall. Honestly, I wasn't impressed by Nautilus, but Sunfall was great and amazed me almost every day, so it became my daily driver.
However, even the IQ2_M quant of Endurance 100B amazed me with its quality. First, it has much higher emotional intelligence; I haven't seen a better model for good drama. Second, its ability to remember things: I often want to see how the LLM understands what's happening in the roleplay, so I write something like "Stop the roleplay, analyze it, write an essay, split it into subtopics" or "Stop the roleplay, analyze {{user}}, write an essay, split it into subtopics". CommandR+ was the only one that could break out of character and write the analysis without any crutches on the first try, and Nemotron and its finetunes were also great at this, but Endurance's analysis was something else: not only did it break out of character on the first try, its analytical capabilities were the best among all the models I've tried for RP purposes... and I'm not even sure I'm using fitting settings.
The only "bad" thing i can say is that my speed dropped by ~1t/s compared to ~70B models, but that was expected, if using row split wouldn't turn my output into garbage maybe that would be better, anyway I'll try row split later, koboldcpp updated many times so i have hope
Well, what can I say... Bravo, just bravo.
Hm, maybe I should try this too?
I have a similar configuration: P40 + 1080 Ti 11GB.
I thought it wouldn't fit, but since you're fitting it in 36GB and it works well, I'll try it now too.
Thanks for the recommendation. Also, if this model really is that good, then it's amazing that we have something for that VRAM range.
You probably won't be able to run 24k context with 1GB less VRAM; try 16k with koboldcpp, or run 18~20k context using llamacpp.
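As a starting point for the 35GB setup, something like this with llama.cpp's server might work; a sketch under the same assumptions as above (placeholder filename and split values, and flag spellings can change between builds):

```
./llama-server -m Endurance-100B.IQ2_M.gguf -ngl 99 \
  -ts 24,11 -c 18432 -fa -ctk q4_0 -ctv q4_0
```

-ctk/-ctv q4_0 give the 4-bit KV cache (they need flash attention, -fa), and -c 18432 is the 18k context; drop to -c 16384 if it still overflows.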
However, it seems like the model has some problems with monster girls. For example, it described a naga as having a "human body above her neck" rather than just a human upper body, though a more detailed description in the card/persona solved the problem. Maybe the model needs additional healing on a larger dataset? Or a general fine-tune before the RP tune? What are fitting settings for Endurance?