Relative success?
I think this model has achieved pretty well what you were aiming for. Conversations are in fact more fluid and less sloppy, characters are more emotional and give you some great reactions/responses to work with, and it has some nice depth/nuance: it understands undertones, reads the room, and has a general understanding of what's contextually appropriate and possible. It can do and understand irony/sarcasm, can be evasive, and can detect insinuation and answer to that instead of what was literally said. It doesn't shy away from being rude in some cases, and characters don't behave one-dimensionally. It can handle some esoteric and intense characters well too; it tends to play them more grounded compared to many other models that go over the top with flamboyance/ostentatiousness. It can reference relevant details from earlier messages and draw conclusions from between-the-lines stuff that only emerges when you link two different messages, even with quite a few others in between (that's wild).
A couple of issues. Narration capabilities are limited, so it's not a good fit for adventures. It sometimes struggles to stay coherent, especially when there are a lot of moving elements or independent agents. It can also randomly become incoherent in just about any circumstance, but in those dire cases regenerating the response mostly helps. And like all Nemo models, it can be a bit too willing to jump into erotic stuff. One other thing: sometimes the responses are too short. That's better than too long, certainly, but it gets dry fast when you let it fall into this pattern. That's even with rep pen settings disabled! I'm talking like 30-token responses. It gave the best stuff when it wrote around 100-300 tokens; I wish it would steer itself to stay in that region by default. "Continue the last message" in SillyTavern helps, but it's not ideal to need it so often.
Overall, despite the flaws, I think there's something special in here. It's already nice enough to enjoy conversations and basic RP with some cards as is; you just need to be patient with it and reroll.
If you keep developing it to be even more human-like, I think it could benefit a lot from some in-character refusal examples; it already knows how to get mad at the user and be rude/passive-aggressive or disapproving, now it just needs to know how to be disagreeable and shut the user down when appropriate. Also, what do you think of this - (jondurbin/truthy-dpo-v0.1)?
Used Q8 quant. ChatML with disabled system prompt. Samplers: temp 0.7, min_p 0.05, top_A 0.2, smoothing factor 0.2.
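For anyone unfamiliar with two of the samplers listed above, here is a rough sketch of what temp 0.7 and min_p 0.05 do to a token distribution. This is only an illustration under assumptions: the logits are made up, the function names are hypothetical, and real backends may apply samplers in a different order. top_A and the smoothing factor are left out on purpose.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature divides the logits before normalizing; values below 1.0
    # sharpen the distribution toward the most likely tokens.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def min_p_filter(probs, min_p=0.05):
    # min_p drops every token whose probability is below
    # min_p * (probability of the top token), then renormalizes the rest.
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]

# Made-up logits for five candidate tokens (assumption, not real model output).
logits = [5.0, 4.0, 2.0, -1.0, -3.0]
probs = softmax(logits, temperature=0.7)
filtered = min_p_filter(probs, min_p=0.05)
print([round(p, 4) for p in filtered])
```

With these example logits only the two strongest candidates survive the min_p cut; the rest are zeroed out before sampling.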
Hi there, thank you for the feedback. In general I agree with everything you've mentioned. Currently responses trend short and I'm trying to fix that. It's not an exact science. As for adventure RP, can you provide some info on the system prompts or character cards you used? I can work with them and include them in the RL dataset.
I'm not the best person to ask this. I don't use system prompts, so I can't be of help here. I also don't use cards that have any adventure-specific rules embedded. It's just that sometimes the cards are written in a way that warrants taking the chat in a more complex direction, which I'm used to calling adventure, but the only difference from regular RP for me is frequent location/circumstance changes, time skips, and the inclusion of numerous fleeting side characters. On Chub, the cards under the Dark Fantasy tag are usually written in a way that warrants playing them this way. Normal Fantasy too, but it's more diluted. I'll only link a few examples.
https://venus.chub.ai/characters/yoiiru/princess-darlene-dfc8c39e3984
https://venus.chub.ai/characters/yoiiru/colette-485968ab
https://venus.chub.ai/characters/NobodAlreadyExists/feuer-the-pyromancer-9982f85e
https://venus.chub.ai/characters/Anonymous/amara-fae7b1d3
https://venus.chub.ai/characters/NobodAlreadyExists/cleddyf-llym-sh-r-l-n-n-h-37c6e7331d09
Honestly, 18 days later I've been using this model more and more, and my initial subdued feedback doesn't do it justice. I just can't go back to my previous daily drivers (violet twilight, base mistral small) because this model tops them when it comes to realistic behavior and conversations. The slop is nonexistent, and it gives very different results between swipes, all equally good (aside from those occasional very short ones).
I even stand corrected on the "it's meh for adventures" point. My dumb ass came to the wrong conclusion initially. It might struggle like any other 12B model when there's way too much stuff to keep track of, but as far as advancing the plot goes, it does it well. It can also reach far back into the depths of the context when that's relevant instead of just ignoring it, so high-context chats (24-32k) are actually worth it.
The versatility of this model knows no bounds; it does literally whatever you throw at it. Plot-wise, from casual vanilla stuff to heavy darkness to goofy nonsense to an intellectual mahjong riichi duel with a custom ruleset and plot twists (.......), it always stays afloat and gives you something interesting. This model is the only one in this range that not only doesn't bitch about card formatting, but can actually properly chew through very heavy cards that are disgustingly formatted, those interview-style cards and story-of-my-life ones, which even GPT-4 may struggle with. This one can make sense of them and make them work. At the very worst, it just detects the archetype, even a complex one, and plays into that, avoiding pitfalls by keeping it simple but convincing. Same goes for cards that lack detail: because it's got such a deep understanding of archetypes, it can breathe life into an otherwise boring card.
I noticed you've shadow-updated this repo, but I've still been sitting on the version from before, so even that one holds up.
But I downloaded the alternative B version a few days ago to compare. At first glance I couldn't tell the difference, but after several hours on different cards it started hitting me. Something didn't feel right after all. The happy-go-lucky bias was noticeable compared to the main branch. Maybe it's not exactly that, it's more like it's easier on the uptake, with enthusiasm... That didn't sit right with me, so I went back to this one after all.
Waiting on the gguf for the update to this branch. (Oh wait, someone just dropped Q8!) But have you considered pushing updated versions to separate repos? What if an update ends up a step backwards, you never know. Plus, it will be easier for quantizers to re-quantize your updates and not miss them.
It's a damn shame this model is flying under the radar. For an experiment, it's already toppling the whole 12B stage for me. Even 22B mistral small. That's no small feat.
Yes, I'm constantly updating this repo because it's an experimental repo where I dump my results. I'm currently testing one to toss into a v0.1 as a sort of stable release.
The model is great; compared to other NEMOs it's a real breakthrough in quality, but I found one point that disappointed me. All the male characters on it turn into beggars when it comes to ERP: they don't act like themselves and they shift control of the scene to the user while using rather feminine phrases to motivate the user. This isn't exactly acceptable to me as a girl, but the fact that the model isn't throwing herself at me and directing all the action towards sex like a cat in spring makes me insanely happy.
This is definitely a blind spot for my RL dataset. I'll have to go out of my way to include some samples of this to ensure it's balanced. Thank you for your feedback.
This is a common problem with all NEMO models: at best they have only 3 male archetypes, versus more than 7 female archetypes. I figure the reason is that the models are trained mostly by guys, so they don't care about data for creating male characters; their training is geared towards creating female characters. And girls in RP stick to either GPT or Claude; there are few girls like me running local models.