Works pretty well. Fairly engaging. Seems to get stuck on generalizations sometimes.
I've actually found this to be a surprisingly good general model for a 13B in completely SFW use, and it seems to stay SFW in anything that starts off SFW. I've struggled to find something as good as the 20B I used to use, since I only have 16GB of VRAM. This one can be fairly engaging and writes decent quality in a number of different styles (though it seems hard to get it to understand that a medieval fantasy setting shouldn't have cellphones!), but it likes to get stuck repeating some generalization. Statements like "person feels uplifted by the recent successes and looks forward to future adventures" will start to appear over and over. Even if I edit them out, they sometimes keep coming back. I think they're filler, because they mostly show up in shorter responses, as if the model is specifically trying to keep those from being too short. I tried tweaking settings, but it doesn't really go away. All I can do is edit them out when they show up, and then they usually disappear for a bit.
EDIT: Increasing repetition penalty range in particular seems to have helped with some of those things.
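For anyone unfamiliar with that setting: a minimal sketch of how a ranged repetition penalty typically works (names and defaults here are illustrative, not any particular sampler's exact implementation). The "range" limits the penalty to the most recent N tokens, which is why raising it catches filler phrases that recur further back in the context:

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.15, penalty_range=2048):
    """CTRL-style repetition penalty over a limited window.

    logits: list[float] indexed by token id (scores for the next token)
    generated_ids: list[int] of tokens produced so far
    penalty_range: only the last `penalty_range` tokens are penalized;
                   a larger range discourages repeats from further back.
    """
    recent = generated_ids[-penalty_range:] if penalty_range > 0 else generated_ids
    for tok in set(recent):
        # Push repeated tokens toward less likely: shrink positive logits,
        # make negative logits more negative.
        if logits[tok] > 0:
            logits[tok] /= penalty
        else:
            logits[tok] *= penalty
    return logits
```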
Still, I haven't found any other 13B that felt this engaging. I'd really like to see more done with this merge formula, because I think it shows real promise for engaging writing.
I wish we could see something like this with GQA, though. Man, that KV buffer is murder. This 13B actually seems to handle 16K context really well (not as well as really big models, I'm sure, but great for a small model), but at that size the KV buffer is actually larger than the model itself! (At least at reasonable quants.)
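Some back-of-the-envelope math on why, assuming standard Llama-2 13B dimensions and an fp16 cache (the GQA figure assumes a hypothetical variant with 8 KV heads, like Llama-2 70B uses):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    # K and V each store one head_dim vector per layer, per KV head, per position
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

# Llama-2 13B: 40 layers, 40 heads (full MHA, no GQA), head_dim 128, 16K context
mha = kv_cache_bytes(40, 40, 128, 16384)   # ~12.5 GiB
# Hypothetical GQA variant with 8 KV heads instead of 40
gqa = kv_cache_bytes(40, 8, 128, 16384)    # ~2.5 GiB
print(f"MHA: {mha / 2**30:.1f} GiB, GQA(8 KV heads): {gqa / 2**30:.1f} GiB")
```

A Q4_K_M quant of a 13B is only around 8GB of weights, so at 16K the fp16 cache really does outweigh the model; GQA would cut it to roughly a fifth.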
Thanks for the full feedback and edit, appreciate it!
I should probably add that I'm using TheBloke's GGUF quants. I think there may have been some change to the model since those were made, because I tried quantizing it myself and I swear it acts very differently.