Good model
The merge you did works spectacularly, good work! Could you also apply the same method to Sao10K/L3-8B-Niitama-v1?
I believe the dataset Sao10K uses lacks stop tokens; the model REALLY wants to constantly extend its output and spit out 1k tokens of inconsequential and unnecessary text.
I haven't encountered this a single time when testing Niitama. If it's happening every single time, then there's probably something wrong with your setup.
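For anyone hitting this, a quick way to rule out a setup issue is to confirm that your inference stack actually stops on Llama-3's `<|eot_id|>` turn-end token; a common culprit for runaway responses is that only the base EOS token is registered as a stop token. Here's a minimal transformers sketch, assuming you're running the model through transformers directly (the sampling settings are placeholders, not a recommended preset):

```python
# Sanity check for runaway generation: make sure <|eot_id|> is treated as a
# stop token, since Llama-3 finetunes end assistant turns with it.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Sao10K/L3-8B-Niitama-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Write a short greeting."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Stop on both the regular EOS token and the Llama-3 turn-end token.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

output = model.generate(
    input_ids,
    max_new_tokens=512,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.8,
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

If the response still runs on with both terminators set, the problem is more likely the model itself than the setup.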
I still have to give Niitama a proper evaluation to see what I do and don't like about it, but it seems pretty good so far, so it will likely be part of a future merge. In the meantime, you can check out a merge I recently uploaded: https://huggingface.co/HiroseKoichi/Llama-3-8B-Stroganoff; GGUFs are linked on the model page.
I tested Niitama a bit more, and the responses frequently get lengthier than I would like, but I've only had two instances where the token count was pushing 1K in a single response.
I did try out the Stroganoff merge you did; it stops just right in comparison to Niitama or Lunaris. It's pretty good. I also completely support your approach of putting coherency first. After all, it's meaningless to do inference with a braindead erotic word printer.
My further testing of Niitama suggests that the increasingly lengthy responses mostly occur during RP / when using the action-speech format; response length stays pretty much stable in other tasks, which suggests a problem in the dataset(s) related to those.
But a merge like yours seems to reteach proper stop token usage.