Only with Tess-v1.3
I would try one with just Tess-M-v1.3.
Tess-M-v1.2 is very good, but it has some repetition issues at very long context.
I sorted it out with Tess-M-v1.3: https://migel.substack.com/p/testing-tess-m-v13
Yeah I read the post! I actually did a pure 1.2 merge before 1.3 came out, and it didn't seem to repeat or output JSON unprompted in my quick tests (out to about 60K).
https://huggingface.co/brucethemoose/Capybara-Tess12-Yi-34B-200K-DARE
The merge isn't straightforward; some of the model weights get trimmed in the DARE process. Hence I suspect this one won't repeat, since 1.2 was only merged in at roughly 1/5 weight and with lower density. I was already planning to remerge with just 1.3 if it repeats like Tess 1.2, or maybe turn the density way down.
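For reference, here's a rough sketch of the kind of mergekit DARE config I'm describing. The model repo names and the exact weight/density numbers are illustrative placeholders, not the settings from the linked merge:

```python
# Sketch of a mergekit DARE-TIES config, written out from Python for convenience.
# Repo names and the weight/density values are assumptions for illustration only.
import yaml

config = {
    "merge_method": "dare_ties",
    "base_model": "01-ai/Yi-34B-200K",
    "models": [
        {
            "model": "NousResearch/Nous-Capybara-34B",
            "parameters": {"weight": 0.8, "density": 0.6},
        },
        {
            # Tess folded in at low weight and low density, so most of its
            # weight deltas get dropped during the merge.
            "model": "migtissera/Tess-M-v1.2",
            "parameters": {"weight": 0.2, "density": 0.3},
        },
    ],
    "dtype": "bfloat16",
}

with open("merge-config.yml", "w") as f:
    yaml.safe_dump(config, f)

# Then run something like: mergekit-yaml merge-config.yml ./merged-model
```

Turning the density down further just drops a larger fraction of Tess's deltas before they're applied, which is the knob I'd reach for first if repetition shows up.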
Also, @migtissera, while you're here: have you considered training Tess 1.3 on top of Nous Capybara?
Maybe even applying the existing QLoRA would work well? Before any 200Ks came out, regular 32K Yi LoRAs worked well enough on the 200K base model.
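To clarify what I mean by reusing the QLoRA: something like loading the existing adapter straight onto Capybara with peft. The adapter repo name below is purely hypothetical (I'm assuming the Tess adapter weights are available separately), so treat this as a sketch of the idea rather than a recipe:

```python
# Sketch: attach an existing (Q)LoRA adapter to a different base model with peft.
# The adapter repo name is a placeholder; whether the Tess adapter is published
# on its own is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "NousResearch/Nous-Capybara-34B"
adapter_id = "someuser/tess-qlora-adapter"  # hypothetical adapter repo

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Apply the LoRA deltas to the new base, same as applying 32K Yi LoRAs
# to the 200K base worked before.
model = PeftModel.from_pretrained(base, adapter_id)

# Optionally bake the adapter in and save a standalone model.
merged = model.merge_and_unload()
merged.save_pretrained("./capybara-with-tess-lora")
```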
I think you were right; I swapped Tess out for an airoboros 200K finetune.