I use mergekit regularly, and often get acceptable results without any fine-tuning afterward. My current thinking is that DARE-TIES should be avoided when merging dense models, as its thinning step (randomly dropping delta parameters) inherently punches holes in the model's weights.
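To make that objection concrete, here's a minimal sketch (plain PyTorch, not mergekit's internals) of the drop-and-rescale step DARE applies to each task vector: a random fraction of the delta parameters is zeroed and the survivors are rescaled by 1/(1 - drop_rate). The function name and toy tensors are mine for illustration.

```python
import torch

def dare_thin(base: torch.Tensor, finetuned: torch.Tensor, drop_rate: float = 0.9) -> torch.Tensor:
    """Drop-And-REscale a task vector (finetuned - base), then re-apply it to the base.

    Conceptual sketch of DARE's thinning, not mergekit's actual code: each delta
    parameter is dropped with probability `drop_rate`, and the survivors are
    scaled by 1 / (1 - drop_rate) to preserve the delta's expected magnitude.
    """
    delta = finetuned - base
    keep_mask = torch.rand_like(delta) >= drop_rate          # Bernoulli keep/drop per parameter
    thinned_delta = delta * keep_mask / (1.0 - drop_rate)    # rescale the survivors
    return base + thinned_delta

# Toy example: at drop_rate=0.9, roughly 90% of the delta entries end up zeroed.
base = torch.zeros(8)
finetuned = torch.linspace(0.1, 0.8, 8)
print(dare_thin(base, finetuned, drop_rate=0.9))
```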
I've had success using SLERP merges to graft Mistral v0.1 models onto Mistral v0.2 models to obtain the context length benefits of the latter, and am looking forward to experimenting with Mistral v0.3, which recently dropped.
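For reference, the core operation in a SLERP merge is spherical linear interpolation between corresponding weight tensors, which follows the arc between the two weight vectors rather than the straight line a plain weighted average takes. Below is a minimal sketch of that interpolation; the flattening, tensor names, and fallback behavior are my own simplifications, not mergekit's implementation.

```python
import torch

def slerp(t: float, w0: torch.Tensor, w1: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors.

    Conceptual sketch of the operation behind a SLERP merge, not mergekit's code.
    t=0 returns w0, t=1 returns w1; intermediate t follows the great-circle arc
    between the two flattened weight vectors.
    """
    v0, v1 = w0.flatten().float(), w1.flatten().float()
    # Angle between the two weight vectors.
    cos_omega = torch.dot(v0, v1) / (v0.norm() * v1.norm() + eps)
    omega = torch.acos(cos_omega.clamp(-1.0, 1.0))
    if omega.abs() < eps:
        # Nearly parallel vectors: fall back to plain linear interpolation.
        return (1 - t) * w0 + t * w1
    sin_omega = torch.sin(omega)
    out = (torch.sin((1 - t) * omega) / sin_omega) * v0 + (torch.sin(t * omega) / sin_omega) * v1
    return out.reshape(w0.shape).to(w0.dtype)

# Toy example: blend two random "layer weights" halfway between the models.
w_v01, w_v02 = torch.randn(4, 4), torch.randn(4, 4)
print(slerp(0.5, w_v01, w_v02))
```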