grimjim posted an update (May 23)
I use mergekit regularly, and often enough get acceptable results without fine-tuning afterward. My current thinking is that DARE-TIES should be avoided when merging dense models: DARE's thinning step randomly drops a fraction of each model's delta weights, which inherently punches holes in a dense model.
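For context, here's a minimal sketch of the kind of mergekit config this refers to (the fine-tune names are placeholders); `density` is the fraction of delta weights retained, and the rest are dropped:

```yaml
# Hypothetical DARE-TIES merge config for mergekit.
# Model names other than the Mistral base are placeholders.
models:
  - model: some-org/mistral-finetune-a
    parameters:
      density: 0.5  # keep ~50% of this model's delta weights; the rest are zeroed
      weight: 0.5   # contribution of the retained deltas
  - model: some-org/mistral-finetune-b
    parameters:
      density: 0.5
      weight: 0.5
merge_method: dare_ties
base_model: mistralai/Mistral-7B-v0.1
dtype: bfloat16
```

A config like this runs with `mergekit-yaml config.yaml ./merged-model`; the lower the `density`, the more of each delta gets discarded, which is the "hole punching" I mean.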

I've had success using SLERP merges to graft Mistral v0.1 models onto Mistral v0.2 models to obtain the context length benefits of the latter (32K context without sliding-window attention), and am looking forward to experimenting with Mistral v0.3, which recently dropped.
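As a sketch of what such a graft can look like (the v0.1 fine-tune name is a placeholder, and the `t` schedules are illustrative rather than tuned values):

```yaml
# Hypothetical SLERP grafting config for mergekit.
slices:
  - sources:
      - model: some-org/mistral-v0.1-finetune  # placeholder v0.1 fine-tune
        layer_range: [0, 32]
      - model: mistralai/Mistral-7B-Instruct-v0.2
        layer_range: [0, 32]
merge_method: slerp
base_model: some-org/mistral-v0.1-finetune
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]  # per-layer interpolation weights for attention
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]  # per-layer interpolation weights for MLPs
    - value: 0.5                    # default blend for all remaining tensors
dtype: bfloat16
```

A `t` of 0 keeps the first model's weights and 1 keeps the second's, so shading `t` toward the v0.2 model in the attention layers is one way to pull in its longer-context behavior.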

The Great Mistral Resistance!