I use mergekit regularly, and often get acceptable results without any fine-tuning afterward. My current thinking is that DARE-TIES should be avoided when merging dense models, as its thinning step (randomly dropping delta parameters) inherently punches holes in the model's weights.
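To make that objection concrete, here's a minimal sketch (plain PyTorch, not mergekit's internals) of the drop-and-rescale step DARE applies to each task vector: a random fraction of the delta parameters is zeroed and the survivors are rescaled by 1/(1 - drop_rate). The function name and toy tensors are mine for illustration.

```python
import torch

def dare_thin(base: torch.Tensor, finetuned: torch.Tensor, drop_rate: float = 0.9) -> torch.Tensor:
    """Drop-And-REscale a task vector (finetuned - base), then re-apply it to the base.

    Conceptual sketch of DARE's thinning, not mergekit's actual code: each delta
    parameter is dropped with probability `drop_rate`, and the survivors are
    scaled by 1 / (1 - drop_rate) to preserve the delta's expected magnitude.
    """
    delta = finetuned - base
    keep_mask = torch.rand_like(delta) >= drop_rate          # Bernoulli keep/drop per parameter
    thinned_delta = delta * keep_mask / (1.0 - drop_rate)    # rescale the survivors
    return base + thinned_delta

# Toy example: at drop_rate=0.9, roughly 90% of the delta entries end up zeroed.
base = torch.zeros(8)
finetuned = torch.linspace(0.1, 0.8, 8)
print(dare_thin(base, finetuned, drop_rate=0.9))
```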
I've had success using SLERP merges to graft Mistral v0.1 models onto Mistral v0.2 models to obtain the context length benefits of the latter, and am looking forward to experimenting with Mistral v0.3, which recently dropped.
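For reference, the core operation in a SLERP merge is spherical linear interpolation between corresponding weight tensors, which follows the arc between the two weight vectors rather than the straight line a plain weighted average takes. Below is a minimal sketch of that interpolation; the flattening, tensor names, and fallback behavior are my own simplifications, not mergekit's implementation.

```python
import torch

def slerp(t: float, w0: torch.Tensor, w1: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors.

    Conceptual sketch of the operation behind a SLERP merge, not mergekit's code.
    t=0 returns w0, t=1 returns w1; intermediate t follows the great-circle arc
    between the two flattened weight vectors.
    """
    v0, v1 = w0.flatten().float(), w1.flatten().float()
    # Angle between the two weight vectors.
    cos_omega = torch.dot(v0, v1) / (v0.norm() * v1.norm() + eps)
    omega = torch.acos(cos_omega.clamp(-1.0, 1.0))
    if omega.abs() < eps:
        # Nearly parallel vectors: fall back to plain linear interpolation.
        return (1 - t) * w0 + t * w1
    sin_omega = torch.sin(omega)
    out = (torch.sin((1 - t) * omega) / sin_omega) * v0 + (torch.sin(t * omega) / sin_omega) * v1
    return out.reshape(w0.shape).to(w0.dtype)

# Toy example: blend two random "layer weights" halfway between the models.
w_v01, w_v02 = torch.randn(4, 4), torch.randn(4, 4)
print(slerp(0.5, w_v01, w_v02))
```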