[Model Discussion]
#1
by
Darkknight535
- opened
Hey, thanks for the 8B model.
The issue was that the 8B model was losing coherency at its size. That's why I first expanded it into 12B variants and then merged them. This approach keeps the full layer stack of one model (all of the 8B) while stacking the remaining layers from another model on top of it. I tried the plain 8B model myself, but ultimately decided to remove it.
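For context, this kind of depth up-scaling is commonly expressed as a mergekit passthrough merge. A minimal sketch, assuming mergekit's slice-based config format; the model names and layer ranges below are hypothetical placeholders, not the actual recipe used for this model:

```yaml
# Hypothetical mergekit passthrough config:
# keep the full layer stack of one 8B model, then stack
# upper layers from a second model on top to reach ~12B.
slices:
  - sources:
      - model: author/base-8b      # placeholder name
        layer_range: [0, 32]       # full 8B layer stack
  - sources:
      - model: author/donor-8b     # placeholder name
        layer_range: [8, 32]       # extra layers stacked on top
merge_method: passthrough
dtype: bfloat16
```

The passthrough method simply concatenates the selected layer slices in order, which is why the resulting model ends up with more parameters than either source.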
Cheers!