Baubo-11b-DARE
It's worse than the pre-merge models, by A LOT
Found the problem, details below.
I mean it, this model is insane.
The other 11b models are usable, but still worse than a good 7B - and thus worthless.
This model... it forms coherent sentences but NOT coherent thought.
Named after a pseudonym for Iambe, Baubo is a similar mix, but built from these odd 11b models. Let's see if dare_ties hammers out the (platonic) kinks, eh?
EDIT: It did not. Upon actually reading the whole paper rather than skimming it (shocking idea, I know), I can tell you what happened here. As the authors say in the dang abstract, "we have also tried to remove fine-tuned instead of delta [changed by LoRA] parameters and find that a 10% reduction can lead to drastically decreased performance (even to 0.0)". But my base model here is not actually a true base model. Besides its bizarre size, it contains aspects of Alpaca... but not on every layer! Thus, confusion reigned, as dare_ties treated some of the Alpaca formatting as delta weights and the rest as part of the base model. It fragged the poor thing's little network something fierce.
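For the curious, here's a minimal sketch of DARE's drop-and-rescale step - my own illustration in PyTorch, not mergekit's actual code, and `dare_delta` is a made-up name:

```python
import torch

def dare_delta(base: torch.Tensor, finetuned: torch.Tensor, density: float) -> torch.Tensor:
    """DARE: drop each delta (finetuned - base) element with prob (1 - density),
    then rescale the survivors by 1 / density so the expected delta is unchanged."""
    delta = finetuned - base
    keep = (torch.rand_like(delta) < density).to(delta.dtype)
    return base + (delta * keep) / density

# The catch from the paper: this only works if `base` really is the shared base.
# Baubo's base had Alpaca baked into only some layers, so on those layers chunks
# of the "delta" were actually base behaviour (and vice versa), and dropping /
# rescaling them shredded the network instead of pruning redundant noise.
```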
This also explains why the released version of Iambe-20b-DARE was the iteration it was. The release Iambe-DARE has my BigLlama as the base model, while the other two iterations had roleplay-SFTed models as the base. Looks like I have another version of Iambe to produce, this time with a proper base.
Recipe
merge_method: dare_ties
base_model: athirdpath/LilLlama-11b
models:
  - model: athirdpath/U-Amethyst-11B
    parameters:
      weight: 0.39
      density: 0.55
  - model: athirdpath/PsyMedRP-v1-11B
    parameters:
      weight: 0.20
      density: 0.45
  - model: athirdpath/CleverGirl-11b
    parameters:
      weight: 0.41
      density: 0.55
parameters:
  int8_mask: true
dtype: bfloat16
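Roughly what that recipe asks for, as a hedged sketch: each model contributes a DARE-pruned delta against LilLlama, the deltas go through a TIES-style sign election, and the agreeing pieces are blended by the weights above. `dare_ties_sketch` is my own simplified illustration, not mergekit's internals (I'm glossing over its normalization options, among other things):

```python
import torch

def dare_ties_sketch(base, finetuned, weights, densities):
    """base: dict[name -> tensor]; finetuned: list of same-shaped state dicts.
    weights/densities: per-model floats (0.39/0.55, 0.20/0.45, 0.41/0.55 above)."""
    merged = {}
    for name, base_w in base.items():
        deltas = []
        for ft, w, d in zip(finetuned, weights, densities):
            delta = ft[name] - base_w
            keep = (torch.rand_like(delta) < d).to(delta.dtype)   # DARE: drop deltas...
            deltas.append(w * (delta * keep) / d)                 # ...rescale survivors, apply weight
        stacked = torch.stack(deltas)
        elected = torch.sign(stacked.sum(dim=0))                  # TIES: majority sign per parameter
        agree = (torch.sign(stacked) == elected).to(stacked.dtype)
        merged[name] = base_w + (stacked * agree).sum(dim=0)      # keep only agreeing deltas
    return merged
```

As I understand it, the int8_mask and bfloat16 lines only affect how mergekit stores the masks and the output tensors; they don't change the math above.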