The model card lists the wrong base_model
The code will be released widely on Saturday and made available on an HF Space... for everyone to have fun with.
But so far.. this model was based on 34Beagles and not bagel-34.
Model Plagiarism Detector marks:
- bagel-34: 38.44%
- 34Beagles: 44.63%
It uses UNA technology. It doesn't follow the reference/citation or licensing hierarchy. And it is a merge; maybe not a mergekit one.. but it is definitely a merge of 34Beagles and at least 2 more models, one of them also bagel-34.
Model Predominance marks:
- bagel-34: 28.17%
- 34Beagles: 47.85%
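Since the detector itself isn't released, here is a minimal sketch of what a naive raw-parameter comparison between two checkpoints could look like: a simple per-tensor cosine similarity averaged over matching tensors. The repo IDs and helper name are illustrative assumptions, not the actual tool.

```python
# Hypothetical sketch only: NOT the actual "Model Plagiarism Detector" /
# "Model Predominance" tooling, which is unreleased. This just averages
# per-tensor cosine similarity between two checkpoints' raw parameters.
import torch
from transformers import AutoModelForCausalLM

def parameter_similarity(model_a_id: str, model_b_id: str) -> float:
    """Mean cosine similarity (as a %) over parameter tensors shared by name and shape."""
    a = AutoModelForCausalLM.from_pretrained(model_a_id, torch_dtype=torch.bfloat16)
    b = AutoModelForCausalLM.from_pretrained(model_b_id, torch_dtype=torch.bfloat16)
    a_params = dict(a.named_parameters())
    sims = []
    for name, p_b in b.named_parameters():
        p_a = a_params.get(name)
        if p_a is None or p_a.shape != p_b.shape:
            continue  # skip tensors that do not line up between the two checkpoints
        sims.append(torch.nn.functional.cosine_similarity(
            p_a.detach().flatten().float(),
            p_b.detach().flatten().float(), dim=0).item())
    return 100.0 * sum(sims) / len(sims)

# Assumed repo IDs; loading two 34B checkpoints needs a lot of RAM, so stream
# the safetensors shards in practice.
# print(parameter_similarity("abacusai/Smaug-34B-v0.1", "jondurbin/bagel-34b-v0.2"))
```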
Hey fblgit, thanks for your comment.
We see that you have a method that is called UNA but as far as we are aware, you have not released any information about it. As such, it is not possible for us to have plagiarised your method.
We welcome UNA as an alternative to our approach! Indeed, I believe you have already applied UNA for one of our previous models: https://huggingface.co/abacusai/UNA-MM-Orc-Vic-bagel-34b-c1000
However, Smaug is not built on that UNA-d model at all.
We can also certainly state that this model is NOT a merge in any way, and has been achieved using only SFT and DPO, and our training tricks. And we can emphatically restate that we started from bagel-34 and have not directly utilised 34Beagles.
As we have already stated, we will be releasing the full datasets and methodology involved shortly so that the community can replicate our approach fully.
Look, I won't get into a debate. I'm just telling you what this model's neural network looks like on the inside.
Numerically, your statement is not possible.. you can't start from bagel-34 at a 38.44% similarity over raw parameter vector values, do SFT & DPO with your own data, and then end up roughly 10% (around 3 BILLION parameters) more similar, long-float value by long-float value at the same layer/tensor indices, to a different model.. that is.. not possible XD
Both models claim to be derived from the exact same base model. Identical training will yield fine-tuned models that are very similar. Similar training will yield fine-tuned models that are more similar to each other relative to the base model.
How numerically implausible is it to push 10% of the parameters toward a sibling model while the other 90% of the parameters remain unchanged or move away?
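One way to ground that question, assuming all three checkpoints share tensor names and shapes: count how many parameter values in the fine-tune ended up closer to the sibling than the shared base already was. The element-wise test and the repo IDs below are illustrative assumptions, not a method anyone in this thread has described.

```python
# Rough, illustrative check: what fraction of the fine-tune's parameters ended
# up *closer* to the sibling model than the shared base already was?
import torch
from transformers import AutoModelForCausalLM

def load_params(model_id: str):
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    return {name: p.detach() for name, p in model.named_parameters()}

def fraction_moved_toward(base_id: str, finetune_id: str, sibling_id: str) -> float:
    base, tuned, sib = load_params(base_id), load_params(finetune_id), load_params(sibling_id)
    closer, total = 0, 0
    for name, p_sib in sib.items():
        if name not in base or name not in tuned or base[name].shape != p_sib.shape:
            continue
        d_base = (base[name].float() - p_sib.float()).abs()    # base's distance to the sibling
        d_tuned = (tuned[name].float() - p_sib.float()).abs()  # fine-tune's distance to the sibling
        closer += (d_tuned < d_base).sum().item()
        total += p_sib.numel()
    return closer / total  # ~0.5 would look like chance-level drift toward the sibling

# Assumed repo IDs; the thread never gives the full repo ID for "34Beagles",
# so it is left as a placeholder. Holding three 34B models in memory is heavy;
# compare shard-by-shard in practice.
# print(fraction_moved_toward("jondurbin/bagel-34b-v0.2",
#                             "abacusai/Smaug-34B-v0.1",
#                             "<34Beagles repo id>"))
```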
Regardless, more details are in order to give credit where credit is due.
Smaug-34B-v0.1 states that it used jondurbin/bagel-34b-v0.2. On that combination, the NeuralSimilarity is 38.44%.
- NeuralSimilarity vs jondurbin/nontoxic-bagel-34b-v0.2: 45.39%
- NeuralSimilarity vs jondurbin/bagel-dpo-34b-v0.2: 45.37%
- NeuralSimilarity vs 34Beagles: 44.63%
- NeuralSimilarity vs jondurbin/bagel-34b-v0.2: 38.44%
So, this model was trained with jondurbin/nontoxic-bagel-34b-v0.2 and not jondurbin/bagel-34b-v0.2.
Mystery solved, no missing citation.. just a wrong link in the README :)
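For completeness, a sketch of how that kind of ranking could be reproduced once a similarity metric is published: score the fine-tune against each candidate base and sort. The `neural_similarity` callable is a placeholder for whatever the unreleased metric actually computes; the candidate list mirrors the figures above.

```python
# Placeholder ranking loop: with a similarity metric in hand, "which checkpoint
# was this trained from?" reduces to scoring each candidate and sorting.
from typing import Callable, List, Tuple

def rank_candidate_bases(finetune_id: str,
                         candidates: List[str],
                         neural_similarity: Callable[[str, str], float]) -> List[Tuple[str, float]]:
    scores = [(cand, neural_similarity(finetune_id, cand)) for cand in candidates]
    return sorted(scores, key=lambda s: s[1], reverse=True)  # most similar candidate first

# Candidates taken from the figures quoted above ("34Beagles" left as named in the thread):
# rank_candidate_bases("abacusai/Smaug-34B-v0.1",
#                      ["jondurbin/nontoxic-bagel-34b-v0.2",
#                       "jondurbin/bagel-dpo-34b-v0.2",
#                       "34Beagles",
#                       "jondurbin/bagel-34b-v0.2"],
#                      neural_similarity=parameter_similarity)  # e.g. the sketch further up
```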
Hey - thanks for your analysis @fblgit . But again we can confirm that we did indeed train with jondurbin/bagel-34b-v0.2. Perhaps your method is not quite detecting the right thing? Would you be able to publish/describe your method?
We can certainly look into that if there is strong community interest, though we have other versions of Smaug in mind and under training atm :). In any case, we plan to release all methods and datasets over the next few weeks, so the community is welcome to replicate our approach on alternate base models.