Pala Tej Deep
The backbone is the pretrained model used as the base for fine-tuning an expert model.
For example, in the case of Wizard Models:
WizardLM-13B and WizardMath-13B are both fine-tuned from Llama-2-13B, so they can be merged effectively with DELLA, DARE, or TIES because they share the same backbone.
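As a concrete illustration, a merge of these two models could be described with a mergekit-style YAML config like the sketch below. The repo IDs and parameter values are illustrative assumptions, not tested settings:

```yaml
# Hypothetical mergekit config: TIES-merge two experts sharing the Llama-2-13B backbone.
models:
  - model: WizardLMTeam/WizardLM-13B-V1.2   # assumed repo ID
    parameters:
      density: 0.5
      weight: 0.5
  - model: WizardLMTeam/WizardMath-13B-V1.0  # assumed repo ID
    parameters:
      density: 0.5
      weight: 0.5
merge_method: ties
base_model: meta-llama/Llama-2-13b-hf        # the shared backbone
parameters:
  normalize: true
dtype: float16
```

The key point is the `base_model` field: TIES (like DARE and DELLA) works on the deltas between each expert and this shared backbone, which is why all listed models must be fine-tunes of it.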
WizardCoder-13B, on the other hand, is fine-tuned from CodeLlama-13B-Python. Because it uses a different base model (backbone) than WizardLM-13B and WizardMath-13B, merging these three models effectively with DELLA, DARE, or TIES is not feasible.
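To see why a shared backbone matters, here is a toy numerical sketch. DELLA, DARE, and TIES all start from task vectors (the difference between a fine-tuned model and its base); the variable names and the simple averaging merge below are illustrative assumptions, not the exact algorithms:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "weights": a shared backbone and two experts fine-tuned from it.
backbone = rng.normal(size=8)
expert_a = backbone + rng.normal(scale=0.1, size=8)  # e.g. a WizardLM-style expert
expert_b = backbone + rng.normal(scale=0.1, size=8)  # e.g. a WizardMath-style expert

# Task vectors: what fine-tuning added on top of the shared base.
tau_a = expert_a - backbone
tau_b = expert_b - backbone

# Plain task-arithmetic merge (TIES/DARE/DELLA refine this with trimming,
# sign election, and random dropping, but all operate on deltas from the
# SAME base model).
merged = backbone + 0.5 * (tau_a + tau_b)

# Now an expert fine-tuned from a DIFFERENT base (a WizardCoder-style case):
other_base = rng.normal(size=8)
expert_c = other_base + rng.normal(scale=0.1, size=8)

# Its "task vector" relative to the wrong backbone is dominated by the
# base-model difference, not by what fine-tuning learned.
bad_tau = expert_c - backbone
print(np.linalg.norm(tau_a), np.linalg.norm(bad_tau))  # bad_tau's norm is far larger
```

The merged result only makes sense when every delta measures fine-tuning on top of the same starting weights; `bad_tau` mostly injects base-model noise.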
To summarize: the backbone is the underlying pretrained model that serves as the starting point for fine-tuning. It is crucial to the merging process because models fine-tuned from different backbones may not merge effectively, due to differences in their initial pretrained weights and configurations.