Pala Tej Deep (Tej3)

AI & ML interests: None yet

Organizations: Walled AI

Tej3's activity

replied to RishabhBhardwaj's post 6 months ago

The backbone is the pretrained model used as the base for fine-tuning each expert model.

For example, in the case of Wizard Models:

  • WizardLM-13B and WizardMath-13B are both fine-tuned from the llama2-13B model. Therefore, they can be effectively merged using Della, Dare, or TIES because they share the same backbone.

  • On the other hand, WizardCoder-13B is fine-tuned from the CodeLlama-13B-Python model. Since WizardCoder-13B uses a different base model (backbone) than WizardLM-13B and WizardMath-13B, merging all three models effectively with Della, Dare, or TIES is not feasible.

To summarize, the backbone is the underlying pretrained model that serves as the starting point for fine-tuning. It is crucial in the merging process because these methods operate on the deltas between each expert's weights and the shared backbone weights; models fine-tuned from different backbones may not merge effectively due to the differences in their initial pretrained weights and configurations.
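
To make the intuition concrete, here is a minimal, simplified sketch of a TIES-style merge in PyTorch. It uses toy random tensors in place of real checkpoints, and the function names and the density parameter are illustrative assumptions, not the actual Della/Dare/TIES or mergekit APIs. The point it demonstrates is that the merge operates on deltas (task vectors) computed against one common base, which is why every expert must share the same backbone.

```python
# Minimal sketch of a TIES-style merge, assuming toy tensors in place of real
# checkpoints. Illustrative only; not the Della/Dare/TIES or mergekit code.
import torch

def task_vector(finetuned: dict, base: dict) -> dict:
    """Delta between a fine-tuned checkpoint and its shared backbone."""
    return {k: finetuned[k] - base[k] for k in base}

def ties_merge(base: dict, experts: list, density: float = 0.5) -> dict:
    """Simplified TIES: trim small deltas, elect a sign, average agreeing deltas."""
    deltas = [task_vector(e, base) for e in experts]
    merged = {}
    for k in base:
        stacked = torch.stack([d[k] for d in deltas])  # [n_experts, ...]
        # Trim: keep only the largest-magnitude fraction (density) of each delta.
        thresh = stacked.abs().flatten(1).quantile(1 - density, dim=1)
        mask = stacked.abs() >= thresh.view(-1, *[1] * (stacked.dim() - 1))
        trimmed = stacked * mask
        # Elect a sign per parameter by total magnitude across experts.
        sign = torch.sign(trimmed.sum(dim=0))
        agree = (torch.sign(trimmed) == sign) & (trimmed != 0)
        # Disjoint merge: average only the deltas that agree with the elected sign,
        # then add the merged delta back onto the shared backbone weights.
        count = agree.sum(dim=0).clamp(min=1)
        merged[k] = base[k] + (trimmed * agree).sum(dim=0) / count
    return merged

# Toy usage: random weights stand in for llama2-13B and two experts fine-tuned from it.
base = {"w": torch.randn(4, 4)}
wizardlm = {"w": base["w"] + 0.1 * torch.randn(4, 4)}    # same backbone
wizardmath = {"w": base["w"] + 0.1 * torch.randn(4, 4)}  # same backbone
merged = ties_merge(base, [wizardlm, wizardmath])
print(merged["w"].shape)
```

If one of the experts were instead a delta from a different backbone (as with WizardCoder-13B and CodeLlama-13B-Python), subtracting the llama2-13B weights would not yield a meaningful task vector, which is why the shared backbone requirement matters.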

updated a Space about 1 year ago
updated a Space over 1 year ago