Pala Tej Deep (Tej3)

AI & ML interests: None yet

Organizations: Walled AI

Tej3's activity

replied to RishabhBhardwaj's post 6 months ago

The backbone is the pretrained model used as the base for fine-tuning each expert model.

For example, in the case of Wizard Models:

  • WizardLM-13B and WizardMath-13B are both fine-tuned from the llama2-13B model. Therefore, they can be effectively merged using Della, Dare, or TIES because they share the same backbone.

  • On the other hand, WizardCoder-13B is fine-tuned from the CodeLlama-13B-Python model. Since WizardCoder-13B uses a different base model (backbone) than WizardLM-13B and WizardMath-13B, merging all three models effectively with Della, Dare, or TIES is not feasible.

To summarize, the backbone is the underlying pretrained model that serves as the starting point for fine-tuning. It is crucial in the merging process because these methods operate on the deltas between each expert's weights and the shared backbone weights; models fine-tuned from different backbones may not merge effectively due to the differences in their initial pretrained weights and configurations.
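
To make the intuition concrete, here is a minimal, simplified sketch of a TIES-style merge in PyTorch. It uses toy random tensors in place of real checkpoints, and the function names and the density parameter are illustrative assumptions, not the actual Della/Dare/TIES or mergekit APIs. The point it demonstrates is that the merge operates on deltas (task vectors) computed against one common base, which is why every expert must share the same backbone.

```python
# Minimal sketch of a TIES-style merge, assuming toy tensors in place of real
# checkpoints. Illustrative only; not the Della/Dare/TIES or mergekit code.
import torch

def task_vector(finetuned: dict, base: dict) -> dict:
    """Delta between a fine-tuned checkpoint and its shared backbone."""
    return {k: finetuned[k] - base[k] for k in base}

def ties_merge(base: dict, experts: list, density: float = 0.5) -> dict:
    """Simplified TIES: trim small deltas, elect a sign, average agreeing deltas."""
    deltas = [task_vector(e, base) for e in experts]
    merged = {}
    for k in base:
        stacked = torch.stack([d[k] for d in deltas])  # [n_experts, ...]
        # Trim: keep only the largest-magnitude fraction (density) of each delta.
        thresh = stacked.abs().flatten(1).quantile(1 - density, dim=1)
        mask = stacked.abs() >= thresh.view(-1, *[1] * (stacked.dim() - 1))
        trimmed = stacked * mask
        # Elect a sign per parameter by total magnitude across experts.
        sign = torch.sign(trimmed.sum(dim=0))
        agree = (torch.sign(trimmed) == sign) & (trimmed != 0)
        # Disjoint merge: average only the deltas that agree with the elected sign,
        # then add the merged delta back onto the shared backbone weights.
        count = agree.sum(dim=0).clamp(min=1)
        merged[k] = base[k] + (trimmed * agree).sum(dim=0) / count
    return merged

# Toy usage: random weights stand in for llama2-13B and two experts fine-tuned from it.
base = {"w": torch.randn(4, 4)}
wizardlm = {"w": base["w"] + 0.1 * torch.randn(4, 4)}    # same backbone
wizardmath = {"w": base["w"] + 0.1 * torch.randn(4, 4)}  # same backbone
merged = ties_merge(base, [wizardlm, wizardmath])
print(merged["w"].shape)
```

If one of the experts were instead a delta from a different backbone (as with WizardCoder-13B and CodeLlama-13B-Python), subtracting the llama2-13B weights would not yield a meaningful task vector, which is why the shared backbone requirement matters.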

updated a Space about 1 year ago
updated a Space over 1 year ago