Training data and code

#1
by kardosdrur - opened

Hi, I'm currently annotating training data and metadata for models in MTEB for the new leaderboard.
We would love to know a) what you trained your model on, and b) how your training procedure differs from others (possibly including code).
Is there by any chance a technical report/paper on its way? Could your users get some of this information in your README in the meantime?

Hey @kardosdrur ,

Thank you for your great efforts in maintaining MTEB! Our model was trained on the cfli/bge-full-data dataset. Our report is available at Enhancing Lexicon-Based Text Embeddings with Large Language Models. I apologize for missing this in the README and will update it now. We also plan to release the training code in the future. Thank you again for all your work on MTEB!

Thanks for getting back to me about this so quickly! I will include this information in our model metadata, then.

Thank you Márton!
