Training data and code

#1
by kardosdrur - opened

Hi, I'm currently annotating training data and metadata for models in MTEB for the new leaderboard.
We would love to know a) what you trained your model on, and b) how your training procedure differs from others (possibly including code).
Is there by any chance a technical report/paper on its way? Could your users get some of this information in your README in the meantime?

Hey @kardosdrur ,

Thank you for your great efforts in maintaining MTEB! Our model was trained on the cfli/bge-full-data dataset. Our report is available at Enhancing Lexicon-Based Text Embeddings with Large Language Models. I apologize for missing this in the README and will update it now. We also plan to release the training code in the future. Thank you again for all your work on MTEB!

Thanks for getting back to me about this so quickly! I will include this information in our model metadata, then.

Thank you Márton!
