Language distribution in MTEB
#83, opened by maiia-bocharova
Hello, I am writing a paper for my PhD (on text embeddings for the Ukrainian language) and I want to include information about the language distribution in MTEB (for example, tokens per language). How can I get such statistics? I did not find anything in the official MTEB paper apart from the number of languages.
Can you please help?
Hi @Maiia, I believe Ukrainian is only included in the bitext mining tasks. You can easily search the GitHub repo and see it here:
https://github.com/search?q=repo%3Aembeddings-benchmark%2Fmteb+%22uk%22&type=code
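For the tokens-per-language part of the question, here is a minimal sketch of how such counts could be tallied once the texts have been collected with their language codes. It does not use any MTEB API: the function name `tokens_per_language`, the toy samples, and the whitespace tokenization are all assumptions for illustration, and a real tokenizer would give different numbers.

```python
from collections import Counter
from typing import Iterable, Tuple


def tokens_per_language(samples: Iterable[Tuple[str, str]]) -> Counter:
    """Tally whitespace-separated tokens per language code.

    `samples` yields (language_code, text) pairs. Splitting on whitespace
    is a simplifying assumption, not how any particular model tokenizes.
    """
    counts: Counter = Counter()
    for lang, text in samples:
        counts[lang] += len(text.split())
    return counts


# Hypothetical toy data; real texts would come from the benchmark datasets.
samples = [
    ("uk", "Привіт, як справи?"),
    ("en", "Hello, how are you?"),
    ("uk", "Дякую, добре."),
]

counts = tokens_per_language(samples)
total = sum(counts.values())
# Share of all tokens contributed by each language.
shares = {lang: n / total for lang, n in counts.items()}
```

The same function works on any iterable of pairs, so the texts can be streamed dataset by dataset instead of being loaded all at once.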