Cannot reproduce exact numbers (MTEB French retrieval) from the leaderboard
#142 opened by pxyu
Hi there,
I am trying to verify whether the numbers on the MTEB French retrieval leaderboard can be reproduced, using mGTE (Alibaba-NLP/gte-multilingual-base) as an example. I am getting lower numbers on 2 of the 5 datasets. Here is my code:
import mteb
from sentence_transformers import SentenceTransformer

model_name_or_path = "Alibaba-NLP/gte-multilingual-base"
model = SentenceTransformer(model_name_or_path, trust_remote_code=True)

# Select the five French retrieval tasks, restricted to French and the test split
tasks = mteb.get_tasks(
    tasks=["AlloprofRetrieval", "BSARDRetrieval", "MintakaRetrieval", "SyntecRetrieval", "XPQARetrieval"],
    languages=["fra"],
    eval_splits=["test"],
    task_types=["retrieval"],
)

evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(
    model,
    output_folder=f"results/{model_name_or_path}",
    encode_kwargs={"batch_size": 8},
)

# Print nDCG@10 on the test split for each task
for r in results:
    print(r.task_name, r.scores["test"][0]["ndcg_at_10"], "\n")
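The same scores can also be read back from the JSON files that evaluation.run writes under output_folder; a rough sketch, assuming the per-task files store results under a "scores" → "test" key as in recent mteb versions (the exact directory layout may differ between versions, hence the recursive glob):

import glob
import json

# Re-read nDCG@10 from the saved per-task result files under results/
for path in sorted(glob.glob("results/**/*.json", recursive=True)):
    with open(path) as f:
        data = json.load(f)
    test_scores = data.get("scores", {}).get("test")
    if test_scores:  # skips metadata files that have no scores
        print(data.get("task_name", path), test_scores[0].get("ndcg_at_10"))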
Results I get:
AlloprofRetrieval 0.49213
BSARDRetrieval 0.19224
MintakaRetrieval 0.34703
SyntecRetrieval 0.83043
XPQARetrieval 0.67386
However, the nDCG@10 values reported on the leaderboard for AlloprofRetrieval and BSARDRetrieval are 53.64 and 26.11.
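To make the gap concrete, here is a minimal sketch (assuming the leaderboard reports nDCG@10 on a 0-100 scale) comparing my local scores with the reported ones; it prints a gap of roughly 4.4 points on Alloprof and 6.9 points on BSARD:

# Minimal comparison sketch, assuming the leaderboard uses a 0-100 scale
local = {"AlloprofRetrieval": 0.49213, "BSARDRetrieval": 0.19224}
reported = {"AlloprofRetrieval": 53.64, "BSARDRetrieval": 26.11}
for task in local:
    gap = reported[task] - local[task] * 100
    print(f"{task}: local {local[task] * 100:.2f} vs leaderboard {reported[task]:.2f} (gap {gap:.2f})")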
Can someone help clarify the issue here? Thank you!
@pxyu we do not accept issues on this repository (see the pinned issue).
However, please repost this on the mteb GitHub and I will make sure to tag the people responsible for the French leaderboard.
Oh sorry I didn't notice that.
pxyu changed discussion status to closed
No worries :)