Old Evaluation Results Being Displayed
Hello,
I wanted to ask if we should expect to see the results from each evaluation run on the leaderboard, and if so, whether they will be highlighted as 'most recent run' or something similar. A couple of the models appear twice with different scores, like upstage/llama-30b-instruct and lilloukas/Platypus-30B, while arielnlee/SuperPlatty-30B and lilloukas/GPlatty-30B only show the old evaluation results.
Hi! They should definitely not appear twice; this is a bug. Thank you for reporting it. @SaylorTwift has been working on cleaning up the results dataset format (we have many files for several models, since the output of the leaderboard backend changed several times). It should be fixed by the end of the week.
Yes, yesterday we were in the process of rerunning all evals, so some results appeared twice. This is being fixed: only the newest results will be shown on the leaderboard. Thanks for your feedback!
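For anyone who wants to work around the duplicates locally in the meantime, here is a minimal sketch of the "keep only the newest result per model" idea. It is not the leaderboard's actual code: the field names `model`, `evaluated_at`, and `score`, the function name `keep_latest`, and the sample entries are all hypothetical placeholders.

```python
from datetime import datetime


def keep_latest(results):
    """Return one entry per model, keeping the one with the newest timestamp.

    `results` is a list of dicts with hypothetical keys "model" and
    "evaluated_at" (a datetime); any other keys are passed through untouched.
    """
    latest = {}
    for entry in results:
        model = entry["model"]
        # Keep this entry if the model is new or this run is more recent.
        if model not in latest or entry["evaluated_at"] > latest[model]["evaluated_at"]:
            latest[model] = entry
    return list(latest.values())


# Made-up sample data, only to show the shape of the input.
sample = [
    {"model": "example/model-a", "evaluated_at": datetime(2023, 7, 1), "score": 0.0},
    {"model": "example/model-a", "evaluated_at": datetime(2023, 7, 20), "score": 1.0},
    {"model": "example/model-b", "evaluated_at": datetime(2023, 7, 1), "score": 2.0},
]

deduplicated = keep_latest(sample)
```

With the sample above, `deduplicated` contains one entry for `example/model-a` (the later run) and the single entry for `example/model-b`.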
Thanks for the update!