Old Evaluation Results Being Displayed

#117
by lilloukas - opened

Hello,

I wanted to ask whether we should expect to see the results from every evaluation run on the leaderboard and, if so, whether the latest will be highlighted as 'most recent run' or something similar. A couple of the models, such as upstage/llama-30b-instruct and lilloukas/Platypus-30B, appear twice with different scores, while arielnlee/SuperPlatty-30B and lilloukas/GPlatty-30B show only the old evaluation results.

deleted

Same for other llama-based models. I think it's better if the old scores are not kept in the leaderboard UI.

Open LLM Leaderboard org

Hi! They should definitely not appear twice; this is a bug. Thank you for reporting it. @SaylorTwift has been working on cleaning up the results dataset format (we have many files for several models, since the output of the leaderboard backend changed several times). It should be fixed by the end of the week.

Open LLM Leaderboard org

Yes, yesterday we were in the process of rerunning all evals, so some results appeared twice. This is being fixed, and only the newest results will be shown on the leaderboard. Thanks for your feedback!

Thanks for the update!

clefourrier changed discussion status to closed
