Why does this model perform so poorly on DROP compared to OpenHermes?
#29 by yahma - opened
On the Hugging Face Open LLM Leaderboard, OpenChat performs really well on all the benchmarks except DROP, where it scores 7.22 versus the 35.79 that OpenHermes-2.5-mistral scores.
Why such poor performance on DROP?
imone changed discussion status to closed