Why does this model perform so poorly on DROP compared to OpenHermes?

#29

by yahma - opened Nov 23, 2023

Nov 23, 2023

In the Huggingface Open LLM Leaderboard OpenChat performs really well on all the benchmarks except for DROP, where is scores 7.22 vs the 35.79 that OpenHermes-2.5-mistral scores.
Why such poor performance on DROP?

imone

OpenChat org Nov 24, 2023

Probably because open llm leaderboard doesn't use conversation templates and CoT, see discussion of gsm8k here. We manually tested the DROP examples and it worked really well.

imone changed discussion status to closed Nov 25, 2023

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment