MT-Bench Results
#8 · by 0-hero · opened
MT-Bench
| Model | MT-Bench |
|---|---|
| Claude 3 Opus | 9.43 |
| GPT-4-1106-Preview | 9.32 |
| Claude 3 Sonnet | 9.18 |
| WizardLM-2 8x22B | 9.12 |
| GPT-4-0314 | 8.96 |
| Mixtral-8x22B-Instruct-v0.1 | 8.66 |
| zephyr-orpo-141b-A35b-v0.1 | 8.17 |
| Matter-0.2-8x22B | 8.00 |
Nice!
It will be interesting to see more benchmark results here.
I guess Mixtral-8x22B-Instruct-v0.1 is better at multilinguality than WizardLM-2 8x22B.
Maybe merging them could work even better :)
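For anyone who wants to try that, a merge of the two could be sketched with mergekit. This is only a minimal sketch: the repo IDs, weights, and merge method below are assumptions, not tested settings.

```yaml
# Hypothetical mergekit config (repo IDs and weights are assumptions)
models:
  - model: mistralai/Mixtral-8x22B-Instruct-v0.1
    parameters:
      weight: 0.5
  - model: alpindale/WizardLM-2-8x22B
    parameters:
      weight: 0.5
merge_method: linear
dtype: bfloat16
```

Whether a simple linear merge preserves both the multilingual ability and the MT-Bench score is an open question; it would need to be re-benchmarked after merging.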