MMLU - 77
#3
by orendar
New open-source SOTA!
Just ran 5-shot MMLU with lm-evaluation-harness; see the results below:
| Groups | Version | Filter | n-shot | Metric | Value |   | Stderr |
|---|---|---|---|---|---|---|---|
| mmlu | N/A | none | 0 | acc | 0.7735 | ± | 0.0034 |
| - humanities | N/A | none | 5 | acc | 0.7337 | ± | 0.0062 |
| - other | N/A | none | 5 | acc | 0.8182 | ± | 0.0067 |
| - social_sciences | N/A | none | 5 | acc | 0.8687 | ± | 0.0060 |
| - stem | N/A | none | 5 | acc | 0.6958 | ± | 0.0078 |
It's even better than mistral-medium. The complete set of benchmarks is here: https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1/discussions/4
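For anyone who wants to reproduce this, here is a minimal sketch using the lm-evaluation-harness Python API. The checkpoint name (taken from the linked repo), dtype, and batch size are my assumptions, not necessarily the exact setup used for the numbers above.

```python
# Minimal 5-shot MMLU reproduction sketch with lm-evaluation-harness.
# Assumptions: checkpoint = mistral-community/Mixtral-8x22B-v0.1, bf16, auto batching.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=mistral-community/Mixtral-8x22B-v0.1,dtype=bfloat16",
    tasks=["mmlu"],
    num_fewshot=5,
    batch_size="auto",
)

# Print the aggregate mmlu group entry (accuracy and stderr),
# which corresponds to the first row of the table above.
print(results["results"]["mmlu"])
```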