lilaczheng committed · Commit 84387ca · Parent(s): 572c5cf

Update src/display/about.py

Introduction of the new CLCC dataset
src/display/about.py CHANGED (+1 -1)
@@ -40,7 +40,7 @@ We evaluate models on 7 key benchmarks using the <a href="https://github.com/Ele
 - <a href="https://arxiv.org/abs/2110.14168" target="_blank"> GSM8k </a> (5-shot) - diverse grade school math word problems to measure a model's ability to solve multi-step mathematical reasoning problems.
 - <a href="https://flageval.baai.ac.cn/#/taskIntro?t=zh_qa" target="_blank"> C-SEM </a> (5-shot) - Semantic understanding is seen as a key cornerstone in the research and application of natural language processing. However, there is still a lack of publicly available benchmarks that approach from a linguistic perspective in the field of evaluating large Chinese language models.
 - <a href="https://arxiv.org/abs/2306.09212" target="_blank"> CMMLU </a> (5-shot) - CMMLU is a comprehensive evaluation benchmark specifically designed to evaluate the knowledge and reasoning abilities of LLMs within the context of Chinese language and culture. CMMLU covers a wide range of subjects, comprising 67 topics that span from elementary to advanced professional levels.
-
+- <a href="https://flageval.baai.ac.cn/#/taskIntro?t=zh_oqa"> CLCC </a> - CLCC is prepared by trained undergraduate or graduate students in different disciplines based on the FlagEval competency dimensions.
 For all these evaluations, a higher score is a better score.
 We chose these benchmarks as they test a variety of reasoning and general knowledge across a wide variety of fields in 0-shot and few-shot settings.
 
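For context, the hunk above describes few-shot runs with the evaluation harness linked in its header (presumably the EleutherAI Language Model Evaluation Harness). A minimal sketch of such a 5-shot run follows; it assumes the harness's v0.4-style simple_evaluate API, the model name is illustrative, and note that C-SEM and CLCC are FlagEval tasks that may not be registered in the public harness, so only gsm8k and cmmlu are shown.

# Minimal 5-shot evaluation sketch with the EleutherAI lm-evaluation-harness.
# Assumes the v0.4-style Python API; model and task names are illustrative.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                      # Hugging Face transformers backend
    model_args="pretrained=EleutherAI/pythia-160m",  # any HF causal model id
    tasks=["gsm8k", "cmmlu"],                        # task names as registered in the harness
    num_fewshot=5,                                   # the 5-shot setting used above
)
for task, metrics in results["results"].items():
    print(task, metrics)                             # higher scores are better, per the text above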