Sean Cho committed
Commit · 94a1689 · 1 Parent(s): a507ee8
update about
src/assets/text_content.py CHANGED
@@ -31,12 +31,13 @@ Please provide information about the model through an issue!
 
 ## How it works
 
-We have set up a benchmark using datasets translated into Korean from the four tasks (HellaSwag, MMLU, Arc, Truthful QA) operated by HuggingFace OpenLLM.
-- Ko-HellaSwag (provided by __Upstage__)
-- Ko-MMLU (provided by __Upstage__)
-- Ko-Arc (provided by __Upstage__)
-- Ko-Truthful QA (provided by __Upstage__)
-
+We have set up a benchmark using datasets translated into Korean from the four tasks (HellaSwag, MMLU, Arc, Truthful QA) operated by HuggingFace OpenLLM. We have also added a new dataset prepared from scratch.
+- Ko-HellaSwag (provided by __Upstage__, machine translation)
+- Ko-MMLU (provided by __Upstage__, human translation and variation)
+- Ko-Arc (provided by __Upstage__, human translation and variation)
+- Ko-Truthful QA (provided by __Upstage__, human translation and variation)
+- Ko-CommonGen V2 (provided by __Korea University NLP&AI Lab__, created from scratch)
+To provide an evaluation befitting the LLM era, we've selected benchmark datasets suitable for assessing these elements: expertise, inference, hallucination, and common sense. The final score is converted to the average score from each evaluation datasets.
 
 GPUs are provided by __KT__ for the evaluations.
 
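For clarity, here is a minimal sketch of the scoring rule the new paragraph describes: the final score is the plain average of the per-dataset scores. The function name `final_score` and the example values below are illustrative assumptions, not code from this repository.

```python
# Minimal sketch (not the leaderboard's actual code) of the scoring rule in the
# updated about text: the final score is the arithmetic mean of the per-dataset scores.

def final_score(per_dataset_scores: dict[str, float]) -> float:
    """Average the benchmark scores across all evaluation datasets."""
    return sum(per_dataset_scores.values()) / len(per_dataset_scores)

if __name__ == "__main__":
    # Hypothetical scores for one model; the values are placeholders, only the
    # dataset names mirror the about text.
    scores = {
        "Ko-HellaSwag": 61.2,
        "Ko-MMLU": 44.8,
        "Ko-Arc": 38.5,
        "Ko-Truthful QA": 52.0,
        "Ko-CommonGen V2": 47.3,
    }
    print(f"Final score: {final_score(scores):.2f}")  # mean of the five task scores
```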