Sean Cho committed
Commit · 94a1689 · 1 Parent(s): a507ee8
update about
src/assets/text_content.py CHANGED
@@ -31,12 +31,13 @@ Please provide information about the model through an issue!
 
 ## How it works
 
-We have set up a benchmark using datasets translated into Korean from the four tasks (HellaSwag, MMLU, Arc, Truthful QA) operated by HuggingFace OpenLLM.
-- Ko-HellaSwag (provided by __Upstage__)
-- Ko-MMLU (provided by __Upstage__)
-- Ko-Arc (provided by __Upstage__)
-- Ko-Truthful QA (provided by __Upstage__)
-
+We have set up a benchmark using datasets translated into Korean from the four tasks (HellaSwag, MMLU, Arc, Truthful QA) operated by HuggingFace OpenLLM. We have also added a new dataset prepared from scratch.
+- Ko-HellaSwag (provided by __Upstage__, machine translation)
+- Ko-MMLU (provided by __Upstage__, human translation and variation)
+- Ko-Arc (provided by __Upstage__, human translation and variation)
+- Ko-Truthful QA (provided by __Upstage__, human translation and variation)
+- Ko-CommonGen V2 (provided by __Korea University NLP&AI Lab__, created from scratch)
+To provide an evaluation befitting the LLM era, we've selected benchmark datasets suitable for assessing these elements: expertise, inference, hallucination, and common sense. The final score is converted to the average score from each evaluation datasets.
 
 GPUs are provided by __KT__ for the evaluations.
 
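For clarity, here is a minimal sketch of the scoring rule the new paragraph describes: the final score is the plain average of the per-dataset scores. The function name `final_score` and the example values below are illustrative assumptions, not code from this repository.

```python
# Minimal sketch (not the leaderboard's actual code) of the scoring rule in the
# updated about text: the final score is the arithmetic mean of the per-dataset scores.

def final_score(per_dataset_scores: dict[str, float]) -> float:
    """Average the benchmark scores across all evaluation datasets."""
    return sum(per_dataset_scores.values()) / len(per_dataset_scores)

if __name__ == "__main__":
    # Hypothetical scores for one model; the values are placeholders, only the
    # dataset names mirror the about text.
    scores = {
        "Ko-HellaSwag": 61.2,
        "Ko-MMLU": 44.8,
        "Ko-Arc": 38.5,
        "Ko-Truthful QA": 52.0,
        "Ko-CommonGen V2": 47.3,
    }
    print(f"Final score: {final_score(scores):.2f}")  # mean of the five task scores
```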