Update src/about.py

src/about.py CHANGED (+27 −30)
@@ -18,45 +18,42 @@ This leaderboard is developed in collaboration with <a href="https://www.scb10x.
"""

LLM_BENCHMARKS_TEXT = f"""
The leaderboard currently consists of the following benchmarks:
- <b>Exam</b>
  - <a href="https://huggingface.co/datasets/scb10x/thai_exam">ThaiExam</a>: ThaiExam is a Thai-language benchmark based on examinations for high-school students and investment professionals in Thailand.
  - <a href="https://arxiv.org/abs/2306.05179">M3Exam</a>: M3Exam is a novel benchmark sourced from authentic and official human exam questions for evaluating LLMs in a multilingual, multimodal, and multilevel context. This leaderboard uses the Thai subset of M3Exam.
- <b>LLM-as-a-Judge</b>
  - <a href="https://huggingface.co/datasets/ThaiLLM-Leaderboard/mt-bench-thai">Thai MT-Bench</a>: A Thai version of <a href="https://arxiv.org/abs/2306.05685">MT-Bench</a>, developed specifically by VISTEC for probing Thai generative skills using the LLM-as-a-judge method.
- <b>NLU</b>
  - <a href="https://huggingface.co/datasets/facebook/belebele">Belebele</a>: Belebele is a multiple-choice machine reading comprehension (MRC) dataset spanning 122 language variants; the Thai subset is used in this leaderboard.
  - <a href="https://huggingface.co/datasets/facebook/xnli">XNLI</a>: XNLI is an evaluation corpus for language transfer and cross-lingual sentence classification in 15 languages. This leaderboard uses the Thai subset of this corpus.
  - <a href="https://huggingface.co/datasets/cambridgeltl/xcopa">XCOPA</a>: XCOPA is a corpus of the English COPA translated and re-annotated into 11 languages, designed to measure commonsense reasoning ability in non-English languages. This leaderboard uses the Thai subset of this corpus.
  - <a href="https://huggingface.co/datasets/pythainlp/wisesight_sentiment">Wisesight</a>: The Wisesight sentiment analysis corpus contains social media messages in Thai with sentiment labels.
- <b>NLG</b>
  - <a href="https://huggingface.co/datasets/csebuetnlp/xlsum">XLSum</a>: XLSum is a comprehensive and diverse dataset comprising 1.35 million professionally annotated article-summary pairs from the BBC. It evaluates summarization performance in non-English languages; this leaderboard uses the Thai subset.
  - <a href="https://huggingface.co/datasets/SEACrowd/flores200">Flores200</a>: FLORES is a machine translation benchmark dataset used to evaluate translation quality between English and low-resource languages. This leaderboard uses the Thai subset of Flores200.
  - <a href="https://huggingface.co/datasets/iapp/iapp_wiki_qa_squad">iapp Wiki QA Squad</a>: iapp Wiki QA Squad is an extractive question-answering dataset derived from Thai Wikipedia articles.

Metric Implementation Details:
- BLEU is calculated with the flores200 tokenizer using the Hugging Face `evaluate` <a href="https://huggingface.co/spaces/evaluate-metric/sacrebleu">implementation</a>.
- ROUGE-L is calculated with the PyThaiNLP newmm tokenizer and the Hugging Face `evaluate` <a href="https://huggingface.co/spaces/evaluate-metric/rouge">implementation</a>.
- The LLM-as-a-judge rating is based on OpenAI's gpt-4o-2024-05-13 using the prompt defined in <a href="https://github.com/lm-sys/FastChat/blob/main/fastchat/llm_judge/data/judge_prompts.jsonl">lmsys MT-Bench</a>.

Reproducibility:
- For reproducibility of results, we have open-sourced the evaluation pipeline. Please check out the <a href="https://github.com/scb-10x/seacrowd-eval">seacrowd-experiments</a> repository.

Acknowledgements:
- We are grateful to previous open-source projects that released datasets, tools, and knowledge. We thank community members for task and model submissions. To contribute, please see the submit tab.
"""

CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"

CITATION_BUTTON_TEXT = r"""@misc{thaillm-leaderboard,
  author = {SCB 10X, VISTEC, SEACrowd},
  title = {Thai LLM Leaderboard},
  year = {2024},
  publisher = {Hugging Face},
  url = {https://huggingface.co/spaces/ThaiLLM-Leaderboard/leaderboard}
}"""
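As a point of reference for the metric details above: the leaderboard computes BLEU and ROUGE-L through the Hugging Face `evaluate` wrappers with the flores200 and PyThaiNLP newmm tokenizers, but the core of what each metric measures can be sketched in pure Python. The sketch below is illustrative only — `sentence_bleu` and `rouge_l_f1` are hypothetical names, not the leaderboard's code, and plain whitespace tokenization stands in for the real tokenizers so the example stays self-contained.

```python
import math
from collections import Counter


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def rouge_l_f1(prediction, reference, tokenize=str.split):
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall."""
    pred, ref = tokenize(prediction), tokenize(reference)
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(pred), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def ngrams(tokens, n):
    """Multiset of n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(prediction, reference, max_n=4, tokenize=str.split):
    """Sentence-level BLEU: geometric mean of clipped n-gram
    precisions (n = 1..max_n) times a brevity penalty."""
    pred, ref = tokenize(prediction), tokenize(reference)
    log_precisions = []
    for n in range(1, max_n + 1):
        pred_ngrams, ref_ngrams = ngrams(pred, n), ngrams(ref, n)
        total = sum(pred_ngrams.values())
        # Clip each n-gram count by its count in the reference.
        clipped = sum(min(c, ref_ngrams[g]) for g, c in pred_ngrams.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    brevity = 1.0 if len(pred) >= len(ref) else math.exp(1 - len(ref) / len(pred))
    return brevity * math.exp(sum(log_precisions) / max_n)
```

In the actual pipeline the tokenizer choice matters a great deal for Thai, which has no whitespace word boundaries — hence the flores200 (subword) and newmm (dictionary-based word segmentation) tokenizers named above.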