yentinglin committed · Commit a92da22
Parent(s): 6edca0c
Update src/about.py

Files changed: src/about.py (+6 -6)
src/about.py
CHANGED
@@ -23,7 +23,7 @@ NUM_FEWSHOT = 0 # Change with your few shot
 
 
 # Your leaderboard name
-TITLE = """<h1 align="center" id="space-title">
+TITLE = """<h1 align="center" id="space-title">Open Taiwan LLM leaderboard</h1>"""
 
 # What does your leaderboard evaluate?
 INTRODUCTION_TEXT = """
@@ -36,7 +36,7 @@ LLM_BENCHMARKS_TEXT = f"""
 The leaderboard evaluates LLMs on the following benchmarks:
 
 1. TMLU (Taiwanese Mandarin Language Understanding): Measures the model's ability to understand Taiwanese Mandarin text across various domains.
-2. TW Truthful QA: Assesses the model's capability to provide truthful answers to questions in Taiwanese Mandarin, with a focus on Taiwan-specific context.
+2. TW Truthful QA: Assesses the model's capability to provide truthful and localized answers to questions in Taiwanese Mandarin, with a focus on Taiwan-specific context.
 3. TW Legal Eval: Evaluates the model's understanding of legal terminology and concepts in Taiwanese Mandarin, using questions from the Taiwanese bar exam for lawyers.
 4. MMLU (Massive Multitask Language Understanding): Tests the model's performance on a wide range of tasks in English.
 
@@ -44,10 +44,10 @@ To reproduce our results, please follow the instructions in the provided GitHub
 
 該排行榜在以下考題上評估 LLMs:
 
-1. TMLU(
-2. TW Truthful QA
-3. TW Legal Eval
-4. MMLU(
+1. [TMLU(臺灣中文大規模多任務語言理解)](https://huggingface.co/datasets/miulab/tmlu):衡量模型理解各個領域(國中、高中、大學、國考)的能力。
+2. TW Truthful QA:評估模型以臺灣特定的背景來回答問題,測試模型的在地化能力。
+3. [TW Legal Eval](https://huggingface.co/datasets/lianghsun/tw-legal-benchmark-v1):使用臺灣律師資格考試的問題,評估模型對臺灣法律術語和概念的理解。
+4. [MMLU(英文大規模多任務語言理解)](https://huggingface.co/datasets/cais/mmlu):測試模型在英語中各種任務上的表現。
 
 要重現我們的結果,請按照:https://github.com/adamlin120/lm-evaluation-harness/blob/main/run_all.sh
 """
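For context, a minimal sketch of how constants like the ones edited in this commit are typically consumed by a leaderboard Space's app.py, following the standard Hugging Face leaderboard template. The Gradio layout and import path below are assumptions for illustration; only TITLE, INTRODUCTION_TEXT, and LLM_BENCHMARKS_TEXT come from src/about.py itself.

```python
# Hedged sketch (not part of this commit): how TITLE, INTRODUCTION_TEXT, and
# LLM_BENCHMARKS_TEXT are typically rendered by the Space's app.py in the
# standard Hugging Face leaderboard template. The layout here is an assumed
# example; only the imported names are defined in src/about.py.
import gradio as gr

from src.about import TITLE, INTRODUCTION_TEXT, LLM_BENCHMARKS_TEXT

demo = gr.Blocks()
with demo:
    gr.HTML(TITLE)                  # <h1> heading set to "Open Taiwan LLM leaderboard" in this commit
    gr.Markdown(INTRODUCTION_TEXT)  # short blurb shown under the title
    with gr.Tabs():
        with gr.TabItem("About"):
            # benchmark descriptions (TMLU, TW Truthful QA, TW Legal Eval, MMLU) edited in this commit
            gr.Markdown(LLM_BENCHMARKS_TEXT)

if __name__ == "__main__":
    demo.launch()
```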