yentinglin committed
Commit a92da22
1 parent: 6edca0c

Update src/about.py

Files changed (1):
  src/about.py (+6, -6)
src/about.py CHANGED
@@ -23,7 +23,7 @@ NUM_FEWSHOT = 0 # Change with your few shot
 
 
 # Your leaderboard name
-TITLE = """<h1 align="center" id="space-title">Demo leaderboard</h1>"""
+TITLE = """<h1 align="center" id="space-title">Open Taiwan LLM leaderboard</h1>"""
 
 # What does your leaderboard evaluate?
 INTRODUCTION_TEXT = """
@@ -36,7 +36,7 @@ LLM_BENCHMARKS_TEXT = f"""
 The leaderboard evaluates LLMs on the following benchmarks:
 
 1. TMLU (Taiwanese Mandarin Language Understanding): Measures the model's ability to understand Taiwanese Mandarin text across various domains.
-2. TW Truthful QA: Assesses the model's capability to provide truthful answers to questions in Taiwanese Mandarin, with a focus on Taiwan-specific context.
+2. TW Truthful QA: Assesses the model's capability to provide truthful and localized answers to questions in Taiwanese Mandarin, with a focus on Taiwan-specific context.
 3. TW Legal Eval: Evaluates the model's understanding of legal terminology and concepts in Taiwanese Mandarin, using questions from the Taiwanese bar exam for lawyers.
 4. MMLU (Massive Multitask Language Understanding): Tests the model's performance on a wide range of tasks in English.
 
@@ -44,10 +44,10 @@ To reproduce our results, please follow the instructions in the provided GitHub
 
 The leaderboard evaluates LLMs on the following exams:
 
-1. TMLU (Taiwanese Mandarin Language Understanding): Measures the model's ability to understand Taiwanese Mandarin texts across various domains.
-2. TW Truthful QA: Assesses the model's ability to provide truthful answers in Taiwanese Mandarin, with a focus on Taiwan-specific context.
-3. TW Legal Eval: Uses questions from the Taiwanese bar exam to evaluate the model's understanding of legal terminology and concepts in Taiwanese Mandarin.
-4. MMLU (Massive Multitask Language Understanding): Tests the model's performance on a wide range of tasks in English.
+1. [TMLU (Taiwan Mandarin Massive Multitask Language Understanding)](https://huggingface.co/datasets/miulab/tmlu): Measures the model's comprehension across domains (junior high, senior high, university, and national exams).
+2. TW Truthful QA: Assesses the model's ability to answer questions in a Taiwan-specific context, testing its localization.
+3. [TW Legal Eval](https://huggingface.co/datasets/lianghsun/tw-legal-benchmark-v1): Uses questions from the Taiwanese bar exam to evaluate the model's understanding of Taiwanese legal terminology and concepts.
+4. [MMLU (English Massive Multitask Language Understanding)](https://huggingface.co/datasets/cais/mmlu): Tests the model's performance on a wide range of tasks in English.
 
 To reproduce our results, please follow: https://github.com/adamlin120/lm-evaluation-harness/blob/main/run_all.sh
 """
 