|
TITLE = '<h1 align="center" id="space-title">Open Multilingual LLM Evaluation Leaderboard</h1>'

INTRO_TEXT = f"""
|
## About

This leaderboard tracks the progress and ranks the performance of large language models (LLMs) developed for different languages,
emphasizing non-English languages to democratize the benefits of LLMs for broader society.
Our current leaderboard provides evaluation data for 29 languages:
Arabic, Armenian, Basque, Bengali, Catalan, Chinese, Croatian, Danish, Dutch,
French, German, Gujarati, Hindi, Hungarian, Indonesian, Italian, Kannada, Malayalam,
Marathi, Nepali, Portuguese, Romanian, Russian, Serbian, Slovak, Spanish, Swedish,
Tamil, Telugu, Ukrainian, and Vietnamese; more languages will be added over time.
Both multilingual and language-specific LLMs are welcome on this leaderboard.
We currently evaluate models on four benchmarks:
- <a href="https://arxiv.org/abs/1803.05457" target="_blank">AI2 Reasoning Challenge</a> (25-shot)
- <a href="https://arxiv.org/abs/1905.07830" target="_blank">HellaSwag</a> (10-shot)
- <a href="https://arxiv.org/abs/2009.03300" target="_blank">MMLU</a> (5-shot)
- <a href="https://arxiv.org/abs/2109.07958" target="_blank">TruthfulQA</a> (0-shot)

The evaluation data was translated into these languages using ChatGPT (gpt-35-turbo).
"""


HOW_TO = f"""

## How to list your model's performance on this leaderboard

Run the evaluation of your model using this repo: <a href="https://github.com/laiviet/lm-evaluation-harness" target="_blank">https://github.com/laiviet/lm-evaluation-harness</a>.
|
Then push the evaluation log and open a pull request.
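
The workflow can be sketched as follows; the flags below are assumed from EleutherAI's upstream lm-evaluation-harness interface and may differ in this fork, so check its README for the exact commands:

```shell
# Clone the evaluation fork and install its dependencies.
git clone https://github.com/laiviet/lm-evaluation-harness
cd lm-evaluation-harness
pip install -r requirements.txt

# Run the four leaderboard benchmarks; "your-org/your-model" is a placeholder,
# and the flag names are assumed from the upstream EleutherAI harness.
python main.py \
    --model hf-causal \
    --model_args pretrained=your-org/your-model \
    --tasks arc_challenge,hellaswag,mmlu,truthfulqa

# Commit the resulting evaluation log and open a pull request.
```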
|
"""


CREDIT = f"""

## Credit

This website builds on the following resources:

- Datasets: AI2_ARC, HellaSwag, MMLU, TruthfulQA
- Funding and GPU access: Adobe Research
- Evaluation code: EleutherAI's lm-evaluation-harness repo
- Leaderboard code: HuggingFaceH4's open_llm_leaderboard repo
"""


CITATION = f"""

## Citation

```
@misc{{lai2023openllmbenchmark,
    author = {{Viet Lai and Nghia Trung Ngo and Amir Pouran Ben Veyseh and Franck Dernoncourt and Thien Huu Nguyen}},
    title = {{Open Multilingual LLM Evaluation Leaderboard}},
    year = {{2023}}
}}
```
"""