One of the oldest leaderboards on the hub, it has already evaluated more than 1,000 models! It uses Korean translations of MMLU, ARC, HellaSwag, and TruthfulQA, plus a new dataset, Korean CommonGen, which tests alignment with Korean-specific common sense.
What's interesting about this leaderboard is how it drove LLM development in Korea, with about 4 model submissions per day on average since it started! Really looking forward to seeing similar initiatives in other languages, to help high-quality models emerge outside of "just English" (for the other two thirds of the world).