README.md · opencompass/judgerbench

title: JudgerBench Leaderboard
emoji: 🌎
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.1.0
app_file: app.py
pinned: true
license: apache-2.0
tags:
  - leaderboard
short_description: JudgerBench Leaderboard

This leaderboard displays all evaluation results obtained with VLMEvalKit. The space provides an overall leaderboard, which combines a curated selection of benchmarks with an overall score, as well as benchmark-level leaderboards that provide the overall and fine-grained scores for each individual benchmark.

GitHub: https://github.com/open-compass/VLMEvalKit Report: https://arxiv.org/abs/2407.11691

Please consider citing the report if this resource is useful to your research:

@misc{duan2024vlmevalkitopensourcetoolkitevaluating,
      title={VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models}, 
      author={Haodong Duan and Junming Yang and Yuxuan Qiao and Xinyu Fang and Lin Chen and Yuan Liu and Amit Agarwal and Zhe Chen and Mo Li and Yubo Ma and Hailong Sun and Xiangyu Zhao and Junbo Cui and Xiaoyi Dong and Yuhang Zang and Pan Zhang and Jiaqi Wang and Dahua Lin and Kai Chen},
      year={2024},
      eprint={2407.11691},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2407.11691}, 
}