MetaMetrics-RM-v1.0

RewardBench Leaderboard

| Model | Score | Chat | Chat Hard | Safety | Reasoning |
|---|---|---|---|---|---|
| nvidia/Llama-3.1-Nemotron-70B-Reward | 94.1 | 97.5 | 85.7 | 95.1 | 98.1 |
| meta-metrics/MetaMetrics-RM-v1.0 | 93.5 | 98.9 | 86.2 | 90.7 | 98.2 |
| SF-Foundation/TextEval-Llama3.1-70B | 93.5 | 94.1 | 90.1 | 93.2 | 96.4 |
| RLHFlow/ArmoRM-Llama3-8B-v0.1 | 90.4 | 96.9 | 76.8 | 90.5 | 97.3 |

Citation

If you find this work useful for your research, please consider citing it:

@article{winata2024metametrics,
  title={MetaMetrics: Calibrating Metrics For Generation Tasks Using Human Preferences},
  author={Winata, Genta Indra and Anugraha, David and Susanto, Lucky and Kuwanto, Garry and Wijaya, Derry Tanti},
  journal={arXiv preprint arXiv:2410.02381},
  year={2024}
}