---
datasets:
- natolambert/skywork-preferences-80k-v0.1-cleaned
- allenai/preference-test-sets
---

# MetaMetrics-RM-v1.0

+ **Authors**: [Genta Indra Winata](https://gentawinata.com/), David Anugraha, Lucky Susanto, Garry Kuwanto, Derry Tanti Wijaya
+ **Paper**: https://arxiv.org/abs/2410.02381
+ **Model**: [meta-metrics/MetaMetrics-RM-v1.0](https://huggingface.co/meta-metrics/MetaMetrics-RM-v1.0)
+ **Datasets**:
  - [natolambert/skywork-preferences-80k-v0.1-cleaned](https://huggingface.co/datasets/natolambert/skywork-preferences-80k-v0.1-cleaned)
  - [allenai/preference-test-sets](https://huggingface.co/datasets/allenai/preference-test-sets)
+ **Code Repository**: https://github.com/meta-metrics/metametrics

## RewardBench Leaderboard

| Model | Score | Chat | Chat Hard | Safety | Reasoning |
|:------|:-----:|:----:|:---------:|:------:|:---------:|
| nvidia/Llama-3.1-Nemotron-70B-Reward | **94.1** | 97.5 | 85.7 | **95.1** | 98.1 |
| meta-metrics/MetaMetrics-RM-v1.0 | 93.5 | **98.9** | 86.2 | 90.7 | **98.2** |
| SF-Foundation/TextEval-Llama3.1-70B | 93.5 | 94.1 | **90.1** | 93.2 | 96.4 |
| RLHFlow/ArmoRM-Llama3-8B-v0.1 | 90.4 | 96.9 | 76.8 | 90.5 | 97.3 |
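
As a quick consistency check, each Score in the table above agrees, to within rounding at one decimal place, with the unweighted mean of its four category columns (Chat, Chat Hard, Safety, Reasoning). The short Python sketch below recomputes this. The `TABLE` dictionary and the helper names are illustrative only. This is not the official RewardBench scoring code, which may weight subsets differently for other models.

```python
# Illustrative consistency check, not the official RewardBench scorer.
# Values are transcribed from the table above:
#   model -> (reported Score, [Chat, Chat Hard, Safety, Reasoning])
TABLE = {
    "nvidia/Llama-3.1-Nemotron-70B-Reward": (94.1, [97.5, 85.7, 95.1, 98.1]),
    "meta-metrics/MetaMetrics-RM-v1.0": (93.5, [98.9, 86.2, 90.7, 98.2]),
    "SF-Foundation/TextEval-Llama3.1-70B": (93.5, [94.1, 90.1, 93.2, 96.4]),
    "RLHFlow/ArmoRM-Llama3-8B-v0.1": (90.4, [96.9, 76.8, 90.5, 97.3]),
}

# Scores are reported to one decimal place, so allow half a unit of the last
# digit, plus a tiny epsilon for floating-point error.
ROUNDING_TOL = 0.05 + 1e-6


def recomputed_score(categories):
    """Unweighted mean of the four category scores (an assumption about how Score relates to them)."""
    return sum(categories) / len(categories)


def check_table(table):
    """Return {model: bool} indicating whether the reported Score matches the mean within rounding."""
    return {
        model: abs(recomputed_score(cats) - reported) <= ROUNDING_TOL
        for model, (reported, cats) in table.items()
    }


if __name__ == "__main__":
    for model, ok in check_table(TABLE).items():
        mean = recomputed_score(TABLE[model][1])
        print(f"{model}: mean={mean:.3f} -> {'consistent' if ok else 'MISMATCH'}")
```

For MetaMetrics-RM-v1.0 the mean is (98.9 + 86.2 + 90.7 + 98.2) / 4 = 93.5, which matches the reported Score exactly.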
|
|
|
## Citation

If you find this work useful for your research, please consider citing:

```bibtex
@article{winata2024metametrics,
  title={MetaMetrics: Calibrating Metrics For Generation Tasks Using Human Preferences},
  author={Winata, Genta Indra and Anugraha, David and Susanto, Lucky and Kuwanto, Garry and Wijaya, Derry Tanti},
  journal={arXiv preprint arXiv:2410.02381},
  year={2024}
}
```