---
datasets:
  - natolambert/skywork-preferences-80k-v0.1-cleaned
  - allenai/preference-test-sets
---

# MetaMetrics-RM-v1.0

## RewardBench Leaderboard

| Model | Score | Chat | Chat Hard | Safety | Reasoning |
|:---------------------------------------|:-----:|:-----:|:---------:|:------:|:---------:|
| nvidia/Llama-3.1-Nemotron-70B-Reward   | 94.1  | 97.5  | 85.7      | 95.1   | 98.1      |
| meta-metrics/MetaMetrics-RM-v1.0       | 93.5  | 98.9  | 86.2      | 90.7   | 98.2      |
| SF-Foundation/TextEval-Llama3.1-70B    | 93.5  | 94.1  | 90.1      | 93.2   | 96.4      |
| RLHFlow/ArmoRM-Llama3-8B-v0.1          | 90.4  | 96.9  | 76.8      | 90.5   | 97.3      |
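The overall Score appears to be the unweighted mean of the four category scores (Chat, Chat Hard, Safety, Reasoning), rounded to one decimal place; the numbers in the table are consistent with that reading. A minimal sketch, assuming that aggregation:

```python
# Sanity check: the overall Score column looks like the unweighted mean of the
# four category scores (Chat, Chat Hard, Safety, Reasoning) — an assumption
# inferred from the table values, not stated in this model card.
category_scores = {
    "nvidia/Llama-3.1-Nemotron-70B-Reward": (97.5, 85.7, 95.1, 98.1),
    "meta-metrics/MetaMetrics-RM-v1.0": (98.9, 86.2, 90.7, 98.2),
    "SF-Foundation/TextEval-Llama3.1-70B": (94.1, 90.1, 93.2, 96.4),
    "RLHFlow/ArmoRM-Llama3-8B-v0.1": (96.9, 76.8, 90.5, 97.3),
}

for model, cats in category_scores.items():
    overall = sum(cats) / len(cats)  # unweighted mean across categories
    print(f"{model}: {overall:.1f}")
```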

## Citation

If you find this work useful for your research, please consider citing:

```bibtex
@article{winata2024metametrics,
  title={MetaMetrics: Calibrating Metrics For Generation Tasks Using Human Preferences},
  author={Winata, Genta Indra and Anugraha, David and Susanto, Lucky and Kuwanto, Garry and Wijaya, Derry Tanti},
  journal={arXiv preprint arXiv:2410.02381},
  year={2024}
}
```