---
license: mit
language:
- en
---

# LM Loss OPT RM

This is a fine-tuned OPT-1.3b model for reward modelling. It was fine-tuned on the full [SLF5K](https://huggingface.co/datasets/JeremyAlain/SLF5K) dataset, following the method presented in the paper [Training Language Models with Language Feedback at Scale](https://arxiv.org/abs/2303.16755). The main results are shown in the following table:

| Model       | # Params | Validation Accuracy (%) |
|-------------|----------|-------------------------|
| OPT LM Loss | 13B      | **73.4 ± 1.9**          |
| OPT LM Loss | 1.3B     | 69.6 ± 2.0              |
| OPT RM Loss | 13B      | 71.8 ± 2.0              |
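
Below is a minimal usage sketch for scoring candidate summaries with this model via the `transformers` library. Note the assumptions: `your-org/lm-loss-opt-rm` is a hypothetical placeholder for the actual repository id, and the reward is taken as the negative language-modelling loss of the concatenated (post, summary) text; the exact input format used in the paper may differ.

```python
# A minimal sketch, not the paper's exact scoring procedure.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/lm-loss-opt-rm"  # hypothetical placeholder id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

def score(text: str) -> float:
    """Return the negative mean LM loss as a scalar reward (assumed scoring rule)."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, labels=inputs["input_ids"])
    return -out.loss.item()

post = "POST: ..."
summaries = [
    "TL;DR: first candidate summary.",
    "TL;DR: second candidate summary.",
]
# Under this (assumed) rule, a higher score marks the preferred summary.
best = max(summaries, key=lambda s: score(f"{post}\n{s}"))
print(best)
```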

If you use this model, please cite the following paper:

```
@article{scheurer2023training,
  title={Training Language Models with Language Feedback at Scale},
  author={Scheurer, J{\'e}r{\'e}my and Campos, Jon Ander and Korbak, Tomasz and Chan, Jun Shern and Chen, Angelica and Cho, Kyunghyun and Perez, Ethan},
  journal={arXiv preprint arXiv:2303.16755},
  year={2023}
}
```