ixa-ehu committed
Commit 76e8190
1 Parent(s): f238f0d

Update README.md

Files changed (1)
  1. README.md +23 -0
README.md CHANGED
@@ -1,3 +1,26 @@
  ---
  license: mit
+ language:
+ - en
  ---
+
+ # LM Loss OPT RM
+
+ This is a fine-tuned OPT 13B model for reward modeling. It was fine-tuned on the full [SLF5K](https://huggingface.co/datasets/JeremyAlain/SLF5K) dataset following the method presented in the paper [Training Language Models with Language Feedback at Scale](https://arxiv.org/abs/2303.16755). The main results are shown in the following table:
+
+ | Model       | # Params | Validation Accuracy (%) |
+ |-------------|----------|-------------------------|
+ | OPT LM Loss | 13B      | **73.4 +/- 1.9**        |
+ | OPT LM Loss | 1.3B     | 69.6 +/- 2.0            |
+ | OPT RM Loss | 13B      | 71.8 +/- 2.0            |
+
+ If you use this model, please cite the following paper:
+
+ ```
+ @article{scheurer2023training,
+   title={Training Language Models with Language Feedback at Scale},
+   author={Scheurer, J{\'e}r{\'e}my and Campos, Jon Ander and Korbak, Tomasz and Chan, Jun Shern and Chen, Angelica and Cho, Kyunghyun and Perez, Ethan},
+   journal={arXiv preprint arXiv:2303.16755},
+   year={2023}
+ }
+ ```
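
A minimal usage sketch with the `transformers` library, assuming the checkpoint loads as a standard OPT causal LM and that, as in the paper's LM-loss formulation, candidate summaries are ranked by the log-likelihood the model assigns to them. The repository id, prompt, and candidate below are placeholders, not values taken from this repository.

```python
# Sketch: score a candidate summary by the log-likelihood the model assigns to
# its tokens (prompt tokens are masked out of the loss).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-namespace/lm-loss-opt-rm-13b"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Post: ...\nTL;DR:"
candidate = " A concise summary of the post."

prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
inputs = tokenizer(prompt + candidate, return_tensors="pt").to(model.device)

labels = inputs.input_ids.clone()
labels[:, :prompt_len] = -100  # ignore prompt tokens when computing the loss

with torch.no_grad():
    out = model(**inputs, labels=labels)

# Lower loss means higher log-likelihood, i.e. a higher score for the candidate.
print(-out.loss.item())
```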