ixa-ehu committed
Commit 76e8190
1 Parent(s): f238f0d

Update README.md

Files changed (1)
  1. README.md +23 -0
README.md CHANGED
@@ -1,3 +1,26 @@
  ---
  license: mit
+ language:
+ - en
  ---
+
+ # LM Loss OPT RM
+
+ This is a fine-tuned OPT 13B model for reward modeling. It was fine-tuned on the full [SLF5K](https://huggingface.co/datasets/JeremyAlain/SLF5K) dataset following the method presented in the paper [Training Language Models with Language Feedback at Scale](https://arxiv.org/abs/2303.16755). The main results are shown in the following table:
+
+ | Model       | # Params | Validation Accuracy (%) |
+ |-------------|----------|-------------------------|
+ | OPT LM Loss | 13B      | **73.4 +/- 1.9**        |
+ | OPT LM Loss | 1.3B     | 69.6 +/- 2.0            |
+ | OPT RM Loss | 13B      | 71.8 +/- 2.0            |
+
+ If you use this model, please cite the following paper:
+
+ ```
+ @article{scheurer2023training,
+   title={Training Language Models with Language Feedback at Scale},
+   author={Scheurer, J{\'e}r{\'e}my and Campos, Jon Ander and Korbak, Tomasz and Chan, Jun Shern and Chen, Angelica and Cho, Kyunghyun and Perez, Ethan},
+   journal={arXiv preprint arXiv:2303.16755},
+   year={2023}
+ }
+ ```
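
A minimal usage sketch with the `transformers` library, assuming the checkpoint loads as a standard OPT causal LM and that, as in the paper's LM-loss formulation, candidate summaries are ranked by the log-likelihood the model assigns to them. The repository id, prompt, and candidate below are placeholders, not values taken from this repository.

```python
# Sketch: score a candidate summary by the log-likelihood the model assigns to
# its tokens (prompt tokens are masked out of the loss).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-namespace/lm-loss-opt-rm-13b"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Post: ...\nTL;DR:"
candidate = " A concise summary of the post."

prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
inputs = tokenizer(prompt + candidate, return_tensors="pt").to(model.device)

labels = inputs.input_ids.clone()
labels[:, :prompt_len] = -100  # ignore prompt tokens when computing the loss

with torch.no_grad():
    out = model(**inputs, labels=labels)

# Lower loss means higher log-likelihood, i.e. a higher score for the candidate.
print(-out.loss.item())
```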