Update README.md
README.md
# LM Loss OPT RM
This is a fine-tuned OPT-1.3b model for reward modelling. The fine-tuning was done on the full [SLF5K](https://huggingface.co/datasets/JeremyAlain/SLF5K) dataset, following the method presented in the paper [Training Language Models with Language Feedback at Scale](https://arxiv.org/abs/2303.16755). The main results are shown in the following table:
| Model | # Params | Validation Accuracy (in %) |
|--------------------|-----------|-------------------|
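
Below is a minimal usage sketch for scoring a candidate summary with this reward model. The repo id `your-org/lm-loss-opt-rm`, the `post`/`summary` prompt format, and loading the scalar reward head via `AutoModelForSequenceClassification` are illustrative assumptions, not details taken from this card.

```python
# Hypothetical usage sketch: score a candidate summary with the reward model.
# The repo id and prompt format are placeholders, and loading the checkpoint
# via AutoModelForSequenceClassification (one scalar head) is an assumption.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "your-org/lm-loss-opt-rm"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=1)
model.eval()

post = "A Reddit post to be summarized ..."
summary = "A candidate summary of the post."

# Assumed prompt format: the post followed by the candidate summary.
text = f"{post}\n\nTL;DR: {summary}"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)

with torch.no_grad():
    # A single-label classification head yields one logit, read as the reward.
    reward = model(**inputs).logits.squeeze().item()

print(f"Reward score: {reward:.4f}")
```

Higher scores would indicate summaries the model prefers; comparing the scores of two candidate summaries for the same post mirrors how validation accuracy is computed for reward models trained on preference data.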