Update README.md
README.md
CHANGED
@@ -26,7 +26,7 @@ pipeline_tag: text-generation
 
 ![SmolTulu Banner](smoltulubanner.png)
 
-SmolTulu-1.7b-Reinforced is the reinforcement learning with verifiable rewards (RLVR) version of [SmolTulu-1.7b-Instruct](https://huggingface.co/SultanR/SmolTulu-1.7b-Instruct), which leverages [AllenAI's Tulu 3 post-training pipeline](https://
+SmolTulu-1.7b-Reinforced is the reinforcement learning with verifiable rewards (RLVR) version of [SmolTulu-1.7b-Instruct](https://huggingface.co/SultanR/SmolTulu-1.7b-Instruct), which leverages [AllenAI's Tulu 3 post-training pipeline](https://arxiv.org/abs/2411.15124)
 
 This model scores the highest current score in both IFEval and GSM8k while maintaining the extremely low contamination levels in Tulu 3 and SmolLM2! I've listed the datasets used to do both the RLVR stage, which is the same one mentioned used in the Tulu 3 paper.
 
 ## Evaluation