SultanR commited on
Commit
530b6c0
1 Parent(s): 7247435

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +1 -1
README.md CHANGED
@@ -26,7 +26,7 @@ pipeline_tag: text-generation
26
 
27
  ![SmolTulu Banner](smoltulubanner.png)
28
 
29
- SmolTulu-1.7b-Reinforced is the reinforcement learning with verifiable rewards (RLVR) version of [SmolTulu-1.7b-Instruct](https://huggingface.co/SultanR/SmolTulu-1.7b-Instruct), which leverages [AllenAI's Tulu 3 post-training pipeline](https://allenai.org/blog/tulu-3-technical)
30
 
31
  This model scores the highest current score in both IFEval and GSM8k while maintaining the extremely low contamination levels in Tulu 3 and SmolLM2! I've listed the datasets used to do both the RLVR stage, which is the same one mentioned used in the Tulu 3 paper.
32
  ## Evaluation
 
26
 
27
  ![SmolTulu Banner](smoltulubanner.png)
28
 
29
+ SmolTulu-1.7b-Reinforced is the reinforcement learning with verifiable rewards (RLVR) version of [SmolTulu-1.7b-Instruct](https://huggingface.co/SultanR/SmolTulu-1.7b-Instruct), which leverages [AllenAI's Tulu 3 post-training pipeline](https://arxiv.org/abs/2411.15124)
30
 
31
  This model scores the highest current score in both IFEval and GSM8k while maintaining the extremely low contamination levels in Tulu 3 and SmolLM2! I've listed the datasets used to do both the RLVR stage, which is the same one mentioned used in the Tulu 3 paper.
32
  ## Evaluation