SultanR/SmolTulu-1.7b-Reinforced


SultanR committed 4 days ago

Commit 530b6c0 • 1 Parent(s): 7247435

Update README.md

Files changed (1)

README.md +1 -1


@@ -26,7 +26,7 @@ pipeline_tag: text-generation
 ![SmolTulu Banner](smoltulubanner.png)
-SmolTulu-1.7b-Reinforced is the reinforcement learning with verifiable rewards (RLVR) version of [SmolTulu-1.7b-Instruct](https://huggingface.co/SultanR/SmolTulu-1.7b-Instruct), which leverages [AllenAI's Tulu 3 post-training pipeline](https://allenai.org/blog/tulu-3-technical)
+SmolTulu-1.7b-Reinforced is the reinforcement learning with verifiable rewards (RLVR) version of [SmolTulu-1.7b-Instruct](https://huggingface.co/SultanR/SmolTulu-1.7b-Instruct), which leverages [AllenAI's Tulu 3 post-training pipeline](https://arxiv.org/abs/2411.15124)
 This model scores the highest current score in both IFEval and GSM8k while maintaining the extremely low contamination levels in Tulu 3 and SmolLM2! I've listed the datasets used to do both the RLVR stage, which is the same one mentioned used in the Tulu 3 paper.
 ## Evaluation