SmolTulu
Collection
A collection of models that use SmolLM2 as the pretrained base in conjunction with AllenAI's Tulu 3 post training pipeline.
•
6 items
•
Updated
•
1
SmolTulu-1.7b-Reinforced is the reinforcement learning with verifiable rewards (RLVR) version of SmolTulu-1.7b-Instruct, which leverages AllenAI's Tulu 3 post-training pipeline
This model scores the highest current score in both IFEval and GSM8k while maintaining the extremely low contamination levels in Tulu 3 and SmolLM2! I've listed the datasets used to do both the RLVR stage, which is the same one mentioned used in the Tulu 3 paper.
I ran these evaluations using SmolLM2's evaluation code for a more fair comparison.
Metric | SmolTulu-1.7b-Instruct | SmolTulu-1.7b-Reinforced | SmolLM2-1.7B-Instruct | Llama-1B-Instruct | Qwen2.5-1.5B-Instruct | SmolLM1-1.7B-Instruct |
---|---|---|---|---|---|---|
ARC (Average) | 51.5 | 51.1 | 51.7 | 41.6 | 46.2 | 43.7 |
BBH (3-shot) | 33.8 | 33.4 | 32.2 | 27.6 | 35.3 | 25.7 |
GSM8K (5-shot) | 51.6 | 61.0 | 48.2 | 26.8 | 42.8 | 4.6 |
HellaSwag | 61.1 | 60.4 | 66.1 | 56.1 | 60.9 | 55.5 |
IFEval (Average prompt/inst) | 67.7 | 69.3 | 56.7 | 53.5 | 47.4 | 23.1 |
MMLU-Pro (MCF) | 17.4 | 17.3 | 19.3 | 12.7 | 24.2 | 11.7 |
PIQA | 72.2 | 72.1 | 74.4 | 72.3 | 73.2 | 71.6 |
The reinforced model used PPO with verifiable rewards:
Just like any Huggingface model, just run it using the transformers library:
# pip install transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
checkpoint = "SultanR/SmolTulu-1.7b-Reinforced"
device = "cuda" # for GPU usage or "cpu" for CPU usage
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# for multiple GPUs install accelerate and do `model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")`
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
inputs = tokenizer.encode("Gravity is", return_tensors="pt").to(device)
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
@misc{alrashed2024smoltuluhigherlearningrate,
title={SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs},
author={Sultan Alrashed},
year={2024},
eprint={2412.08347},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2412.08347},
}
The training methodology follows the Tulu 3 paper:
@article{lambert2024tulu3,
title={TÜLU 3: Pushing Frontiers in Open Language Model Post-Training},
author={Lambert, Nathan and Morrison, Jacob and Pyatkin, Valentina and others},
year={2024},
journal={arXiv preprint arXiv:2411.15124}
}