|
--- |
|
license: apache-2.0 |
|
datasets: |
|
- allenai/tulu-3-sft-mixture |
|
- allenai/llama-3.1-tulu-3-8b-preference-mixture |
|
language: |
|
- en |
|
base_model: |
|
- HuggingFaceTB/SmolLM2-1.7B |
|
library_name: transformers |
|
tags: |
|
- Tulu3 |
|
- Smollm |
|
- SLMs |
|
- Small |
|
- Huggingface |
|
- Allenai |
|
- SFT |
|
- DPO |
|
pipeline_tag: text-generation |
|
--- |
|
|
|
# SmolLM2 1.7b Instruction Tuned & DPO Aligned through Tulu 3! |
|
|
|
![SmolTulu Banner](smoltulubannerv0.png) |
|
|
|
SmolTulu-v0.1 is the first in a series of models that leverage [AllenAI's Tulu 3 post-training pipeline](https://allenai.org/blog/tulu-3-technical) to tune the [base version of Huggingface's SmolLM2-1.7b](https://huggingface.co/HuggingFaceTB/SmolLM2-1.7B)! AllenAI's post-training pipeline seemed like a perfect fit to apply here.
|
|
|
This model achieves the highest IFEval score among the similarly sized models compared below, while maintaining the extremely low contamination levels of Tulu 3 and SmolLM2! The datasets used for both the SFT (supervised finetuning) and DPO (direct preference optimization) stages are listed in the metadata above.
|
|
|
## Why v0.1? |
|
|
|
There are a few reasons why I like calling this model v0.1:
|
|
|
1. The model still lags behind the instruction tuned version of SmolLM2 in some other metrics. |
|
2. This model has only undergone SFT and DPO; the RLVR (reinforcement learning with verifiable rewards) stage was too computationally expensive to run on a model that still has room to improve.
|
3. The initial hyperparameter choices during training were naive. Through some napkin math (sketched below) I've found a much better learning rate that scales the one from the Tulu 3 paper to my computational resources.
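
For illustration, here's a rough sketch of the kind of napkin math involved, assuming a simple linear scaling rule (learning rate proportional to effective batch size). The numbers below are placeholders, not the actual values used for this model or reported in the Tulu 3 paper:

```python
# Hypothetical illustration: rescale a reference learning rate linearly with
# the effective batch size that fits on my hardware. All numbers are examples.

reference_lr = 5e-6         # placeholder: learning rate from the reference recipe
reference_batch_size = 128  # placeholder: effective batch size of that recipe
my_batch_size = 16          # placeholder: effective batch size I can actually run

# Linear scaling rule: keep lr / batch_size roughly constant.
scaled_lr = reference_lr * (my_batch_size / reference_batch_size)
print(f"scaled learning rate: {scaled_lr:.2e}")  # 6.25e-07 with these placeholders
```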
|
|
|
# Evaluation |
|
|
|
I ran these evaluations using [SmolLM2's evaluation code](https://github.com/huggingface/smollm/tree/main/evaluation) for a fairer comparison.
|
|
|
| Metric | SmolTulu-1.7b-it-v0 | SmolLM2-1.7B-Instruct | Llama-1B-Instruct | Qwen2.5-1.5B-Instruct | SmolLM1-1.7B-Instruct | |
|
|:----------------------------|:---------------------:|:---------------------:|:---------------------:|:---------------------:|:---------------------:| |
|
| IFEval (Average prompt/inst) | **67.7** | 56.7 | 53.5 | 47.4 | 23.1 | |
|
| GSM8K (5-shot) | **51.6** | 48.2 | 26.8 | 42.8 | 4.6 | |
|
| ARC (Average) | 51.5 | **51.7** | 41.6 | 46.2 | 43.7 | |
|
| HellaSwag | 61.1 | **66.1** | 56.1 | 60.9 | 55.5 | |
|
| MMLU-Pro (MCF) | 17.4 | 19.3 | 12.7 | **24.2** | 11.7 | |
|
|
|
# Usage |
|
|
|
Like any Huggingface model, you can run it with the transformers library:
|
|
|
```python |
|
# pip install transformers |
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
checkpoint = "SultanR/SmolTulu-v0" |
|
device = "cuda" # for GPU usage or "cpu" for CPU usage |
|
tokenizer = AutoTokenizer.from_pretrained(checkpoint) |
|
# for multiple GPUs install accelerate and do `model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto")` |
|
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device) |
|
inputs = tokenizer.encode("Gravity is", return_tensors="pt").to(device) |
|
outputs = model.generate(inputs) |
|
print(tokenizer.decode(outputs[0])) |
|
``` |
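
Since this is an instruction-tuned model, you'll likely get better results by prompting it through the chat template. A minimal sketch, assuming the tokenizer ships with a chat template (the generation settings here are just example values):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "SultanR/SmolTulu-v0"
device = "cuda"  # or "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)

# Build a chat-formatted prompt and append the generation prompt for the assistant turn
messages = [{"role": "user", "content": "Explain gravity to a 10 year old."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```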
|
|
|
I will be uploading the model to Ollama and providing GGUF versions very soon. |