SultanR committed on
Commit 8d9d648
1 Parent(s): 11ed78c

Update README.md

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -40,10 +40,12 @@ There's a few reasons on why I like calling this model v0.1:
 
 I ran these evaluations using [SmolLM2's evaluation code](https://github.com/huggingface/smollm/tree/main/evaluation) for a more fair comparison.
 
-| Metric | SmolTulu-1.7b-it-v0 | SmolLM2-1.7B-Instruct | Llama-1B-Instruct | Qwen2.5-1.5B-Instruct | SmolLM1-1.7B-Instruct |
+| Metric | SultanR/SmolTulu-1.7b-it-v0 | SmolLM2-1.7B-Instruct | Llama-1B-Instruct | Qwen2.5-1.5B-Instruct | SmolLM1-1.7B-Instruct |
 |:----------------------------|:---------------------:|:---------------------:|:---------------------:|:---------------------:|:---------------------:|
 | IFEval (Average prompt/inst) | **67.7** | 56.7 | 53.5 | 47.4 | 23.1 |
 | GSM8K (5-shot) | **51.6** | 48.2 | 26.8 | 42.8 | 4.6 |
+| PIQA | 72.2 | **74.4** | 72.3 | 73.2 | 71.6 |
+| BBH (3-shot) | 33.8 | 32.2 | 27.6 | **35.3** | 25.7 |
 | ARC (Average) | 51.5 | **51.7** | 41.6 | 46.2 | 43.7 |
 | HellaSwag | 61.1 | **66.1** | 56.1 | 60.9 | 55.5 |
 | MMLU-Pro (MCF) | 17.4 | 19.3 | 12.7 | **24.2** | 11.7 |