iansotnek committed on
Commit f6fd5f3
1 Parent(s): cb51982

Update README.md

Files changed (1)
  1. README.md +14 -14
README.md CHANGED
@@ -90,17 +90,17 @@ We present the results from various model benchmarks on the EleutherAI LLM Evalu
  Model results are sorted by mean score, ascending, to provide an ordering. These metrics serve to further show that none of the DLite models are
  state of the art, but rather further show that chat-like behaviors in LLMs can be trained almost independent of model size.
 
- | model | openbookqa | arc_easy | winogrande | hellaswag | arc_challenge | piqa | boolq |
- |:--------------|-------------:|-----------:|-------------:|------------:|----------------:|---------:|---------:|
- | gpt2 | 0.164 | 0.438131 | 0.51618 | 0.289185 | 0.190273 | 0.628945 | 0.487156 |
- | dlite-v2-124m | 0.174 | 0.44697 | 0.502762 | 0.291974 | 0.192833 | 0.631665 | 0.520183 |
- | dlite-v1-124m | 0.17 | 0.462542 | 0.494081 | 0.293268 | 0.223549 | 0.622416 | 0.502446 |
- | gpt2-medium | 0.186 | 0.490741 | 0.531176 | 0.333101 | 0.215017 | 0.676279 | 0.585933 |
- | dlite-v2-355m | 0.206 | 0.493687 | 0.524073 | 0.334993 | 0.226109 | 0.670838 | 0.582263 |
- | dlite-v1-355m | 0.216 | 0.507576 | 0.496448 | 0.338478 | 0.234642 | 0.664309 | 0.600306 |
- | gpt2-large | 0.194 | 0.531566 | 0.553275 | 0.363971 | 0.216724 | 0.703482 | 0.604893 |
- | dlite-774m-v2 | 0.212 | 0.539562 | 0.5588 | 0.365565 | 0.234642 | 0.700218 | 0.60367 |
- | dlite-774m-v1 | 0.218 | 0.545875 | 0.562747 | 0.375124 | 0.250853 | 0.698041 | 0.614985 |
- | gpt2-xl | 0.224 | 0.582912 | 0.583268 | 0.400418 | 0.25 | 0.708379 | 0.617737 |
- | dlite-v1-1.5b | 0.226 | 0.588384 | 0.584846 | 0.401414 | 0.268771 | 0.708379 | 0.624159 |
- | dlite-v2-1.5b | 0.226 | 0.59596 | 0.581689 | 0.40719 | 0.273891 | 0.705114 | 0.630887 |
+ | Model | arc_challenge | arc_easy | boolq | hellaswag | openbookqa | piqa | winogrande |
+ |:--------------|----------------:|-----------:|---------:|------------:|-------------:|---------:|-------------:|
+ | dlite-v2-124m | 0.199659 | 0.447811 | 0.494801 | 0.291675 | 0.156 | 0.620239 | 0.487766 |
+ | gpt2 | 0.190273 | 0.438131 | 0.487156 | 0.289185 | 0.164 | 0.628945 | 0.51618 |
+ | dlite-v1-124m | 0.223549 | 0.462542 | 0.502446 | 0.293268 | 0.17 | 0.622416 | 0.494081 |
+ | gpt2-medium | 0.215017 | 0.490741 | 0.585933 | 0.333101 | 0.186 | 0.676279 | 0.531176 |
+ | dlite-v2-355m | 0.251706 | 0.486111 | 0.547401 | 0.344354 | 0.216 | 0.671926 | 0.52723 |
+ | dlite-v1-355m | 0.234642 | 0.507576 | 0.600306 | 0.338478 | 0.216 | 0.664309 | 0.496448 |
+ | gpt2-large | 0.216724 | 0.531566 | 0.604893 | 0.363971 | 0.194 | 0.703482 | 0.553275 |
+ | dlite-v1-774m | 0.250853 | 0.545875 | 0.614985 | 0.375124 | 0.218 | 0.698041 | 0.562747 |
+ | dlite-v2-774m | 0.269625 | 0.52904 | 0.613761 | 0.395937 | 0.256 | 0.691513 | 0.566693 |
+ | gpt2-xl | 0.25 | 0.582912 | 0.617737 | 0.400418 | 0.224 | 0.708379 | 0.583268 |
+ | dlite-v1-1_5b | 0.268771 | 0.588384 | 0.624159 | 0.401414 | 0.226 | 0.708379 | 0.584846 |
+ | dlite-v2-1_5b | 0.289249 | 0.565657 | 0.601223 | 0.434077 | 0.272 | 0.703482 | 0.588003 |
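The README states that rows are "sorted by mean score, ascending". The short script below is a minimal sketch of how that ordering could be reproduced, assuming "mean score" is the unweighted arithmetic mean of the seven task scores (the README does not state the weighting). The scores are copied from the updated table; `RESULTS` and `rank_by_mean` are hypothetical names used only for illustration and are not part of the DLite repository.

```python
from statistics import mean

# Scores copied from the updated README table. Column order:
# arc_challenge, arc_easy, boolq, hellaswag, openbookqa, piqa, winogrande
RESULTS = {
    "dlite-v2-124m": [0.199659, 0.447811, 0.494801, 0.291675, 0.156, 0.620239, 0.487766],
    "gpt2":          [0.190273, 0.438131, 0.487156, 0.289185, 0.164, 0.628945, 0.51618],
    "dlite-v1-124m": [0.223549, 0.462542, 0.502446, 0.293268, 0.17, 0.622416, 0.494081],
    "gpt2-medium":   [0.215017, 0.490741, 0.585933, 0.333101, 0.186, 0.676279, 0.531176],
    "dlite-v2-355m": [0.251706, 0.486111, 0.547401, 0.344354, 0.216, 0.671926, 0.52723],
    "dlite-v1-355m": [0.234642, 0.507576, 0.600306, 0.338478, 0.216, 0.664309, 0.496448],
    "gpt2-large":    [0.216724, 0.531566, 0.604893, 0.363971, 0.194, 0.703482, 0.553275],
    "dlite-v1-774m": [0.250853, 0.545875, 0.614985, 0.375124, 0.218, 0.698041, 0.562747],
    "dlite-v2-774m": [0.269625, 0.52904, 0.613761, 0.395937, 0.256, 0.691513, 0.566693],
    "gpt2-xl":       [0.25, 0.582912, 0.617737, 0.400418, 0.224, 0.708379, 0.583268],
    "dlite-v1-1_5b": [0.268771, 0.588384, 0.624159, 0.401414, 0.226, 0.708379, 0.584846],
    "dlite-v2-1_5b": [0.289249, 0.565657, 0.601223, 0.434077, 0.272, 0.703482, 0.588003],
}


def rank_by_mean(results):
    """Return (model, mean score) pairs sorted by mean score, ascending."""
    # Assumption: "mean score" is the unweighted arithmetic mean over the seven tasks.
    means = {model: mean(scores) for model, scores in results.items()}
    return sorted(means.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Print one line per model, lowest mean first.
    for model, score in rank_by_mean(RESULTS):
        print(f"{model:<15} {score:.4f}")
```

Under the unweighted-mean assumption, the resulting order matches the row order of the updated table.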