Commit 7179ae7
Parent(s): 5f20240

Adding Evaluation Results (#3)

- Adding Evaluation Results (f4c0ed9c9b1810a470e146d8c930e65f40cad164)

Co-authored-by: Open LLM Leaderboard PR Bot <leaderboard-pr-bot@users.noreply.huggingface.co>

README.md CHANGED

```diff
@@ -1,23 +1,37 @@
 ---
+language:
+- en
+license: mit
 tags:
 - generated_from_trainer
-license: mit
 datasets:
 - HuggingFaceH4/ultrachat_200k
 - HuggingFaceH4/ultrafeedback_binarized
-language:
-- en
 base_model: mistralai/Mistral-7B-v0.1
 widget:
-
-
-
+- text: '<|system|>
+
+    You are a pirate chatbot who always responds with Arr!</s>
+
+    <|user|>
+
+    There''s a llama on my lawn, how can I get rid of him?</s>
+
+    <|assistant|>
+
+    '
+  output:
+    text: Arr! 'Tis a puzzlin' matter, me hearty! A llama on yer lawn be a rare sight,
+      but I've got a plan that might help ye get rid of 'im. Ye'll need to gather
+      some carrots and hay, and then lure the llama away with the promise of a tasty
+      treat. Once he's gone, ye can clean up yer lawn and enjoy the peace and quiet
+      once again. But beware, me hearty, for there may be more llamas where that one
+      came from! Arr!
 pipeline_tag: text-generation
 model-index:
 - name: zephyr-7b-beta
   results:
-
-  - task:
+  - task:
       type: text-generation
       name: Text Generation
     dataset:
@@ -28,15 +42,16 @@ model-index:
       args:
         num_few_shot: 25
     metrics:
-
-
-
+    - type: acc_norm
+      value: 62.03071672354948
+      name: normalized accuracy
+    - type: acc_norm
+      value: 58.28
+      name: normalized accuracy
     source:
-      name: Open LLM Leaderboard
       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=HuggingFaceH4/zephyr-7b-beta
-
-
-  - task:
+      name: Open LLM Leaderboard
+  - task:
       type: text-generation
       name: Text Generation
     dataset:
@@ -46,15 +61,16 @@ model-index:
       args:
         num_few_shot: 10
     metrics:
-
-
-
+    - type: acc_norm
+      value: 84.35570603465445
+      name: normalized accuracy
+    - type: acc_norm
+      value: 81.0
+      name: normalized accuracy
     source:
-      name: Open LLM Leaderboard
       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=HuggingFaceH4/zephyr-7b-beta
-
-
-  - task:
+      name: Open LLM Leaderboard
+  - task:
       type: text-generation
       name: Text Generation
     dataset:
@@ -64,15 +80,13 @@ model-index:
       args:
         num_few_shot: 3
     metrics:
-
-
-
+    - type: f1
+      value: 9.66243708053691
+      name: f1 score
     source:
-      name: Open LLM Leaderboard
       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=HuggingFaceH4/zephyr-7b-beta
-
-
-  - task:
+      name: Open LLM Leaderboard
+  - task:
       type: text-generation
       name: Text Generation
     dataset:
@@ -83,14 +97,14 @@ model-index:
       args:
         num_few_shot: 0
     metrics:
-
-
+    - type: mc2
+      value: 57.44916942762855
+    - type: mc2
+      value: 46.1
     source:
-      name: Open LLM Leaderboard
       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=HuggingFaceH4/zephyr-7b-beta
-
-
-  - task:
+      name: Open LLM Leaderboard
+  - task:
       type: text-generation
       name: Text Generation
     dataset:
@@ -101,15 +115,16 @@ model-index:
       args:
         num_few_shot: 5
     metrics:
-
-
-
+    - type: acc
+      value: 12.736921910538287
+      name: accuracy
+    - type: acc
+      value: 13.04
+      name: accuracy
     source:
-      name: Open LLM Leaderboard
       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=HuggingFaceH4/zephyr-7b-beta
-
-
-  - task:
+      name: Open LLM Leaderboard
+  - task:
       type: text-generation
       name: Text Generation
     dataset:
@@ -120,15 +135,16 @@ model-index:
       args:
         num_few_shot: 5
     metrics:
-
-
-
+    - type: acc
+      value: 61.07
+      name: accuracy
+    - type: acc
+      value: 53.57
+      name: accuracy
     source:
-      name: Open LLM Leaderboard
       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=HuggingFaceH4/zephyr-7b-beta
-
-
-  - task:
+      name: Open LLM Leaderboard
+  - task:
       type: text-generation
       name: Text Generation
     dataset:
@@ -139,38 +155,37 @@ model-index:
       args:
         num_few_shot: 5
     metrics:
-
-
-
+    - type: acc
+      value: 77.7426992896606
+      name: accuracy
+    - type: acc
+      value: 74.74
+      name: accuracy
     source:
-      name: Open LLM Leaderboard
       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=HuggingFaceH4/zephyr-7b-beta
-
-
-  - task:
+      name: Open LLM Leaderboard
+  - task:
       type: text-generation
       name: Text Generation
     dataset:
       name: AlpacaEval
       type: tatsu-lab/alpaca_eval
     metrics:
-
-
-
+    - type: unknown
+      value: 0.906
+      name: win rate
     source:
       url: https://tatsu-lab.github.io/alpaca_eval/
-
-  # MT-Bench (taken from model card)
-  - task:
+  - task:
       type: text-generation
       name: Text Generation
     dataset:
       name: MT-Bench
       type: unknown
     metrics:
-
-
-
+    - type: unknown
+      value: 7.34
+      name: score
     source:
       url: https://huggingface.co/spaces/lmsys/mt-bench
 ---
@@ -407,4 +422,17 @@ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-le
 | TruthfulQA (0-shot) | 57.45 |
 | Winogrande (5-shot) | 77.74 |
 | GSM8K (5-shot) | 12.74 |
-| DROP (3-shot) | 9.66 |
+| DROP (3-shot) | 9.66 |
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_CallComply__zephyr-7b-beta-128k)
+
+| Metric |Value|
+|---------------------------------|----:|
+|Avg. |54.45|
+|AI2 Reasoning Challenge (25-Shot)|58.28|
+|HellaSwag (10-Shot) |81.00|
+|MMLU (5-Shot) |53.57|
+|TruthfulQA (0-shot) |46.10|
+|Winogrande (5-shot) |74.74|
+|GSM8k (5-shot) |13.04|
+
```
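
This commit records scores in two places: the `model-index` metadata, where the first entry per task keeps full precision, and the two-decimal README tables. The following is a minimal sketch for cross-checking them. It assumes that `Avg.` is the plain unweighted mean of the six rows listed under it. It also assumes that the first `metrics` entry of each task corresponds to the row already present in the older table. The constant and function names are illustrative only, not part of any Hugging Face tooling.

```python
# Sketch: cross-check the scores added in this commit.
# Assumption: "Avg." is the unweighted mean of the six rows in the new table;
# the leaderboard's exact averaging procedure is not stated in the diff.
from statistics import mean

# Two-decimal scores from the new "Open LLM Leaderboard Evaluation Results" table.
NEW_TABLE = {
    "AI2 Reasoning Challenge (25-Shot)": 58.28,
    "HellaSwag (10-Shot)": 81.00,
    "MMLU (5-Shot)": 53.57,
    "TruthfulQA (0-shot)": 46.10,
    "Winogrande (5-shot)": 74.74,
    "GSM8k (5-shot)": 13.04,
}

# Assumed pairing: first full-precision value per task in model-index,
# next to the two-decimal row of the table that already existed.
MODEL_INDEX_VS_OLD_TABLE = {
    "TruthfulQA (0-shot)": (57.44916942762855, 57.45),
    "Winogrande (5-shot)": (77.7426992896606, 77.74),
    "GSM8K (5-shot)": (12.736921910538287, 12.74),
    "DROP (3-shot)": (9.66243708053691, 9.66),
}


def table_average(scores: dict[str, float]) -> float:
    """Return the unweighted mean of the benchmark scores."""
    return mean(scores.values())


def rounding_mismatches(pairs: dict[str, tuple[float, float]]) -> list[str]:
    """Return benchmarks whose full-precision value does not round to the shown value."""
    return [name for name, (full, shown) in pairs.items() if round(full, 2) != shown]


if __name__ == "__main__":
    avg = table_average(NEW_TABLE)
    print(f"mean of six rows: {avg:.3f} (table reports 54.45)")
    print("rounding mismatches:", rounding_mismatches(MODEL_INDEX_VS_OLD_TABLE))
```

Running it prints a mean of 54.455 against the reported 54.45, and an empty mismatch list. A last-digit gap like this can come from averaging unrounded per-task scores or from a different rounding rule, so compare the average with a small tolerance rather than exact equality.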