leaderboard-pr-bot committed on
Commit
bbfabe1
1 Parent(s): f1b293c

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +190 -45
README.md CHANGED
@@ -1,58 +1,189 @@
1
  ---
2
- license: apache-2.0
3
  language:
4
- - en
5
  tags:
6
- - text-generation
7
- base_model: JackFram/llama-68m
8
  datasets:
9
- - THUDM/webglm-qa
10
- - databricks/databricks-dolly-15k
11
- - cognitivecomputations/wizard_vicuna_70k_unfiltered
12
- - totally-not-an-llm/EverythingLM-data-V3
13
- - Amod/mental_health_counseling_conversations
14
- - sablo/oasst2_curated
15
- - starfishmedical/webGPT_x_dolly
16
- - Open-Orca/OpenOrca
17
- - mlabonne/chatml_dpo_pairs
18
  widget:
19
- - text: |-
20
- <|im_start|>system
21
- You are a knowledgeable assistant. Help the user as much as you can.<|im_end|>
22
- <|im_start|>user
23
- How to become healthier?<|im_end|>
24
- <|im_start|>assistant
25
- - text: |-
26
- <|im_start|>system
27
- You are a career counselor. The user will provide you with an individual looking for guidance in their professional life, and your task is to assist them in determining what careers they are most suited for based on their skills, interests, and experience. You should also conduct research into the various options available, explain the job market trends in different industries, and advice on which qualifications would be beneficial for pursuing particular fields.<|im_end|>
28
- <|im_start|>user
29
- Heya!<|im_end|>
30
- <|im_start|>assistant
31
- Hi! How may I help you?<|im_end|>
32
- <|im_start|>user
33
- I am interested in developing a career in software engineering. What would you recommend me to do?<|im_end|>
34
- <|im_start|>assistant
35
- - text: |-
36
- <|im_start|>system
37
- You are a helpful assistant who provides concise responses.<|im_end|>
38
- <|im_start|>user
39
- Hi!<|im_end|>
40
- <|im_start|>assistant
41
- Hello there! How may I help you?<|im_end|>
42
- <|im_start|>user
43
- I need to build a simple website. Where should I start learning about web development?<|im_end|>
44
- <|im_start|>assistant
45
- - text: |-
46
- <|im_start|>system
47
- You are a very creative assistant. User will give you a task, which you should complete with all your knowledge.<|im_end|>
48
- <|im_start|>user
49
- Write the background story of an RPG game about wizards and dragons in a sci-fi world.<|im_end|>
50
- <|im_start|>assistant
51
  inference:
52
  parameters:
53
  max_new_tokens: 64
54
  penalty_alpha: 0.5
55
  top_k: 4
56
  ---
57
 
58
  # A Llama Chat Model of 68M Parameters
@@ -88,3 +219,17 @@ inference:
88
  penalty_alpha: 0.5
89
  top_k: 4
90
  ```
1
  ---
 
2
  language:
3
+ - en
4
+ license: apache-2.0
5
  tags:
6
+ - text-generation
7
  datasets:
8
+ - THUDM/webglm-qa
9
+ - databricks/databricks-dolly-15k
10
+ - cognitivecomputations/wizard_vicuna_70k_unfiltered
11
+ - totally-not-an-llm/EverythingLM-data-V3
12
+ - Amod/mental_health_counseling_conversations
13
+ - sablo/oasst2_curated
14
+ - starfishmedical/webGPT_x_dolly
15
+ - Open-Orca/OpenOrca
16
+ - mlabonne/chatml_dpo_pairs
17
+ base_model: JackFram/llama-68m
18
  widget:
19
+ - text: '<|im_start|>system
20
+
21
+ You are a knowledgeable assistant. Help the user as much as you can.<|im_end|>
22
+
23
+ <|im_start|>user
24
+
25
+ How to become healthier?<|im_end|>
26
+
27
+ <|im_start|>assistant'
28
+ - text: '<|im_start|>system
29
+
30
+ You are a career counselor. The user will provide you with an individual looking
31
+ for guidance in their professional life, and your task is to assist them in determining
32
+ what careers they are most suited for based on their skills, interests, and experience.
33
+ You should also conduct research into the various options available, explain the
34
+ job market trends in different industries, and advice on which qualifications
35
+ would be beneficial for pursuing particular fields.<|im_end|>
36
+
37
+ <|im_start|>user
38
+
39
+ Heya!<|im_end|>
40
+
41
+ <|im_start|>assistant
42
+
43
+ Hi! How may I help you?<|im_end|>
44
+
45
+ <|im_start|>user
46
+
47
+ I am interested in developing a career in software engineering. What would you
48
+ recommend me to do?<|im_end|>
49
+
50
+ <|im_start|>assistant'
51
+ - text: '<|im_start|>system
52
+
53
+ You are a helpful assistant who provides concise responses.<|im_end|>
54
+
55
+ <|im_start|>user
56
+
57
+ Hi!<|im_end|>
58
+
59
+ <|im_start|>assistant
60
+
61
+ Hello there! How may I help you?<|im_end|>
62
+
63
+ <|im_start|>user
64
+
65
+ I need to build a simple website. Where should I start learning about web development?<|im_end|>
66
+
67
+ <|im_start|>assistant'
68
+ - text: '<|im_start|>system
69
+
70
+ You are a very creative assistant. User will give you a task, which you should
71
+ complete with all your knowledge.<|im_end|>
72
+
73
+ <|im_start|>user
74
+
75
+ Write the background story of an RPG game about wizards and dragons in a sci-fi
76
+ world.<|im_end|>
77
+
78
+ <|im_start|>assistant'
79
  inference:
80
  parameters:
81
  max_new_tokens: 64
82
  penalty_alpha: 0.5
83
  top_k: 4
84
+ model-index:
85
+ - name: Llama-68M-Chat-v1
86
+ results:
87
+ - task:
88
+ type: text-generation
89
+ name: Text Generation
90
+ dataset:
91
+ name: AI2 Reasoning Challenge (25-Shot)
92
+ type: ai2_arc
93
+ config: ARC-Challenge
94
+ split: test
95
+ args:
96
+ num_few_shot: 25
97
+ metrics:
98
+ - type: acc_norm
99
+ value: 23.29
100
+ name: normalized accuracy
101
+ source:
102
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Llama-68M-Chat-v1
103
+ name: Open LLM Leaderboard
104
+ - task:
105
+ type: text-generation
106
+ name: Text Generation
107
+ dataset:
108
+ name: HellaSwag (10-Shot)
109
+ type: hellaswag
110
+ split: validation
111
+ args:
112
+ num_few_shot: 10
113
+ metrics:
114
+ - type: acc_norm
115
+ value: 28.27
116
+ name: normalized accuracy
117
+ source:
118
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Llama-68M-Chat-v1
119
+ name: Open LLM Leaderboard
120
+ - task:
121
+ type: text-generation
122
+ name: Text Generation
123
+ dataset:
124
+ name: MMLU (5-Shot)
125
+ type: cais/mmlu
126
+ config: all
127
+ split: test
128
+ args:
129
+ num_few_shot: 5
130
+ metrics:
131
+ - type: acc
132
+ value: 25.18
133
+ name: accuracy
134
+ source:
135
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Llama-68M-Chat-v1
136
+ name: Open LLM Leaderboard
137
+ - task:
138
+ type: text-generation
139
+ name: Text Generation
140
+ dataset:
141
+ name: TruthfulQA (0-shot)
142
+ type: truthful_qa
143
+ config: multiple_choice
144
+ split: validation
145
+ args:
146
+ num_few_shot: 0
147
+ metrics:
148
+ - type: mc2
149
+ value: 47.27
150
+ source:
151
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Llama-68M-Chat-v1
152
+ name: Open LLM Leaderboard
153
+ - task:
154
+ type: text-generation
155
+ name: Text Generation
156
+ dataset:
157
+ name: Winogrande (5-shot)
158
+ type: winogrande
159
+ config: winogrande_xl
160
+ split: validation
161
+ args:
162
+ num_few_shot: 5
163
+ metrics:
164
+ - type: acc
165
+ value: 54.3
166
+ name: accuracy
167
+ source:
168
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Llama-68M-Chat-v1
169
+ name: Open LLM Leaderboard
170
+ - task:
171
+ type: text-generation
172
+ name: Text Generation
173
+ dataset:
174
+ name: GSM8k (5-shot)
175
+ type: gsm8k
176
+ config: main
177
+ split: test
178
+ args:
179
+ num_few_shot: 5
180
+ metrics:
181
+ - type: acc
182
+ value: 0.0
183
+ name: accuracy
184
+ source:
185
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Llama-68M-Chat-v1
186
+ name: Open LLM Leaderboard
187
  ---
188
 
189
  # A Llama Chat Model of 68M Parameters
 
219
  penalty_alpha: 0.5
220
  top_k: 4
221
  ```
222
+
223
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
224
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Felladrin__Llama-68M-Chat-v1)
225
+
226
+ | Metric |Value|
227
+ |---------------------------------|----:|
228
+ |Avg. |29.72|
229
+ |AI2 Reasoning Challenge (25-Shot)|23.29|
230
+ |HellaSwag (10-Shot) |28.27|
231
+ |MMLU (5-Shot) |25.18|
232
+ |TruthfulQA (0-shot) |47.27|
233
+ |Winogrande (5-shot) |54.30|
234
+ |GSM8k (5-shot) | 0.00|
235
+
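
As a quick sanity check on the table added by this PR, the `Avg.` row can be reproduced from the six benchmark rows. The sketch below assumes the average is the unweighted arithmetic mean of the six scores, which is consistent with the numbers shown; the helper name `leaderboard_average` is hypothetical and is not part of any leaderboard tooling.

```python
# Scores copied from the evaluation table in this PR (percent).
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 23.29,
    "HellaSwag (10-Shot)": 28.27,
    "MMLU (5-Shot)": 25.18,
    "TruthfulQA (0-shot)": 47.27,
    "Winogrande (5-shot)": 54.30,
    "GSM8k (5-shot)": 0.00,
}


def leaderboard_average(values):
    """Hypothetical helper: unweighted mean, rounded to two decimals.

    Assumption: every benchmark carries equal weight in the average.
    """
    values = list(values)
    return round(sum(values) / len(values), 2)


average = leaderboard_average(scores.values())
print(f"Avg. {average:.2f}")
```

Under that assumption the six scores sum to 178.31, and dividing by 6 gives a value that rounds to the 29.72 reported in the `Avg.` row.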