DreadPoor committed on
Commit 5908d45 · verified · 1 Parent(s): 26050fd

Adding Evaluation Results (#1)


- Adding Evaluation Results (593fc555723c9e578331cabd74cc42bf199cdfec)

Files changed (1):
README.md +114 -1
README.md CHANGED
@@ -17,6 +17,105 @@ tags:
 - mergekit
 - merge
 license: llama3.1
+model-index:
+- name: Wannabe-8B-Model_Stock
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: wis-k/instruction-following-eval
+      split: train
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 72.05
+      name: averaged accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=DreadPoor%2FWannabe-8B-Model_Stock
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: SaylorTwift/bbh
+      split: test
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 34.28
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=DreadPoor%2FWannabe-8B-Model_Stock
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: lighteval/MATH-Hard
+      split: test
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 17.6
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=DreadPoor%2FWannabe-8B-Model_Stock
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      split: train
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 6.82
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=DreadPoor%2FWannabe-8B-Model_Stock
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 12.32
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=DreadPoor%2FWannabe-8B-Model_Stock
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 31.45
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/?search=DreadPoor%2FWannabe-8B-Model_Stock
+      name: Open LLM Leaderboard
 ---
 # merge
 
@@ -53,4 +152,18 @@ normalize: false
 int8_mask: true
 dtype: bfloat16
 
-```
+```
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/DreadPoor__Wannabe-8B-Model_Stock-details)!
+Summarized results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/contents/viewer/default/train?q=DreadPoor%2FWannabe-8B-Model_Stock&sort[column]=Average%20%E2%AC%86%EF%B8%8F&sort[direction]=desc)!
+
+| Metric             |Value (%)|
+|--------------------|--------:|
+|**Average**         |    29.09|
+|IFEval (0-Shot)     |    72.05|
+|BBH (3-Shot)        |    34.28|
+|MATH Lvl 5 (4-Shot) |    17.60|
+|GPQA (0-shot)       |     6.82|
+|MuSR (0-shot)       |    12.32|
+|MMLU-PRO (5-shot)   |    31.45|
+
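As a quick sanity check on the added table, the **Average** row (29.09) is consistent with the plain arithmetic mean of the six benchmark scores. A minimal sketch, with the values copied from the table above:

```python
# Benchmark scores (percent) from the leaderboard table in the README.
scores = {
    "IFEval (0-Shot)": 72.05,
    "BBH (3-Shot)": 34.28,
    "MATH Lvl 5 (4-Shot)": 17.60,
    "GPQA (0-shot)": 6.82,
    "MuSR (0-shot)": 12.32,
    "MMLU-PRO (5-shot)": 31.45,
}

# The Average column is the unweighted mean of the six scores.
average = round(sum(scores.values()) / len(scores), 2)
print(average)  # 29.09
```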