leaderboard-pr-bot committed on
Commit b2740e1
1 parent: 1336e20

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions
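
This PR stores the leaderboard scores in the README's YAML front matter, under a `model-index` key. Each model has a list of `results`, and each result pairs a `task`, a `dataset`, its `metrics` and a `source`. The sketch below shows one way a consumer might flatten that structure into rows. It is not part of the bot or of any library. It assumes the YAML has already been parsed into plain Python dicts and lists. The helper name `flatten_model_index` is hypothetical, and only two of the six entries are transcribed from the diff.

```python
# Sketch: flatten a parsed `model-index` block (from a model card's YAML
# front matter) into (dataset name, metric type, value) rows.
# Assumption: the YAML is already loaded into dicts/lists by some YAML parser.
# `flatten_model_index` is a hypothetical helper, not a library API.
from typing import Any


def flatten_model_index(model_index: list[dict[str, Any]]) -> list[tuple[str, str, float]]:
    rows: list[tuple[str, str, float]] = []
    for model in model_index:
        for result in model.get("results", []):
            dataset_name = result["dataset"]["name"]
            # A result may list several metrics; emit one row per metric.
            for metric in result.get("metrics", []):
                rows.append((dataset_name, metric["type"], float(metric["value"])))
    return rows


# Two of the six entries added by this PR, transcribed from the README diff.
model_index = [
    {
        "name": "Llama3.2-3B-ShiningValiant2",
        "results": [
            {
                "task": {"type": "text-generation", "name": "Text Generation"},
                "dataset": {"name": "IFEval (0-Shot)", "type": "HuggingFaceH4/ifeval",
                            "args": {"num_few_shot": 0}},
                "metrics": [{"type": "inst_level_strict_acc and prompt_level_strict_acc",
                             "value": 49.12, "name": "strict accuracy"}],
            },
            {
                "task": {"type": "text-generation", "name": "Text Generation"},
                "dataset": {"name": "BBH (3-Shot)", "type": "BBH",
                            "args": {"num_few_shot": 3}},
                "metrics": [{"type": "acc_norm", "value": 19.03,
                             "name": "normalized accuracy"}],
            },
        ],
    }
]

for row in flatten_model_index(model_index):
    print(row)
```

The `huggingface_hub` library can also parse this block into `EvalResult` objects, exposed through `ModelCard.load(...).data.eval_results`. Relying on that parser is likely more robust than hand-rolled traversal.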

Files changed (1)
  1. README.md +111 -3
README.md CHANGED
@@ -1,7 +1,7 @@
 ---
 language:
 - en
-pipeline_tag: text-generation
+license: llama3.2
 tags:
 - shining-valiant
 - shining-valiant-2
@@ -30,8 +30,103 @@ base_model: meta-llama/Llama-3.2-3B-Instruct
 datasets:
 - sequelbox/Celestia
 - sequelbox/Supernova
+pipeline_tag: text-generation
 model_type: llama
-license: llama3.2
+model-index:
+- name: Llama3.2-3B-ShiningValiant2
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: HuggingFaceH4/ifeval
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 49.12
+      name: strict accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ValiantLabs/Llama3.2-3B-ShiningValiant2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: BBH
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 19.03
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ValiantLabs/Llama3.2-3B-ShiningValiant2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: hendrycks/competition_math
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 9.52
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ValiantLabs/Llama3.2-3B-ShiningValiant2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 3.02
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ValiantLabs/Llama3.2-3B-ShiningValiant2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 4.72
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ValiantLabs/Llama3.2-3B-ShiningValiant2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 19.09
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=ValiantLabs/Llama3.2-3B-ShiningValiant2
+      name: Open LLM Leaderboard
 ---


@@ -106,4 +201,17 @@ Shining Valiant 2 is created by [Valiant Labs.](http://valiantlabs.ca/)
 We care about open source.
 For everyone to use.

-We encourage others to finetune further from our models.
+We encourage others to finetune further from our models.
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_ValiantLabs__Llama3.2-3B-ShiningValiant2)
+
+|      Metric       |Value|
+|-------------------|----:|
+|Avg.               |17.42|
+|IFEval (0-Shot)    |49.12|
+|BBH (3-Shot)       |19.03|
+|MATH Lvl 5 (4-Shot)| 9.52|
+|GPQA (0-shot)      | 3.02|
+|MuSR (0-shot)      | 4.72|
+|MMLU-PRO (5-shot)  |19.09|
+
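
The `Avg.` row in the table above can be checked against the six benchmark rows. The sketch below assumes `Avg.` is the unweighted arithmetic mean of the six displayed scores. That assumption reproduces the published 17.42 to two decimals, but it is not taken from leaderboard documentation. The leaderboard may average unrounded scores, so other models could differ in the last digit.

```python
# Sketch: recompute the table's "Avg." row, ASSUMING it is the plain
# arithmetic mean of the six benchmark scores listed in this PR.
scores = {
    "IFEval (0-Shot)": 49.12,
    "BBH (3-Shot)": 19.03,
    "MATH Lvl 5 (4-Shot)": 9.52,
    "GPQA (0-shot)": 3.02,
    "MuSR (0-shot)": 4.72,
    "MMLU-PRO (5-shot)": 19.09,
}

# Unweighted mean: every benchmark counts equally, whatever its size.
average = sum(scores.values()) / len(scores)
print(f"Avg. {average:.2f}")  # prints "Avg. 17.42"
```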