Commit 3dfe677
Parent(s): 798b7fe

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr
The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.
If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions
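The bot's own code is not part of this PR. As a rough illustration of how the six leaderboard results map onto the `model-index` entries added to the front matter, here is a minimal Python sketch. It assumes the scores and dataset fields exactly as they appear in the diff. `RESULTS` and `build_model_index` are hypothetical names, and this is not the bot's implementation.

```python
import json

# Hypothetical input: one tuple per Open LLM Leaderboard benchmark, copied from this PR:
# (display name, dataset type, config, split, few-shot count, metric type, metric name, value)
RESULTS = [
    ("AI2 Reasoning Challenge (25-Shot)", "ai2_arc", "ARC-Challenge", "test", 25, "acc_norm", "normalized accuracy", 71.25),
    ("HellaSwag (10-Shot)", "hellaswag", None, "validation", 10, "acc_norm", "normalized accuracy", 86.98),
    ("MMLU (5-Shot)", "cais/mmlu", "all", "test", 5, "acc", "accuracy", 70.85),
    ("TruthfulQA (0-shot)", "truthful_qa", "multiple_choice", "validation", 0, "mc2", None, 63.25),
    ("Winogrande (5-shot)", "winogrande", "winogrande_xl", "validation", 5, "acc", "accuracy", 80.51),
    ("GSM8k (5-shot)", "gsm8k", "main", "test", 5, "acc", "accuracy", 50.19),
]

SOURCE_URL = "https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ChuckMcSneed/Gembo-v1-70b"


def build_model_index(model_name, results):
    """Return a model-index list shaped like the YAML added in this PR (hypothetical helper)."""
    entries = []
    for name, ds_type, config, split, shots, m_type, m_name, value in results:
        dataset = {"name": name, "type": ds_type}
        if config is not None:  # HellaSwag carries no config in this PR
            dataset["config"] = config
        dataset["split"] = split
        dataset["args"] = {"num_few_shot": shots}
        metric = {"type": m_type, "value": value}
        if m_name is not None:  # TruthfulQA's mc2 metric has no display name in this PR
            metric["name"] = m_name
        entries.append({
            "task": {"type": "text-generation", "name": "Text Generation"},
            "dataset": dataset,
            "metrics": [metric],
            "source": {"url": SOURCE_URL, "name": "Open LLM Leaderboard"},
        })
    return [{"name": model_name, "results": entries}]


index = build_model_index("Gembo-v1-70b", RESULTS)
print(json.dumps(index, indent=2))
```

The dictionary key order mirrors the PR's YAML, so dumping `index` with PyYAML's `yaml.safe_dump(index, sort_keys=False)` should produce a similar block-style front matter section.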
README.md CHANGED
@@ -1,13 +1,116 @@
 ---
+language:
+- en
+- ru
 license: llama2
 tags:
 - merge
 - mergekit
 - nsfw
 - not-for-all-audiences
-
--
-
+model-index:
+- name: Gembo-v1-70b
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 71.25
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ChuckMcSneed/Gembo-v1-70b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 86.98
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ChuckMcSneed/Gembo-v1-70b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 70.85
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ChuckMcSneed/Gembo-v1-70b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 63.25
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ChuckMcSneed/Gembo-v1-70b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 80.51
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ChuckMcSneed/Gembo-v1-70b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 50.19
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ChuckMcSneed/Gembo-v1-70b
+      name: Open LLM Leaderboard
 ---
 
 This is my first "serious"(with practical use cases) experimental merge. Judge harshly. Mainly made for RP, but should be okay as an assistant. Turned out quite good, considering the amount of LORAs I merged into it.
@@ -94,4 +197,17 @@ Artefact2/Gembo-v1-70b-GGUF GGUF Q5_K_M, 4K context, Alpaca format:
 - ✅ Consistently acknowledged all data input with "OK".
 - ➖ Did NOT follow instructions to answer with just a single letter or more than just a single letter.
 
-This shows that this model can be used for real world use cases as an assistant.
+This shows that this model can be used for real world use cases as an assistant.
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_ChuckMcSneed__Gembo-v1-70b)
+
+| Metric |Value|
+|---------------------------------|----:|
+|Avg. |70.51|
+|AI2 Reasoning Challenge (25-Shot)|71.25|
+|HellaSwag (10-Shot) |86.98|
+|MMLU (5-Shot) |70.85|
+|TruthfulQA (0-shot) |63.25|
+|Winogrande (5-shot) |80.51|
+|GSM8k (5-shot) |50.19|
+
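As a quick consistency check on the table, the six per-benchmark scores sum to 423.03, a mean of 70.505. The Python sketch below recomputes that mean with `decimal.Decimal` (to avoid binary floating-point rounding noise) and rounds it half-up to two decimals, which reproduces the listed 70.51. Whether the leaderboard averages these rounded values or the unrounded per-benchmark scores is an assumption, not something this PR states. `leaderboard_average` and `render_table` are hypothetical helpers, and the printed table only approximates the README's column padding.

```python
from decimal import Decimal, ROUND_HALF_UP

# Scores from the table added in this PR, kept as strings so Decimal arithmetic stays exact.
SCORES = {
    "AI2 Reasoning Challenge (25-Shot)": "71.25",
    "HellaSwag (10-Shot)": "86.98",
    "MMLU (5-Shot)": "70.85",
    "TruthfulQA (0-shot)": "63.25",
    "Winogrande (5-shot)": "80.51",
    "GSM8k (5-shot)": "50.19",
}


def leaderboard_average(scores):
    """Unweighted mean of the listed scores, rounded half-up to two decimals (assumed rule)."""
    total = sum(Decimal(v) for v in scores.values())
    mean = total / len(scores)
    return mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def render_table(scores):
    """Render a two-column Markdown table with an Avg. row first, like the README section."""
    width = max(len(name) for name in scores)
    rows = [f"|{'Metric':<{width}}|Value|", f"|{'-' * width}|----:|"]
    rows.append(f"|{'Avg.':<{width}}|{leaderboard_average(scores)}|")
    rows += [f"|{name:<{width}}|{value}|" for name, value in scores.items()]
    return "\n".join(rows)


print(leaderboard_average(SCORES))  # → 70.51
print(render_table(SCORES))
```

Storing the scores as strings keeps the check exact: 423.03 / 6 is exactly 70.505 in decimal, so the result depends only on the chosen rounding rule.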