ChuckMcSneed committed
Commit: 798b7fe
Parent(s): eef189e
Update README.md
README.md CHANGED
@@ -75,6 +75,8 @@ Then I SLERP-merged it with cognitivecomputations/dolphin-2.2-70b (Needed to bri
 | P | 5.25 |
 | Total | 19.75 |
 
+Absurdly high. That's what happens when you optimize the merges for a benchmark.
+
 ### Open LLM leaderboard
 [Leaderboard on Huggingface](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
 |Model |Average|ARC |HellaSwag|MMLU |TruthfulQA|Winogrande|GSM8K|
@@ -82,4 +84,14 @@ Then I SLERP-merged it with cognitivecomputations/dolphin-2.2-70b (Needed to bri
 |ChuckMcSneed/Gembo-v1-70b |70.51 |71.25|86.98 |70.85|63.25 |80.51 |50.19|
 |ChuckMcSneed/SMaxxxer-v1-70b |72.23 |70.65|88.02 |70.55|60.7 |82.87 |60.58|
 
-Looks like adding a shitton of RP stuff decreased HellaSwag, WinoGrande and GSM8K, but increased TruthfulQA, MMLU and ARC. Interesting. To be hosnest, I'm a bit surprised that it didn't do that much worse.
+Looks like adding a shitton of RP stuff decreased HellaSwag, WinoGrande and GSM8K, but increased TruthfulQA, MMLU and ARC. Interesting. To be honest, I'm a bit surprised that it didn't do that much worse.
+
+### WolframRavenwolf
+Benchmark by [@wolfram](https://huggingface.co/wolfram)
+
+Artefact2/Gembo-v1-70b-GGUF GGUF Q5_K_M, 4K context, Alpaca format:
+- ✅ Gave correct answers to all 18/18 multiple choice questions! Just the questions, no previous information, gave correct answers: 16/18
+- ✅ Consistently acknowledged all data input with "OK".
+- ➖ Did NOT follow instructions to answer with just a single letter or more than just a single letter.
+
+This shows that this model can be used for real-world use cases as an assistant.
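
A quick sanity check on the leaderboard rows in the diff above: the Average column is just the arithmetic mean of the six task scores. A minimal sketch in Python, using the Gembo-v1-70b numbers copied from the table (the variable names are ours, not from the leaderboard code):

```python
# The Open LLM leaderboard "Average" column is the plain mean of the six
# task scores: ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8K.
gembo = [71.25, 86.98, 70.85, 63.25, 80.51, 50.19]  # Gembo-v1-70b row
print(sum(gembo) / len(gembo))  # ~70.505, reported as 70.51 in the table
```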
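The hunk context above mentions SLERP-merging with cognitivecomputations/dolphin-2.2-70b. For readers unfamiliar with the technique, here is a minimal per-tensor SLERP sketch in the style of mergekit-like tools; it is illustrative only, not the actual merge recipe used for Gembo, and the blend ratio t=0.5 is an assumption:

```python
import numpy as np

def slerp(w_a, w_b, t, eps=1e-8):
    """Spherical linear interpolation between two weight tensors.

    Each tensor is treated as one flat vector; falls back to a plain
    linear interpolation when the vectors are nearly colinear.
    """
    a, b = w_a.ravel(), w_b.ravel()
    a_n = a / (np.linalg.norm(a) + eps)
    b_n = b / (np.linalg.norm(b) + eps)
    dot = float(np.clip(np.dot(a_n, b_n), -1.0, 1.0))
    omega = np.arccos(dot)            # angle between the two weight vectors
    if np.sin(omega) < eps:           # nearly parallel -> ordinary lerp
        merged = (1.0 - t) * a + t * b
    else:
        merged = (np.sin((1.0 - t) * omega) * a
                  + np.sin(t * omega) * b) / np.sin(omega)
    return merged.reshape(w_a.shape)

# Illustrative use on one layer's weights (random stand-ins, not real checkpoints):
layer_base = np.random.randn(8, 8).astype(np.float32)
layer_dolphin = np.random.randn(8, 8).astype(np.float32)
merged = slerp(layer_base, layer_dolphin, t=0.5)  # t is the blend ratio
```

Compared with a plain weighted average, interpolating along the arc between the two weight vectors tends to preserve their norms, which is the usual argument for SLERP in model merging.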
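The WolframRavenwolf run above used "Alpaca format", i.e. the standard Alpaca instruction template. A minimal sketch of building such a prompt; the instruction text here is only an example, echoing the single-letter test from the notes above:

```python
# Standard Alpaca instruction template ("Alpaca format" in the benchmark setup).
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

prompt = ALPACA_TEMPLATE.format(
    instruction="Answer with just a single letter: A, B, C or D."
)
print(prompt)
```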