ChuckMcSneed committed
Commit 195a19c
Parent(s): cb2b48a
Update README.md

README.md CHANGED
@@ -180,14 +180,6 @@ Then I SLERP-merged it with cognitivecomputations/dolphin-2.2-70b (Needed to bri
 
 Absurdly high. That's what happens when you optimize the merges for a benchmark.
 
-### Open LLM leaderboard
-[Leaderboard on Huggingface](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
-|Model                           |Average|ARC  |HellaSwag|MMLU |TruthfulQA|Winogrande|GSM8K|
-|--------------------------------|-------|-----|---------|-----|----------|----------|-----|
-|ChuckMcSneed/Gembo-v1-70b       |70.51  |71.25|86.98    |70.85|63.25     |80.51     |50.19|
-|ChuckMcSneed/SMaxxxer-v1-70b    |72.23  |70.65|88.02    |70.55|60.7      |82.87     |60.58|
-
-Looks like adding a shitton of RP stuff decreased HellaSwag, WinoGrande and GSM8K, but increased TruthfulQA, MMLU and ARC. Interesting. To be honest, I'm a bit surprised that it didn't do that much worse.
 
 ### WolframRavenwolf
 Benchmark by [@wolfram](https://huggingface.co/wolfram)
@@ -198,7 +190,16 @@ Artefact2/Gembo-v1-70b-GGUF GGUF Q5_K_M, 4K context, Alpaca format:
 - ➖ Did NOT follow instructions to answer with just a single letter or more than just a single letter.
 
 This shows that this model can be used for real world use cases as an assistant.
-
+
+### [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+[Leaderboard on Huggingface](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+|Model                           |Average|ARC  |HellaSwag|MMLU |TruthfulQA|Winogrande|GSM8K|
+|--------------------------------|-------|-----|---------|-----|----------|----------|-----|
+|ChuckMcSneed/Gembo-v1-70b       |70.51  |71.25|86.98    |70.85|63.25     |80.51     |50.19|
+|ChuckMcSneed/SMaxxxer-v1-70b    |72.23  |70.65|88.02    |70.55|60.7      |82.87     |60.58|
+
+Looks like adding a shitton of RP stuff decreased HellaSwag, WinoGrande and GSM8K, but increased TruthfulQA, MMLU and ARC. Interesting. To be honest, I'm a bit surprised that it didn't do that much worse.
+
 Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_ChuckMcSneed__Gembo-v1-70b)
 
 | Metric |Value|
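For reference, the Open LLM Leaderboard's Average column is the arithmetic mean of the six benchmark scores (ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8K). A quick sketch to verify the figures in the table above, within two-decimal rounding:

```python
# Sanity check: the reported Average should match the mean of the six
# benchmark scores, up to rounding to two decimal places.
scores = {
    "ChuckMcSneed/Gembo-v1-70b":    [71.25, 86.98, 70.85, 63.25, 80.51, 50.19],
    "ChuckMcSneed/SMaxxxer-v1-70b": [70.65, 88.02, 70.55, 60.7, 82.87, 60.58],
}
reported = {
    "ChuckMcSneed/Gembo-v1-70b": 70.51,
    "ChuckMcSneed/SMaxxxer-v1-70b": 72.23,
}

for model, vals in scores.items():
    avg = sum(vals) / len(vals)
    # Allow for the two-decimal rounding of the reported value.
    assert abs(avg - reported[model]) < 0.01, (model, avg)
    print(f"{model}: mean of six scores ≈ {avg:.3f} (reported {reported[model]})")
```

Both rows check out, so the Average column is consistent with the per-benchmark scores.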