Commit cb2b48a
Parent(s): 798b7fe

Adding Evaluation Results (#2)

Adding Evaluation Results (3dfe677698724f1faa8a8989b50d6bd4f5efd3e9)

Co-authored-by: Open LLM Leaderboard PR Bot <leaderboard-pr-bot@users.noreply.huggingface.co>
README.md CHANGED

@@ -1,13 +1,116 @@
 ---
+language:
+- en
+- ru
 license: llama2
 tags:
 - merge
 - mergekit
 - nsfw
 - not-for-all-audiences
-
-
-
+model-index:
+- name: Gembo-v1-70b
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 71.25
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ChuckMcSneed/Gembo-v1-70b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 86.98
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ChuckMcSneed/Gembo-v1-70b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 70.85
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ChuckMcSneed/Gembo-v1-70b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 63.25
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ChuckMcSneed/Gembo-v1-70b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 80.51
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ChuckMcSneed/Gembo-v1-70b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 50.19
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ChuckMcSneed/Gembo-v1-70b
+      name: Open LLM Leaderboard
 ---
 ![logo-gembo.png](logo-gembo.png)
 This is my first "serious" (with practical use cases) experimental merge. Judge harshly. Mainly made for RP, but should be okay as an assistant. Turned out quite good, considering the number of LoRAs I merged into it.
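For reference, YAML metadata in this form can be read back programmatically. The sketch below is an editor's illustration, not part of the commit; it assumes the huggingface_hub library is installed and uses its ModelCard API to print each benchmark score from the model-index block above.

```python
# Illustration: load the model card and print its model-index scores.
# Assumes `pip install huggingface_hub`; repo id as used in the leaderboard URL.
from huggingface_hub import ModelCard

card = ModelCard.load("ChuckMcSneed/Gembo-v1-70b")
data = card.data.to_dict()

# model-index holds one entry per evaluated task.
for result in data["model-index"][0]["results"]:
    dataset = result["dataset"]["name"]
    for metric in result["metrics"]:
        print(f"{dataset}: {metric['type']} = {metric['value']}")
```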
@@ -94,4 +197,17 @@ Artefact2/Gembo-v1-70b-GGUF GGUF Q5_K_M, 4K context, Alpaca format:
 - ✅ Consistently acknowledged all data input with "OK".
 - ➖ Did NOT follow instructions to answer with just a single letter or more than just a single letter.
 
-This shows that this model can be used for real world use cases as an assistant.
+This shows that this model can be used for real world use cases as an assistant.
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_ChuckMcSneed__Gembo-v1-70b)
+
+|             Metric              |Value|
+|---------------------------------|----:|
+|Avg.                             |70.51|
+|AI2 Reasoning Challenge (25-Shot)|71.25|
+|HellaSwag (10-Shot)              |86.98|
+|MMLU (5-Shot)                    |70.85|
+|TruthfulQA (0-shot)              |63.25|
+|Winogrande (5-shot)              |80.51|
+|GSM8k (5-shot)                   |50.19|
+
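A quick sanity check on the added table: the Avg. row is the arithmetic mean of the six benchmark scores. A minimal sketch, using Decimal so the halfway value 70.505 rounds up; since the per-task values shown are themselves rounded, this reproduces the published Avg. to displayed precision.

```python
# Verify that Avg. is the mean of the six leaderboard scores in the table.
from decimal import Decimal, ROUND_HALF_UP

scores = [
    Decimal("71.25"),  # ARC (25-shot)
    Decimal("86.98"),  # HellaSwag (10-shot)
    Decimal("70.85"),  # MMLU (5-shot)
    Decimal("63.25"),  # TruthfulQA (0-shot)
    Decimal("80.51"),  # Winogrande (5-shot)
    Decimal("50.19"),  # GSM8k (5-shot)
]
avg = sum(scores) / len(scores)  # exactly 70.505
print(avg.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))  # 70.51
```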