Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

README.md +190 -45

@@ -1,58 +1,189 @@
 ---
-license: apache-2.0
 language:
-  - en
 tags:
-  - text-generation
-base_model: JackFram/llama-68m
 datasets:
-  - THUDM/webglm-qa
-  - databricks/databricks-dolly-15k
-  - cognitivecomputations/wizard_vicuna_70k_unfiltered
-  - totally-not-an-llm/EverythingLM-data-V3
-  - Amod/mental_health_counseling_conversations
-  - sablo/oasst2_curated
-  - starfishmedical/webGPT_x_dolly
-  - Open-Orca/OpenOrca
-  - mlabonne/chatml_dpo_pairs
 widget:
-  - text: |-
-      <|im_start|>system
-      You are a knowledgeable assistant. Help the user as much as you can.<|im_end|>
-      <|im_start|>user
-      How to become healthier?<|im_end|>
-      <|im_start|>assistant
-  - text: |-
-      <|im_start|>system
-      You are a career counselor. The user will provide you with an individual looking for guidance in their professional life, and your task is to assist them in determining what careers they are most suited for based on their skills, interests, and experience. You should also conduct research into the various options available, explain the job market trends in different industries, and advice on which qualifications would be beneficial for pursuing particular fields.<|im_end|>
-      <|im_start|>user
-      Heya!<|im_end|>
-      <|im_start|>assistant
-      Hi! How may I help you?<|im_end|>
-      <|im_start|>user
-      I am interested in developing a career in software engineering. What would you recommend me to do?<|im_end|>
-      <|im_start|>assistant
-  - text: |-
-      <|im_start|>system
-      You are a helpful assistant who provides concise responses.<|im_end|>
-      <|im_start|>user
-      Hi!<|im_end|>
-      <|im_start|>assistant
-      Hello there! How may I help you?<|im_end|>
-      <|im_start|>user
-      I need to build a simple website. Where should I start learning about web development?<|im_end|>
-      <|im_start|>assistant
-  - text: |-
-      <|im_start|>system
-      You are a very creative assistant. User will give you a task, which you should complete with all your knowledge.<|im_end|>
-      <|im_start|>user
-      Write the background story of an RPG game about wizards and dragons in a sci-fi world.<|im_end|>
-      <|im_start|>assistant
 inference:
   parameters:
     max_new_tokens: 64
     penalty_alpha: 0.5
     top_k: 4
 ---
 # A Llama Chat Model of 68M Parameters
@@ -88,3 +219,17 @@ inference:
 penalty_alpha: 0.5
 top_k: 4
 ```

 ---
 language:
+- en
+license: apache-2.0
 tags:
+- text-generation
 datasets:
+- THUDM/webglm-qa
+- databricks/databricks-dolly-15k
+- cognitivecomputations/wizard_vicuna_70k_unfiltered
+- totally-not-an-llm/EverythingLM-data-V3
+- Amod/mental_health_counseling_conversations
+- sablo/oasst2_curated
+- starfishmedical/webGPT_x_dolly
+- Open-Orca/OpenOrca
+- mlabonne/chatml_dpo_pairs
+base_model: JackFram/llama-68m
 widget:
+- text: '<|im_start|>system
+    You are a knowledgeable assistant. Help the user as much as you can.<|im_end|>
+    <|im_start|>user
+    How to become healthier?<|im_end|>
+    <|im_start|>assistant'
+- text: '<|im_start|>system
+    You are a career counselor. The user will provide you with an individual looking
+    for guidance in their professional life, and your task is to assist them in determining
+    what careers they are most suited for based on their skills, interests, and experience.
+    You should also conduct research into the various options available, explain the
+    job market trends in different industries, and advice on which qualifications
+    would be beneficial for pursuing particular fields.<|im_end|>
+    <|im_start|>user
+    Heya!<|im_end|>
+    <|im_start|>assistant
+    Hi! How may I help you?<|im_end|>
+    <|im_start|>user
+    I am interested in developing a career in software engineering. What would you
+    recommend me to do?<|im_end|>
+    <|im_start|>assistant'
+- text: '<|im_start|>system
+    You are a helpful assistant who provides concise responses.<|im_end|>
+    <|im_start|>user
+    Hi!<|im_end|>
+    <|im_start|>assistant
+    Hello there! How may I help you?<|im_end|>
+    <|im_start|>user
+    I need to build a simple website. Where should I start learning about web development?<|im_end|>
+    <|im_start|>assistant'
+- text: '<|im_start|>system
+    You are a very creative assistant. User will give you a task, which you should
+    complete with all your knowledge.<|im_end|>
+    <|im_start|>user
+    Write the background story of an RPG game about wizards and dragons in a sci-fi
+    world.<|im_end|>
+    <|im_start|>assistant'
 inference:
   parameters:
     max_new_tokens: 64
     penalty_alpha: 0.5
     top_k: 4
+model-index:
+- name: Llama-68M-Chat-v1
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 23.29
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Llama-68M-Chat-v1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 28.27
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Llama-68M-Chat-v1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 25.18
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Llama-68M-Chat-v1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 47.27
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Llama-68M-Chat-v1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 54.3
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Llama-68M-Chat-v1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 0.0
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Llama-68M-Chat-v1
+      name: Open LLM Leaderboard
 ---
 # A Llama Chat Model of 68M Parameters
 penalty_alpha: 0.5
 top_k: 4
 ```
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Felladrin__Llama-68M-Chat-v1)
+|             Metric              |Value|
+|---------------------------------|----:|
+|Avg.                             |29.72|
+|AI2 Reasoning Challenge (25-Shot)|23.29|
+|HellaSwag (10-Shot)              |28.27|
+|MMLU (5-Shot)                    |25.18|
+|TruthfulQA (0-shot)              |47.27|
+|Winogrande (5-shot)              |54.30|
+|GSM8k (5-shot)                   | 0.00|
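
The `Avg.` row in the table above appears to be the unweighted mean of the six benchmark scores. The following is a minimal sketch for checking that figure, assuming a plain arithmetic mean rounded to two decimals; the leaderboard's actual aggregation code is not shown in this PR, and the names `scores` and `leaderboard_average` are hypothetical, introduced only for illustration.

```python
# Benchmark scores copied from the results table in this PR.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 23.29,
    "HellaSwag (10-Shot)": 28.27,
    "MMLU (5-Shot)": 25.18,
    "TruthfulQA (0-shot)": 47.27,
    "Winogrande (5-shot)": 54.30,
    "GSM8k (5-shot)": 0.00,
}


def leaderboard_average(values):
    """Return the unweighted mean of the scores, rounded to two decimals.

    Assumption: the Avg. row is a simple arithmetic mean. This helper is
    hypothetical and not taken from the leaderboard's own code.
    """
    values = list(values)
    return round(sum(values) / len(values), 2)


average = leaderboard_average(scores.values())
print(f"Avg. {average:.2f}")
```

Under that assumption the result agrees with the reported `Avg.` value of 29.72. Note that the 0.00 on GSM8k counts as one of the six scores and lowers the mean accordingly.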