leaderboard-pr-bot committed
Commit 1d42a5c (1 parent: 90a5293)

Adding Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +122 -6
README.md CHANGED
@@ -1,16 +1,119 @@
 ---
-license: other
-datasets:
-- databricks/databricks-dolly-15k
-- laion/OIG
-- OpenAssistant/oasst1
 language:
 - da
 - sv
 - 'no'
 - en
 - is
+license: other
+datasets:
+- databricks/databricks-dolly-15k
+- laion/OIG
+- OpenAssistant/oasst1
 pipeline_tag: text-generation
+model-index:
+- name: gpt-sw3-126m-instruct
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 23.38
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=AI-Sweden-Models/gpt-sw3-126m-instruct
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 29.88
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=AI-Sweden-Models/gpt-sw3-126m-instruct
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 23.78
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=AI-Sweden-Models/gpt-sw3-126m-instruct
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 42.65
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=AI-Sweden-Models/gpt-sw3-126m-instruct
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 48.54
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=AI-Sweden-Models/gpt-sw3-126m-instruct
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 0.99
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=AI-Sweden-Models/gpt-sw3-126m-instruct
+      name: Open LLM Leaderboard
 ---
 # Model description
 [AI Sweden](https://huggingface.co/AI-Sweden-Models/)
@@ -285,4 +388,17 @@ Following Mitchell et al. (2018), we provide a model card for GPT-SW3.
 - If the dataset relates to people, are there applicable limits on the retention of the data associated with the instances (e.g., were individuals in question told that their data would be retained for a fixed period of time and then deleted)? If so, please describe these limits and explain how they will be enforced. Read the privacy policy for the NLU initiative at AI Sweden [here](https://www.ai.se/en/privacy-policy-nlu).
 - Will older versions of the dataset continue to be supported/hosted/maintained? If so, please describe how. If not, please describe how its obsolescence will be communicated to users. N/A.
 - If others want to extend/augment/build on/contribute to the dataset, is there a mechanism for them to do so? If so, please provide a description. Will these contributions be validated/ verified? If so, please describe how. If not, why not? Is there a process for communicating/ distributing these contributions to other users? If so, please provide a description. Not at this time.
-- Any other comments? No.
+- Any other comments? No.
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_AI-Sweden-Models__gpt-sw3-126m-instruct)
+
+|             Metric              |Value|
+|---------------------------------|----:|
+|Avg.                             |28.20|
+|AI2 Reasoning Challenge (25-Shot)|23.38|
+|HellaSwag (10-Shot)              |29.88|
+|MMLU (5-Shot)                    |23.78|
+|TruthfulQA (0-shot)              |42.65|
+|Winogrande (5-shot)              |48.54|
+|GSM8k (5-shot)                   | 0.99|
+
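The Avg. row in the results table added by this PR appears to be the plain arithmetic mean of the six benchmark scores. The PR does not state this, so treat it as an assumption. A minimal Python check, where `leaderboard_average` is a hypothetical helper and not part of any leaderboard tooling:

```python
# Scores copied from the results table added to README.md in this PR.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 23.38,
    "HellaSwag (10-Shot)": 29.88,
    "MMLU (5-Shot)": 23.78,
    "TruthfulQA (0-shot)": 42.65,
    "Winogrande (5-shot)": 48.54,
    "GSM8k (5-shot)": 0.99,
}


def leaderboard_average(values):
    """Return the arithmetic mean of the given scores.

    Assumption: the "Avg." row is an unweighted mean of the benchmarks.
    """
    values = list(values)
    return sum(values) / len(values)


average = leaderboard_average(scores.values())
print(f"{average:.2f}")  # prints "28.20", matching the Avg. row
```

If the result ever disagrees with the published Avg. value, the likelier explanation is a changed benchmark set or a different weighting, not a rounding difference.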