Adding Evaluation Results

#1
Files changed (1)
  1. README.md +155 -47
README.md CHANGED
@@ -1,33 +1,4 @@
  ---
- base_model:
- - rvv-karma/BASH-Coder-Mistral-7B
- - Locutusque/Hercules-3.1-Mistral-7B
- - KoboldAI/Mistral-7B-Erebus-v3
- - Locutusque/Hyperion-2.1-Mistral-7B
- - Severian/Nexus-IKM-Mistral-7B-Pytorch
- - NousResearch/Hermes-2-Pro-Mistral-7B
- - mistralai/Mistral-7B-Instruct-v0.2
- - Nitral-AI/ProdigyXBioMistral_7B
- - Nitral-AI/Infinite-Mika-7b
- - Nous-Yarn-Mistral-7b-128k
- - yanismiraoui/Yarn-Mistral-7b-128k-sharded
- - LeroyDyer/LCARS_TOP_SCORE
- - LeroyDyer/Mixtral_AI_Cyber_Matrix_2_0
- - LeroyDyer/SpydazWeb_AI_CyberTron_Ultra_7b
- - LeroyDyer/LCARS_AI_StarTrek_Computer
- - LeroyDyer/_Spydaz_Web_AI_ActionQA_Project
- - LeroyDyer/_Spydaz_Web_AI_ChatML_512K_Project
- - LeroyDyer/_Spydaz_Web_AI_ChatQA_ReAct_Project_UltraFineTuned
- - LeroyDyer/SpyazWeb_AI_DeepMind_Project
- - LeroyDyer/SpydazWeb_AI_Swahili_Project
- - LeroyDyer/_Spydaz_Web_AI_ChatQA_ReAct_Project
- - LeroyDyer/_Spydaz_Web_AI_MistralStar_001_Project
- - LeroyDyer/QuietStar_Project
- - LeroyDyer/Mixtral_BioMedical_7b
- - LeroyDyer/Mixtral_AI_CyberTron_Coder
- - LeroyDyer/_Spydaz_Web_AI_BIBLE_002
- - LeroyDyer/_Spydaz_Web_AI_ChatQA_Reasoning101_Project
- - LeroyDyer/SpydazWeb_AI_Text_AudioVision_Project
  language:
  - en
  - sw
@@ -44,21 +15,7 @@ language:
  - bm
  - su
  license: apache-2.0
- datasets:
- - neoneye/base64-decode-v2
- - neoneye/base64-encode-v1
- - VuongQuoc/Chemistry_text_to_image
- - Kamizuru00/diagram_image_to_text
- - LeroyDyer/Chemistry_text_to_image_BASE64
- - LeroyDyer/AudioCaps-Spectrograms_to_Base64
- - LeroyDyer/winogroud_text_to_imaget_BASE64
- - LeroyDyer/chart_text_to_Base64
- - LeroyDyer/diagram_image_to_text_BASE64
- - mekaneeky/salt_m2e_15_3_instruction
- - mekaneeky/SALT-languages-bible
- - xz56/react-llama
- - BeIR/hotpotqa
- - arcee-ai/agent-data
  tags:
  - RolePlay
  - Role-Play-Pro
@@ -120,8 +77,50 @@ tags:
  - Text-Spectrogram
  - Mel-Text
  - Text-Mel
- pipeline_tag: text-generation
- library_name: transformers
  metrics:
  - accuracy
  - bertscore
@@ -129,6 +128,102 @@ metrics:
  - bleurt
  - brier_score
  - cer
  ---

  BASE MODEL :
@@ -407,4 +502,17 @@ It most definatly did create graphs ad recognize some images , but i will need t
  so I will find some other insteresting datasets based around the task of image detection and generation as well as image segmentation and mask etc :
  It only the conversion of the dataset to include these base64 representations :
  Image training is quite slow ! ( i was able to create the trainer settings to perform 5000 samples in a single step , but it was still very slow for the step. ( google colab ))
- So If i Tap into the A100 I will Do a few 1000( sample ) Steps : also using this prompt ! as well as task training we are also prompt tuning ! by installing many repetitions of the same prompt ... hopefully removing traces of ..."your a helpful AI"
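The README lines above describe converting image datasets so that each image travels as a base64 string. Below is a minimal standard-library sketch of that kind of conversion. It assumes the image bytes are already loaded into memory. The record fields `caption` and `image_base64`, and the helper names, are hypothetical and are not the column names used by the datasets listed in this card.

```python
import base64


def image_bytes_to_base64(data: bytes) -> str:
    """Encode raw image bytes (e.g. a PNG file's contents) as an ASCII base64 string."""
    return base64.b64encode(data).decode("ascii")


def base64_to_image_bytes(text: str) -> bytes:
    """Decode a base64 string back to bytes; validate=True rejects non-alphabet characters."""
    return base64.b64decode(text, validate=True)


def make_text_sample(caption: str, image_data: bytes) -> dict:
    # Hypothetical record layout: the image is stored as plain text next to its caption,
    # so a text-only model can be trained on it like any other string field.
    return {"caption": caption, "image_base64": image_bytes_to_base64(image_data)}


if __name__ == "__main__":
    png_signature = b"\x89PNG\r\n\x1a\n"  # the 8-byte header every PNG file starts with
    sample = make_text_sample("a chart of monthly sales", png_signature)
    print(sample["image_base64"])  # iVBORw0KGgo=
    assert base64_to_image_bytes(sample["image_base64"]) == png_signature
```

Base64 emits 4 characters for every 3 input bytes, so a whole image becomes a long string once it is encoded this way.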
  ---
  language:
  - en
  - sw
 
  - bm
  - su
  license: apache-2.0
+ library_name: transformers
  tags:
  - RolePlay
  - Role-Play-Pro
 
  - Text-Spectrogram
  - Mel-Text
  - Text-Mel
+ base_model:
+ - rvv-karma/BASH-Coder-Mistral-7B
+ - Locutusque/Hercules-3.1-Mistral-7B
+ - KoboldAI/Mistral-7B-Erebus-v3
+ - Locutusque/Hyperion-2.1-Mistral-7B
+ - Severian/Nexus-IKM-Mistral-7B-Pytorch
+ - NousResearch/Hermes-2-Pro-Mistral-7B
+ - mistralai/Mistral-7B-Instruct-v0.2
+ - Nitral-AI/ProdigyXBioMistral_7B
+ - Nitral-AI/Infinite-Mika-7b
+ - Nous-Yarn-Mistral-7b-128k
+ - yanismiraoui/Yarn-Mistral-7b-128k-sharded
+ - LeroyDyer/LCARS_TOP_SCORE
+ - LeroyDyer/Mixtral_AI_Cyber_Matrix_2_0
+ - LeroyDyer/SpydazWeb_AI_CyberTron_Ultra_7b
+ - LeroyDyer/LCARS_AI_StarTrek_Computer
+ - LeroyDyer/_Spydaz_Web_AI_ActionQA_Project
+ - LeroyDyer/_Spydaz_Web_AI_ChatML_512K_Project
+ - LeroyDyer/_Spydaz_Web_AI_ChatQA_ReAct_Project_UltraFineTuned
+ - LeroyDyer/SpyazWeb_AI_DeepMind_Project
+ - LeroyDyer/SpydazWeb_AI_Swahili_Project
+ - LeroyDyer/_Spydaz_Web_AI_ChatQA_ReAct_Project
+ - LeroyDyer/_Spydaz_Web_AI_MistralStar_001_Project
+ - LeroyDyer/QuietStar_Project
+ - LeroyDyer/Mixtral_BioMedical_7b
+ - LeroyDyer/Mixtral_AI_CyberTron_Coder
+ - LeroyDyer/_Spydaz_Web_AI_BIBLE_002
+ - LeroyDyer/_Spydaz_Web_AI_ChatQA_Reasoning101_Project
+ - LeroyDyer/SpydazWeb_AI_Text_AudioVision_Project
+ datasets:
+ - neoneye/base64-decode-v2
+ - neoneye/base64-encode-v1
+ - VuongQuoc/Chemistry_text_to_image
+ - Kamizuru00/diagram_image_to_text
+ - LeroyDyer/Chemistry_text_to_image_BASE64
+ - LeroyDyer/AudioCaps-Spectrograms_to_Base64
+ - LeroyDyer/winogroud_text_to_imaget_BASE64
+ - LeroyDyer/chart_text_to_Base64
+ - LeroyDyer/diagram_image_to_text_BASE64
+ - mekaneeky/salt_m2e_15_3_instruction
+ - mekaneeky/SALT-languages-bible
+ - xz56/react-llama
+ - BeIR/hotpotqa
+ - arcee-ai/agent-data
  metrics:
  - accuracy
  - bertscore
 
  - bleurt
  - brier_score
  - cer
+ pipeline_tag: text-generation
+ model-index:
+ - name: SpydazWeb_AI_HumanAI_RP
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: IFEval (0-Shot)
+       type: HuggingFaceH4/ifeval
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: inst_level_strict_acc and prompt_level_strict_acc
+       value: 25.41
+       name: strict accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/SpydazWeb_AI_HumanAI_RP
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BBH (3-Shot)
+       type: BBH
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc_norm
+       value: 7.18
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/SpydazWeb_AI_HumanAI_RP
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MATH Lvl 5 (4-Shot)
+       type: hendrycks/competition_math
+       args:
+         num_few_shot: 4
+     metrics:
+     - type: exact_match
+       value: 1.28
+       name: exact match
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/SpydazWeb_AI_HumanAI_RP
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GPQA (0-shot)
+       type: Idavidrein/gpqa
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 3.36
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/SpydazWeb_AI_HumanAI_RP
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MuSR (0-shot)
+       type: TAUR-Lab/MuSR
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 5.87
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/SpydazWeb_AI_HumanAI_RP
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU-PRO (5-shot)
+       type: TIGER-Lab/MMLU-Pro
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 3.6
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/SpydazWeb_AI_HumanAI_RP
+       name: Open LLM Leaderboard
  ---

  BASE MODEL :
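The `model-index` block added in this PR is the Hub's machine-readable form of the leaderboard scores. It holds one entry per benchmark, and each entry has a task, a dataset (including its few-shot setting), a metric list and a source link. As a rough illustration, the sketch below rebuilds such entries as plain Python dicts and flattens a model index into (benchmark, metric, value) rows. The helpers `leaderboard_url`, `make_result` and `flatten_results` are hypothetical and are not part of any library. For real model cards, the `huggingface_hub` library's `ModelCard` and `EvalResult` classes handle this metadata.

```python
# Hypothetical helpers mirroring the model-index layout in this PR (not huggingface_hub code).
LEADERBOARD_SPACE = "https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard"


def leaderboard_url(repo_id: str) -> str:
    """Source URL in the form used by this PR: the Space URL plus ?query=<repo id>."""
    return f"{LEADERBOARD_SPACE}?query={repo_id}"


def make_result(repo_id, dataset_name, dataset_type, num_few_shot,
                metric_type, value, metric_name):
    """Build one model-index result entry with the same keys as the YAML in the diff."""
    return {
        "task": {"type": "text-generation", "name": "Text Generation"},
        "dataset": {
            "name": dataset_name,
            "type": dataset_type,
            "args": {"num_few_shot": num_few_shot},
        },
        "metrics": [{"type": metric_type, "value": value, "name": metric_name}],
        "source": {"url": leaderboard_url(repo_id), "name": "Open LLM Leaderboard"},
    }


def flatten_results(model_index):
    """Return a list of (dataset name, metric type, value) for every metric of every model."""
    rows = []
    for model in model_index:
        for result in model["results"]:
            for metric in result["metrics"]:
                rows.append((result["dataset"]["name"], metric["type"], metric["value"]))
    return rows


if __name__ == "__main__":
    repo = "LeroyDyer/SpydazWeb_AI_HumanAI_RP"
    model_index = [{
        "name": "SpydazWeb_AI_HumanAI_RP",
        "results": [
            make_result(repo, "BBH (3-Shot)", "BBH", 3, "acc_norm", 7.18, "normalized accuracy"),
            make_result(repo, "MuSR (0-shot)", "TAUR-Lab/MuSR", 0, "acc_norm", 5.87, "acc_norm"),
        ],
    }]
    for row in flatten_results(model_index):
        print(row)
```

The MMLU-PRO entry in the diff also carries `config: main` and `split: test` under `dataset`. This sketch leaves those keys out.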
 
  so I will find some other insteresting datasets based around the task of image detection and generation as well as image segmentation and mask etc :
  It only the conversion of the dataset to include these base64 representations :
  Image training is quite slow ! ( i was able to create the trainer settings to perform 5000 samples in a single step , but it was still very slow for the step. ( google colab ))
+ So If i Tap into the A100 I will Do a few 1000( sample ) Steps : also using this prompt ! as well as task training we are also prompt tuning ! by installing many repetitions of the same prompt ... hopefully removing traces of ..."your a helpful AI"
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/LeroyDyer__SpydazWeb_AI_HumanAI_RP-details)
+
+ |      Metric       |Value|
+ |-------------------|----:|
+ |Avg.               | 7.78|
+ |IFEval (0-Shot)    |25.41|
+ |BBH (3-Shot)       | 7.18|
+ |MATH Lvl 5 (4-Shot)| 1.28|
+ |GPQA (0-shot)      | 3.36|
+ |MuSR (0-shot)      | 5.87|
+ |MMLU-PRO (5-shot)  | 3.60|
+
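The Avg. row in the results table is the arithmetic mean of the six benchmark scores: (25.41 + 7.18 + 1.28 + 3.36 + 5.87 + 3.60) / 6 = 46.70 / 6, or about 7.78. The short sketch below reproduces that row and the table layout from the per-benchmark values. The helper names are hypothetical, and this is not the leaderboard bot's actual code. The details-dataset URL pattern (`/` in the repo id replaced by `__`, plus a `-details` suffix) is inferred from the single link in this PR and may not hold in general.

```python
# Per-benchmark scores copied from the table above.
SCORES = {
    "IFEval (0-Shot)": 25.41,
    "BBH (3-Shot)": 7.18,
    "MATH Lvl 5 (4-Shot)": 1.28,
    "GPQA (0-shot)": 3.36,
    "MuSR (0-shot)": 5.87,
    "MMLU-PRO (5-shot)": 3.60,
}


def average(scores: dict) -> float:
    """Unweighted mean of all benchmark scores, rounded to two decimals like the table."""
    return round(sum(scores.values()) / len(scores), 2)


def details_dataset_url(repo_id: str) -> str:
    """Assumed naming: 'owner/model' -> 'open-llm-leaderboard/owner__model-details'."""
    return ("https://huggingface.co/datasets/open-llm-leaderboard/"
            + repo_id.replace("/", "__") + "-details")


def results_table(scores: dict) -> str:
    """Render a markdown table with an Avg. row first, padding names to a common width."""
    width = max(len(name) for name in [*scores, "Avg."])
    lines = [f"|{'Metric':<{width}}|Value|", f"|{'-' * width}|----:|"]
    rows = {"Avg.": average(scores), **scores}
    for name, value in rows.items():
        lines.append(f"|{name:<{width}}|{value:5.2f}|")
    return "\n".join(lines)


if __name__ == "__main__":
    print(results_table(SCORES))
    print(details_dataset_url("LeroyDyer/SpydazWeb_AI_HumanAI_RP"))
```

Because the average is unweighted, the IFEval score alone contributes more than half of the 46.70 total.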