INFO: 2024-11-26 20:17:25,790: llmtf.base.evaluator: Starting eval on ['darumeru/multiq', 'darumeru/parus', 'darumeru/rcb', 'darumeru/rwsd', 'darumeru/use']
INFO: 2024-11-26 20:17:25,791: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-26 20:17:25,791: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-26 20:17:27,525: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu']
INFO: 2024-11-26 20:17:27,525: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-26 20:17:27,525: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-26 20:17:29,517: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu']
INFO: 2024-11-26 20:17:29,517: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-26 20:17:29,517: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-26 20:17:30,060: llmtf.base.darumeru/MultiQ: Loading Dataset: 4.27s
INFO: 2024-11-26 20:17:31,597: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive']
INFO: 2024-11-26 20:17:31,597: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-26 20:17:31,597: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-26 20:17:33,345: llmtf.base.evaluator: Starting eval on ['darumeru/cp_para_ru']
INFO: 2024-11-26 20:17:33,345: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-26 20:17:33,345: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-26 20:17:35,521: llmtf.base.evaluator: Starting eval on ['vikhrmodels/habr_qa_sbs', 'ruparam', 'shlepa/moviesmc', 'shlepa/musicmc', 'shlepa/lawmc', 'shlepa/booksmc']
INFO: 2024-11-26 20:17:35,521: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-26 20:17:35,521: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-26 20:17:36,308: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 2.96s
INFO: 2024-11-26 20:17:36,742: llmtf.base.daru/treewayabstractive: Loading Dataset: 5.14s
INFO: 2024-11-26 20:17:37,659: llmtf.base.evaluator: Starting eval on ['ruopinionne']
INFO: 2024-11-26 20:17:37,659: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-26 20:17:37,660: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-26 20:17:37,984: llmtf.base.ruopinionne: Loading Dataset: 0.32s
INFO: 2024-11-26 20:17:39,551: llmtf.base.evaluator: Starting eval on ['nerel']
INFO: 2024-11-26 20:17:39,551: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-26 20:17:39,551: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-26 20:17:43,423: llmtf.base.NEREL: Loading Dataset: 3.87s
INFO: 2024-11-26 20:17:48,557: llmtf.base.vikhrmodels/habr_qa_sbs: Loading Dataset: 13.04s
INFO: 2024-11-26 20:18:44,135: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 67.83s
INFO: 2024-11-26 20:18:44,145: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru:
INFO: 2024-11-26 20:18:44,149: llmtf.base.darumeru/cp_para_ru: {'tokens_per_word': 1.905314928817744, 'symbol_per_token': 3.913951487866651, 'len': 0.9904780330832407, 'lcs': 0.8}
INFO: 2024-11-26 20:18:44,150: llmtf.base.evaluator: Ended eval
INFO: 2024-11-26 20:18:44,152: llmtf.base.evaluator:
mean darumeru/cp_para_ru
0.800 0.800
INFO: 2024-11-26 20:18:50,857: llmtf.base.NEREL: Processing Dataset: 67.43s
INFO: 2024-11-26 20:18:50,860: llmtf.base.NEREL: Results for NEREL:
INFO: 2024-11-26 20:18:50,864: llmtf.base.NEREL: {'tp': 2.0, 'fp': 27.0, 'fn': 519.0, 'micro-f1': 0.00727272727272595}
INFO: 2024-11-26 20:18:50,865: llmtf.base.evaluator: Ended eval
INFO: 2024-11-26 20:18:50,869: llmtf.base.evaluator:
mean NEREL darumeru/cp_para_ru
0.404 0.007 0.800
INFO: 2024-11-26 20:18:57,627: llmtf.base.daru/treewayabstractive: Processing Dataset: 80.88s
INFO: 2024-11-26 20:18:57,629: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive:
INFO: 2024-11-26 20:18:57,632: llmtf.base.daru/treewayabstractive: {'rouge1': 0.3138417117532064, 'rouge2': 0.10462617373556911}
INFO: 2024-11-26 20:18:57,634: llmtf.base.evaluator: Ended eval
INFO: 2024-11-26 20:18:57,637: llmtf.base.evaluator:
mean NEREL daru/treewayabstractive darumeru/cp_para_ru
0.339 0.007 0.209 0.800
INFO: 2024-11-26 20:19:13,379: llmtf.base.darumeru/MultiQ: Processing Dataset: 103.32s
INFO: 2024-11-26 20:19:13,384: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ:
INFO: 2024-11-26 20:19:13,404: llmtf.base.darumeru/MultiQ: {'f1': 0.3016876852043635, 'em': 0.21319311663479923}
INFO: 2024-11-26 20:19:13,412: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-26 20:19:13,412: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-26 20:19:15,573: llmtf.base.darumeru/PARus: Loading Dataset: 2.16s
INFO: 2024-11-26 20:19:19,861: llmtf.base.darumeru/PARus: Processing Dataset: 4.29s
INFO: 2024-11-26 20:19:19,867: llmtf.base.darumeru/PARus: Results for darumeru/PARus:
INFO: 2024-11-26 20:19:19,887: llmtf.base.darumeru/PARus: {'acc': 0.44}
INFO: 2024-11-26 20:19:19,888: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-26 20:19:19,888: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-26 20:19:22,295: llmtf.base.darumeru/RCB: Loading Dataset: 2.40s
INFO: 2024-11-26 20:19:23,604: llmtf.base.ruopinionne: Processing Dataset: 105.62s
INFO: 2024-11-26 20:19:23,610: llmtf.base.ruopinionne: Results for ruopinionne:
INFO: 2024-11-26 20:19:23,639: llmtf.base.ruopinionne: {'f1': 0.02701209922104298}
INFO: 2024-11-26 20:19:23,640: llmtf.base.evaluator: Ended eval
INFO: 2024-11-26 20:19:23,656: llmtf.base.evaluator:
mean NEREL daru/treewayabstractive darumeru/MultiQ darumeru/PARus darumeru/cp_para_ru ruopinionne
0.290 0.007 0.209 0.257 0.440 0.800 0.027
INFO: 2024-11-26 20:19:27,714: llmtf.base.darumeru/RCB: Processing Dataset: 5.42s
INFO: 2024-11-26 20:19:27,715: llmtf.base.darumeru/RCB: Results for darumeru/RCB:
INFO: 2024-11-26 20:19:27,722: llmtf.base.darumeru/RCB: {'acc': 0.4590909090909091, 'f1_macro': 0.36910715356478985}
INFO: 2024-11-26 20:19:27,724: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-26 20:19:27,724: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-26 20:19:29,690: llmtf.base.darumeru/RWSD: Loading Dataset: 1.96s
INFO: 2024-11-26 20:19:34,850: llmtf.base.darumeru/RWSD: Processing Dataset: 5.16s
INFO: 2024-11-26 20:19:34,855: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD:
INFO: 2024-11-26 20:19:34,858: llmtf.base.darumeru/RWSD: {'acc': 0.49019607843137253}
INFO: 2024-11-26 20:19:34,859: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-26 20:19:34,859: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-26 20:19:38,002: llmtf.base.darumeru/USE: Loading Dataset: 3.14s
INFO: 2024-11-26 20:19:52,797: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 143.28s
INFO: 2024-11-26 20:19:55,208: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 147.68s
INFO: 2024-11-26 20:20:33,495: llmtf.base.vikhrmodels/habr_qa_sbs: Processing Dataset: 164.94s
INFO: 2024-11-26 20:20:33,496: llmtf.base.vikhrmodels/habr_qa_sbs: Results for vikhrmodels/habr_qa_sbs:
INFO: 2024-11-26 20:20:33,533: llmtf.base.vikhrmodels/habr_qa_sbs: {'acc': 0.547, 'f1_macro': 0.5280856541414141}
INFO: 2024-11-26 20:20:33,546: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-26 20:20:33,547: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-26 20:20:40,434: llmtf.base.ruparam: Loading Dataset: 6.89s
INFO: 2024-11-26 20:21:07,899: llmtf.base.darumeru/USE: Processing Dataset: 89.90s
INFO: 2024-11-26 20:21:07,900: llmtf.base.darumeru/USE: Results for darumeru/USE:
INFO: 2024-11-26 20:21:07,942: llmtf.base.darumeru/USE: {'grade_norm': 0.06078431372549018}
INFO: 2024-11-26 20:21:07,948: llmtf.base.evaluator: Ended eval
INFO: 2024-11-26 20:21:08,016: llmtf.base.evaluator:
mean NEREL daru/treewayabstractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_ru ruopinionne vikhrmodels/habr_qa_sbs
0.324 0.007 0.209 0.257 0.440 0.414 0.490 0.061 0.800 0.027 0.538
INFO: 2024-11-26 20:25:43,772: llmtf.base.ruparam: Processing Dataset: 303.34s
INFO: 2024-11-26 20:25:43,788: llmtf.base.ruparam: Results for ruparam:
INFO: 2024-11-26 20:25:44,038: llmtf.base.ruparam: {'acc': 0.21363220494053065}
INFO: 2024-11-26 20:25:44,053: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-26 20:25:44,053: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-26 20:25:47,874: llmtf.base.shlepa/movie_mc: Loading Dataset: 3.82s
INFO: 2024-11-26 20:26:05,103: llmtf.base.shlepa/movie_mc: Processing Dataset: 17.22s
INFO: 2024-11-26 20:26:05,119: llmtf.base.shlepa/movie_mc: Results for shlepa/movie_mc:
INFO: 2024-11-26 20:26:05,122: llmtf.base.shlepa/movie_mc: {'acc': 0.22453703703703703}
INFO: 2024-11-26 20:26:05,130: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-26 20:26:05,130: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-26 20:26:08,736: llmtf.base.shlepa/music_mc: Loading Dataset: 3.60s
INFO: 2024-11-26 20:26:26,413: llmtf.base.shlepa/music_mc: Processing Dataset: 17.68s
INFO: 2024-11-26 20:26:26,416: llmtf.base.shlepa/music_mc: Results for shlepa/music_mc:
INFO: 2024-11-26 20:26:26,435: llmtf.base.shlepa/music_mc: {'acc': 0.24468085106382978}
INFO: 2024-11-26 20:26:26,438: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-26 20:26:26,439: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-26 20:26:31,058: llmtf.base.shlepa/law_mc: Loading Dataset: 4.62s
INFO: 2024-11-26 20:27:15,973: llmtf.base.shlepa/law_mc: Processing Dataset: 44.91s
INFO: 2024-11-26 20:27:15,980: llmtf.base.shlepa/law_mc: Results for shlepa/law_mc:
INFO: 2024-11-26 20:27:16,000: llmtf.base.shlepa/law_mc: {'acc': 0.537590113285273}
INFO: 2024-11-26 20:27:16,006: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111]
INFO: 2024-11-26 20:27:16,006: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>']
INFO: 2024-11-26 20:27:19,668: llmtf.base.shlepa/books_mc: Loading Dataset: 3.66s
INFO: 2024-11-26 20:27:39,200: llmtf.base.shlepa/books_mc: Processing Dataset: 19.53s
INFO: 2024-11-26 20:27:39,204: llmtf.base.shlepa/books_mc: Results for shlepa/books_mc:
INFO: 2024-11-26 20:27:39,209: llmtf.base.shlepa/books_mc: {'acc': 0.3112033195020747}
INFO: 2024-11-26 20:27:39,212: llmtf.base.evaluator: Ended eval
INFO: 2024-11-26 20:27:39,231: llmtf.base.evaluator:
mean NEREL daru/treewayabstractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_ru ruopinionne ruparam shlepa/books_mc shlepa/law_mc shlepa/movie_mc shlepa/music_mc vikhrmodels/habr_qa_sbs
0.318 0.007 0.209 0.257 0.440 0.414 0.490 0.061 0.800 0.027 0.214 0.311 0.538 0.225 0.245 0.538
INFO: 2024-11-26 20:28:00,389: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 487.59s
INFO: 2024-11-26 20:28:00,396: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU:
INFO: 2024-11-26 20:28:00,445: llmtf.base.nlpcoreteam/enMMLU:
subject metric
abstract_algebra 0.310000
anatomy 0.518519
astronomy 0.717105
business_ethics 0.630000
clinical_knowledge 0.667925
college_biology 0.680556
college_chemistry 0.400000
college_computer_science 0.510000
college_mathematics 0.270000
college_medicine 0.618497
college_physics 0.558824
computer_security 0.750000
conceptual_physics 0.561702
econometrics 0.438596
electrical_engineering 0.579310
elementary_mathematics 0.455026
formal_logic 0.412698
global_facts 0.250000
high_school_biology 0.738710
high_school_chemistry 0.541872
high_school_computer_science 0.650000
high_school_european_history 0.751515
high_school_geography 0.767677
high_school_government_and_politics 0.808290
high_school_macroeconomics 0.638462
high_school_mathematics 0.433333
high_school_microeconomics 0.676471
high_school_physics 0.377483
high_school_psychology 0.814679
high_school_statistics 0.527778
high_school_us_history 0.715686
high_school_world_history 0.767932
human_aging 0.641256
human_sexuality 0.679389
international_law 0.735537
jurisprudence 0.777778
logical_fallacies 0.760736
machine_learning 0.455357
management 0.757282
marketing 0.837607
medical_genetics 0.690000
miscellaneous 0.713921
moral_disputes 0.650289
moral_scenarios 0.243575
nutrition 0.663399
philosophy 0.668810
prehistory 0.682099
professional_accounting 0.489362
professional_law 0.418514
professional_medicine 0.599265
professional_psychology 0.588235
public_relations 0.600000
security_studies 0.689796
sociology 0.786070
us_foreign_policy 0.780000
virology 0.463855
world_religions 0.789474
INFO: 2024-11-26 20:28:00,453: llmtf.base.nlpcoreteam/enMMLU:
subject metric
STEM 0.528725
humanities 0.644203
other (business, health, misc.) 0.610063
social sciences 0.688972
INFO: 2024-11-26 20:28:00,464: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6179910195383496}
INFO: 2024-11-26 20:28:00,501: llmtf.base.evaluator: Ended eval
INFO: 2024-11-26 20:28:00,524: llmtf.base.evaluator:
mean NEREL daru/treewayabstractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_ru nlpcoreteam/enMMLU ruopinionne ruparam shlepa/books_mc shlepa/law_mc shlepa/movie_mc shlepa/music_mc vikhrmodels/habr_qa_sbs
0.337 0.007 0.209 0.257 0.440 0.414 0.490 0.061 0.800 0.618 0.027 0.214 0.311 0.538 0.225 0.245 0.538
INFO: 2024-11-26 20:28:16,900: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 501.69s
INFO: 2024-11-26 20:28:16,902: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU:
INFO: 2024-11-26 20:28:16,950: llmtf.base.nlpcoreteam/ruMMLU:
subject metric
abstract_algebra 0.320000
anatomy 0.414815
astronomy 0.572368
business_ethics 0.450000
clinical_knowledge 0.505660
college_biology 0.375000
college_chemistry 0.310000
college_computer_science 0.380000
college_mathematics 0.350000
college_medicine 0.508671
college_physics 0.431373
computer_security 0.530000
conceptual_physics 0.429787
econometrics 0.298246
electrical_engineering 0.448276
elementary_mathematics 0.417989
formal_logic 0.365079
global_facts 0.240000
high_school_biology 0.487097
high_school_chemistry 0.443350
high_school_computer_science 0.530000
high_school_european_history 0.654545
high_school_geography 0.525253
high_school_government_and_politics 0.481865
high_school_macroeconomics 0.430769
high_school_mathematics 0.392593
high_school_microeconomics 0.441176
high_school_physics 0.291391
high_school_psychology 0.572477
high_school_statistics 0.416667
high_school_us_history 0.495098
high_school_world_history 0.632911
human_aging 0.488789
human_sexuality 0.496183
international_law 0.677686
jurisprudence 0.574074
logical_fallacies 0.441718
machine_learning 0.321429
management 0.563107
marketing 0.722222
medical_genetics 0.470000
miscellaneous 0.536398
moral_disputes 0.517341
moral_scenarios 0.237989
nutrition 0.526144
philosophy 0.546624
prehistory 0.478395
professional_accounting 0.365248
professional_law 0.331160
professional_medicine 0.367647
professional_psychology 0.415033
public_relations 0.454545
security_studies 0.595918
sociology 0.631841
us_foreign_policy 0.710000
virology 0.409639
world_religions 0.549708
INFO: 2024-11-26 20:28:16,959: llmtf.base.nlpcoreteam/ruMMLU:
subject metric
STEM 0.413740
humanities 0.500179
other (business, health, misc.) 0.469167
social sciences 0.504442
INFO: 2024-11-26 20:28:16,967: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.47188210698069283}
INFO: 2024-11-26 20:28:17,010: llmtf.base.evaluator: Ended eval
INFO: 2024-11-26 20:28:17,020: llmtf.base.evaluator:
mean NEREL daru/treewayabstractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/USE darumeru/cp_para_ru nlpcoreteam/enMMLU nlpcoreteam/ruMMLU ruopinionne ruparam shlepa/books_mc shlepa/law_mc shlepa/movie_mc shlepa/music_mc vikhrmodels/habr_qa_sbs
0.345 0.007 0.209 0.257 0.440 0.414 0.490 0.061 0.800 0.618 0.472 0.027 0.214 0.311 0.538 0.225 0.245 0.538
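
The `mean` column in each summary row appears to be the unweighted average of the per-task scores listed beside it. The sketch below checks that reading against the final summary row, using the rounded values exactly as logged. It also checks one per-task score, on the assumption (inferred from the numbers, not stated in the log) that a task reporting two metrics is scored by their average. The variable names are illustrative and are not part of llmtf.

```python
from statistics import mean

# Per-task scores copied from the final summary row of the log
# (already rounded to three decimals there).
task_scores = {
    "NEREL": 0.007,
    "daru/treewayabstractive": 0.209,
    "darumeru/MultiQ": 0.257,
    "darumeru/PARus": 0.440,
    "darumeru/RCB": 0.414,
    "darumeru/RWSD": 0.490,
    "darumeru/USE": 0.061,
    "darumeru/cp_para_ru": 0.800,
    "nlpcoreteam/enMMLU": 0.618,
    "nlpcoreteam/ruMMLU": 0.472,
    "ruopinionne": 0.027,
    "ruparam": 0.214,
    "shlepa/books_mc": 0.311,
    "shlepa/law_mc": 0.538,
    "shlepa/movie_mc": 0.225,
    "shlepa/music_mc": 0.245,
    "vikhrmodels/habr_qa_sbs": 0.538,
}

# Unweighted average over all 17 tasks.
overall = mean(list(task_scores.values()))
print(f"overall mean: {overall:.3f}")  # prints "overall mean: 0.345"

# Assumption: a two-metric task is scored as the average of its metrics.
# MultiQ logged f1 and em; their average rounds to the 0.257 shown in the row.
multiq = mean([0.3016876852043635, 0.21319311663479923])
print(f"MultiQ score: {multiq:.3f}")  # prints "MultiQ score: 0.257"
```

Because the average is unweighted, a near-zero task such as NEREL (0.007) pulls the headline number down as much as a large benchmark like enMMLU lifts it.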