|
INFO: 2024-11-18 14:18:48,851: llmtf.base.evaluator: Starting eval on ['darumeru/multiq'] |
|
INFO: 2024-11-18 14:18:48,852: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111] |
|
INFO: 2024-11-18 14:18:48,852: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-11-18 14:18:50,653: llmtf.base.evaluator: Starting eval on ['darumeru/parus'] |
|
INFO: 2024-11-18 14:18:50,654: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111] |
|
INFO: 2024-11-18 14:18:50,654: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-11-18 14:18:52,696: llmtf.base.darumeru/MultiQ: Loading Dataset: 3.84s |
|
INFO: 2024-11-18 14:18:52,936: llmtf.base.evaluator: Starting eval on ['darumeru/rcb'] |
|
INFO: 2024-11-18 14:18:52,936: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111] |
|
INFO: 2024-11-18 14:18:52,936: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-11-18 14:18:53,102: llmtf.base.darumeru/PARus: Loading Dataset: 2.45s |
|
INFO: 2024-11-18 14:18:54,811: llmtf.base.evaluator: Starting eval on ['darumeru/ruopenbookqa'] |
|
INFO: 2024-11-18 14:18:54,811: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111] |
|
INFO: 2024-11-18 14:18:54,811: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-11-18 14:18:55,708: llmtf.base.darumeru/RCB: Loading Dataset: 2.77s |
|
INFO: 2024-11-18 14:18:56,455: llmtf.base.darumeru/PARus: Processing Dataset: 3.35s |
|
INFO: 2024-11-18 14:18:56,457: llmtf.base.darumeru/PARus: Results for darumeru/PARus: |
|
INFO: 2024-11-18 14:18:56,470: llmtf.base.darumeru/PARus: {'acc': 0.24} |
|
INFO: 2024-11-18 14:18:56,471: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-11-18 14:18:56,474: llmtf.base.evaluator: |
|
mean darumeru/PARus |
|
0.240 0.240 |
|
INFO: 2024-11-18 14:18:56,487: llmtf.base.evaluator: Starting eval on ['darumeru/ruworldtree'] |
|
INFO: 2024-11-18 14:18:56,488: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111] |
|
INFO: 2024-11-18 14:18:56,488: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-11-18 14:18:58,099: llmtf.base.darumeru/ruOpenBookQA: Loading Dataset: 3.29s |
|
INFO: 2024-11-18 14:18:58,743: llmtf.base.evaluator: Starting eval on ['darumeru/rwsd'] |
|
INFO: 2024-11-18 14:18:58,744: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111] |
|
INFO: 2024-11-18 14:18:58,744: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-11-18 14:18:58,925: llmtf.base.darumeru/ruWorldTree: Loading Dataset: 2.44s |
|
INFO: 2024-11-18 14:19:00,968: llmtf.base.evaluator: Starting eval on ['daru/treewayextractive'] |
|
INFO: 2024-11-18 14:19:00,968: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111] |
|
INFO: 2024-11-18 14:19:00,968: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-11-18 14:19:01,031: llmtf.base.darumeru/RCB: Processing Dataset: 5.32s |
|
INFO: 2024-11-18 14:19:01,033: llmtf.base.darumeru/RCB: Results for darumeru/RCB: |
|
INFO: 2024-11-18 14:19:01,040: llmtf.base.darumeru/RCB: {'acc': 0.4727272727272727, 'f1_macro': 0.39356669305497743} |
|
INFO: 2024-11-18 14:19:01,041: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-11-18 14:19:01,044: llmtf.base.evaluator: |
|
mean darumeru/PARus darumeru/RCB |
|
0.337 0.240 0.433 |
|
INFO: 2024-11-18 14:19:01,497: llmtf.base.darumeru/RWSD: Loading Dataset: 2.75s |
|
INFO: 2024-11-18 14:19:01,851: llmtf.base.darumeru/ruWorldTree: Processing Dataset: 2.92s |
|
INFO: 2024-11-18 14:19:01,852: llmtf.base.darumeru/ruWorldTree: Results for darumeru/ruWorldTree: |
|
INFO: 2024-11-18 14:19:01,859: llmtf.base.darumeru/ruWorldTree: {'acc': 0.7714285714285715, 'f1_macro': 0.7726851851851853} |
|
INFO: 2024-11-18 14:19:01,859: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-11-18 14:19:01,863: llmtf.base.evaluator: |
|
mean darumeru/PARus darumeru/RCB darumeru/ruWorldTree |
|
0.482 0.240 0.433 0.772 |
|
INFO: 2024-11-18 14:19:02,889: llmtf.base.evaluator: Starting eval on ['daru/treewayabstractive'] |
|
INFO: 2024-11-18 14:19:02,890: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111] |
|
INFO: 2024-11-18 14:19:02,890: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-11-18 14:19:03,199: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/rummlu'] |
|
INFO: 2024-11-18 14:19:03,199: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111] |
|
INFO: 2024-11-18 14:19:03,199: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-11-18 14:19:06,629: llmtf.base.darumeru/RWSD: Processing Dataset: 5.13s |
|
INFO: 2024-11-18 14:19:06,631: llmtf.base.darumeru/RWSD: Results for darumeru/RWSD: |
|
INFO: 2024-11-18 14:19:06,635: llmtf.base.darumeru/RWSD: {'acc': 0.5098039215686274} |
|
INFO: 2024-11-18 14:19:06,636: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-11-18 14:19:06,641: llmtf.base.evaluator: |
|
mean darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruWorldTree |
|
0.489 0.240 0.433 0.510 0.772 |
|
INFO: 2024-11-18 14:19:06,885: llmtf.base.daru/treewayabstractive: Loading Dataset: 4.00s |
|
INFO: 2024-11-18 14:19:07,496: llmtf.base.evaluator: Starting eval on ['darumeru/cp_para_ru'] |
|
INFO: 2024-11-18 14:19:07,497: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111] |
|
INFO: 2024-11-18 14:19:07,497: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-11-18 14:19:10,509: llmtf.base.darumeru/cp_para_ru: Loading Dataset: 3.01s |
|
INFO: 2024-11-18 14:19:13,909: llmtf.base.daru/treewayextractive: Loading Dataset: 12.94s |
|
INFO: 2024-11-18 14:19:44,800: llmtf.base.darumeru/ruOpenBookQA: Processing Dataset: 46.70s |
|
INFO: 2024-11-18 14:19:44,801: llmtf.base.darumeru/ruOpenBookQA: Results for darumeru/ruOpenBookQA: |
|
INFO: 2024-11-18 14:19:44,814: llmtf.base.darumeru/ruOpenBookQA: {'acc': 0.615979381443299, 'f1_macro': 0.6154023944317246} |
|
INFO: 2024-11-18 14:19:44,821: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-11-18 14:19:44,826: llmtf.base.evaluator: |
|
mean darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree |
|
0.514 0.240 0.433 0.510 0.616 0.772 |
|
INFO: 2024-11-18 14:21:05,969: llmtf.base.nlpcoreteam/ruMMLU: Loading Dataset: 122.77s |
|
INFO: 2024-11-18 14:21:35,520: llmtf.base.daru/treewayextractive: Processing Dataset: 141.61s |
|
INFO: 2024-11-18 14:21:35,523: llmtf.base.daru/treewayextractive: Results for daru/treewayextractive: |
|
INFO: 2024-11-18 14:21:35,753: llmtf.base.daru/treewayextractive: {'r-prec': 0.3782488455988456} |
|
INFO: 2024-11-18 14:21:35,793: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-11-18 14:21:35,799: llmtf.base.evaluator: |
|
mean daru/treewayextractive darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/ruOpenBookQA darumeru/ruWorldTree |
|
0.491 0.378 0.240 0.433 0.510 0.616 0.772 |
|
INFO: 2024-11-18 14:24:06,030: llmtf.base.darumeru/cp_para_ru: Processing Dataset: 295.52s |
|
INFO: 2024-11-18 14:24:06,032: llmtf.base.darumeru/cp_para_ru: Results for darumeru/cp_para_ru: |
|
INFO: 2024-11-18 14:24:06,036: llmtf.base.darumeru/cp_para_ru: {'symbol_per_token': 3.8978256135744003, 'len': 0.8764597602663033, 'lcs': 0.05} |
|
INFO: 2024-11-18 14:24:06,036: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-11-18 14:24:06,041: llmtf.base.evaluator: |
|
mean daru/treewayextractive darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_para_ru darumeru/ruOpenBookQA darumeru/ruWorldTree |
|
0.428 0.378 0.240 0.433 0.510 0.050 0.616 0.772 |
|
INFO: 2024-11-18 14:24:17,174: llmtf.base.daru/treewayabstractive: Processing Dataset: 310.29s |
|
INFO: 2024-11-18 14:24:17,190: llmtf.base.daru/treewayabstractive: Results for daru/treewayabstractive: |
|
INFO: 2024-11-18 14:24:17,208: llmtf.base.daru/treewayabstractive: {'rouge1': 0.31023763628891676, 'rouge2': 0.09443696323171702} |
|
INFO: 2024-11-18 14:24:17,210: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-11-18 14:24:17,215: llmtf.base.evaluator: |
|
mean daru/treewayabstractive daru/treewayextractive darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_para_ru darumeru/ruOpenBookQA darumeru/ruWorldTree |
|
0.400 0.202 0.378 0.240 0.433 0.510 0.050 0.616 0.772 |
|
INFO: 2024-11-18 14:26:06,991: llmtf.base.nlpcoreteam/ruMMLU: Processing Dataset: 301.02s |
|
INFO: 2024-11-18 14:26:06,993: llmtf.base.nlpcoreteam/ruMMLU: Results for nlpcoreteam/ruMMLU: |
|
INFO: 2024-11-18 14:26:07,037: llmtf.base.nlpcoreteam/ruMMLU: metric |
|
subject |
|
abstract_algebra 0.280000 |
|
anatomy 0.400000 |
|
astronomy 0.572368 |
|
business_ethics 0.460000 |
|
clinical_knowledge 0.494340 |
|
college_biology 0.375000 |
|
college_chemistry 0.290000 |
|
college_computer_science 0.400000 |
|
college_mathematics 0.400000 |
|
college_medicine 0.491329 |
|
college_physics 0.362745 |
|
computer_security 0.500000 |
|
conceptual_physics 0.421277 |
|
econometrics 0.280702 |
|
electrical_engineering 0.427586 |
|
elementary_mathematics 0.391534 |
|
formal_logic 0.373016 |
|
global_facts 0.230000 |
|
high_school_biology 0.496774 |
|
high_school_chemistry 0.458128 |
|
high_school_computer_science 0.500000 |
|
high_school_european_history 0.600000 |
|
high_school_geography 0.535354 |
|
high_school_government_and_politics 0.518135 |
|
high_school_macroeconomics 0.471795 |
|
high_school_mathematics 0.400000 |
|
high_school_microeconomics 0.462185 |
|
high_school_physics 0.291391 |
|
high_school_psychology 0.614679 |
|
high_school_statistics 0.490741 |
|
high_school_us_history 0.534314 |
|
high_school_world_history 0.624473 |
|
human_aging 0.520179 |
|
human_sexuality 0.519084 |
|
international_law 0.694215 |
|
jurisprudence 0.537037 |
|
logical_fallacies 0.472393 |
|
machine_learning 0.258929 |
|
management 0.640777 |
|
marketing 0.700855 |
|
medical_genetics 0.480000 |
|
miscellaneous 0.533844 |
|
moral_disputes 0.488439 |
|
moral_scenarios 0.268156 |
|
nutrition 0.526144 |
|
philosophy 0.543408 |
|
prehistory 0.475309 |
|
professional_accounting 0.347518 |
|
professional_law 0.345502 |
|
professional_medicine 0.426471 |
|
professional_psychology 0.411765 |
|
public_relations 0.427273 |
|
security_studies 0.542857 |
|
sociology 0.686567 |
|
us_foreign_policy 0.700000 |
|
virology 0.379518 |
|
world_religions 0.538012 |
|
INFO: 2024-11-18 14:26:07,045: llmtf.base.nlpcoreteam/ruMMLU: metric |
|
subject |
|
STEM 0.406471 |
|
humanities 0.499559 |
|
other (business, health, misc.) 0.473641 |
|
social sciences 0.514200 |
|
INFO: 2024-11-18 14:26:07,053: llmtf.base.nlpcoreteam/ruMMLU: {'acc': 0.47346768524963356} |
|
INFO: 2024-11-18 14:26:07,087: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-11-18 14:26:07,096: llmtf.base.evaluator: |
|
mean daru/treewayabstractive daru/treewayextractive darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_para_ru darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/ruMMLU |
|
0.408 0.202 0.378 0.240 0.433 0.510 0.050 0.616 0.772 0.473 |
|
INFO: 2024-11-18 14:29:49,600: llmtf.base.darumeru/MultiQ: Processing Dataset: 656.90s |
|
INFO: 2024-11-18 14:29:49,603: llmtf.base.darumeru/MultiQ: Results for darumeru/MultiQ: |
|
INFO: 2024-11-18 14:29:49,608: llmtf.base.darumeru/MultiQ: {'f1': 0.20613243758223346, 'em': 0.11281070745697896} |
|
INFO: 2024-11-18 14:29:49,612: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-11-18 14:29:49,634: llmtf.base.evaluator: |
|
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_para_ru darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/ruMMLU |
|
0.383 0.202 0.378 0.159 0.240 0.433 0.510 0.050 0.616 0.772 0.473 |
|
INFO: 2024-11-18 14:29:55,578: llmtf.base.evaluator: Starting eval on ['nlpcoreteam/enmmlu'] |
|
INFO: 2024-11-18 14:29:55,579: llmtf.base.hfmodel: Updated generation_config.eos_token_id: [145111] |
|
INFO: 2024-11-18 14:29:55,579: llmtf.base.hfmodel: Updated generation_config.stop_strings: ['<|im_end|>'] |
|
INFO: 2024-11-18 14:31:56,928: llmtf.base.nlpcoreteam/enMMLU: Loading Dataset: 121.35s |
|
INFO: 2024-11-18 14:36:48,348: llmtf.base.nlpcoreteam/enMMLU: Processing Dataset: 291.42s |
|
INFO: 2024-11-18 14:36:48,352: llmtf.base.nlpcoreteam/enMMLU: Results for nlpcoreteam/enMMLU: |
|
INFO: 2024-11-18 14:36:48,396: llmtf.base.nlpcoreteam/enMMLU: metric |
|
subject |
|
abstract_algebra 0.350000 |
|
anatomy 0.562963 |
|
astronomy 0.664474 |
|
business_ethics 0.640000 |
|
clinical_knowledge 0.652830 |
|
college_biology 0.652778 |
|
college_chemistry 0.450000 |
|
college_computer_science 0.490000 |
|
college_mathematics 0.310000 |
|
college_medicine 0.618497 |
|
college_physics 0.500000 |
|
computer_security 0.710000 |
|
conceptual_physics 0.587234 |
|
econometrics 0.429825 |
|
electrical_engineering 0.531034 |
|
elementary_mathematics 0.460317 |
|
formal_logic 0.373016 |
|
global_facts 0.270000 |
|
high_school_biology 0.748387 |
|
high_school_chemistry 0.522167 |
|
high_school_computer_science 0.620000 |
|
high_school_european_history 0.733333 |
|
high_school_geography 0.747475 |
|
high_school_government_and_politics 0.808290 |
|
high_school_macroeconomics 0.658974 |
|
high_school_mathematics 0.403704 |
|
high_school_microeconomics 0.684874 |
|
high_school_physics 0.390728 |
|
high_school_psychology 0.822018 |
|
high_school_statistics 0.550926 |
|
high_school_us_history 0.720588 |
|
high_school_world_history 0.742616 |
|
human_aging 0.623318 |
|
human_sexuality 0.687023 |
|
international_law 0.710744 |
|
jurisprudence 0.759259 |
|
logical_fallacies 0.742331 |
|
machine_learning 0.419643 |
|
management 0.747573 |
|
marketing 0.824786 |
|
medical_genetics 0.690000 |
|
miscellaneous 0.708812 |
|
moral_disputes 0.641618 |
|
moral_scenarios 0.252514 |
|
nutrition 0.653595 |
|
philosophy 0.668810 |
|
prehistory 0.675926 |
|
professional_accounting 0.510638 |
|
professional_law 0.397653 |
|
professional_medicine 0.602941 |
|
professional_psychology 0.589869 |
|
public_relations 0.609091 |
|
security_studies 0.673469 |
|
sociology 0.776119 |
|
us_foreign_policy 0.740000 |
|
virology 0.463855 |
|
world_religions 0.783626 |
|
INFO: 2024-11-18 14:36:48,403: llmtf.base.nlpcoreteam/enMMLU: metric |
|
subject |
|
STEM 0.520077 |
|
humanities 0.630926 |
|
other (business, health, misc.) 0.612129 |
|
social sciences 0.685586 |
|
INFO: 2024-11-18 14:36:48,425: llmtf.base.nlpcoreteam/enMMLU: {'acc': 0.6121795307919802} |
|
INFO: 2024-11-18 14:36:48,459: llmtf.base.evaluator: Ended eval |
|
INFO: 2024-11-18 14:36:48,480: llmtf.base.evaluator: |
|
mean daru/treewayabstractive daru/treewayextractive darumeru/MultiQ darumeru/PARus darumeru/RCB darumeru/RWSD darumeru/cp_para_ru darumeru/ruOpenBookQA darumeru/ruWorldTree nlpcoreteam/enMMLU nlpcoreteam/ruMMLU |
|
0.404 0.202 0.378 0.159 0.240 0.433 0.510 0.050 0.616 0.772 0.612 0.473 |
|
|