|                 Tasks                 |Version|Filter|n-shot|Metric|Value |   |Stderr|
|---------------------------------------|-------|------|-----:|------|-----:|---|-----:|
|mmlu                                   |N/A    |none  |     0|acc   |0.4765|±  |0.0041|
| - humanities                          |N/A    |none  |     0|acc   |0.4499|±  |0.0069|
|  - formal_logic                       |      0|none  |     0|acc   |0.2857|±  |0.0404|
|  - high_school_european_history       |      0|none  |     0|acc   |0.6545|±  |0.0371|
|  - high_school_us_history             |      0|none  |     0|acc   |0.6520|±  |0.0334|
|  - high_school_world_history          |      0|none  |     0|acc   |0.6582|±  |0.0309|
|  - international_law                  |      0|none  |     0|acc   |0.6198|±  |0.0443|
|  - jurisprudence                      |      0|none  |     0|acc   |0.5278|±  |0.0483|
|  - logical_fallacies                  |      0|none  |     0|acc   |0.4847|±  |0.0393|
|  - moral_disputes                     |      0|none  |     0|acc   |0.5000|±  |0.0269|
|  - moral_scenarios                    |      0|none  |     0|acc   |0.2380|±  |0.0142|
|  - philosophy                         |      0|none  |     0|acc   |0.5916|±  |0.0279|
|  - prehistory                         |      0|none  |     0|acc   |0.5741|±  |0.0275|
|  - professional_law                   |      0|none  |     0|acc   |0.3931|±  |0.0125|
|  - world_religions                    |      0|none  |     0|acc   |0.6667|±  |0.0362|
| - other                               |N/A    |none  |     0|acc   |0.5291|±  |0.0087|
|  - business_ethics                    |      0|none  |     0|acc   |0.5400|±  |0.0501|
|  - clinical_knowledge                 |      0|none  |     0|acc   |0.4340|±  |0.0305|
|  - college_medicine                   |      0|none  |     0|acc   |0.4624|±  |0.0380|
|  - global_facts                       |      0|none  |     0|acc   |0.2800|±  |0.0451|
|  - human_aging                        |      0|none  |     0|acc   |0.5112|±  |0.0335|
|  - management                         |      0|none  |     0|acc   |0.6505|±  |0.0472|
|  - marketing                          |      0|none  |     0|acc   |0.6923|±  |0.0302|
|  - medical_genetics                   |      0|none  |     0|acc   |0.5100|±  |0.0502|
|  - miscellaneous                      |      0|none  |     0|acc   |0.6501|±  |0.0171|
|  - nutrition                          |      0|none  |     0|acc   |0.5000|±  |0.0286|
|  - professional_accounting            |      0|none  |     0|acc   |0.3546|±  |0.0285|
|  - professional_medicine              |      0|none  |     0|acc   |0.5037|±  |0.0304|
|  - virology                           |      0|none  |     0|acc   |0.4458|±  |0.0387|
| - social_sciences                     |N/A    |none  |     0|acc   |0.5518|±  |0.0088|
|  - econometrics                       |      0|none  |     0|acc   |0.2456|±  |0.0405|
|  - high_school_geography              |      0|none  |     0|acc   |0.5606|±  |0.0354|
|  - high_school_government_and_politics|      0|none  |     0|acc   |0.6839|±  |0.0336|
|  - high_school_macroeconomics         |      0|none  |     0|acc   |0.4692|±  |0.0253|
|  - high_school_microeconomics         |      0|none  |     0|acc   |0.5042|±  |0.0325|
|  - high_school_psychology             |      0|none  |     0|acc   |0.6073|±  |0.0209|
|  - human_sexuality                    |      0|none  |     0|acc   |0.6107|±  |0.0428|
|  - professional_psychology            |      0|none  |     0|acc   |0.4951|±  |0.0202|
|  - public_relations                   |      0|none  |     0|acc   |0.5364|±  |0.0478|
|  - security_studies                   |      0|none  |     0|acc   |0.5959|±  |0.0314|
|  - sociology                          |      0|none  |     0|acc   |0.6617|±  |0.0335|
|  - us_foreign_policy                  |      0|none  |     0|acc   |0.7200|±  |0.0451|
| - stem                                |N/A    |none  |     0|acc   |0.3907|±  |0.0085|
|  - abstract_algebra                   |      0|none  |     0|acc   |0.2900|±  |0.0456|
|  - anatomy                            |      0|none  |     0|acc   |0.4889|±  |0.0432|
|  - astronomy                          |      0|none  |     0|acc   |0.5132|±  |0.0407|
|  - college_biology                    |      0|none  |     0|acc   |0.4792|±  |0.0418|
|  - college_chemistry                  |      0|none  |     0|acc   |0.3700|±  |0.0485|
|  - college_computer_science           |      0|none  |     0|acc   |0.4200|±  |0.0496|
|  - college_mathematics                |      0|none  |     0|acc   |0.3500|±  |0.0479|
|  - college_physics                    |      0|none  |     0|acc   |0.2941|±  |0.0453|
|  - computer_security                  |      0|none  |     0|acc   |0.6200|±  |0.0488|
|  - conceptual_physics                 |      0|none  |     0|acc   |0.3106|±  |0.0303|
|  - electrical_engineering             |      0|none  |     0|acc   |0.4621|±  |0.0415|
|  - elementary_mathematics             |      0|none  |     0|acc   |0.3148|±  |0.0239|
|  - high_school_biology                |      0|none  |     0|acc   |0.5581|±  |0.0283|
|  - high_school_chemistry              |      0|none  |     0|acc   |0.3153|±  |0.0327|
|  - high_school_computer_science       |      0|none  |     0|acc   |0.4700|±  |0.0502|
|  - high_school_mathematics            |      0|none  |     0|acc   |0.2704|±  |0.0271|
|  - high_school_physics                |      0|none  |     0|acc   |0.2583|±  |0.0357|
|  - high_school_statistics             |      0|none  |     0|acc   |0.3981|±  |0.0334|
|  - machine_learning                   |      0|none  |     0|acc   |0.3839|±  |0.0462|
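
The Stderr column appears consistent with the standard error of a mean over 0/1 outcomes, computed with the sample (n - 1) variance. The sketch below checks that reading against the business_ethics row. The function name `accuracy_stderr` is hypothetical, and the question count of 100 is an assumption, since the table does not report per-task sample sizes.

```python
import math


def accuracy_stderr(acc: float, n: int) -> float:
    """Standard error of an accuracy over n 0/1 outcomes.

    Uses the sample variance (n - 1 denominator). This is an assumed
    reading of the Stderr column, not something the table states.
    """
    return math.sqrt(acc * (1.0 - acc) / (n - 1))


# business_ethics row: acc = 0.5400, reported Stderr = 0.0501.
# n = 100 is an assumed question count; it is not given in the table.
print(round(accuracy_stderr(0.5400, 100), 4))  # 0.0501
```

Under this reading, tasks with few questions carry wide error bars, so small gaps between individual subtask scores should not be over-interpreted.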