license: apache-2.0
tags:
- merge
- mergekit
MetaModel
This model is a merge of the following models made with mergekit:
🧩 Configuration
slices:
- sources:
- model: jeonsworld/CarbonVillain-en-10.7B-v4
layer_range: [0, 48]
- model: kekmodel/StopCarbon-10.7B-v5
layer_range: [0, 48]
merge_method: slerp
base_model: jeonsworld/CarbonVillain-en-10.7B-v4
parameters:
t:
- filter: self_attn
value: [0, 0.5, 0.3, 0.7, 1]
- filter: mlp
value: [1, 0.5, 0.7, 0.3, 0]
- value: 0.5
dtype: bfloat16
Dataset Card for Evaluation run of gagan3012/MetaModel
Dataset automatically created during the evaluation run of model gagan3012/MetaModel on the Open LLM Leaderboard.
The dataset is composed of 63 configuration, each one coresponding to one of the evaluated task.
The dataset has been created from 1 run(s). Each run can be found as a specific split in each configuration, the split being named using the timestamp of the run.The "train" split is always pointing to the latest results.
An additional configuration "results" store all the aggregated results of the run (and is used to compute and display the aggregated metrics on the Open LLM Leaderboard).
To load the details from a run, you can for instance do the following:
from datasets import load_dataset
data = load_dataset("open-llm-leaderboard/details_gagan3012__MetaModel",
"harness_winogrande_5",
split="train")
Latest results
These are the latest results from run 2024-01-04T14:09:43.780941(note that their might be results for other tasks in the repos if successive evals didn't cover the same tasks. You find each in the results and the "latest" split for each eval):
{
"all": {
"acc": 0.6664380298886512,
"acc_stderr": 0.031642195230944255,
"acc_norm": 0.6671639222858992,
"acc_norm_stderr": 0.03228745343467652,
"mc1": 0.5691554467564259,
"mc1_stderr": 0.01733527247533237,
"mc2": 0.7184177934834866,
"mc2_stderr": 0.014995634120330182
},
"harness|arc:challenge|25": {
"acc": 0.6843003412969283,
"acc_stderr": 0.013582571095815291,
"acc_norm": 0.7107508532423208,
"acc_norm_stderr": 0.01325001257939344
},
"harness|hellaswag|10": {
"acc": 0.7132045409281019,
"acc_stderr": 0.004513409114983828,
"acc_norm": 0.8844851623182632,
"acc_norm_stderr": 0.0031898897894046684
},
"harness|hendrycksTest-abstract_algebra|5": {
"acc": 0.43,
"acc_stderr": 0.049756985195624284,
"acc_norm": 0.43,
"acc_norm_stderr": 0.049756985195624284
},
"harness|hendrycksTest-anatomy|5": {
"acc": 0.6148148148148148,
"acc_stderr": 0.04203921040156279,
"acc_norm": 0.6148148148148148,
"acc_norm_stderr": 0.04203921040156279
},
"harness|hendrycksTest-astronomy|5": {
"acc": 0.743421052631579,
"acc_stderr": 0.0355418036802569,
"acc_norm": 0.743421052631579,
"acc_norm_stderr": 0.0355418036802569
},
"harness|hendrycksTest-business_ethics|5": {
"acc": 0.75,
"acc_stderr": 0.04351941398892446,
"acc_norm": 0.75,
"acc_norm_stderr": 0.04351941398892446
},
"harness|hendrycksTest-clinical_knowledge|5": {
"acc": 0.6830188679245283,
"acc_stderr": 0.02863723563980089,
"acc_norm": 0.6830188679245283,
"acc_norm_stderr": 0.02863723563980089
},
"harness|hendrycksTest-college_biology|5": {
"acc": 0.7638888888888888,
"acc_stderr": 0.03551446610810826,
"acc_norm": 0.7638888888888888,
"acc_norm_stderr": 0.03551446610810826
},
"harness|hendrycksTest-college_chemistry|5": {
"acc": 0.47,
"acc_stderr": 0.050161355804659205,
"acc_norm": 0.47,
"acc_norm_stderr": 0.050161355804659205
},
"harness|hendrycksTest-college_computer_science|5": {
"acc": 0.48,
"acc_stderr": 0.05021167315686781,
"acc_norm": 0.48,
"acc_norm_stderr": 0.05021167315686781
},
"harness|hendrycksTest-college_mathematics|5": {
"acc": 0.32,
"acc_stderr": 0.046882617226215034,
"acc_norm": 0.32,
"acc_norm_stderr": 0.046882617226215034
},
"harness|hendrycksTest-college_medicine|5": {
"acc": 0.6647398843930635,
"acc_stderr": 0.03599586301247077,
"acc_norm": 0.6647398843930635,
"acc_norm_stderr": 0.03599586301247077
},
"harness|hendrycksTest-college_physics|5": {
"acc": 0.38235294117647056,
"acc_stderr": 0.04835503696107223,
"acc_norm": 0.38235294117647056,
"acc_norm_stderr": 0.04835503696107223
},
"harness|hendrycksTest-computer_security|5": {
"acc": 0.75,
"acc_stderr": 0.04351941398892446,
"acc_norm": 0.75,
"acc_norm_stderr": 0.04351941398892446
},
"harness|hendrycksTest-conceptual_physics|5": {
"acc": 0.625531914893617,
"acc_stderr": 0.03163910665367291,
"acc_norm": 0.625531914893617,
"acc_norm_stderr": 0.03163910665367291
},
"harness|hendrycksTest-econometrics|5": {
"acc": 0.4824561403508772,
"acc_stderr": 0.04700708033551038,
"acc_norm": 0.4824561403508772,
"acc_norm_stderr": 0.04700708033551038
},
"harness|hendrycksTest-electrical_engineering|5": {
"acc": 0.6413793103448275,
"acc_stderr": 0.039966295748767186,
"acc_norm": 0.6413793103448275,
"acc_norm_stderr": 0.039966295748767186
},
"harness|hendrycksTest-elementary_mathematics|5": {
"acc": 0.5,
"acc_stderr": 0.025751310131230234,
"acc_norm": 0.5,
"acc_norm_stderr": 0.025751310131230234
},
"harness|hendrycksTest-formal_logic|5": {
"acc": 0.42857142857142855,
"acc_stderr": 0.0442626668137991,
"acc_norm": 0.42857142857142855,
"acc_norm_stderr": 0.0442626668137991
},
"harness|hendrycksTest-global_facts|5": {
"acc": 0.35,
"acc_stderr": 0.047937248544110196,
"acc_norm": 0.35,
"acc_norm_stderr": 0.047937248544110196
},
"harness|hendrycksTest-high_school_biology|5": {
"acc": 0.8129032258064516,
"acc_stderr": 0.022185710092252252,
"acc_norm": 0.8129032258064516,
"acc_norm_stderr": 0.022185710092252252
},
"harness|hendrycksTest-high_school_chemistry|5": {
"acc": 0.5073891625615764,
"acc_stderr": 0.035176035403610105,
"acc_norm": 0.5073891625615764,
"acc_norm_stderr": 0.035176035403610105
},
"harness|hendrycksTest-high_school_computer_science|5": {
"acc": 0.72,
"acc_stderr": 0.04512608598542128,
"acc_norm": 0.72,
"acc_norm_stderr": 0.04512608598542128
},
"harness|hendrycksTest-high_school_european_history|5": {
"acc": 0.8121212121212121,
"acc_stderr": 0.03050193405942914,
"acc_norm": 0.8121212121212121,
"acc_norm_stderr": 0.03050193405942914
},
"harness|hendrycksTest-high_school_geography|5": {
"acc": 0.8636363636363636,
"acc_stderr": 0.024450155973189835,
"acc_norm": 0.8636363636363636,
"acc_norm_stderr": 0.024450155973189835
},
"harness|hendrycksTest-high_school_government_and_politics|5": {
"acc": 0.8963730569948186,
"acc_stderr": 0.021995311963644244,
"acc_norm": 0.8963730569948186,
"acc_norm_stderr": 0.021995311963644244
},
"harness|hendrycksTest-high_school_macroeconomics|5": {
"acc": 0.6692307692307692,
"acc_stderr": 0.02385479568097114,
"acc_norm": 0.6692307692307692,
"acc_norm_stderr": 0.02385479568097114
},
"harness|hendrycksTest-high_school_mathematics|5": {
"acc": 0.37037037037037035,
"acc_stderr": 0.02944316932303154,
"acc_norm": 0.37037037037037035,
"acc_norm_stderr": 0.02944316932303154
},
"harness|hendrycksTest-high_school_microeconomics|5": {
"acc": 0.7142857142857143,
"acc_stderr": 0.029344572500634332,
"acc_norm": 0.7142857142857143,
"acc_norm_stderr": 0.029344572500634332
},
"harness|hendrycksTest-high_school_physics|5": {
"acc": 0.3708609271523179,
"acc_stderr": 0.03943966699183629,
"acc_norm": 0.3708609271523179,
"acc_norm_stderr": 0.03943966699183629
},
"harness|hendrycksTest-high_school_psychology|5": {
"acc": 0.8422018348623853,
"acc_stderr": 0.01563002297009246,
"acc_norm": 0.8422018348623853,
"acc_norm_stderr": 0.01563002297009246
},
"harness|hendrycksTest-high_school_statistics|5": {
"acc": 0.5740740740740741,
"acc_stderr": 0.03372343271653062,
"acc_norm": 0.5740740740740741,
"acc_norm_stderr": 0.03372343271653062
},
"harness|hendrycksTest-high_school_us_history|5": {
"acc": 0.8578431372549019,
"acc_stderr": 0.02450980392156862,
"acc_norm": 0.8578431372549019,
"acc_norm_stderr": 0.02450980392156862
},
"harness|hendrycksTest-high_school_world_history|5": {
"acc": 0.8565400843881856,
"acc_stderr": 0.022818291821017012,
"acc_norm": 0.8565400843881856,
"acc_norm_stderr": 0.022818291821017012
},
"harness|hendrycksTest-human_aging|5": {
"acc": 0.672645739910314,
"acc_stderr": 0.03149384670994131,
"acc_norm": 0.672645739910314,
"acc_norm_stderr": 0.03149384670994131
},
"harness|hendrycksTest-human_sexuality|5": {
"acc": 0.7557251908396947,
"acc_stderr": 0.03768335959728743,
"acc_norm": 0.7557251908396947,
"acc_norm_stderr": 0.03768335959728743
},
"harness|hendrycksTest-international_law|5": {
"acc": 0.7851239669421488,
"acc_stderr": 0.037494924487096966,
"acc_norm": 0.7851239669421488,
"acc_norm_stderr": 0.037494924487096966
},
"harness|hendrycksTest-jurisprudence|5": {
"acc": 0.8055555555555556,
"acc_stderr": 0.038260763248848646,
"acc_norm": 0.8055555555555556,
"acc_norm_stderr": 0.038260763248848646
},
"harness|hendrycksTest-logical_fallacies|5": {
"acc": 0.754601226993865,
"acc_stderr": 0.03380939813943354,
"acc_norm": 0.754601226993865,
"acc_norm_stderr": 0.03380939813943354
},
"harness|hendrycksTest-machine_learning|5": {
"acc": 0.4732142857142857,
"acc_stderr": 0.047389751192741546,
"acc_norm": 0.4732142857142857,
"acc_norm_stderr": 0.047389751192741546
},
"harness|hendrycksTest-management|5": {
"acc": 0.8446601941747572,
"acc_stderr": 0.035865947385739734,
"acc_norm": 0.8446601941747572,
"acc_norm_stderr": 0.035865947385739734
},
"harness|hendrycksTest-marketing|5": {
"acc": 0.8589743589743589,
"acc_stderr": 0.02280138253459753,
"acc_norm": 0.8589743589743589,
"acc_norm_stderr": 0.02280138253459753
},
"harness|hendrycksTest-medical_genetics|5": {
"acc": 0.7,
"acc_stderr": 0.046056618647183814,
"acc_norm": 0.7,
"acc_norm_stderr": 0.046056618647183814
},
"harness|hendrycksTest-miscellaneous|5": {
"acc": 0.8084291187739464,
"acc_stderr": 0.014072859310451949,
"acc_norm": 0.8084291187739464,
"acc_norm_stderr": 0.014072859310451949
},
"harness|hendrycksTest-moral_disputes|5": {
"acc": 0.7572254335260116,
"acc_stderr": 0.023083658586984204,
"acc_norm": 0.7572254335260116,
"acc_norm_stderr": 0.023083658586984204
},
"harness|hendrycksTest-moral_scenarios|5": {
"acc": 0.39664804469273746,
"acc_stderr": 0.016361354769822468,
"acc_norm": 0.39664804469273746,
"acc_norm_stderr": 0.016361354769822468
},
"harness|hendrycksTest-nutrition|5": {
"acc": 0.7581699346405228,
"acc_stderr": 0.024518195641879334,
"acc_norm": 0.7581699346405228,
"acc_norm_stderr": 0.024518195641879334
},
"harness|hendrycksTest-philosophy|5": {
"acc": 0.7202572347266881,
"acc_stderr": 0.025494259350694905,
"acc_norm": 0.7202572347266881,
"acc_norm_stderr": 0.025494259350694905
},
"harness|hendrycksTest-prehistory|5": {
"acc": 0.7777777777777778,
"acc_stderr": 0.02313237623454333,
"acc_norm": 0.7777777777777778,
"acc_norm_stderr": 0.02313237623454333
},
"harness|hendrycksTest-professional_accounting|5": {
"acc": 0.5035460992907801,
"acc_stderr": 0.02982674915328092,
"acc_norm": 0.5035460992907801,
"acc_norm_stderr": 0.02982674915328092
},
"harness|hendrycksTest-professional_law|5": {
"acc": 0.49478487614080835,
"acc_stderr": 0.012769541449652547,
"acc_norm": 0.49478487614080835,
"acc_norm_stderr": 0.012769541449652547
},
"harness|hendrycksTest-professional_medicine|5": {
"acc": 0.75,
"acc_stderr": 0.026303648393696036,
"acc_norm": 0.75,
"acc_norm_stderr": 0.026303648393696036
},
"harness|hendrycksTest-professional_psychology|5": {
"acc": 0.6813725490196079,
"acc_stderr": 0.018850084696468712,
"acc_norm": 0.6813725490196079,
"acc_norm_stderr": 0.018850084696468712
},
"harness|hendrycksTest-public_relations|5": {
"acc": 0.6818181818181818,
"acc_stderr": 0.04461272175910509,
"acc_norm": 0.6818181818181818,
"acc_norm_stderr": 0.04461272175910509
},
"harness|hendrycksTest-security_studies|5": {
"acc": 0.746938775510204,
"acc_stderr": 0.027833023871399677,
"acc_norm": 0.746938775510204,
"acc_norm_stderr": 0.027833023871399677
},
"harness|hendrycksTest-sociology|5": {
"acc": 0.8258706467661692,
"acc_stderr": 0.026814951200421603,
"acc_norm": 0.8258706467661692,
"acc_norm_stderr": 0.026814951200421603
},
"harness|hendrycksTest-us_foreign_policy|5": {
"acc": 0.91,
"acc_stderr": 0.028762349126466125,
"acc_norm": 0.91,
"acc_norm_stderr": 0.028762349126466125
},
"harness|hendrycksTest-virology|5": {
"acc": 0.5783132530120482,
"acc_stderr": 0.038444531817709175,
"acc_norm": 0.5783132530120482,
"acc_norm_stderr": 0.038444531817709175
},
"harness|hendrycksTest-world_religions|5": {
"acc": 0.7777777777777778,
"acc_stderr": 0.03188578017686398,
"acc_norm": 0.7777777777777778,
"acc_norm_stderr": 0.03188578017686398
},
"harness|truthfulqa:mc|0": {
"mc1": 0.5691554467564259,
"mc1_stderr": 0.01733527247533237,
"mc2": 0.7184177934834866,
"mc2_stderr": 0.014995634120330182
},
"harness|winogrande|5": {
"acc": 0.8342541436464088,
"acc_stderr": 0.010450899545370632
},
"harness|gsm8k|5": {
"acc": 0.6535253980288097,
"acc_stderr": 0.013107179054313398
}
}
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
Metric | Value |
---|---|
Avg. | 74.4 |
ARC (25-shot) | 71.08 |
HellaSwag (10-shot) | 88.45 |
MMLU (5-shot) | 66.26 |
TruthfulQA (0-shot) | 71.84 |
Winogrande (5-shot) | 83.43 |
GSM8K (5-shot) | 65.35 |