# SOLAR-10.7B-Instruct-v1.0-laser

This version of SOLAR-10.7B-Instruct was processed with LASER (LAyer-SElective Rank reduction), and perplexity was calculated against GSM8K before and after the intervention.

- Initial model perplexity: 12.865185737609863
- Post-laser (new baseline) perplexity: 12.554274559020996

The LASER process decreased perplexity by 2.41%.
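For illustration, here is a minimal sketch of the LASER idea: replace the weight matrix of one selected layer with a truncated-SVD (low-rank) approximation, then compare perplexity on a GSM8K sample before and after. The layer index, rank fraction, sample size, and starting model below are placeholder assumptions for the sketch, not the settings actually used to produce this model.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "upstage/SOLAR-10.7B-Instruct-v1.0"  # base model, for illustration
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

@torch.no_grad()
def reduce_rank_(linear: torch.nn.Linear, keep: float) -> None:
    """Overwrite a Linear layer's weight with a truncated-SVD copy."""
    W = linear.weight.float()
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    k = max(1, int(keep * S.numel()))  # number of singular values to keep
    W_k = U[:, :k] @ torch.diag(S[:k]) @ Vh[:k, :]
    linear.weight.copy_(W_k.to(linear.weight.dtype))

@torch.no_grad()
def perplexity(texts) -> float:
    """exp(mean cross-entropy) over a list of strings."""
    losses = []
    for text in texts:
        ids = tok(text, return_tensors="pt").input_ids.to(model.device)
        losses.append(model(ids, labels=ids).loss)
    return torch.exp(torch.stack(losses).mean()).item()

sample = [row["question"] for row in
          load_dataset("gsm8k", "main", split="test").select(range(32))]
print("perplexity before:", perplexity(sample))

# Rank-reduce one MLP projection in one decoder layer (placeholder choice).
reduce_rank_(model.model.layers[28].mlp.down_proj, keep=0.25)
print("perplexity after:", perplexity(sample))
```

In the actual LASER procedure, the layer and rank are chosen by searching for the combination that most improves the evaluation metric; the sketch above applies a single hard-coded choice only to show the mechanics.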

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|-------|--------:|--------:|-----------:|---------:|--------:|
| SOLAR-10.7B-Instruct-v1.0-laser | 46.9 | 74.99 | 70.64 | 43.74 | 59.07 |
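These tables follow the output layout of EleutherAI's lm-evaluation-harness. A minimal sketch of reproducing one sub-task with the harness's Python API (v0.4+) is below; the card does not state the harness version, few-shot settings, or batch size actually used (the task names in the tables suggest an older v0.3-era run), so treat this configuration as an assumption.

```python
import lm_eval  # EleutherAI lm-evaluation-harness, v0.4+

results = lm_eval.simple_evaluate(
    model="hf",
    model_args=(
        "pretrained=macadeliccc/SOLAR-10.7B-Instruct-v1.0-laser,"
        "dtype=float16"
    ),
    tasks=["arc_challenge"],  # one GPT4All sub-task, as an example
    batch_size=8,
)
print(results["results"]["arc_challenge"])
```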

## AGIEval

| Task | Version | Metric | Value | | Stderr |
|------|--------:|--------|------:|---|-------:|
| agieval_aqua_rat | 0 | acc | 29.53 | ± | 2.87 |
| | | acc_norm | 28.35 | ± | 2.83 |
| agieval_logiqa_en | 0 | acc | 39.78 | ± | 1.92 |
| | | acc_norm | 40.55 | ± | 1.93 |
| agieval_lsat_ar | 0 | acc | 23.04 | ± | 2.78 |
| | | acc_norm | 21.30 | ± | 2.71 |
| agieval_lsat_lr | 0 | acc | 51.18 | ± | 2.22 |
| | | acc_norm | 51.76 | ± | 2.21 |
| agieval_lsat_rc | 0 | acc | 66.54 | ± | 2.88 |
| | | acc_norm | 66.91 | ± | 2.87 |
| agieval_sat_en | 0 | acc | 78.16 | ± | 2.89 |
| | | acc_norm | 78.16 | ± | 2.89 |
| agieval_sat_en_without_passage | 0 | acc | 50.97 | ± | 3.49 |
| | | acc_norm | 50.00 | ± | 3.49 |
| agieval_sat_math | 0 | acc | 42.73 | ± | 3.34 |
| | | acc_norm | 38.18 | ± | 3.28 |

Average: 46.9%

## GPT4All

| Task | Version | Metric | Value | | Stderr |
|------|--------:|--------|------:|---|-------:|
| arc_challenge | 0 | acc | 60.84 | ± | 1.43 |
| | | acc_norm | 63.99 | ± | 1.40 |
| arc_easy | 0 | acc | 83.59 | ± | 0.76 |
| | | acc_norm | 81.44 | ± | 0.80 |
| boolq | 1 | acc | 87.58 | ± | 0.58 |
| hellaswag | 0 | acc | 68.11 | ± | 0.47 |
| | | acc_norm | 85.77 | ± | 0.35 |
| openbookqa | 0 | acc | 35.40 | ± | 2.14 |
| | | acc_norm | 48.40 | ± | 2.24 |
| piqa | 0 | acc | 80.58 | ± | 0.92 |
| | | acc_norm | 80.74 | ± | 0.92 |
| winogrande | 0 | acc | 77.03 | ± | 1.18 |

Average: 74.99%

## TruthfulQA

| Task | Version | Metric | Value | | Stderr |
|------|--------:|--------|------:|---|-------:|
| truthfulqa_mc | 1 | mc1 | 55.45 | ± | 1.74 |
| | | mc2 | 70.64 | ± | 1.49 |

Average: 70.64%

## Bigbench

| Task | Version | Metric | Value | | Stderr |
|------|--------:|--------|------:|---|-------:|
| bigbench_causal_judgement | 0 | multiple_choice_grade | 57.37 | ± | 3.60 |
| bigbench_date_understanding | 0 | multiple_choice_grade | 62.87 | ± | 2.52 |
| bigbench_disambiguation_qa | 0 | multiple_choice_grade | 35.66 | ± | 2.99 |
| bigbench_geometric_shapes | 0 | multiple_choice_grade | 33.15 | ± | 2.49 |
| | | exact_str_match | 0.00 | ± | 0.00 |
| bigbench_logical_deduction_five_objects | 0 | multiple_choice_grade | 26.20 | ± | 1.97 |
| bigbench_logical_deduction_seven_objects | 0 | multiple_choice_grade | 19.71 | ± | 1.50 |
| bigbench_logical_deduction_three_objects | 0 | multiple_choice_grade | 45.00 | ± | 2.88 |
| bigbench_movie_recommendation | 0 | multiple_choice_grade | 39.00 | ± | 2.18 |
| bigbench_navigate | 0 | multiple_choice_grade | 51.20 | ± | 1.58 |
| bigbench_reasoning_about_colored_objects | 0 | multiple_choice_grade | 53.90 | ± | 1.11 |
| bigbench_ruin_names | 0 | multiple_choice_grade | 40.18 | ± | 2.32 |
| bigbench_salient_translation_error_detection | 0 | multiple_choice_grade | 39.98 | ± | 1.55 |
| bigbench_snarks | 0 | multiple_choice_grade | 63.54 | ± | 3.59 |
| bigbench_sports_understanding | 0 | multiple_choice_grade | 68.36 | ± | 1.48 |
| bigbench_temporal_sequences | 0 | multiple_choice_grade | 65.20 | ± | 1.51 |
| bigbench_tracking_shuffled_objects_five_objects | 0 | multiple_choice_grade | 22.48 | ± | 1.18 |
| bigbench_tracking_shuffled_objects_seven_objects | 0 | multiple_choice_grade | 18.46 | ± | 0.93 |
| bigbench_tracking_shuffled_objects_three_objects | 0 | multiple_choice_grade | 45.00 | ± | 2.88 |

Average: 43.74%

Average score: 59.07%

Elapsed time: 02:33:24

## Inference

This model does not yet have enough activity to be deployed to the serverless Inference API; deploy it to dedicated Inference Endpoints or run it locally instead.
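For local use, a minimal transformers sketch is shown below. It assumes the tokenizer ships the upstream SOLAR chat template and that a GPU with enough memory for the fp16 weights (roughly 21 GB for 10.7B parameters) is available; the prompt is only an example.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "macadeliccc/SOLAR-10.7B-Instruct-v1.0-laser"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

messages = [
    {"role": "user", "content": "Explain low-rank approximation in one paragraph."}
]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```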
