---
license: cc-by-nc-2.0
---

# SOLAR-10.7B-Instruct-v1.0-laser

This version of SOLAR-10.7B was lasered, and perplexity was evaluated against gsm8k.

+ Initial model perplexity: 12.865185737609863
+ New baseline perplexity: 12.554274559020996

The laser process decreased perplexity by 2.42%.

| Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
|-----------------------------------------------------------------------------------------------------|------:|------:|---------:|-------:|------:|
|[SOLAR-10.7B-Instruct-v1.0-laser](https://huggingface.co/macadeliccc/SOLAR-10.7B-Instruct-v1.0-laser)| 46.9| 74.99| 70.64| 43.74| 59.07|

### AGIEval

| Task |Version| Metric |Value| |Stderr|
|------------------------------|------:|--------|----:|---|-----:|
|agieval_aqua_rat | 0|acc |29.53|± | 2.87|
| | |acc_norm|28.35|± | 2.83|
|agieval_logiqa_en | 0|acc |39.78|± | 1.92|
| | |acc_norm|40.55|± | 1.93|
|agieval_lsat_ar | 0|acc |23.04|± | 2.78|
| | |acc_norm|21.30|± | 2.71|
|agieval_lsat_lr | 0|acc |51.18|± | 2.22|
| | |acc_norm|51.76|± | 2.21|
|agieval_lsat_rc | 0|acc |66.54|± | 2.88|
| | |acc_norm|66.91|± | 2.87|
|agieval_sat_en | 0|acc |78.16|± | 2.89|
| | |acc_norm|78.16|± | 2.89|
|agieval_sat_en_without_passage| 0|acc |50.97|± | 3.49|
| | |acc_norm|50.00|± | 3.49|
|agieval_sat_math | 0|acc |42.73|± | 3.34|
| | |acc_norm|38.18|± | 3.28|

Average: 46.9%

### GPT4All

| Task |Version| Metric |Value| |Stderr|
|-------------|------:|--------|----:|---|-----:|
|arc_challenge| 0|acc |60.84|± | 1.43|
| | |acc_norm|63.99|± | 1.40|
|arc_easy | 0|acc |83.59|± | 0.76|
| | |acc_norm|81.44|± | 0.80|
|boolq | 1|acc |87.58|± | 0.58|
|hellaswag | 0|acc |68.11|± | 0.47|
| | |acc_norm|85.77|± | 0.35|
|openbookqa | 0|acc |35.40|± | 2.14|
| | |acc_norm|48.40|± | 2.24|
|piqa | 0|acc |80.58|± | 0.92|
| | |acc_norm|80.74|± | 0.92|
|winogrande | 0|acc |77.03|± | 1.18|

Average: 74.99%

### TruthfulQA

| Task |Version|Metric|Value| |Stderr|
|-------------|------:|------|----:|---|-----:|
|truthfulqa_mc| 1|mc1 |55.45|± | 1.74|
| | |mc2 |70.64|± | 1.49|

Average: 70.64%

### Bigbench

| Task |Version| Metric |Value| |Stderr|
|------------------------------------------------|------:|---------------------|----:|---|-----:|
|bigbench_causal_judgement | 0|multiple_choice_grade|57.37|± | 3.60|
|bigbench_date_understanding | 0|multiple_choice_grade|62.87|± | 2.52|
|bigbench_disambiguation_qa | 0|multiple_choice_grade|35.66|± | 2.99|
|bigbench_geometric_shapes | 0|multiple_choice_grade|33.15|± | 2.49|
| | |exact_str_match | 0.00|± | 0.00|
|bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|26.20|± | 1.97|
|bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|19.71|± | 1.50|
|bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|45.00|± | 2.88|
|bigbench_movie_recommendation | 0|multiple_choice_grade|39.00|± | 2.18|
|bigbench_navigate | 0|multiple_choice_grade|51.20|± | 1.58|
|bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|53.90|± | 1.11|
|bigbench_ruin_names | 0|multiple_choice_grade|40.18|± | 2.32|
|bigbench_salient_translation_error_detection | 0|multiple_choice_grade|39.98|± | 1.55|
|bigbench_snarks | 0|multiple_choice_grade|63.54|± | 3.59|
|bigbench_sports_understanding | 0|multiple_choice_grade|68.36|± | 1.48|
|bigbench_temporal_sequences | 0|multiple_choice_grade|65.20|± | 1.51|
|bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|22.48|± | 1.18|
|bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|18.46|± | 0.93|
|bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|45.00|± | 2.88|

Average: 43.74%

Average score: 59.07%

Elapsed time: 02:33:24
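For reference, the reported perplexity decrease can be reproduced from the two numbers above as a relative change. This is a minimal sketch assuming the standard formula (initial − new) / initial; the variable names are illustrative, not part of the laser tooling:

```python
# Relative perplexity improvement from the values reported above.
initial_ppl = 12.865185737609863  # perplexity before lasering
lasered_ppl = 12.554274559020996  # perplexity after lasering

decrease_pct = (initial_ppl - lasered_ppl) / initial_ppl * 100
print(f"Perplexity decreased by {decrease_pct:.2f}%")  # → 2.42%
```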