---
license: cc-by-nc-2.0
---
|
# SOLAR-10.7B-Instruct-v1.0-laser
|
|
|
This version of SOLAR-10.7B-Instruct-v1.0 was processed with LASER (LAyer-SElective Rank reduction), and perplexity was measured on GSM8K before and after the procedure.
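LASER works by replacing selected weight matrices with low-rank approximations obtained from a truncated SVD. The specific layers and rank fraction used for this model are not documented in this card, so the snippet below is only a minimal sketch of the core operation on a single hypothetical weight matrix.

```python
# Minimal sketch of the rank-reduction step behind LASER: replace a weight
# matrix with a truncated-SVD approximation. The matrix shape and rank
# fraction below are placeholders, not the values used for this model.
import torch

def low_rank_approximation(weight: torch.Tensor, rank_fraction: float) -> torch.Tensor:
    """Keep only the top singular components of `weight`."""
    U, S, Vh = torch.linalg.svd(weight.float(), full_matrices=False)
    k = max(1, int(rank_fraction * S.numel()))
    return (U[:, :k] * S[:k]) @ Vh[:k, :]

# Toy example on a hypothetical projection matrix.
W = torch.randn(512, 2048)
W_approx = low_rank_approximation(W, rank_fraction=0.10)
print(W_approx.shape, torch.linalg.matrix_rank(W_approx).item())  # torch.Size([512, 2048]) 51
```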
|
|
|
+ Initial model perplexity: 12.865185737609863
+ New (post-LASER) baseline perplexity: 12.554274559020996

The LASER process decreased perplexity by 2.41%.
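The card does not state the exact script or settings behind these perplexity numbers, so the following is only a minimal sketch, assuming the GSM8K test split, question-plus-answer text, and per-sample scoring, of how a comparable figure could be measured.

```python
# Illustrative perplexity measurement over GSM8K; split, text fields, and
# subsampling here are assumptions, not the card's exact evaluation setup.
import math
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "macadeliccc/SOLAR-10.7B-Instruct-v1.0-laser"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

data = load_dataset("gsm8k", "main", split="test")

total_nll, total_tokens = 0.0, 0
with torch.no_grad():
    for row in data.select(range(200)):  # subsample for a quick estimate
        text = row["question"] + " " + row["answer"]
        enc = tokenizer(text, return_tensors="pt").to(model.device)
        out = model(**enc, labels=enc["input_ids"])
        n_targets = enc["input_ids"].numel() - 1  # loss is averaged over shifted targets
        total_nll += out.loss.item() * n_targets
        total_tokens += n_targets

print(f"Perplexity: {math.exp(total_nll / total_tokens):.4f}")
```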
|
| Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
|-----------------------------------------------------------------------------------------------------|------:|------:|---------:|-------:|------:|
|[SOLAR-10.7B-Instruct-v1.0-laser](https://huggingface.co/macadeliccc/SOLAR-10.7B-Instruct-v1.0-laser)| 46.9| 74.99| 70.64| 43.74| 59.07|
|
|
|
### AGIEval

| Task |Version| Metric |Value| |Stderr|
|------------------------------|------:|--------|----:|---|-----:|
|agieval_aqua_rat | 0|acc |29.53|± | 2.87|
| | |acc_norm|28.35|± | 2.83|
|agieval_logiqa_en | 0|acc |39.78|± | 1.92|
| | |acc_norm|40.55|± | 1.93|
|agieval_lsat_ar | 0|acc |23.04|± | 2.78|
| | |acc_norm|21.30|± | 2.71|
|agieval_lsat_lr | 0|acc |51.18|± | 2.22|
| | |acc_norm|51.76|± | 2.21|
|agieval_lsat_rc | 0|acc |66.54|± | 2.88|
| | |acc_norm|66.91|± | 2.87|
|agieval_sat_en | 0|acc |78.16|± | 2.89|
| | |acc_norm|78.16|± | 2.89|
|agieval_sat_en_without_passage| 0|acc |50.97|± | 3.49|
| | |acc_norm|50.00|± | 3.49|
|agieval_sat_math | 0|acc |42.73|± | 3.34|
| | |acc_norm|38.18|± | 3.28|

Average: 46.9%
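For reference, each suite's average above appears to be the simple mean of its tasks' primary metric (acc_norm where reported, otherwise acc, mc2, or multiple_choice_grade), and the overall score is the mean of the four suite averages. A quick check against the AGIEval rows and the summary table:

```python
# Sanity check of how the averages above combine (values copied from the tables).
agieval_acc_norm = [28.35, 40.55, 21.30, 51.76, 66.91, 78.16, 50.00, 38.18]
print(round(sum(agieval_acc_norm) / len(agieval_acc_norm), 2))  # -> 46.9

suite_averages = [46.9, 74.99, 70.64, 43.74]  # AGIEval, GPT4All, TruthfulQA, Bigbench
print(round(sum(suite_averages) / len(suite_averages), 2))      # -> 59.07
```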
|
|
|
### GPT4All

| Task |Version| Metric |Value| |Stderr|
|-------------|------:|--------|----:|---|-----:|
|arc_challenge| 0|acc |60.84|± | 1.43|
| | |acc_norm|63.99|± | 1.40|
|arc_easy | 0|acc |83.59|± | 0.76|
| | |acc_norm|81.44|± | 0.80|
|boolq | 1|acc |87.58|± | 0.58|
|hellaswag | 0|acc |68.11|± | 0.47|
| | |acc_norm|85.77|± | 0.35|
|openbookqa | 0|acc |35.40|± | 2.14|
| | |acc_norm|48.40|± | 2.24|
|piqa | 0|acc |80.58|± | 0.92|
| | |acc_norm|80.74|± | 0.92|
|winogrande | 0|acc |77.03|± | 1.18|

Average: 74.99%
|
|
|
### TruthfulQA

| Task |Version|Metric|Value| |Stderr|
|-------------|------:|------|----:|---|-----:|
|truthfulqa_mc| 1|mc1 |55.45|± | 1.74|
| | |mc2 |70.64|± | 1.49|

Average: 70.64%
|
|
|
### Bigbench

| Task |Version| Metric |Value| |Stderr|
|------------------------------------------------|------:|---------------------|----:|---|-----:|
|bigbench_causal_judgement | 0|multiple_choice_grade|57.37|± | 3.60|
|bigbench_date_understanding | 0|multiple_choice_grade|62.87|± | 2.52|
|bigbench_disambiguation_qa | 0|multiple_choice_grade|35.66|± | 2.99|
|bigbench_geometric_shapes | 0|multiple_choice_grade|33.15|± | 2.49|
| | |exact_str_match | 0.00|± | 0.00|
|bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|26.20|± | 1.97|
|bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|19.71|± | 1.50|
|bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|45.00|± | 2.88|
|bigbench_movie_recommendation | 0|multiple_choice_grade|39.00|± | 2.18|
|bigbench_navigate | 0|multiple_choice_grade|51.20|± | 1.58|
|bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|53.90|± | 1.11|
|bigbench_ruin_names | 0|multiple_choice_grade|40.18|± | 2.32|
|bigbench_salient_translation_error_detection | 0|multiple_choice_grade|39.98|± | 1.55|
|bigbench_snarks | 0|multiple_choice_grade|63.54|± | 3.59|
|bigbench_sports_understanding | 0|multiple_choice_grade|68.36|± | 1.48|
|bigbench_temporal_sequences | 0|multiple_choice_grade|65.20|± | 1.51|
|bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|22.48|± | 1.18|
|bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|18.46|± | 0.93|
|bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|45.00|± | 2.88|

Average: 43.74%

Average score: 59.07%
|
|
|
Elapsed time: 02:33:24