---
license: cc-by-nc-2.0
---
|
# SOLAR-10.7B-Instruct-v1.0-laser
|
|
|
This version of SOLAR-10.7B-Instruct-v1.0 was processed with LASER (LAyer-SElective Rank reduction), and perplexity was measured on GSM8K before and after the procedure.
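LASER works by replacing selected weight matrices with low-rank approximations obtained from a truncated SVD. The specific layers and rank fraction used for this model are not documented in this card, so the snippet below is only a minimal sketch of the core operation on a single hypothetical weight matrix.

```python
# Minimal sketch of the rank-reduction step behind LASER: replace a weight
# matrix with a truncated-SVD approximation. The matrix shape and rank
# fraction below are placeholders, not the values used for this model.
import torch

def low_rank_approximation(weight: torch.Tensor, rank_fraction: float) -> torch.Tensor:
    """Keep only the top singular components of `weight`."""
    U, S, Vh = torch.linalg.svd(weight.float(), full_matrices=False)
    k = max(1, int(rank_fraction * S.numel()))
    return (U[:, :k] * S[:k]) @ Vh[:k, :]

# Toy example on a hypothetical projection matrix.
W = torch.randn(512, 2048)
W_approx = low_rank_approximation(W, rank_fraction=0.10)
print(W_approx.shape, torch.linalg.matrix_rank(W_approx).item())  # torch.Size([512, 2048]) 51
```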
|
|
|
+ Initial model perplexity: 12.865185737609863
+ New (post-LASER) baseline perplexity: 12.554274559020996

The LASER process decreased perplexity by 2.41%.
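The card does not state the exact script or settings behind these perplexity numbers, so the following is only a minimal sketch, assuming the GSM8K test split, question-plus-answer text, and per-sample scoring, of how a comparable figure could be measured.

```python
# Illustrative perplexity measurement over GSM8K; split, text fields, and
# subsampling here are assumptions, not the card's exact evaluation setup.
import math
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "macadeliccc/SOLAR-10.7B-Instruct-v1.0-laser"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

data = load_dataset("gsm8k", "main", split="test")

total_nll, total_tokens = 0.0, 0
with torch.no_grad():
    for row in data.select(range(200)):  # subsample for a quick estimate
        text = row["question"] + " " + row["answer"]
        enc = tokenizer(text, return_tensors="pt").to(model.device)
        out = model(**enc, labels=enc["input_ids"])
        n_targets = enc["input_ids"].numel() - 1  # loss is averaged over shifted targets
        total_nll += out.loss.item() * n_targets
        total_tokens += n_targets

print(f"Perplexity: {math.exp(total_nll / total_tokens):.4f}")
```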
|
| Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
|-----------------------------------------------------------------------------------------------------|------:|------:|---------:|-------:|------:|
|[SOLAR-10.7B-Instruct-v1.0-laser](https://huggingface.co/macadeliccc/SOLAR-10.7B-Instruct-v1.0-laser)| 46.9| 74.99| 70.64| 43.74| 59.07|
|
|
|
### AGIEval

| Task |Version| Metric |Value| |Stderr|
|------------------------------|------:|--------|----:|---|-----:|
|agieval_aqua_rat | 0|acc |29.53|± | 2.87|
| | |acc_norm|28.35|± | 2.83|
|agieval_logiqa_en | 0|acc |39.78|± | 1.92|
| | |acc_norm|40.55|± | 1.93|
|agieval_lsat_ar | 0|acc |23.04|± | 2.78|
| | |acc_norm|21.30|± | 2.71|
|agieval_lsat_lr | 0|acc |51.18|± | 2.22|
| | |acc_norm|51.76|± | 2.21|
|agieval_lsat_rc | 0|acc |66.54|± | 2.88|
| | |acc_norm|66.91|± | 2.87|
|agieval_sat_en | 0|acc |78.16|± | 2.89|
| | |acc_norm|78.16|± | 2.89|
|agieval_sat_en_without_passage| 0|acc |50.97|± | 3.49|
| | |acc_norm|50.00|± | 3.49|
|agieval_sat_math | 0|acc |42.73|± | 3.34|
| | |acc_norm|38.18|± | 3.28|

Average: 46.9%
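For reference, each suite's average above appears to be the simple mean of its tasks' primary metric (acc_norm where reported, otherwise acc, mc2, or multiple_choice_grade), and the overall score is the mean of the four suite averages. A quick check against the AGIEval rows and the summary table:

```python
# Sanity check of how the averages above combine (values copied from the tables).
agieval_acc_norm = [28.35, 40.55, 21.30, 51.76, 66.91, 78.16, 50.00, 38.18]
print(round(sum(agieval_acc_norm) / len(agieval_acc_norm), 2))  # -> 46.9

suite_averages = [46.9, 74.99, 70.64, 43.74]  # AGIEval, GPT4All, TruthfulQA, Bigbench
print(round(sum(suite_averages) / len(suite_averages), 2))      # -> 59.07
```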
|
|
|
### GPT4All

| Task |Version| Metric |Value| |Stderr|
|-------------|------:|--------|----:|---|-----:|
|arc_challenge| 0|acc |60.84|± | 1.43|
| | |acc_norm|63.99|± | 1.40|
|arc_easy | 0|acc |83.59|± | 0.76|
| | |acc_norm|81.44|± | 0.80|
|boolq | 1|acc |87.58|± | 0.58|
|hellaswag | 0|acc |68.11|± | 0.47|
| | |acc_norm|85.77|± | 0.35|
|openbookqa | 0|acc |35.40|± | 2.14|
| | |acc_norm|48.40|± | 2.24|
|piqa | 0|acc |80.58|± | 0.92|
| | |acc_norm|80.74|± | 0.92|
|winogrande | 0|acc |77.03|± | 1.18|

Average: 74.99%
|
|
|
### TruthfulQA

| Task |Version|Metric|Value| |Stderr|
|-------------|------:|------|----:|---|-----:|
|truthfulqa_mc| 1|mc1 |55.45|± | 1.74|
| | |mc2 |70.64|± | 1.49|

Average: 70.64%
|
|
|
### Bigbench

| Task |Version| Metric |Value| |Stderr|
|------------------------------------------------|------:|---------------------|----:|---|-----:|
|bigbench_causal_judgement | 0|multiple_choice_grade|57.37|± | 3.60|
|bigbench_date_understanding | 0|multiple_choice_grade|62.87|± | 2.52|
|bigbench_disambiguation_qa | 0|multiple_choice_grade|35.66|± | 2.99|
|bigbench_geometric_shapes | 0|multiple_choice_grade|33.15|± | 2.49|
| | |exact_str_match | 0.00|± | 0.00|
|bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|26.20|± | 1.97|
|bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|19.71|± | 1.50|
|bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|45.00|± | 2.88|
|bigbench_movie_recommendation | 0|multiple_choice_grade|39.00|± | 2.18|
|bigbench_navigate | 0|multiple_choice_grade|51.20|± | 1.58|
|bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|53.90|± | 1.11|
|bigbench_ruin_names | 0|multiple_choice_grade|40.18|± | 2.32|
|bigbench_salient_translation_error_detection | 0|multiple_choice_grade|39.98|± | 1.55|
|bigbench_snarks | 0|multiple_choice_grade|63.54|± | 3.59|
|bigbench_sports_understanding | 0|multiple_choice_grade|68.36|± | 1.48|
|bigbench_temporal_sequences | 0|multiple_choice_grade|65.20|± | 1.51|
|bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|22.48|± | 1.18|
|bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|18.46|± | 0.93|
|bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|45.00|± | 2.88|

Average: 43.74%

Average score: 59.07%
|
|
|
Elapsed time: 02:33:24