|
--- |
|
base_model: upstage/SOLAR-10.7B-Instruct-v1.0 |
|
tags: |
|
- alignment-handbook |
|
- generated_from_trainer |
|
- UNA |
|
- single-turn |
|
model-index: |
|
- name: UNA-SOLAR-10.7B-Instruct-v1.0 |
|
  results: []
|
license: cc-by-nc-nd-4.0 |
|
language: |
|
- en |
|
library_name: transformers |
|
--- |
|
|
|
# UNA: Uniform Neural Alignment |
|
|
|
Further SFT (a minimal config sketch follows this list):

- LR scheduler: linear

- Learning rate: 2e-5
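
A minimal sketch of that schedule with the `transformers` `TrainingArguments` API; only the scheduler type and learning rate come from this card, every other value is a placeholder assumption:

```
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="una-solar-sft",     # hypothetical output path
    lr_scheduler_type="linear",     # linear schedule, as stated above
    learning_rate=2e-5,             # peak LR, as stated above
    warmup_ratio=0.03,              # assumption: not specified in this card
    num_train_epochs=1,             # assumption: not specified in this card
    per_device_train_batch_size=1,  # assumption
    bf16=True,                      # assumption
)
```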
|
|
|
Merges:

- Fan in: `0:2`

- Fan out: `-4:`

- Intermediary layers: `1/1/1/0/1/1/0/1/0/1/1/0/1/1/0`; the on/off pattern acts as a form of regularisation (see the sketch below).
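
A hypothetical reading of that notation in plain Python: `0:2` and `-4:` as Python-style slices over the layer stack, and the 1/0 string as an on/off mask over the intermediary slices (the exact slice granularity is not stated here, and the 48-layer count is assumed from SOLAR-10.7B):

```
# Hypothetical interpretation of the merge notation above.
pattern = "1/1/1/0/1/1/0/1/0/1/1/0/1/1/0"
mask = [bit == "1" for bit in pattern.split("/")]

layers = list(range(48))   # assumption: SOLAR-10.7B's 48 decoder layers
fan_in = layers[0:2]       # `0:2` -> [0, 1]
fan_out = layers[-4:]      # `-4:` -> [44, 45, 46, 47]
active = [i for i, on in enumerate(mask) if on]  # indices of "on" slices

print(fan_in, fan_out, active)
```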
|
## Quants |
|
|
|
* [ggml-model-q5_k_m.gguf](https://huggingface.co/fblgit/UNA-SOLAR-10.7B-Instruct-v1.0/resolve/main/ggml-model-q5_k_m.gguf?download=true) |
|
* [ggml-model-q6_k.gguf](https://huggingface.co/fblgit/UNA-SOLAR-10.7B-Instruct-v1.0/resolve/main/ggml-model-q6_k.gguf?download=true) |
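
To fetch a quant programmatically, a sketch using the standard `huggingface_hub` API:

```
from huggingface_hub import hf_hub_download

gguf_path = hf_hub_download(
    repo_id="fblgit/UNA-SOLAR-10.7B-Instruct-v1.0",
    filename="ggml-model-q5_k_m.gguf",
)
print(gguf_path)  # load this file with a llama.cpp-compatible runtime
```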
|
|
|
## Libraries
|
|
|
- Transformers 4.35.0-UNA |
|
- Pytorch 2.1.0 |
|
- Datasets 2.14.6 |
|
- Tokenizers 0.14.1 |
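
Loading follows stock `transformers` (4.35.0-UNA is a custom build, but the public API is the same). A minimal sketch; the prompt template is an assumption inherited from the base SOLAR-10.7B-Instruct-v1.0:

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "fblgit/UNA-SOLAR-10.7B-Instruct-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Assumed prompt format, inherited from SOLAR-10.7B-Instruct-v1.0
prompt = "### User:\nWhat is UNA?\n\n### Assistant:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```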
|
|
|
## Evals

`mt-bench` (single-answer grading, GPT-4 judge):
|
``` |
|
Mode: single |
|
Input file: data/mt_bench/model_judgment/gpt-4_single.jsonl |
|
|
|
########## First turn ########## |
|
score |
|
model turn |
|
gpt-4 1 8.95625 |
|
claude-v1 1 8.15000 |
|
gpt-3.5-turbo 1 8.07500 |
|
LUNA-SOLARkrautLM-Instruct 1 7.93750 |
|
UNA-SOLAR-10.7B-Instruct-v1.0 1 7.80625 |
|
vicuna-33b-v1.3 1 7.45625 |
|
wizardlm-30b 1 7.13125 |
|
tulu-30b 1 7.01875 |
|
vicuna-13b-v1.3 1 6.81250 |
|
guanaco-65b 1 6.78125 |
|
nous-hermes-13b 1 6.43125 |
|
alpaca-13b 1 4.97500 |
|
rwkv-4-raven-14b 1 4.74375 |
|
llama-13b 1 3.26250 |
|
|
|
########## Second turn ########## |
|
score |
|
model turn |
|
gpt-4 2 9.025000 |
|
gpt-3.5-turbo 2 7.812500 |
|
claude-v1 2 7.650000 |
|
UNA-SOLAR-10.7B-Instruct-v1.0 2 7.237500 |
|
LUNA-SOLARkrautLM-Instruct 2 6.987500 |
|
wizardlm-30b 2 6.887500 |
|
vicuna-33b-v1.3 2 6.787500 |
|
guanaco-65b 2 6.037500 |
|
vicuna-13b-v1.3 2 5.962500 |
|
tulu-30b 2 5.850000 |
|
nous-hermes-13b 2 4.664557 |
|
alpaca-13b 2 4.087500 |
|
rwkv-4-raven-14b 2 3.225000 |
|
llama-13b 2 1.950000 |
|
|
|
########## Average ########## |
|
score |
|
model |
|
gpt-4 8.990625 |
|
gpt-3.5-turbo 7.943750 |
|
claude-instant-v1 7.905660 |
|
claude-v1 7.900000 |
|
UNA-SOLAR-10.7B-Instruct-v1.0 7.521875 |
|
LUNA-SOLARkrautLM-Instruct 7.462500 |
|
vicuna-33b-v1.3 7.121875 |
|
wizardlm-30b 7.009375 |
|
Llama-2-70b-chat 6.856250 |
|
Llama-2-13b-chat 6.650000 |
|
guanaco-33b 6.528125 |
|
tulu-30b 6.434375 |
|
guanaco-65b 6.409375 |
|
oasst-sft-7-llama-30b 6.409375 |
|
palm-2-chat-bison-001 6.400000 |
|
mpt-30b-chat 6.393750 |
|
vicuna-13b-v1.3 6.387500 |
|
wizardlm-13b 6.353125 |
|
Llama-2-7b-chat 6.268750 |
|
vicuna-7b-v1.3 5.996875 |
|
baize-v2-13b 5.750000 |
|
nous-hermes-13b 5.553459 |
|
mpt-7b-chat 5.459119 |
|
gpt4all-13b-snoozy 5.452830 |
|
koala-13b 5.350000 |
|
mpt-30b-instruct 5.218750 |
|
falcon-40b-instruct 5.168750 |
|
h2ogpt-oasst-open-llama-13b 4.625000 |
|
alpaca-13b 4.531250 |
|
chatglm-6b 4.500000 |
|
oasst-sft-4-pythia-12b 4.318750 |
|
rwkv-4-raven-14b 3.984375 |
|
dolly-v2-12b 3.275000 |
|
fastchat-t5-3b 3.040625 |
|
stablelm-tuned-alpha-7b 2.753125 |
|
llama-13b 2.606250 |
|
``` |
|
|
|
LM-Evaluation Harness, `big-refactor` branch:
|
|
|
``` |
|
hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0), gen_kwargs: (None), limit: None, num_fewshot: 25, batch_size: auto (32) |
|
| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr| |
|
|-------------|-------|------|-----:|--------|-----:|---|-----:| |
|
|arc_challenge|Yaml |none | 25|acc |0.6954|± |0.0134| |
|
| | |none | 25|acc_norm|0.7167|± |0.0132| |
|
|
|
hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: auto |
|
|Tasks|Version| Filter |n-shot| Metric |Value| |Stderr| |
|
|-----|-------|----------|-----:|-----------|----:|---|-----:| |
|
|gsm8k|Yaml |get-answer| 5|exact_match|0.671|± |0.0129| |
|
|
|
hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0), gen_kwargs: (), limit: None, num_fewshot: 0, batch_size: auto (64) |
|
| Tasks |Version|Filter|n-shot|Metric|Value | |Stderr| |
|
|--------------|-------|------|-----:|------|-----:|---|-----:| |
|
|truthfulqa_mc2|Yaml |none | 0|acc |0.7297|± |0.0149|
|
|
|
hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0), gen_kwargs: (None), limit: None, num_fewshot: 10, batch_size: auto (32) |
|
| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr| |
|
|---------|-------|------|-----:|--------|-----:|---|-----:| |
|
|hellaswag|Yaml |none | 10|acc |0.7091|± |0.0045| |
|
| | |none | 10|acc_norm|0.8821|± |0.0032| |
|
|
|
hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0,dtype=float16), gen_kwargs: (), limit: None, num_fewshot: 0, batch_size: auto (32) |
|
| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr| |
|
|--------------|-------|------|-----:|----------|-----:|---|-----:| |
|
|boolq |Yaml |none | 0|acc |0.8807|± |0.0057|

|lambada_openai|Yaml |none | 0|perplexity|3.2452|± |0.0778|

| | |none | 0|acc |0.7207|± |0.0063|

|piqa |Yaml |none | 0|acc |0.8020|± |0.0093|

| | |none | 0|acc_norm |0.8009|± |0.0093|

|sciq |Yaml |none | 0|acc |0.9730|± |0.0051|

| | |none | 0|acc_norm |0.9630|± |0.0060|

|winogrande |Yaml |none | 0|acc |0.7577|± |0.0120|
|
|
|
hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0,dtype=float16), gen_kwargs: (), limit: None, num_fewshot: 0, batch_size: auto (64) |
|
| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr| |
|
|--------|-------|------|-----:|--------|-----:|---|-----:| |
|
|mathqa |Yaml |none | 0|acc |0.3474|± |0.0087|

| | |none | 0|acc_norm|0.3568|± |0.0088|

|pubmedqa|Yaml |none | 0|acc |0.5400|± |0.0223|
|
|
|
hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0,dtype=float16), gen_kwargs: (), limit: None, num_fewshot: 0, batch_size: auto |
|
| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr| |
|
|------------------------------------------------------|-------|------|-----:|-----------|-----:|---|-----:| |
|
|bbh_fewshot |N/A |none | 0|exact_match|0.4660|± |0.1771|

| - bbh_fewshot_boolean_expressions |Yaml |none | 0|exact_match|0.8160|± |0.0246|

| - bbh_fewshot_causal_judgement |Yaml |none | 0|exact_match|0.4973|± |0.0367|

| - bbh_fewshot_date_understanding |Yaml |none | 0|exact_match|0.4840|± |0.0317|

| - bbh_fewshot_disambiguation_qa |Yaml |none | 0|exact_match|0.6520|± |0.0302|

| - bbh_fewshot_dyck_languages |Yaml |none | 0|exact_match|0.2040|± |0.0255|

| - bbh_fewshot_formal_fallacies |Yaml |none | 0|exact_match|0.5280|± |0.0316|

| - bbh_fewshot_geometric_shapes |Yaml |none | 0|exact_match|0.3360|± |0.0299|

| - bbh_fewshot_hyperbaton |Yaml |none | 0|exact_match|0.5520|± |0.0315|

| - bbh_fewshot_logical_deduction_five_objects |Yaml |none | 0|exact_match|0.4520|± |0.0315|

| - bbh_fewshot_logical_deduction_seven_objects |Yaml |none | 0|exact_match|0.3920|± |0.0309|

| - bbh_fewshot_logical_deduction_three_objects |Yaml |none | 0|exact_match|0.6200|± |0.0308|

| - bbh_fewshot_movie_recommendation |Yaml |none | 0|exact_match|0.6640|± |0.0299|

| - bbh_fewshot_multistep_arithmetic_two |Yaml |none | 0|exact_match|0.0080|± |0.0056|

| - bbh_fewshot_navigate |Yaml |none | 0|exact_match|0.6280|± |0.0306|

| - bbh_fewshot_object_counting |Yaml |none | 0|exact_match|0.3960|± |0.0310|

| - bbh_fewshot_penguins_in_a_table |Yaml |none | 0|exact_match|0.4726|± |0.0415|

| - bbh_fewshot_reasoning_about_colored_objects |Yaml |none | 0|exact_match|0.5320|± |0.0316|

| - bbh_fewshot_ruin_names |Yaml |none | 0|exact_match|0.5680|± |0.0314|

| - bbh_fewshot_salient_translation_error_detection |Yaml |none | 0|exact_match|0.5480|± |0.0315|

| - bbh_fewshot_snarks |Yaml |none | 0|exact_match|0.5169|± |0.0376|

| - bbh_fewshot_sports_understanding |Yaml |none | 0|exact_match|0.8320|± |0.0237|

| - bbh_fewshot_temporal_sequences |Yaml |none | 0|exact_match|0.5520|± |0.0315|

| - bbh_fewshot_tracking_shuffled_objects_five_objects |Yaml |none | 0|exact_match|0.1480|± |0.0225|

| - bbh_fewshot_tracking_shuffled_objects_seven_objects|Yaml |none | 0|exact_match|0.1720|± |0.0239|

| - bbh_fewshot_tracking_shuffled_objects_three_objects|Yaml |none | 0|exact_match|0.2760|± |0.0283|

| - bbh_fewshot_web_of_lies |Yaml |none | 0|exact_match|0.4760|± |0.0316|

| - bbh_fewshot_word_sorting |Yaml |none | 0|exact_match|0.2840|± |0.0286|
|
|
|
| Groups |Version|Filter|n-shot| Metric |Value| |Stderr| |
|
|-----------|-------|------|-----:|-----------|----:|---|-----:| |
|
|bbh_fewshot|N/A |none | 0|exact_match|0.466|± |0.1771|
|
|
|
hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: auto (16) |
|
| Tasks |Version|Filter|n-shot|Metric|Value | |Stderr| |
|
|---------------------------------------|-------|------|-----:|------|-----:|---|-----:| |
|
|mmlu |N/A |none | 0|acc |0.6513|± |0.1221| |
|
| - humanities |N/A |none | 5|acc |0.6077|± |0.1185| |
|
| - formal_logic |Yaml |none | 5|acc |0.4444|± |0.0444| |
|
| - high_school_european_history |Yaml |none | 5|acc |0.8121|± |0.0305| |
|
| - high_school_us_history |Yaml |none | 5|acc |0.8431|± |0.0255| |
|
| - high_school_world_history |Yaml |none | 5|acc |0.8523|± |0.0231| |
|
| - international_law |Yaml |none | 5|acc |0.7851|± |0.0375| |
|
| - jurisprudence |Yaml |none | 5|acc |0.7870|± |0.0396| |
|
| - logical_fallacies |Yaml |none | 5|acc |0.7546|± |0.0338| |
|
| - moral_disputes |Yaml |none | 5|acc |0.7370|± |0.0237| |
|
| - moral_scenarios |Yaml |none | 5|acc |0.4101|± |0.0164| |
|
| - philosophy |Yaml |none | 5|acc |0.7170|± |0.0256| |
|
| - prehistory |Yaml |none | 5|acc |0.7840|± |0.0229| |
|
| - professional_law |Yaml |none | 5|acc |0.4941|± |0.0128| |
|
| - world_religions |Yaml |none | 5|acc |0.7895|± |0.0313| |
|
| - other |N/A |none | 5|acc |0.7116|± |0.0939| |
|
| - business_ethics |Yaml |none | 5|acc |0.7600|± |0.0429| |
|
| - clinical_knowledge |Yaml |none | 5|acc |0.6792|± |0.0287| |
|
| - college_medicine |Yaml |none | 5|acc |0.6590|± |0.0361| |
|
| - global_facts |Yaml |none | 5|acc |0.3400|± |0.0476| |
|
| - human_aging |Yaml |none | 5|acc |0.6816|± |0.0313| |
|
| - management |Yaml |none | 5|acc |0.8350|± |0.0368| |
|
| - marketing |Yaml |none | 5|acc |0.8547|± |0.0231| |
|
| - medical_genetics |Yaml |none | 5|acc |0.7000|± |0.0461| |
|
| - miscellaneous |Yaml |none | 5|acc |0.8020|± |0.0142| |
|
| - nutrition |Yaml |none | 5|acc |0.7418|± |0.0251| |
|
| - professional_accounting |Yaml |none | 5|acc |0.5071|± |0.0298| |
|
| - professional_medicine |Yaml |none | 5|acc |0.7500|± |0.0263| |
|
| - virology |Yaml |none | 5|acc |0.5843|± |0.0384| |
|
| - social_sciences |N/A |none | 5|acc |0.7537|± |0.0681| |
|
| - econometrics |Yaml |none | 5|acc |0.5000|± |0.0470| |
|
| - high_school_geography |Yaml |none | 5|acc |0.8586|± |0.0248| |
|
| - high_school_government_and_politics|Yaml |none | 5|acc |0.9016|± |0.0215| |
|
| - high_school_macroeconomics |Yaml |none | 5|acc |0.6615|± |0.0240| |
|
| - high_school_microeconomics |Yaml |none | 5|acc |0.7311|± |0.0288| |
|
| - high_school_psychology |Yaml |none | 5|acc |0.8404|± |0.0157| |
|
| - human_sexuality |Yaml |none | 5|acc |0.7328|± |0.0388| |
|
| - professional_psychology |Yaml |none | 5|acc |0.6814|± |0.0189| |
|
| - public_relations |Yaml |none | 5|acc |0.6909|± |0.0443| |
|
| - security_studies |Yaml |none | 5|acc |0.7469|± |0.0278| |
|
| - sociology |Yaml |none | 5|acc |0.8308|± |0.0265| |
|
| - us_foreign_policy |Yaml |none | 5|acc |0.8900|± |0.0314| |
|
| - stem |N/A |none | 5|acc |0.5569|± |0.1380| |
|
| - abstract_algebra |Yaml |none | 5|acc |0.4100|± |0.0494| |
|
| - anatomy |Yaml |none | 5|acc |0.6222|± |0.0419| |
|
| - astronomy |Yaml |none | 5|acc |0.7368|± |0.0358| |
|
| - college_biology |Yaml |none | 5|acc |0.8056|± |0.0331| |
|
| - college_chemistry |Yaml |none | 5|acc |0.4700|± |0.0502| |
|
| - college_computer_science |Yaml |none | 5|acc |0.5100|± |0.0502| |
|
| - college_mathematics |Yaml |none | 5|acc |0.2800|± |0.0451| |
|
| - college_physics |Yaml |none | 5|acc |0.3431|± |0.0472| |
|
| - computer_security |Yaml |none | 5|acc |0.7400|± |0.0441| |
|
| - conceptual_physics |Yaml |none | 5|acc |0.6340|± |0.0315| |
|
| - electrical_engineering |Yaml |none | 5|acc |0.6000|± |0.0408| |
|
| - elementary_mathematics |Yaml |none | 5|acc |0.4815|± |0.0257| |
|
| - high_school_biology |Yaml |none | 5|acc |0.8032|± |0.0226| |
|
| - high_school_chemistry |Yaml |none | 5|acc |0.4877|± |0.0352| |
|
| - high_school_computer_science |Yaml |none | 5|acc |0.7200|± |0.0451| |
|
| - high_school_mathematics |Yaml |none | 5|acc |0.3815|± |0.0296| |
|
| - high_school_physics |Yaml |none | 5|acc |0.3576|± |0.0391| |
|
| - high_school_statistics |Yaml |none | 5|acc |0.5602|± |0.0339| |
|
| - machine_learning |Yaml |none | 5|acc |0.4643|± |0.0473| |
|
|
|
| Groups |Version|Filter|n-shot|Metric|Value | |Stderr| |
|
|------------------|-------|------|-----:|------|-----:|---|-----:| |
|
|mmlu |N/A |none | 0|acc |0.6513|± |0.1221| |
|
| - humanities |N/A |none | 5|acc |0.6077|± |0.1185| |
|
| - other |N/A |none | 5|acc |0.7116|± |0.0939| |
|
| - social_sciences|N/A |none | 5|acc |0.7537|± |0.0681| |
|
| - stem |N/A |none | 5|acc |0.5569|± |0.1380| |
|
``` |
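
To reproduce one of the harness rows above, a sketch assuming the `big-refactor` (0.4.x) Python API, where `simple_evaluate` is the entry point:

```
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",
    model_args="pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0,dtype=float16",
    tasks=["arc_challenge"],
    num_fewshot=25,
    batch_size="auto",
)
print(results["results"]["arc_challenge"])  # acc / acc_norm with stderr
```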
|
|
|
|
|
## Citations |
|
|
|
Thanks to [Upstage.AI](https://huggingface.co/upstage) for its awesome base model; this is merely a UNA of it, and UNA can only refine what is already in there :)
|
|
|
If you find UNA-SOLAR useful, cite and support the authors. |