|
--- |
|
base_model: upstage/SOLAR-10.7B-Instruct-v1.0 |
|
tags: |
|
- alignment-handbook |
|
- generated_from_trainer |
|
- UNA |
|
- single-turn |
|
model-index: |
|
- name: UNA-SOLAR-10.7B-Instruct-v1.0 |
|
  results: []
|
license: cc-by-nc-nd-4.0 |
|
language: |
|
- en |
|
library_name: transformers |
|
--- |
|
|
|
# UNA: Uniform Neural Alignment |
|
|
|
Further SFT (a minimal config sketch follows this list):

- LR scheduler: linear

- Learning rate: 2e-5
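
A minimal sketch of that schedule with the `transformers` `TrainingArguments` API; only the scheduler type and learning rate come from this card, every other value is a placeholder assumption:

```
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="una-solar-sft",     # hypothetical output path
    lr_scheduler_type="linear",     # linear schedule, as stated above
    learning_rate=2e-5,             # peak LR, as stated above
    warmup_ratio=0.03,              # assumption: not specified in this card
    num_train_epochs=1,             # assumption: not specified in this card
    per_device_train_batch_size=1,  # assumption
    bf16=True,                      # assumption
)
```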
|
|
|
Merges:

- Fan in: `0:2`

- Fan out: `-4:`

- Intermediary layers: `1/1/1/0/1/1/0/1/0/1/1/0/1/1/0`; the on/off pattern acts as a form of regularisation (see the sketch below).
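
A hypothetical reading of that notation in plain Python: `0:2` and `-4:` as Python-style slices over the layer stack, and the 1/0 string as an on/off mask over the intermediary slices (the exact slice granularity is not stated here, and the 48-layer count is assumed from SOLAR-10.7B):

```
# Hypothetical interpretation of the merge notation above.
pattern = "1/1/1/0/1/1/0/1/0/1/1/0/1/1/0"
mask = [bit == "1" for bit in pattern.split("/")]

layers = list(range(48))   # assumption: SOLAR-10.7B's 48 decoder layers
fan_in = layers[0:2]       # `0:2` -> [0, 1]
fan_out = layers[-4:]      # `-4:` -> [44, 45, 46, 47]
active = [i for i, on in enumerate(mask) if on]  # indices of "on" slices

print(fan_in, fan_out, active)
```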
|
## Quants |
|
|
|
* [ggml-model-q5_k_m.gguf](https://huggingface.co/fblgit/UNA-SOLAR-10.7B-Instruct-v1.0/resolve/main/ggml-model-q5_k_m.gguf?download=true) |
|
* [ggml-model-q6_k.gguf](https://huggingface.co/fblgit/UNA-SOLAR-10.7B-Instruct-v1.0/resolve/main/ggml-model-q6_k.gguf?download=true) |
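
To fetch a quant programmatically, a sketch using the standard `huggingface_hub` API:

```
from huggingface_hub import hf_hub_download

gguf_path = hf_hub_download(
    repo_id="fblgit/UNA-SOLAR-10.7B-Instruct-v1.0",
    filename="ggml-model-q5_k_m.gguf",
)
print(gguf_path)  # load this file with a llama.cpp-compatible runtime
```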
|
|
|
## Libraries
|
|
|
- Transformers 4.35.0-UNA |
|
- Pytorch 2.1.0 |
|
- Datasets 2.14.6 |
|
- Tokenizers 0.14.1 |
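
Loading follows stock `transformers` (4.35.0-UNA is a custom build, but the public API is the same). A minimal sketch; the prompt template is an assumption inherited from the base SOLAR-10.7B-Instruct-v1.0:

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "fblgit/UNA-SOLAR-10.7B-Instruct-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Assumed prompt format, inherited from SOLAR-10.7B-Instruct-v1.0
prompt = "### User:\nWhat is UNA?\n\n### Assistant:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```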
|
|
|
## Evals

`mt-bench` (single-answer grading, GPT-4 judge):
|
``` |
|
Mode: single |
|
Input file: data/mt_bench/model_judgment/gpt-4_single.jsonl |
|
|
|
########## First turn ########## |
|
score |
|
model turn |
|
gpt-4 1 8.95625 |
|
claude-v1 1 8.15000 |
|
gpt-3.5-turbo 1 8.07500 |
|
LUNA-SOLARkrautLM-Instruct 1 7.93750 |
|
UNA-SOLAR-10.7B-Instruct-v1.0 1 7.80625 |
|
vicuna-33b-v1.3 1 7.45625 |
|
wizardlm-30b 1 7.13125 |
|
tulu-30b 1 7.01875 |
|
vicuna-13b-v1.3 1 6.81250 |
|
guanaco-65b 1 6.78125 |
|
nous-hermes-13b 1 6.43125 |
|
alpaca-13b 1 4.97500 |
|
rwkv-4-raven-14b 1 4.74375 |
|
llama-13b 1 3.26250 |
|
|
|
########## Second turn ########## |
|
score |
|
model turn |
|
gpt-4 2 9.025000 |
|
gpt-3.5-turbo 2 7.812500 |
|
claude-v1 2 7.650000 |
|
UNA-SOLAR-10.7B-Instruct-v1.0 2 7.237500 |
|
LUNA-SOLARkrautLM-Instruct 2 6.987500 |
|
wizardlm-30b 2 6.887500 |
|
vicuna-33b-v1.3 2 6.787500 |
|
guanaco-65b 2 6.037500 |
|
vicuna-13b-v1.3 2 5.962500 |
|
tulu-30b 2 5.850000 |
|
nous-hermes-13b 2 4.664557 |
|
alpaca-13b 2 4.087500 |
|
rwkv-4-raven-14b 2 3.225000 |
|
llama-13b 2 1.950000 |
|
|
|
########## Average ########## |
|
score |
|
model |
|
gpt-4 8.990625 |
|
gpt-3.5-turbo 7.943750 |
|
claude-instant-v1 7.905660 |
|
claude-v1 7.900000 |
|
UNA-SOLAR-10.7B-Instruct-v1.0 7.521875 |
|
LUNA-SOLARkrautLM-Instruct 7.462500 |
|
vicuna-33b-v1.3 7.121875 |
|
wizardlm-30b 7.009375 |
|
Llama-2-70b-chat 6.856250 |
|
Llama-2-13b-chat 6.650000 |
|
guanaco-33b 6.528125 |
|
tulu-30b 6.434375 |
|
guanaco-65b 6.409375 |
|
oasst-sft-7-llama-30b 6.409375 |
|
palm-2-chat-bison-001 6.400000 |
|
mpt-30b-chat 6.393750 |
|
vicuna-13b-v1.3 6.387500 |
|
wizardlm-13b 6.353125 |
|
Llama-2-7b-chat 6.268750 |
|
vicuna-7b-v1.3 5.996875 |
|
baize-v2-13b 5.750000 |
|
nous-hermes-13b 5.553459 |
|
mpt-7b-chat 5.459119 |
|
gpt4all-13b-snoozy 5.452830 |
|
koala-13b 5.350000 |
|
mpt-30b-instruct 5.218750 |
|
falcon-40b-instruct 5.168750 |
|
h2ogpt-oasst-open-llama-13b 4.625000 |
|
alpaca-13b 4.531250 |
|
chatglm-6b 4.500000 |
|
oasst-sft-4-pythia-12b 4.318750 |
|
rwkv-4-raven-14b 3.984375 |
|
dolly-v2-12b 3.275000 |
|
fastchat-t5-3b 3.040625 |
|
stablelm-tuned-alpha-7b 2.753125 |
|
llama-13b 2.606250 |
|
``` |
|
|
|
LM-Evaluation Harness, `big-refactor` branch:
|
|
|
``` |
|
hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0), gen_kwargs: (None), limit: None, num_fewshot: 25, batch_size: auto (32) |
|
| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr| |
|
|-------------|-------|------|-----:|--------|-----:|---|-----:| |
|
|arc_challenge|Yaml |none | 25|acc |0.6954|± |0.0134| |
|
| | |none | 25|acc_norm|0.7167|± |0.0132| |
|
|
|
hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: auto |
|
|Tasks|Version| Filter |n-shot| Metric |Value| |Stderr| |
|
|-----|-------|----------|-----:|-----------|----:|---|-----:| |
|
|gsm8k|Yaml |get-answer| 5|exact_match|0.671|± |0.0129| |
|
|
|
hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0), gen_kwargs: (), limit: None, num_fewshot: 0, batch_size: auto (64) |
|
| Tasks |Version|Filter|n-shot|Metric|Value | |Stderr| |
|
|--------------|-------|------|-----:|------|-----:|---|-----:| |
|
|truthfulqa_mc2|Yaml |none | 0|acc |0.7297|± |0.0149|
|
|
|
hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0), gen_kwargs: (None), limit: None, num_fewshot: 10, batch_size: auto (32) |
|
| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr| |
|
|---------|-------|------|-----:|--------|-----:|---|-----:| |
|
|hellaswag|Yaml |none | 10|acc |0.7091|± |0.0045| |
|
| | |none | 10|acc_norm|0.8821|± |0.0032| |
|
|
|
hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0,dtype=float16), gen_kwargs: (), limit: None, num_fewshot: 0, batch_size: auto (32) |
|
| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr| |
|
|--------------|-------|------|-----:|----------|-----:|---|-----:| |
|
|boolq |Yaml |none | 0|acc |0.8807|± |0.0057|

|lambada_openai|Yaml |none | 0|perplexity|3.2452|± |0.0778|

| | |none | 0|acc |0.7207|± |0.0063|

|piqa |Yaml |none | 0|acc |0.8020|± |0.0093|

| | |none | 0|acc_norm |0.8009|± |0.0093|

|sciq |Yaml |none | 0|acc |0.9730|± |0.0051|

| | |none | 0|acc_norm |0.9630|± |0.0060|

|winogrande |Yaml |none | 0|acc |0.7577|± |0.0120|
|
|
|
hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0,dtype=float16), gen_kwargs: (), limit: None, num_fewshot: 0, batch_size: auto (64) |
|
| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr| |
|
|--------|-------|------|-----:|--------|-----:|---|-----:| |
|
|mathqa |Yaml |none | 0|acc |0.3474|± |0.0087|

| | |none | 0|acc_norm|0.3568|± |0.0088|

|pubmedqa|Yaml |none | 0|acc |0.5400|± |0.0223|
|
|
|
hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0,dtype=float16), gen_kwargs: (), limit: None, num_fewshot: 0, batch_size: auto |
|
| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr| |
|
|------------------------------------------------------|-------|------|-----:|-----------|-----:|---|-----:| |
|
|bbh_fewshot |N/A |none | 0|exact_match|0.4660|± |0.1771|

| - bbh_fewshot_boolean_expressions |Yaml |none | 0|exact_match|0.8160|± |0.0246|

| - bbh_fewshot_causal_judgement |Yaml |none | 0|exact_match|0.4973|± |0.0367|

| - bbh_fewshot_date_understanding |Yaml |none | 0|exact_match|0.4840|± |0.0317|

| - bbh_fewshot_disambiguation_qa |Yaml |none | 0|exact_match|0.6520|± |0.0302|

| - bbh_fewshot_dyck_languages |Yaml |none | 0|exact_match|0.2040|± |0.0255|

| - bbh_fewshot_formal_fallacies |Yaml |none | 0|exact_match|0.5280|± |0.0316|

| - bbh_fewshot_geometric_shapes |Yaml |none | 0|exact_match|0.3360|± |0.0299|

| - bbh_fewshot_hyperbaton |Yaml |none | 0|exact_match|0.5520|± |0.0315|

| - bbh_fewshot_logical_deduction_five_objects |Yaml |none | 0|exact_match|0.4520|± |0.0315|

| - bbh_fewshot_logical_deduction_seven_objects |Yaml |none | 0|exact_match|0.3920|± |0.0309|

| - bbh_fewshot_logical_deduction_three_objects |Yaml |none | 0|exact_match|0.6200|± |0.0308|

| - bbh_fewshot_movie_recommendation |Yaml |none | 0|exact_match|0.6640|± |0.0299|

| - bbh_fewshot_multistep_arithmetic_two |Yaml |none | 0|exact_match|0.0080|± |0.0056|

| - bbh_fewshot_navigate |Yaml |none | 0|exact_match|0.6280|± |0.0306|

| - bbh_fewshot_object_counting |Yaml |none | 0|exact_match|0.3960|± |0.0310|

| - bbh_fewshot_penguins_in_a_table |Yaml |none | 0|exact_match|0.4726|± |0.0415|

| - bbh_fewshot_reasoning_about_colored_objects |Yaml |none | 0|exact_match|0.5320|± |0.0316|

| - bbh_fewshot_ruin_names |Yaml |none | 0|exact_match|0.5680|± |0.0314|

| - bbh_fewshot_salient_translation_error_detection |Yaml |none | 0|exact_match|0.5480|± |0.0315|

| - bbh_fewshot_snarks |Yaml |none | 0|exact_match|0.5169|± |0.0376|

| - bbh_fewshot_sports_understanding |Yaml |none | 0|exact_match|0.8320|± |0.0237|

| - bbh_fewshot_temporal_sequences |Yaml |none | 0|exact_match|0.5520|± |0.0315|

| - bbh_fewshot_tracking_shuffled_objects_five_objects |Yaml |none | 0|exact_match|0.1480|± |0.0225|

| - bbh_fewshot_tracking_shuffled_objects_seven_objects|Yaml |none | 0|exact_match|0.1720|± |0.0239|

| - bbh_fewshot_tracking_shuffled_objects_three_objects|Yaml |none | 0|exact_match|0.2760|± |0.0283|

| - bbh_fewshot_web_of_lies |Yaml |none | 0|exact_match|0.4760|± |0.0316|

| - bbh_fewshot_word_sorting |Yaml |none | 0|exact_match|0.2840|± |0.0286|
|
|
|
| Groups |Version|Filter|n-shot| Metric |Value| |Stderr| |
|
|-----------|-------|------|-----:|-----------|----:|---|-----:| |
|
|bbh_fewshot|N/A |none | 0|exact_match|0.466|± |0.1771|
|
|
|
hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: auto (16) |
|
| Tasks |Version|Filter|n-shot|Metric|Value | |Stderr| |
|
|---------------------------------------|-------|------|-----:|------|-----:|---|-----:| |
|
|mmlu |N/A |none | 0|acc |0.6513|± |0.1221| |
|
| - humanities |N/A |none | 5|acc |0.6077|± |0.1185| |
|
| - formal_logic |Yaml |none | 5|acc |0.4444|± |0.0444| |
|
| - high_school_european_history |Yaml |none | 5|acc |0.8121|± |0.0305| |
|
| - high_school_us_history |Yaml |none | 5|acc |0.8431|± |0.0255| |
|
| - high_school_world_history |Yaml |none | 5|acc |0.8523|± |0.0231| |
|
| - international_law |Yaml |none | 5|acc |0.7851|± |0.0375| |
|
| - jurisprudence |Yaml |none | 5|acc |0.7870|± |0.0396| |
|
| - logical_fallacies |Yaml |none | 5|acc |0.7546|± |0.0338| |
|
| - moral_disputes |Yaml |none | 5|acc |0.7370|± |0.0237| |
|
| - moral_scenarios |Yaml |none | 5|acc |0.4101|± |0.0164| |
|
| - philosophy |Yaml |none | 5|acc |0.7170|± |0.0256| |
|
| - prehistory |Yaml |none | 5|acc |0.7840|± |0.0229| |
|
| - professional_law |Yaml |none | 5|acc |0.4941|± |0.0128| |
|
| - world_religions |Yaml |none | 5|acc |0.7895|± |0.0313| |
|
| - other |N/A |none | 5|acc |0.7116|± |0.0939| |
|
| - business_ethics |Yaml |none | 5|acc |0.7600|± |0.0429| |
|
| - clinical_knowledge |Yaml |none | 5|acc |0.6792|± |0.0287| |
|
| - college_medicine |Yaml |none | 5|acc |0.6590|± |0.0361| |
|
| - global_facts |Yaml |none | 5|acc |0.3400|± |0.0476| |
|
| - human_aging |Yaml |none | 5|acc |0.6816|± |0.0313| |
|
| - management |Yaml |none | 5|acc |0.8350|± |0.0368| |
|
| - marketing |Yaml |none | 5|acc |0.8547|± |0.0231| |
|
| - medical_genetics |Yaml |none | 5|acc |0.7000|± |0.0461| |
|
| - miscellaneous |Yaml |none | 5|acc |0.8020|± |0.0142| |
|
| - nutrition |Yaml |none | 5|acc |0.7418|± |0.0251| |
|
| - professional_accounting |Yaml |none | 5|acc |0.5071|± |0.0298| |
|
| - professional_medicine |Yaml |none | 5|acc |0.7500|± |0.0263| |
|
| - virology |Yaml |none | 5|acc |0.5843|± |0.0384| |
|
| - social_sciences |N/A |none | 5|acc |0.7537|± |0.0681| |
|
| - econometrics |Yaml |none | 5|acc |0.5000|± |0.0470| |
|
| - high_school_geography |Yaml |none | 5|acc |0.8586|± |0.0248| |
|
| - high_school_government_and_politics|Yaml |none | 5|acc |0.9016|± |0.0215| |
|
| - high_school_macroeconomics |Yaml |none | 5|acc |0.6615|± |0.0240| |
|
| - high_school_microeconomics |Yaml |none | 5|acc |0.7311|± |0.0288| |
|
| - high_school_psychology |Yaml |none | 5|acc |0.8404|± |0.0157| |
|
| - human_sexuality |Yaml |none | 5|acc |0.7328|± |0.0388| |
|
| - professional_psychology |Yaml |none | 5|acc |0.6814|± |0.0189| |
|
| - public_relations |Yaml |none | 5|acc |0.6909|± |0.0443| |
|
| - security_studies |Yaml |none | 5|acc |0.7469|± |0.0278| |
|
| - sociology |Yaml |none | 5|acc |0.8308|± |0.0265| |
|
| - us_foreign_policy |Yaml |none | 5|acc |0.8900|± |0.0314| |
|
| - stem |N/A |none | 5|acc |0.5569|± |0.1380| |
|
| - abstract_algebra |Yaml |none | 5|acc |0.4100|± |0.0494| |
|
| - anatomy |Yaml |none | 5|acc |0.6222|± |0.0419| |
|
| - astronomy |Yaml |none | 5|acc |0.7368|± |0.0358| |
|
| - college_biology |Yaml |none | 5|acc |0.8056|± |0.0331| |
|
| - college_chemistry |Yaml |none | 5|acc |0.4700|± |0.0502| |
|
| - college_computer_science |Yaml |none | 5|acc |0.5100|± |0.0502| |
|
| - college_mathematics |Yaml |none | 5|acc |0.2800|± |0.0451| |
|
| - college_physics |Yaml |none | 5|acc |0.3431|± |0.0472| |
|
| - computer_security |Yaml |none | 5|acc |0.7400|± |0.0441| |
|
| - conceptual_physics |Yaml |none | 5|acc |0.6340|± |0.0315| |
|
| - electrical_engineering |Yaml |none | 5|acc |0.6000|± |0.0408| |
|
| - elementary_mathematics |Yaml |none | 5|acc |0.4815|± |0.0257| |
|
| - high_school_biology |Yaml |none | 5|acc |0.8032|± |0.0226| |
|
| - high_school_chemistry |Yaml |none | 5|acc |0.4877|± |0.0352| |
|
| - high_school_computer_science |Yaml |none | 5|acc |0.7200|± |0.0451| |
|
| - high_school_mathematics |Yaml |none | 5|acc |0.3815|± |0.0296| |
|
| - high_school_physics |Yaml |none | 5|acc |0.3576|± |0.0391| |
|
| - high_school_statistics |Yaml |none | 5|acc |0.5602|± |0.0339| |
|
| - machine_learning |Yaml |none | 5|acc |0.4643|± |0.0473| |
|
|
|
| Groups |Version|Filter|n-shot|Metric|Value | |Stderr| |
|
|------------------|-------|------|-----:|------|-----:|---|-----:| |
|
|mmlu |N/A |none | 0|acc |0.6513|± |0.1221| |
|
| - humanities |N/A |none | 5|acc |0.6077|± |0.1185| |
|
| - other |N/A |none | 5|acc |0.7116|± |0.0939| |
|
| - social_sciences|N/A |none | 5|acc |0.7537|± |0.0681| |
|
| - stem |N/A |none | 5|acc |0.5569|± |0.1380| |
|
``` |
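
To reproduce one of the harness rows above, a sketch assuming the `big-refactor` (0.4.x) Python API, where `simple_evaluate` is the entry point:

```
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",
    model_args="pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0,dtype=float16",
    tasks=["arc_challenge"],
    num_fewshot=25,
    batch_size="auto",
)
print(results["results"]["arc_challenge"])  # acc / acc_norm with stderr
```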
|
|
|
|
|
## Citations |
|
|
|
Thanks to [Upstage.AI](https://huggingface.co/upstage) for its awesome base model; this is merely a UNA of it, and UNA can only refine what is already in there :)
|
|
|
If you find UNA-SOLAR useful, cite and support the authors. |