---
base_model: upstage/SOLAR-10.7B-Instruct-v1.0
tags:
- alignment-handbook
- generated_from_trainer
- UNA
- single-turn
model-index:
- name: UNA-SOLAR-10.7B-Instruct-v1.0
results: []
license: cc-by-nc-nd-4.0
language:
- en
library_name: transformers
---
# UNA: Uniform Neural Alignment
Further SFT:
- Linear learning-rate schedule
- Learning rate: 2e-5

Merges:
- Fan in: `0:2`
- Fan out: `-4:`
- Intermediary layers: `1/1/1/0/1/1/0/1/0/1/1/0/1/1/0`, using the on/off pattern as a form of regularisation (see the sketch after this list).
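The exact merge procedure is not published here, so the following is only a rough illustrative sketch: it gates a per-layer average of two checkpoints with the on/off mask above, treating the fan-in (`0:2`) and fan-out (`-4:`) slices separately. The function names, layer counts, and the averaging rule are all assumptions, not the actual UNA recipe.

```python
import torch

# Illustrative only: NOT the published UNA merge recipe.
# The quoted mask gates which intermediary layers get merged (1) vs kept (0).
MASK = [int(b) for b in "1/1/1/0/1/1/0/1/0/1/1/0/1/1/0".split("/")]

def merge_layers(a: list, b: list) -> list:
    """Average masked intermediary layers of model B into model A.

    Layers 0:2 (fan in) and -4: (fan out) are assumed to be handled
    separately, so the mask only covers the layers in between.
    """
    merged = list(a)
    intermediary = range(2, len(a) - 4)    # intermediary slice (assumption)
    for on, i in zip(MASK, intermediary):
        if on:
            merged[i] = (a[i] + b[i]) / 2  # simple average when "on"
    return merged

# Toy usage with random per-layer weight tensors (21 = 2 + 15 + 4 layers).
a = [torch.randn(4, 4) for _ in range(21)]
b = [torch.randn(4, 4) for _ in range(21)]
out = merge_layers(a, b)
```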
## Quants
* [ggml-model-q5_k_m.gguf](https://huggingface.co/fblgit/UNA-SOLAR-10.7B-Instruct-v1.0/resolve/main/ggml-model-q5_k_m.gguf?download=true)
* [ggml-model-q6_k.gguf](https://huggingface.co/fblgit/UNA-SOLAR-10.7B-Instruct-v1.0/resolve/main/ggml-model-q6_k.gguf?download=true)
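For local inference with the GGUF files, a minimal llama-cpp-python sketch along these lines should work (the context size, sampling settings, and the prompt format, borrowed from the SOLAR-Instruct base model, are assumptions):

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Load the Q5_K_M quant downloaded from the link above.
llm = Llama(model_path="ggml-model-q5_k_m.gguf", n_ctx=4096)

# Single-turn prompt in the SOLAR-Instruct style (assumed from the base model).
out = llm("### User:\nWhat is Uniform Neural Alignment?\n\n### Assistant:\n",
          max_tokens=256)
print(out["choices"][0]["text"])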
## Libraries
- Transformers 4.35.0-UNA
- Pytorch 2.1.0
- Datasets 2.14.6
- Tokenizers 0.14.1
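Loading with Transformers is standard; here is a minimal sketch (float16 and automatic device placement are illustrative choices, and the prompt format is again assumed from the SOLAR-Instruct base model):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "fblgit/UNA-SOLAR-10.7B-Instruct-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Single-turn prompt (the model is tagged single-turn above).
prompt = "### User:\nExplain uniform neural alignment in one sentence.\n\n### Assistant:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```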
## Evals
`mt-bench` (single-mode GPT-4 judgments):
```
Mode: single
Input file: data/mt_bench/model_judgment/gpt-4_single.jsonl
########## First turn ##########
score
model turn
gpt-4 1 8.95625
claude-v1 1 8.15000
gpt-3.5-turbo 1 8.07500
LUNA-SOLARkrautLM-Instruct 1 7.93750
UNA-SOLAR-10.7B-Instruct-v1.0 1 7.80625
vicuna-33b-v1.3 1 7.45625
wizardlm-30b 1 7.13125
tulu-30b 1 7.01875
vicuna-13b-v1.3 1 6.81250
guanaco-65b 1 6.78125
nous-hermes-13b 1 6.43125
alpaca-13b 1 4.97500
rwkv-4-raven-14b 1 4.74375
llama-13b 1 3.26250
########## Second turn ##########
score
model turn
gpt-4 2 9.025000
gpt-3.5-turbo 2 7.812500
claude-v1 2 7.650000
UNA-SOLAR-10.7B-Instruct-v1.0 2 7.237500
LUNA-SOLARkrautLM-Instruct 2 6.987500
wizardlm-30b 2 6.887500
vicuna-33b-v1.3 2 6.787500
guanaco-65b 2 6.037500
vicuna-13b-v1.3 2 5.962500
tulu-30b 2 5.850000
nous-hermes-13b 2 4.664557
alpaca-13b 2 4.087500
rwkv-4-raven-14b 2 3.225000
llama-13b 2 1.950000
########## Average ##########
score
model
gpt-4 8.990625
gpt-3.5-turbo 7.943750
claude-instant-v1 7.905660
claude-v1 7.900000
UNA-SOLAR-10.7B-Instruct-v1.0 7.521875
LUNA-SOLARkrautLM-Instruct 7.462500
vicuna-33b-v1.3 7.121875
wizardlm-30b 7.009375
Llama-2-70b-chat 6.856250
Llama-2-13b-chat 6.650000
guanaco-33b 6.528125
tulu-30b 6.434375
guanaco-65b 6.409375
oasst-sft-7-llama-30b 6.409375
palm-2-chat-bison-001 6.400000
mpt-30b-chat 6.393750
vicuna-13b-v1.3 6.387500
wizardlm-13b 6.353125
Llama-2-7b-chat 6.268750
vicuna-7b-v1.3 5.996875
baize-v2-13b 5.750000
nous-hermes-13b 5.553459
mpt-7b-chat 5.459119
gpt4all-13b-snoozy 5.452830
koala-13b 5.350000
mpt-30b-instruct 5.218750
falcon-40b-instruct 5.168750
h2ogpt-oasst-open-llama-13b 4.625000
alpaca-13b 4.531250
chatglm-6b 4.500000
oasst-sft-4-pythia-12b 4.318750
rwkv-4-raven-14b 3.984375
dolly-v2-12b 3.275000
fastchat-t5-3b 3.040625
stablelm-tuned-alpha-7b 2.753125
llama-13b 2.606250
```
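The scores above are means over per-question GPT-4 judgments; here is a minimal aggregation sketch, assuming the `model`, `turn`, and `score` fields of the single-judge file named in the header:

```python
import json
from collections import defaultdict

# Aggregate single-judge MT-Bench scores per model and turn
# (field names assumed from the judgment jsonl referenced above).
scores = defaultdict(list)
with open("data/mt_bench/model_judgment/gpt-4_single.jsonl") as f:
    for line in f:
        row = json.loads(line)
        scores[(row["model"], row["turn"])].append(row["score"])

for (model, turn), vals in sorted(scores.items()):
    print(f"{model}\tturn {turn}\t{sum(vals) / len(vals):.6f}")
```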
LM-Evaluation Harness, `big-refactor` branch:
```
hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0), gen_kwargs: (None), limit: None, num_fewshot: 25, batch_size: auto (32)
| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
|-------------|-------|------|-----:|--------|-----:|---|-----:|
|arc_challenge|Yaml |none | 25|acc |0.6954|± |0.0134|
| | |none | 25|acc_norm|0.7167|± |0.0132|
hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: auto
|Tasks|Version| Filter |n-shot| Metric |Value| |Stderr|
|-----|-------|----------|-----:|-----------|----:|---|-----:|
|gsm8k|Yaml |get-answer| 5|exact_match|0.671|± |0.0129|
hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0), gen_kwargs: (), limit: None, num_fewshot: 0, batch_size: auto (64)
| Tasks |Version|Filter|n-shot|Metric|Value | |Stderr|
|--------------|-------|------|-----:|------|-----:|---|-----:|
|truthfulqa_mc2|Yaml |none | 0|acc |0.7297|± |0.0149|
hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0), gen_kwargs: (None), limit: None, num_fewshot: 10, batch_size: auto (32)
| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
|---------|-------|------|-----:|--------|-----:|---|-----:|
|hellaswag|Yaml |none | 10|acc |0.7091|± |0.0045|
| | |none | 10|acc_norm|0.8821|± |0.0032|
hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0,dtype=float16), gen_kwargs: (), limit: None, num_fewshot: 0, batch_size: auto (32)
| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
|--------------|-------|------|-----:|----------|-----:|---|-----:|
|boolq |Yaml |none | 0|acc |0.8807|± |0.0057|
|lambada_openai|Yaml |none | 0|perplexity|3.2452|± |0.0778|
| | |none | 0|acc |0.7207|± |0.0063|
|piqa |Yaml |none | 0|acc |0.8020|± |0.0093|
| | |none | 0|acc_norm |0.8009|± |0.0093|
|sciq |Yaml |none | 0|acc |0.9730|± |0.0051|
| | |none | 0|acc_norm |0.9630|± |0.0060|
|winogrande |Yaml |none | 0|acc |0.7577|± |0.0120|
hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0,dtype=float16), gen_kwargs: (), limit: None, num_fewshot: 0, batch_size: auto (64)
| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
|--------|-------|------|-----:|--------|-----:|---|-----:|
|mathqa |Yaml |none | 0|acc |0.3474|± |0.0087|
| | |none | 0|acc_norm|0.3568|± |0.0088|
|pubmedqa|Yaml |none | 0|acc |0.5400|± |0.0223|
hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0,dtype=float16), gen_kwargs: (), limit: None, num_fewshot: 0, batch_size: auto
| Tasks |Version|Filter|n-shot| Metric |Value | |Stderr|
|------------------------------------------------------|-------|------|-----:|-----------|-----:|---|-----:|
|bbh_fewshot |N/A |none | 0|exact_match|0.4660|± |0.1771|
| - bbh_fewshot_boolean_expressions |Yaml |none | 0|exact_match|0.8160|± |0.0246|
| - bbh_fewshot_causal_judgement |Yaml |none | 0|exact_match|0.4973|± |0.0367|
| - bbh_fewshot_date_understanding |Yaml |none | 0|exact_match|0.4840|± |0.0317|
| - bbh_fewshot_disambiguation_qa |Yaml |none | 0|exact_match|0.6520|± |0.0302|
| - bbh_fewshot_dyck_languages |Yaml |none | 0|exact_match|0.2040|± |0.0255|
| - bbh_fewshot_formal_fallacies |Yaml |none | 0|exact_match|0.5280|± |0.0316|
| - bbh_fewshot_geometric_shapes |Yaml |none | 0|exact_match|0.3360|± |0.0299|
| - bbh_fewshot_hyperbaton |Yaml |none | 0|exact_match|0.5520|± |0.0315|
| - bbh_fewshot_logical_deduction_five_objects |Yaml |none | 0|exact_match|0.4520|± |0.0315|
| - bbh_fewshot_logical_deduction_seven_objects |Yaml |none | 0|exact_match|0.3920|± |0.0309|
| - bbh_fewshot_logical_deduction_three_objects |Yaml |none | 0|exact_match|0.6200|± |0.0308|
| - bbh_fewshot_movie_recommendation |Yaml |none | 0|exact_match|0.6640|± |0.0299|
| - bbh_fewshot_multistep_arithmetic_two |Yaml |none | 0|exact_match|0.0080|± |0.0056|
| - bbh_fewshot_navigate |Yaml |none | 0|exact_match|0.6280|± |0.0306|
| - bbh_fewshot_object_counting |Yaml |none | 0|exact_match|0.3960|± |0.0310|
| - bbh_fewshot_penguins_in_a_table |Yaml |none | 0|exact_match|0.4726|± |0.0415|
| - bbh_fewshot_reasoning_about_colored_objects |Yaml |none | 0|exact_match|0.5320|± |0.0316|
| - bbh_fewshot_ruin_names |Yaml |none | 0|exact_match|0.5680|± |0.0314|
| - bbh_fewshot_salient_translation_error_detection |Yaml |none | 0|exact_match|0.5480|± |0.0315|
| - bbh_fewshot_snarks |Yaml |none | 0|exact_match|0.5169|± |0.0376|
| - bbh_fewshot_sports_understanding |Yaml |none | 0|exact_match|0.8320|± |0.0237|
| - bbh_fewshot_temporal_sequences |Yaml |none | 0|exact_match|0.5520|± |0.0315|
| - bbh_fewshot_tracking_shuffled_objects_five_objects |Yaml |none | 0|exact_match|0.1480|± |0.0225|
| - bbh_fewshot_tracking_shuffled_objects_seven_objects|Yaml |none | 0|exact_match|0.1720|± |0.0239|
| - bbh_fewshot_tracking_shuffled_objects_three_objects|Yaml |none | 0|exact_match|0.2760|± |0.0283|
| - bbh_fewshot_web_of_lies |Yaml |none | 0|exact_match|0.4760|± |0.0316|
| - bbh_fewshot_word_sorting |Yaml |none | 0|exact_match|0.2840|± |0.0286|
| Groups |Version|Filter|n-shot| Metric |Value| |Stderr|
|-----------|-------|------|-----:|-----------|----:|---|-----:|
|bbh_fewshot|N/A |none | 0|exact_match|0.466|± |0.1771|
hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: auto (16)
| Tasks |Version|Filter|n-shot|Metric|Value | |Stderr|
|---------------------------------------|-------|------|-----:|------|-----:|---|-----:|
|mmlu |N/A |none | 0|acc |0.6513|± |0.1221|
| - humanities |N/A |none | 5|acc |0.6077|± |0.1185|
| - formal_logic |Yaml |none | 5|acc |0.4444|± |0.0444|
| - high_school_european_history |Yaml |none | 5|acc |0.8121|± |0.0305|
| - high_school_us_history |Yaml |none | 5|acc |0.8431|± |0.0255|
| - high_school_world_history |Yaml |none | 5|acc |0.8523|± |0.0231|
| - international_law |Yaml |none | 5|acc |0.7851|± |0.0375|
| - jurisprudence |Yaml |none | 5|acc |0.7870|± |0.0396|
| - logical_fallacies |Yaml |none | 5|acc |0.7546|± |0.0338|
| - moral_disputes |Yaml |none | 5|acc |0.7370|± |0.0237|
| - moral_scenarios |Yaml |none | 5|acc |0.4101|± |0.0164|
| - philosophy |Yaml |none | 5|acc |0.7170|± |0.0256|
| - prehistory |Yaml |none | 5|acc |0.7840|± |0.0229|
| - professional_law |Yaml |none | 5|acc |0.4941|± |0.0128|
| - world_religions |Yaml |none | 5|acc |0.7895|± |0.0313|
| - other |N/A |none | 5|acc |0.7116|± |0.0939|
| - business_ethics |Yaml |none | 5|acc |0.7600|± |0.0429|
| - clinical_knowledge |Yaml |none | 5|acc |0.6792|± |0.0287|
| - college_medicine |Yaml |none | 5|acc |0.6590|± |0.0361|
| - global_facts |Yaml |none | 5|acc |0.3400|± |0.0476|
| - human_aging |Yaml |none | 5|acc |0.6816|± |0.0313|
| - management |Yaml |none | 5|acc |0.8350|± |0.0368|
| - marketing |Yaml |none | 5|acc |0.8547|± |0.0231|
| - medical_genetics |Yaml |none | 5|acc |0.7000|± |0.0461|
| - miscellaneous |Yaml |none | 5|acc |0.8020|± |0.0142|
| - nutrition |Yaml |none | 5|acc |0.7418|± |0.0251|
| - professional_accounting |Yaml |none | 5|acc |0.5071|± |0.0298|
| - professional_medicine |Yaml |none | 5|acc |0.7500|± |0.0263|
| - virology |Yaml |none | 5|acc |0.5843|± |0.0384|
| - social_sciences |N/A |none | 5|acc |0.7537|± |0.0681|
| - econometrics |Yaml |none | 5|acc |0.5000|± |0.0470|
| - high_school_geography |Yaml |none | 5|acc |0.8586|± |0.0248|
| - high_school_government_and_politics|Yaml |none | 5|acc |0.9016|± |0.0215|
| - high_school_macroeconomics |Yaml |none | 5|acc |0.6615|± |0.0240|
| - high_school_microeconomics |Yaml |none | 5|acc |0.7311|± |0.0288|
| - high_school_psychology |Yaml |none | 5|acc |0.8404|± |0.0157|
| - human_sexuality |Yaml |none | 5|acc |0.7328|± |0.0388|
| - professional_psychology |Yaml |none | 5|acc |0.6814|± |0.0189|
| - public_relations |Yaml |none | 5|acc |0.6909|± |0.0443|
| - security_studies |Yaml |none | 5|acc |0.7469|± |0.0278|
| - sociology |Yaml |none | 5|acc |0.8308|± |0.0265|
| - us_foreign_policy |Yaml |none | 5|acc |0.8900|± |0.0314|
| - stem |N/A |none | 5|acc |0.5569|± |0.1380|
| - abstract_algebra |Yaml |none | 5|acc |0.4100|± |0.0494|
| - anatomy |Yaml |none | 5|acc |0.6222|± |0.0419|
| - astronomy |Yaml |none | 5|acc |0.7368|± |0.0358|
| - college_biology |Yaml |none | 5|acc |0.8056|± |0.0331|
| - college_chemistry |Yaml |none | 5|acc |0.4700|± |0.0502|
| - college_computer_science |Yaml |none | 5|acc |0.5100|± |0.0502|
| - college_mathematics |Yaml |none | 5|acc |0.2800|± |0.0451|
| - college_physics |Yaml |none | 5|acc |0.3431|± |0.0472|
| - computer_security |Yaml |none | 5|acc |0.7400|± |0.0441|
| - conceptual_physics |Yaml |none | 5|acc |0.6340|± |0.0315|
| - electrical_engineering |Yaml |none | 5|acc |0.6000|± |0.0408|
| - elementary_mathematics |Yaml |none | 5|acc |0.4815|± |0.0257|
| - high_school_biology |Yaml |none | 5|acc |0.8032|± |0.0226|
| - high_school_chemistry |Yaml |none | 5|acc |0.4877|± |0.0352|
| - high_school_computer_science |Yaml |none | 5|acc |0.7200|± |0.0451|
| - high_school_mathematics |Yaml |none | 5|acc |0.3815|± |0.0296|
| - high_school_physics |Yaml |none | 5|acc |0.3576|± |0.0391|
| - high_school_statistics |Yaml |none | 5|acc |0.5602|± |0.0339|
| - machine_learning |Yaml |none | 5|acc |0.4643|± |0.0473|
| Groups |Version|Filter|n-shot|Metric|Value | |Stderr|
|------------------|-------|------|-----:|------|-----:|---|-----:|
|mmlu |N/A |none | 0|acc |0.6513|± |0.1221|
| - humanities |N/A |none | 5|acc |0.6077|± |0.1185|
| - other |N/A |none | 5|acc |0.7116|± |0.0939|
| - social_sciences|N/A |none | 5|acc |0.7537|± |0.0681|
| - stem |N/A |none | 5|acc |0.5569|± |0.1380|
```
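To reproduce any of the rows above on the harness's `big-refactor` branch, a Python sketch like this should be close (the `simple_evaluate` call reflects the refactored harness API; the task name and settings are copied from the ARC header above):

```python
import lm_eval  # EleutherAI lm-evaluation-harness, big-refactor / v0.4-style API

# Reproduce the ARC-Challenge row from the results above.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0,dtype=float16",
    tasks=["arc_challenge"],
    num_fewshot=25,
    batch_size="auto",
)
print(results["results"]["arc_challenge"])
```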
## Citations
Thanks to [Upstage.AI](https://huggingface.co/upstage) for its awesome base model; this is merely a UNA of it. UNA can only refine what is already in there :)
If you find UNA-SOLAR useful, cite and support the authors.