base_model: upstage/SOLAR-10.7B-Instruct-v1.0
tags:
  - alignment-handbook
  - generated_from_trainer
  - UNA
  - single-turn
model-index:
  - name: UNA-SOLAR-10.7B-Instruct-v1.0
    results: []
license: cc-by-nc-nd-4.0
language:
  - en
library_name: transformers

UNA: Uniform Neural Alignment

SFT Further:

  • Linear
  • 2e-5

Merges:

  • Fan in: 0:2
  • Fan out: -4:
  • Intermediary layers: 1/1/1/0/1/1/0/1/0/1/1/0/1/1/0, using the on/off pattern as a form of regularisation.
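The slice-style notation above can be read as Python slices over the decoder layers. The sketch below is one possible reading, not the author's merge procedure: it assumes a hypothetical 48-layer stack, treats `0:2` and `-4:` as the fan-in and fan-out slices, and assumes the 15-entry on/off pattern repeats cyclically over the layers in between. The names `NUM_LAYERS`, `PATTERN`, and `layer_plan` are illustrative only.

```python
from itertools import cycle, islice

# Assumption: a 48-layer decoder stack (not stated in this card).
NUM_LAYERS = 48

# On/off pattern for the intermediary layers, copied from the list above.
PATTERN = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0]


def layer_plan(num_layers, pattern):
    """Split layer indices into fan-in, fan-out, and on/off intermediary layers."""
    layers = list(range(num_layers))
    fan_in = layers[0:2]     # "Fan in: 0:2"
    fan_out = layers[-4:]    # "Fan out: -4:"
    middle = layers[2:-4]    # everything between the two slices
    # Assumption: the pattern is repeated cyclically to cover the middle layers.
    flags = list(islice(cycle(pattern), len(middle)))
    on = [layer for layer, flag in zip(middle, flags) if flag == 1]
    off = [layer for layer, flag in zip(middle, flags) if flag == 0]
    return {"fan_in": fan_in, "fan_out": fan_out, "on": on, "off": off}


plan = layer_plan(NUM_LAYERS, PATTERN)
print("fan in :", plan["fan_in"])
print("fan out:", plan["fan_out"])
print("on/off :", len(plan["on"]), "on,", len(plan["off"]), "off")
```

Whether the pattern is applied cyclically, or to layers at all rather than to some other unit, is not specified here, so treat the split as illustrative.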

Quants

Libraries:

  • Transformers 4.35.0-UNA
  • PyTorch 2.1.0
  • Datasets 2.14.6
  • Tokenizers 0.14.1

Evals LM-Evaluation Harness

mt-bench:

Mode: single
Input file: data/mt_bench/model_judgment/gpt-4_single.jsonl

########## First turn ##########
                                      score
model                         turn
gpt-4                         1     8.95625
claude-v1                     1     8.15000
gpt-3.5-turbo                 1     8.07500
LUNA-SOLARkrautLM-Instruct    1     7.93750
UNA-SOLAR-10.7B-Instruct-v1.0 1     7.80625
vicuna-33b-v1.3               1     7.45625
wizardlm-30b                  1     7.13125
tulu-30b                      1     7.01875
vicuna-13b-v1.3               1     6.81250
guanaco-65b                   1     6.78125
nous-hermes-13b               1     6.43125
alpaca-13b                    1     4.97500
rwkv-4-raven-14b              1     4.74375
llama-13b                     1     3.26250

########## Second turn ##########
                                       score
model                         turn
gpt-4                         2     9.025000
gpt-3.5-turbo                 2     7.812500
claude-v1                     2     7.650000
UNA-SOLAR-10.7B-Instruct-v1.0 2     7.237500
LUNA-SOLARkrautLM-Instruct    2     6.987500
wizardlm-30b                  2     6.887500
vicuna-33b-v1.3               2     6.787500
guanaco-65b                   2     6.037500
vicuna-13b-v1.3               2     5.962500
tulu-30b                      2     5.850000
nous-hermes-13b               2     4.664557
alpaca-13b                    2     4.087500
rwkv-4-raven-14b              2     3.225000
llama-13b                     2     1.950000

########## Average ##########
                                  score
model
gpt-4                          8.990625
gpt-3.5-turbo                  7.943750
claude-instant-v1              7.905660
claude-v1                      7.900000
UNA-SOLAR-10.7B-Instruct-v1.0  7.521875
LUNA-SOLARkrautLM-Instruct     7.462500
vicuna-33b-v1.3                7.121875
wizardlm-30b                   7.009375
Llama-2-70b-chat               6.856250
Llama-2-13b-chat               6.650000
guanaco-33b                    6.528125
tulu-30b                       6.434375
guanaco-65b                    6.409375
oasst-sft-7-llama-30b          6.409375
palm-2-chat-bison-001          6.400000
mpt-30b-chat                   6.393750
vicuna-13b-v1.3                6.387500
wizardlm-13b                   6.353125
Llama-2-7b-chat                6.268750
vicuna-7b-v1.3                 5.996875
baize-v2-13b                   5.750000
nous-hermes-13b                5.553459
mpt-7b-chat                    5.459119
gpt4all-13b-snoozy             5.452830
koala-13b                      5.350000
mpt-30b-instruct               5.218750
falcon-40b-instruct            5.168750
h2ogpt-oasst-open-llama-13b    4.625000
alpaca-13b                     4.531250
chatglm-6b                     4.500000
oasst-sft-4-pythia-12b         4.318750
rwkv-4-raven-14b               3.984375
dolly-v2-12b                   3.275000
fastchat-t5-3b                 3.040625
stablelm-tuned-alpha-7b        2.753125
llama-13b                      2.606250
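For a model with the same number of judged answers in each turn, the average score is the mean of its first-turn and second-turn scores. A minimal check in Python, using values copied from the tables above (the helper name `turn_average` is illustrative):

```python
# Per-turn MT-Bench scores copied from the tables above.
first_turn = {
    "gpt-4": 8.95625,
    "UNA-SOLAR-10.7B-Instruct-v1.0": 7.80625,
    "LUNA-SOLARkrautLM-Instruct": 7.93750,
}
second_turn = {
    "gpt-4": 9.025000,
    "UNA-SOLAR-10.7B-Instruct-v1.0": 7.237500,
    "LUNA-SOLARkrautLM-Instruct": 6.987500,
}
# Averages as reported in the "Average" table.
reported_average = {
    "gpt-4": 8.990625,
    "UNA-SOLAR-10.7B-Instruct-v1.0": 7.521875,
    "LUNA-SOLARkrautLM-Instruct": 7.462500,
}


def turn_average(model):
    """Mean of the first-turn and second-turn scores for one model."""
    return (first_turn[model] + second_turn[model]) / 2


for model, reported in reported_average.items():
    computed = turn_average(model)
    print(f"{model}: computed={computed:.6f} reported={reported:.6f}")
```

The identity does not hold for every row: the reported average for nous-hermes-13b (5.553459) differs slightly from the simple mean of its two turn scores, presumably because the two turns do not have the same number of judgments.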

big-refactor branch:

hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0), gen_kwargs: (None), limit: None, num_fewshot: 25, batch_size: auto (32)
|    Tasks    |Version|Filter|n-shot| Metric |Value |   |Stderr|
|-------------|-------|------|-----:|--------|-----:|---|-----:|
|arc_challenge|Yaml   |none  |    25|acc     |0.6954|±  |0.0134|
|             |       |none  |    25|acc_norm|0.7167|±  |0.0132|
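Each results table is preceded by a header line recording the harness settings. Below is a sketch of how such a run might be launched, assuming a recent lm-evaluation-harness that installs the `lm_eval` command with the `--model`, `--model_args`, `--tasks`, `--num_fewshot`, and `--batch_size` flags. The helper `build_lm_eval_command` is hypothetical, and the exact invocation used for this card is not stated.

```python
import shlex

MODEL_ID = "fblgit/UNA-SOLAR-10.7B-Instruct-v1.0"


def build_lm_eval_command(task, num_fewshot, batch_size="auto", dtype=None):
    """Build an argv list mirroring the settings shown in a results header line."""
    model_args = f"pretrained={MODEL_ID}"
    if dtype is not None:
        # Some header lines in this card also carry dtype=float16.
        model_args += f",dtype={dtype}"
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", model_args,
        "--tasks", task,
        "--num_fewshot", str(num_fewshot),
        "--batch_size", batch_size,
    ]


# Settings taken from the arc_challenge header line above.
cmd = build_lm_eval_command("arc_challenge", 25)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` on a machine with the harness installed and enough GPU memory for the model.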

hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: auto
|Tasks|Version|  Filter  |n-shot|  Metric   |Value|   |Stderr|
|-----|-------|----------|-----:|-----------|----:|---|-----:|
|gsm8k|Yaml   |get-answer|     5|exact_match|0.671|±  |0.0129|
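The Stderr column can be sanity-checked against the Value column. The sketch below assumes the standard error is the sample standard deviation of per-example 0/1 scores divided by the square root of the number of examples, and assumes test-set sizes of 1172 (ARC-Challenge) and 1319 (GSM8K). Those sizes come from the public datasets, not from this card, and `accuracy_stderr` is an illustrative helper.

```python
import math


def accuracy_stderr(p, n):
    """Standard error of the mean of n 0/1 outcomes with observed mean p.

    Uses the sample variance (n - 1 denominator), which for binary data
    reduces to sqrt(p * (1 - p) / (n - 1)).
    """
    return math.sqrt(p * (1.0 - p) / (n - 1))


# (task, reported value, reported stderr, assumed number of test examples)
rows = [
    ("arc_challenge acc", 0.6954, 0.0134, 1172),
    ("gsm8k exact_match", 0.6710, 0.0129, 1319),
]

for name, value, reported, n in rows:
    estimate = accuracy_stderr(value, n)
    print(f"{name}: estimated={estimate:.4f} reported={reported:.4f}")
```

Small differences in the last digit are expected because the Value column is itself rounded.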

hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0), gen_kwargs: (), limit: None, num_fewshot: 0, batch_size: auto (64)
|    Tasks     |Version|Filter|n-shot|Metric|Value |   |Stderr|
|--------------|-------|------|-----:|------|-----:|---|-----:|
|truthfulqa_mc2|Yaml   |none  |     0|acc   |0.7297|±  |0.0149|

hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0), gen_kwargs: (None), limit: None, num_fewshot: 10, batch_size: auto (32)
|  Tasks  |Version|Filter|n-shot| Metric |Value |   |Stderr|
|---------|-------|------|-----:|--------|-----:|---|-----:|
|hellaswag|Yaml   |none  |    10|acc     |0.7091|±  |0.0045|
|         |       |none  |    10|acc_norm|0.8821|±  |0.0032|

hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0,dtype=float16), gen_kwargs: (), limit: None, num_fewshot: 0, batch_size: auto (32)
|    Tasks     |Version|Filter|n-shot|  Metric  |Value |   |Stderr|
|--------------|-------|------|-----:|----------|-----:|---|-----:|
|boolq         |Yaml   |none  |     0|acc       |0.8807|±  |0.0057|
|lambada_openai|Yaml   |none  |     0|perplexity|3.2452|±  |0.0778|
|              |       |none  |     0|acc       |0.7207|±  |0.0063|
|piqa          |Yaml   |none  |     0|acc       |0.8020|±  |0.0093|
|              |       |none  |     0|acc_norm  |0.8009|±  |0.0093|
|sciq          |Yaml   |none  |     0|acc       |0.9730|±  |0.0051|
|              |       |none  |     0|acc_norm  |0.9630|±  |0.0060|
|winogrande    |Yaml   |none  |     0|acc       |0.7577|±  |0.0120|

hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0,dtype=float16), gen_kwargs: (), limit: None, num_fewshot: 0, batch_size: auto (64)
| Tasks  |Version|Filter|n-shot| Metric |Value |   |Stderr|
|--------|-------|------|-----:|--------|-----:|---|-----:|
|mathqa  |Yaml   |none  |     0|acc     |0.3474|±  |0.0087|
|        |       |none  |     0|acc_norm|0.3568|±  |0.0088|
|pubmedqa|Yaml   |none  |     0|acc     |0.5400|±  |0.0223|

hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0,dtype=float16), gen_kwargs: (), limit: None, num_fewshot: 0, batch_size: auto
|                        Tasks                         |Version|Filter|n-shot|  Metric   |Value |   |Stderr|
|------------------------------------------------------|-------|------|-----:|-----------|-----:|---|-----:|
|bbh_fewshot                                           |N/A    |none  |     0|exact_match|0.4660|±  |0.1771|
| - bbh_fewshot_boolean_expressions                    |Yaml   |none  |     0|exact_match|0.8160|±  |0.0246|
| - bbh_fewshot_causal_judgement                       |Yaml   |none  |     0|exact_match|0.4973|±  |0.0367|
| - bbh_fewshot_date_understanding                     |Yaml   |none  |     0|exact_match|0.4840|±  |0.0317|
| - bbh_fewshot_disambiguation_qa                      |Yaml   |none  |     0|exact_match|0.6520|±  |0.0302|
| - bbh_fewshot_dyck_languages                         |Yaml   |none  |     0|exact_match|0.2040|±  |0.0255|
| - bbh_fewshot_formal_fallacies                       |Yaml   |none  |     0|exact_match|0.5280|±  |0.0316|
| - bbh_fewshot_geometric_shapes                       |Yaml   |none  |     0|exact_match|0.3360|±  |0.0299|
| - bbh_fewshot_hyperbaton                             |Yaml   |none  |     0|exact_match|0.5520|±  |0.0315|
| - bbh_fewshot_logical_deduction_five_objects         |Yaml   |none  |     0|exact_match|0.4520|±  |0.0315|
| - bbh_fewshot_logical_deduction_seven_objects        |Yaml   |none  |     0|exact_match|0.3920|±  |0.0309|
| - bbh_fewshot_logical_deduction_three_objects        |Yaml   |none  |     0|exact_match|0.6200|±  |0.0308|
| - bbh_fewshot_movie_recommendation                   |Yaml   |none  |     0|exact_match|0.6640|±  |0.0299|
| - bbh_fewshot_multistep_arithmetic_two               |Yaml   |none  |     0|exact_match|0.0080|±  |0.0056|
| - bbh_fewshot_navigate                               |Yaml   |none  |     0|exact_match|0.6280|±  |0.0306|
| - bbh_fewshot_object_counting                        |Yaml   |none  |     0|exact_match|0.3960|±  |0.0310|
| - bbh_fewshot_penguins_in_a_table                    |Yaml   |none  |     0|exact_match|0.4726|±  |0.0415|
| - bbh_fewshot_reasoning_about_colored_objects        |Yaml   |none  |     0|exact_match|0.5320|±  |0.0316|
| - bbh_fewshot_ruin_names                             |Yaml   |none  |     0|exact_match|0.5680|±  |0.0314|
| - bbh_fewshot_salient_translation_error_detection    |Yaml   |none  |     0|exact_match|0.5480|±  |0.0315|
| - bbh_fewshot_snarks                                 |Yaml   |none  |     0|exact_match|0.5169|±  |0.0376|
| - bbh_fewshot_sports_understanding                   |Yaml   |none  |     0|exact_match|0.8320|±  |0.0237|
| - bbh_fewshot_temporal_sequences                     |Yaml   |none  |     0|exact_match|0.5520|±  |0.0315|
| - bbh_fewshot_tracking_shuffled_objects_five_objects |Yaml   |none  |     0|exact_match|0.1480|±  |0.0225|
| - bbh_fewshot_tracking_shuffled_objects_seven_objects|Yaml   |none  |     0|exact_match|0.1720|±  |0.0239|
| - bbh_fewshot_tracking_shuffled_objects_three_objects|Yaml   |none  |     0|exact_match|0.2760|±  |0.0283|
| - bbh_fewshot_web_of_lies                            |Yaml   |none  |     0|exact_match|0.4760|±  |0.0316|
| - bbh_fewshot_word_sorting                           |Yaml   |none  |     0|exact_match|0.2840|±  |0.0286|

|  Groups   |Version|Filter|n-shot|  Metric   |Value|   |Stderr|
|-----------|-------|------|-----:|-----------|----:|---|-----:|
|bbh_fewshot|N/A    |none  |     0|exact_match|0.466|±  |0.1771|
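The group score can be compared with two ways of aggregating the subtask scores listed above. The sketch below assumes the per-task example counts of the public BBH dataset (250 per task, except 187 for causal_judgement, 146 for penguins_in_a_table, and 178 for snarks). These counts are not given in this card, and the helper names are illustrative.

```python
# Subtask exact_match values copied from the table above (bbh_fewshot_ prefix dropped).
scores = {
    "boolean_expressions": 0.8160, "causal_judgement": 0.4973,
    "date_understanding": 0.4840, "disambiguation_qa": 0.6520,
    "dyck_languages": 0.2040, "formal_fallacies": 0.5280,
    "geometric_shapes": 0.3360, "hyperbaton": 0.5520,
    "logical_deduction_five_objects": 0.4520,
    "logical_deduction_seven_objects": 0.3920,
    "logical_deduction_three_objects": 0.6200,
    "movie_recommendation": 0.6640, "multistep_arithmetic_two": 0.0080,
    "navigate": 0.6280, "object_counting": 0.3960,
    "penguins_in_a_table": 0.4726, "reasoning_about_colored_objects": 0.5320,
    "ruin_names": 0.5680, "salient_translation_error_detection": 0.5480,
    "snarks": 0.5169, "sports_understanding": 0.8320,
    "temporal_sequences": 0.5520,
    "tracking_shuffled_objects_five_objects": 0.1480,
    "tracking_shuffled_objects_seven_objects": 0.1720,
    "tracking_shuffled_objects_three_objects": 0.2760,
    "web_of_lies": 0.4760, "word_sorting": 0.2840,
}

# Assumption: example counts from the public BBH dataset, not from this card.
DEFAULT_SIZE = 250
sizes = {"causal_judgement": 187, "penguins_in_a_table": 146, "snarks": 178}


def unweighted_mean(task_scores):
    """Plain average over subtasks."""
    return sum(task_scores.values()) / len(task_scores)


def size_weighted_mean(task_scores, task_sizes, default_size):
    """Average over subtasks, weighting each by its number of examples."""
    total = sum(task_sizes.get(task, default_size) for task in task_scores)
    weighted = sum(
        value * task_sizes.get(task, default_size)
        for task, value in task_scores.items()
    )
    return weighted / total


print(f"unweighted:    {unweighted_mean(scores):.4f}")
print(f"size-weighted: {size_weighted_mean(scores, sizes, DEFAULT_SIZE):.4f}")
```

Under these assumed sizes the size-weighted mean lands closer to the reported 0.4660 than the unweighted mean does, which is consistent with the group value being weighted by task size.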

hf (pretrained=fblgit/UNA-SOLAR-10.7B-Instruct-v1.0), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: auto (16)
|                 Tasks                 |Version|Filter|n-shot|Metric|Value |   |Stderr|
|---------------------------------------|-------|------|-----:|------|-----:|---|-----:|
|mmlu                                   |N/A    |none  |     0|acc   |0.6513|±  |0.1221|
| - humanities                          |N/A    |none  |     5|acc   |0.6077|±  |0.1185|
|  - formal_logic                       |Yaml   |none  |     5|acc   |0.4444|±  |0.0444|
|  - high_school_european_history       |Yaml   |none  |     5|acc   |0.8121|±  |0.0305|
|  - high_school_us_history             |Yaml   |none  |     5|acc   |0.8431|±  |0.0255|
|  - high_school_world_history          |Yaml   |none  |     5|acc   |0.8523|±  |0.0231|
|  - international_law                  |Yaml   |none  |     5|acc   |0.7851|±  |0.0375|
|  - jurisprudence                      |Yaml   |none  |     5|acc   |0.7870|±  |0.0396|
|  - logical_fallacies                  |Yaml   |none  |     5|acc   |0.7546|±  |0.0338|
|  - moral_disputes                     |Yaml   |none  |     5|acc   |0.7370|±  |0.0237|
|  - moral_scenarios                    |Yaml   |none  |     5|acc   |0.4101|±  |0.0164|
|  - philosophy                         |Yaml   |none  |     5|acc   |0.7170|±  |0.0256|
|  - prehistory                         |Yaml   |none  |     5|acc   |0.7840|±  |0.0229|
|  - professional_law                   |Yaml   |none  |     5|acc   |0.4941|±  |0.0128|
|  - world_religions                    |Yaml   |none  |     5|acc   |0.7895|±  |0.0313|
| - other                               |N/A    |none  |     5|acc   |0.7116|±  |0.0939|
|  - business_ethics                    |Yaml   |none  |     5|acc   |0.7600|±  |0.0429|
|  - clinical_knowledge                 |Yaml   |none  |     5|acc   |0.6792|±  |0.0287|
|  - college_medicine                   |Yaml   |none  |     5|acc   |0.6590|±  |0.0361|
|  - global_facts                       |Yaml   |none  |     5|acc   |0.3400|±  |0.0476|
|  - human_aging                        |Yaml   |none  |     5|acc   |0.6816|±  |0.0313|
|  - management                         |Yaml   |none  |     5|acc   |0.8350|±  |0.0368|
|  - marketing                          |Yaml   |none  |     5|acc   |0.8547|±  |0.0231|
|  - medical_genetics                   |Yaml   |none  |     5|acc   |0.7000|±  |0.0461|
|  - miscellaneous                      |Yaml   |none  |     5|acc   |0.8020|±  |0.0142|
|  - nutrition                          |Yaml   |none  |     5|acc   |0.7418|±  |0.0251|
|  - professional_accounting            |Yaml   |none  |     5|acc   |0.5071|±  |0.0298|
|  - professional_medicine              |Yaml   |none  |     5|acc   |0.7500|±  |0.0263|
|  - virology                           |Yaml   |none  |     5|acc   |0.5843|±  |0.0384|
| - social_sciences                     |N/A    |none  |     5|acc   |0.7537|±  |0.0681|
|  - econometrics                       |Yaml   |none  |     5|acc   |0.5000|±  |0.0470|
|  - high_school_geography              |Yaml   |none  |     5|acc   |0.8586|±  |0.0248|
|  - high_school_government_and_politics|Yaml   |none  |     5|acc   |0.9016|±  |0.0215|
|  - high_school_macroeconomics         |Yaml   |none  |     5|acc   |0.6615|±  |0.0240|
|  - high_school_microeconomics         |Yaml   |none  |     5|acc   |0.7311|±  |0.0288|
|  - high_school_psychology             |Yaml   |none  |     5|acc   |0.8404|±  |0.0157|
|  - human_sexuality                    |Yaml   |none  |     5|acc   |0.7328|±  |0.0388|
|  - professional_psychology            |Yaml   |none  |     5|acc   |0.6814|±  |0.0189|
|  - public_relations                   |Yaml   |none  |     5|acc   |0.6909|±  |0.0443|
|  - security_studies                   |Yaml   |none  |     5|acc   |0.7469|±  |0.0278|
|  - sociology                          |Yaml   |none  |     5|acc   |0.8308|±  |0.0265|
|  - us_foreign_policy                  |Yaml   |none  |     5|acc   |0.8900|±  |0.0314|
| - stem                                |N/A    |none  |     5|acc   |0.5569|±  |0.1380|
|  - abstract_algebra                   |Yaml   |none  |     5|acc   |0.4100|±  |0.0494|
|  - anatomy                            |Yaml   |none  |     5|acc   |0.6222|±  |0.0419|
|  - astronomy                          |Yaml   |none  |     5|acc   |0.7368|±  |0.0358|
|  - college_biology                    |Yaml   |none  |     5|acc   |0.8056|±  |0.0331|
|  - college_chemistry                  |Yaml   |none  |     5|acc   |0.4700|±  |0.0502|
|  - college_computer_science           |Yaml   |none  |     5|acc   |0.5100|±  |0.0502|
|  - college_mathematics                |Yaml   |none  |     5|acc   |0.2800|±  |0.0451|
|  - college_physics                    |Yaml   |none  |     5|acc   |0.3431|±  |0.0472|
|  - computer_security                  |Yaml   |none  |     5|acc   |0.7400|±  |0.0441|
|  - conceptual_physics                 |Yaml   |none  |     5|acc   |0.6340|±  |0.0315|
|  - electrical_engineering             |Yaml   |none  |     5|acc   |0.6000|±  |0.0408|
|  - elementary_mathematics             |Yaml   |none  |     5|acc   |0.4815|±  |0.0257|
|  - high_school_biology                |Yaml   |none  |     5|acc   |0.8032|±  |0.0226|
|  - high_school_chemistry              |Yaml   |none  |     5|acc   |0.4877|±  |0.0352|
|  - high_school_computer_science       |Yaml   |none  |     5|acc   |0.7200|±  |0.0451|
|  - high_school_mathematics            |Yaml   |none  |     5|acc   |0.3815|±  |0.0296|
|  - high_school_physics                |Yaml   |none  |     5|acc   |0.3576|±  |0.0391|
|  - high_school_statistics             |Yaml   |none  |     5|acc   |0.5602|±  |0.0339|
|  - machine_learning                   |Yaml   |none  |     5|acc   |0.4643|±  |0.0473|

|      Groups      |Version|Filter|n-shot|Metric|Value |   |Stderr|
|------------------|-------|------|-----:|------|-----:|---|-----:|
|mmlu              |N/A    |none  |     0|acc   |0.6513|±  |0.1221|
| - humanities     |N/A    |none  |     5|acc   |0.6077|±  |0.1185|
| - other          |N/A    |none  |     5|acc   |0.7116|±  |0.0939|
| - social_sciences|N/A    |none  |     5|acc   |0.7537|±  |0.0681|
| - stem           |N/A    |none  |     5|acc   |0.5569|±  |0.1380|

Citations

Thanks to Upstage.AI for its awesome base model; this is merely a UNA of it. It can only refine what is already in there :)

If you find UNA-SOLAR useful, cite and support the authors.