qnguyen3
/

Master-Yi-9B-GGUF

GGUF

Inference Endpoints

conversational

qnguyen3 committed on May 18

Commit

4c96187

•

1 Parent(s): 67e119d

Create README.md

README.md +463 -0

	@@ -0,0 +1,463 @@

+---
+license: apache-2.0
+---
+## Model Description
+Master is a collection of LLMs trained on human-collected seed questions, with the answers regenerated by a mixture of high-performance open-source LLMs.
+**Master-Yi-9B** is trained using the ORPO technique. The model shows strong reasoning abilities on coding and math questions.
+**Main Version**: [Here](https://huggingface.co/qnguyen3/Master-Yi-9B)
+![img](https://huggingface.co/qnguyen3/Master-Yi-9B/resolve/main/Master-Yi-9B.webp)
+## Prompt Template
+```
+<|im_start|>system
+You are a helpful AI assistant.<|im_end|>
+<|im_start|>user
+What is the meaning of life?<|im_end|>
+<|im_start|>assistant
+```
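The template above follows the ChatML layout: each turn is wrapped in `<|im_start|>{role}` and `<|im_end|>`, and the prompt ends with an open assistant turn. As a minimal sketch for runtimes that take a raw prompt string (for example when running the GGUF files), the helper below assembles that string by hand. The function name `build_chatml_prompt` is hypothetical, and the trailing newline after `<|im_start|>assistant` is an assumption; the tokenizer's own chat template remains the authoritative reference.

```python
from typing import Dict, List


def build_chatml_prompt(
    messages: List[Dict[str, str]], add_generation_prompt: bool = True
) -> str:
    """Assemble a ChatML-style prompt string (hypothetical helper)."""
    # Each turn: <|im_start|>{role}\n{content}<|im_end|>\n
    prompt = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        # The trailing newline is an assumption, not confirmed by the README.
        prompt += "<|im_start|>assistant\n"
    return prompt


messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "What is the meaning of life?"},
]
prompt = build_chatml_prompt(messages)
print(prompt)
```

When using `transformers`, `tokenizer.apply_chat_template` is usually the safer choice, since it reads the template shipped with the model instead of a hand-written copy.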
+## Examples
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/630430583926de1f7ec62c6b/E27JmdRAMrHQacM50-lBk.png)
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/630430583926de1f7ec62c6b/z0HS4bxHFQzPe0gZlvCzZ.png)
+## Inference Code
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+import torch
+device = "cuda" # the device to load the model onto
+# Load the main (non-GGUF) checkpoint linked under "Main Version" above
+model = AutoModelForCausalLM.from_pretrained(
+    "qnguyen3/Master-Yi-9B",
+    torch_dtype='auto',
+    device_map="auto"
+)
+tokenizer = AutoTokenizer.from_pretrained("qnguyen3/Master-Yi-9B")
+prompt = "What is the meaning of life?"
+messages = [
+    {"role": "system", "content": "You are a helpful AI assistant."},
+    {"role": "user", "content": prompt}
+]
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True
+)
+model_inputs = tokenizer([text], return_tensors="pt").to(device)
+generated_ids = model.generate(
+    model_inputs.input_ids,
+    max_new_tokens=1024,
+    eos_token_id=tokenizer.eos_token_id,
+    do_sample=True,  # sampling must be enabled for temperature to take effect
+    temperature=0.25,
+)
+generated_ids = [
+    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+]
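+# skip_special_tokens drops markers such as <|im_end|> from the decoded text
+response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]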
+print(response)
+```
+## Benchmarks
+Nous Benchmark:
+|                       Model                       |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
+|---------------------------------------------------|------:|------:|---------:|-------:|------:|
+|[Master-Yi-9B](https://huggingface.co/qnguyen3/Master-Yi-9B)|  43.55|  71.48|     48.54|   41.43|  51.25|
+```
+### AGIEval
+|             Task             |Version| Metric |Value|   |Stderr|
+|------------------------------|------:|--------|----:|---|-----:|
+|agieval_aqua_rat              |      0|acc     |35.83|±  |  3.01|
+|                              |       |acc_norm|31.89|±  |  2.93|
+|agieval_logiqa_en             |      0|acc     |38.25|±  |  1.91|
+|                              |       |acc_norm|37.79|±  |  1.90|
+|agieval_lsat_ar               |      0|acc     |23.04|±  |  2.78|
+|                              |       |acc_norm|20.43|±  |  2.66|
+|agieval_lsat_lr               |      0|acc     |48.04|±  |  2.21|
+|                              |       |acc_norm|42.75|±  |  2.19|
+|agieval_lsat_rc               |      0|acc     |61.34|±  |  2.97|
+|                              |       |acc_norm|52.79|±  |  3.05|
+|agieval_sat_en                |      0|acc     |79.13|±  |  2.84|
+|                              |       |acc_norm|72.33|±  |  3.12|
+|agieval_sat_en_without_passage|      0|acc     |44.17|±  |  3.47|
+|                              |       |acc_norm|42.72|±  |  3.45|
+|agieval_sat_math              |      0|acc     |52.27|±  |  3.38|
+|                              |       |acc_norm|47.73|±  |  3.38|
+Average: 43.55%
+### GPT4All
+|    Task     |Version| Metric |Value|   |Stderr|
+|-------------|------:|--------|----:|---|-----:|
+|arc_challenge|      0|acc     |54.95|±  |  1.45|
+|             |       |acc_norm|58.70|±  |  1.44|
+|arc_easy     |      0|acc     |82.28|±  |  0.78|
+|             |       |acc_norm|81.10|±  |  0.80|
+|boolq        |      1|acc     |86.15|±  |  0.60|
+|hellaswag    |      0|acc     |59.16|±  |  0.49|
+|             |       |acc_norm|77.53|±  |  0.42|
+|openbookqa   |      0|acc     |37.40|±  |  2.17|
+|             |       |acc_norm|44.00|±  |  2.22|
+|piqa         |      0|acc     |79.00|±  |  0.95|
+|             |       |acc_norm|80.25|±  |  0.93|
+|winogrande   |      0|acc     |72.61|±  |  1.25|
+Average: 71.48%
+### TruthfulQA
+|    Task     |Version|Metric|Value|   |Stderr|
+|-------------|------:|------|----:|---|-----:|
+|truthfulqa_mc|      1|mc1   |33.05|±  |  1.65|
+|             |       |mc2   |48.54|±  |  1.54|
+Average: 48.54%
+### Bigbench
+|                      Task                      |Version|       Metric        |Value|   |Stderr|
+|------------------------------------------------|------:|---------------------|----:|---|-----:|
+|bigbench_causal_judgement                       |      0|multiple_choice_grade|54.74|±  |  3.62|
+|bigbench_date_understanding                     |      0|multiple_choice_grade|68.02|±  |  2.43|
+|bigbench_disambiguation_qa                      |      0|multiple_choice_grade|40.31|±  |  3.06|
+|bigbench_geometric_shapes                       |      0|multiple_choice_grade|30.36|±  |  2.43|
+|                                                |       |exact_str_match      | 2.23|±  |  0.78|
+|bigbench_logical_deduction_five_objects         |      0|multiple_choice_grade|26.00|±  |  1.96|
+|bigbench_logical_deduction_seven_objects        |      0|multiple_choice_grade|20.71|±  |  1.53|
+|bigbench_logical_deduction_three_objects        |      0|multiple_choice_grade|44.00|±  |  2.87|
+|bigbench_movie_recommendation                   |      0|multiple_choice_grade|35.00|±  |  2.14|
+|bigbench_navigate                               |      0|multiple_choice_grade|58.40|±  |  1.56|
+|bigbench_reasoning_about_colored_objects        |      0|multiple_choice_grade|61.80|±  |  1.09|
+|bigbench_ruin_names                             |      0|multiple_choice_grade|42.41|±  |  2.34|
+|bigbench_salient_translation_error_detection    |      0|multiple_choice_grade|31.56|±  |  1.47|
+|bigbench_snarks                                 |      0|multiple_choice_grade|55.25|±  |  3.71|
+|bigbench_sports_understanding                   |      0|multiple_choice_grade|69.37|±  |  1.47|
+|bigbench_temporal_sequences                     |      0|multiple_choice_grade|27.70|±  |  1.42|
+|bigbench_tracking_shuffled_objects_five_objects |      0|multiple_choice_grade|21.36|±  |  1.16|
+|bigbench_tracking_shuffled_objects_seven_objects|      0|multiple_choice_grade|14.69|±  |  0.85|
+|bigbench_tracking_shuffled_objects_three_objects|      0|multiple_choice_grade|44.00|±  |  2.87|
+Average: 41.43%
+Average score: 51.25%
+```
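The summary figures in the Nous table can be checked with simple arithmetic. The sketch below assumes the AGIEval suite score is the mean of the per-task `acc_norm` values and that the overall score is the unweighted mean of the four suite scores; the README does not state this explicitly, but the reported numbers are consistent with it. The variable names are illustrative only.

```python
from statistics import mean

# acc_norm values copied from the AGIEval table above
agieval_acc_norm = [31.89, 37.79, 20.43, 42.75, 52.79, 72.33, 42.72, 47.73]

# Suite-level scores copied from the Nous summary row
suite_scores = {
    "AGIEval": 43.55,
    "GPT4All": 71.48,
    "TruthfulQA": 48.54,
    "Bigbench": 41.43,
}

# Assumption: suite score = unweighted mean of acc_norm across tasks
agieval_average = round(mean(agieval_acc_norm), 2)

# Assumption: overall score = unweighted mean of the four suite scores
overall_average = round(mean(list(suite_scores.values())), 2)

print(f"AGIEval average: {agieval_average}")
print(f"Overall average: {overall_average}")
```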
+OpenLLM Benchmark:
+|                       Model                       |ARC |HellaSwag|MMLU |TruthfulQA|Winogrande|GSM8K|Average|
+|---------------------------------------------------|---:|--------:|----:|---------:|---------:|----:|------:|
+|[Master-Yi-9B](https://huggingface.co/qnguyen3/Master-Yi-9B)|61.6|    79.89|69.95|     48.59|     77.35|67.48|  67.48|
+```
+### ARC
+|    Task     |Version|       Metric       |    Value    |   |Stderr|
+|-------------|------:|--------------------|-------------|---|------|
+|arc_challenge|      1|acc,none            |         0.59|   |      |
+|             |       |acc_stderr,none     |         0.01|   |      |
+|             |       |acc_norm,none       |         0.62|   |      |
+|             |       |acc_norm_stderr,none|         0.01|   |      |
+|             |       |alias               |arc_challenge|   |      |
+Average: 61.6%
+### HellaSwag
+|  Task   |Version|       Metric       |  Value  |   |Stderr|
+|---------|------:|--------------------|---------|---|------|
+|hellaswag|      1|acc,none            |     0.61|   |      |
+|         |       |acc_stderr,none     |        0|   |      |
+|         |       |acc_norm,none       |     0.80|   |      |
+|         |       |acc_norm_stderr,none|        0|   |      |
+|         |       |alias               |hellaswag|   |      |
+Average: 79.89%
+### MMLU
+|                  Task                  |Version|    Metric     |                 Value                 |   |Stderr|
+|----------------------------------------|-------|---------------|---------------------------------------|---|------|
+|mmlu                                    |N/A    |acc,none       |                                    0.7|   |      |
+|                                        |       |acc_stderr,none|                                      0|   |      |
+|                                        |       |alias          |mmlu                                   |   |      |
+|mmlu_abstract_algebra                   |      0|alias          |  - abstract_algebra                   |   |      |
+|                                        |       |acc,none       |0.46                                   |   |      |
+|                                        |       |acc_stderr,none|0.05                                   |   |      |
+|mmlu_anatomy                            |      0|alias          |  - anatomy                            |   |      |
+|                                        |       |acc,none       |0.64                                   |   |      |
+|                                        |       |acc_stderr,none|0.04                                   |   |      |
+|mmlu_astronomy                          |      0|alias          |  - astronomy                          |   |      |
+|                                        |       |acc,none       |0.77                                   |   |      |
+|                                        |       |acc_stderr,none|0.03                                   |   |      |
+|mmlu_business_ethics                    |      0|alias          |  - business_ethics                    |   |      |
+|                                        |       |acc,none       |0.76                                   |   |      |
+|                                        |       |acc_stderr,none|0.04                                   |   |      |
+|mmlu_clinical_knowledge                 |      0|alias          |  - clinical_knowledge                 |   |      |
+|                                        |       |acc,none       |0.71                                   |   |      |
+|                                        |       |acc_stderr,none|0.03                                   |   |      |
+|mmlu_college_biology                    |      0|alias          |  - college_biology                    |   |      |
+|                                        |       |acc,none       |0.82                                   |   |      |
+|                                        |       |acc_stderr,none|0.03                                   |   |      |
+|mmlu_college_chemistry                  |      0|alias          |  - college_chemistry                  |   |      |
+|                                        |       |acc,none       |0.52                                   |   |      |
+|                                        |       |acc_stderr,none|0.05                                   |   |      |
+|mmlu_college_computer_science           |      0|alias          |  - college_computer_science           |   |      |
+|                                        |       |acc,none       |0.56                                   |   |      |
+|                                        |       |acc_stderr,none|0.05                                   |   |      |
+|mmlu_college_mathematics                |      0|alias          |  - college_mathematics                |   |      |
+|                                        |       |acc,none       |0.44                                   |   |      |
+|                                        |       |acc_stderr,none|0.05                                   |   |      |
+|mmlu_college_medicine                   |      0|alias          |  - college_medicine                   |   |      |
+|                                        |       |acc,none       |0.72                                   |   |      |
+|                                        |       |acc_stderr,none|0.03                                   |   |      |
+|mmlu_college_physics                    |      0|alias          |  - college_physics                    |   |      |
+|                                        |       |acc,none       |0.45                                   |   |      |
+|                                        |       |acc_stderr,none|0.05                                   |   |      |
+|mmlu_computer_security                  |      0|alias          |  - computer_security                  |   |      |
+|                                        |       |acc,none       |0.81                                   |   |      |
+|                                        |       |acc_stderr,none|0.04                                   |   |      |
+|mmlu_conceptual_physics                 |      0|alias          |  - conceptual_physics                 |   |      |
+|                                        |       |acc,none       |0.74                                   |   |      |
+|                                        |       |acc_stderr,none|0.03                                   |   |      |
+|mmlu_econometrics                       |      0|alias          |  - econometrics                       |   |      |
+|                                        |       |acc,none       |0.65                                   |   |      |
+|                                        |       |acc_stderr,none|0.04                                   |   |      |
+|mmlu_electrical_engineering             |      0|alias          |  - electrical_engineering             |   |      |
+|                                        |       |acc,none       |0.72                                   |   |      |
+|                                        |       |acc_stderr,none|0.04                                   |   |      |
+|mmlu_elementary_mathematics             |      0|alias          |  - elementary_mathematics             |   |      |
+|                                        |       |acc,none       |0.62                                   |   |      |
+|                                        |       |acc_stderr,none|0.02                                   |   |      |
+|mmlu_formal_logic                       |      0|alias          |  - formal_logic                       |   |      |
+|                                        |       |acc,none       |0.57                                   |   |      |
+|                                        |       |acc_stderr,none|0.04                                   |   |      |
+|mmlu_global_facts                       |      0|alias          |  - global_facts                       |   |      |
+|                                        |       |acc,none       |0.46                                   |   |      |
+|                                        |       |acc_stderr,none|0.05                                   |   |      |
+|mmlu_high_school_biology                |      0|alias          |  - high_school_biology                |   |      |
+|                                        |       |acc,none       |0.86                                   |   |      |
+|                                        |       |acc_stderr,none|0.02                                   |   |      |
+|mmlu_high_school_chemistry              |      0|alias          |  - high_school_chemistry              |   |      |
+|                                        |       |acc,none       |0.67                                   |   |      |
+|                                        |       |acc_stderr,none|0.03                                   |   |      |
+|mmlu_high_school_computer_science       |      0|alias          |  - high_school_computer_science       |   |      |
+|                                        |       |acc,none       |0.84                                   |   |      |
+|                                        |       |acc_stderr,none|0.04                                   |   |      |
+|mmlu_high_school_european_history       |      0|alias          |  - high_school_european_history       |   |      |
+|                                        |       |acc,none       |0.82                                   |   |      |
+|                                        |       |acc_stderr,none|0.03                                   |   |      |
+|mmlu_high_school_geography              |      0|alias          |  - high_school_geography              |   |      |
+|                                        |       |acc,none       |0.86                                   |   |      |
+|                                        |       |acc_stderr,none|0.02                                   |   |      |
+|mmlu_high_school_government_and_politics|      0|alias          |  - high_school_government_and_politics|   |      |
+|                                        |       |acc,none       |0.90                                   |   |      |
+|                                        |       |acc_stderr,none|0.02                                   |   |      |
+|mmlu_high_school_macroeconomics         |      0|alias          |  - high_school_macroeconomics         |   |      |
+|                                        |       |acc,none       |0.75                                   |   |      |
+|                                        |       |acc_stderr,none|0.02                                   |   |      |
+|mmlu_high_school_mathematics            |      0|alias          |  - high_school_mathematics            |   |      |
+|                                        |       |acc,none       |0.43                                   |   |      |
+|                                        |       |acc_stderr,none|0.03                                   |   |      |
+|mmlu_high_school_microeconomics         |      0|alias          |  - high_school_microeconomics         |   |      |
+|                                        |       |acc,none       |0.86                                   |   |      |
+|                                        |       |acc_stderr,none|0.02                                   |   |      |
+|mmlu_high_school_physics                |      0|alias          |  - high_school_physics                |   |      |
+|                                        |       |acc,none       |0.45                                   |   |      |
+|                                        |       |acc_stderr,none|0.04                                   |   |      |
+|mmlu_high_school_psychology             |      0|alias          |  - high_school_psychology             |   |      |
+|                                        |       |acc,none       |0.87                                   |   |      |
+|                                        |       |acc_stderr,none|0.01                                   |   |      |
+|mmlu_high_school_statistics             |      0|alias          |  - high_school_statistics             |   |      |
+|                                        |       |acc,none       |0.68                                   |   |      |
+|                                        |       |acc_stderr,none|0.03                                   |   |      |
+|mmlu_high_school_us_history             |      0|alias          |  - high_school_us_history             |   |      |
+|                                        |       |acc,none       |0.85                                   |   |      |
+|                                        |       |acc_stderr,none|0.02                                   |   |      |
+|mmlu_high_school_world_history          |      0|alias          |  - high_school_world_history          |   |      |
+|                                        |       |acc,none       |0.85                                   |   |      |
+|                                        |       |acc_stderr,none|0.02                                   |   |      |
+|mmlu_human_aging                        |      0|alias          |  - human_aging                        |   |      |
+|                                        |       |acc,none       |0.76                                   |   |      |
+|                                        |       |acc_stderr,none|0.03                                   |   |      |
+|mmlu_human_sexuality                    |      0|alias          |  - human_sexuality                    |   |      |
+|                                        |       |acc,none       |0.78                                   |   |      |
+|                                        |       |acc_stderr,none|0.04                                   |   |      |
+|mmlu_humanities                         |N/A    |alias          | - humanities                          |   |      |
+|                                        |       |acc,none       |0.63                                   |   |      |
+|                                        |       |acc_stderr,none|0.01                                   |   |      |
+|mmlu_international_law                  |      0|alias          |  - international_law                  |   |      |
+|                                        |       |acc,none       |0.79                                   |   |      |
+|                                        |       |acc_stderr,none|0.04                                   |   |      |
+|mmlu_jurisprudence                      |      0|alias          |  - jurisprudence                      |   |      |
+|                                        |       |acc,none       |0.79                                   |   |      |
+|                                        |       |acc_stderr,none|0.04                                   |   |      |
+|mmlu_logical_fallacies                  |      0|alias          |  - logical_fallacies                  |   |      |
+|                                        |       |acc,none       |0.80                                   |   |      |
+|                                        |       |acc_stderr,none|0.03                                   |   |      |
+|mmlu_machine_learning                   |      0|alias          |  - machine_learning                   |   |      |
+|                                        |       |acc,none       |0.52                                   |   |      |
+|                                        |       |acc_stderr,none|0.05                                   |   |      |
+|mmlu_management                         |      0|alias          |  - management                         |   |      |
+|                                        |       |acc,none       |0.83                                   |   |      |
+|                                        |       |acc_stderr,none|0.04                                   |   |      |
+|mmlu_marketing                          |      0|alias          |  - marketing                          |   |      |
+|                                        |       |acc,none       |0.89                                   |   |      |
+|                                        |       |acc_stderr,none|0.02                                   |   |      |
+|mmlu_medical_genetics                   |      0|alias          |  - medical_genetics                   |   |      |
+|                                        |       |acc,none       |0.78                                   |   |      |
+|                                        |       |acc_stderr,none|0.04                                   |   |      |
+|mmlu_miscellaneous                      |      0|alias          |  - miscellaneous                      |   |      |
+|                                        |       |acc,none       |0.85                                   |   |      |
+|                                        |       |acc_stderr,none|0.01                                   |   |      |
+|mmlu_moral_disputes                     |      0|alias          |  - moral_disputes                     |   |      |
+|                                        |       |acc,none       |0.75                                   |   |      |
+|                                        |       |acc_stderr,none|0.02                                   |   |      |
+|mmlu_moral_scenarios                    |      0|alias          |  - moral_scenarios                    |   |      |
+|                                        |       |acc,none       |0.48                                   |   |      |
+|                                        |       |acc_stderr,none|0.02                                   |   |      |
+|mmlu_nutrition                          |      0|alias          |  - nutrition                          |   |      |
+|                                        |       |acc,none       |0.77                                   |   |      |
+|                                        |       |acc_stderr,none|0.02                                   |   |      |
+|mmlu_other                              |N/A    |alias          | - other                               |   |      |
+|                                        |       |acc,none       |0.75                                   |   |      |
+|                                        |       |acc_stderr,none|0.01                                   |   |      |
+|mmlu_philosophy                         |      0|alias          |  - philosophy                         |   |      |
+|                                        |       |acc,none       |0.78                                   |   |      |
+|                                        |       |acc_stderr,none|0.02                                   |   |      |
+|mmlu_prehistory                         |      0|alias          |  - prehistory                         |   |      |
+|                                        |       |acc,none       |0.77                                   |   |      |
+|                                        |       |acc_stderr,none|0.02                                   |   |      |
+|mmlu_professional_accounting            |      0|alias          |  - professional_accounting            |   |      |
+|                                        |       |acc,none       |0.57                                   |   |      |
+|                                        |       |acc_stderr,none|0.03                                   |   |      |
+|mmlu_professional_law                   |      0|alias          |  - professional_law                   |   |      |
+|                                        |       |acc,none       |0.50                                   |   |      |
+|                                        |       |acc_stderr,none|0.01                                   |   |      |
+|mmlu_professional_medicine              |      0|alias          |  - professional_medicine              |   |      |
+|                                        |       |acc,none       |0.71                                   |   |      |
+|                                        |       |acc_stderr,none|0.03                                   |   |      |
+|mmlu_professional_psychology            |      0|alias          |  - professional_psychology            |   |      |
+|                                        |       |acc,none       |0.73                                   |   |      |
+|                                        |       |acc_stderr,none|0.02                                   |   |      |
+|mmlu_public_relations                   |      0|alias          |  - public_relations                   |   |      |
+|                                        |       |acc,none       |0.76                                   |   |      |
+|                                        |       |acc_stderr,none|0.04                                   |   |      |
+|mmlu_security_studies                   |      0|alias          |  - security_studies                   |   |      |
+|                                        |       |acc,none       |0.78                                   |   |      |
+|                                        |       |acc_stderr,none|0.03                                   |   |      |
+|mmlu_social_sciences                    |N/A    |alias          | - social_sciences                     |   |      |
+|                                        |       |acc,none       |0.81                                   |   |      |
+|                                        |       |acc_stderr,none|0.01                                   |   |      |
+|mmlu_sociology                          |      0|alias          |  - sociology                          |   |      |
+|                                        |       |acc,none       |0.86                                   |   |      |
+|                                        |       |acc_stderr,none|0.02                                   |   |      |
+|mmlu_stem                               |N/A    |alias          | - stem                                |   |      |
+|                                        |       |acc,none       |0.65                                   |   |      |
+|                                        |       |acc_stderr,none|0.01                                   |   |      |
+|mmlu_us_foreign_policy                  |      0|alias          |  - us_foreign_policy                  |   |      |
+|                                        |       |acc,none       |0.92                                   |   |      |
+|                                        |       |acc_stderr,none|0.03                                   |   |      |
+|mmlu_virology                           |      0|alias          |  - virology                           |   |      |
+|                                        |       |acc,none       |0.58                                   |   |      |
+|                                        |       |acc_stderr,none|0.04                                   |   |      |
+|mmlu_world_religions                    |      0|alias          |  - world_religions                    |   |      |
+|                                        |       |acc,none       |0.82                                   |   |      |
+|                                        |       |acc_stderr,none|0.03                                   |   |      |
+Average: 69.95%
+### TruthfulQA
+|     Task     |Version|        Metric         |      Value      |   |Stderr|
+|--------------|-------|-----------------------|-----------------|---|------|
+|truthfulqa    |N/A    |bleu_acc,none          |             0.45|   |      |
+|              |       |bleu_acc_stderr,none   |             0.02|   |      |
+|              |       |rouge1_acc,none        |             0.45|   |      |
+|              |       |rouge1_acc_stderr,none |             0.02|   |      |
+|              |       |rouge2_diff,none       |             0.92|   |      |
+|              |       |rouge2_diff_stderr,none|             1.07|   |      |
+|              |       |bleu_max,none          |            23.77|   |      |
+|              |       |bleu_max_stderr,none   |             0.81|   |      |
+|              |       |rouge2_acc,none        |             0.38|   |      |
+|              |       |rouge2_acc_stderr,none |             0.02|   |      |
+|              |       |acc,none               |             0.41|   |      |
+|              |       |acc_stderr,none        |             0.01|   |      |
+|              |       |rougeL_diff,none       |             1.57|   |      |
+|              |       |rougeL_diff_stderr,none|             0.93|   |      |
+|              |       |rougeL_acc,none        |             0.46|   |      |
+|              |       |rougeL_acc_stderr,none |             0.02|   |      |
+|              |       |bleu_diff,none         |             1.38|   |      |
+|              |       |bleu_diff_stderr,none  |             0.75|   |      |
+|              |       |rouge2_max,none        |            33.01|   |      |
+|              |       |rouge2_max_stderr,none |             1.05|   |      |
+|              |       |rouge1_diff,none       |             1.72|   |      |
+|              |       |rouge1_diff_stderr,none|             0.92|   |      |
+|              |       |rougeL_max,none        |            45.25|   |      |
+|              |       |rougeL_max_stderr,none |             0.92|   |      |
+|              |       |rouge1_max,none        |            48.29|   |      |
+|              |       |rouge1_max_stderr,none |             0.90|   |      |
+|              |       |alias                  |truthfulqa       |   |      |
+|truthfulqa_gen|      3|bleu_max,none          |            23.77|   |      |
+|              |       |bleu_max_stderr,none   |             0.81|   |      |
+|              |       |bleu_acc,none          |             0.45|   |      |
+|              |       |bleu_acc_stderr,none   |             0.02|   |      |
+|              |       |bleu_diff,none         |             1.38|   |      |
+|              |       |bleu_diff_stderr,none  |             0.75|   |      |
+|              |       |rouge1_max,none        |            48.29|   |      |
+|              |       |rouge1_max_stderr,none |             0.90|   |      |
+|              |       |rouge1_acc,none        |             0.45|   |      |
+|              |       |rouge1_acc_stderr,none |             0.02|   |      |
+|              |       |rouge1_diff,none       |             1.72|   |      |
+|              |       |rouge1_diff_stderr,none|             0.92|   |      |
+|              |       |rouge2_max,none        |            33.01|   |      |
+|              |       |rouge2_max_stderr,none |             1.05|   |      |
+|              |       |rouge2_acc,none        |             0.38|   |      |
+|              |       |rouge2_acc_stderr,none |             0.02|   |      |
+|              |       |rouge2_diff,none       |             0.92|   |      |
+|              |       |rouge2_diff_stderr,none|             1.07|   |      |
+|              |       |rougeL_max,none        |            45.25|   |      |
+|              |       |rougeL_max_stderr,none |             0.92|   |      |
+|              |       |rougeL_acc,none        |             0.46|   |      |
+|              |       |rougeL_acc_stderr,none |             0.02|   |      |
+|              |       |rougeL_diff,none       |             1.57|   |      |
+|              |       |rougeL_diff_stderr,none|             0.93|   |      |
+|              |       |alias                  | - truthfulqa_gen|   |      |
+|truthfulqa_mc1|      2|acc,none               |             0.33|   |      |
+|              |       |acc_stderr,none        |             0.02|   |      |
+|              |       |alias                  | - truthfulqa_mc1|   |      |
+|truthfulqa_mc2|      2|acc,none               |             0.49|   |      |
+|              |       |acc_stderr,none        |             0.02|   |      |
+|              |       |alias                  | - truthfulqa_mc2|   |      |
+Average: 48.59%
+### Winogrande
+|   Task   |Version|    Metric     |  Value   |   |Stderr|
+|----------|------:|---------------|----------|---|------|
+|winogrande|      1|acc,none       |      0.77|   |      |
+|          |       |acc_stderr,none|      0.01|   |      |
+|          |       |alias          |winogrande|   |      |
+Average: 77.35%
+### GSM8K
+|Task |Version|              Metric               |Value|   |Stderr|
+|-----|------:|-----------------------------------|-----|---|------|
+|gsm8k|      3|exact_match,strict-match           | 0.67|   |      |
+|     |       |exact_match_stderr,strict-match    | 0.01|   |      |
+|     |       |exact_match,flexible-extract       | 0.68|   |      |
+|     |       |exact_match_stderr,flexible-extract| 0.01|   |      |
+|     |       |alias                              |gsm8k|   |      |
+Average: 67.48%
+Average score: 67.48%
+```
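The result tables above share one pipe-delimited layout: the task name appears only on the first row of each group, and every following row carries a single metric and its value. Below is a minimal sketch of reading such rows into a nested dictionary. `parse_results_table` is a hypothetical helper name, not part of any evaluation tool, and it assumes the column order Task, Version, Metric, Value used in these tables.

```python
from collections import defaultdict


def parse_results_table(text: str) -> dict:
    """Parse pipe-delimited result rows into {task: {metric: value}}.

    Assumes the column order Task | Version | Metric | Value (as in the
    tables above). Values are kept as strings because "alias" rows hold
    names rather than numbers.
    """
    results = defaultdict(dict)
    current_task = None
    for raw in text.splitlines():
        # Drop a leading diff marker ("+") if the text was copied from a diff view.
        line = raw.strip().lstrip("+").strip()
        if not line.startswith("|"):
            continue  # headings, "Average:" lines, code fences
        cells = [c.strip() for c in line.split("|")[1:-1]]
        if len(cells) < 4:
            continue
        if cells[0] == "Task":
            continue  # header row
        if all(set(c) <= set("-:") for c in cells):
            continue  # separator row such as |-----|------:|
        task, _version, metric, value = cells[:4]
        if task:
            current_task = task  # a non-empty first cell starts a new group
        if current_task is None:
            continue  # continuation row with no preceding task row
        results[current_task][metric] = value
    return dict(results)


if __name__ == "__main__":
    sample = """\
+|   Task   |Version|    Metric     |  Value   |   |Stderr|
+|----------|------:|---------------|----------|---|------|
+|winogrande|      1|acc,none       |      0.77|   |      |
+|          |       |acc_stderr,none|      0.01|   |      |
+|          |       |alias          |winogrande|   |      |
+Average: 77.35%
"""
    for task, metrics in parse_results_table(sample).items():
        print(task, metrics)
```

Rows whose first cell is empty are attached to the most recent task, which matches how the continuation rows are laid out above. Convert individual values with `float(...)` only for metrics known to be numeric.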