djinn is a merge of the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
* paulml/DPOB-INMTOB-7B
* mlabonne/AlphaMonarch-7B
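
For orientation, a minimal usage sketch (not part of the original card): it assumes the merge loads like any Transformers causal LM and that a chat template ships with the tokenizer; both are assumptions, not guarantees from the card.

```python
# Minimal usage sketch; assumes the standard transformers text-generation
# pipeline and a bundled chat template (assumptions, not card guarantees).
from transformers import AutoTokenizer
import transformers
import torch

model = "mayacinka/chatty-djinn-14B"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Build a prompt from a chat message, then sample a completion.
messages = [{"role": "user", "content": "What is a large language model?"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
outputs = pipeline(
    prompt, max_new_tokens=256, do_sample=True,
    temperature=0.7, top_k=50, top_p=0.95,
)
print(outputs[0]["generated_text"])
```
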
## 🏆 Benchmarks

Nous benchmarks; find more [details here](https://gist.github.com/majacinka/3f2a797c8872ca9bfdaa2bbf3369edb5).

| Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
|---------------------------------------------------------------------|------:|------:|---------:|-------:|------:|
|[chatty-djinn-14B](https://huggingface.co/mayacinka/chatty-djinn-14B)| 38.43| 76.29| 68.02| 47.6| 57.59|

### AGIEval
| Task |Version| Metric |Value| |Stderr|
|------------------------------|------:|--------|----:|---|-----:|
|agieval_aqua_rat | 0|acc |23.62|± | 2.67|
| | |acc_norm|21.65|± | 2.59|
|agieval_logiqa_en | 0|acc |32.26|± | 1.83|
| | |acc_norm|33.79|± | 1.86|
|agieval_lsat_ar | 0|acc |23.04|± | 2.78|
| | |acc_norm|23.04|± | 2.78|
|agieval_lsat_lr | 0|acc |38.82|± | 2.16|
| | |acc_norm|39.22|± | 2.16|
|agieval_lsat_rc | 0|acc |59.48|± | 3.00|
| | |acc_norm|54.65|± | 3.04|
|agieval_sat_en | 0|acc |75.73|± | 2.99|
| | |acc_norm|74.27|± | 3.05|
|agieval_sat_en_without_passage| 0|acc |35.92|± | 3.35|
| | |acc_norm|34.47|± | 3.32|
|agieval_sat_math | 0|acc |31.36|± | 3.14|
| | |acc_norm|26.36|± | 2.98|

Average: 38.43%

### GPT4All
| Task |Version| Metric |Value| |Stderr|
|-------------|------:|--------|----:|---|-----:|
|arc_challenge| 0|acc |62.12|± | 1.42|
| | |acc_norm|65.44|± | 1.39|
|arc_easy | 0|acc |83.88|± | 0.75|
| | |acc_norm|78.58|± | 0.84|
|boolq | 1|acc |88.07|± | 0.57|
|hellaswag | 0|acc |65.18|± | 0.48|
| | |acc_norm|86.45|± | 0.34|
|openbookqa | 0|acc |39.60|± | 2.19|
| | |acc_norm|48.60|± | 2.24|
|piqa | 0|acc |82.26|± | 0.89|
| | |acc_norm|83.62|± | 0.86|
|winogrande | 0|acc |83.27|± | 1.05|

Average: 76.29%

### TruthfulQA
| Task |Version|Metric|Value| |Stderr|
|-------------|------:|------|----:|---|-----:|
|truthfulqa_mc| 1|mc1 |50.55|± | 1.75|
| | |mc2 |68.02|± | 1.52|

Average: 68.02%

### Bigbench
| Task |Version| Metric |Value| |Stderr|
|------------------------------------------------|------:|---------------------|----:|---|-----:|
|bigbench_causal_judgement | 0|multiple_choice_grade|57.89|± | 3.59|
|bigbench_date_understanding | 0|multiple_choice_grade|64.50|± | 2.49|
|bigbench_disambiguation_qa | 0|multiple_choice_grade|32.56|± | 2.92|
|bigbench_geometric_shapes | 0|multiple_choice_grade|26.18|± | 2.32|
| | |exact_str_match | 1.11|± | 0.55|
|bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|30.80|± | 2.07|
|bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|22.86|± | 1.59|
|bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|57.67|± | 2.86|
|bigbench_movie_recommendation | 0|multiple_choice_grade|62.00|± | 2.17|
|bigbench_navigate | 0|multiple_choice_grade|56.20|± | 1.57|
|bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|65.65|± | 1.06|
|bigbench_ruin_names | 0|multiple_choice_grade|64.73|± | 2.26|
|bigbench_salient_translation_error_detection | 0|multiple_choice_grade|17.33|± | 1.20|
|bigbench_snarks | 0|multiple_choice_grade|76.24|± | 3.17|
|bigbench_sports_understanding | 0|multiple_choice_grade|75.15|± | 1.38|
|bigbench_temporal_sequences | 0|multiple_choice_grade|48.90|± | 1.58|
|bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|22.32|± | 1.18|
|bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|18.17|± | 0.92|
|bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|57.67|± | 2.86|

Average: 47.6%

Average score: 57.59%
## 🧩 Configuration
Inspired by [TheProfessor's config](https://huggingface.co/abacusai/TheProfessor-155b).
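
The full YAML isn't reproduced in this excerpt. As an illustration only, a passthrough-style layer stack of the two 7B sources into a roughly 14B model might look like the sketch below; the layer ranges and merge method are assumptions, not the card's actual configuration.

```python
# Illustrative only: LazyMergekit consumes the merge configuration as a YAML
# string. The layer ranges and passthrough method below are assumptions in
# the spirit of TheProfessor-style layer stacking, not the actual config.
yaml_config = """
slices:
  - sources:
      - model: paulml/DPOB-INMTOB-7B
        layer_range: [0, 32]
  - sources:
      - model: mlabonne/AlphaMonarch-7B
        layer_range: [0, 32]
merge_method: passthrough
dtype: bfloat16
"""
```

Stacking two 32-layer 7B models this way is one common route to the ~14B parameter count in the model name.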