djinn is a merge of the following models using [LazyMergekit](https://colab.rese

* paulml/DPOB-INMTOB-7B
* mlabonne/AlphaMonarch-7B
# 🏆 Benchmarks

Nous benchmarks; find more [details here](https://gist.github.com/majacinka/3f2a797c8872ca9bfdaa2bbf3369edb5).

| Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
|---------------------------------------------------------------------|------:|------:|---------:|-------:|------:|
|[chatty-djinn-14B](https://huggingface.co/mayacinka/chatty-djinn-14B)| 38.43| 76.29| 68.02| 47.6| 57.59|

### AGIEval
| Task |Version| Metric |Value| |Stderr|
|------------------------------|------:|--------|----:|---|-----:|
|agieval_aqua_rat | 0|acc |23.62|± | 2.67|
| | |acc_norm|21.65|± | 2.59|
|agieval_logiqa_en | 0|acc |32.26|± | 1.83|
| | |acc_norm|33.79|± | 1.86|
|agieval_lsat_ar | 0|acc |23.04|± | 2.78|
| | |acc_norm|23.04|± | 2.78|
|agieval_lsat_lr | 0|acc |38.82|± | 2.16|
| | |acc_norm|39.22|± | 2.16|
|agieval_lsat_rc | 0|acc |59.48|± | 3.00|
| | |acc_norm|54.65|± | 3.04|
|agieval_sat_en | 0|acc |75.73|± | 2.99|
| | |acc_norm|74.27|± | 3.05|
|agieval_sat_en_without_passage| 0|acc |35.92|± | 3.35|
| | |acc_norm|34.47|± | 3.32|
|agieval_sat_math | 0|acc |31.36|± | 3.14|
| | |acc_norm|26.36|± | 2.98|

Average: 38.43%

### GPT4All
| Task |Version| Metric |Value| |Stderr|
|-------------|------:|--------|----:|---|-----:|
|arc_challenge| 0|acc |62.12|± | 1.42|
| | |acc_norm|65.44|± | 1.39|
|arc_easy | 0|acc |83.88|± | 0.75|
| | |acc_norm|78.58|± | 0.84|
|boolq | 1|acc |88.07|± | 0.57|
|hellaswag | 0|acc |65.18|± | 0.48|
| | |acc_norm|86.45|± | 0.34|
|openbookqa | 0|acc |39.60|± | 2.19|
| | |acc_norm|48.60|± | 2.24|
|piqa | 0|acc |82.26|± | 0.89|
| | |acc_norm|83.62|± | 0.86|
|winogrande | 0|acc |83.27|± | 1.05|

Average: 76.29%

### TruthfulQA
| Task |Version|Metric|Value| |Stderr|
|-------------|------:|------|----:|---|-----:|
|truthfulqa_mc| 1|mc1 |50.55|± | 1.75|
| | |mc2 |68.02|± | 1.52|

Average: 68.02%

### Bigbench
| Task |Version| Metric |Value| |Stderr|
|------------------------------------------------|------:|---------------------|----:|---|-----:|
|bigbench_causal_judgement | 0|multiple_choice_grade|57.89|± | 3.59|
|bigbench_date_understanding | 0|multiple_choice_grade|64.50|± | 2.49|
|bigbench_disambiguation_qa | 0|multiple_choice_grade|32.56|± | 2.92|
|bigbench_geometric_shapes | 0|multiple_choice_grade|26.18|± | 2.32|
| | |exact_str_match | 1.11|± | 0.55|
|bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|30.80|± | 2.07|
|bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|22.86|± | 1.59|
|bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|57.67|± | 2.86|
|bigbench_movie_recommendation | 0|multiple_choice_grade|62.00|± | 2.17|
|bigbench_navigate | 0|multiple_choice_grade|56.20|± | 1.57|
|bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|65.65|± | 1.06|
|bigbench_ruin_names | 0|multiple_choice_grade|64.73|± | 2.26|
|bigbench_salient_translation_error_detection | 0|multiple_choice_grade|17.33|± | 1.20|
|bigbench_snarks | 0|multiple_choice_grade|76.24|± | 3.17|
|bigbench_sports_understanding | 0|multiple_choice_grade|75.15|± | 1.38|
|bigbench_temporal_sequences | 0|multiple_choice_grade|48.90|± | 1.58|
|bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|22.32|± | 1.18|
|bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|18.17|± | 0.92|
|bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|57.67|± | 2.86|

Average: 47.6%

Average score: 57.59%
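As a sanity check, the overall score is the unweighted mean of the four suite averages from the summary table. A minimal sketch (the variable names are illustrative, not part of any evaluation tooling):

```python
# Recompute the reported overall score as the unweighted mean
# of the four per-suite averages listed above.
suite_scores = {
    "AGIEval": 38.43,
    "GPT4All": 76.29,
    "TruthfulQA": 68.02,
    "Bigbench": 47.6,
}

overall = sum(suite_scores.values()) / len(suite_scores)
print(f"Overall ≈ {overall:.1f}%")  # 57.6%, consistent with the reported 57.59
```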
## 🧩 Configuration

Inspired by [theprofessor's config](https://huggingface.co/abacusai/TheProfessor-155b)