NousResearch/Redmond-Hermes-Coder

@@ -68,6 +68,39 @@ The model is currently being uploaded in FP16 format, and there are plans to con
 ## Benchmark Results
 ```
 HumanEval: 39%
+|                      Task                      |Version|       Metric        |Value |   |Stderr|
+|------------------------------------------------|------:|---------------------|-----:|---|-----:|
+|arc_challenge                                   |      0|acc                  |0.2858|±  |0.0132|
+|                                                |       |acc_norm             |0.3148|±  |0.0136|
+|arc_easy                                        |      0|acc                  |0.5349|±  |0.0102|
+|                                                |       |acc_norm             |0.5097|±  |0.0103|
+|bigbench_causal_judgement                       |      0|multiple_choice_grade|0.5158|±  |0.0364|
+|bigbench_date_understanding                     |      0|multiple_choice_grade|0.5230|±  |0.0260|
+|bigbench_disambiguation_qa                      |      0|multiple_choice_grade|0.3295|±  |0.0293|
+|bigbench_geometric_shapes                       |      0|multiple_choice_grade|0.1003|±  |0.0159|
+|                                                |       |exact_str_match      |0.0000|±  |0.0000|
+|bigbench_logical_deduction_five_objects         |      0|multiple_choice_grade|0.2260|±  |0.0187|
+|bigbench_logical_deduction_seven_objects        |      0|multiple_choice_grade|0.1957|±  |0.0150|
+|bigbench_logical_deduction_three_objects        |      0|multiple_choice_grade|0.3733|±  |0.0280|
+|bigbench_movie_recommendation                   |      0|multiple_choice_grade|0.3200|±  |0.0209|
+|bigbench_navigate                               |      0|multiple_choice_grade|0.4830|±  |0.0158|
+|bigbench_reasoning_about_colored_objects        |      0|multiple_choice_grade|0.4150|±  |0.0110|
+|bigbench_ruin_names                             |      0|multiple_choice_grade|0.2143|±  |0.0194|
+|bigbench_salient_translation_error_detection    |      0|multiple_choice_grade|0.2926|±  |0.0144|
+|bigbench_snarks                                 |      0|multiple_choice_grade|0.5249|±  |0.0372|
+|bigbench_sports_understanding                   |      0|multiple_choice_grade|0.4817|±  |0.0159|
+|bigbench_temporal_sequences                     |      0|multiple_choice_grade|0.2700|±  |0.0140|
+|bigbench_tracking_shuffled_objects_five_objects |      0|multiple_choice_grade|0.1864|±  |0.0110|
+|bigbench_tracking_shuffled_objects_seven_objects|      0|multiple_choice_grade|0.1349|±  |0.0082|
+|bigbench_tracking_shuffled_objects_three_objects|      0|multiple_choice_grade|0.3733|±  |0.0280|
+|boolq                                           |      1|acc                  |0.5498|±  |0.0087|
+|hellaswag                                       |      0|acc                  |0.3814|±  |0.0048|
+|                                                |       |acc_norm             |0.4677|±  |0.0050|
+|openbookqa                                      |      0|acc                  |0.1960|±  |0.0178|
+|                                                |       |acc_norm             |0.3100|±  |0.0207|
+|piqa                                            |      0|acc                  |0.6600|±  |0.0111|
+|                                                |       |acc_norm             |0.6610|±  |0.0110|
+|winogrande                                      |      0|acc                  |0.5343|±  |0.0140|
 ```
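To help read the `Stderr` column, here is a minimal sketch that turns a reported value and its standard error into an approximate 95% confidence interval. It assumes `Stderr` is a standard error of the mean and that a normal approximation is acceptable; the helper name `confidence_interval` and the choice of rows are illustrative and not part of the original evaluation.

```python
# Hypothetical helper: approximate confidence interval from a value and its
# standard error, assuming a normal approximation (z = 1.96 for ~95%).
def confidence_interval(value, stderr, z=1.96):
    half_width = z * stderr
    return value - half_width, value + half_width


# A few (task, metric) rows copied from the table above: (Value, Stderr).
rows = {
    ("arc_challenge", "acc"): (0.2858, 0.0132),
    ("hellaswag", "acc_norm"): (0.4677, 0.0050),
    ("piqa", "acc"): (0.6600, 0.0111),
}

for (task, metric), (value, stderr) in rows.items():
    low, high = confidence_interval(value, stderr)
    print(f"{task} {metric}: {value:.4f} (approx. 95% CI {low:.4f} to {high:.4f})")
```

Two scores whose intervals overlap heavily should not be read as meaningfully different under these assumptions.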
 ## Model Usage