}
```
## Standard evaluations

cosmosage can be compared to OpenHermes-2.5-Mistral-7B using standard evaluation metrics.

| Test Category  | cosmosage_v2 | OpenHermes-2.5-Mistral-7B |
|----------------|--------------|---------------------------|
| Overall        | 0.595        | 0.632                     |
| ARC Challenge  | 0.565        | 0.613                     |
| Hellaswag      | 0.619        | 0.652                     |
| TruthfulQA:mc1 | 0.348        | 0.361                     |
| TruthfulQA:mc2 | 0.510        | 0.522                     |
| Winogrande     | 0.759        | 0.781                     |
| GSM8k          | 0.368        | 0.261                     |

cosmosage scores only slightly below OpenHermes-2.5-Mistral-7B on most metrics, indicating that the heavy specialization in cosmology has left its general-purpose abilities nearly unchanged. The exception is GSM8k, a collection of grade-school math problems, where cosmosage scores noticeably higher.
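As an illustration of the per-benchmark gap, the following sketch tabulates the score differences directly from the numbers above (the values are copied from the table, not recomputed by re-running the benchmarks):

```python
# Scores copied from the table above: (cosmosage_v2, OpenHermes-2.5-Mistral-7B).
scores = {
    "ARC Challenge":  (0.565, 0.613),
    "Hellaswag":      (0.619, 0.652),
    "TruthfulQA:mc1": (0.348, 0.361),
    "TruthfulQA:mc2": (0.510, 0.522),
    "Winogrande":     (0.759, 0.781),
    "GSM8k":          (0.368, 0.261),
}

# Print the delta (cosmosage minus OpenHermes) for each benchmark.
for task, (cosmo, hermes) in scores.items():
    print(f"{task:15s} {cosmo - hermes:+.3f}")
# GSM8k is the only benchmark with a positive delta (+0.107);
# on every other benchmark cosmosage trails by 0.05 or less.
```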
## Example output

**User:**