---
license: other
---
|
After the initial experiment with chronoboros-33B, it was evident that the merge was too unpredictable to be useful. Testing the individual models made it clear that the bias should be weighted towards Chronos.
|
This is the new release of the merge, with 75% Chronos 33B and 25% Airoboros-1.4 33B.
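The card does not state which tool produced the merge, but a plain weighted average of the two checkpoints' parameters gives the idea. Below is a minimal sketch assuming simple linear blending; the upstream repo ids (`elinas/chronos-33b` and `jondurbin/airoboros-33b-gpt4-1.4`) are my best guesses for the source models and may differ from the exact checkpoints used.

```python
# Illustrative 75/25 linear merge sketch; not necessarily the tool actually used.
import torch
from transformers import AutoModelForCausalLM

# Assumed upstream checkpoints (both LLaMA-33B, same architecture and tokenizer).
chronos = AutoModelForCausalLM.from_pretrained("elinas/chronos-33b", torch_dtype=torch.float16)
airoboros = AutoModelForCausalLM.from_pretrained("jondurbin/airoboros-33b-gpt4-1.4", torch_dtype=torch.float16)

merged = chronos.state_dict()
for name, tensor in airoboros.state_dict().items():
    # Blend parameter by parameter: 75% Chronos, 25% Airoboros.
    merged[name] = 0.75 * merged[name] + 0.25 * tensor

chronos.load_state_dict(merged)
chronos.save_pretrained("airochronos-33B")
```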
|
|
|
The model has been tested with the Alpaca prompting format, combined with KoboldAI Lite's instruct and chat modes, as well as regular story writing. For reference, a small helper that produces the standard Alpaca template is shown below.
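This is the widely used Alpaca instruction template; minor variations of the preamble line exist between fine-tunes, so treat it as a reasonable default rather than the one exact format the model was trained on.

```python
def alpaca_prompt(instruction: str) -> str:
    # Standard Alpaca instruction template, as used by most Alpaca-style fine-tunes.
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )
```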
|
It has also been tested on basic reasoning tasks, but has not seen much testing for factual information.
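For anyone who wants to try the model directly with `transformers`, here is a minimal loading and generation sketch. The repo id is inferred from the leaderboard dataset link below; a 33B model in fp16 needs roughly 65 GB of memory, so quantized community builds may be more practical.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Henk717/airochronos-33B"  # inferred from the leaderboard dataset link
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Alpaca-formatted request (see the template helper above).
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nWrite a short story about a lighthouse keeper.\n\n"
    "### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```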
|
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Henk717__airochronos-33B)
|
|
|
| Metric               | Value |
|----------------------|-------|
| Avg.                 | 51.43 |
| ARC (25-shot)        | 64.42 |
| HellaSwag (10-shot)  | 85.21 |
| MMLU (5-shot)        | 59.79 |
| TruthfulQA (0-shot)  | 50.59 |
| Winogrande (5-shot)  | 79.32 |
| GSM8K (5-shot)       | 13.72 |
| DROP (3-shot)        | 6.93  |
|
|