---
license: other
---
After the initial experiment with chronoboros-33B it was evident that the merge was too unpredictable to be useful. Testing the individual models made it clear that the bias should be weighted towards Chronos.
This is the new release of the merge, with 75% chronos 33B and 25% airoboros-1.4 33B.
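For illustration, a 75/25 blend like this can be expressed as a simple linear average of the two models' parameters. The sketch below assumes a plain PyTorch/Transformers merge with hypothetical local model paths; it is not necessarily the exact tooling used to produce this release.

```python
# Minimal sketch of a 75/25 weighted parameter merge (assumption: a simple
# linear average of matching tensors; the actual merge procedure is not specified here).
import torch
from transformers import AutoModelForCausalLM

# Hypothetical local paths to the two source models.
chronos = AutoModelForCausalLM.from_pretrained("chronos-33b", torch_dtype=torch.float16)
airoboros = AutoModelForCausalLM.from_pretrained("airoboros-33b-1.4", torch_dtype=torch.float16)

air_sd = airoboros.state_dict()
merged_state = {}
for name, tensor in chronos.state_dict().items():
    # Weight each parameter 75% towards Chronos, 25% towards Airoboros.
    merged_state[name] = 0.75 * tensor + 0.25 * air_sd[name]

chronos.load_state_dict(merged_state)
chronos.save_pretrained("airochronos-33B")
```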
The model has been tested with the Alpaca prompting format, combined with KoboldAI Lite's instruct and chat modes, as well as regular story writing.
It has also been tested on basic reasoning tasks, but has not seen much testing for factual information.
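As a usage reference, the sketch below loads the model with Transformers and queries it using the Alpaca instruct format; the example instruction is illustrative and not part of the original card.

```python
# Minimal generation example using the Alpaca prompting format.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Henk717/airochronos-33B")
model = AutoModelForCausalLM.from_pretrained("Henk717/airochronos-33B", device_map="auto")

prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nWrite a short scene set on a stormy coastline.\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
# Print only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```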
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Henk717__airochronos-33B)
| Metric | Value |
|-----------------------|---------------------------|
| Avg. | 51.43 |
| ARC (25-shot) | 64.42 |
| HellaSwag (10-shot) | 85.21 |
| MMLU (5-shot) | 59.79 |
| TruthfulQA (0-shot) | 50.59 |
| Winogrande (5-shot) | 79.32 |
| GSM8K (5-shot) | 13.72 |
| DROP (3-shot) | 6.93 |