---
license: other
---
After the initial experiment with chronoboros-33B it was evident that the merge was too unpredictable to be useful. Testing the individual models made it clear that the bias should be weighted towards Chronos.
This is the new release of the merge, with 75% chronos 33B and 25% airoboros-1.4 33B.
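For illustration, a 75/25 blend like this can be expressed as a simple linear average of the two models' parameters. The sketch below assumes a plain PyTorch/Transformers merge with hypothetical local model paths; it is not necessarily the exact tooling used to produce this release.

```python
# Minimal sketch of a 75/25 weighted parameter merge (assumption: a simple
# linear average of matching tensors; the actual merge procedure is not specified here).
import torch
from transformers import AutoModelForCausalLM

# Hypothetical local paths to the two source models.
chronos = AutoModelForCausalLM.from_pretrained("chronos-33b", torch_dtype=torch.float16)
airoboros = AutoModelForCausalLM.from_pretrained("airoboros-33b-1.4", torch_dtype=torch.float16)

air_sd = airoboros.state_dict()
merged_state = {}
for name, tensor in chronos.state_dict().items():
    # Weight each parameter 75% towards Chronos, 25% towards Airoboros.
    merged_state[name] = 0.75 * tensor + 0.25 * air_sd[name]

chronos.load_state_dict(merged_state)
chronos.save_pretrained("airochronos-33B")
```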
The model has been tested with the Alpaca prompting format, combined with KoboldAI Lite's instruct and chat modes, as well as regular story writing.
It has also been tested on basic reasoning tasks, but has not seen much testing for factual information.
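As a usage reference, the sketch below loads the model with Transformers and queries it using the Alpaca instruct format; the example instruction is illustrative and not part of the original card.

```python
# Minimal generation example using the Alpaca prompting format.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Henk717/airochronos-33B")
model = AutoModelForCausalLM.from_pretrained("Henk717/airochronos-33B", device_map="auto")

prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nWrite a short scene set on a stormy coastline.\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
# Print only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```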
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Henk717__airochronos-33B)
| Metric | Value |
|-----------------------|---------------------------|
| Avg. | 51.43 |
| ARC (25-shot) | 64.42 |
| HellaSwag (10-shot) | 85.21 |
| MMLU (5-shot) | 59.79 |
| TruthfulQA (0-shot) | 50.59 |
| Winogrande (5-shot) | 79.32 |
| GSM8K (5-shot) | 13.72 |
| DROP (3-shot) | 6.93 |