![](https://i.imgur.com/89ZAKcn.png)

# 🐶 NeuralBeagle14-7B

**Update 01/16/24: NeuralBeagle14-7B is (probably) the best 7B model you can find! 🎉**

NeuralBeagle14-7B is a DPO fine-tune of [mlabonne/Beagle14-7B](https://huggingface.co/mlabonne/Beagle14-7B) using the [argilla/distilabel-intel-orca-dpo-pairs](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs) preference dataset and my DPO notebook from [this article](https://towardsdatascience.com/fine-tune-a-mistral-7b-model-with-direct-preference-optimization-708042745aac).

You can try it out in this [Space](https://huggingface.co/spaces/mlabonne/NeuralBeagle14-7B-GGUF-Chat) (GGUF Q4_K_M).

## ⚡ Quantized models

* **GGUF**: https://huggingface.co/mlabonne/NeuralBeagle14-7B-GGUF

## 🏆 Evaluation

### Open LLM Leaderboard

NeuralBeagle14-7B ranks first on the Open LLM Leaderboard in the ~7B category.

![](https://i.imgur.com/4nAzJsr.png)

It has the same average score as Beagle14-7B ("Show merges"), which might be due to an unlucky run. I think I might be overexploiting argilla/distilabel-intel-orca-dpo-pairs at this point, since this dataset or its original version is present in multiple models. I need to find more high-quality preference data for the next DPO merge.

Note that some models, like udkai/Turdus and nfaheem/Marcoroni-7b-DPO-Merge, are unfortunately contaminated on purpose (see their very high Winogrande scores).

### Nous

The evaluation was performed using [LLM AutoEval](https://github.com/mlabonne/llm-autoeval) on the Nous suite. It is the best 7B model to date.

| Model | Average | AGIEval | GPT4All | TruthfulQA | Bigbench |
|---|---|---|---|---|---|

You can find the complete benchmark on [YALL - Yet Another LLM Leaderboard](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard).

## 💻 Usage
```python
# Illustrative sketch: run NeuralBeagle14-7B with 🤗 Transformers.
# The generation parameters below are placeholders; adjust to taste.
from transformers import AutoTokenizer, pipeline
import torch

model = "mlabonne/NeuralBeagle14-7B"
tokenizer = AutoTokenizer.from_pretrained(model)

# Format the conversation with the model's chat template
messages = [{"role": "user", "content": "What is a large language model?"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

generator = pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)
outputs = generator(
    prompt, max_new_tokens=256, do_sample=True,
    temperature=0.7, top_k=50, top_p=0.95,
)
print(outputs[0]["generated_text"])
```