![](https://i.imgur.com/89ZAKcn.png)

# 🐶 NeuralBeagle14-7B

**Update 01/16/24: NeuralBeagle14-7B is (probably) the best 7B model you can find! 🎉**

NeuralBeagle14-7B is a DPO fine-tune of [mlabonne/Beagle14-7B](https://huggingface.co/mlabonne/Beagle14-7B) using the [argilla/distilabel-intel-orca-dpo-pairs](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs) preference dataset and my DPO notebook from [this article](https://towardsdatascience.com/fine-tune-a-mistral-7b-model-with-direct-preference-optimization-708042745aac).

You can try it out in this [Space](https://huggingface.co/spaces/mlabonne/NeuralBeagle14-7B-GGUF-Chat) (GGUF Q4_K_M).

## ⚡ Quantized models

* **GGUF**: https://huggingface.co/mlabonne/NeuralBeagle14-7B-GGUF

## 🏆 Evaluation

### Open LLM Leaderboard

NeuralBeagle14-7B ranks first on the Open LLM Leaderboard in the ~7B category.

![](https://i.imgur.com/4nAzJsr.png)

It has the same average score as Beagle14-7B ("Show merges"), which might be due to an unlucky run. I think I might be overexploiting argilla/distilabel-intel-orca-dpo-pairs at this point, since this dataset or its original version is present in multiple models. I need to find more high-quality preference data for the next DPO merge.

Note that some models, like udkai/Turdus and nfaheem/Marcoroni-7b-DPO-Merge, are unfortunately contaminated on purpose (see their very high Winogrande scores).

### Nous

The evaluation was performed using [LLM AutoEval](https://github.com/mlabonne/llm-autoeval) on the Nous suite. It is the best 7B model to date.

| Model | Average | AGIEval | GPT4All | TruthfulQA | Bigbench |
|---|---|---|---|---|---|

You can find the complete benchmark on [YALL - Yet Another LLM Leaderboard](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard).

## 💻 Usage
```python
# Illustrative sketch: run NeuralBeagle14-7B with 🤗 Transformers.
# The generation parameters below are placeholders; adjust to taste.
from transformers import AutoTokenizer, pipeline
import torch

model = "mlabonne/NeuralBeagle14-7B"
tokenizer = AutoTokenizer.from_pretrained(model)

# Format the conversation with the model's chat template
messages = [{"role": "user", "content": "What is a large language model?"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

generator = pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)
outputs = generator(
    prompt, max_new_tokens=256, do_sample=True,
    temperature=0.7, top_k=50, top_p=0.95,
)
print(outputs[0]["generated_text"])
```