mlabonne committed on
Commit 6afe488 • 1 Parent(s): 71cf7ae

Update README.md

Files changed (1): README.md (+20 -8)
README.md CHANGED
@@ -12,9 +12,9 @@ tags:
 
 ![](https://i.imgur.com/89ZAKcn.png)
 
-# NeuralBeagle14-7B
+# 🐶 NeuralBeagle14-7B
 
-**Update 01/16/24: NeuralBeagle14-7B is probably the best 7B model you can find. 🎉**
+**Update 01/16/24: NeuralBeagle14-7B is (probably) the best 7B model you can find! 🎉**
 
 NeuralBeagle14-7B is a DPO fine-tune of [mlabonne/Beagle14-7B](https://huggingface.co/mlabonne/Beagle14-7B) using the [argilla/distilabel-intel-orca-dpo-pairs](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs) preference dataset and my DPO notebook from [this article](https://towardsdatascience.com/fine-tune-a-mistral-7b-model-with-direct-preference-optimization-708042745aac).
 
@@ -22,8 +22,26 @@ Thanks [Argilla](https://huggingface.co/argilla) for providing the dataset and t
 
 You can try it out in this [Space](https://huggingface.co/spaces/mlabonne/NeuralBeagle14-7B-GGUF-Chat) (GGUF Q4_K_M).
 
+## ⚡ Quantized models
+
+* **GGUF**: https://huggingface.co/mlabonne/NeuralBeagle14-7B-GGUF
+
 ## 🏆 Evaluation
 
+### Open LLM Leaderboard
+
+NeuralBeagle14-7B ranks first on the Open LLM Leaderboard in the ~7B category.
+
+![](https://i.imgur.com/4nAzJsr.png)
+
+It has the same average score as Beagle14-7B ("Show merges"), which might be due to an unlucky run.
+I think I might be overexploiting argilla/distilabel-intel-orca-dpo-pairs at this point, since this dataset or its original version is present in multiple models.
+I need to find more high-quality preference data for the next DPO merge.
+
+Note that some models, like udkai/Turdus and nfaheem/Marcoroni-7b-DPO-Merge, are unfortunately contaminated on purpose (see the very high Winogrande scores).
+
+### Nous
+
 The evaluation was performed using [LLM AutoEval](https://github.com/mlabonne/llm-autoeval) on the Nous suite. It is the best 7B model to date.
 
 | Model | Average | AGIEval | GPT4All | TruthfulQA | Bigbench |
@@ -38,12 +56,6 @@ The evaluation was performed using [LLM AutoEval](https://github.com/mlabonne/ll
 
 You can find the complete benchmark on [YALL - Yet Another LLM Leaderboard](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard).
 
-It's also on top of the Open LLM Leaderboard:
-
-![](https://i.imgur.com/62gUTFn.png)
-
-Compared to Beagle14, there's no improvement in this benchmark. This might be due to an unlucky run, but I think I might be overexploiting argilla/distilabel-intel-orca-dpo-pairs at this point. Another preference dataset could improve it even further. Note that the Beagle models perform better than Turdus, which is purposely contaminated on Winogrande (very high score).
-
 ## 💻 Usage
 
 ```python
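# The usage snippet is truncated in this diff. As a minimal, hypothetical sketch
# (assuming a ChatML-style chat template, which the Beagle models are commonly
# served with -- check the model's tokenizer_config to confirm), the prompt that
# tokenizer.apply_chat_template would produce can be built by hand like this:
def build_chatml_prompt(messages):
    """Format chat messages with ChatML markers (<|im_start|>/<|im_end|>)."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    parts.append("<|im_start|>assistant\n")  # generation cue for the model
    return "".join(parts)

prompt = build_chatml_prompt([{"role": "user", "content": "What is a merge of LLMs?"}])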