---
license: apache-2.0
datasets:
- argilla/ultrafeedback-binarized-preferences-cleaned
base_model:
- mistralai/Mistral-7B-v0.1
language:
- en
model-index:
- name: Mistral-ORPO-β
  results:
  - task:
      type: text-generation
    dataset:
      name: AlpacaEval 1
      type: AlpacaEval
    metrics:
    - name: Win Rate
      type: AlpacaEval 1.0
      value: 87.92%
    source:
      name: self-reported
      url: https://github.com/tatsu-lab/alpaca_eval
  - task:
      type: text-generation
    dataset:
      name: AlpacaEval 2
      type: AlpacaEval
    metrics:
    - name: Win Rate
      type: AlpacaEval 2.0
      value: 11.33%
    source:
      name: self-reported
      url: https://github.com/tatsu-lab/alpaca_eval
  - task:
      type: text-generation
    dataset:
      name: MT-Bench
      type: MT-Bench
    metrics:
    - name: Score
      type: MT-Bench
      value: 7.23
    source:
      name: self-reported
      url: https://github.com/lm-sys/FastChat/blob/main/fastchat/llm_judge/
pipeline_tag: text-generation
---
# **Mistral-ORPO-β (7B)**

**Mistral-ORPO** is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) trained with *odds ratio preference optimization (ORPO)*. With ORPO, the model learns preferences directly, without a separate supervised fine-tuning (SFT) warm-up phase. **Mistral-ORPO-β** is fine-tuned exclusively on the 61k instances of the cleaned version of UltraFeedback, [argilla/ultrafeedback-binarized-preferences-cleaned](https://huggingface.co/datasets/argilla/ultrafeedback-binarized-preferences-cleaned), by [Argilla](https://huggingface.co/argilla).

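As a rough sketch of the objective (our own notation, summarizing the ORPO formulation rather than quoting it), ORPO adds a log odds-ratio penalty over chosen/rejected pairs on top of the standard SFT loss:

$$
\mathcal{L}_{\text{ORPO}} = \mathbb{E}_{(x, y_w, y_l)}\big[\mathcal{L}_{\text{SFT}} + \lambda \cdot \mathcal{L}_{\text{OR}}\big], \qquad
\mathcal{L}_{\text{OR}} = -\log \sigma\!\left(\log \frac{\text{odds}_\theta(y_w \mid x)}{\text{odds}_\theta(y_l \mid x)}\right), \qquad
\text{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}
$$

where $\mathcal{L}_{\text{SFT}}$ is the negative log-likelihood on the chosen response $y_w$, $y_l$ is the rejected response, and $\lambda$ weights the odds-ratio term.
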
## Model Performance

|Model Name|Size|Align|MT-Bench|AlpacaEval 1.0|AlpacaEval 2.0|
|:--------|:--------------:|:--------------:|:-------------------:|:------------:|:------------:|
|**Mistral-<tt>ORPO</tt>-⍺**|7B|<tt>ORPO</tt>|7.23|87.92|11.33|
|**Mistral-<tt>ORPO</tt>-β**|7B|<tt>ORPO</tt>|7.32|91.41|12.20|
|Zephyr ($\beta$)|7B|DPO|7.34|90.60|10.99|
|TULU-2-DPO|13B|DPO|7.00|89.5|10.12|
|Llama-2-Chat|7B|RLHF|6.27|71.37|4.96|
|Llama-2-Chat|13B|RLHF|6.65|81.09|7.70|

## Chat Template

```
<|user|>
Hi! How are you doing?</s>
<|assistant|>
```
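
The tokenizer's chat template produces this format automatically. Below is a minimal inference sketch with 🤗 Transformers; the repository id is an assumption (this card does not state its Hub path), so substitute the actual id when loading.

```python
# Minimal inference sketch using the chat template above.
# NOTE: MODEL_ID is an assumed repository id, not confirmed by this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "kaist-ai/mistral-orpo-beta"  # assumption: replace with this repo's Hub id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Build the prompt in the <|user|> ... <|assistant|> format via the chat template.
messages = [{"role": "user", "content": "Hi! How are you doing?"}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # ask the template to append the assistant turn
    return_tensors="pt",
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```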