Files changed (1)
  1. README.md +26 -1
README.md CHANGED
@@ -2,4 +2,29 @@
  library_name: transformers
  license: mit
  ---
- This repository contains the model release for the use of [CPO](https://arxiv.org/abs/2401.08417), please find more details in [our github](https://github.com/fe1ixxu/CPO_SIMPO)!
+ This repository contains the model release for [CPO](https://arxiv.org/abs/2401.08417). Please find more details in [our GitHub repository](https://github.com/fe1ixxu/CPO_SIMPO)!
+
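+ A minimal, illustrative way to load one of the released checkpoints with the `transformers` library (the model ID comes from the table below; the prompt and generation settings are just examples, not a recommended configuration):
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "haoranxu/Llama-3-Instruct-8B-CPO-SimPO"  # any checkpoint listed in the table below
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
+
+ # Llama-3-Instruct-based models expect the chat template.
+ messages = [{"role": "user", "content": "Translate into German: The weather is nice today."}]
+ input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
+ output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
+ print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
+ ```
+
+ If you find these models useful, please consider citing the CPO paper:
+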
+ ```
+ @inproceedings{xu2024contrastive,
+   title={Contrastive Preference Optimization: Pushing the Boundaries of {LLM} Performance in Machine Translation},
+   author={Haoran Xu and Amr Sharaf and Yunmo Chen and Weiting Tan and Lingfeng Shen and Benjamin Van Durme and Kenton Murray and Young Jin Kim},
+   booktitle={Forty-first International Conference on Machine Learning},
+   year={2024},
+   url={https://openreview.net/forum?id=51iwkioZpn}
+ }
+ ```
+
+ Here are the released models for CPO and SimPO. The code is based on the SimPO GitHub repository. We focus on highlighting reference-free preference learning and demonstrating the effectiveness of SimPO.
+ Additionally, we integrate length normalization and the target reward margin into CPO, showing promising results and the potential benefit of combining the two.
+ CPO adds a behavior-cloning (BC) regularizer to prevent the model from deviating too much from the preferred data distribution.
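+
+ As a rough sketch of how these pieces fit together (illustrative only, not the released training code; the hyperparameter names and values `beta`, `gamma`, and `bc_weight` are assumptions), the combined objective can be viewed as a length-normalized, margin-based preference loss plus a behavior-cloning NLL term on the preferred response:
+
+ ```python
+ import torch.nn.functional as F
+
+ def cpo_simpo_loss(chosen_logps, rejected_logps, chosen_lens, rejected_lens,
+                    beta=2.0, gamma=1.0, bc_weight=1.0):
+     """Illustrative CPO-SimPO-style objective (not the official implementation).
+
+     chosen_logps / rejected_logps: summed token log-probs of the preferred / dispreferred
+     responses under the policy; chosen_lens / rejected_lens: their lengths in tokens.
+     """
+     # SimPO-style term: length-normalized log-prob difference with a target reward margin gamma.
+     chosen_reward = beta * chosen_logps / chosen_lens
+     rejected_reward = beta * rejected_logps / rejected_lens
+     preference_loss = -F.logsigmoid(chosen_reward - rejected_reward - gamma)
+
+     # CPO-style BC regularizer: NLL of the preferred response, keeping the policy close to the
+     # preferred-data distribution (the exact weighting/normalization may differ in the released code).
+     bc_loss = -chosen_logps
+
+     return (preference_loss + bc_weight * bc_loss).mean()
+ ```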
+
+ | Model | Checkpoint | AE2 LC (%) | AE2 WR (%) |
+ |---------------------------------------|------------------------------------------------------------------------------------------------------------|:----------:|:----------:|
+ | Llama3 Instruct 8B SimPO (reported)   | [princeton-nlp/Llama-3-Instruct-8B-SimPO](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-SimPO)   | 44.7  | 40.5  |
+ | Llama3 Instruct 8B SimPO (reproduced) | [haoranxu/Llama-3-Instruct-8B-SimPO](https://huggingface.co/haoranxu/Llama-3-Instruct-8B-SimPO)             | 43.3  | 40.6  |
+ | Llama3 Instruct 8B CPO                | [haoranxu/Llama-3-Instruct-8B-CPO](https://huggingface.co/haoranxu/Llama-3-Instruct-8B-CPO)                 | 36.07 | 40.06 |
+ | Llama3 Instruct 8B CPO-SimPO          | [haoranxu/Llama-3-Instruct-8B-CPO-SimPO](https://huggingface.co/haoranxu/Llama-3-Instruct-8B-CPO-SimPO)     | 46.94 | 44.72 |
+
+ AE2 LC and AE2 WR denote the AlpacaEval 2 length-controlled and raw win rates, respectively.