Update README.md #3
by CombinHorizon · opened
README.md CHANGED

````diff
@@ -2,4 +2,29 @@
 library_name: transformers
 license: mit
 ---
-This repository contains the model release for the use of [CPO](https://arxiv.org/abs/2401.08417), please find more details in [our github](https://github.com/fe1ixxu/CPO_SIMPO)!
+This repository contains the model release for the use of [CPO](https://arxiv.org/abs/2401.08417); please find more details in [our GitHub repository](https://github.com/fe1ixxu/CPO_SIMPO)!
+
+```bibtex
+@inproceedings{xu2024contrastive,
+  title={Contrastive Preference Optimization: Pushing the Boundaries of {LLM} Performance in Machine Translation},
+  author={Haoran Xu and Amr Sharaf and Yunmo Chen and Weiting Tan and Lingfeng Shen and Benjamin Van Durme and Kenton Murray and Young Jin Kim},
+  booktitle={Forty-first International Conference on Machine Learning},
+  year={2024},
+  url={https://openreview.net/forum?id=51iwkioZpn}
+}
+```
+
+Here are the released models for CPO and SimPO. The code is based on the SimPO GitHub repository. We focus on highlighting reference-free preference learning and demonstrating the effectiveness of SimPO.
+Additionally, we integrate length normalization and the target reward margin into CPO, showing promising results and the potential benefit of combining the two.
+CPO adds a behavior-cloning (BC) regularizer to keep the model from deviating too far from the preferred data distribution.
+
+| Model | Checkpoint | AlpacaEval 2 LC (%) | AlpacaEval 2 WR (%) |
+|---------------------------------------|-----------------------------------------------------------------------------------------------------------|:------:|:------:|
+| Llama3 Instruct 8B SimPO (reported)   | [princeton-nlp/Llama-3-Instruct-8B-SimPO](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-SimPO)   | 44.7   | 40.5   |
+| Llama3 Instruct 8B SimPO (reproduced) | [haoranxu/Llama-3-Instruct-8B-SimPO](https://huggingface.co/haoranxu/Llama-3-Instruct-8B-SimPO)             | 43.3   | 40.6   |
+| Llama3 Instruct 8B CPO                | [haoranxu/Llama-3-Instruct-8B-CPO](https://huggingface.co/haoranxu/Llama-3-Instruct-8B-CPO)                 | 36.07  | 40.06  |
+| Llama3 Instruct 8B CPO-SimPO          | [haoranxu/Llama-3-Instruct-8B-CPO-SimPO](https://huggingface.co/haoranxu/Llama-3-Instruct-8B-CPO-SimPO)     | 46.94  | 44.72  |
````