Update README.md #3
by CombinHorizon · opened
README.md CHANGED

````diff
@@ -2,4 +2,29 @@
 library_name: transformers
 license: mit
 ---
-This repository contains the model release for the use of [CPO](https://arxiv.org/abs/2401.08417), please find more details in [our github](https://github.com/fe1ixxu/CPO_SIMPO)!
+This repository contains the model release for the use of [CPO](https://arxiv.org/abs/2401.08417); please find more details in [our GitHub repository](https://github.com/fe1ixxu/CPO_SIMPO)!
+
+```bibtex
+@inproceedings{xu2024contrastive,
+  title={Contrastive Preference Optimization: Pushing the Boundaries of {LLM} Performance in Machine Translation},
+  author={Haoran Xu and Amr Sharaf and Yunmo Chen and Weiting Tan and Lingfeng Shen and Benjamin Van Durme and Kenton Murray and Young Jin Kim},
+  booktitle={Forty-first International Conference on Machine Learning},
+  year={2024},
+  url={https://openreview.net/forum?id=51iwkioZpn}
+}
+```
+
+Here are the released models for CPO and SimPO. The code is based on the SimPO GitHub repository. We focus on highlighting reference-free preference learning and demonstrating the effectiveness of SimPO.
+Additionally, we integrate length normalization and the target reward margin into CPO, showing promising results and the potential benefit of combining the two.
+CPO adds a behavior-cloning (BC) regularizer to keep the model from deviating too far from the preferred data distribution.
+
+| Model | Checkpoint | AlpacaEval 2 LC (%) | AlpacaEval 2 WR (%) |
+|---------------------------------------|-----------------------------------------------------------------------------------------------------------|:------:|:------:|
+| Llama3 Instruct 8B SimPO (reported)   | [princeton-nlp/Llama-3-Instruct-8B-SimPO](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-SimPO)   | 44.7   | 40.5   |
+| Llama3 Instruct 8B SimPO (reproduced) | [haoranxu/Llama-3-Instruct-8B-SimPO](https://huggingface.co/haoranxu/Llama-3-Instruct-8B-SimPO)             | 43.3   | 40.6   |
+| Llama3 Instruct 8B CPO                | [haoranxu/Llama-3-Instruct-8B-CPO](https://huggingface.co/haoranxu/Llama-3-Instruct-8B-CPO)                 | 36.07  | 40.06  |
+| Llama3 Instruct 8B CPO-SimPO          | [haoranxu/Llama-3-Instruct-8B-CPO-SimPO](https://huggingface.co/haoranxu/Llama-3-Instruct-8B-CPO-SimPO)     | 46.94  | 44.72  |
````