---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- orpo
- trl
datasets:
- alvarobartt/dpo-mix-7k-simplified
base_model: mistralai/Mistral-7B-v0.1
pipeline_tag: text-generation
inference: false
---

## ORPO fine-tune of Mistral 7B v0.1 with DPO Mix 7K

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/60f0608166e5701b80ed3f02/hRyhnTySt-KQ0gnnoclSm.jpeg)

> Stable Diffusion XL: "A capybara, a killer whale, and a robot named Ultra being friends"

This is an ORPO fine-tune of [`mistralai/Mistral-7B-v0.1`](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the [`alvarobartt/dpo-mix-7k-simplified`](https://huggingface.co/datasets/alvarobartt/dpo-mix-7k-simplified) dataset.

⚠️ Note that the training code is still experimental, as the `ORPOTrainer` PR has not been merged yet; follow its progress at [🤗`trl` - `ORPOTrainer` PR](https://github.com/huggingface/trl/pull/1435).
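
The exact training script is not included in this card, so the following is only a minimal sketch of what the setup could look like, assuming the `ORPOTrainer` / `ORPOConfig` API from the PR linked above; the hyperparameters are illustrative, not the ones used to train this model.

```python
# Minimal ORPO fine-tuning sketch (API assumed from the `trl` ORPOTrainer PR);
# hyperparameters are illustrative, not the actual training recipe.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_id = "mistralai/Mistral-7B-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Mistral 7B ships without a pad token

# Preference dataset, expected to expose prompt / chosen / rejected columns
dataset = load_dataset("alvarobartt/dpo-mix-7k-simplified", split="train")

args = ORPOConfig(
    output_dir="mistral-7b-v0.1-orpo",
    beta=0.1,  # the lambda weight on the odds-ratio term in the ORPO paper
    max_length=1024,
    max_prompt_length=512,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-6,
    num_train_epochs=1,
    bf16=True,
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```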

## Reference

[ORPO: Monolithic Preference Optimization without Reference Model](https://huggingface.co/papers/2403.07691)
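
As a quick summary of the paper's objective (restated here for convenience): ORPO adds a weighted odds-ratio term to the standard supervised fine-tuning loss, rewarding the chosen response $y_w$ over the rejected one $y_l$ without requiring a separate reference model:

```latex
% ORPO objective: SFT loss plus a lambda-weighted odds-ratio loss
\mathcal{L}_{\mathrm{ORPO}}
  = \mathbb{E}_{(x,\, y_w,\, y_l)}\big[\, \mathcal{L}_{\mathrm{SFT}}
    + \lambda \cdot \mathcal{L}_{\mathrm{OR}} \,\big]

% The odds-ratio loss contrasts the chosen (y_w) and rejected (y_l) responses
\mathcal{L}_{\mathrm{OR}}
  = -\log \sigma\!\left( \log
    \frac{\mathrm{odds}_\theta(y_w \mid x)}{\mathrm{odds}_\theta(y_l \mid x)} \right),
\qquad
\mathrm{odds}_\theta(y \mid x)
  = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}
```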