---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- orpo
- trl
datasets:
- alvarobartt/dpo-mix-7k-simplified
base_model: mistralai/Mistral-7B-v0.1
pipeline_tag: text-generation
inference: false
---
## ORPO fine-tune of Mistral 7B v0.1 with DPO Mix 7K
![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/60f0608166e5701b80ed3f02/hRyhnTySt-KQ0gnnoclSm.jpeg)
> Stable Diffusion XL "A capybara, a killer whale, and a robot named Ultra being friends"

This is an ORPO fine-tune of [`mistralai/Mistral-7B-v0.1`](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the
[`alvarobartt/dpo-mix-7k-simplified`](https://huggingface.co/datasets/alvarobartt/dpo-mix-7k-simplified) dataset.
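
As a quick sanity check, the model can be loaded for generation with `transformers`. The sketch below assumes this repository id (`alvarobartt/mistral-orpo-mix`) and that the tokenizer ships a chat template; adjust both if they differ.

```python
# Minimal inference sketch (assumes the repo id `alvarobartt/mistral-orpo-mix`
# and that the tokenizer bundles a chat template; adjust if either differs).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "alvarobartt/mistral-orpo-mix"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Format a single-turn conversation with the tokenizer's chat template
messages = [{"role": "user", "content": "What is ORPO in one sentence?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```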

⚠️ Note that the code is still experimental, as the `ORPOTrainer` PR has not been merged yet; follow its progress
at [🤗`trl` - `ORPOTrainer` PR](https://github.com/huggingface/trl/pull/1435).
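
For reference, below is a minimal sketch of what such a fine-tune could look like with the in-progress `ORPOTrainer`. Since the PR is not merged, the exact `ORPOConfig`/`ORPOTrainer` API may still change, and the hyperparameters shown are illustrative, not the ones used to train this model.

```python
# Illustrative ORPO training sketch; the `ORPOTrainer`/`ORPOConfig` API comes
# from the unmerged trl PR linked above and may change. Hyperparameters are
# examples only, not the ones used to train this model.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id)

# Assumes the dataset exposes `prompt`, `chosen`, and `rejected` columns
dataset = load_dataset("alvarobartt/dpo-mix-7k-simplified", split="train")

args = ORPOConfig(
    output_dir="mistral-orpo-mix",
    beta=0.1,  # weight of the odds-ratio term (lambda in the paper)
    max_length=1024,
    max_prompt_length=512,
    per_device_train_batch_size=2,
    learning_rate=5e-6,
    num_train_epochs=3,
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```
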
## Reference
[`ORPO: Monolithic Preference Optimization without Reference Model`](https://huggingface.co/papers/2403.07691)
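
For quick context, the objective from the paper combines the standard SFT loss with an odds-ratio penalty over the chosen/rejected pair (reproduced here from the paper's definitions):

$$
\mathcal{L}_{\text{ORPO}} = \mathbb{E}_{(x, y_w, y_l)}\left[\mathcal{L}_{\text{SFT}} + \lambda \cdot \mathcal{L}_{\text{OR}}\right]
$$

where

$$
\mathcal{L}_{\text{OR}} = -\log \sigma\left(\log \frac{\text{odds}_\theta(y_w \mid x)}{\text{odds}_\theta(y_l \mid x)}\right), \qquad \text{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}
$$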