---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- orpo
- trl
datasets:
- alvarobartt/dpo-mix-7k-simplified
base_model: mistralai/Mistral-7B-v0.1
pipeline_tag: text-generation
inference: false
---

## ORPO fine-tune of Mistral 7B v0.1 with DPO Mix 7K

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/60f0608166e5701b80ed3f02/hRyhnTySt-KQ0gnnoclSm.jpeg)

> Stable Diffusion XL: "A capybara, a killer whale, and a robot named Ultra being friends"

This is an ORPO fine-tune of [`mistralai/Mistral-7B-v0.1`](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the [`alvarobartt/dpo-mix-7k-simplified`](https://huggingface.co/datasets/alvarobartt/dpo-mix-7k-simplified) dataset.

⚠️ Note that the training code is still experimental, as the `ORPOTrainer` PR has not been merged yet; follow its progress at [🤗`trl` - `ORPOTrainer` PR](https://github.com/huggingface/trl/pull/1435).
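
The exact training script is not included in this card, so the following is only a minimal sketch of what the setup could look like, assuming the `ORPOTrainer` / `ORPOConfig` API from the PR linked above; the hyperparameters are illustrative, not the ones used to train this model.

```python
# Minimal ORPO fine-tuning sketch (API assumed from the `trl` ORPOTrainer PR);
# hyperparameters are illustrative, not the actual training recipe.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_id = "mistralai/Mistral-7B-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Mistral 7B ships without a pad token

# Preference dataset, expected to expose prompt / chosen / rejected columns
dataset = load_dataset("alvarobartt/dpo-mix-7k-simplified", split="train")

args = ORPOConfig(
    output_dir="mistral-7b-v0.1-orpo",
    beta=0.1,  # the lambda weight on the odds-ratio term in the ORPO paper
    max_length=1024,
    max_prompt_length=512,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    learning_rate=5e-6,
    num_train_epochs=1,
    bf16=True,
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```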

## Reference

[ORPO: Monolithic Preference Optimization without Reference Model](https://huggingface.co/papers/2403.07691)
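
As a quick summary of the paper's objective (restated here for convenience): ORPO adds a weighted odds-ratio term to the standard supervised fine-tuning loss, rewarding the chosen response $y_w$ over the rejected one $y_l$ without requiring a separate reference model:

```latex
% ORPO objective: SFT loss plus a lambda-weighted odds-ratio loss
\mathcal{L}_{\mathrm{ORPO}}
  = \mathbb{E}_{(x,\, y_w,\, y_l)}\big[\, \mathcal{L}_{\mathrm{SFT}}
    + \lambda \cdot \mathcal{L}_{\mathrm{OR}} \,\big]

% The odds-ratio loss contrasts the chosen (y_w) and rejected (y_l) responses
\mathcal{L}_{\mathrm{OR}}
  = -\log \sigma\!\left( \log
    \frac{\mathrm{odds}_\theta(y_w \mid x)}{\mathrm{odds}_\theta(y_l \mid x)} \right),
\qquad
\mathrm{odds}_\theta(y \mid x)
  = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}
```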