	---
	license: apache-2.0
	datasets:
	- openbmb/RLAIF-V-Dataset
	language:
	- en
	paper:
	---

	# Model Card for RLAIF-V

	[GitHub](https://github.com/RLHF-V/RLAIF-V) | [Paper](https://arxiv.org/abs/2405.17220)

	RLAIF-V-12B is a multimodal large language model (MLLM) whose trustworthiness surpasses that of GPT-4V. The model is built on OmniLMM from the [MiniCPM-V](https://github.com/OpenBMB/MiniCPM-V) series.

	We use a novel framework, [RLAIF-V](https://github.com/RLHF-V/RLAIF-V), which aligns MLLMs in a fully open-source paradigm. The framework makes the most of [open-source feedback](https://huggingface.co/datasets/HaoyeZhang/RLAIF-V-Dataset) in two ways: high-quality feedback data and an online feedback learning algorithm.

	<p align="center">
	<img src="https://cdn-uploads.huggingface.co/production/uploads/6566e0c493e30c8a60048eb3/T4hALrgNdXKHnkvb-27bA.png" alt="fig1" width="85%"/>
	</p>
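
This card does not state the training objective behind the feedback learning step. As an illustration only, the sketch below assumes a DPO-style pairwise preference loss, where the model is rewarded for raising the likelihood of the preferred response relative to a frozen reference model. The function names, the `beta` default, and the choice of DPO itself are assumptions for this example, not details taken from the card.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value, chosen only for illustration
) -> float:
    """DPO-style loss for one (chosen, rejected) response pair.

    Each argument is the summed log-probability of a full response
    under either the policy being trained or the frozen reference model.
    """
    # Implicit rewards: how far the policy has moved from the reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # The loss shrinks as the chosen response is favored over the rejected one.
    return -log_sigmoid(chosen_reward - rejected_reward)
```

When the policy matches the reference on both responses, the loss equals log 2. It falls below that once the policy favors the preferred response more strongly than the reference does.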

	## Model Details

	### Key Features

	* 🏅 Super GPT-4V Trustworthiness: By learning from open-source AI feedback, RLAIF-V-12B surpasses GPT-4V in trustworthiness on both generative and discriminative tasks.
	* 💪 Good Performance on General Abilities: RLAIF-V-12B also performs well on benchmarks that test general abilities (e.g. LLaVA Bench, MMStar).

	<p align="center">
	<img src="https://cdn-uploads.huggingface.co/production/uploads/6566e0c493e30c8a60048eb3/ypXZxb4HE-jDPJU9115bi.png" alt="fig1" width="90%"/>
	</p>

	### Examples
	<p align="center">
	<img src="https://cdn-uploads.huggingface.co/production/uploads/6566e0c493e30c8a60048eb3/yg-Ksp9qi8AodURSmX769.png" alt="fig2-1" width="81%"/>
	<img src="https://cdn-uploads.huggingface.co/production/uploads/6566e0c493e30c8a60048eb3/NSEkeBmH99B44rX8GTZig.png" alt="fig2-1" width="80%"/>
	</p>

	### Model Description
	- Related model: [OmniLMM-12B](https://huggingface.co/openbmb/OmniLMM-12B)
	- Trained on data: [RLAIF-V-Dataset](https://huggingface.co/datasets/HaoyeZhang/RLAIF-V-Dataset)

	## Usage
	See the [GitHub](https://github.com/RLHF-V/RLAIF-V) repository for usage details.
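
Before running the code from the GitHub repository, the checkpoint files need to be available locally. A minimal sketch of one way to fetch them, assuming the `huggingface_hub` package is installed and that the repository id is `openbmb/RLAIF-V-12B`. The helper name `fetch_checkpoint` and the default target directory are hypothetical, and the inference entry point itself is not shown because this card does not document it.

```python
from pathlib import Path

# Assumed repository id for this model card.
REPO_ID = "openbmb/RLAIF-V-12B"


def fetch_checkpoint(repo_id: str = REPO_ID, local_dir: str = "./RLAIF-V-12B") -> Path:
    """Download every file in the model repository and return the local folder."""
    # Imported lazily so this module loads even if huggingface_hub is missing.
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=repo_id, local_dir=local_dir))


# Example call (downloads the model weights):
#   checkpoint_dir = fetch_checkpoint()
```

The returned path can then be passed to whichever loading script the GitHub repository provides.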



	## Citation

	If you find our model/code/paper helpful, please consider citing our papers 📝:

	```bibtex
	@article{yu2023rlhf,
	title={RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback},
	author={Yu, Tianyu and Yao, Yuan and Zhang, Haoye and He, Taiwen and Han, Yifeng and Cui, Ganqu and Hu, Jinyi and Liu, Zhiyuan and Zheng, Hai-Tao and Sun, Maosong and others},
	journal={arXiv preprint arXiv:2312.00849},
	year={2023}
	}

	@article{yu2024rlaifv,
	title={RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness},
	author={Yu, Tianyu and Zhang, Haoye and Yao, Yuan and Dang, Yunkai and Chen, Da and Lu, Xiaoman and Cui, Ganqu and He, Taiwen and Liu, Zhiyuan and Chua, Tat-Seng and Sun, Maosong},
	journal={arXiv preprint arXiv:2405.17220},
	year={2024},
	}
	```