Update README.md
README.md (CHANGED)

````diff
@@ -117,6 +117,7 @@ print(text_outputs)
 - **Mid Stage:** A mixture of 4.7M high-quality synthetic data, 1 epoch, full model
 - **Final-Image Stage:** A mixture of 3.6M single-image data, 1 epoch, full model
 - **OneVision Stage:** A mixture of 1.6M single-image/multi-image/video data, 1 epoch, full model
+- **Critic / Preference Learning Stage:** 9.4k question-image input from [LLaVA-RLHF](https://llava-rlhf.github.io/) with self-generated responses, reward signal from [llava-critic-72b](https://huggingface.co/lmms-lab/llava-critic-72b), iterative DPO for 3 rounds, full model
 - **Precision:** bfloat16

 ## Hardware & Software
@@ -131,4 +132,14 @@
 @article{li2024llavaonevision,
   title={LLaVA-OneVision},
 }
+
+@article{xiong2024llavacritic,
+  title={LLaVA-Critic: Learning to Evaluate Multimodal Models},
+  author={Xiong, Tianyi and Wang, Xiyao and Guo, Dong and Ye, Qinghao and Fan, Haoqi and Gu, Quanquan and Huang, Heng and Li, Chunyuan},
+  year={2024},
+  eprint={2410.02712},
+  archivePrefix={arXiv},
+  primaryClass={cs.CV},
+  url={https://arxiv.org/abs/2410.02712},
+}
 ```
````
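The new Critic / Preference Learning stage runs iterative DPO, with preference labels coming from llava-critic-72b as the reward signal. As a rough illustration only (not the repository's actual training code, and with the `beta` value and log-probabilities chosen as toy assumptions), the per-pair DPO objective can be sketched in plain Python:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair (hypothetical helper, for illustration).

    logp_* are sequence log-probabilities under the policy being trained;
    ref_logp_* are the same sequences scored by the frozen reference policy.
    """
    # Implicit rewards: how much more the policy favors each response
    # than the reference does, scaled by beta.
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) written as softplus(-margin) for stability.
    return math.log1p(math.exp(-margin))

# Toy pair: the critic preferred the first response, and the policy already
# ranks it higher relative to the reference, so the loss is modest.
loss = dpo_loss(-10.0, -14.0, -12.0, -13.0)
```

In the iterative setup described above, each of the 3 rounds would regenerate responses, re-score them with the critic, and minimize this loss over the fresh preference pairs.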