YipengZhang committed: Update README.md

README.md CHANGED
@@ -39,4 +39,13 @@ The primary intended users of the model are researchers and hobbyists in compute
## Training dataset
- JBU Pretrain: MS-COCO-Stuff 2017
- Pretrain: LLaVA-Pretrain 558K (filtered image-text pairs from LAION/CC/SBU, captioned by BLIP)
- SFT: 858k-mixed dataset at https://huggingface.co/datasets/YipengZhang/LLaVA-UHD-v2-SFT-Data (see the download sketch below)

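A minimal, untested sketch of fetching the SFT mixture locally with `huggingface_hub`; only the dataset repo id comes from this card, the function choice and local handling are assumptions:

```python
# Sketch: download the LLaVA-UHD-v2 SFT data repo from the Hugging Face Hub.
# Assumes `pip install huggingface_hub`; only the repo id is taken from this card.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="YipengZhang/LLaVA-UHD-v2-SFT-Data",
    repo_type="dataset",  # it is a dataset repo, not a model repo
)
print(f"SFT data downloaded to: {local_dir}")
```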
## Citation
If you find LLaVA-UHD v2 useful for your research and applications, please cite using this BibTeX:

@article{zhang2024llavauhdv2,
  title={LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer},
  author={Yipeng Zhang and Yifan Liu and Zonghao Guo and Yidan Zhang and Xuesong Yang and Chi Chen and Jun Song and Bo Zheng and Yuan Yao and Zhiyuan Liu and Tat-Seng Chua and Maosong Sun},
  journal={arXiv preprint arXiv:2412.13871},
  year={2024}
}