YipengZhang committed: Update README.md

README.md CHANGED
@@ -39,4 +39,13 @@ The primary intended users of the model are researchers and hobbyists in compute
## Training dataset
- JBU Pretrain: MS-COCO-Stuff 2017
- Pretrain: LLaVA-Pretrain 558K (filtered image-text pairs from LAION/CC/SBU, captioned by BLIP)
- SFT: 858k-mixed dataset at https://huggingface.co/datasets/YipengZhang/LLaVA-UHD-v2-SFT-Data (see the download sketch below)

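A minimal, untested sketch of fetching the SFT mixture locally with `huggingface_hub`; only the dataset repo id comes from this card, the function choice and local handling are assumptions:

```python
# Sketch: download the LLaVA-UHD-v2 SFT data repo from the Hugging Face Hub.
# Assumes `pip install huggingface_hub`; only the repo id is taken from this card.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="YipengZhang/LLaVA-UHD-v2-SFT-Data",
    repo_type="dataset",  # it is a dataset repo, not a model repo
)
print(f"SFT data downloaded to: {local_dir}")
```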
## Citation
If you find LLaVA-UHD v2 useful for your research and applications, please cite using this BibTeX:

@article{zhang2024llavauhdv2,
  title={LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer},
  author={Yipeng Zhang and Yifan Liu and Zonghao Guo and Yidan Zhang and Xuesong Yang and Chi Chen and Jun Song and Bo Zheng and Yuan Yao and Zhiyuan Liu and Tat-Seng Chua and Maosong Sun},
  journal={arXiv preprint arXiv:2412.13871},
  year={2024}
}