We're excited to unveil Qwen2-VL, the latest iteration of our Qwen-VL model, representing nearly a year of innovation.
Qwen2-VL is available in three sizes, with 2, 7 and 72 billion parameters.
This repo contains the pretrained 7B base model, Qwen2-VL-7B, without instruction tuning.
For more information, visit our Blog and GitHub.
The code for Qwen2-VL is included in recent releases of Hugging Face transformers. We advise you to install the latest version with the command:
pip install -U transformers
Otherwise, you might encounter the following error:
KeyError: 'qwen2_vl'
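To tell ahead of time whether an installed transformers build recognizes the qwen2_vl model type, a small check along these lines may help. It is only a sketch: the helper names supports_model_type and load_known_model_types are hypothetical, and it assumes that CONFIG_MAPPING in transformers.models.auto.configuration_auto (an internal module that may move between releases) is the registry whose lookup raises the KeyError above.

```python
import importlib
from typing import Container, Optional

# Model type string taken from the KeyError message above.
MODEL_TYPE = "qwen2_vl"


def supports_model_type(known_types: Container[str], model_type: str = MODEL_TYPE) -> bool:
    """Return True if model_type is registered in known_types (hypothetical helper)."""
    return model_type in known_types


def load_known_model_types() -> Optional[Container[str]]:
    """Return the transformers config registry, or None if it cannot be found.

    Assumption: the registry lives at
    transformers.models.auto.configuration_auto.CONFIG_MAPPING.
    """
    try:
        module = importlib.import_module("transformers.models.auto.configuration_auto")
    except ImportError:
        return None
    return getattr(module, "CONFIG_MAPPING", None)


if __name__ == "__main__":
    known = load_known_model_types()
    if known is None:
        print("transformers is not installed; run: pip install -U transformers")
    elif supports_model_type(known):
        print("This transformers build recognizes 'qwen2_vl'.")
    else:
        print("'qwen2_vl' is not registered; run: pip install -U transformers")
```

If the check reports that the model type is missing, upgrading transformers as advised above should resolve it.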
If you find our work helpful, please consider citing it.
@article{Qwen2-VL,
title={Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution},
author={Peng Wang and Shuai Bai and Sinan Tan and Shijie Wang and Zhihao Fan and Jinze Bai and Keqin Chen and Xuejing Liu and Jialin Wang and Wenbin Ge and Yang Fan and Kai Dang and Mengfei Du and Xuancheng Ren and Rui Men and Dayiheng Liu and Chang Zhou and Jingren Zhou and Junyang Lin},
journal={arXiv preprint arXiv:2409.12191},
year={2024}
}
@article{Qwen-VL,
title={Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond},
author={Bai, Jinze and Bai, Shuai and Yang, Shusheng and Wang, Shijie and Tan, Sinan and Wang, Peng and Lin, Junyang and Zhou, Chang and Zhou, Jingren},
journal={arXiv preprint arXiv:2308.12966},
year={2023}
}