OpenGVLab/InternVL-Chat-V1-5 · I'm sorry, but I am unable to view or describe images as I am a text-based program.

May 13

Using the example in the model card, I am getting these outputs:

dynamic ViT batch size: 7
Please describe the picture in detail This picture is a promotional poster with Chinese text on it. The main colors of the poster are blue and white, and there is a large white letter "A" in the middle. The text on the poster includes "Grade A", "Grade A", "Grade A", "Grade A", "Grade A", "Grade A" [the same "Grade A" item keeps repeating until the output is cut off mid-item]
dynamic ViT batch size: 7
Please describe the picture in detail This picture is a promotional poster with Chinese text on it. The main colors of the poster are blue and white, and there is a large white letter "A" in the middle. The text on the poster includes "Grade A", "Grade A", "Grade A", "Grade A", "Grade A", "Grade A" [the same "Grade A" item keeps repeating until the output is cut off mid-item]
dynamic ViT batch size: 7
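Please write a poem based on the picture A poster in blue and white,
a capital letter A stands out.
Promotional messages are hidden within,
eye-catching and truly wonderful.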
dynamic ViT batch size: 12
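Describe these two pictures in detail I'm sorry, I cannot view or describe pictures. I am a language model and cannot process visual information.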
dynamic ViT batch size: 12
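What are the similarities and differences between these two pictures I'm sorry, I cannot view or describe pictures. I am a language model and cannot process visual information.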
dynamic ViT batch size: 12, image_counts: [7, 5]
Describe the image in detail.
I'm sorry, but I am unable to describe the image as I am a text-based AI and do not have the ability to view or analyze images.
Describe the image in detail.
I'm sorry, but I am unable to view or describe images as I am a text-based program.
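
For anyone comparing prompts or languages, here is a minimal sketch for sorting outputs like the ones above into the two failure modes seen here: the text-only refusal and the repetition loop. Everything in it is an assumption made for illustration: the helper names, the marker phrases (copied from the refusals in this thread, including the model's original Chinese wording) and the threshold of 10 repeats. It is not part of InternVL or its model card.

```python
# Hypothetical marker phrases, copied from the refusals quoted in this thread.
REFUSAL_MARKERS = (
    "unable to view or describe images",
    "unable to describe the image",
    "无法查看或描述图片",  # Simplified Chinese: "cannot view or describe pictures"
)


def is_text_only_refusal(response: str) -> bool:
    """Return True if the response contains one of the refusal phrases."""
    lowered = response.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)


def longest_repeat_run(response: str, sep: str = "、") -> int:
    """Length of the longest run of identical consecutive items.

    The response is split on `sep`, which defaults to the Chinese
    enumeration comma that separates the repeated items in the log above.
    """
    items = [item.strip() for item in response.split(sep) if item.strip()]
    longest = 0
    current = 0
    previous = None
    for item in items:
        current = current + 1 if item == previous else 1
        previous = item
        longest = max(longest, current)
    return longest


def classify(response: str, repeat_threshold: int = 10) -> str:
    """Label a response as 'refusal', 'repetition' or 'ok'.

    repeat_threshold is an arbitrary cut-off chosen for this sketch.
    """
    if is_text_only_refusal(response):
        return "refusal"
    if longest_repeat_run(response) >= repeat_threshold:
        return "repetition"
    return "ok"
```

Calling `classify(response)` on each string returned by the model gives a quick tally per prompt language. The substring check only catches the exact wordings listed, so other refusal phrasings would need to be added by hand.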

alextsgnv

May 21

Hello, I ran into the same problem when I tried the model. The only time I got anything other than this refusal was when I prompted in Traditional Chinese. In every other language, including Simplified Chinese, the model responded in a similar way. Please post here if you manage to get the model to respond correctly in other languages.

czczup

OpenGVLab org Jul 7

Thank you for your feedback. Because the V1.5 model was not trained on multi-image data, its performance on multi-image inputs is unstable. You might want to try our latest InternVL2 series models, which may offer improvements.

czczup changed discussion status to closed Jul 7