multiple image support?
Hi.
Great work. Does it support multiple images as input as well?
Not yet. Just one image.
Hello, in case the question is still relevant: yes, this can be done.
https://github.com/THUDM/CogVLM/issues/143
It worked in the previous version, and it still works in this one, but I did not evaluate the quality.
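One generic workaround for a model that accepts only one image is to stitch several images into a single canvas and pass that in. This is an assumption on my part and not necessarily what the linked issue does. The sketch below computes only the canvas size and the paste offsets for a grid, in pure Python. The helper name `grid_layout` is hypothetical.

```python
from typing import List, Tuple

Size = Tuple[int, int]    # (width, height)
Offset = Tuple[int, int]  # (x, y) upper-left corner


def grid_layout(sizes: List[Size], columns: int) -> Tuple[Size, List[Offset]]:
    """Hypothetical helper: return (canvas_size, offsets) for pasting
    images into a grid with `columns` columns.

    Every cell is as large as the largest image, so nothing gets cropped;
    smaller images simply leave padding in their cell.
    """
    if columns < 1:
        raise ValueError("columns must be >= 1")
    if not sizes:
        return (0, 0), []
    cell_w = max(w for w, _ in sizes)
    cell_h = max(h for _, h in sizes)
    rows = -(-len(sizes) // columns)  # ceiling division
    # Fill the grid row by row, left to right.
    offsets = [((i % columns) * cell_w, (i // columns) * cell_h)
               for i in range(len(sizes))]
    canvas_size = (min(columns, len(sizes)) * cell_w, rows * cell_h)
    return canvas_size, offsets


if __name__ == "__main__":
    canvas_size, offsets = grid_layout([(100, 50), (80, 60), (120, 40)], columns=2)
    print(canvas_size, offsets)
```

With Pillow you would then create the canvas via `Image.new("RGB", canvas_size, "white")` and call `canvas.paste(img, offset)` for each image. Keep in mind that the model's preprocessing will likely resize the stitched image, so fine detail in each tile may be lost as more images are combined.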
@alextsgnv Were you able to do that? If so, can you share the code snippet?
@wamozart Yes, I did it; the code is in the link.
https://github.com/THUDM/CogVLM/issues/143
I'm glad I found this model as well. Thank you so much for sharing! I don't think anyone has realized yet the GRAVITY of how powerful this vision model is!!!
And, like @wamozart said, I was looking for a batch image processing capability for my own dataset to fine-tune a new SDXL model. I used the GPT4V captioner by @JiayeV:
https://github.com/jiayev/GPT4V-Image-Captioner
It's really worth it for high-quality LoRA training, which I did (https://civitai.com/user/Ababiya), and the results turned out really great! But it will be pricey for my 30k-image dataset to fine-tune a checkpoint, and that's where CogVLM2 comes in. Thank you @alextsgnv for the link; I'll look into it. And please MAKE IT SUPPORT batch image processing like @JiayeV's captioner does. That would be a game changer.
I also did a small comparison between these two models and their results. I used https://civitai.com/models/133005/juggernaut-xl to train on the dataset, along with a ComfyUI workflow. You be the judge.
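Until batch processing is built in, a captioning loop over a dataset folder can be sketched on the user side. This is a minimal sketch under assumptions. `caption_fn` is a hypothetical placeholder for whatever single-image model call you already have, such as a CogVLM2 inference wrapper, and is not an API of the model. The sketch also assumes the common convention of one same-named `.txt` caption file per image, which many LoRA training tools read, so adjust the extension for your trainer.

```python
from pathlib import Path
from typing import Callable

# Image types to caption; extend as needed.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}


def caption_folder(folder: str, caption_fn: Callable[[Path], str],
                   overwrite: bool = False) -> int:
    """Write <image-stem>.txt next to each image in `folder`.

    `caption_fn` is a hypothetical stand-in for the model call that turns
    one image path into caption text. Returns the number of files written.
    """
    written = 0
    # Materialize the listing first so newly written .txt files are not revisited.
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or path.suffix.lower() not in IMAGE_EXTS:
            continue
        out = path.with_suffix(".txt")
        if out.exists() and not overwrite:
            continue  # skip already-captioned images so a long run can resume
        out.write_text(caption_fn(path).strip() + "\n", encoding="utf-8")
        written += 1
    return written
```

For a 30k-image dataset, the skip-if-exists check lets an interrupted run resume without paying for images that are already captioned.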