multiple image support?

#7
by wamozart - opened

Hi.

Great work. Does it support multiple images as input as well?

Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University org

Not yet. Just one image.

Hello, if the question is still relevant, then this can be done.

https://github.com/THUDM/CogVLM/issues/143

It worked in the previous version, and it still works in this one, but I did not evaluate the quality.

@alextsgnv Were you able to do that? If so, can you share the code snippet?

@wamozart Yes, I did it, the code is in the link.
https://github.com/THUDM/CogVLM/issues/143

deleted
edited Jul 16

I'm glad I found this model as well, thank you so much for sharing. I don't think anyone has realized yet the GRAVITY of how powerful this vision model is!
Like @wamozart, I was looking for a batch image processing capability, in my case to caption my own dataset for finetuning a new SDXL model. So far I have used the GPT4V captioner by @JiayeV:
https://github.com/jiayev/GPT4V-Image-Captioner
It's really worth it for high-quality LoRA training, which I did (https://civitai.com/user/Ababiya), and it turned out really great! But it would be pricey to use for my 30k dataset to finetune a checkpoint, and that's where CogVLM2 comes in. Thank you @alextsgnv for the link, I'll look into it. And please make it support batch image processing like @JiayeV's captioner does, that would be a game changer.
I also did a small comparison between the captions from these two models, plus a result image. I used https://civitai.com/models/133005/juggernaut-xl to train on the dataset, along with a ComfyUI workflow. You be the judge.

gpt4-Vs-cogvlm2.jpg

Alphie_result.jpg
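
For anyone landing here for the batch captioning part: since the model takes one image per request for now, a plain loop over the dataset folder may already be enough. Below is a minimal sketch, not anything shipped with CogVLM2. It assumes a hypothetical `caption_one` function that you write yourself around whatever single-image inference call you already use, and it assumes you want one `.txt` caption file next to each image, which is the layout many LoRA trainers expect.

```python
from pathlib import Path
from typing import Callable, List

# Assumption: these are the image types in your dataset; adjust as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def find_images(folder: Path) -> List[Path]:
    """Return the image files directly inside `folder`, sorted by name."""
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def caption_folder(
    folder: Path,
    caption_one: Callable[[Path], str],  # hypothetical: your own single-image captioner
    overwrite: bool = False,
) -> int:
    """Caption images one at a time and write a sidecar .txt per image.

    Returns the number of caption files written. Images that already have
    a caption file are skipped unless `overwrite` is True, so an
    interrupted run can be resumed.
    """
    written = 0
    for image_path in find_images(folder):
        caption_path = image_path.with_suffix(".txt")
        if caption_path.exists() and not overwrite:
            continue
        caption = caption_one(image_path).strip()
        caption_path.write_text(caption + "\n", encoding="utf-8")
        written += 1
    return written
```

Usage would look like `caption_folder(Path("my_dataset"), my_cogvlm2_caption)`, where `my_cogvlm2_caption` is your own wrapper that loads the image and returns the model's answer as a string. Skipping existing caption files matters on a 30k dataset, because a crash halfway through does not force you to recaption everything.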
