Use Molmo vision encoder for classification.
#20, opened by shafeeq007
I want to use Molmo's vision encoder to encode images and train a classification head on top. I have a few questions.
- How can I encode images in a batch, given that the processor creates multiple random crops of each input image depending on its resolution?
- What is the best way to combine/pool the embeddings of the crops of a single image before passing them to the classification head?
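To make both questions concrete, here is a minimal sketch of one possible approach, not something confirmed for Molmo: pad the per-image crop embeddings to a common crop count, then take a masked mean over the crops so every image ends up with one fixed-size vector. It assumes each image has already been encoded into a tensor of shape `(num_crops, dim)`. The random tensors below stand in for real encoder outputs, and `pool_crops`, the embedding size of 8, and the 4-class head are hypothetical names and values chosen only for illustration.

```python
import torch
from torch.nn.utils.rnn import pad_sequence


def pool_crops(crop_feats):
    """Masked mean-pool over crops.

    crop_feats: list of tensors, one per image, each of shape (num_crops_i, dim).
    Returns a tensor of shape (batch, dim).
    """
    # Number of real (non-padding) crops per image.
    lengths = torch.tensor([f.shape[0] for f in crop_feats],
                           device=crop_feats[0].device)
    # Zero-pad along the crop axis: (batch, max_crops, dim).
    padded = pad_sequence(crop_feats, batch_first=True)
    # True for real crops, False for padding: (batch, max_crops).
    positions = torch.arange(padded.shape[1], device=padded.device)
    mask = positions[None, :] < lengths[:, None]
    # Sum only the real crops, then divide by the real crop count.
    summed = (padded * mask[..., None]).sum(dim=1)
    return summed / lengths[:, None].to(padded.dtype)


# Stand-in data (assumption): two images with 3 and 5 crops, embedding size 8.
feats = [torch.randn(3, 8), torch.randn(5, 8)]
pooled = pool_crops(feats)          # shape (2, 8)

# Hypothetical classification head with 4 classes.
head = torch.nn.Linear(8, 4)
logits = head(pooled)               # shape (2, 4)
```

Mean pooling is only a baseline. Max pooling, or a small attention layer over the crops, would slot into the same padded-and-masked layout.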