How to get Image and Caption Embeddings
#2, opened by Harshad2410
I have an image and a caption associated with it. I want to get the cross-modal embedding of the image and the text together, as a single vector.
Hi,
The output (BridgeTowerContrastiveOutput) of BridgeTowerForContrastiveLearning exposes the cross-modal embeddings as cross_embeds:
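from transformers import BridgeTowerForContrastiveLearning, BridgeTowerProcessor
# images: a list of PIL images; texts: the matching list of captions
processor = BridgeTowerProcessor.from_pretrained("BridgeTower/bridgetower-large-itm-mlm-itc")
model = BridgeTowerForContrastiveLearning.from_pretrained("BridgeTower/bridgetower-large-itm-mlm-itc")
inputs = processor(images, texts, padding=True, return_tensors="pt")
outputs = model(**inputs)
cross_modal_embeddings = outputs.cross_embeds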
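For the single image plus single caption case from the question, here is a minimal sketch of how the snippet above could be wrapped. The helper names (load_bridgetower, embed_pair, demo) are made up for illustration and are not part of transformers, and the blank white image is only a placeholder for your own picture. It also assumes cross_embeds holds one row per image-text pair, so row 0 is the vector for the only pair in the batch.

```python
def load_bridgetower(name="BridgeTower/bridgetower-large-itm-mlm-itc"):
    """Load the processor and model (hypothetical helper, not a transformers API)."""
    # Imported here so that embed_pair below has no hard dependency on transformers.
    from transformers import BridgeTowerForContrastiveLearning, BridgeTowerProcessor

    processor = BridgeTowerProcessor.from_pretrained(name)
    model = BridgeTowerForContrastiveLearning.from_pretrained(name)
    model.eval()  # inference only
    return processor, model


def embed_pair(model, processor, image, caption):
    """Return the cross-modal embedding for one image-caption pair."""
    # Wrap the single image and caption in lists so the batch has exactly one pair.
    inputs = processor(images=[image], text=[caption], padding=True, return_tensors="pt")
    outputs = model(**inputs)
    # Assumption: cross_embeds has one row per pair; take the only row.
    return outputs.cross_embeds[0]


def demo():
    """Run the helpers end to end (downloads the checkpoint on first use)."""
    import torch
    from PIL import Image

    processor, model = load_bridgetower()
    image = Image.new("RGB", (288, 288), "white")  # placeholder; use your own image
    with torch.no_grad():  # no gradients needed for embedding extraction
        vector = embed_pair(model, processor, image, "a plain white square")
    print(vector.shape)
```

Calling demo() needs torch, transformers and Pillow installed. If it raises an error, posting the full traceback together with the installed transformers version makes it much easier to see what went wrong.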
Hi shaoyent, I'm getting an error while running this model. Could you check once more whether it works?