How to get Image and Caption Embeddings

by Harshad2410 - opened Sep 16

Sep 16

I have an image and a caption associated with it. I want to get the cross embeddings of both the image and the text as a single vector.

shaoyent

BridgeTower org Sep 23

Hi,
From the output (BridgeTowerContrastiveOutput) of BridgeTowerForContrastiveLearning, you can access the cross-modal embeddings through its cross_embeds field:

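from transformers import BridgeTowerProcessor, BridgeTowerForContrastiveLearning

processor = BridgeTowerProcessor.from_pretrained("BridgeTower/bridgetower-large-itm-mlm-itc")
model = BridgeTowerForContrastiveLearning.from_pretrained("BridgeTower/bridgetower-large-itm-mlm-itc")

# images: list of PIL images; texts: list of captions, one per image
inputs = processor(images, texts, padding=True, return_tensors="pt")
outputs = model(**inputs)

# one cross-modal vector per image-caption pair
cross_modal_embeddings = outputs.cross_embeds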
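If you want to compare image-caption pairs with each other (for example, to find which stored pair is closest to a new one), the per-pair vectors can be compared with cosine similarity. Below is a minimal sketch in NumPy, assuming you first convert the tensor with cross_modal_embeddings.detach().cpu().numpy() and that each row corresponds to one image-caption pair. The helper names cosine_similarity_matrix and most_similar are hypothetical (not part of transformers), and the toy vectors are made up for illustration.

```python
import numpy as np


def _normalize_rows(vectors: np.ndarray) -> np.ndarray:
    """Scale each row to unit length (guarding against all-zero rows)."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)


def cosine_similarity_matrix(vectors: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a (num_pairs, dim) array.

    Hypothetical helper: assumes each row is one image-caption embedding,
    e.g. cross_modal_embeddings.detach().cpu().numpy().
    """
    unit = _normalize_rows(vectors)
    return unit @ unit.T


def most_similar(query: np.ndarray, vectors: np.ndarray) -> int:
    """Index of the row in `vectors` with the highest cosine similarity to `query`."""
    unit = _normalize_rows(vectors)
    q = query / max(np.linalg.norm(query), 1e-12)
    return int(np.argmax(unit @ q))


if __name__ == "__main__":
    # Toy 4-dimensional vectors standing in for real cross_embeds rows.
    stored = np.array([
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.7, 0.7, 0.0, 0.0],
    ])
    query = np.array([0.9, 0.1, 0.0, 0.0])
    print(most_similar(query, stored))  # → 0
    print(cosine_similarity_matrix(stored).shape)  # → (3, 3)
```

The rows are re-normalized defensively, so the helpers also work on vectors that are not already unit length.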

Parth376

15 days ago

Hi shaoyent, I'm getting an error while running this model. Can you check once more whether it's working?
