How to get Image and Caption Embeddings

#2
by Harshad2410 - opened

I have an image and a caption associated with it. I want to get the cross-modal embedding of the image and text together as a single vector.

BridgeTower org

Hi,
From the output (BridgeTowerContrastiveOutput) of BridgeTowerForContrastiveLearning you can access cross modal embeddings using:

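from transformers import BridgeTowerProcessor, BridgeTowerForContrastiveLearning

processor = BridgeTowerProcessor.from_pretrained("BridgeTower/bridgetower-large-itm-mlm-itc")
model = BridgeTowerForContrastiveLearning.from_pretrained("BridgeTower/bridgetower-large-itm-mlm-itc")

# images: a list of PIL images; texts: a list of captions, one per image
inputs = processor(images, texts, padding=True, return_tensors="pt")
outputs = model(**inputs)

# one cross-modal vector per (image, caption) pair
cross_modal_embeddings = outputs.cross_embeds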
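
For a single image and caption, the steps above could be wrapped in a small helper along these lines. This is only a sketch: `get_cross_embedding`, `demo`, and the file name `example.jpg` are hypothetical names introduced for illustration, and it assumes `cross_embeds` comes back with one row per input pair, so the first row is the vector for the only pair passed in.

```python
def get_cross_embedding(model, processor, image, caption):
    """Return the cross-modal embedding for one (image, caption) pair.

    Hypothetical helper: `model` and `processor` are assumed to be the
    BridgeTower objects loaded as in the reply above.
    """
    # The processor works on batches, so wrap the single pair in lists.
    inputs = processor(images=[image], text=[caption],
                       padding=True, return_tensors="pt")
    outputs = model(**inputs)
    # Assumed layout: one row per pair; take the row for our only pair.
    return outputs.cross_embeds[0]


def demo():
    """Example usage; downloads the checkpoint on first run."""
    import torch
    from PIL import Image
    from transformers import (BridgeTowerForContrastiveLearning,
                              BridgeTowerProcessor)

    name = "BridgeTower/bridgetower-large-itm-mlm-itc"
    processor = BridgeTowerProcessor.from_pretrained(name)
    model = BridgeTowerForContrastiveLearning.from_pretrained(name)
    model.eval()

    # "example.jpg" is a placeholder path, not a file from this thread.
    image = Image.open("example.jpg").convert("RGB")
    with torch.no_grad():  # inference only, no gradients needed
        vector = get_cross_embedding(model, processor, image,
                                     "a caption describing the image")
    print(vector.shape)
```

Calling `demo()` runs the whole thing end to end. The helper itself takes the model and processor as arguments so they are loaded once and reused across many pairs.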

Hi shaoyent, I'm getting an error while running this model. Could you check once more whether it works?
