Amazing model and very promising

by paulorodriguesjr - opened Jun 21

Jun 21

Hi guys, I'm just here to say: amazing model. It covers a lot of multimodal tasks.

I'm getting 0.07 ~ 0.14 ms inference time in CAPTION_TO_PHRASE_GROUNDING mode on an RTX 3080 10GB. I think edge devices could benefit from this model as well.
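For anyone who wants to compare numbers, here is a minimal sketch of how a timing like this could be taken. It is not the exact script behind the figures above. `benchmark` and `run_inference` are hypothetical names, and `run_inference` is only a CPU placeholder that you would swap for your own model call. One assumption worth stating: on a GPU the call should wait for the device to finish (for example with `torch.cuda.synchronize()` in PyTorch) before the timer stops, otherwise only the kernel launch is measured.

```python
import statistics
import time


def benchmark(fn, warmup=3, runs=20):
    """Call fn() repeatedly and return (min_ms, median_ms) over `runs` timed calls."""
    # Warm-up calls are not timed; they absorb one-off costs such as
    # lazy initialization and cache population.
    for _ in range(warmup):
        fn()

    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    return min(samples_ms), statistics.median(samples_ms)


def run_inference():
    # Hypothetical placeholder workload. Replace the body with the real
    # model call, and on a GPU synchronize the device before returning.
    sum(i * i for i in range(10_000))


if __name__ == "__main__":
    fastest, median = benchmark(run_inference)
    print(f"min {fastest:.3f} ms, median {median:.3f} ms")
```

Reporting both the minimum and the median makes it easier to tell a stable latency from a single lucky run.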
