Amazing model and very promising
#5
by
paulorodriguesjr
- opened
Hi guys, I'm here just to say: amazing model. It covers a lot of multimodal tasks.
I'm getting 0.07 ~ 0.14ms inference time in CAPTION_TO_PHRASE_GROUNDING mode on an RTX 3080 10GB. I think edge devices can benefit from this model as well.
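
For anyone who wants to compare numbers on their own hardware, here is a rough sketch of how latency like this could be measured. It is not the script behind the figures above. The helper names `time_inference` and `summarize` are made up for illustration, and the `model`, `inputs`, and `torch` names in the final comment are assumptions about your setup. On a GPU the work runs asynchronously, so a synchronization call (for example `torch.cuda.synchronize`) should be passed as `sync`, otherwise the timings can come out far too small.

```python
import statistics
import time


def time_inference(fn, warmup=3, runs=20, sync=None):
    """Return per-call latencies in milliseconds for a zero-argument callable.

    `sync` is an optional callable that blocks until pending device work is
    done (assumption: something like torch.cuda.synchronize on CUDA).
    """
    # Warm-up calls are not timed; they absorb one-off costs such as
    # kernel compilation or cache population.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()

    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for the device before stopping the clock
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def summarize(latencies_ms):
    """Collapse a list of latencies into min / median / max."""
    return {
        "min_ms": min(latencies_ms),
        "median_ms": statistics.median(latencies_ms),
        "max_ms": max(latencies_ms),
    }


# Hypothetical usage with a PyTorch model already loaded on the GPU:
#   stats = summarize(time_inference(lambda: model.generate(**inputs),
#                                    sync=torch.cuda.synchronize))
```

Reporting the median together with min and max makes it easier to tell a stable result from one skewed by a slow first call.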