metadata

library_name: transformers
base_model: microsoft/Florence-2-base-ft
tags:
  - finetune
  - image-to-text
  - VQA
  - VLM
language:
  - en

Visual Question Answering Model

This model is a fine-tuned version of microsoft/Florence-2-base-ft for Visual Question Answering (VQA). It is tuned to interpret an image and answer natural-language questions about its visual content.

Model Details

Fine-tuned by: prithivMLmods
Model type: Visual Question Answering (VQA)
Language(s): English (NLP component)
License: None specified
Fine-tuned from model: microsoft/Florence-2-base-ft

Usage

The model takes an image and a question about that image as input and returns an answer based on the visual content.
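
The flow described above can be sketched in Python roughly as follows. This is a minimal sketch and not an official snippet for this checkpoint: it follows the loading pattern documented for the base model microsoft/Florence-2-base-ft (AutoProcessor and AutoModelForCausalLM with trust_remote_code=True) and assumes the fine-tuned checkpoint behaves the same way. The helper names build_prompt and answer_question, the task_prefix parameter, and the generation settings are illustrative assumptions; this card does not state the prompt format used during fine-tuning, so the prefix defaults to empty.

```python
def build_prompt(question: str, task_prefix: str = "") -> str:
    """Join an optional task prefix and the question into one prompt string.

    task_prefix is a hypothetical knob: the card does not say whether the
    fine-tune expects a task token in front of the question.
    """
    return f"{task_prefix}{question.strip()}"


def answer_question(
    image_path: str,
    question: str,
    model_id: str,
    task_prefix: str = "",
    max_new_tokens: int = 64,
) -> str:
    """Load a Florence-2 style checkpoint and answer one question about one image."""
    # Heavy dependencies are imported here so build_prompt works without them.
    import torch
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Assumption: like the base model, the checkpoint ships custom code,
    # hence trust_remote_code=True.
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    model = model.to(device).eval()

    image = Image.open(image_path).convert("RGB")
    prompt = build_prompt(question, task_prefix)
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(device)

    with torch.no_grad():
        generated_ids = model.generate(
            input_ids=inputs["input_ids"],
            pixel_values=inputs["pixel_values"],
            max_new_tokens=max_new_tokens,  # illustrative value
            num_beams=3,                    # illustrative value
        )

    # Decode to plain text and drop special tokens.
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()


# Example call (the model id and image path are placeholders, not real values):
# print(answer_question("example.jpg", "What color is the car?", "<namespace>/<model-name>"))
```

Replace the placeholder model id with the repository name of this checkpoint. If the answers look malformed, the fine-tune may expect a task token before the question, in which case pass it through task_prefix.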