---
library_name: transformers
base_model: microsoft/Florence-2-base-ft
tags:
- finetune
- image-to-text
- VQA
- VLM
language:
- en
---

# Visual Question Answering Model

This model is a fine-tuned version of `microsoft/Florence-2-base-ft` for Visual Question Answering (VQA). It is tuned to interpret an image and answer natural-language questions about its visual content.

---

### Model Details

- **Fine-tuned by:** prithivMLmods
- **Model type:** Visual Question Answering (VQA)
- **Language(s):** English (NLP component)
- **License:** None specified
- **Fine-tuned from model:** [microsoft/Florence-2-base-ft](https://huggingface.co/microsoft/Florence-2-base-ft)

### Usage

To run VQA, pass the model an image together with a question about it; the model returns a text answer based on the visual content.
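
The image-plus-question flow described above might look roughly like the following sketch. It follows the loading pattern documented for the base `microsoft/Florence-2-base-ft` checkpoint (`AutoProcessor` and `AutoModelForCausalLM` with `trust_remote_code=True`). Several things here are assumptions rather than facts from this card: `MODEL_ID` is a hypothetical placeholder for this model's Hub repository id, the `<VQA>` task prefix is a guess at the prompt format used during fine-tuning, and `build_prompt` / `answer_question` are illustrative helper names.

```python
# Hypothetical placeholder: replace with this model's actual Hub repository id.
MODEL_ID = "your-username/your-florence2-vqa-model"

# Assumption: the fine-tune expects a task token before the question.
# The real prefix is not stated in this card; adjust or clear it as needed.
TASK_PREFIX = "<VQA>"


def build_prompt(question: str, task_prefix: str = TASK_PREFIX) -> str:
    """Prepend the (assumed) task token to the whitespace-stripped question."""
    return f"{task_prefix}{question.strip()}"


def answer_question(image_path: str, question: str, model_id: str = MODEL_ID) -> str:
    """Load the model, run generation on one image/question pair, return the answer text."""
    # Heavy dependencies are imported here so build_prompt stays usable without them.
    import torch
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Florence-2 checkpoints ship custom modeling code, hence trust_remote_code=True.
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    model = model.to(device).eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(
        text=build_prompt(question), images=image, return_tensors="pt"
    ).to(device)

    with torch.no_grad():
        generated_ids = model.generate(
            input_ids=inputs["input_ids"],
            pixel_values=inputs["pixel_values"],
            max_new_tokens=64,  # illustrative limit for short answers
            num_beams=3,
        )

    # Decode the generated tokens to plain text, dropping special tokens.
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()


# Example call (requires a real repository id and a local image file):
# print(answer_question("example.jpg", "What color is the car?"))
```

If the answers come back empty or malformed, the task prefix is the first thing to check, since Florence-2 fine-tunes differ in the prompt format they were trained on.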