---
language:
- it
base_model:
- openbmb/MiniCPM-V-2_6
library_name: transformers
tags:
- vision
- vqa-italian
- visual-question-answering-italian
---
# Fine-tuned version of MiniCPM-V 2.6 on GQA-it
This is a fine-tuned version of MiniCPM-V 2.6 on GQA-it, designed for Italian Visual Question Answering.
The base model is built on SigLIP-400M and Qwen2-7B, for a total of 8B parameters.
# Usage
For advanced usage, see the base model repository: https://github.com/OpenBMB/MiniCPM-V.
For more details about the dataset, see: https://github.com/crux82/gqa-it
```python
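import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

# Load the fine-tuned weights; trust_remote_code is needed for MiniCPM-V's custom model code.
model = AutoModel.from_pretrained(
    'sag-uniroma2/MiniCPM-V-2_6-gqa-it-finetuned',
    trust_remote_code=True,
    attn_implementation='sdpa',
    torch_dtype=torch.bfloat16,
)
model = model.eval().cuda()  # inference mode, on the GPU

# The tokenizer is loaded from the base model repository.
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True)

img = "n346247.jpg"  # path to a local image
image = Image.open(img).convert('RGB')
question = "C'è un idrante sull'erba?"  # "Is there a fire hydrant on the grass?"

# The image is passed inside the message content, so the `image` argument stays None.
msgs = [{'role': 'user', 'content': [image, question]}]
answer = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer,
)
print(answer)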
```
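To ask several questions about the same image, the call above can be wrapped in a small loop. This is only a sketch: `answer_questions` is a hypothetical helper name, and it assumes `model.chat` accepts exactly the arguments used in the snippet above.

```python
def answer_questions(model, tokenizer, image, questions):
    """Ask several questions about one image, one chat call per question.

    Hypothetical helper: it only relies on the `model.chat(image=None,
    msgs=..., tokenizer=...)` call shown in the usage snippet.
    """
    answers = {}
    for question in questions:
        # Same message layout as the usage snippet: the image and the
        # question travel together in the 'content' list.
        msgs = [{'role': 'user', 'content': [image, question]}]
        answers[question] = model.chat(image=None, msgs=msgs, tokenizer=tokenizer)
    return answers
```

With the objects from the snippet above, `answer_questions(model, tokenizer, image, ["C'è un idrante sull'erba?"])` would return a dictionary mapping each question to the model's answer.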
# GQA-it
## Italian Question Answering on Image Scene Graphs
GQA-it is a **large-scale Italian dataset for Visual Question Answering** based on the balanced version of [GQA](https://cs.stanford.edu/people/dorarad/gqa/about.html).
GQA-it contains more than **1 million question/answer pairs in Italian over 80K images** obtained by applying Neural Machine Translation.
Most importantly, a **Test set of 3,000 question-answer pairs has been manually validated to provide a valuable benchmark in Italian**.
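This card does not say which metric is used on the validated test set. One common choice for short GQA-style answers is exact-match accuracy, sketched below under that assumption. The helper names `normalize` and `exact_match_accuracy` are hypothetical, and the inputs are assumed to be two parallel lists of strings (model predictions and gold answers).

```python
import string


def normalize(answer):
    """Lowercase the answer and trim surrounding whitespace and punctuation."""
    return answer.strip().lower().strip(string.punctuation + " ")


def exact_match_accuracy(predictions, references):
    """Fraction of predictions equal to the gold answer after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)
```

The normalization is deliberately light so that an output such as `Destra.` still matches the gold answer `destra`; a stricter or looser comparison may suit other setups better.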
## Example
![](n90294.jpg)
| Language | Question | Answer |
| --- | :---: | :---: |
| En | Is the remote to the right or to the left of the book? | right |
| It | _Il telecomando è a destra o a sinistra del libro?_ | _destra_ |
| En | How thick is the book to the left of the remote? | thick |
| It | _Quanto è spesso il libro a sinistra del telecomando?_ | _spesso_ |
| En | What device is to the left of the calculator made of plastic? | charger |
| It | _Quale dispositivo si trova a sinistra della calcolatrice di plastica?_ | _caricabatterie_ |
| En | What's the charger made of? | plastic |
| It | _Di cosa è fatto il caricabatterie?_ | _plastica_ |
| En | Are there any phones? | no |
| It | _Ci sono dei telefoni?_ | _no_ |
# Citation
```
TODO
```