---
language:
- it
base_model:
- openbmb/MiniCPM-V-2_6
library_name: transformers
tags:
- vision
- vqa-italian
- visual-question-answering-italian
---

Fine-tuned version of MiniCPM-V 2.6 on GQA-it

This is a fine-tuned version of MiniCPM-V 2.6 on GQA-it, designed for Italian Visual Question Answering. The original model is built on SigLip-400M and Qwen2-7B, with a total of 8B parameters.

# Usage

For advanced usage, see the original base model repository: https://github.com/OpenBMB/MiniCPM-V

For more details about the dataset, see: https://github.com/crux82/gqa-it

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

# Load the fine-tuned weights; trust_remote_code loads the model's custom code, which provides chat().
model = AutoModel.from_pretrained(
    'sag-uniroma2/MiniCPM-V-2_6-gqa-it-finetuned',
    trust_remote_code=True,
    attn_implementation='sdpa',
    torch_dtype=torch.bfloat16,
)
model = model.eval().cuda()
# The tokenizer is loaded from the base model repository.
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True)

img = "n346247.jpg"
image = Image.open(img).convert('RGB')
question = "C'è un idrante sull'erba?"  # "Is there a hydrant on the grass?"
# The image travels inside the message content, so the image argument stays None.
msgs = [{'role': 'user', 'content': [image, question]}]

answer = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer
)
print(answer)
```

# GQA-it

## Italian Question Answering on Image Scene Graphs

GQA-it is a **large-scale Italian dataset for Visual Question Answering** based on the balanced version of [GQA](https://cs.stanford.edu/people/dorarad/gqa/about.html). GQA-it contains more than **1 million question/answer pairs in Italian over 80K images**, obtained by applying Neural Machine Translation. Most importantly, a **Test set of 3,000 question-answer pairs has been manually validated to provide a valuable benchmark in Italian**.

## Example

![](n90294.jpg)

| Language | Question | Answer |
| --- | :---: | :---: |
| En | Is the remote to the right or to the left of the book? | right |
| It | _Il telecomando è a destra o a sinistra del libro?_ | _destra_ |
| En | How thick is the book to the left of the remote? | thick |
| It | _Quanto è spesso il libro a sinistra del telecomando?_ | _spesso_ |
| En | What device is to the left of the calculator made of plastic? | charger |
| It | _Quale dispositivo si trova a sinistra della calcolatrice di plastica?_ | _caricabatterie_ |
| En | What's the charger made of? | plastic |
| It | _Di cosa è fatto il caricabatterie?_ | _plastica_ |
| En | Are there any phones? | no |
| It | _Ci sono dei telefoni?_ | _no_ |

# Citation

```
TODO
```
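
# Scoring answers (sketch)

Since GQA-it answers are short strings such as _destra_ or _no_, one simple way to compare model output against the validated Test set is exact-match accuracy after light normalization. The snippet below is a minimal sketch, not the official GQA-it evaluation procedure: the helper names `normalize_answer` and `exact_match_accuracy`, the normalization rules (lowercasing, stripping ASCII punctuation), and the sample predictions are all assumptions made for illustration. The reference answers are the Italian answers from the example table.

```python
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace.

    Assumption: this normalization is illustrative, not an official rule.
    """
    text = text.strip().lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def exact_match_accuracy(predictions, references):
    """Fraction of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(
        normalize_answer(p) == normalize_answer(r)
        for p, r in zip(predictions, references)
    )
    return hits / len(references)


# Reference answers taken from the example table.
references = ["destra", "spesso", "caricabatterie", "plastica", "no"]
# Hypothetical model outputs, invented for this example.
predictions = ["Destra.", "spesso", "telefono", "plastica", "No"]

print(exact_match_accuracy(predictions, references))
```

In practice, `predictions` would be filled with the strings returned by `model.chat(...)` for each question, assuming the model replies with a short answer rather than a full sentence.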