metadata

language:
  - it
base_model:
  - openbmb/MiniCPM-V-2_6
library_name: transformers
tags:
  - vision
  - vqa-italian
  - visual-question-answering-italian

A fine-tuned version of MiniCPM-V 2.6 on GQA-it (Italian Question Answering on Image Scene Graph)

Usage

See the GitHub repository for code and further details: https://github.com/crux82/XXXXXX. For advanced usage, see the base model repository: https://github.com/OpenBMB/MiniCPM-V.

For more details about the dataset, visit: https://github.com/crux82/gqa-it

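import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

# Load the fine-tuned weights; trust_remote_code is needed because the
# modeling code is shipped with the model repository.
model = AutoModel.from_pretrained(
    'sag-uniroma2/MiniCPM-V-2_6-gqa-it-finetuned',
    trust_remote_code=True,
    attn_implementation='sdpa',
    torch_dtype=torch.bfloat16,
)
model = model.eval().cuda()  # inference mode, on a CUDA GPU

# The tokenizer is loaded from the base model repository.
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True)

# Replace with the path to your own image.
img = "xx.jpg"
image = Image.open(img).convert('RGB')

question = "C'è un idrante sull'erba?"  # "Is there a fire hydrant on the grass?"
# The image travels inside the message content, so image=None in the call below.
msgs = [{'role': 'user', 'content': [image, question]}]

answer = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer,
)
print(answer)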
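
To ask several questions about the same image, the call above can be wrapped in a small loop. The following is a minimal sketch, not part of the released code: `build_msgs` and `answer_all` are hypothetical helper names, and the sketch assumes that `model.chat` accepts the same `image`, `msgs`, and `tokenizer` arguments used above and returns the answer as a string.

```python
from typing import Any, Dict, List


def build_msgs(image: Any, question: str) -> List[Dict[str, Any]]:
    """Wrap one image and one question in the single-turn format used above."""
    return [{'role': 'user', 'content': [image, question]}]


def answer_all(model: Any, tokenizer: Any, image: Any,
               questions: List[str]) -> Dict[str, str]:
    """Ask each question about the same image in an independent single-turn chat.

    Hypothetical helper: assumes model.chat(image=None, msgs=..., tokenizer=...)
    behaves as in the usage snippet above.
    """
    answers: Dict[str, str] = {}
    for question in questions:
        # Same call shape as above: the image is carried inside msgs.
        answers[question] = model.chat(
            image=None,
            msgs=build_msgs(image, question),
            tokenizer=tokenizer,
        )
    return answers
```

With `model`, `tokenizer`, and `image` defined as above, `answer_all(model, tokenizer, image, [question])` returns a dictionary that maps each question to its answer. Each question is sent as a separate conversation, so earlier answers do not influence later ones.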

Citation

TODO