metadata
language:
- it
base_model:
- openbmb/MiniCPM-V-2_6
library_name: transformers
tags:
- vision
- vqa-italian
- visual-question-answering-italian
A finetuned version of MiniCPM-V 2.6 on GQA-it: Italian Question Answering on Image Scene Graph
Usage
Check out the GitHub repository for more insights and code: https://github.com/crux82/XXXXXX. You can also visit the original basic model repository for advanced usage: https://github.com/OpenBMB/MiniCPM-V.
For more details about dataset please visit: https://github.com/crux82/gqa-it
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer,AutoProcessor
model = AutoModel.from_pretrained('sag-uniroma2/MiniCPM-V-2_6-gqa-it-finetuned', trust_remote_code=True,
attn_implementation='sdpa', torch_dtype=torch.bfloat16)
model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6', trust_remote_code=True)
img="xx.jpg"
image = Image.open(img).convert('RGB')
question = "C'è un idrante sull'erba?"
msgs = [{'role': 'user', 'content': [image,question]}]
answer = model.chat(
image=None,
msgs=msgs,
tokenizer=tokenizer
)
print(answer)
Citation
TODO