---
pipeline_tag: visual-question-answering
---
## MiniCPM-V 2.6 int4
This is the int4 quantized version of [MiniCPM-V 2.6](https://huggingface.co/openbmb/MiniCPM-V-2_6).
Running the int4 version requires less GPU memory (about 7 GB).
## Usage
Inference using Hugging Face Transformers on NVIDIA GPUs. Requirements tested on Python 3.10:
```
Pillow==10.1.0
torch==2.1.2
torchvision==0.16.2
transformers==4.40.0
sentencepiece==0.1.99
accelerate==0.30.1
bitsandbytes==0.43.1
```
```python
# test.py
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

# Load the int4 quantized model and its tokenizer.
# trust_remote_code=True is required because the repo ships custom model code.
model = AutoModel.from_pretrained('openbmb/MiniCPM-V-2_6-int4', trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained('openbmb/MiniCPM-V-2_6-int4', trust_remote_code=True)
model.eval()

# Build a single user turn containing an image and a question.
image = Image.open('xx.jpg').convert('RGB')
question = 'What is in the image?'
msgs = [{'role': 'user', 'content': [image, question]}]

res = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer
)
print(res)

## If you want streaming output, make sure sampling=True and stream=True;
## model.chat will then return a generator.
res = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer,
    sampling=True,
    temperature=0.7,
    stream=True
)

generated_text = ""
for new_text in res:
    generated_text += new_text
    print(new_text, flush=True, end='')
```
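
Multi-turn conversation follows the same `msgs` structure. The sketch below is an assumption based on the role/content format shown above (it appends the streamed reply as an `assistant` turn and asks a hypothetical follow-up question); adapt it to your own conversation.

```python
# Sketch: multi-turn chat (assumes model.chat accepts prior assistant turns
# in msgs, following the role/content format used above).
msgs.append({'role': 'assistant', 'content': [generated_text]})
msgs.append({'role': 'user', 'content': ['Describe the main object in more detail.']})

followup = model.chat(
    image=None,
    msgs=msgs,
    tokenizer=tokenizer
)
print(followup)
```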
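
To check the "about 7 GB" figure on your own hardware, you can read PyTorch's peak-allocation counter after a chat call. This is a minimal sketch assuming a single CUDA device; it only counts memory allocated through PyTorch.

```python
# Minimal sketch: report peak GPU memory observed after inference (assumes CUDA).
if torch.cuda.is_available():
    peak_gb = torch.cuda.max_memory_allocated() / (1024 ** 3)
    print(f'Peak GPU memory: {peak_gb:.1f} GB')
```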