HELP
I'm trying to use llava-1.5-7b-hf. I'm new to LMMs and pretty clueless, but I get an error when I try to run the simple example:
raise ValueError(
ValueError: The input provided to the model are wrong. The number of image tokens is 0 while the number of image given to the model is 1. This prevents correct indexing and breaks batch generation.
The code I'm using is:
import requests
from PIL import Image
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration
model_id = "llava-hf/llava-1.5-7b-hf"
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16
).to(0)
processor = AutoProcessor.from_pretrained(model_id, patch_size=32, vision_feature_select_strategy='default')
prompt = "What's in the picture"
image_file = "http://images.cocodataset.org/val2017/000000039769.jpg"
raw_image = Image.open(requests.get(image_file, stream=True).raw)
inputs = processor(images=raw_image, text=prompt, return_tensors='pt').to(0, dtype=torch.float16)
output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(processor.decode(output[0][2:], skip_special_tokens=True))
You have to add the special <image> token to the prompt so the model knows there should be one image, and make sure to format the prompt correctly with the chat template. More info on the model doc page: https://huggingface.co/docs/transformers/en/model_doc/llava
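For example, something along these lines should work (a rough sketch that reuses the model, processor, and raw_image from your snippet; the chat template comes from the processor, so it needs a reasonably recent transformers version):

conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What's in the picture?"},
        ],
    },
]
# apply_chat_template inserts the <image> placeholder and the USER/ASSISTANT formatting for you
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
inputs = processor(images=raw_image, text=prompt, return_tensors='pt').to(0, dtype=torch.float16)
output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(processor.decode(output[0], skip_special_tokens=True))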
Thank you for replying, but I still have an error:
raise ValueError(
ValueError: Image features and image tokens do not match: tokens: 577, features 576
I added these lines:
processor.vision_feature_select_strategy = 'patch'
processor.patch_size = 14
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What is shown in this image?"},
        ],
    },
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
Does this maybe have something to do with this warning?
\envs\dsrnet\lib\site-packages\transformers\models\clip\modeling_clip.py:540: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actionsrunner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)