llava-hf/bakLlava-v1-hf · Error while running in mps

Dec 18, 2023

edited Dec 18, 2023

How can I fix the error below, which occurs when running the pipeline on the mps device?

Code:

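from transformers import pipeline

model_id = "llava-hf/bakLlava-v1-hf"
# Load the image-to-text pipeline on the Apple Silicon GPU (mps) with the PyTorch backend
pipe = pipeline("image-to-text", model=model_id, device='mps', framework='pt')
# df is defined earlier (not shown); a single image is taken from its 'Product Image Link' column
image = df['Product Image Link'][1000]
max_new_tokens = 200  # note: unused, the call below passes 1000 directly
prompt = "USER: <image>\nWrite a detailed product description for the product in the image for a customer planning to buy this product?\nASSISTANT:"

outputs = pipe(image, prompt=prompt, generate_kwargs={"max_new_tokens": 1000})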

Error:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[79], line 4
      1 max_new_tokens = 200
      2 prompt = "USER: <image>\nWrite a detailed product description for the product in the image for a customer planning to buy this product?\nASSISTANT:"
----> 4 outputs = pipe(image, prompt=prompt, generate_kwargs={"max_new_tokens": 1000})

File ~/miniconda3/envs/imgtotext/lib/python3.9/site-packages/transformers/pipelines/image_to_text.py:111, in ImageToTextPipeline.__call__(self, images, **kwargs)
     83 def __call__(self, images: Union[str, List[str], "Image.Image", List["Image.Image"]], **kwargs):
     84     """
     85     Assign labels to the image(s) passed as inputs.
     86 
   (...)
    109         - **generated_text** (`str`) -- The generated text.
    110     """
--> 111     return super().__call__(images, **kwargs)

File ~/miniconda3/envs/imgtotext/lib/python3.9/site-packages/transformers/pipelines/base.py:1140, in Pipeline.__call__(self, inputs, num_workers, batch_size, *args, **kwargs)
   1132     return next(
   1133         iter(
   1134             self.get_iterator(
   (...)
   1137         )
   1138     )
   1139 else:
...
    315     )
    317 final_embedding[image_to_overwrite] = image_features.contiguous().reshape(-1, embed_dim)
    318 final_attention_mask |= image_to_overwrite

ValueError: The input provided to the model are wrong. The number of image tokens is 1 while the number of image given to the model is 1. This prevents correct indexing and breaks batch generation.
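
The error message compares the number of image tokens in the prompt with the number of images passed in. As a sanity check on the prompt side only, here is a minimal sketch that counts the placeholders, assuming the placeholder is the literal string <image> as in my prompt above. The helpers count_image_tokens and check_prompt are hypothetical names I made up for illustration; they are not part of transformers and do not reproduce the library's internal check.

```python
IMAGE_TOKEN = "<image>"  # assumption: the literal placeholder string used in the prompt above


def count_image_tokens(prompt: str) -> int:
    """Count literal occurrences of the image placeholder in the prompt (hypothetical helper)."""
    return prompt.count(IMAGE_TOKEN)


def check_prompt(prompt: str, num_images: int) -> None:
    """Raise ValueError if the placeholder count differs from the image count (hypothetical helper)."""
    found = count_image_tokens(prompt)
    if found != num_images:
        raise ValueError(
            f"prompt has {found} image placeholder(s) but {num_images} image(s) were given"
        )


prompt = (
    "USER: <image>\nWrite a detailed product description for the product in the "
    "image for a customer planning to buy this product?\nASSISTANT:"
)

# One placeholder and one image: the check passes without raising.
check_prompt(prompt, num_images=1)
```

With my prompt this passes (one placeholder, one image), which matches the "1 ... 1" in the error message, so the prompt text itself looks consistent.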