ValueError: The input provided to the model are wrong. The number of image tokens is 0 while the number of image given to the model is 1. This prevents correct indexing and breaks batch generation.
#2 opened by barleyspectacular
Error in the example
I got the same error.
Same error here
Thanks for reporting, looking into this. It has to do with a discrepancy between the slow and fast tokenizers.
A current workaround is to load the processor with the slow tokenizer:
from transformers import LlavaNextProcessor
processor = LlavaNextProcessor.from_pretrained("llava-hf/llava-v1.6-34b-hf", use_fast=False)
Opened an issue here: https://github.com/huggingface/transformers/issues/29774 (cc @ArthurZ )
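For reference, here is a minimal end-to-end sketch built on that workaround. The ChatML-style prompt template and the example image URL are assumptions based on the model card and may need adjusting for your own inputs.

import requests
import torch
from PIL import Image
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

model_id = "llava-hf/llava-v1.6-34b-hf"
# Workaround: force the slow tokenizer so image tokens are expanded correctly
processor = LlavaNextProcessor.from_pretrained(model_id, use_fast=False)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

url = "https://llava-vl.github.io/static/images/view.jpg"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "<|im_start|>user\n<image>\nWhat is shown in this image?<|im_end|><|im_start|>assistant\n"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))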
Hi, with 4-bit quantization this model requires roughly 34/2 = 17 GB of memory, so yes, that should work.
I have a question: can llava-hf/llava-v1.6-34b-hf be loaded on a single V100-32G GPU?
Your mileage may vary, but with 4-bit quantization and Flash Attention 2, I was just able to run this model on my 24G 3090 Ti. I even had to reduce Torch's CUDA split size to squeeze out every last bit of optimization; otherwise I was hitting OOMs.
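In case it helps others, here is a hedged sketch of that kind of low-VRAM setup (4-bit NF4 via bitsandbytes, FlashAttention-2, and a reduced CUDA allocator split size). The exact max_split_size_mb value is a guess, and flash-attn must be installed separately.

import os
# Reduce the allocator split size to limit fragmentation (128 MB is a guess; tune as needed)
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

import torch
from transformers import BitsAndBytesConfig, LlavaNextProcessor, LlavaNextForConditionalGeneration

model_id = "llava-hf/llava-v1.6-34b-hf"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

processor = LlavaNextProcessor.from_pretrained(model_id, use_fast=False)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)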
nielsr changed discussion status to closed
NOT fixed.