ValueError: The input provided to the model are wrong. The number of image tokens is 0 while the number of image given to the model is 1. This prevents correct indexing and breaks batch generation.
#2 opened by barleyspectacular
Error in the example
I got the same error.
Same error here
Thanks for reporting, looking into this. It has to do with a discrepancy between the slow and fast tokenizers.
A current workaround is to load the processor with the slow tokenizer:
from transformers import LlavaNextProcessor
processor = LlavaNextProcessor.from_pretrained("llava-hf/llava-v1.6-34b-hf", use_fast=False)
Opened an issue here: https://github.com/huggingface/transformers/issues/29774 (cc @ArthurZ )
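For reference, here is a minimal end-to-end sketch built on that workaround. The ChatML-style prompt template and the example image URL are assumptions based on the model card and may need adjusting for your own inputs.

import requests
import torch
from PIL import Image
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

model_id = "llava-hf/llava-v1.6-34b-hf"
# Workaround: force the slow tokenizer so image tokens are expanded correctly
processor = LlavaNextProcessor.from_pretrained(model_id, use_fast=False)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

url = "https://llava-vl.github.io/static/images/view.jpg"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "<|im_start|>user\n<image>\nWhat is shown in this image?<|im_end|><|im_start|>assistant\n"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))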
Hi, with 4-bit quantization this model requires roughly 34/2 = 17 GB of memory, so yes, that should work.
I have a question: can llava-hf/llava-v1.6-34b-hf be loaded on a single V100-32G GPU?
Your mileage may vary, but with 4-bit quantization and Flash Attention 2, I was just able to run this model on my 24G 3090 Ti. I even had to reduce Torch's CUDA split size to squeeze out every last bit of optimization; otherwise I was hitting OOMs.
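In case it helps others, here is a hedged sketch of that kind of low-VRAM setup (4-bit NF4 via bitsandbytes, FlashAttention-2, and a reduced CUDA allocator split size). The exact max_split_size_mb value is a guess, and flash-attn must be installed separately.

import os
# Reduce the allocator split size to limit fragmentation (128 MB is a guess; tune as needed)
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

import torch
from transformers import BitsAndBytesConfig, LlavaNextProcessor, LlavaNextForConditionalGeneration

model_id = "llava-hf/llava-v1.6-34b-hf"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

processor = LlavaNextProcessor.from_pretrained(model_id, use_fast=False)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)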
nielsr changed discussion status to closed
NOT fixed.