Suddenly getting an error while executing processor = AutoProcessor.from_pretrained( 'llava-hf/llava-1.5-7b-hf')
Hello, I have been working with LLaVA, and suddenly today I am facing this error:
Exception Traceback (most recent call last)
in <cell line: 1>()
----> 1 processor = LlavaProcessor.from_pretrained(
2 'llava-hf/llava-1.5-7b-hf'
3 )
5 frames
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_fast.py in __init__(self, *args, **kwargs)
109 elif fast_tokenizer_file is not None and not from_slow:
110 # We have a serialization from tokenizers which let us directly build the backend
--> 111 fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
112 elif slow_tokenizer is not None:
113 # We need to convert a slow tokenizer to build the backend
Exception: data did not match any variant of untagged enum ModelWrapper at line 277156 column 3
What is the reason for the sudden occurrence of this error, and how can it be resolved?
I can see a commit to the processor 11 hours ago; could this be the reason? If so, how do I resolve it?
I encountered the same error 🥲.
The generation also seems off :(
This is a way to load the processor from the previous commit, and it works fine:
processor = AutoProcessor.from_pretrained(
'llava-hf/llava-1.5-7b-hf',
revision='a272c74'
)
@Dipto084 can you share your env setup pls? Might be that the new update uploaded fast tokenizer which is the default, but your env can't load it
For the many "image" tokens, that is expected. Each image will have as many placeholders as there are image embeddings after the vision tower, so it is around 500 tokens per image. You can pass skip_special_tokens=True to remove them and decode only the text.
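Concretely, that means decoding with something like processor.batch_decode(output_ids, skip_special_tokens=True). Here is a minimal, self-contained sketch of the effect; the token names and the 576-per-image count are illustrative, not taken from the actual tokenizer:

```python
# Conceptual sketch of what skip_special_tokens=True does during decoding:
# the hundreds of "<image>" placeholder tokens are special tokens, so they
# are dropped before the text is joined. (Token names are illustrative.)
SPECIAL_TOKENS = {"<s>", "</s>", "<image>", "<pad>"}

def decode_skipping_specials(tokens):
    """Join token strings, dropping any special placeholder tokens."""
    return " ".join(t for t in tokens if t not in SPECIAL_TOKENS)

# One image expands to many placeholders (on the order of 500-600 for llava-1.5):
tokens = ["<s>"] + ["<image>"] * 576 + ["A", "cat", "on", "a", "mat", "</s>"]
print(decode_skipping_specials(tokens))  # -> A cat on a mat
```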
@RaushanTurganbay
Not the person you asked for the env setup, but I have Transformers 4.41.1 and experienced the same issues. I manually modified the config files and now it is working.
First, for processor_config.json, 4.41.1 works just by having an empty dict, "{}", as the llava processor in that version does not accept processor_config parameters.
Second, the tokenizer.json accepted by 4.41.1 had a different format for "merges", as you can see from my screenshot of the working one.
By using the older version of tokenizer.json and replacing the content of processor_config.json with '{}', things should work. However, the best solution is to specify the commit id as suggested by @Dipto084.
Yes, the new tokenizer config needs transformers v4.45 or later, where we raised the requirement to tokenizers>=0.20. The same goes for the processors, as we'll stop supporting the old logic for llava models in the next few releases. So this is expected, and we advise using the new version of transformers.
If you want to use an older version for any reason, then yeah, feel free to indicate the commit hash.
This is a way to load the model from the previous commit and does fine,
processor = AutoProcessor.from_pretrained(
'llava-hf/llava-1.5-7b-hf',
revision='a272c74'
)
Thx so much for your solution. I encountered the same issue with "llava-hf/llava-1.5-7b-hf".
Could you tell me where I can find the revision code?
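The revision is a commit hash from the model repo's history: on the model page, open "Files and versions" and click the history icon. You can also list it programmatically; a sketch using huggingface_hub (assuming it is installed and you have network access):

```python
from huggingface_hub import HfApi

# Each commit has a commit_id (the full hash) that can be passed as
# revision= to from_pretrained; a short prefix like 'a272c74' also works.
api = HfApi()
commits = api.list_repo_commits("llava-hf/llava-1.5-7b-hf")
for c in commits[:5]:
    print(c.commit_id[:7], c.created_at, c.title)
```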