Experiencing an issue with 'Can't load tokenizer'
Hi there,
first of all, thank you for this wonderful project!
Unfortunately, I experienced problems executing the code `pipe = pipeline("text-generation", model="Esperanto/Protein-Llama-3-8B")` right at the beginning.
As a result, the following error was raised:
OSError: Can't load tokenizer for 'Esperanto/Protein-Llama-3-8B'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'Esperanto/Protein-Llama-3-8B' is the correct path to a directory containing all relevant files for a LlamaTokenizerFast tokenizer.
I tried to solve this by upgrading transformers (`pip install --upgrade transformers`), but that did not help. I also downloaded the large model files manually, but the error stayed the same. Do you have any suggestions?
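For completeness, here is the full snippet I am running (the prompt is just a placeholder):

```python
from transformers import pipeline

# Minimal reproduction: the failure happens while the pipeline
# tries to load the tokenizer, before any generation is attempted.
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe = pipeline("text-generation", model="Esperanto/Protein-Llama-3-8B")
pipe(messages)
```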
Thanks in advance!
Hey,
Thanks for reporting this bug! It should be fixed now; the tokenizer has been uploaded.
If it works on your end, we'll close this issue!
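If you pulled the model before the fix, a stale local snapshot might still be picked up. Forcing a fresh download should rule that out; a minimal sketch, assuming the default cache location:

```python
from transformers import AutoTokenizer

# Re-fetch the tokenizer files from the Hub, bypassing any
# previously cached (tokenizer-less) snapshot of the repo.
tokenizer = AutoTokenizer.from_pretrained(
    "Esperanto/Protein-Llama-3-8B",
    force_download=True,
)
```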
Thank you for your swift update!
Unfortunately, the bug still exists.
Here is the complete error message:
```
OSError                                   Traceback (most recent call last)
Input In [1], in
     15 from transformers import pipeline
     17 messages = [
     18     {"role": "user", "content": "Who are you?"},
     19 ]
---> 20 pipe = pipeline("text-generation", model="Esperanto/Protein-Llama-3-8B")
     21 pipe(messages)

File /opt/miniconda3/lib/python3.9/site-packages/transformers/pipelines/__init__.py:1033, in pipeline(task, model, config, tokenizer, feature_extractor, image_processor, processor, framework, revision, use_fast, token, device, device_map, torch_dtype, trust_remote_code, model_kwargs, pipeline_class, **kwargs)
   1030 tokenizer_kwargs = model_kwargs.copy()
   1031 tokenizer_kwargs.pop("torch_dtype", None)
-> 1033 tokenizer = AutoTokenizer.from_pretrained(
   1034     tokenizer_identifier, use_fast=use_fast, _from_pipeline=task, **hub_kwargs, **tokenizer_kwargs
   1035 )
   1037 if load_image_processor:
   1038     # Try to infer image processor from model or config name (if provided as str)
   1039     if image_processor is None:

File /opt/miniconda3/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py:939, in AutoTokenizer.from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
    936 tokenizer_class_py, tokenizer_class_fast = TOKENIZER_MAPPING[type(config)]
    938 if tokenizer_class_fast and (use_fast or tokenizer_class_py is None):
--> 939     return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
    940 else:
    941     if tokenizer_class_py is not None:

File /opt/miniconda3/lib/python3.9/site-packages/transformers/tokenization_utils_base.py:2197, in PreTrainedTokenizerBase.from_pretrained(cls, pretrained_model_name_or_path, cache_dir, force_download, local_files_only, token, revision, trust_remote_code, *init_inputs, **kwargs)
   2194 # If one passes a GGUF file path to `gguf_file` there is no need for this check as the tokenizer will be
   2195 # loaded directly from the GGUF file.
   2196 if all(full_file_name is None for full_file_name in resolved_vocab_files.values()) and not gguf_file:
-> 2197     raise EnvironmentError(
   2198         f"Can't load tokenizer for '{pretrained_model_name_or_path}'. If you were trying to load it from "
   2199         "'https://huggingface.co/models', make sure you don't have a local directory with the same name. "
   2200         f"Otherwise, make sure '{pretrained_model_name_or_path}' is the correct path to a directory "
   2201         f"containing all relevant files for a {cls.__name__} tokenizer."
   2202     )
   2204 for file_id, file_path in vocab_files.items():
   2205     if file_id not in resolved_vocab_files:

OSError: Can't load tokenizer for 'Esperanto/Protein-Llama-3-8B'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'Esperanto/Protein-Llama-3-8B' is the correct path to a directory containing all relevant files for a LlamaTokenizerFast tokenizer.
```
Hi,
Could you share the version of the transformers library you are currently using? Instead of upgrading, you might try downgrading to an earlier version (e.g., 4.38.0), as this could resolve the issue.
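It may also be worth checking which files are actually visible in the repo, for example with a quick sketch like this (assuming huggingface_hub, which transformers depends on, is available):

```python
from huggingface_hub import list_repo_files

# A tokenizer load needs files like tokenizer.json or
# tokenizer_config.json to be present in the repository.
print(list_repo_files("Esperanto/Protein-Llama-3-8B"))
```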
I used transformers version 4.42.0 (the current release) and downgraded to 4.38.0 (`pip install --upgrade transformers==4.38.0`).
The error remains unchanged after the downgrade. Any further suggestions?
Can you share your notebook? We can try to take a look at it.