OSError: teknium/Replit-v1-CodeInstruct-3B does not appear to have a tokenizer file

#2
by Sulav - opened

Receiving this error when trying to run this locally:
OSError: teknium/Replit-v1-CodeInstruct-3B does not appear to have a file named replit/replit-code-v1-3b--replit_lm_tokenizer.py. Checkout 'https://huggingface.co/teknium/Replit-v1-CodeInstruct-3B/main' for available files.

I can see that the main branch has a file called replit_lm_tokenizer.py; I'm not sure why replit/replit-code-v1-3b-- is being prepended to it. If I replace model_name with replit/replit-code-v1-3b, it loads properly.
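For context, if I understand transformers' remote-code resolution correctly, the `repo--module.Class` form is its convention for an `auto_map` entry that points at custom code in a *different* repo: the part before the double dash is a repo id, the part after is the module and class to load. A minimal sketch of how the reference from the error message splits apart (the split itself is plain string handling; the resolution behavior is my reading of transformers, not something stated in this thread):

```python
# The auto_map entry copied from the error message above.
ref = "replit/replit-code-v1-3b--replit_lm_tokenizer.ReplitLMTokenizer"

# "--" separates the repo the custom code lives in
# from the "module.Class" path inside that repo.
repo_id, module_class = ref.split("--")
module, class_name = module_class.rsplit(".", 1)

print(repo_id)      # replit/replit-code-v1-3b
print(module)       # replit_lm_tokenizer
print(class_name)   # ReplitLMTokenizer
```

So the prefix isn't random: it is a leftover pointer telling transformers to fetch the tokenizer code from the original replit/replit-code-v1-3b repo instead of this one.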

This is the code snippet I am running for this:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# model_name = "replit/replit-code-v1-3b"  # loads properly
model_name = "teknium/Replit-v1-CodeInstruct-3B"  # encounters the error above

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True
)
model.to('cuda')
```

Yes, thank you, we are fixing it right now ^_^ It has some references to the original Replit model in its config.json.

It's fixed now. You can just update the config.json to fix it.

teknium changed discussion status to closed

The issue still persists with the same error. I looked at the repo, and changing the string in tokenizer_config.json resolves the issue.

Current:
```
"AutoTokenizer": [
      "replit/replit-code-v1-3b--replit_lm_tokenizer.ReplitLMTokenizer",
      null
    ]
  }
```

Fixed:
```
"AutoTokenizer": [
      "replit_lm_tokenizer.ReplitLMTokenizer",
      null
    ]
  }
```
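If editing by hand is inconvenient, a small script can apply the same fix to a local copy of the file. This is only an illustrative sketch (`strip_repo_prefix` is a made-up helper name, and it assumes `auto_map` entries have the string-or-list shape shown above); back up the file before running anything like it:

```python
import json

def strip_repo_prefix(config_path):
    """Strip a 'namespace/repo--' prefix from auto_map entries in a
    tokenizer_config.json so remote code resolves in the current repo.
    Illustrative sketch only; back up the file before editing."""
    with open(config_path) as f:
        cfg = json.load(f)
    auto_map = cfg.get("auto_map", {})
    for key, value in auto_map.items():
        # Entries may be a plain string or a list like [tokenizer_class, null].
        if isinstance(value, str):
            auto_map[key] = value.split("--")[-1]
        elif isinstance(value, list):
            auto_map[key] = [v.split("--")[-1] if isinstance(v, str) else v
                             for v in value]
    with open(config_path, "w") as f:
        json.dump(cfg, f, indent=2)
```

Running it on a tokenizer_config.json containing the "Current" entry above should leave the "Fixed" entry in place.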
Sulav changed discussion status to open

There were two sections of the config file to update; did you remove the model name portion?

I see. I thought I had the latest pulled. Thanks!

Sulav changed discussion status to closed
