gaudi/opus-mt-zh-en-ctranslate2 · hf_hub

LukeJacob2023

Sep 1, 2024

When I run the code below, I get a RuntimeError:

from hf_hub_ctranslate2 import TranslatorCT2fromHfHub, GeneratorCT2fromHfHub
from transformers import AutoTokenizer

model_name = "gaudi/opus-mt-zh-en-ctranslate2"
model = TranslatorCT2fromHfHub(
    model_name_or_path=model_name,
    device="cuda",
    compute_type="int8_float16",
    tokenizer=AutoTokenizer.from_pretrained(model_name),
)
outputs = model.generate(
    text=["XXX XX XXX XXXXXXX XXXX?", "XX XX XXXX XX XXX!"],
)
print(outputs)

RuntimeError: Cannot load the target vocabulary from the model directory

gaudi

Owner Oct 19, 2024

Hi there! I apologize for the delayed response.

I was able to reproduce the error. It looks like the "hf_hub_ctranslate2" package does not recognize the shared_vocabulary.json file in the repository. Unfortunately, I'm not sure what I can do on my end to make it recognize the file.

That being said, there is a workaround. If you clone the repository via "git clone https://huggingface.co/gaudi/opus-mt-zh-en-ctranslate2" and change the model_name variable to point to the cloned directory, the code works.
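
For anyone trying this, here is a minimal sketch of the local-clone workaround. It assumes the repository was cloned into ./opus-mt-zh-en-ctranslate2 and that the clone contains model.bin plus one of the vocabulary layouts CTranslate2 converters usually write. The helper names check_local_model_dir and load_local_translator, and the VOCAB_LAYOUTS list, are made up for illustration and are not part of hf_hub_ctranslate2.

```python
from pathlib import Path

# Vocabulary layouts that CTranslate2 converters commonly write.
# Assumption: one complete group is enough for the model to load.
VOCAB_LAYOUTS = [
    ("shared_vocabulary.json",),
    ("shared_vocabulary.txt",),
    ("source_vocabulary.json", "target_vocabulary.json"),
    ("source_vocabulary.txt", "target_vocabulary.txt"),
]


def check_local_model_dir(model_dir):
    """Return a list of problems found in a local model directory (hypothetical helper)."""
    root = Path(model_dir)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems = []
    if not (root / "model.bin").is_file():
        problems.append("model.bin is missing")
    has_vocab = any(
        all((root / name).is_file() for name in layout) for layout in VOCAB_LAYOUTS
    )
    if not has_vocab:
        problems.append("no vocabulary file found")
    return problems


def load_local_translator(model_dir, device="cuda", compute_type="int8_float16"):
    """Load the translator from a local clone instead of a Hub repo id."""
    problems = check_local_model_dir(model_dir)
    if problems:
        raise FileNotFoundError("; ".join(problems))
    # Imported lazily so the directory check works without these packages.
    from hf_hub_ctranslate2 import TranslatorCT2fromHfHub
    from transformers import AutoTokenizer

    return TranslatorCT2fromHfHub(
        model_name_or_path=str(model_dir),
        device=device,
        compute_type=compute_type,
        tokenizer=AutoTokenizer.from_pretrained(str(model_dir)),
    )


# Usage, after cloning the repository next to this script:
# model = load_local_translator("./opus-mt-zh-en-ctranslate2")
# print(model.generate(text=["..."]))
```

The check only looks at file names. If Git LFS is not installed, the clone may contain a small pointer file in place of the real model.bin, which this sketch would not detect.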

I've also recompiled the model.bin file with the latest version of CTranslate2 available at this time (4.4.0). I'll try to update the README.md in all of these repos when I get the chance.

gaudi changed discussion status to closed Oct 19, 2024

gaudi changed discussion status to open Oct 19, 2024

hf_hub_ctranslate2 runtime error