[Help Wanted] facebook/seamless-m4t-v2-large does not support Traditional Chinese (cmn_Hant).
I'm not sure what causes this issue; the error message only says that 'cmn_Hant' is not supported.
To my understanding, the model card indicates that Traditional Chinese is supported.
The source code:
#!/usr/bin/env python
from transformers import AutoProcessor, SeamlessM4Tv2Model
import torchaudio
processor = AutoProcessor.from_pretrained("facebook/seamless-m4t-v2-large")
model = SeamlessM4Tv2Model.from_pretrained("facebook/seamless-m4t-v2-large")
# from text
text_inputs = processor(text="Hello, my dog is cute", src_lang="eng", return_tensors="pt")
audio_array_from_text = model.generate(**text_inputs, tgt_lang="cmn_Hant")[0].cpu().numpy().squeeze()
# from audio
audio, orig_freq = torchaudio.load("https://www2.cs.uic.edu/~i101/SoundFiles/preamble10.wav")
audio = torchaudio.functional.resample(audio, orig_freq=orig_freq, new_freq=16_000) # must be a 16 kHz waveform array
audio_inputs = processor(audios=audio, return_tensors="pt")
audio_array_from_audio = model.generate(**audio_inputs, tgt_lang="rus")[0].cpu().numpy().squeeze()
/home/$(whoami)/miniforge3/envs/seamless-m4t-v2/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:28<00:00, 14.25s/it]
Traceback (most recent call last):
File "/work/$(whoami)/code/seamless-m4t-v2-large/main.py", line 10, in <module>
audio_array_from_text = model.generate(**text_inputs, tgt_lang="cmn_Hant")[0].cpu().numpy().squeeze()
File "/home/$(whoami)/miniforge3/envs/seamless-m4t-v2/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/$(whoami)/miniforge3/envs/seamless-m4t-v2/lib/python3.10/site-packages/transformers/models/seamless_m4t_v2/modeling_seamless_m4t_v2.py", line 4607, in generate
raise ValueError(
ValueError: `tgt_lang=cmn_Hant` is not supported by this model.
Please specify a `tgt_lang` in arb,ben,cat,ces,cmn,cym,dan,deu,eng,est,fin,fra,hin,ind,ita,jpn,kan,kor,mlt,nld,pes,pol,por,ron,rus,slk,spa,swe,swh,tam,tel,tgl,tha,tur,ukr,urd,uzn,vie. Note that SeamlessM4Tv2 supports
more languages for text translation than for speech synthesis.
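A quick sanity check against the list printed in the error message confirms that simplified Chinese (cmn) is in the speech-synthesis target set while cmn_Hant is not. This is a minimal sketch; the language list below is copied verbatim from the ValueError above, not from the model's generation config.

```python
# Speech-synthesis target languages, copied from the error message above.
speech_langs = (
    "arb,ben,cat,ces,cmn,cym,dan,deu,eng,est,fin,fra,hin,ind,ita,jpn,kan,kor,"
    "mlt,nld,pes,pol,por,ron,rus,slk,spa,swe,swh,tam,tel,tgl,tha,tur,ukr,urd,uzn,vie"
).split(",")

print("cmn" in speech_langs)       # simplified Chinese: supported for speech
print("cmn_Hant" in speech_langs)  # Traditional Chinese: not in the speech list
```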
Here are my installed libraries:
certifi==2023.11.17
charset-normalizer==3.3.2
filelock==3.13.1
fsspec==2023.12.1
huggingface-hub==0.19.4
idna==3.6
Jinja2==3.1.2
MarkupSafe==2.1.3
mpmath==1.3.0
networkx==3.2.1
numpy==1.26.2
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.18.1
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu12==12.1.105
packaging==23.2
Pillow==10.1.0
protobuf==4.25.1
PyYAML==6.0.1
regex==2023.10.3
requests==2.31.0
safetensors==0.4.1
sentencepiece==0.1.99
sympy==1.12
tokenizers==0.15.0
torch==2.1.1
torchaudio==2.1.1
torchvision==0.16.1
tqdm==4.66.1
transformers @ git+https://github.com/huggingface/transformers.git@0ea42ef0f9f71deba7775ead33afa0e493823d60
triton==2.1.0
typing_extensions==4.8.0
urllib3==2.1.0
If I've missed any important information, I apologize.
Please feel free to send me a direct message (DM) or comment in this discussion thread.
@reach-vb
@ylacombe
The code below seems to require a language code to be present in all of ["text_decoder_lang_to_code_id", "t2u_lang_code_to_id", "vocoder_lang_code_to_id"]. Is this expected?
Or should a language code only need to be in one of these mappings?
# also accept __xxx__
tgt_lang = tgt_lang.replace("__", "")
for key in ["text_decoder_lang_to_code_id", "t2u_lang_code_to_id", "vocoder_lang_code_to_id"]:
    lang_code_to_id = getattr(self.generation_config, key, None)
    if lang_code_to_id is None:
        raise ValueError(
            f"""This model generation config doesn't have a `{key}` key which maps the target language
            to the right token id. Make sure to load the right generation config."""
        )
    elif tgt_lang not in lang_code_to_id:
        raise ValueError(
            f"""`tgt_lang={tgt_lang}` is not supported by this model.
            Please specify a `tgt_lang` in {','.join(lang_code_to_id.keys())}. Note that SeamlessM4Tv2 supports
            more languages for text translation than for speech synthesis."""
        )
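To illustrate why the check fails for cmn_Hant, here is a minimal, self-contained sketch of the three-table lookup above. The table contents and token ids are hypothetical stand-ins (the real values live in the model's generation config); the assumption, consistent with the error above, is that "cmn_Hant" appears in the text-decoder table but not in the t2u/vocoder tables used for speech synthesis.

```python
# Hypothetical generation config: "cmn_Hant" is known to the text decoder
# but missing from the speech-synthesis (t2u / vocoder) tables.
generation_config = {
    "text_decoder_lang_to_code_id": {"eng": 1, "cmn": 2, "cmn_Hant": 3},
    "t2u_lang_code_to_id": {"eng": 10, "cmn": 11},
    "vocoder_lang_code_to_id": {"eng": 20, "cmn": 21},
}

def check_tgt_lang(tgt_lang: str) -> None:
    """Mimic the loop above: the target language must be in ALL three tables."""
    tgt_lang = tgt_lang.replace("__", "")  # also accept __xxx__
    for key in ["text_decoder_lang_to_code_id", "t2u_lang_code_to_id", "vocoder_lang_code_to_id"]:
        if tgt_lang not in generation_config[key]:
            raise ValueError(f"`tgt_lang={tgt_lang}` is not supported by this model (missing from `{key}`).")

check_tgt_lang("cmn")  # passes: present in all three tables
try:
    check_tgt_lang("cmn_Hant")
except ValueError as e:
    print(e)  # fails at the first speech table it is missing from
```

So yes, for the composite speech-generation model the code requires membership in all three mappings, which is why a text-only language such as cmn_Hant is rejected even though the text decoder knows it.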
Hey
@ZoneTwelve
, thanks for your message here!
I believe that, contrary to what the model card indicates, Traditional Chinese (cmn_Hant) is not supported for speech generation, as you can observe here in the original code!
If you only want to do text translation (i.e. without generating speech), it should work, though!
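For text-only translation, a sketch along these lines should work, using the dedicated `SeamlessM4Tv2ForTextToText` class instead of the composite `SeamlessM4Tv2Model` (untested here; this assumes cmn_Hant is present in the text decoder's language table, per the model card):

```python
from transformers import AutoProcessor, SeamlessM4Tv2ForTextToText

processor = AutoProcessor.from_pretrained("facebook/seamless-m4t-v2-large")
# Text-to-text model: skips the t2u / vocoder stages, so the speech-language
# restriction above does not apply.
model = SeamlessM4Tv2ForTextToText.from_pretrained("facebook/seamless-m4t-v2-large")

text_inputs = processor(text="Hello, my dog is cute", src_lang="eng", return_tensors="pt")
output_tokens = model.generate(**text_inputs, tgt_lang="cmn_Hant")
translated = processor.decode(output_tokens[0].tolist(), skip_special_tokens=True)
print(translated)
```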
Loading checkpoint shards: 50%|█████ | 1/2 [07:14<07:14, 434.07s/it]
Why is it running so slowly? All the models required by this code have already been downloaded locally. Has anyone else encountered this problem?
@ZoneTwelve, have you solved the problem? I'm running into the same issue while trying to translate English to Traditional Chinese.