[Help Wanted] facebook/seamless-m4t-v2-large does not support Traditional Chinese (cmn_Hant).

#17
by ZoneTwelve - opened

I'm not sure what caused this issue, but the error message just says that it doesn't support 'cmn_Hant'.
To my understanding, the model card indicates that it supports Traditional Chinese.

The source code:

#!/usr/bin/env python
from transformers import AutoProcessor, SeamlessM4Tv2Model
import torchaudio

processor = AutoProcessor.from_pretrained("facebook/seamless-m4t-v2-large")
model = SeamlessM4Tv2Model.from_pretrained("facebook/seamless-m4t-v2-large")

# from text
text_inputs = processor(text="Hello, my dog is cute", src_lang="eng", return_tensors="pt")
audio_array_from_text = model.generate(**text_inputs, tgt_lang="cmn_Hant")[0].cpu().numpy().squeeze()

# from audio
audio, orig_freq = torchaudio.load("https://www2.cs.uic.edu/~i101/SoundFiles/preamble10.wav")
audio = torchaudio.functional.resample(audio, orig_freq=orig_freq, new_freq=16_000)  # must be a 16 kHz waveform array
audio_inputs = processor(audios=audio, return_tensors="pt")
audio_array_from_audio = model.generate(**audio_inputs, tgt_lang="rus")[0].cpu().numpy().squeeze()
Running it produces the following output:

/home/$(whoami)/miniforge3/envs/seamless-m4t-v2/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:28<00:00, 14.25s/it]
Traceback (most recent call last):
  File "/work/$(whoami)/code/seamless-m4t-v2-large/main.py", line 10, in <module>
    audio_array_from_text = model.generate(**text_inputs, tgt_lang="cmn_Hant")[0].cpu().numpy().squeeze()
  File "/home/$(whoami)/miniforge3/envs/seamless-m4t-v2/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/$(whoami)/miniforge3/envs/seamless-m4t-v2/lib/python3.10/site-packages/transformers/models/seamless_m4t_v2/modeling_seamless_m4t_v2.py", line 4607, in generate
    raise ValueError(
ValueError: `tgt_lang=cmn_Hant` is not supported by this model.
                    Please specify a `tgt_lang` in arb,ben,cat,ces,cmn,cym,dan,deu,eng,est,fin,fra,hin,ind,ita,jpn,kan,kor,mlt,nld,pes,pol,por,ron,rus,slk,spa,swe,swh,tam,tel,tgl,tha,tur,ukr,urd,uzn,vie. Note that SeamlessM4Tv2 supports
                    more languages for text translation than for speech synthesis.
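
Since `cmn` does appear in the supported list above, a hedged workaround for getting Mandarin speech at all is to target the `cmn` code; the Hant/Hans distinction only affects the written form, not the synthesized audio. A minimal sketch, reusing `model` and `text_inputs` from the script above:

import torch
import torchaudio

# Workaround sketch: "cmn" appears in the error's supported list, and spoken
# Mandarin does not depend on the Traditional/Simplified script choice.
audio_array = model.generate(**text_inputs, tgt_lang="cmn")[0].cpu().numpy().squeeze()

# Save the waveform; the sampling rate is taken from the model config (16 kHz).
torchaudio.save(
    "cmn_speech.wav",
    torch.from_numpy(audio_array).unsqueeze(0),
    sample_rate=model.config.sampling_rate,
)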

Here are my installed libraries:

certifi==2023.11.17
charset-normalizer==3.3.2
filelock==3.13.1
fsspec==2023.12.1
huggingface-hub==0.19.4
idna==3.6
Jinja2==3.1.2
MarkupSafe==2.1.3
mpmath==1.3.0
networkx==3.2.1
numpy==1.26.2
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.18.1
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu12==12.1.105
packaging==23.2
Pillow==10.1.0
protobuf==4.25.1
PyYAML==6.0.1
regex==2023.10.3
requests==2.31.0
safetensors==0.4.1
sentencepiece==0.1.99
sympy==1.12
tokenizers==0.15.0
torch==2.1.1
torchaudio==2.1.1
torchvision==0.16.1
tqdm==4.66.1
transformers @ git+https://github.com/huggingface/transformers.git@0ea42ef0f9f71deba7775ead33afa0e493823d60
triton==2.1.0
typing_extensions==4.8.0
urllib3==2.1.0

If I have missed any important information, I apologize.
Please feel free to send me a direct message (DM) or comment on this discussion thread.

@reach-vb @ylacombe The code below seems to require a language code to be in all of ["text_decoder_lang_to_code_id", "t2u_lang_code_to_id", "vocoder_lang_code_to_id"]. Is this expected?
Or should we only require the language code to be in one of ["text_decoder_lang_to_code_id", "t2u_lang_code_to_id", "vocoder_lang_code_to_id"]?

            # also accept __xxx__
            tgt_lang = tgt_lang.replace("__", "")
            for key in ["text_decoder_lang_to_code_id", "t2u_lang_code_to_id", "vocoder_lang_code_to_id"]:
                lang_code_to_id = getattr(self.generation_config, key, None)
                if lang_code_to_id is None:
                    raise ValueError(
                        f"""This model generation config doesn't have a `{key}` key which maps the target language
                        to the right token id. Make sure to load the right generation config."""
                    )
                elif tgt_lang not in lang_code_to_id:
                    raise ValueError(
                        f"""`tgt_lang={tgt_lang}` is not supported by this model.
                    Please specify a `tgt_lang` in {','.join(lang_code_to_id.keys())}. Note that SeamlessM4Tv2 supports
                    more languages for text translation than for speech synthesis."""
                    )
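
A quick way to check where `cmn_Hant` falls out is to inspect the three maps on the loaded generation config directly. This is a diagnostic sketch only, assuming the attribute names from the snippet above are present, not part of the library's public API:

from transformers import SeamlessM4Tv2Model

model = SeamlessM4Tv2Model.from_pretrained("facebook/seamless-m4t-v2-large")

# Attribute names taken from the snippet above; each maps language codes to token ids.
for key in ["text_decoder_lang_to_code_id", "t2u_lang_code_to_id", "vocoder_lang_code_to_id"]:
    lang_code_to_id = getattr(model.generation_config, key, None) or {}
    print(f"{key}: {len(lang_code_to_id)} languages, has cmn_Hant: {'cmn_Hant' in lang_code_to_id}")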

Hey @ZoneTwelve, thanks for your message here!
I believe that, contrary to what the model card indicates, Traditional Chinese (cmn_Hant) is not supported for speech generation, as you can observe here in the original code!

If you only want to do text translation (i.e. without generating speech), it should work though!
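
For example, a minimal sketch using the dedicated text-to-text class, which involves no t2u model or vocoder, so `cmn_Hant` should be accepted:

from transformers import AutoProcessor, SeamlessM4Tv2ForTextToText

processor = AutoProcessor.from_pretrained("facebook/seamless-m4t-v2-large")
model = SeamlessM4Tv2ForTextToText.from_pretrained("facebook/seamless-m4t-v2-large")

# Text-to-text only: the speech-language restriction above does not apply here.
text_inputs = processor(text="Hello, my dog is cute", src_lang="eng", return_tensors="pt")
output_tokens = model.generate(**text_inputs, tgt_lang="cmn_Hant")
print(processor.decode(output_tokens[0].tolist(), skip_special_tokens=True))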

@ydshieh, indeed, it has to be in all three dictionaries!

However, that shouldn't be required when we only want to predict text. Thanks for pointing that out, I'll open a PR to fix this!

Loading checkpoint shards:  50%|█████     | 1/2 [07:14<07:14, 434.07s/it]

Why is it running so slowly? All the models required for this code have already been downloaded locally. Have you ever encountered such a problem?
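
For what it's worth, the first load has to deserialize several gigabytes of shards; if the bottleneck is deserialization rather than disk speed, a common sketch that usually cuts load time, assuming a CUDA GPU with enough memory, uses two standard `from_pretrained` options (not specific to this thread's setup; `low_cpu_mem_usage` needs the `accelerate` package installed):

import torch
from transformers import SeamlessM4Tv2Model

# Sketch only: half-precision weights and low-CPU-memory loading are generic
# transformers options that typically reduce load time and RAM pressure.
model = SeamlessM4Tv2Model.from_pretrained(
    "facebook/seamless-m4t-v2-large",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
).to("cuda")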

@ZoneTwelve have you solved the problem? Because I'm running into the same issue trying to translate English to Traditional Chinese.
