[Help Wanted] facebook/seamless-m4t-v2-large does not support Traditional Chinese (cmn_Hant).
I'm not sure what causes this issue; the error message only says that 'cmn_Hant' is not supported.
To my understanding, the model card indicates that Traditional Chinese is supported.
The source code:
#!/usr/bin/env python
from transformers import AutoProcessor, SeamlessM4Tv2Model
import torchaudio
processor = AutoProcessor.from_pretrained("facebook/seamless-m4t-v2-large")
model = SeamlessM4Tv2Model.from_pretrained("facebook/seamless-m4t-v2-large")
# from text
text_inputs = processor(text="Hello, my dog is cute", src_lang="eng", return_tensors="pt")
audio_array_from_text = model.generate(**text_inputs, tgt_lang="cmn_Hant")[0].cpu().numpy().squeeze()
# from audio
audio, orig_freq = torchaudio.load("https://www2.cs.uic.edu/~i101/SoundFiles/preamble10.wav")
audio = torchaudio.functional.resample(audio, orig_freq=orig_freq, new_freq=16_000) # must be a 16 kHz waveform array
audio_inputs = processor(audios=audio, return_tensors="pt")
audio_array_from_audio = model.generate(**audio_inputs, tgt_lang="rus")[0].cpu().numpy().squeeze()
/home/$(whoami)/miniforge3/envs/seamless-m4t-v2/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:28<00:00, 14.25s/it]
Traceback (most recent call last):
File "/work/$(whoami)/code/seamless-m4t-v2-large/main.py", line 10, in <module>
audio_array_from_text = model.generate(**text_inputs, tgt_lang="cmn_Hant")[0].cpu().numpy().squeeze()
File "/home/$(whoami)/miniforge3/envs/seamless-m4t-v2/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/$(whoami)/miniforge3/envs/seamless-m4t-v2/lib/python3.10/site-packages/transformers/models/seamless_m4t_v2/modeling_seamless_m4t_v2.py", line 4607, in generate
raise ValueError(
ValueError: `tgt_lang=cmn_Hant` is not supported by this model.
Please specify a `tgt_lang` in arb,ben,cat,ces,cmn,cym,dan,deu,eng,est,fin,fra,hin,ind,ita,jpn,kan,kor,mlt,nld,pes,pol,por,ron,rus,slk,spa,swe,swh,tam,tel,tgl,tha,tur,ukr,urd,uzn,vie. Note that SeamlessM4Tv2 supports
more languages for text translation than for speech synthesis.
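A quick sanity check against the list printed in the error message confirms that simplified Chinese (cmn) is in the speech-synthesis target set while cmn_Hant is not. This is a minimal sketch; the language list below is copied verbatim from the ValueError above, not from the model's generation config.

```python
# Speech-synthesis target languages, copied from the error message above.
speech_langs = (
    "arb,ben,cat,ces,cmn,cym,dan,deu,eng,est,fin,fra,hin,ind,ita,jpn,kan,kor,"
    "mlt,nld,pes,pol,por,ron,rus,slk,spa,swe,swh,tam,tel,tgl,tha,tur,ukr,urd,uzn,vie"
).split(",")

print("cmn" in speech_langs)       # simplified Chinese: supported for speech
print("cmn_Hant" in speech_langs)  # Traditional Chinese: not in the speech list
```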
Here are my installed libraries:
certifi==2023.11.17
charset-normalizer==3.3.2
filelock==3.13.1
fsspec==2023.12.1
huggingface-hub==0.19.4
idna==3.6
Jinja2==3.1.2
MarkupSafe==2.1.3
mpmath==1.3.0
networkx==3.2.1
numpy==1.26.2
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.18.1
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu12==12.1.105
packaging==23.2
Pillow==10.1.0
protobuf==4.25.1
PyYAML==6.0.1
regex==2023.10.3
requests==2.31.0
safetensors==0.4.1
sentencepiece==0.1.99
sympy==1.12
tokenizers==0.15.0
torch==2.1.1
torchaudio==2.1.1
torchvision==0.16.1
tqdm==4.66.1
transformers @ git+https://github.com/huggingface/transformers.git@0ea42ef0f9f71deba7775ead33afa0e493823d60
triton==2.1.0
typing_extensions==4.8.0
urllib3==2.1.0
If I've missed any important information, I apologize.
Please feel free to send me a direct message (DM) or comment in this discussion thread.
@reach-vb
@ylacombe
The code below seems to require a language code to be present in all of ["text_decoder_lang_to_code_id", "t2u_lang_code_to_id", "vocoder_lang_code_to_id"]. Is this expected?
Or should a language code only need to be in one of these mappings?
# also accept __xxx__
tgt_lang = tgt_lang.replace("__", "")
for key in ["text_decoder_lang_to_code_id", "t2u_lang_code_to_id", "vocoder_lang_code_to_id"]:
    lang_code_to_id = getattr(self.generation_config, key, None)
    if lang_code_to_id is None:
        raise ValueError(
            f"""This model generation config doesn't have a `{key}` key which maps the target language
            to the right token id. Make sure to load the right generation config."""
        )
    elif tgt_lang not in lang_code_to_id:
        raise ValueError(
            f"""`tgt_lang={tgt_lang}` is not supported by this model.
            Please specify a `tgt_lang` in {','.join(lang_code_to_id.keys())}. Note that SeamlessM4Tv2 supports
            more languages for text translation than for speech synthesis."""
        )
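To illustrate why the check fails for cmn_Hant, here is a minimal, self-contained sketch of the three-table lookup above. The table contents and token ids are hypothetical stand-ins (the real values live in the model's generation config); the assumption, consistent with the error above, is that "cmn_Hant" appears in the text-decoder table but not in the t2u/vocoder tables used for speech synthesis.

```python
# Hypothetical generation config: "cmn_Hant" is known to the text decoder
# but missing from the speech-synthesis (t2u / vocoder) tables.
generation_config = {
    "text_decoder_lang_to_code_id": {"eng": 1, "cmn": 2, "cmn_Hant": 3},
    "t2u_lang_code_to_id": {"eng": 10, "cmn": 11},
    "vocoder_lang_code_to_id": {"eng": 20, "cmn": 21},
}

def check_tgt_lang(tgt_lang: str) -> None:
    """Mimic the loop above: the target language must be in ALL three tables."""
    tgt_lang = tgt_lang.replace("__", "")  # also accept __xxx__
    for key in ["text_decoder_lang_to_code_id", "t2u_lang_code_to_id", "vocoder_lang_code_to_id"]:
        if tgt_lang not in generation_config[key]:
            raise ValueError(f"`tgt_lang={tgt_lang}` is not supported by this model (missing from `{key}`).")

check_tgt_lang("cmn")  # passes: present in all three tables
try:
    check_tgt_lang("cmn_Hant")
except ValueError as e:
    print(e)  # fails at the first speech table it is missing from
```

So yes, for the composite speech-generation model the code requires membership in all three mappings, which is why a text-only language such as cmn_Hant is rejected even though the text decoder knows it.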
Hey
@ZoneTwelve
, thanks for your message here!
I believe that, contrary to what the model card indicates, Traditional Chinese (cmn_Hant) is not supported for speech generation, as you can observe here in the original code!
If you only want to do text translation (i.e. without generating speech), it should work, though!
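For text-only translation, a sketch along these lines should work, using the dedicated `SeamlessM4Tv2ForTextToText` class instead of the composite `SeamlessM4Tv2Model` (untested here; this assumes cmn_Hant is present in the text decoder's language table, per the model card):

```python
from transformers import AutoProcessor, SeamlessM4Tv2ForTextToText

processor = AutoProcessor.from_pretrained("facebook/seamless-m4t-v2-large")
# Text-to-text model: skips the t2u / vocoder stages, so the speech-language
# restriction above does not apply.
model = SeamlessM4Tv2ForTextToText.from_pretrained("facebook/seamless-m4t-v2-large")

text_inputs = processor(text="Hello, my dog is cute", src_lang="eng", return_tensors="pt")
output_tokens = model.generate(**text_inputs, tgt_lang="cmn_Hant")
translated = processor.decode(output_tokens[0].tolist(), skip_special_tokens=True)
print(translated)
```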
Loading checkpoint shards: 50%|█████ | 1/2 [07:14<07:14, 434.07s/it]
Why is it running so slowly? All the models required by this code have already been downloaded locally. Has anyone else encountered this problem?
@ZoneTwelve, have you solved the problem? I'm running into the same issue while trying to translate English to Traditional Chinese.