facebook/seamless-m4t-v2-large · Translation of longer texts

Jan 10

It seems that only very short texts, around one sentence, are supported. Am I missing something? If not, how would you use it to translate a few pages of text?

shuaiby88

Jan 14

Did you ever figure this out?

peterBagnegaard

Aug 7

I also have this problem. It seems to break down at around 30s of audio.

peterBagnegaard

Aug 7

edited Aug 7

I've made a hack to work around this. 30 s of audio corresponds very roughly to 750 characters, so I split the text into chunks of at most 700 characters (leaving some margin), convert each chunk, and concatenate the results. This gives reasonable results, although the code is straight-up ugly:

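import numpy as np
from transformers import AutoProcessor, SeamlessM4Tv2Model

# Assumed setup (not shown in the original post): load the model from this repo.
processor = AutoProcessor.from_pretrained("facebook/seamless-m4t-v2-large")
model = SeamlessM4Tv2Model.from_pretrained("facebook/seamless-m4t-v2-large")

def split_text(text, max_characters=700):
    """Split text into chunks of whole sentences, each roughly max_characters long."""
    sub_texts = []
    sub_text = ""
    for sentence in text.split('.'):
        sentence = sentence.strip()
        if not sentence:
            continue  # skip empty pieces, e.g. after the final period
        # Start a new chunk if adding this sentence would exceed the limit.
        if sub_text and len(sub_text) + len(sentence) > max_characters:
            sub_texts.append(sub_text.strip())
            sub_text = ""
        sub_text += sentence + ". "  # restore the period removed by split()

    if sub_text:
        sub_texts.append(sub_text.strip())
    return sub_texts

def get_audio_array_from_text(text):
    """Synthesize each chunk separately and concatenate the waveforms."""
    res = np.array([], dtype=np.float32)

    for sub_text in split_text(text):
        text_inputs = processor(text=sub_text, src_lang="dan", return_tensors="pt")
        # generate() returns the waveform first; convert it to a 1-D NumPy array.
        audio_array_from_text = model.generate(**text_inputs, tgt_lang="dan")[0].cpu().numpy().squeeze()
        res = np.concatenate((res, audio_array_from_text))

    return res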
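
The same chunking idea might also help with the original question about translating a few pages of text. What follows is only a rough sketch under assumptions, not a tested recipe: `split_into_chunks`, `translate_long_text` and `make_seamless_translator` are hypothetical names, the 700-character limit is simply carried over from the audio hack above, and whether text-to-text output degrades at the same length is not confirmed. The per-chunk translation is passed in as a function so the splitting logic can be tried without loading the model.

```python
import re
from typing import Callable, List


def split_into_chunks(text: str, max_characters: int = 700) -> List[str]:
    """Group whole sentences into chunks of at most max_characters (best effort)."""
    # Split after '.', '!' or '?' followed by whitespace; punctuation stays attached.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_characters:
            # The current chunk is full: store it and start a new one.
            chunks.append(current)
            current = sentence
        else:
            # A single sentence longer than the limit is kept whole.
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def translate_long_text(
    text: str,
    translate_chunk: Callable[[str], str],
    max_characters: int = 700,
) -> str:
    """Translate chunk by chunk and join the results with spaces."""
    chunks = split_into_chunks(text, max_characters)
    return " ".join(translate_chunk(chunk) for chunk in chunks)


def make_seamless_translator(src_lang: str, tgt_lang: str) -> Callable[[str], str]:
    """Hypothetical helper: build a per-chunk text-to-text translator.

    Assumes the transformers library is installed and the model fits in memory.
    """
    from transformers import AutoProcessor, SeamlessM4Tv2Model  # imported lazily

    name = "facebook/seamless-m4t-v2-large"
    processor = AutoProcessor.from_pretrained(name)
    model = SeamlessM4Tv2Model.from_pretrained(name)

    def translate_chunk(chunk: str) -> str:
        inputs = processor(text=chunk, src_lang=src_lang, return_tensors="pt")
        # generate_speech=False asks for text tokens instead of a waveform.
        tokens = model.generate(**inputs, tgt_lang=tgt_lang, generate_speech=False)
        return processor.decode(tokens[0].tolist()[0], skip_special_tokens=True)

    return translate_chunk
```

Usage would look something like `translate_long_text(long_text, make_seamless_translator("dan", "eng"))`. Because each chunk is translated in isolation, context that spans a chunk boundary (pronouns, terminology) may be handled inconsistently, so larger chunks are probably preferable as long as the output stays intact.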