metadata
language:
- ru
- zh
- en
tags:
- translation
- text2text-generation
- t5
license: apache-2.0
datasets:
- ccmatrix
metrics:
- sacrebleu
widget:
- example_title: translate zh-ru
  text: |
    translate to ru: 开发的目的是为用户提供个人同步翻译。
- example_title: translate ru-en
  text: >
    translate to en: Цель разработки — предоставить пользователям личного
    синхронного переводчика.
- example_title: translate en-ru
  text: >
    translate to ru: The purpose of the development is to provide users with a
    personal synchronized interpreter.
- example_title: translate en-zh
  text: >
    translate to zh: The purpose of the development is to provide users with a
    personal synchronized interpreter.
- example_title: translate zh-en
  text: |
    translate to en: 开发的目的是为用户提供个人同步解释器。
- example_title: translate ru-zh
  text: >
    translate to zh: Цель разработки — предоставить пользователям личного
    синхронного переводчика.
model-index:
- name: utrobinmv/t5_translate_en_ru_zh_base_200
  results:
  - task:
      type: translation
      name: Translation en-ru
    dataset:
      name: ntrex_en-ru
      type: ntrex
      config: ntrex en-ru
      split: test
    metrics:
    - type: sacrebleu
      value: 28.575940911021487
      name: bleu
      verified: false
    - type: chrf
      value: 54.27996346886896
      name: chrf
      verified: false
    - type: ter
      value: 62.494863914873584
      name: ter
      verified: false
    - type: meteor
      value: 0.5174833677740809
      name: meteor
      verified: false
    - type: rouge
      value: 0.1908317951570274
      name: ROUGE-1
      verified: false
    - type: rouge
      value: 0.065555552204933
      name: ROUGE-2
      verified: false
    - type: rouge
      value: 0.1895542893295215
      name: ROUGE-L
      verified: false
    - type: rouge
      value: 0.1893813749889601
      name: ROUGE-LSUM
      verified: false
    - type: bertscore
      value: 0.8554933660030365
      name: bertscore_f1
      verified: false
    - type: bertscore
      value: 0.8578473615646363
      name: bertscore_precision
      verified: false
    - type: bertscore
      value: 0.8534188346862793
      name: bertscore_recall
      verified: false
    source:
      name: NTREX dataset Benchmark
      url: https://huggingface.co/spaces/utrobinmv/TREX_benchmark_en_ru_zh
- name: utrobinmv/t5_translate_en_ru_zh_base_200
  results:
  - task:
      type: translation
      name: Translation ru-en
    dataset:
      name: ntrex_ru-en
      type: ntrex
      config: ntrex ru-en
      split: test
    metrics:
    - type: sacrebleu
      value: 28.575940911021487
      name: bleu
      verified: false
    - type: chrf
      value: 54.27996346886896
      name: chrf
      verified: false
    - type: ter
      value: 62.494863914873584
      name: ter
      verified: false
    - type: meteor
      value: 0.5174833677740809
      name: meteor
      verified: false
    - type: rouge
      value: 0.1908317951570274
      name: ROUGE-1
      verified: false
    - type: rouge
      value: 0.065555552204933
      name: ROUGE-2
      verified: false
    - type: rouge
      value: 0.1895542893295215
      name: ROUGE-L
      verified: false
    - type: rouge
      value: 0.1893813749889601
      name: ROUGE-LSUM
      verified: false
    - type: bertscore
      value: 0.8554933660030365
      name: bertscore_f1
      verified: false
    - type: bertscore
      value: 0.8578473615646363
      name: bertscore_precision
      verified: false
    - type: bertscore
      value: 0.8534188346862793
      name: bertscore_recall
      verified: false
    source:
      name: NTREX dataset Benchmark
      url: https://huggingface.co/spaces/utrobinmv/TREX_benchmark_en_ru_zh
T5 English, Russian and Chinese multilingual machine translation
This model is a conventional T5 transformer running in multitask mode, where the task is translation into a requested target language. It is fine-tuned for machine translation on the pairs ru-zh, zh-ru, en-zh, zh-en, en-ru and ru-en.
The model translates directly between any pair of Russian, Chinese and English. To select the target language, prepend its identifier to the input as the prefix 'translate to <lang>: ', where <lang> is ru, zh or en. The source language does not need to be specified, and the source text may mix several languages.
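The prefix convention described above can be sketched as a small helper. This is a minimal illustration, not part of the model's API: the names `build_prompt` and `SUPPORTED_TARGETS` are hypothetical, and the set of accepted identifiers is taken from the language pairs listed in this card.

```python
# Hypothetical helper: builds the "translate to <lang>: " input expected by the model.
# SUPPORTED_TARGETS is an assumption based on the language pairs named in this card.
SUPPORTED_TARGETS = ("ru", "zh", "en")


def build_prompt(text: str, target_lang: str) -> str:
    """Prepend the target-language prefix to the source text."""
    if target_lang not in SUPPORTED_TARGETS:
        raise ValueError(
            f"unsupported target language {target_lang!r}; "
            f"expected one of {SUPPORTED_TARGETS}"
        )
    # The source language is not encoded anywhere; only the target is named.
    return f"translate to {target_lang}: {text}"


print(build_prompt("The weather is nice today.", "zh"))
# translate to zh: The weather is nice today.
```

Rejecting unknown identifiers early is a design choice of this sketch; how the model itself behaves for a prefix it was not trained on is not covered here.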
Example: translating Russian to Chinese
from transformers import T5ForConditionalGeneration, T5Tokenizer
model_name = 'utrobinmv/t5_translate_en_ru_zh_base_200'
model = T5ForConditionalGeneration.from_pretrained(model_name)
tokenizer = T5Tokenizer.from_pretrained(model_name)
prefix = 'translate to zh: '
src_text = prefix + "Цель разработки — предоставить пользователям личного синхронного переводчика."
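# tokenize the prefixed text; the result holds input_ids and attention_mask
inputs = tokenizer(src_text, return_tensors="pt")
# translate Russian to Chinese
generated_tokens = model.generate(**inputs)
# batch_decode returns a list with one string per input
result = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
print(result[0])
# 开发的目的是为用户提供个人同步翻译。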
Example: translating Chinese to Russian
from transformers import T5ForConditionalGeneration, T5Tokenizer
model_name = 'utrobinmv/t5_translate_en_ru_zh_base_200'
model = T5ForConditionalGeneration.from_pretrained(model_name)
tokenizer = T5Tokenizer.from_pretrained(model_name)
prefix = 'translate to ru: '
src_text = prefix + "开发的目的是为用户提供个人同步翻译。"
# tokenize the prefixed text; the result holds input_ids and attention_mask
inputs = tokenizer(src_text, return_tensors="pt")
# translate Chinese to Russian
generated_tokens = model.generate(**inputs)
# batch_decode returns a list with one string per input
result = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
print(result[0])
# Цель разработки - предоставить пользователям персональный синхронный перевод.
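Because the translation direction is carried by the prefix of each input, several requests with different targets can in principle share one `generate` call. The following is a sketch under stated assumptions, not a tested recipe for this model: `make_batch` and `translate_batch` are hypothetical names, `max_new_tokens=128` is an arbitrary illustrative limit, and padding is enabled so that inputs of different lengths fit into one tensor.

```python
from typing import List, Tuple


def make_batch(requests: List[Tuple[str, str]]) -> List[str]:
    """Turn (target_lang, text) pairs into prefixed model inputs."""
    return [f"translate to {lang}: {text}" for lang, text in requests]


def translate_batch(
    requests: List[Tuple[str, str]],
    model_name: str,
    max_new_tokens: int = 128,  # assumed limit, chosen for illustration only
) -> List[str]:
    """Translate all requests in one padded batch and return one string per request."""
    # Imported lazily so the prompt helper above works without transformers installed.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    model = T5ForConditionalGeneration.from_pretrained(model_name)
    tokenizer = T5Tokenizer.from_pretrained(model_name)

    # padding=True pads shorter inputs so the batch forms a rectangular tensor.
    inputs = tokenizer(make_batch(requests), return_tensors="pt", padding=True)
    generated = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

A call would look like `translate_batch([("zh", "Hello, world."), ("en", "Привет, мир.")], model_name)`, with `model_name` set to the same checkpoint as in the examples above. Loading the model inside the function keeps the sketch short; for repeated use it would make more sense to load it once and pass it in.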
Languages covered
Russian (ru_RU), Chinese (zh_CN), English (en_US)