---
language:
- ru
- zh
- en
tags:
- translation
- text2text-generation
- t5
license: apache-2.0
datasets:
- ccmatrix
metrics:
- sacrebleu
widget:
- example_title: translate zh-ru
  text: >
    translate to ru: 开发的目的是为用户提供个人同步翻译。
- example_title: translate ru-en
  text: >
    translate to en: Цель разработки — предоставить пользователям личного синхронного переводчика.
- example_title: translate en-ru
  text: >
    translate to ru: The purpose of the development is to provide users with a personal synchronized interpreter.
- example_title: translate en-zh
  text: >
    translate to zh: The purpose of the development is to provide users with a personal synchronized interpreter.
- example_title: translate zh-en
  text: >
    translate to en: 开发的目的是为用户提供个人同步解释器。
- example_title: translate ru-zh
  text: >
    translate to zh: Цель разработки — предоставить пользователям личного синхронного переводчика.
model-index:
- name: utrobinmv/t5_translate_en_ru_zh_base_200
  results:
  - task:
      type: translation
      name: Translation en-ru
    dataset:
      name: ntrex_en-ru
      type: ntrex
      config: ntrex en-ru
      split: test
    metrics:
    - type: sacrebleu
      value: 28.575940911021487
      name: bleu
      verified: false
    - type: chrf
      value: 54.27996346886896
      name: chrf
      verified: false
    - type: ter
      value: 62.494863914873584
      name: ter
      verified: false
    - type: meteor
      value: 0.5174833677740809
      name: meteor
      verified: false
    - type: rouge
      value: 0.1908317951570274
      name: ROUGE-1
      verified: false
    - type: rouge
      value: 0.065555552204933
      name: ROUGE-2
      verified: false
    - type: rouge
      value: 0.1895542893295215
      name: ROUGE-L
      verified: false
    - type: rouge
      value: 0.1893813749889601
      name: ROUGE-LSUM
      verified: false
    - type: bertscore
      value: 0.8554933660030365
      name: bertscore_f1
      verified: false
    - type: bertscore
      value: 0.8578473615646363
      name: bertscore_precision
      verified: false
    - type: bertscore
      value: 0.8534188346862793
      name: bertscore_recall
      verified: false
    source:
      name: NTREX dataset Benchmark
      url: https://huggingface.co/spaces/utrobinmv/TREX_benchmark_en_ru_zh
- name: utrobinmv/t5_translate_en_ru_zh_base_200
  results:
  - task:
      type: translation
      name: Translation ru-en
    dataset:
      name: ntrex_ru-en
      type: ntrex
      config: ntrex ru-en
      split: test
    metrics:
    - type: sacrebleu
      value: 28.575940911021487
      name: bleu
      verified: false
    - type: chrf
      value: 54.27996346886896
      name: chrf
      verified: false
    - type: ter
      value: 62.494863914873584
      name: ter
      verified: false
    - type: meteor
      value: 0.5174833677740809
      name: meteor
      verified: false
    - type: rouge
      value: 0.1908317951570274
      name: ROUGE-1
      verified: false
    - type: rouge
      value: 0.065555552204933
      name: ROUGE-2
      verified: false
    - type: rouge
      value: 0.1895542893295215
      name: ROUGE-L
      verified: false
    - type: rouge
      value: 0.1893813749889601
      name: ROUGE-LSUM
      verified: false
    - type: bertscore
      value: 0.8554933660030365
      name: bertscore_f1
      verified: false
    - type: bertscore
      value: 0.8578473615646363
      name: bertscore_precision
      verified: false
    - type: bertscore
      value: 0.8534188346862793
      name: bertscore_recall
      verified: false
    source:
      name: NTREX dataset Benchmark
      url: https://huggingface.co/spaces/utrobinmv/TREX_benchmark_en_ru_zh
---
# T5 English, Russian and Chinese multilingual machine translation
This model is a conventional T5 transformer used in multitask mode for translation into a requested language, configured specifically for machine translation between the pairs ru-zh, zh-ru, en-zh, zh-en, en-ru and ru-en.

The model translates directly between any pair of Russian, Chinese and English. To translate into a target language, prefix the input with the target language identifier in the form 'translate to <lang>:'. The source language does not need to be specified, and the source text may even be multilingual.
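
A minimal sketch of building that prefix, assuming only the three target codes named on this card (`ru`, `zh`, `en`). The `build_input` helper and its validation are hypothetical illustrations, not part of the model's API; the model itself only ever sees the resulting string.

```python
# Hypothetical helper: the codes 'ru', 'zh', 'en' come from this card;
# the function name and the validation step are illustrative assumptions.
SUPPORTED_TARGETS = ("ru", "zh", "en")


def build_input(text: str, target_lang: str) -> str:
    """Prepend the 'translate to <lang>: ' prefix the model expects."""
    if target_lang not in SUPPORTED_TARGETS:
        raise ValueError(f"unsupported target language: {target_lang!r}")
    # Only the target language goes into the prefix; the source is never named.
    return f"translate to {target_lang}: {text}"


# A mixed-language source string is passed through unchanged.
print(build_input("Hello, мир!", "zh"))
# translate to zh: Hello, мир!
```

Because the source language is never encoded, the same helper serves all six translation directions.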
Example: translate Russian to Chinese
```python
from transformers import T5ForConditionalGeneration, T5Tokenizer
model_name = 'utrobinmv/t5_translate_en_ru_zh_small_1024'
model = T5ForConditionalGeneration.from_pretrained(model_name)
tokenizer = T5Tokenizer.from_pretrained(model_name)
prefix = 'translate to zh: '
src_text = prefix + "Цель разработки — предоставить пользователям личного синхронного переводчика."
# translate Russian to Chinese
# tokenizer() returns a dict-like batch holding input_ids and attention_mask
inputs = tokenizer(src_text, return_tensors="pt")
generated_tokens = model.generate(**inputs)
result = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
print(result)
#开发的目的是为用户提供个人同步翻译。
```
Example: translate Chinese to Russian
```python
from transformers import T5ForConditionalGeneration, T5Tokenizer
model_name = 'utrobinmv/t5_translate_en_ru_zh_small_1024'
model = T5ForConditionalGeneration.from_pretrained(model_name)
tokenizer = T5Tokenizer.from_pretrained(model_name)
prefix = 'translate to ru: '
src_text = prefix + "开发的目的是为用户提供个人同步翻译。"
# translate Chinese to Russian
inputs = tokenizer(src_text, return_tensors="pt")
generated_tokens = model.generate(**inputs)
result = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
print(result)
#Цель разработки - предоставить пользователям персональный синхронный перевод.
```
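
The examples above translate one sentence at a time. The sketch below translates several sentences in a single `generate()` call. `translate_batch` is a hypothetical helper, and `max_new_tokens=256` is an assumed budget rather than a value from this card (without an explicit limit, `generate()` falls back to the model's generation config, which may cut long translations short). It expects `model` and `tokenizer` loaded as in the examples above.

```python
from typing import List


def translate_batch(texts: List[str], target_lang: str, model, tokenizer,
                    max_new_tokens: int = 256) -> List[str]:
    """Hypothetical helper: translate several texts into one target language."""
    # Every input gets the same target prefix; source languages may differ.
    prefixed = [f"translate to {target_lang}: {t}" for t in texts]
    # padding=True pads shorter inputs so they stack into one tensor batch;
    # the returned attention_mask tells the model to ignore the padding.
    inputs = tokenizer(prefixed, return_tensors="pt", padding=True)
    # max_new_tokens=256 is an assumption, not a value from the model card.
    generated_tokens = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # One decoded string per input, in the same order as `texts`.
    return tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
```

For instance, `translate_batch(["Цель разработки — предоставить пользователям личного синхронного переводчика.", "开发的目的是为用户提供个人同步翻译。"], "en", model, tokenizer)` should return one English string per input. Because the prefix names only the target, Russian and Chinese sources can share one batch.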
## Languages covered
Russian (ru_RU), Chinese (zh_CN), English (en_US)