File size: 2,459 Bytes
07c11b1
8e2122c
 
 
 
 
07c11b1
8e2122c
534b38c
8e2122c
 
 
07c11b1
534b38c
 
 
 
 
 
 
 
 
 
5554ca8
534b38c
 
 
 
 
 
 
 
 
6c30fee
534b38c
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
6c30fee
534b38c
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
15238ac
534b38c
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
---
language:
- ru
- zh
tags:
- translation
license: mit
datasets:
- ccmatrix
- joefox/newstest-2017-2019-ru_zh
metrics:
- sacrebleu
---

# mBART-large Russian to Chinese and  Chinese to Russian multilingual machine translation

This model is a fine-tuned checkpoint of [mBART-large-50-many-to-many-mmt](https://huggingface.co/facebook/mbart-large-50-many-to-many-mmt). `joefox/mbart-large-ru-zh-ru-many-to-many-mmt` is fine-tuned for ru-zh and zh-ru machine translation. 




The model can translate directly between any pair of Russian and Chinese languages. To translate into a target language, the target language id is forced as the first generated token. To force the target language id as the first generated token, pass the `forced_bos_token_id` parameter to the `generate` method..

This [post in Russian](https://habr.com/ru/post/721330/) gives more details.



Example translate Russian to Chinese

```python
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

model = MBartForConditionalGeneration.from_pretrained("joefox/mbart-large-ru-zh-ru-many-to-many-mmt")
tokenizer = MBart50TokenizerFast.from_pretrained("joefox/mbart-large-ru-zh-ru-many-to-many-mmt")

src_text = "Съешь ещё этих мягких французских булок."

# translate Russian to Chinese
tokenizer.src_lang = "ru_RU"
encoded_ru = tokenizer(src_text, return_tensors="pt")
generated_tokens = model.generate(
    **encoded_ru,
    forced_bos_token_id=tokenizer.lang_code_to_id["zh_CN"]
)
result = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)

print(result)
#再吃一些法国面包。
```

and Example translate Chinese to Russian



```python
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

model = MBartForConditionalGeneration.from_pretrained("joefox/mbart-large-ru-zh-ru-many-to-many-mmt")
tokenizer = MBart50TokenizerFast.from_pretrained("joefox/mbart-large-ru-zh-ru-many-to-many-mmt")

src_text = "吃一些软的法国面包。"
# translate Chinese to Russian
tokenizer.src_lang = "zh_CN"
encoded_zh = tokenizer(src_text, return_tensors="pt")
generated_tokens = model.generate(
    **encoded_zh,
    forced_bos_token_id=tokenizer.lang_code_to_id["ru_RU"]
)
result = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
print(result)
#Ешьте немного французского хлеба.
```

## Languages covered

Russian (ru_RU), Chinese (zh_CN)