joefox commited on
Commit
534b38c
1 Parent(s): 8e2122c

readme update

Browse files
Files changed (1) hide show
  1. README.md +65 -0
README.md CHANGED
@@ -6,7 +6,72 @@ tags:
6
  - translation
7
  license: mit
8
  datasets:
 
9
  - joefox/newstest-2017-2019-ru_zh
10
  metrics:
11
  - sacrebleu
12
  ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
6
  - translation
7
  license: mit
8
  datasets:
9
+ - ccmatrix
10
  - joefox/newstest-2017-2019-ru_zh
11
  metrics:
12
  - sacrebleu
13
  ---
14
+
15
+ # mBART-large Russian to Chinese and Chinese to Russian multilingual machine translation
16
+
17
+ This model is a fine-tuned checkpoint of [mBART-large-50-many-to-many-mmt](https://huggingface.co/facebook/mbart-large-50-many-to-many-mmt). `joefox/mbart-large-ru-zh-ru-many-to-many-mmt` is fine-tuned for ru-zh and zh-ru machine translation.
18
+
19
+
20
+
21
+
22
+ The model can translate directly between any pair of Russian and Chinese languages. To translate into a target language, the target language id is forced as the first generated token. To force the target language id as the first generated token, pass the `forced_bos_token_id` parameter to the `generate` method..
23
+
24
+ This post in Russian gives more details.
25
+
26
+
27
+
28
+ Example translate Russian to Chinese
29
+
30
+ ```python
31
+ from transformers import MBartForConditionalGeneration, MBart50TokenizerFast
32
+
33
+ model = MBartForConditionalGeneration.from_pretrained("joefox/mbart-large-ru-zh-ru-many-to-many-mmt")
34
+ tokenizer = MBart50TokenizerFast.from_pretrained("joefox/mbart-large-ru-zh-ru-many-to-many-mmt", src_lang="ru_RU", tgt_lang="zh_CN")
35
+
36
+ src_text = "Съешь ещё этих мягких французских булок."
37
+
38
+ # translate Russian to Chinese
39
+ tokenizer.src_lang = "ru_RU"
40
+ encoded_ru = tokenizer(src_text, return_tensors="pt")
41
+ generated_tokens = model.generate(
42
+ **encoded_ru,
43
+ forced_bos_token_id=tokenizer.lang_code_to_id["zh_CN"]
44
+ )
45
+ result = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
46
+
47
+ print(result)
48
+ #再吃一些法国面包。
49
+ ```
50
+
51
+ and Example translate Chinese to Russian
52
+
53
+
54
+
55
+ ```python
56
+ from transformers import MBartForConditionalGeneration, MBart50TokenizerFast
57
+
58
+ model = MBartForConditionalGeneration.from_pretrained("joefox/mbart-large-ru-zh-ru-many-to-many-mmt")
59
+ tokenizer = MBart50TokenizerFast.from_pretrained("joefox/mbart-large-ru-zh-ru-many-to-many-mmt", src_lang="ru_RU", tgt_lang="zh_CN")
60
+
61
+ src_text = "吃一些软的法国面包。"
62
+ # translate Chinese to Russian
63
+ tokenizer.src_lang = "zh_CN"
64
+ encoded_zh = tokenizer(src_text, return_tensors="pt")
65
+ generated_tokens = model.generate(
66
+ **encoded_zh,
67
+ forced_bos_token_id=tokenizer.lang_code_to_id["ru_RU"]
68
+ )
69
+ result = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
70
+ print(result)
71
+ #Ешьте немного французского хлеба.
72
+ ```
73
+
74
+ ## Languages covered
75
+
76
+ Russian (ru_RU), Sinhala (si_LK), Chinese (zh_CN)
77
+