model checkpoints
- README.md +57 -0
- mmt_translation.pt +3 -0
README.md
ADDED
@@ -0,0 +1,57 @@
# MMTAfrica

[Paper](https://aclanthology.org/2021.wmt-1.48/) - [Installation](#installation) - [Example](#example) - [Model checkpoint](#model-checkpoint) - [Citation](#citation)

This repository contains the official implementation of the MMTAfrica paper ([Emezue & Dossou, WMT 2021](https://aclanthology.org/2021.wmt-1.48/)).

We focus on the task of multilingual machine translation for African languages in the 2021 WMT Shared Task: Large-Scale Multilingual Machine Translation. We introduce MMTAfrica, the first many-to-many multilingual translation system for six African languages: Fon (fon), Igbo (ibo), Kinyarwanda (kin), Swahili/Kiswahili (swa), Xhosa (xho), and Yoruba (yor), and two non-African languages: English (eng) and French (fra). For multilingual translation concerning African languages, we introduce a novel backtranslation and reconstruction objective, BT&REC, inspired by the random online back translation and T5 modeling frameworks respectively, to effectively leverage monolingual data. Additionally, we report improvements from MMTAfrica over the FLORES 101 benchmarks (spBLEU gains ranging from +0.58 in Swahili to French to +19.46 in French to Xhosa).

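To make the BT&REC idea concrete, here is a minimal, illustrative sketch of how training pairs can be built from a single monolingual sentence. It is not the paper's implementation: the `translate` callable, the language list, and the pair layout are assumptions made only for this example.

```python
# Illustrative sketch (not the paper's code) of BT&REC-style training pairs
# built from monolingual data: back-translate into a random pivot language
# (BT pair), and reconstruct the sentence in its own language (REC pair).
import random

LANGS = ["fon", "ibo", "kin", "swa", "xho", "yor", "eng", "fra"]

def bt_rec_pairs(sentence, lang, translate):
    """Return (source, target, src_lang, tgt_lang) pairs from one sentence.
    `translate(text, src, tgt) -> str` stands in for the current model."""
    pivot = random.choice([l for l in LANGS if l != lang])
    synthetic = translate(sentence, lang, pivot)      # back-translation step
    return [
        (synthetic, sentence, pivot, lang),           # BT: recover the original
        (sentence, sentence, lang, lang),             # REC: reconstruction
    ]

# Toy usage with a dummy "model" so the sketch runs end to end.
dummy_translate = lambda text, src, tgt: f"<{tgt}> {text}"
for pair in bt_rec_pairs("Habari ya asubuhi", "swa", dummy_translate):
    print(pair)
```

In training, the BT pair asks the model to recover the original sentence from its own synthetic translation, while the REC pair reconstructs the sentence in its own language. A T5-inspired reconstruction objective would typically corrupt the input before asking the model to recover it; the identity pair above only keeps the sketch short.
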
## Installation

To avoid any conflict with your existing Python setup, we suggest working in a virtual environment:

```bash
python -m venv mmtenv
source mmtenv/bin/activate
```

Then follow these steps to install MMTAfrica:

```bash
git clone https://github.com/edaiofficial/mmtafrica.git
cd mmtafrica
pip install -r requirements.txt
```

## Example

```bash
python mmtafrica.py
```

Consult the full list of arguments [here](https://github.com/edaiofficial/mmtafrica/blob/main/mmtafrica.py#L772-L860).

### Reproducing our paper

The data used for the paper's experiments is stored in the `/experiments` folder. To train MMTAfrica from scratch and reproduce our experiments with that data, run:

```bash
cd experiments
python ../mmtafrica.py --model_name='mmtafrica' --homepath="<YOUR HOMEPATH>"
```

By default, `homepath` is the current working directory when you run the code.

## Model checkpoint

Our model checkpoint is saved [here](https://drive.google.com/file/d/1gUINHLRQC06HGGeP211-x3IIr3WS84Iy/view?usp=sharing).

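The checkpoint is a PyTorch `.pt` file of roughly 2.3 GB (see the Git LFS pointer below). As a minimal sketch, assuming the file has been downloaded as `mmt_translation.pt` in the current directory, you can load it on CPU and inspect what it contains; the exact structure of the saved object is not documented here, so the snippet only reports what it finds.

```python
# Minimal sketch: load the checkpoint on CPU and report its top-level structure.
# Assumes the file was downloaded as ./mmt_translation.pt (an assumption, not
# something this README specifies). Depending on your PyTorch version, you may
# need to pass weights_only=False if the file stores more than raw tensors.
import torch

checkpoint = torch.load("mmt_translation.pt", map_location="cpu")

if isinstance(checkpoint, dict):
    print("Top-level keys:", list(checkpoint.keys()))
else:
    print("Loaded object of type:", type(checkpoint).__name__)
```
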
## Citation

```bibtex
@inproceedings{emezue-dossou-2021-mmtafrica,
    title = "{MMTA}frica: Multilingual Machine Translation for {A}frican Languages",
    author = "Emezue, Chris Chinenye  and
      Dossou, Bonaventure F. P.",
    booktitle = "Proceedings of the Sixth Conference on Machine Translation",
    month = nov,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.wmt-1.48",
    pages = "398--411",
    abstract = "In this paper, we focus on the task of multilingual machine translation for African languages and describe our contribution in the 2021 WMT Shared Task: Large-Scale Multilingual Machine Translation. We introduce MMTAfrica, the first many-to-many multilingual translation system for six African languages: Fon (fon), Igbo (ibo), Kinyarwanda (kin), Swahili/Kiswahili (swa), Xhosa (xho), and Yoruba (yor) and two non-African languages: English (eng) and French (fra). For multilingual translation concerning African languages, we introduce a novel backtranslation and reconstruction objective, BT{\&}REC, inspired by the random online back translation and T5 modelling framework respectively, to effectively leverage monolingual data. Additionally, we report improvements from MMTAfrica over the FLORES 101 benchmarks (spBLEU gains ranging from +0.58 in Swahili to French to +19.46 in French to Xhosa).",
}
```

mmt_translation.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f625c6b29607333df7b65d9ca693d5b89a5e724b1263bc4f5151938b07a4917b
size 2329707789
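
This is a Git LFS pointer rather than the weights themselves: it records the checkpoint's SHA-256 digest and byte size. Below is a small sketch for verifying a downloaded copy against those values; the local filename is an assumption, so point it at wherever you saved the file.

```python
# Verify a downloaded copy of the checkpoint against the Git LFS pointer above.
# The path below is an assumption; adjust it to your local copy of the file.
import hashlib

EXPECTED_OID = "f625c6b29607333df7b65d9ca693d5b89a5e724b1263bc4f5151938b07a4917b"
EXPECTED_SIZE = 2329707789  # bytes, as recorded in the pointer

def verify(path: str) -> bool:
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MiB chunks
            digest.update(chunk)
            size += len(chunk)
    return digest.hexdigest() == EXPECTED_OID and size == EXPECTED_SIZE

print(verify("mmt_translation.pt"))
```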