| language | thumbnail  | tags | license | datasets | metrics |
| ------------- | ------------- | ------------- | ------------- | ------------- | ------------- | 
| English-Greek | lighteternal/SSE-TUC-mt-en-el-cased | NMT, EL-EN | Apache2 | Opus, CC-Matrix | BLEU, chrF |

# English to Greek NMT from Hellenic Army Academy (SSE) and Technical University of Crete (TUC)

## Model description

Trained with the Fairseq framework using the `transformer_iwslt_de_en` architecture.\
BPE segmentation (20k codes).\
Mixed-case model.
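
For readers unfamiliar with BPE, the following is a minimal sketch of how an ordered list of merge rules (the "codes") segments a word into subword units. The merge table, the `</w>` end-of-word marker, and the function name `apply_bpe` are illustrative assumptions; the model's actual 20k codes are not reproduced here.

```python
def apply_bpe(word, merges):
    """Segment one word with an ordered list of BPE merge rules (toy sketch)."""
    if not word:
        return []
    # Start from single characters; mark the end of the word (assumed marker).
    symbols = list(word[:-1]) + [word[-1] + "</w>"]
    # Earlier rules in the list have higher priority (lower rank).
    ranks = {pair: rank for rank, pair in enumerate(merges)}
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no applicable merge rule is left
        merged, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


# Hypothetical three-rule merge table, for illustration only.
toy_merges = [("l", "o"), ("lo", "w</w>"), ("e", "r</w>")]
print(apply_bpe("low", toy_merges))
print(apply_bpe("lower", toy_merges))
```

With a real codes file the same loop would run over roughly 20,000 rules, so frequent words tend to end up as a single unit while rare words are split into several pieces.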

#### How to use

```
from transformers import FSMTTokenizer, FSMTForConditionalGeneration

# Path to the downloaded model folder (placeholder; replace with your own)
mname = "<your_downloaded_model_folderpath_here>"

tokenizer = FSMTTokenizer.from_pretrained(mname)
model = FSMTForConditionalGeneration.from_pretrained(mname)

text = "Katerina is the best name for a girl."

# Encode the English source sentence as a PyTorch tensor of token IDs
encoded = tokenizer.encode(text, return_tensors='pt')

# Beam search with 5 beams, returning the 5 best hypotheses
outputs = model.generate(encoded, num_beams=5, num_return_sequences=5, early_stopping=True)
for i, output in enumerate(outputs, start=1):
    # Print the raw token IDs, then the decoded Greek translation
    print(f"{i}: {output.tolist()}")

    decoded = tokenizer.decode(output, skip_special_tokens=True)
    print(f"{i}: {decoded}")
```


## Training data

Consolidated corpus from Opus and CC-Matrix (~6.6 GB in total).


## Eval results

Results on the Tatoeba test set (EN-EL):

| BLEU | chrF  |
| ------ | ------ |
| 76.9 |  0.733 |


Results on the XNLI parallel test set (EN-EL):

| BLEU | chrF  |
| ------ | ------ |
| 65.4 |  0.624 |
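
To make the chrF column easier to interpret, here is a simplified, sentence-level sketch of a character n-gram F-score on the same 0 to 1 scale as the tables above. It is not the tool used to produce the reported numbers; the n-gram range (1 to 6), `beta=2.0`, the whitespace handling, and the example sentences are all assumptions, and standard evaluation toolkits differ in such details.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams, ignoring spaces (a simplifying assumption)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified sentence-level chrF in [0, 1]; beta > 1 favours recall."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one side is too short to have n-grams of this order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


# Hypothetical system output and reference, for illustration only.
score = chrf("Καλημέρα κόσμε", "Καλημέρα κόσμος")
print(f"chrF = {score:.3f}")
```

Because it works on characters rather than words, chrF gives partial credit for near-miss inflections, which matters for a morphologically rich target language such as Greek.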

### BibTeX entry and citation info
TODO