| language | thumbnail | tags | license | datasets | metrics |
| ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
| English-Greek | lighteternal/SSE-TUC-mt-en-el-cased | NMT, EL-EN | Apache2 | Opus, CC-Matrix | BLEU, chrF |
# English to Greek NMT from Hellenic Army Academy (SSE) and Technical University of Crete (TUC)
## Model description
Trained using the Fairseq framework, transformer_iwslt_de_en architecture.\
BPE segmentation (20k codes).\
Mixed-case model.
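
To illustrate what BPE segmentation does, here is a minimal sketch that applies an ordered list of merge rules to a single word. The `apply_bpe` helper and the three merge rules are hypothetical and chosen only for illustration. The model's actual 20k codes and the exact BPE tooling used in training are not specified here, and end-of-word markers are omitted for brevity.

```python
def apply_bpe(word, merges):
    """Segment a word by applying merge rules in order (toy version)."""
    symbols = list(word)  # start from individual characters
    for left, right in merges:
        merged = []
        i = 0
        while i < len(symbols):
            # Join every adjacent (left, right) pair into a single symbol
            if i + 1 < len(symbols) and symbols[i] == left and symbols[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


# Hypothetical merge rules, listed in the order they would have been learned
toy_merges = [("l", "o"), ("lo", "w"), ("e", "r")]
print(apply_bpe("lower", toy_merges))
```

With a real code table, frequent words end up as single tokens while rare words are split into several subword units.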
#### How to use
```
from transformers import FSMTTokenizer, FSMTForConditionalGeneration

# Path to the locally downloaded model folder
mname = "<your_downloaded_model_folderpath_here>"
tokenizer = FSMTTokenizer.from_pretrained(mname)
model = FSMTForConditionalGeneration.from_pretrained(mname)

text = "Katerina, is the best name for a girl."
encoded = tokenizer.encode(text, return_tensors='pt')

# Beam search with 5 beams, returning the 5 best hypotheses
outputs = model.generate(encoded, num_beams=5, num_return_sequences=5, early_stopping=True)
for i, output in enumerate(outputs, start=1):
    print(f"{i}: {output.tolist()}")  # raw token ids
    decoded = tokenizer.decode(output, skip_special_tokens=True)
    print(f"{i}: {decoded}")  # decoded Greek translation
```
## Training data
Consolidated corpus from Opus and CC-Matrix (~6.6 GB in total).
## Eval results
Results on the Tatoeba test set (EN-EL):
| BLEU | chrF |
| ------ | ------ |
| 76.9 | 0.733 |
Results on XNLI parallel (EN-EL):
| BLEU | chrF |
| ------ | ------ |
| 65.4 | 0.624 |
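
For readers unfamiliar with chrF, the following is a simplified sketch of a character n-gram F-score on a 0-1 scale, the same scale as the chrF values above. It is not the scoring tool used to produce these results (that tool is not named here), and standard implementations differ in details such as whitespace handling and corpus-level aggregation. The function names and the defaults `max_n=6` and `beta=2.0` are assumptions for illustration.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams after stripping all whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified sentence-level chrF in [0, 1] (illustrative only)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # this n-gram order is longer than one of the strings
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta score: beta > 1 weights recall more heavily than precision
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


print(simple_chrf("the cat sat", "the cat sat"))
print(simple_chrf("the cat sat", "the hat sat"))
```

An exact match scores 1.0, and a hypothesis sharing no characters with the reference scores 0.0. Reported scores should come from an established scoring tool rather than from this sketch.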
### BibTeX entry and citation info
TODO