Papadopoulos, Dimitris
language: English-Greek
thumbnail: lighteternal/SSE-TUC-mt-en-el-cased
tags: NTM, EL-EN
license: Apache2
datasets: Opus, CC-Matrix
metrics: BLEU, chrF

English to Greek NMT from Hellenic Army Academy (SSE) and Technical University of Crete (TUC)

Model description

Trained with the Fairseq framework, using the transformer_iwslt_de_en architecture.
BPE segmentation (20k codes).
Mixed-case model.
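
For readers new to BPE, the sketch below shows how merge rules ("codes") can be learned from a word list. It is a toy illustration of the general byte-pair-encoding idea, not the tooling used to build this model, which the card does not specify. The function name `learn_bpe`, the `</w>` end-of-word marker, and the sample words are assumptions made for the example; `num_merges` plays the role of the number of codes (20k for this model).

```python
from collections import Counter


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` BPE merge rules from an iterable of words."""
    # Each word becomes a tuple of characters plus an end-of-word marker.
    vocab = Counter(tuple(word) + ("</w>",) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite the vocabulary with the most frequent pair merged.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges


# Hypothetical toy corpus; a real run would use the full training text.
merges = learn_bpe(["low", "low", "lower", "lowest"], num_merges=2)
print(merges)
```

Frequent character sequences are merged first, so common words end up as single tokens while rare words are split into smaller subword units.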

How to use

from transformers import FSMTTokenizer, FSMTForConditionalGeneration

# Path to the local folder that holds the downloaded model files
mname = "<your_downloaded_model_folderpath_here>"

tokenizer = FSMTTokenizer.from_pretrained(mname)
model = FSMTForConditionalGeneration.from_pretrained(mname)

text = "Katerina is the best name for a girl."

# Encode the English source sentence as a tensor of token ids
encoded = tokenizer.encode(text, return_tensors='pt')

# Beam search with 5 beams, returning the 5 best hypotheses
outputs = model.generate(encoded, num_beams=5, num_return_sequences=5, early_stopping=True)
for i, output in enumerate(outputs, start=1):
    # Print the raw token ids, then the decoded Greek text
    print(f"{i}: {output.tolist()}")
    decoded = tokenizer.decode(output, skip_special_tokens=True)
    print(f"{i}: {decoded}")
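
The snippet above translates one sentence and prints several beam hypotheses. A minimal sketch for translating a list of sentences in one call, keeping only the top hypothesis for each, might look like the following. The helper name `translate_batch` is hypothetical and not part of the model or of transformers; it assumes `tokenizer` and `model` were loaded as shown above.

```python
def translate_batch(texts, tokenizer, model, num_beams=5):
    """Translate a list of English sentences, one Greek output per input.

    `tokenizer` and `model` are assumed to be the FSMTTokenizer and
    FSMTForConditionalGeneration objects loaded beforehand.
    """
    # Pad to the longest sentence so the batch forms a rectangular tensor.
    batch = tokenizer(texts, return_tensors="pt", padding=True)
    # With the default num_return_sequences=1, generate() returns
    # one sequence per input sentence.
    outputs = model.generate(**batch, num_beams=num_beams, early_stopping=True)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

Calling `translate_batch(["Good morning.", "Thank you."], tokenizer, model)` would then return a list with one Greek string per input sentence.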

Training data

Consolidated corpus from Opus and CC-Matrix (~6.6GB in total)

Eval results

Results on Tatoeba testset (EN-EL):

BLEU chrF
76.9 0.733

Results on XNLI parallel (EN-EL):

BLEU chrF
65.4 0.624
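
For readers unfamiliar with chrF, the sketch below shows a simplified sentence-level character n-gram F-score on the same 0 to 1 scale as the values above. It is not the scorer used to produce those results, which the card does not name. The function names, the maximum n-gram order of 6, and beta = 2 are assumptions chosen for illustration.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams of order n, ignoring spaces."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis, reference, max_order=6, beta=2.0):
    """Simplified chrF: F-beta over averaged char n-gram precision and recall."""
    precisions, recalls = [], []
    for n in range(1, max_order + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one of the strings is shorter than n characters
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # beta > 1 weights recall more heavily than precision.
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


# Hypothetical example pair: a near-miss hypothesis against a reference.
score = chrf("καλησπέρα κόσμε", "καλημέρα κόσμε")
print(round(score, 3))
```

Because it works on characters and not on whole words, chrF gives partial credit for matching word stems, which suits a morphologically rich target language such as Greek.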

BibTeX entry and citation info

TODO