---
license: cc-by-4.0
---
# Automatic Translation Alignment of Ancient Greek Texts
The GRC-ALIGNMENT model is an XLM-RoBERTa-based model fine-tuned for automatic multilingual text alignment at the word level.
The model was first trained on 12 million monolingual Ancient Greek tokens with a masked language modeling (MLM) objective. It was then fine-tuned on 45k parallel sentences, mainly Ancient Greek-English, Greek-Latin, and Greek-Georgian.
### Multilingual Training Dataset
| Languages | Sentences | Source |
|:---------------------------------------|:-----------:|:--------------------------------------------------------------------------------|
| GRC-ENG | 32,500 | Perseus Digital Library (Iliad, Odyssey, Xenophon, New Testament) |
| GRC-LAT | 8,200 | [Digital Fragmenta Historicorum Graecorum project](https://www.dfhg-project.org/) |
| GRC-KAT <br>GRC-ENG <br>GRC-LAT<br>GRC-ITA<br>GRC-POR | 4,000 | [UGARIT Translation Alignment Editor](https://ugarit.ialigner.com/) |
### Model Performance
| Languages | Alignment Error Rate |
|:---------:|:--------------------:|
| GRC-ENG | 19.73% (IterMax) |
| GRC-POR | 23.91% (IterMax) |
| GRC-LAT | 10.60% (ArgMax) |
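
The table names ArgMax and IterMax without defining them. Assuming they refer to the extraction methods of that name described for SimAlign, ArgMax keeps a source-target pair only when each token is the other's most similar counterpart. The following is a minimal sketch on a hand-made similarity matrix; the function name `argmax_alignment` and the toy scores are illustrative and not part of this model's API.

```python
from typing import List, Set, Tuple


def argmax_alignment(sim: List[List[float]]) -> Set[Tuple[int, int]]:
    """Return (source_index, target_index) pairs that are mutual best matches.

    `sim[i][j]` is assumed to be a similarity score between source token i
    and target token j (for example, cosine similarity of embeddings).
    """
    if not sim or not sim[0]:
        return set()
    n_src, n_tgt = len(sim), len(sim[0])
    # Row-wise argmax: the best target token for each source token.
    best_tgt = [max(range(n_tgt), key=lambda j: sim[i][j]) for i in range(n_src)]
    # Column-wise argmax: the best source token for each target token.
    best_src = [max(range(n_src), key=lambda i: sim[i][j]) for j in range(n_tgt)]
    # Keep a link only if both directions agree.
    return {(i, j) for i, j in enumerate(best_tgt) if best_src[j] == i}


# Toy scores (hypothetical): 3 source tokens x 3 target tokens.
toy_sim = [
    [0.9, 0.1, 0.2],
    [0.2, 0.3, 0.8],
    [0.1, 0.4, 0.7],
]
print(sorted(argmax_alignment(toy_sim)))  # [(0, 0), (1, 2)]
```

Source token 2 stays unaligned here because its best target (index 2) prefers source token 1. Under the same assumption, IterMax would relax this by repeating the procedure on the remaining unaligned tokens, which typically trades precision for recall.
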
The gold standard datasets are available on [GitHub](https://github.com/UgaritAlignment/Alignment-Gold-Standards).
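
Alignment Error Rate is conventionally computed against gold links as AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|), where A is the set of predicted links, S the sure gold links, and P the possible gold links (with S contained in P). The sketch below assumes that standard definition. Whether the gold standards linked here separate sure from possible links is not stated, so `possible` is optional and all gold links are treated as sure when it is omitted. The function name and the example links are hypothetical.

```python
from typing import Iterable, Optional, Set, Tuple

Link = Tuple[int, int]  # (source_token_index, target_token_index)


def alignment_error_rate(
    predicted: Iterable[Link],
    sure: Iterable[Link],
    possible: Optional[Iterable[Link]] = None,
) -> float:
    """Compute AER = 1 - (|A & S| + |A & P|) / (|A| + |S|)."""
    a: Set[Link] = set(predicted)
    s: Set[Link] = set(sure)
    # Sure links always count as possible links too.
    p: Set[Link] = s | set(possible or ())
    if not a and not s:
        return 0.0  # nothing to align and nothing predicted
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))


# Hypothetical example: two sure links, one extra possible link,
# and a prediction that gets both sure links plus one wrong link.
gold_sure = {(0, 0), (1, 2)}
gold_possible = {(2, 1)}
predicted_links = {(0, 0), (1, 2), (2, 2)}

aer = alignment_error_rate(predicted_links, gold_sure, gold_possible)
print(f"{aer:.2%}")  # 20.00%
```

Lower is better: a prediction identical to the sure links scores 0%, and a prediction that shares no links with the gold standard scores 100%.
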
If you use this model, please cite our papers:
<pre>
@InProceedings{yousef-EtAl:2022:LREC,
author = {Yousef, Tariq and Palladino, Chiara and Shamsian, Farnoosh and d’Orange Ferreira, Anise and Ferreira dos Reis, Michel},
title = {An automatic model and Gold Standard for translation alignment of Ancient Greek},
booktitle = {Proceedings of the Language Resources and Evaluation Conference},
month = {June},
year = {2022},
address = {Marseille, France},
publisher = {European Language Resources Association},
pages = {5894--5905},
url = {https://aclanthology.org/2022.lrec-1.634}
}
@InProceedings{yousef-EtAl:2022:LT4HALA2022,
author = {Yousef, Tariq and Palladino, Chiara and Wright, David J. and Berti, Monica},
title = {Automatic Translation Alignment for Ancient Greek and Latin},
booktitle = {Proceedings of the Second Workshop on Language Technologies for Historical and Ancient Languages},
month = {June},
year = {2022},
address = {Marseille, France},
publisher = {European Language Resources Association},
pages = {101--107},
url = {https://aclanthology.org/2022.lt4hala2022-1.14}
}
</pre>