---
license: cc-by-4.0
---
# Automatic Translation Alignment of Ancient Greek Texts
The GRC-ALIGNMENT model is an XLM-RoBERTa-based model fine-tuned for automatic multilingual text alignment at the word level.
The model was first trained on 12 million monolingual Ancient Greek tokens with a masked language modeling (MLM) objective. It was then fine-tuned on 45k parallel sentences, mainly Ancient Greek-English, Greek-Latin, and Greek-Georgian.
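
As an illustration of what word-level alignment extraction can look like, the sketch below implements the ArgMax heuristic named in the performance table, following its common definition: a source token and a target token are linked only when each is the other's most similar counterpart. This card does not spell out the extraction procedure, so the function name `argmax_alignment` and the toy similarity values are assumptions made for illustration. In practice the similarity matrix would come from the model's token embeddings.

```python
def argmax_alignment(sim):
    """Return (source_index, target_index) pairs that are mutual best matches.

    `sim` is a list of rows: sim[i][j] is the similarity between source
    token i and target token j. Hypothetical helper, not part of any library.
    """
    if not sim or not sim[0]:
        return []
    n_src, n_tgt = len(sim), len(sim[0])
    # Best target token for every source token (row-wise argmax).
    best_tgt = [max(range(n_tgt), key=lambda j: sim[i][j]) for i in range(n_src)]
    # Best source token for every target token (column-wise argmax).
    best_src = [max(range(n_src), key=lambda i: sim[i][j]) for j in range(n_tgt)]
    # Keep a link only when both directions agree.
    return [(i, j) for i, j in enumerate(best_tgt) if best_src[j] == i]


# Toy similarity matrix with made-up values:
# rows are 3 source tokens, columns are 3 target tokens.
toy_sim = [
    [0.9, 0.1, 0.2],
    [0.2, 0.3, 0.8],
    [0.1, 0.4, 0.7],
]
pairs = argmax_alignment(toy_sim)
print(pairs)  # [(0, 0), (1, 2)]
```

Source token 2 stays unaligned here: its best target (index 2) prefers source token 1, so the link is not mutual. This strictness favors precision over recall.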

### Multilingual Training Dataset
| Languages                                             | Sentences | Source                                                                            |
|:------------------------------------------------------|:---------:|:----------------------------------------------------------------------------------|
| GRC-ENG                                               |    32,500 | Perseus Digital Library (Iliad, Odyssey, Xenophon, New Testament)                 |
| GRC-LAT                                               |     8,200 | [Digital Fragmenta Historicorum Graecorum project](https://www.dfhg-project.org/) |
| GRC-KAT <br>GRC-ENG <br>GRC-LAT<br>GRC-ITA<br>GRC-POR |     4,000 | [UGARIT Translation Alignment Editor](https://ugarit.ialigner.com/)               |

### Model Performance
| Languages | Alignment Error Rate |
|:---------:|:--------------------:|
| GRC-ENG   |     19.73% (IterMax) |
| GRC-POR   |     23.91% (IterMax) |
| GRC-LAT   |      10.60% (ArgMax) |

The gold standard datasets are available on [GitHub](https://github.com/UgaritAlignment/Alignment-Gold-Standards).
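
For readers who want to score their own alignments against a gold standard, here is a minimal sketch of Alignment Error Rate in its standard form, AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|), where A is the predicted link set, S the sure gold links, and P the possible gold links (which include S). Whether the gold standards above distinguish sure from possible links is not stated in this card. The function name `alignment_error_rate`, the link representation as index pairs, and the example data are all assumptions.

```python
def alignment_error_rate(predicted, sure, possible=None):
    """Compute AER for one sentence pair (or pooled links from many).

    Links are (source_index, target_index) tuples. If `possible` is omitted,
    the possible set equals the sure set. Hypothetical helper for illustration.
    """
    a = set(predicted)
    s = set(sure)
    p = s | set(possible or ())  # possible links always include the sure ones
    if not a and not s:
        return 0.0  # nothing predicted and nothing expected: no error
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))


# Made-up example: three predicted links, two sure gold links,
# and one extra possible gold link that the prediction misses.
predicted = [(0, 0), (1, 2), (2, 1)]
sure = [(0, 0), (1, 2)]
possible = [(2, 2)]
aer = alignment_error_rate(predicted, sure, possible)
print(f"{aer:.2%}")  # 1 - (2 + 2) / (3 + 2), i.e. 20.00%
```

Lower is better: a prediction that matches the sure links exactly and adds nothing else scores 0%.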

If you use this model, please cite our papers:
<pre>
@InProceedings{yousef-EtAl:2022:LREC,
  author    = {Yousef, Tariq  and  Palladino, Chiara  and  Shamsian, Farnoosh  and  d’Orange Ferreira, Anise  and  Ferreira dos Reis, Michel},
  title     = {An automatic model and Gold Standard for translation alignment of Ancient Greek},
  booktitle      = {Proceedings of the Language Resources and Evaluation Conference},
  month          = {June},
  year           = {2022},
  address        = {Marseille, France},
  publisher      = {European Language Resources Association},
  pages     = {5894--5905},
  url       = {https://aclanthology.org/2022.lrec-1.634}
}

@InProceedings{yousef-EtAl:2022:LT4HALA2022,
  author    = {Yousef, Tariq  and  Palladino, Chiara  and  Wright, David J.  and  Berti, Monica},
  title     = {Automatic Translation Alignment for Ancient Greek and Latin},
  booktitle      = {Proceedings of the Second Workshop on Language Technologies for Historical and Ancient Languages},
  month          = {June},
  year           = {2022},
  address        = {Marseille, France},
  publisher      = {European Language Resources Association},
  pages     = {101--107},
  url       = {https://aclanthology.org/2022.lt4hala2022-1.14}
}

</pre>