# Training notebooks for simple Latin BERT uncased

These notebooks and scripts contain the code to train this masked language model and its tokenizer from scratch.

The notebooks should run on any computer with a GPU, with minimal changes.


Note: The scripts create a file, `03_full_corpus.txt`, that combines all the corpora into a single raw text file.
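
For orientation, the merge step could look roughly like the sketch below. It is not the code from the notebooks: the function name `combine_corpora` and the input directory `corpora/` are hypothetical, and it assumes each corpus is a UTF-8 `.txt` file in a single directory.

```python
from pathlib import Path


def combine_corpora(source_dir, output_path="03_full_corpus.txt"):
    """Concatenate every .txt file in source_dir into one raw text file.

    Hypothetical helper for illustration; the notebooks may do this differently.
    Returns the number of input files that were merged.
    """
    source_dir = Path(source_dir)
    output_path = Path(output_path)

    # Sort for a reproducible order, and skip the output file itself
    # in case it already exists inside source_dir from an earlier run.
    inputs = [
        p for p in sorted(source_dir.glob("*.txt"))
        if p.resolve() != output_path.resolve()
    ]

    with output_path.open("w", encoding="utf-8") as out:
        for path in inputs:
            text = path.read_text(encoding="utf-8").strip()
            if text:
                # One trailing newline per corpus keeps the files separated.
                out.write(text + "\n")

    return len(inputs)
```

Calling `combine_corpora("corpora")` would then write `03_full_corpus.txt` in the current working directory. The sketch reads each corpus fully into memory, which is fine for small files; very large corpora would be better copied in chunks.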