Training notebooks for simple Latin BERT uncased
These notebooks and scripts contain the code to train this masked language model and its tokenizer from scratch.
The notebooks should be ready to run on any computer with a GPU, with minimal changes.
Note: The scripts will create a file, 03_full_corpus.txt, that combines all the corpora into a single raw text file.
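For readers who want to see roughly what that combination step involves, here is a minimal sketch. It is not the code from the notebooks: the function name `combine_corpora` and the input file names in the usage note are hypothetical, and it assumes each corpus is already a UTF-8 plain-text file.

```python
from pathlib import Path


def combine_corpora(corpus_paths, output_path):
    """Concatenate plain-text corpus files into a single raw text file.

    Hypothetical helper for illustration; the notebooks may do this differently.
    """
    output_path = Path(output_path)
    with output_path.open("w", encoding="utf-8") as out:
        for path in corpus_paths:
            # Assumes each corpus fits in memory and is UTF-8 encoded.
            text = Path(path).read_text(encoding="utf-8")
            out.write(text)
            # Add a newline if the file lacks one, so the last line of one
            # corpus does not run into the first line of the next.
            if text and not text.endswith("\n"):
                out.write("\n")
    return output_path
```

A call might look like `combine_corpora(["01_corpus_a.txt", "02_corpus_b.txt"], "03_full_corpus.txt")`, where the two input names are placeholders. For corpora too large to hold in memory, the same idea works by copying line by line instead of reading each file whole.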