jonfd/convbert-base-igc-is · Training scripts?

Hi Daniel,

Happy to hear that the model performed so well on homograph classification. When pre-training the model, I followed Stefan Schweter's instructions:

https://github.com/stefan-it/turkish-bert/blob/master/convbert/CHEATSHEET.md
https://github.com/stefan-it/turkish-bert/blob/master/electra/CHEATSHEET.md

I used the pre-training script from the ConvBERT repository. Since the pre-training corpus (i.e., the Icelandic Gigaword Corpus) doesn't contain any web-crawled or noisy documents, I didn't perform any filtering or cleaning beforehand.

Best regards,
Jón