DistilRoBERTa (base) Middle High German Charter Masked Language Model
This model is a fine-tuned version of distilroberta-base on Middle High German (gmh; ISO 639-2; c. 1050–1500) charters from the monasterium.net dataset.
Model description
Please refer to the distilroberta (base-sized model) card and the paper DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter by Sanh et al. for additional information on this model.
Intended uses & limitations
This model can be used for masked token prediction, i.e., fill-mask tasks.
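Below is a minimal usage sketch with the transformers fill-mask pipeline; the example sentence is only an illustrative Middle High German charter-style phrase, not a quote from the training data.

```python
from transformers import pipeline

# Load the model from the Hugging Face Hub (model id as given in the citation below).
fill_mask = pipeline(
    "fill-mask",
    model="atzenhofer/distilroberta-base-mhg-charter-mlm",
)

# RoBERTa-style models use "<mask>" as the mask token.
# The sentence is an illustrative charter-like phrase, not taken from the corpus.
for prediction in fill_mask("wir tuon kunt allen die disen <mask> sehent oder hoerent lesen"):
    print(prediction["token_str"], prediction["score"])
```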
Training and evaluation data
The model was fine-tuned on the Middle High German Monasterium charters. Training was performed on an NVIDIA GeForce GTX 1660 Ti (6 GB) GPU.
Training hyperparameters
The following hyperparameters were used during training (a minimal fine-tuning sketch follows the list):
- num_train_epochs: 10
- learning_rate: 2e-5
- weight_decay: 0.01
- train_batch_size: 8
- eval_batch_size: 8
- num_proc: 4
- block_size: 256
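The sketch below illustrates how such a run could be set up with the Hugging Face Trainer, DataCollatorForLanguageModeling, and the hyperparameters above; it is an assumption-laden reconstruction, not the exact training script. In particular, the data file names, the output directory, the evaluation strategy, and the masking probability (0.15) are not stated in this card and are placeholders.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

block_size = 256

# Placeholder: the Monasterium charter texts are assumed to be available as local plain-text files.
raw_datasets = load_dataset(
    "text",
    data_files={"train": "charters_train.txt", "validation": "charters_valid.txt"},
)

tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
model = AutoModelForMaskedLM.from_pretrained("distilroberta-base")

def tokenize(examples):
    return tokenizer(examples["text"])

def group_texts(examples):
    # Concatenate all token sequences and split them into fixed-size blocks.
    concatenated = {k: sum(examples[k], []) for k in examples}
    total_length = (len(concatenated["input_ids"]) // block_size) * block_size
    return {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated.items()
    }

tokenized = raw_datasets.map(tokenize, batched=True, num_proc=4, remove_columns=["text"])
lm_datasets = tokenized.map(group_texts, batched=True, num_proc=4)

training_args = TrainingArguments(
    output_dir="distilroberta-base-mhg-charter-mlm",  # placeholder output directory
    num_train_epochs=10,
    learning_rate=2e-5,
    weight_decay=0.01,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    evaluation_strategy="epoch",  # assumption; the card only reports per-epoch losses
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=lm_datasets["train"],
    eval_dataset=lm_datasets["validation"],
    # 15% masking is the library default, assumed here rather than documented in the card.
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15),
)

trainer.train()
```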
Training results
| Epoch | Training Loss | Validation Loss |
|---|---|---|
| 1 | 2.537000 | 2.112094 |
| 2 | 2.053400 | 1.838937 |
| 3 | 1.900300 | 1.706654 |
| 4 | 1.766200 | 1.607970 |
| 5 | 1.669200 | 1.532340 |
| 6 | 1.619100 | 1.490333 |
| 7 | 1.571300 | 1.476035 |
| 8 | 1.543100 | 1.428958 |
| 9 | 1.517100 | 1.423216 |
| 10 | 1.508300 | 1.408235 |
Perplexity: 4.07
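For reference, perplexity for a masked language model is typically the exponential of the mean evaluation cross-entropy loss; a minimal sketch is shown below. Note that exp of the final validation loss in the table (≈ 4.09) does not exactly match the reported 4.07, which presumably comes from a separate evaluation run.

```python
import math

# Perplexity is the exponential of the (mean) cross-entropy loss.
final_validation_loss = 1.408235  # last epoch in the table above
print(round(math.exp(final_validation_loss), 2))  # ~4.09; the card reports 4.07 from its own evaluation
```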
Updates
- 2023-03-30: Upload
Citation
Please cite as follows when using this model.
@misc{distilroberta-base-mhg-charter-mlm,
  title     = {distilroberta-base-mhg-charter-mlm},
  author    = {Atzenhofer-Baumgartner, Florian},
  year      = {2023},
  url       = {https://huggingface.co/atzenhofer/distilroberta-base-mhg-charter-mlm},
  publisher = {Hugging Face}
}