Danish medical BERT

MeDa-BERT was initialized with the weights of a pretrained Danish BERT model and then further pretrained for 48 epochs with the masked language modeling (MLM) objective on a Danish medical corpus of 123M tokens.

The development of the corpus and model is described further in the paper: https://aclanthology.org/2023.nodalida-1.31

Here is an example of how to load the model in PyTorch using the 🤗 Transformers library:

from transformers import AutoTokenizer, AutoModelForMaskedLM

# Download (or load from the local cache) the tokenizer and the MLM model
tokenizer = AutoTokenizer.from_pretrained("indsigt-ai/MeDa-BERT")
model = AutoModelForMaskedLM.from_pretrained("indsigt-ai/MeDa-BERT")
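
Once loaded, the model can be used to fill in a masked token. The following is a minimal sketch, not part of the original model card: the helper names `mask_word` and `predict_masked` are hypothetical, the default mask token `[MASK]` is assumed to match this tokenizer (it is the usual BERT convention), and the Danish sentence in the usage note is only an illustration.

```python
def mask_word(sentence: str, word: str, mask_token: str = "[MASK]") -> str:
    """Replace the first occurrence of `word` in `sentence` with the mask token.

    Assumption: "[MASK]" is the tokenizer's mask token, as is usual for BERT.
    """
    if word not in sentence:
        raise ValueError(f"{word!r} does not occur in the sentence")
    return sentence.replace(word, mask_token, 1)


def predict_masked(text: str, top_k: int = 5) -> list:
    """Return the `top_k` highest-scoring tokens for the single mask in `text`."""
    # Imported lazily so that mask_word works without the heavy dependencies.
    import torch
    from transformers import AutoTokenizer, AutoModelForMaskedLM

    tokenizer = AutoTokenizer.from_pretrained("indsigt-ai/MeDa-BERT")
    model = AutoModelForMaskedLM.from_pretrained("indsigt-ai/MeDa-BERT")
    model.eval()  # disable dropout for inference

    inputs = tokenizer(text, return_tensors="pt")

    # Locate the position of the mask token in the encoded input.
    is_mask = inputs["input_ids"][0] == tokenizer.mask_token_id
    mask_positions = is_mask.nonzero(as_tuple=True)[0]
    if mask_positions.numel() != 1:
        raise ValueError("expected exactly one mask token in the input")

    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

    # Take the highest-scoring vocabulary entries at the masked position.
    top_ids = logits[0, mask_positions[0]].topk(top_k).indices.tolist()
    return tokenizer.convert_ids_to_tokens(top_ids)
```

For example, `predict_masked(mask_word("Patienten har feber og hoste.", "feber"))` would return the model's five most likely tokens for the masked position. Calling `predict_masked` downloads the model weights on first use.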

Citing

@inproceedings{pedersen-etal-2023-meda,
    title = "{M}e{D}a-{BERT}: A medical {D}anish pretrained transformer model",
    author = "Pedersen, Jannik  and
      Laursen, Martin  and
      Vinholt, Pernille  and
      Savarimuthu, Thiusius Rajeeth",
    booktitle = "Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)",
    month = may,
    year = "2023",
    address = "T{\'o}rshavn, Faroe Islands",
    publisher = "University of Tartu Library",
    url = "https://aclanthology.org/2023.nodalida-1.31",
    pages = "301--307",
}