---
license: cc-by-nc-3.0
language:
- da
pipeline_tag: fill-mask
tags:
- bert
- danish
widget:
- text: Hvide blodlegemer beskytter kroppen mod [MASK]
---

# Danish medical BERT

MeDa-BERT was initialized with weights from a [pretrained Danish BERT model](https://huggingface.co/Maltehb/danish-bert-botxo) and then further pretrained for 48 epochs with the masked language modeling (MLM) objective on a Danish medical corpus of 123M tokens.

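To make the MLM objective concrete, the following is a minimal sketch of BERT-style token masking in plain Python. This card does not state the masking hyperparameters used for MeDa-BERT, so the 15% selection rate and the 80/10/10 replacement split are assumptions taken from the original BERT recipe. The function name `mask_tokens` and its parameters are hypothetical and not part of any library.

```python
import random
from typing import List, Optional, Sequence, Tuple

MASK_TOKEN = "[MASK]"


def mask_tokens(
    tokens: Sequence[str],
    vocab: Sequence[str],
    mask_prob: float = 0.15,  # assumed: original BERT default, not stated in this card
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Return (corrupted tokens, labels); a label is None where no loss is computed."""
    rng = rng or random.Random()
    corrupted: List[str] = []
    labels: List[Optional[str]] = []
    for token in tokens:
        if rng.random() >= mask_prob:
            # Token not selected: leave it unchanged and exclude it from the loss.
            corrupted.append(token)
            labels.append(None)
            continue
        labels.append(token)  # the model must recover the original token here
        roll = rng.random()
        if roll < 0.8:
            corrupted.append(MASK_TOKEN)  # 80%: replace with [MASK]
        elif roll < 0.9:
            corrupted.append(rng.choice(vocab))  # 10%: replace with a random token
        else:
            corrupted.append(token)  # 10%: keep the original token
    return corrupted, labels
```

During pretraining the model sees the corrupted sequence and is trained to predict the original token at every position whose label is not `None`.
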
The development of the corpus and model is described further in [this paper](https://aclanthology.org/2023.nodalida-1.31/).

Here is an example of how to load the model in PyTorch using the [🤗Transformers](https://github.com/huggingface/transformers) library:

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Load the tokenizer and the masked-LM model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("indsigt-ai/MeDa-BERT")
model = AutoModelForMaskedLM.from_pretrained("indsigt-ai/MeDa-BERT")
```
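
Since the model is tagged for the `fill-mask` task, it can also be queried through the Transformers `pipeline` API. The sketch below is one possible way to do this, not an official usage recipe. The helper names `load_fill_mask` and `top_predictions` are hypothetical, and the example assumes the input contains exactly one `[MASK]` token.

```python
from typing import Callable, List, Tuple

MODEL_ID = "indsigt-ai/MeDa-BERT"  # model ID taken from the loading example above


def load_fill_mask():
    """Build a fill-mask pipeline for MeDa-BERT (downloads the weights on first use)."""
    # Imported lazily so that top_predictions can be used without transformers installed.
    from transformers import pipeline

    return pipeline("fill-mask", model=MODEL_ID)


def top_predictions(fill_mask: Callable, text: str, k: int = 5) -> List[Tuple[str, float]]:
    """Return up to k (token, score) pairs for the single [MASK] in `text`."""
    if text.count("[MASK]") != 1:
        raise ValueError("expected exactly one [MASK] token in the input")
    results = fill_mask(text, top_k=k)
    # For a single mask, each result is a dict that includes "token_str" and "score".
    return [(r["token_str"].strip(), float(r["score"])) for r in results]
```

A typical call would be `top_predictions(load_fill_mask(), "Hvide blodlegemer beskytter kroppen mod [MASK]")`, which reuses the widget sentence from this card's metadata.
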

### Citing

```
@inproceedings{pedersen-etal-2023-meda,
    title = "{M}e{D}a-{BERT}: A medical {D}anish pretrained transformer model",
    author = "Pedersen, Jannik  and
      Laursen, Martin  and
      Vinholt, Pernille  and
      Savarimuthu, Thiusius Rajeeth",
    booktitle = "Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)",
    month = may,
    year = "2023",
    address = "T{\'o}rshavn, Faroe Islands",
    publisher = "University of Tartu Library",
    url = "https://aclanthology.org/2023.nodalida-1.31",
    pages = "301--307",
}
```