---
license: cc-by-nc-3.0
language:
- da
tags:
- word embeddings
- Danish
---
# Danish medical word embeddings

MeDa-We is a set of 300-dimensional word embeddings trained with [FastText](https://fasttext.cc/) on a Danish medical corpus of 123M tokens.

The embeddings were trained for 10 epochs using a window size of 5 and 10 negative samples.
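
As a rough illustration of how these hyperparameters map onto the fastText command-line tool, the sketch below assembles a training command. The file names `corpus.txt` and `meda-we` are hypothetical placeholders, and the choice of `skipgram` over `cbow` is an assumption, since this card does not state which architecture was used. It is not the authors' actual training script.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FastTextConfig:
    """Hyperparameters stated on this model card."""
    dim: int = 300   # embedding dimensionality
    epoch: int = 10  # number of training epochs
    ws: int = 5      # context window size
    neg: int = 10    # number of negative samples


def build_command(input_path, output_prefix, model="skipgram", config=FastTextConfig()):
    """Return the fastText CLI invocation as a list of arguments.

    `model` is an assumption: the card does not say whether skipgram
    or cbow was used. The paths are hypothetical placeholders.
    """
    if model not in ("skipgram", "cbow"):
        raise ValueError("model must be 'skipgram' or 'cbow'")
    return [
        "fasttext", model,
        "-input", input_path,
        "-output", output_prefix,
        "-dim", str(config.dim),
        "-epoch", str(config.epoch),
        "-ws", str(config.ws),
        "-neg", str(config.neg),
    ]


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch works
    # without fastText installed or a corpus on disk.
    print(" ".join(build_command("corpus.txt", "meda-we")))
```

The returned list can be passed to `subprocess.run` on a machine where the `fasttext` binary and a tokenized corpus are available.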

The development of the corpus and word embeddings is described further in our [paper](https://aclanthology.org/2023.nodalida-1.31/). 

We also trained a transformer model, [MeDa-BERT](https://huggingface.co/jannikskytt/MeDa-Bert), on the same corpus.

### Citing

```
@inproceedings{pedersen-etal-2023-meda,
    title = "{M}e{D}a-{BERT}: A medical {D}anish pretrained transformer model",
    author = "Pedersen, Jannik  and
      Laursen, Martin  and
      Vinholt, Pernille  and
      Savarimuthu, Thiusius Rajeeth",
    booktitle = "Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)",
    month = may,
    year = "2023",
    address = "T{\'o}rshavn, Faroe Islands",
    publisher = "University of Tartu Library",
    url = "https://aclanthology.org/2023.nodalida-1.31",
    pages = "301--307",
}
```