File size: 1,215 Bytes
40c9ee1 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 |
---
license: cc-by-nc-3.0
language:
- da
tags:
- word embeddings
- Danish
---
# Danish medical word embeddings
MeDa-We was trained on a Danish medical corpus of 123M tokens. The word embeddings are 300-dimensional and are trained using [FastText](https://fasttext.cc/).
The embeddings were trained for 10 epochs using a window size of 5 and 10 negative samples.
The development of the corpus and word embeddings is described further in our [paper](https://aclanthology.org/2023.nodalida-1.31/).
We also trained a transformer model on the developed corpus which can be found [here](https://huggingface.co/jannikskytt/MeDa-Bert).
### Citing
```
@inproceedings{pedersen-etal-2023-meda,
title = "{M}e{D}a-{BERT}: A medical {D}anish pretrained transformer model",
author = "Pedersen, Jannik and
Laursen, Martin and
Vinholt, Pernille and
Savarimuthu, Thiusius Rajeeth",
booktitle = "Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)",
month = may,
year = "2023",
address = "T{\'o}rshavn, Faroe Islands",
publisher = "University of Tartu Library",
url = "https://aclanthology.org/2023.nodalida-1.31",
pages = "301--307",
}
``` |