---
license: cc-by-nc-3.0
language:
- da
tags:
- word embeddings
- Danish
---
# Danish medical word embeddings

MeDa-We is a set of word embeddings trained on a Danish medical corpus of 123M tokens. The embeddings are 300-dimensional and were trained with [FastText](https://fasttext.cc/).
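As a rough illustration of how such embeddings might be queried, the sketch below compares two words by cosine similarity. It assumes the vectors are available locally as a FastText binary model and that the `fasttext` Python package is installed; the path `meda-we.bin` and the helper names are hypothetical and not taken from this repository.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # similarity is undefined for a zero vector; report 0.0
    return dot / (norm_u * norm_v)


def word_similarity(model_path, word_a, word_b):
    """Load a FastText binary model and compare two words.

    model_path is a hypothetical local file, e.g. "meda-we.bin".
    """
    import fasttext  # imported lazily; requires `pip install fasttext`

    model = fasttext.load_model(model_path)
    return cosine_similarity(
        model.get_word_vector(word_a),
        model.get_word_vector(word_b),
    )
```

Calling `word_similarity("meda-we.bin", "hjerte", "lunge")` would then return a value between -1 and 1, with higher values indicating that the two words occur in more similar contexts in the corpus.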

The embeddings were trained for 10 epochs using a window size of 5 and 10 negative samples.
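The settings above map onto the `fasttext` Python package roughly as sketched below. This is a minimal sketch, not the authors' training script: the corpus path is hypothetical, and the choice between `skipgram` and `cbow` is an assumption, since the model type is not stated here.

```python
# Hyperparameters stated in this model card.
TRAINING_PARAMS = {
    "dim": 300,   # embedding dimensionality
    "ws": 5,      # context window size
    "epoch": 10,  # number of training epochs
    "neg": 10,    # number of negative samples
}


def train_embeddings(corpus_path, model_type="skipgram"):
    """Train FastText embeddings with the settings listed above.

    corpus_path: plain-text file with the corpus (hypothetical path).
    model_type: "skipgram" or "cbow"; an assumption, not stated in the card.
    """
    import fasttext  # imported lazily; requires `pip install fasttext`

    return fasttext.train_unsupervised(
        corpus_path, model=model_type, **TRAINING_PARAMS
    )
```

For example, `train_embeddings("corpus.txt").save_model("embeddings.bin")` would train on a local text file and write the result as a FastText binary.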

The development of the corpus and word embeddings is described further in our [paper](https://aclanthology.org/2023.nodalida-1.31/).

We also trained a transformer model, [MeDa-BERT](https://huggingface.co/indsigt-ai/MeDa-BERT), on the same corpus.

### Citing

```
@inproceedings{pedersen-etal-2023-meda,
    title = "{M}e{D}a-{BERT}: A medical {D}anish pretrained transformer model",
    author = "Pedersen, Jannik  and
      Laursen, Martin  and
      Vinholt, Pernille  and
      Savarimuthu, Thiusius Rajeeth",
    booktitle = "Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)",
    month = may,
    year = "2023",
    address = "T{\'o}rshavn, Faroe Islands",
    publisher = "University of Tartu Library",
    url = "https://aclanthology.org/2023.nodalida-1.31",
    pages = "301--307",
}
```