language: da
tags:
- danish
- bert
- masked-lm
- botxo
license: cc-by-4.0
datasets:
- common_crawl
- wikipedia
- dindebat.dk
- hestenettet.dk
- danish_OpenSubtitles
widget:
- text: Chili Jensen, som bor på Danmarksgade 12, køber chilifrugter fra Netto.
Danish BERT (version 2, uncased) by Certainly (previously known as BotXO), fine-tuned for named entity recognition on the DaNE dataset (Hvingelby et al., 2020) by Malte Højmark-Bertelsen.
A great deal of credit goes to Certainly (previously known as BotXO) for pretraining Danish BERT. For data and training details, see their GitHub repository or this article. You can also visit their organization page on Hugging Face.
The model is available in both TensorFlow and PyTorch formats. The original TensorFlow version can be downloaded using this link.
Here is an example of how to load Danish BERT for token classification in PyTorch using the 🤗 Transformers library:
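from transformers import AutoTokenizer, AutoModelForTokenClassification
# Load the tokenizer that matches the fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained("Maltehb/danish-bert-botxo-ner-dane")
# Load the BERT weights together with the token-classification (NER) head
model = AutoModelForTokenClassification.from_pretrained("Maltehb/danish-bert-botxo-ner-dane")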
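Once the model is loaded, a quick way to try it is the Transformers `pipeline` API. The following is a minimal sketch, not the author's own evaluation code: the helper names `to_pairs` and `tag_sentence` are made up for illustration, and the choice of `aggregation_strategy="simple"` (which merges word pieces into whole entity spans) is an assumption about what is convenient here. The exact label names returned depend on the model's configuration.

```python
from typing import Dict, List, Tuple

MODEL_ID = "Maltehb/danish-bert-botxo-ner-dane"


def to_pairs(entities: List[Dict]) -> List[Tuple[str, str]]:
    """Reduce pipeline output dicts to (text, label) pairs.

    Hypothetical helper: it only reads the "word" and "entity_group"
    keys that the token-classification pipeline emits when an
    aggregation strategy is set.
    """
    return [(e["word"], e["entity_group"]) for e in entities]


def tag_sentence(text: str) -> List[Tuple[str, str]]:
    """Run NER on one sentence and return (text, label) pairs.

    Hypothetical helper. The import is local so that defining this
    function does not require transformers to be installed; calling it
    downloads the model from the Hugging Face Hub on first use.
    """
    from transformers import pipeline

    ner = pipeline(
        "token-classification",
        model=MODEL_ID,
        aggregation_strategy="simple",  # assumption: merge word pieces into spans
    )
    return to_pairs(ner(text))
```

Calling `tag_sentence("Chili Jensen, som bor på Danmarksgade 12, køber chilifrugter fra Netto.")` should return a list of entity spans with their predicted labels. Because the model is uncased, the returned text is likely to be lowercased.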
References
Danish BERT. (2020). BotXO. https://github.com/botxo/nordic_bert (Original work published 2019)
Hvingelby, R., Pauli, A. B., Barrett, M., Rosted, C., Lidegaard, L. M., & Søgaard, A. (2020). DaNE: A Named Entity Resource for Danish. Proceedings of the 12th Language Resources and Evaluation Conference, 4597–4604. https://www.aclweb.org/anthology/2020.lrec-1.565
Contact
For help or further information, feel free to contact the author, Malte Højmark-Bertelsen, at hjb@kmd.dk or on any of the following platforms: