|
--- |
|
tags: |
|
- spacy |
|
- dacy |
|
- danish |
|
- token-classification |
|
- pos tagging |
|
- morphological analysis |
|
- lemmatization |
|
- dependency parsing |
|
- named entity recognition |
|
- coreference resolution |
|
- named entity linking |
|
- named entity disambiguation |
|
language: |
|
- da |
|
license: apache-2.0 |
|
model-index: |
|
- name: da_dacy_large_trf-0.2.0 |
|
results: |
|
- task: |
|
name: NER |
|
type: token-classification |
|
metrics: |
|
- name: NER Precision |
|
type: precision |
|
value: 0.8858195212 |
|
- name: NER Recall |
|
type: recall |
|
value: 0.8620071685 |
|
- name: NER F Score |
|
type: f_score |
|
value: 0.8737511353 |
|
dataset: |
|
name: DaNE |
|
split: test |
|
type: dane |
|
- task: |
|
name: TAG |
|
type: token-classification |
|
metrics: |
|
- name: TAG (XPOS) Accuracy |
|
type: accuracy |
|
value: 0.9913668347 |
|
dataset: |
|
name: UD Danish DDT |
|
split: test |
|
type: universal_dependencies |
|
config: da_ddt |
|
- task: |
|
name: POS |
|
type: token-classification |
|
metrics: |
|
- name: POS (UPOS) Accuracy |
|
type: accuracy |
|
value: 0.9908174469 |
|
dataset: |
|
name: UD Danish DDT |
|
split: test |
|
type: universal_dependencies |
|
config: da_ddt |
|
- task: |
|
name: MORPH |
|
type: token-classification |
|
metrics: |
|
- name: Morph (UFeats) Accuracy |
|
type: accuracy |
|
value: 0.9880227568 |
|
dataset: |
|
name: UD Danish DDT |
|
split: test |
|
type: universal_dependencies |
|
config: da_ddt |
|
- task: |
|
name: LEMMA |
|
type: token-classification |
|
metrics: |
|
- name: Lemma Accuracy |
|
type: accuracy |
|
value: 0.9589423796 |
|
dataset: |
|
name: UD Danish DDT |
|
split: test |
|
type: universal_dependencies |
|
config: da_ddt |
|
- task: |
|
name: UNLABELED_DEPENDENCIES |
|
type: token-classification |
|
metrics: |
|
- name: Unlabeled Attachment Score (UAS) |
|
type: f_score |
|
value: 0.9280885781 |
|
dataset: |
|
name: UD Danish DDT |
|
split: test |
|
type: universal_dependencies |
|
config: da_ddt |
|
- task: |
|
name: LABELED_DEPENDENCIES |
|
type: token-classification |
|
metrics: |
|
- name: Labeled Attachment Score (LAS) |
|
type: f_score |
|
value: 0.9079997669 |
|
dataset: |
|
name: UD Danish DDT |
|
split: test |
|
type: universal_dependencies |
|
config: da_ddt |
|
- task: |
|
name: SENTS |
|
type: token-classification |
|
metrics: |
|
- name: Sentences F-Score |
|
type: f_score |
|
value: 1.0 |
|
dataset: |
|
name: UD Danish DDT |
|
split: test |
|
type: universal_dependencies |
|
config: da_ddt |
|
- task: |
|
name: coreference-resolution |
|
type: coreference-resolution |
|
metrics: |
|
- name: LEA |
|
type: f_score |
|
value: 0.4672143289 |
|
dataset: |
|
name: DaCoref |
|
type: alexandrainst/dacoref |
|
split: custom |
|
- task: |
|
name: coreference-resolution |
|
type: coreference-resolution |
|
metrics: |
|
- name: Named entity Linking Precision |
|
type: precision |
|
value: 0.84 |
|
- name: Named entity Linking Recall |
|
type: recall |
|
value: 0.2153846154 |
|
- name: Named entity Linking F Score |
|
type: f_score |
|
value: 0.3428571429 |
|
dataset: |
|
name: DaNED |
|
type: named-entity-linking |
|
split: custom |
|
library_name: spacy |
|
datasets: |
|
- universal_dependencies |
|
- dane |
|
- alexandrainst/dacoref |
|
metrics: |
|
- accuracy |
|
--- |
|
|
|
<a href="https://github.com/centre-for-humanities-computing/Dacy"><img src="https://centre-for-humanities-computing.github.io/DaCy/_static/icon.png" width="175" height="175" align="right" /></a> |
|
|
|
# DaCy large |
|
|
|
DaCy is a Danish language processing framework with state-of-the-art pipelines as well as functionality for analysing Danish pipelines. |
|
DaCy's largest pipeline has achieved State-of-the-Art performance on parts-of-speech tagging and dependency |
|
parsing for Danish on the Danish Dependency treebank as well as competitive performance on named entity recognition, named entity disambiguation and coreference resolution. |
|
To read more check out the [DaCy repository](https://github.com/centre-for-humanities-computing/DaCy) for material on how to use DaCy and reproduce the results. |
|
DaCy also contains guides on usage of the package as well as behavioural test for biases and robustness of Danish NLP pipelines. |
|
|
|
|
|
| Feature | Description | |
|
| --- | --- | |
|
| **Name** | `da_dacy_large_trf` | |
|
| **Version** | `0.2.0` | |
|
| **spaCy** | `>=3.5.2,<3.6.0` | |
|
| **Default Pipeline** | `transformer`, `tagger`, `morphologizer`, `trainable_lemmatizer`, `parser`, `ner`, `coref`, `span_resolver`, `span_cleaner`, `entity_linker` | |
|
| **Components** | `transformer`, `tagger`, `morphologizer`, `trainable_lemmatizer`, `parser`, `ner`, `coref`, `span_resolver`, `span_cleaner`, `entity_linker` | |
|
| **Vectors** | 0 keys, 0 unique vectors (0 dimensions) | |
|
| **Sources** | [UD Danish DDT v2.11](https://github.com/UniversalDependencies/UD_Danish-DDT) (Johannsen, Anders; Martínez Alonso, Héctor; Plank, Barbara)<br />[DaNE](https://huggingface.co/datasets/dane) (Rasmus Hvingelby, Amalie B. Pauli, Maria Barrett, Christina Rosted, Lasse M. Lidegaard, Anders Søgaard)<br />[DaCoref](https://huggingface.co/datasets/alexandrainst/dacoref) (Buch-Kromann, Matthias)<br />[DaNED](https://danlp-alexandra.readthedocs.io/en/stable/docs/datasets.html#daned) (Barrett, M. J., Lam, H., Wu, M., Lacroix, O., Plank, B., & Søgaard, A.)<br />[chcaa/dfm-encoder-large-v1](https://huggingface.co/chcaa/dfm-encoder-large-v1) (The Danish Foundation Models team) | |
|
| **License** | `Apache-2.0` | |
|
| **Author** | [Kenneth Enevoldsen](https://chcaa.io/#/) | |
|
|
|
### Label Scheme |
|
|
|
<details> |
|
|
|
<summary>View label scheme (211 labels for 4 components)</summary> |
|
|
|
| Component | Labels | |
|
| --- | --- | |
|
| **`tagger`** | `ADJ`, `ADP`, `ADV`, `AUX`, `CCONJ`, `DET`, `INTJ`, `NOUN`, `NUM`, `PART`, `PRON`, `PROPN`, `PUNCT`, `SCONJ`, `SYM`, `VERB`, `X` | |
|
| **`morphologizer`** | `AdpType=Prep\|POS=ADP`, `Definite=Ind\|Gender=Com\|Number=Sing\|POS=NOUN`, `Mood=Ind\|POS=AUX\|Tense=Pres\|VerbForm=Fin\|Voice=Act`, `POS=PROPN`, `Definite=Ind\|Number=Sing\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Def\|Gender=Neut\|Number=Sing\|POS=NOUN`, `POS=SCONJ`, `Definite=Def\|Gender=Com\|Number=Sing\|POS=NOUN`, `Mood=Ind\|POS=VERB\|Tense=Pres\|VerbForm=Fin\|Voice=Act`, `POS=ADV`, `Number=Plur\|POS=DET\|PronType=Dem`, `Degree=Pos\|Number=Plur\|POS=ADJ`, `Definite=Ind\|Gender=Com\|Number=Plur\|POS=NOUN`, `POS=PUNCT`, `NumType=Ord\|POS=ADJ`, `POS=CCONJ`, `Definite=Ind\|Gender=Neut\|Number=Plur\|POS=NOUN`, `POS=VERB\|VerbForm=Inf\|Voice=Act`, `Case=Acc\|Gender=Neut\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Degree=Sup\|POS=ADV`, `Degree=Pos\|POS=ADV`, `Gender=Com\|Number=Sing\|POS=DET\|PronType=Ind`, `Number=Plur\|POS=DET\|PronType=Ind`, `POS=ADP`, `POS=ADV\|PartType=Inf`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Mood=Ind\|POS=AUX\|Tense=Past\|VerbForm=Fin\|Voice=Act`, `Definite=Def\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs`, `Mood=Ind\|POS=VERB\|Tense=Past\|VerbForm=Fin\|Voice=Act`, `POS=ADP\|PartType=Inf`, `Definite=Ind\|Degree=Pos\|Gender=Com\|Number=Sing\|POS=ADJ`, `NumType=Card\|POS=NUM`, `Degree=Pos\|POS=ADJ`, `Definite=Ind\|Number=Sing\|POS=AUX\|Tense=Past\|VerbForm=Part`, `POS=PART\|PartType=Inf`, `Case=Acc\|POS=PRON\|Person=3\|PronType=Prs\|Reflex=Yes`, `Definite=Def\|Gender=Com\|Number=Plur\|POS=NOUN`, `Definite=Ind\|Gender=Neut\|Number=Sing\|POS=NOUN`, `Number[psor]=Plur\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs`, `POS=VERB\|Tense=Pres\|VerbForm=Part`, `Case=Nom\|Number=Plur\|POS=PRON\|Person=3\|PronType=Prs`, `Case=Gen\|Definite=Def\|Gender=Com\|Number=Sing\|POS=NOUN`, `Definite=Def\|Degree=Sup\|Number=Plur\|POS=ADJ`, `Case=Acc\|Number=Plur\|POS=PRON\|Person=3\|PronType=Prs`, `POS=AUX\|VerbForm=Inf\|Voice=Act`, `Definite=Ind\|Degree=Pos\|Gender=Neut\|Number=Sing\|POS=ADJ`, `Definite=Ind\|Degree=Cmp\|Number=Sing\|POS=ADJ`, `Degree=Cmp\|POS=ADJ`, `POS=PRON\|PartType=Inf`, `Definite=Ind\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Case=Nom\|Gender=Com\|POS=PRON\|PronType=Ind`, `Number=Plur\|POS=PRON\|PronType=Ind`, `POS=INTJ`, `Gender=Com\|Number=Sing\|POS=DET\|PronType=Dem`, `Case=Gen\|Number=Plur\|POS=DET\|PronType=Ind`, `Mood=Ind\|POS=VERB\|Tense=Pres\|VerbForm=Fin\|Voice=Pass`, `Definite=Def\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Degree=Cmp\|POS=ADV`, `Number=Plur\|Number[psor]=Plur\|POS=PRON\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Case=Gen\|POS=PROPN`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Ind`, `Number=Plur\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=1\|PronType=Prs`, `Definite=Def\|Degree=Sup\|POS=ADJ`, `Gender=Neut\|Number=Sing\|POS=DET\|PronType=Ind`, `Case=Gen\|Definite=Ind\|Gender=Neut\|Number=Sing\|POS=NOUN`, `Gender=Neut\|Number=Sing\|POS=DET\|PronType=Dem`, `Definite=Def\|Number=Sing\|POS=VERB\|Tense=Past\|VerbForm=Part`, `POS=PRON\|PronType=Dem`, `Degree=Pos\|Gender=Com\|Number=Sing\|POS=ADJ`, `Number=Plur\|POS=NUM`, `POS=VERB\|VerbForm=Inf\|Voice=Pass`, `Definite=Def\|Degree=Sup\|Number=Sing\|POS=ADJ`, `Number=Sing\|POS=PRON\|PronType=Int,Rel`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=1\|PronType=Prs`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `POS=PRON`, `Definite=Ind\|Number=Sing\|POS=NOUN`, `Definite=Ind\|Number=Sing\|POS=NUM`, `Case=Gen\|Definite=Ind\|Gender=Com\|Number=Sing\|POS=NOUN`, `Foreign=Yes\|POS=ADV`, `POS=NOUN`, `Case=Gen\|Definite=Def\|Gender=Neut\|Number=Sing\|POS=NOUN`, `Gender=Com\|Number=Plur\|POS=NOUN`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Int,Rel`, `Case=Nom\|Gender=Com\|Number=Plur\|POS=PRON\|Person=1\|PronType=Prs`, `Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Ind`, `Case=Gen\|Definite=Ind\|Gender=Com\|Number=Plur\|POS=NOUN`, `Degree=Pos\|Gender=Neut\|Number=Sing\|POS=ADJ`, `Degree=Sup\|POS=ADJ`, `Degree=Pos\|Number=Sing\|POS=ADJ`, `Mood=Imp\|POS=VERB`, `Case=Nom\|Gender=Com\|POS=PRON\|Person=2\|Polite=Form\|PronType=Prs`, `Case=Acc\|Gender=Com\|POS=PRON\|Person=2\|Polite=Form\|PronType=Prs`, `POS=X`, `Case=Gen\|Definite=Def\|Gender=Com\|Number=Plur\|POS=NOUN`, `Number=Plur\|POS=PRON\|PronType=Dem`, `Case=Acc\|Gender=Com\|Number=Plur\|POS=PRON\|Person=1\|PronType=Prs`, `Number=Plur\|POS=PRON\|PronType=Int,Rel`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Degree=Cmp\|Number=Plur\|POS=ADJ`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Gender=Com\|Number=Sing\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=2\|PronType=Prs`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=2\|PronType=Prs`, `Gender=Com\|POS=PRON\|PronType=Int,Rel`, `Case=Gen\|Degree=Pos\|Number=Plur\|POS=ADJ`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `POS=VERB\|VerbForm=Ger`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Dem`, `Case=Gen\|POS=PRON\|PronType=Int,Rel`, `Mood=Ind\|POS=VERB\|Tense=Past\|VerbForm=Fin\|Voice=Pass`, `Abbr=Yes\|POS=X`, `Case=Gen\|Definite=Ind\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `Definite=Ind\|Number=Plur\|POS=NOUN`, `Foreign=Yes\|POS=X`, `Number=Plur\|POS=PRON\|PronType=Rcp`, `Case=Nom\|Gender=Com\|Number=Plur\|POS=PRON\|Person=2\|PronType=Prs`, `Case=Gen\|Degree=Cmp\|POS=ADJ`, `Case=Gen\|Definite=Def\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Case=Acc\|Gender=Com\|Number=Plur\|POS=PRON\|Person=2\|PronType=Prs`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Dem`, `Number=Plur\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Gender=Neut\|Number=Sing\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Number=Plur\|Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs`, `Case=Gen\|Number=Plur\|POS=PRON\|PronType=Rcp`, `POS=DET\|Person=2\|Polite=Form\|Poss=Yes\|PronType=Prs`, `POS=SYM`, `POS=DET\|PronType=Dem`, `Gender=Com\|Number=Sing\|POS=NUM`, `Number[psor]=Plur\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `Case=Gen\|Number=Plur\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Def\|Degree=Abs\|POS=ADJ`, `POS=VERB\|Tense=Pres`, `Definite=Ind\|Gender=Neut\|Number=Sing\|POS=NUM`, `Degree=Abs\|POS=ADV`, `Case=Gen\|Definite=Def\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Int,Rel`, `POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Ind\|Degree=Sup\|Number=Sing\|POS=ADJ`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=PRON\|Person=1\|Poss=Yes\|PronType=Prs`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `Number[psor]=Plur\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs`, `Definite=Ind\|POS=NOUN`, `Case=Gen\|Gender=Com\|Number=Sing\|POS=DET\|PronType=Ind`, `Definite=Ind\|Gender=Com\|Number=Sing\|POS=NUM`, `Definite=Def\|Number=Plur\|POS=NOUN`, `Case=Gen\|POS=NOUN`, `POS=AUX\|Tense=Pres\|VerbForm=Part` | |
|
| **`parser`** | `ROOT`, `acl:relcl`, `advcl`, `advmod`, `advmod:lmod`, `amod`, `appos`, `aux`, `case`, `cc`, `ccomp`, `compound:prt`, `conj`, `cop`, `dep`, `det`, `expl`, `fixed`, `flat`, `iobj`, `list`, `mark`, `nmod`, `nmod:poss`, `nsubj`, `nummod`, `obj`, `obl`, `obl:lmod`, `obl:tmod`, `punct`, `xcomp` | |
|
| **`ner`** | `LOC`, `MISC`, `ORG`, `PER` | |
|
|
|
</details> |
|
|
|
### Accuracy |
|
|
|
| Type | Score | |
|
| --- | --- | |
|
| `TOKEN_ACC` | 99.92 | |
|
| `TOKEN_P` | 99.70 | |
|
| `TOKEN_R` | 99.77 | |
|
| `TOKEN_F` | 99.74 | |
|
| `SENTS_P` | 100.00 | |
|
| `SENTS_R` | 100.00 | |
|
| `SENTS_F` | 100.00 | |
|
| `TAG_ACC` | 99.14 | |
|
| `POS_ACC` | 99.08 | |
|
| `MORPH_ACC` | 98.80 | |
|
| `MORPH_MICRO_P` | 99.45 | |
|
| `MORPH_MICRO_R` | 99.32 | |
|
| `MORPH_MICRO_F` | 99.39 | |
|
| `DEP_UAS` | 92.81 | |
|
| `DEP_LAS` | 90.80 | |
|
| `ENTS_P` | 88.58 | |
|
| `ENTS_R` | 86.20 | |
|
| `ENTS_F` | 87.38 | |
|
| `LEMMA_ACC` | 95.89 | |
|
| `COREF_LEA_F1` | 46.72 | |
|
| `COREF_LEA_PRECISION` | 45.91 | |
|
| `COREF_LEA_RECALL` | 47.56 | |
|
| `NEL_SCORE` | 34.29 | |
|
| `NEL_MICRO_P` | 84.00 | |
|
| `NEL_MICRO_R` | 21.54 | |
|
| `NEL_MICRO_F` | 34.29 | |
|
| `NEL_MACRO_P` | 86.71 | |
|
| `NEL_MACRO_R` | 24.70 | |
|
| `NEL_MACRO_F` | 37.28 | |
|
|
|
|
|
|
|
### Training |
|
This model was trained using [spaCy](https://spacy.io) and logged to [Weights & Biases](https://wandb.ai/kenevoldsen/dacy-v0.2.0). You can find all the training logs [here](https://wandb.ai/kenevoldsen/dacy-v0.2.0). |