da_dacy_small_trf / README.md
KennethEnevoldsen's picture
Updated to version v0.2.0
0eadea0
|
raw
history blame
15.1 kB
---
tags:
- spacy
- dacy
- danish
- token-classification
- pos tagging
- morphological analysis
- lemmatization
- dependency parsing
- named entity recognition
- coreference resolution
- named entity linking
- named entity disambiguation
language:
- da
license: apache-2.0
model-index:
- name: da_dacy_small_trf-0.2.0
results:
- task:
name: NER
type: token-classification
metrics:
- name: NER Precision
type: precision
value: 0.8306010929
- name: NER Recall
type: recall
value: 0.8172043011
- name: NER F Score
type: f_score
value: 0.8238482385
dataset:
name: DaNE
split: test
type: dane
- task:
name: TAG
type: token-classification
metrics:
- name: TAG (XPOS) Accuracy
type: accuracy
value: 0.9846798742
dataset:
name: UD Danish DDT
split: test
type: universal_dependencies
config: da_ddt
- task:
name: POS
type: token-classification
metrics:
- name: POS (UPOS) Accuracy
type: accuracy
value: 0.9842315369
dataset:
name: UD Danish DDT
split: test
type: universal_dependencies
config: da_ddt
- task:
name: MORPH
type: token-classification
metrics:
- name: Morph (UFeats) Accuracy
type: accuracy
value: 0.9772942762
dataset:
name: UD Danish DDT
split: test
type: universal_dependencies
config: da_ddt
- task:
name: LEMMA
type: token-classification
metrics:
- name: Lemma Accuracy
type: accuracy
value: 0.9466699925
dataset:
name: UD Danish DDT
split: test
type: universal_dependencies
config: da_ddt
- task:
name: UNLABELED_DEPENDENCIES
type: token-classification
metrics:
- name: Unlabeled Attachment Score (UAS)
type: f_score
value: 0.8978522787
dataset:
name: UD Danish DDT
split: test
type: universal_dependencies
config: da_ddt
- task:
name: LABELED_DEPENDENCIES
type: token-classification
metrics:
- name: Labeled Attachment Score (LAS)
type: f_score
value: 0.8701623698
dataset:
name: UD Danish DDT
split: test
type: universal_dependencies
config: da_ddt
- task:
name: SENTS
type: token-classification
metrics:
- name: Sentences F-Score
type: f_score
value: 0.9433304272
dataset:
name: UD Danish DDT
split: test
type: universal_dependencies
config: da_ddt
- task:
name: coreference-resolution
type: coreference-resolution
metrics:
- name: LEA
type: f_score
value: 0.4218334451
dataset:
name: DaCoref
type: alexandrainst/dacoref
split: custom
- task:
name: coreference-resolution
type: coreference-resolution
metrics:
- name: Named entity Linking Precision
type: precision
value: 0.8461538462
- name: Named entity Linking Recall
type: recall
value: 0.2222222222
- name: Named entity Linking F Score
type: f_score
value: 0.352
dataset:
name: DaNED
type: named-entity-linking
split: custom
library_name: spacy
datasets:
- universal_dependencies
- dane
- alexandrainst/dacoref
metrics:
- accuracy
---
<a href="https://github.com/centre-for-humanities-computing/Dacy"><img src="https://centre-for-humanities-computing.github.io/DaCy/_static/icon.png" width="175" height="175" align="right" /></a>
# DaCy small
DaCy is a Danish language processing framework with state-of-the-art pipelines as well as functionality for analysing Danish pipelines.
DaCy's largest pipeline has achieved State-of-the-Art performance on parts-of-speech tagging and dependency
parsing for Danish on the Danish Dependency treebank as well as competitive performance on named entity recognition, named entity disambiguation and coreference resolution.
To read more check out the [DaCy repository](https://github.com/centre-for-humanities-computing/DaCy) for material on how to use DaCy and reproduce the results.
DaCy also contains guides on usage of the package as well as behavioural test for biases and robustness of Danish NLP pipelines.
| Feature | Description |
| --- | --- |
| **Name** | `da_dacy_small_trf` |
| **Version** | `0.2.0` |
| **spaCy** | `>=3.5.2,<3.6.0` |
| **Default Pipeline** | `transformer`, `tagger`, `morphologizer`, `trainable_lemmatizer`, `parser`, `ner`, `coref`, `span_resolver`, `span_cleaner`, `entity_linker` |
| **Components** | `transformer`, `tagger`, `morphologizer`, `trainable_lemmatizer`, `parser`, `ner`, `coref`, `span_resolver`, `span_cleaner`, `entity_linker` |
| **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
| **Sources** | [UD Danish DDT v2.11](https://github.com/UniversalDependencies/UD_Danish-DDT) (Johannsen, Anders; Martínez Alonso, Héctor; Plank, Barbara)<br />[DaNE](https://huggingface.co/datasets/dane) (Rasmus Hvingelby, Amalie B. Pauli, Maria Barrett, Christina Rosted, Lasse M. Lidegaard, Anders Søgaard)<br />[DaCoref](https://huggingface.co/datasets/alexandrainst/dacoref) (Buch-Kromann, Matthias)<br />[DaNED](https://danlp-alexandra.readthedocs.io/en/stable/docs/datasets.html#daned) (Barrett, M. J., Lam, H., Wu, M., Lacroix, O., Plank, B., & Søgaard, A.)<br />[jonfd/electra-small-nordic](https://huggingface.co/jonfd/electra-small-nordic) (Jón Friðrik Daðason) |
| **License** | `Apache-2.0` |
| **Author** | [Kenneth Enevoldsen](https://chcaa.io/#/) |
### Label Scheme
<details>
<summary>View label scheme (211 labels for 4 components)</summary>
| Component | Labels |
| --- | --- |
| **`tagger`** | `ADJ`, `ADP`, `ADV`, `AUX`, `CCONJ`, `DET`, `INTJ`, `NOUN`, `NUM`, `PART`, `PRON`, `PROPN`, `PUNCT`, `SCONJ`, `SYM`, `VERB`, `X` |
| **`morphologizer`** | `AdpType=Prep\|POS=ADP`, `Definite=Ind\|Gender=Com\|Number=Sing\|POS=NOUN`, `Mood=Ind\|POS=AUX\|Tense=Pres\|VerbForm=Fin\|Voice=Act`, `POS=PROPN`, `Definite=Ind\|Number=Sing\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Def\|Gender=Neut\|Number=Sing\|POS=NOUN`, `POS=SCONJ`, `Definite=Def\|Gender=Com\|Number=Sing\|POS=NOUN`, `Mood=Ind\|POS=VERB\|Tense=Pres\|VerbForm=Fin\|Voice=Act`, `POS=ADV`, `Number=Plur\|POS=DET\|PronType=Dem`, `Degree=Pos\|Number=Plur\|POS=ADJ`, `Definite=Ind\|Gender=Com\|Number=Plur\|POS=NOUN`, `POS=PUNCT`, `NumType=Ord\|POS=ADJ`, `POS=CCONJ`, `Definite=Ind\|Gender=Neut\|Number=Plur\|POS=NOUN`, `POS=VERB\|VerbForm=Inf\|Voice=Act`, `Case=Acc\|Gender=Neut\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Degree=Sup\|POS=ADV`, `Degree=Pos\|POS=ADV`, `Gender=Com\|Number=Sing\|POS=DET\|PronType=Ind`, `Number=Plur\|POS=DET\|PronType=Ind`, `POS=ADP`, `POS=ADV\|PartType=Inf`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Mood=Ind\|POS=AUX\|Tense=Past\|VerbForm=Fin\|Voice=Act`, `Definite=Def\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs`, `Mood=Ind\|POS=VERB\|Tense=Past\|VerbForm=Fin\|Voice=Act`, `POS=ADP\|PartType=Inf`, `Definite=Ind\|Degree=Pos\|Gender=Com\|Number=Sing\|POS=ADJ`, `NumType=Card\|POS=NUM`, `Degree=Pos\|POS=ADJ`, `Definite=Ind\|Number=Sing\|POS=AUX\|Tense=Past\|VerbForm=Part`, `POS=PART\|PartType=Inf`, `Case=Acc\|POS=PRON\|Person=3\|PronType=Prs\|Reflex=Yes`, `Definite=Def\|Gender=Com\|Number=Plur\|POS=NOUN`, `Definite=Ind\|Gender=Neut\|Number=Sing\|POS=NOUN`, `Number[psor]=Plur\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs`, `POS=VERB\|Tense=Pres\|VerbForm=Part`, `Case=Nom\|Number=Plur\|POS=PRON\|Person=3\|PronType=Prs`, `Case=Gen\|Definite=Def\|Gender=Com\|Number=Sing\|POS=NOUN`, `Definite=Def\|Degree=Sup\|Number=Plur\|POS=ADJ`, `Case=Acc\|Number=Plur\|POS=PRON\|Person=3\|PronType=Prs`, `POS=AUX\|VerbForm=Inf\|Voice=Act`, `Definite=Ind\|Degree=Pos\|Gender=Neut\|Number=Sing\|POS=ADJ`, `Definite=Ind\|Degree=Cmp\|Number=Sing\|POS=ADJ`, `Degree=Cmp\|POS=ADJ`, `POS=PRON\|PartType=Inf`, `Definite=Ind\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Case=Nom\|Gender=Com\|POS=PRON\|PronType=Ind`, `Number=Plur\|POS=PRON\|PronType=Ind`, `POS=INTJ`, `Gender=Com\|Number=Sing\|POS=DET\|PronType=Dem`, `Case=Gen\|Number=Plur\|POS=DET\|PronType=Ind`, `Mood=Ind\|POS=VERB\|Tense=Pres\|VerbForm=Fin\|Voice=Pass`, `Definite=Def\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Degree=Cmp\|POS=ADV`, `Number=Plur\|Number[psor]=Plur\|POS=PRON\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=3\|PronType=Prs`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Case=Gen\|POS=PROPN`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Ind`, `Number=Plur\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=1\|PronType=Prs`, `Definite=Def\|Degree=Sup\|POS=ADJ`, `Gender=Neut\|Number=Sing\|POS=DET\|PronType=Ind`, `Case=Gen\|Definite=Ind\|Gender=Neut\|Number=Sing\|POS=NOUN`, `Gender=Neut\|Number=Sing\|POS=DET\|PronType=Dem`, `Definite=Def\|Number=Sing\|POS=VERB\|Tense=Past\|VerbForm=Part`, `POS=PRON\|PronType=Dem`, `Degree=Pos\|Gender=Com\|Number=Sing\|POS=ADJ`, `Number=Plur\|POS=NUM`, `POS=VERB\|VerbForm=Inf\|Voice=Pass`, `Definite=Def\|Degree=Sup\|Number=Sing\|POS=ADJ`, `Number=Sing\|POS=PRON\|PronType=Int,Rel`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=1\|PronType=Prs`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `POS=PRON`, `Definite=Ind\|Number=Sing\|POS=NOUN`, `Definite=Ind\|Number=Sing\|POS=NUM`, `Case=Gen\|Definite=Ind\|Gender=Com\|Number=Sing\|POS=NOUN`, `Foreign=Yes\|POS=ADV`, `POS=NOUN`, `Case=Gen\|Definite=Def\|Gender=Neut\|Number=Sing\|POS=NOUN`, `Gender=Com\|Number=Plur\|POS=NOUN`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Int,Rel`, `Case=Nom\|Gender=Com\|Number=Plur\|POS=PRON\|Person=1\|PronType=Prs`, `Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Ind`, `Case=Gen\|Definite=Ind\|Gender=Com\|Number=Plur\|POS=NOUN`, `Degree=Pos\|Gender=Neut\|Number=Sing\|POS=ADJ`, `Degree=Sup\|POS=ADJ`, `Degree=Pos\|Number=Sing\|POS=ADJ`, `Mood=Imp\|POS=VERB`, `Case=Nom\|Gender=Com\|POS=PRON\|Person=2\|Polite=Form\|PronType=Prs`, `Case=Acc\|Gender=Com\|POS=PRON\|Person=2\|Polite=Form\|PronType=Prs`, `POS=X`, `Case=Gen\|Definite=Def\|Gender=Com\|Number=Plur\|POS=NOUN`, `Number=Plur\|POS=PRON\|PronType=Dem`, `Case=Acc\|Gender=Com\|Number=Plur\|POS=PRON\|Person=1\|PronType=Prs`, `Number=Plur\|POS=PRON\|PronType=Int,Rel`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Degree=Cmp\|Number=Plur\|POS=ADJ`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs`, `Gender=Com\|Number=Sing\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Case=Nom\|Gender=Com\|Number=Sing\|POS=PRON\|Person=2\|PronType=Prs`, `Case=Acc\|Gender=Com\|Number=Sing\|POS=PRON\|Person=2\|PronType=Prs`, `Gender=Com\|POS=PRON\|PronType=Int,Rel`, `Case=Gen\|Degree=Pos\|Number=Plur\|POS=ADJ`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `POS=VERB\|VerbForm=Ger`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Dem`, `Case=Gen\|POS=PRON\|PronType=Int,Rel`, `Mood=Ind\|POS=VERB\|Tense=Past\|VerbForm=Fin\|Voice=Pass`, `Abbr=Yes\|POS=X`, `Case=Gen\|Definite=Ind\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `Definite=Ind\|Number=Plur\|POS=NOUN`, `Foreign=Yes\|POS=X`, `Number=Plur\|POS=PRON\|PronType=Rcp`, `Case=Nom\|Gender=Com\|Number=Plur\|POS=PRON\|Person=2\|PronType=Prs`, `Case=Gen\|Degree=Cmp\|POS=ADJ`, `Case=Gen\|Definite=Def\|Gender=Neut\|Number=Plur\|POS=NOUN`, `Case=Acc\|Gender=Com\|Number=Plur\|POS=PRON\|Person=2\|PronType=Prs`, `Gender=Neut\|Number=Sing\|POS=PRON\|PronType=Dem`, `Number=Plur\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Gender=Neut\|Number=Sing\|Number[psor]=Plur\|POS=DET\|Person=1\|Poss=Yes\|PronType=Prs\|Style=Form`, `Number=Plur\|Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs\|Reflex=Yes`, `Number[psor]=Sing\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs`, `Case=Gen\|Number=Plur\|POS=PRON\|PronType=Rcp`, `POS=DET\|Person=2\|Polite=Form\|Poss=Yes\|PronType=Prs`, `POS=SYM`, `POS=DET\|PronType=Dem`, `Gender=Com\|Number=Sing\|POS=NUM`, `Number[psor]=Plur\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `Case=Gen\|Number=Plur\|POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Def\|Degree=Abs\|POS=ADJ`, `POS=VERB\|Tense=Pres`, `Definite=Ind\|Gender=Neut\|Number=Sing\|POS=NUM`, `Degree=Abs\|POS=ADV`, `Case=Gen\|Definite=Def\|Degree=Pos\|Number=Sing\|POS=ADJ`, `Gender=Com\|Number=Sing\|POS=PRON\|PronType=Int,Rel`, `POS=VERB\|Tense=Past\|VerbForm=Part`, `Definite=Ind\|Degree=Sup\|Number=Sing\|POS=ADJ`, `Gender=Neut\|Number=Sing\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `Gender=Com\|Number=Sing\|Number[psor]=Sing\|POS=PRON\|Person=1\|Poss=Yes\|PronType=Prs`, `Number=Plur\|Number[psor]=Sing\|POS=DET\|Person=2\|Poss=Yes\|PronType=Prs`, `Number[psor]=Plur\|POS=PRON\|Person=3\|Poss=Yes\|PronType=Prs`, `Definite=Ind\|POS=NOUN`, `Case=Gen\|Gender=Com\|Number=Sing\|POS=DET\|PronType=Ind`, `Definite=Ind\|Gender=Com\|Number=Sing\|POS=NUM`, `Definite=Def\|Number=Plur\|POS=NOUN`, `Case=Gen\|POS=NOUN`, `POS=AUX\|Tense=Pres\|VerbForm=Part` |
| **`parser`** | `ROOT`, `acl:relcl`, `advcl`, `advmod`, `advmod:lmod`, `amod`, `appos`, `aux`, `case`, `cc`, `ccomp`, `compound:prt`, `conj`, `cop`, `dep`, `det`, `expl`, `fixed`, `flat`, `iobj`, `list`, `mark`, `nmod`, `nmod:poss`, `nsubj`, `nummod`, `obj`, `obl`, `obl:lmod`, `obl:tmod`, `punct`, `xcomp` |
| **`ner`** | `LOC`, `MISC`, `ORG`, `PER` |
</details>
### Accuracy
| Type | Score |
| --- | --- |
| `TOKEN_ACC` | 99.92 |
| `TOKEN_P` | 99.70 |
| `TOKEN_R` | 99.77 |
| `TOKEN_F` | 99.74 |
| `SENTS_P` | 92.96 |
| `SENTS_R` | 95.75 |
| `SENTS_F` | 94.33 |
| `TAG_ACC` | 98.47 |
| `POS_ACC` | 98.42 |
| `MORPH_ACC` | 97.73 |
| `MORPH_MICRO_P` | 98.94 |
| `MORPH_MICRO_R` | 98.33 |
| `MORPH_MICRO_F` | 98.64 |
| `DEP_UAS` | 89.79 |
| `DEP_LAS` | 87.02 |
| `ENTS_P` | 83.06 |
| `ENTS_R` | 81.72 |
| `ENTS_F` | 82.38 |
| `LEMMA_ACC` | 94.67 |
| `COREF_LEA_F1` | 42.18 |
| `COREF_LEA_PRECISION` | 44.79 |
| `COREF_LEA_RECALL` | 39.86 |
| `NEL_SCORE` | 35.20 |
| `NEL_MICRO_P` | 84.62 |
| `NEL_MICRO_R` | 22.22 |
| `NEL_MICRO_F` | 35.20 |
| `NEL_MACRO_P` | 87.68 |
| `NEL_MACRO_R` | 24.76 |
| `NEL_MACRO_F` | 37.52 |
### Training
This model was trained using [spaCy](https://spacy.io) and logged to [Weights & Biases](https://wandb.ai/kenevoldsen/dacy-v0.2.0). You can find all the training logs [here](https://wandb.ai/kenevoldsen/dacy-v0.2.0).