metadata
tags:
  - spacy
  - dacy
  - danish
  - token-classification
  - pos tagging
  - morphological analysis
  - lemmatization
  - dependency parsing
  - named entity recognition
  - coreference resolution
  - named entity linking
  - named entity disambiguation
language:
  - da
license: apache-2.0
model-index:
  - name: da_dacy_small_trf-0.2.0
    results:
      - task:
          name: NER
          type: token-classification
        metrics:
          - name: NER Precision
            type: precision
            value: 0.8306010929
          - name: NER Recall
            type: recall
            value: 0.8172043011
          - name: NER F Score
            type: f_score
            value: 0.8238482385
        dataset:
          name: DaNE
          split: test
          type: dane
      - task:
          name: TAG
          type: token-classification
        metrics:
          - name: TAG (XPOS) Accuracy
            type: accuracy
            value: 0.9846798742
        dataset:
          name: UD Danish DDT
          split: test
          type: universal_dependencies
          config: da_ddt
      - task:
          name: POS
          type: token-classification
        metrics:
          - name: POS (UPOS) Accuracy
            type: accuracy
            value: 0.9842315369
        dataset:
          name: UD Danish DDT
          split: test
          type: universal_dependencies
          config: da_ddt
      - task:
          name: MORPH
          type: token-classification
        metrics:
          - name: Morph (UFeats) Accuracy
            type: accuracy
            value: 0.9772942762
        dataset:
          name: UD Danish DDT
          split: test
          type: universal_dependencies
          config: da_ddt
      - task:
          name: LEMMA
          type: token-classification
        metrics:
          - name: Lemma Accuracy
            type: accuracy
            value: 0.9466699925
        dataset:
          name: UD Danish DDT
          split: test
          type: universal_dependencies
          config: da_ddt
      - task:
          name: UNLABELED_DEPENDENCIES
          type: token-classification
        metrics:
          - name: Unlabeled Attachment Score (UAS)
            type: f_score
            value: 0.8978522787
        dataset:
          name: UD Danish DDT
          split: test
          type: universal_dependencies
          config: da_ddt
      - task:
          name: LABELED_DEPENDENCIES
          type: token-classification
        metrics:
          - name: Labeled Attachment Score (LAS)
            type: f_score
            value: 0.8701623698
        dataset:
          name: UD Danish DDT
          split: test
          type: universal_dependencies
          config: da_ddt
      - task:
          name: SENTS
          type: token-classification
        metrics:
          - name: Sentences F-Score
            type: f_score
            value: 0.9433304272
        dataset:
          name: UD Danish DDT
          split: test
          type: universal_dependencies
          config: da_ddt
      - task:
          name: coreference-resolution
          type: coreference-resolution
        metrics:
          - name: LEA
            type: f_score
            value: 0.4218334451
        dataset:
          name: DaCoref
          type: alexandrainst/dacoref
          split: custom
      - task:
          name: named-entity-linking
          type: named-entity-linking
        metrics:
          - name: Named entity Linking Precision
            type: precision
            value: 0.8461538462
          - name: Named entity Linking Recall
            type: recall
            value: 0.2222222222
          - name: Named entity Linking F Score
            type: f_score
            value: 0.352
        dataset:
          name: DaNED
          type: named-entity-linking
          split: custom
library_name: spacy
datasets:
  - universal_dependencies
  - dane
  - alexandrainst/dacoref
metrics:
  - accuracy

DaCy small

DaCy is a Danish language processing framework with state-of-the-art pipelines as well as functionality for analysing Danish pipelines. DaCy's largest pipeline has achieved state-of-the-art performance on part-of-speech tagging and dependency parsing for Danish on the Danish Dependency Treebank, as well as competitive performance on named entity recognition, named entity disambiguation and coreference resolution. To read more, check out the DaCy repository for material on how to use DaCy and reproduce the results. DaCy also contains guides on using the package as well as behavioural tests for biases and robustness of Danish NLP pipelines.

| Feature | Description |
| --- | --- |
| Name | `da_dacy_small_trf` |
| Version | `0.2.0` |
| spaCy | `>=3.5.2,<3.6.0` |
| Default Pipeline | `transformer`, `tagger`, `morphologizer`, `trainable_lemmatizer`, `parser`, `ner`, `coref`, `span_resolver`, `span_cleaner`, `entity_linker` |
| Components | `transformer`, `tagger`, `morphologizer`, `trainable_lemmatizer`, `parser`, `ner`, `coref`, `span_resolver`, `span_cleaner`, `entity_linker` |
| Vectors | 0 keys, 0 unique vectors (0 dimensions) |
| Sources | UD Danish DDT v2.11 (Johannsen, Anders; Martínez Alonso, Héctor; Plank, Barbara)<br>DaNE (Rasmus Hvingelby, Amalie B. Pauli, Maria Barrett, Christina Rosted, Lasse M. Lidegaard, Anders Søgaard)<br>DaCoref (Buch-Kromann, Matthias)<br>DaNED (Barrett, M. J., Lam, H., Wu, M., Lacroix, O., Plank, B., & Søgaard, A.)<br>jonfd/electra-small-nordic (Jón Friðrik Daðason) |
| License | Apache-2.0 |
| Author | Kenneth Enevoldsen |
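
The spaCy version range listed above matters in practice, because a pipeline generally only loads under a compatible spaCy release. The following is a minimal sketch of how that range could be checked before loading. The helper names (`parse_version`, `is_compatible`, `load_pipeline`) are hypothetical and not part of DaCy or spaCy, and the sketch assumes the pipeline has already been installed as a Python package named `da_dacy_small_trf`.

```python
import re

# Version range copied from the table above: spaCy >=3.5.2,<3.6.0
MIN_VERSION = (3, 5, 2)  # inclusive lower bound
MAX_VERSION = (3, 6, 0)  # exclusive upper bound


def parse_version(version: str) -> tuple:
    """Extract (major, minor, patch) from a version string.

    Pre-release suffixes such as ".dev0" are ignored in this sketch.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_compatible(version: str) -> bool:
    """Return True if the version falls inside the supported range."""
    return MIN_VERSION <= parse_version(version) < MAX_VERSION


def load_pipeline(name: str = "da_dacy_small_trf"):
    """Load the pipeline after checking the installed spaCy version.

    Assumes spaCy and the pipeline package are installed; spaCy is
    imported lazily so the helpers above work without it.
    """
    import spacy

    if not is_compatible(spacy.__version__):
        raise RuntimeError(
            f"spaCy {spacy.__version__} is outside the range >=3.5.2,<3.6.0"
        )
    return spacy.load(name)


if __name__ == "__main__":
    for candidate in ("3.5.1", "3.5.2", "3.5.4", "3.6.0"):
        print(candidate, is_compatible(candidate))
```

Calling `load_pipeline()` returns a spaCy `Language` object when the requirements are met. The DaCy repository describes the supported installation and loading routes.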

Label Scheme

View label scheme (211 labels for 4 components)
Component Labels
tagger ADJ, ADP, ADV, AUX, CCONJ, DET, INTJ, NOUN, NUM, PART, PRON, PROPN, PUNCT, SCONJ, SYM, VERB, X
morphologizer AdpType=Prep|POS=ADP, Definite=Ind|Gender=Com|Number=Sing|POS=NOUN, Mood=Ind|POS=AUX|Tense=Pres|VerbForm=Fin|Voice=Act, POS=PROPN, Definite=Ind|Number=Sing|POS=VERB|Tense=Past|VerbForm=Part, Definite=Def|Gender=Neut|Number=Sing|POS=NOUN, POS=SCONJ, Definite=Def|Gender=Com|Number=Sing|POS=NOUN, Mood=Ind|POS=VERB|Tense=Pres|VerbForm=Fin|Voice=Act, POS=ADV, Number=Plur|POS=DET|PronType=Dem, Degree=Pos|Number=Plur|POS=ADJ, Definite=Ind|Gender=Com|Number=Plur|POS=NOUN, POS=PUNCT, NumType=Ord|POS=ADJ, POS=CCONJ, Definite=Ind|Gender=Neut|Number=Plur|POS=NOUN, POS=VERB|VerbForm=Inf|Voice=Act, Case=Acc|Gender=Neut|Number=Sing|POS=PRON|Person=3|PronType=Prs, Degree=Sup|POS=ADV, Degree=Pos|POS=ADV, Gender=Com|Number=Sing|POS=DET|PronType=Ind, Number=Plur|POS=DET|PronType=Ind, POS=ADP, POS=ADV|PartType=Inf, Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=3|PronType=Prs, Mood=Ind|POS=AUX|Tense=Past|VerbForm=Fin|Voice=Act, Definite=Def|Degree=Pos|Number=Sing|POS=ADJ, Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs, Mood=Ind|POS=VERB|Tense=Past|VerbForm=Fin|Voice=Act, POS=ADP|PartType=Inf, Definite=Ind|Degree=Pos|Gender=Com|Number=Sing|POS=ADJ, NumType=Card|POS=NUM, Degree=Pos|POS=ADJ, Definite=Ind|Number=Sing|POS=AUX|Tense=Past|VerbForm=Part, POS=PART|PartType=Inf, Case=Acc|POS=PRON|Person=3|PronType=Prs|Reflex=Yes, Definite=Def|Gender=Com|Number=Plur|POS=NOUN, Definite=Ind|Gender=Neut|Number=Sing|POS=NOUN, Number[psor]=Plur|POS=DET|Person=3|Poss=Yes|PronType=Prs, POS=VERB|Tense=Pres|VerbForm=Part, Case=Nom|Number=Plur|POS=PRON|Person=3|PronType=Prs, Case=Gen|Definite=Def|Gender=Com|Number=Sing|POS=NOUN, Definite=Def|Degree=Sup|Number=Plur|POS=ADJ, Case=Acc|Number=Plur|POS=PRON|Person=3|PronType=Prs, POS=AUX|VerbForm=Inf|Voice=Act, Definite=Ind|Degree=Pos|Gender=Neut|Number=Sing|POS=ADJ, Definite=Ind|Degree=Cmp|Number=Sing|POS=ADJ, Degree=Cmp|POS=ADJ, POS=PRON|PartType=Inf, Definite=Ind|Degree=Pos|Number=Sing|POS=ADJ, 
Case=Nom|Gender=Com|POS=PRON|PronType=Ind, Number=Plur|POS=PRON|PronType=Ind, POS=INTJ, Gender=Com|Number=Sing|POS=DET|PronType=Dem, Case=Gen|Number=Plur|POS=DET|PronType=Ind, Mood=Ind|POS=VERB|Tense=Pres|VerbForm=Fin|Voice=Pass, Definite=Def|Gender=Neut|Number=Plur|POS=NOUN, Degree=Cmp|POS=ADV, Number=Plur|Number[psor]=Plur|POS=PRON|Person=1|Poss=Yes|PronType=Prs|Style=Form, Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=3|PronType=Prs, Number=Plur|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes, Case=Gen|POS=PROPN, Gender=Neut|Number=Sing|POS=PRON|PronType=Ind, Number=Plur|POS=VERB|Tense=Past|VerbForm=Part, Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes, Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=1|PronType=Prs, Definite=Def|Degree=Sup|POS=ADJ, Gender=Neut|Number=Sing|POS=DET|PronType=Ind, Case=Gen|Definite=Ind|Gender=Neut|Number=Sing|POS=NOUN, Gender=Neut|Number=Sing|POS=DET|PronType=Dem, Definite=Def|Number=Sing|POS=VERB|Tense=Past|VerbForm=Part, POS=PRON|PronType=Dem, Degree=Pos|Gender=Com|Number=Sing|POS=ADJ, Number=Plur|POS=NUM, POS=VERB|VerbForm=Inf|Voice=Pass, Definite=Def|Degree=Sup|Number=Sing|POS=ADJ, Number=Sing|POS=PRON|PronType=Int,Rel, Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=1|PronType=Prs, Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs, Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs, POS=PRON, Definite=Ind|Number=Sing|POS=NOUN, Definite=Ind|Number=Sing|POS=NUM, Case=Gen|Definite=Ind|Gender=Com|Number=Sing|POS=NOUN, Foreign=Yes|POS=ADV, POS=NOUN, Case=Gen|Definite=Def|Gender=Neut|Number=Sing|POS=NOUN, Gender=Com|Number=Plur|POS=NOUN, Gender=Neut|Number=Sing|POS=PRON|PronType=Int,Rel, Case=Nom|Gender=Com|Number=Plur|POS=PRON|Person=1|PronType=Prs, Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs, Gender=Com|Number=Sing|POS=PRON|PronType=Ind, Case=Gen|Definite=Ind|Gender=Com|Number=Plur|POS=NOUN, 
Degree=Pos|Gender=Neut|Number=Sing|POS=ADJ, Degree=Sup|POS=ADJ, Degree=Pos|Number=Sing|POS=ADJ, Mood=Imp|POS=VERB, Case=Nom|Gender=Com|POS=PRON|Person=2|Polite=Form|PronType=Prs, Case=Acc|Gender=Com|POS=PRON|Person=2|Polite=Form|PronType=Prs, POS=X, Case=Gen|Definite=Def|Gender=Com|Number=Plur|POS=NOUN, Number=Plur|POS=PRON|PronType=Dem, Case=Acc|Gender=Com|Number=Plur|POS=PRON|Person=1|PronType=Prs, Number=Plur|POS=PRON|PronType=Int,Rel, Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes, Degree=Cmp|Number=Plur|POS=ADJ, Number=Plur|Number[psor]=Sing|POS=DET|Person=1|Poss=Yes|PronType=Prs, Gender=Com|Number=Sing|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form, Case=Nom|Gender=Com|Number=Sing|POS=PRON|Person=2|PronType=Prs, Case=Acc|Gender=Com|Number=Sing|POS=PRON|Person=2|PronType=Prs, Gender=Com|POS=PRON|PronType=Int,Rel, Case=Gen|Degree=Pos|Number=Plur|POS=ADJ, Gender=Neut|Number=Sing|Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes, POS=VERB|VerbForm=Ger, Gender=Com|Number=Sing|POS=PRON|PronType=Dem, Case=Gen|POS=PRON|PronType=Int,Rel, Mood=Ind|POS=VERB|Tense=Past|VerbForm=Fin|Voice=Pass, Abbr=Yes|POS=X, Case=Gen|Definite=Ind|Gender=Neut|Number=Plur|POS=NOUN, Gender=Com|Number=Sing|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs, Definite=Ind|Number=Plur|POS=NOUN, Foreign=Yes|POS=X, Number=Plur|POS=PRON|PronType=Rcp, Case=Nom|Gender=Com|Number=Plur|POS=PRON|Person=2|PronType=Prs, Case=Gen|Degree=Cmp|POS=ADJ, Case=Gen|Definite=Def|Gender=Neut|Number=Plur|POS=NOUN, Case=Acc|Gender=Com|Number=Plur|POS=PRON|Person=2|PronType=Prs, Gender=Neut|Number=Sing|POS=PRON|PronType=Dem, Number=Plur|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form, Gender=Neut|Number=Sing|Number[psor]=Plur|POS=DET|Person=1|Poss=Yes|PronType=Prs|Style=Form, Number=Plur|Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs|Reflex=Yes, 
Number[psor]=Sing|POS=PRON|Person=3|Poss=Yes|PronType=Prs, Case=Gen|Number=Plur|POS=PRON|PronType=Rcp, POS=DET|Person=2|Polite=Form|Poss=Yes|PronType=Prs, POS=SYM, POS=DET|PronType=Dem, Gender=Com|Number=Sing|POS=NUM, Number[psor]=Plur|POS=DET|Person=2|Poss=Yes|PronType=Prs, Case=Gen|Number=Plur|POS=VERB|Tense=Past|VerbForm=Part, Definite=Def|Degree=Abs|POS=ADJ, POS=VERB|Tense=Pres, Definite=Ind|Gender=Neut|Number=Sing|POS=NUM, Degree=Abs|POS=ADV, Case=Gen|Definite=Def|Degree=Pos|Number=Sing|POS=ADJ, Gender=Com|Number=Sing|POS=PRON|PronType=Int,Rel, POS=VERB|Tense=Past|VerbForm=Part, Definite=Ind|Degree=Sup|Number=Sing|POS=ADJ, Gender=Neut|Number=Sing|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs, Gender=Com|Number=Sing|Number[psor]=Sing|POS=PRON|Person=1|Poss=Yes|PronType=Prs, Number=Plur|Number[psor]=Sing|POS=DET|Person=2|Poss=Yes|PronType=Prs, Number[psor]=Plur|POS=PRON|Person=3|Poss=Yes|PronType=Prs, Definite=Ind|POS=NOUN, Case=Gen|Gender=Com|Number=Sing|POS=DET|PronType=Ind, Definite=Ind|Gender=Com|Number=Sing|POS=NUM, Definite=Def|Number=Plur|POS=NOUN, Case=Gen|POS=NOUN, POS=AUX|Tense=Pres|VerbForm=Part
parser ROOT, acl:relcl, advcl, advmod, advmod:lmod, amod, appos, aux, case, cc, ccomp, compound:prt, conj, cop, dep, det, expl, fixed, flat, iobj, list, mark, nmod, nmod:poss, nsubj, nummod, obj, obl, obl:lmod, obl:tmod, punct, xcomp
ner LOC, MISC, ORG, PER
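
The morphologizer labels above follow a `Key=Value|Key=Value` layout, where a value may itself be a comma-separated list (for example `PronType=Int,Rel`). As an illustration of how such a label can be read, here is a small sketch that splits one label into a dictionary. The function name `parse_morph_label` is hypothetical and is not part of spaCy or DaCy.

```python
def parse_morph_label(label: str) -> dict:
    """Split a morphologizer label into {feature: [values]}.

    Features are separated by "|", each feature is "Key=Value",
    and multi-valued features separate their values with ",".
    """
    features = {}
    for pair in label.split("|"):
        key, _, value = pair.partition("=")
        features[key] = value.split(",")
    return features


if __name__ == "__main__":
    # Labels copied verbatim from the label scheme above.
    examples = [
        "Definite=Ind|Gender=Com|Number=Sing|POS=NOUN",
        "Number=Sing|POS=PRON|PronType=Int,Rel",
        "Number[psor]=Sing|POS=DET|Person=3|Poss=Yes|PronType=Prs",
    ]
    for label in examples:
        print(parse_morph_label(label))
```

Each label carries a `POS` entry alongside the morphological features, and layered features such as `Number[psor]` are kept as distinct keys.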

Accuracy

| Type | Score |
| --- | --- |
| TOKEN_ACC | 99.92 |
| TOKEN_P | 99.70 |
| TOKEN_R | 99.77 |
| TOKEN_F | 99.74 |
| SENTS_P | 92.96 |
| SENTS_R | 95.75 |
| SENTS_F | 94.33 |
| TAG_ACC | 98.47 |
| POS_ACC | 98.42 |
| MORPH_ACC | 97.73 |
| MORPH_MICRO_P | 98.94 |
| MORPH_MICRO_R | 98.33 |
| MORPH_MICRO_F | 98.64 |
| DEP_UAS | 89.79 |
| DEP_LAS | 87.02 |
| ENTS_P | 83.06 |
| ENTS_R | 81.72 |
| ENTS_F | 82.38 |
| LEMMA_ACC | 94.67 |
| COREF_LEA_F1 | 42.18 |
| COREF_LEA_PRECISION | 44.79 |
| COREF_LEA_RECALL | 39.86 |
| NEL_SCORE | 35.20 |
| NEL_MICRO_P | 84.62 |
| NEL_MICRO_R | 22.22 |
| NEL_MICRO_F | 35.20 |
| NEL_MACRO_P | 87.68 |
| NEL_MACRO_R | 24.76 |
| NEL_MACRO_F | 37.52 |
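
Several of the F-scores above can be recovered from the reported precision and recall as their harmonic mean. The sketch below shows that arithmetic for a few rows copied from the table. The function name `f_score` is illustrative, and the 0.01 tolerance is an assumption chosen to absorb the two-decimal rounding of the published numbers.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F) in percent, copied from the table above.
REPORTED = {
    "SENTS": (92.96, 95.75, 94.33),
    "ENTS": (83.06, 81.72, 82.38),
    "COREF_LEA": (44.79, 39.86, 42.18),
    "NEL_MICRO": (84.62, 22.22, 35.20),
}

if __name__ == "__main__":
    for name, (precision, recall, reported) in REPORTED.items():
        computed = f_score(precision, recall)
        # Inputs are rounded to two decimals, so allow a small tolerance.
        agrees = abs(computed - reported) < 0.01
        print(f"{name}: computed={computed:.2f} reported={reported:.2f} agrees={agrees}")
```

The macro-averaged NEL F-score is left out on purpose: applying the same formula to NEL_MACRO_P and NEL_MACRO_R does not give 37.52, so it is evidently not the harmonic mean of those two macro values. The low NEL_MICRO_F despite high precision also shows how strongly the harmonic mean is pulled towards the weaker of the two values.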

Training

This model was trained using spaCy and logged to Weights & Biases, where all the training logs are available.