metadata
tags:
  - flair
  - token-classification
  - sequence-tagger-model
language: de

Tagger for literary character mentions (DROC corpus)

This is the character recognizer model used in LLpro. It detects character mentions in literary fiction: (a) proper nouns ("Alice", "Effi"), and (b) nominal phrases ("Gärtner", "Mutter", "Graf", "Idiot", "Schöne", ...). The model was trained on the DROC dataset by fine-tuning the domain-adapted lkonle/fiction-gbert-large model. (Training code)

F1-Score: 91.85 (on a held-out data split; micro average on B-PER and I-PER labels)
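
As an illustration of what a micro-averaged F1 over the B-PER and I-PER labels can mean, here is a minimal sketch that pools true positives, false positives, and false negatives over both labels at the token level. This is one reading of the description above, not the evaluation script behind the reported score; the function name `micro_f1` and the toy tag sequences are made up for the example.

```python
def micro_f1(gold, pred, labels=("B-PER", "I-PER")):
    """Token-level micro-averaged F1 over the given labels (illustrative only)."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        for label in labels:
            if p == label and g == label:
                tp += 1  # label predicted and correct
            elif p == label:
                fp += 1  # label predicted, but gold tag differs
            elif g == label:
                fn += 1  # gold label missed by the prediction
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold and predicted tags for a seven-token sentence.
gold = ["B-PER", "O", "B-PER", "I-PER", "O", "O", "O"]
pred = ["B-PER", "O", "O", "B-PER", "O", "O", "O"]
print(f"{micro_f1(gold, pred):.2f}")  # prints 0.40
```

Pooling the counts before computing precision and recall is what makes the average "micro": frequent labels weigh more than rare ones.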


Demo Usage:

from flair.data import Sentence
from flair.models import SequenceTagger

# load tagger
tagger = SequenceTagger.load("aehrm/droc-character-recognizer")

# make example sentence
sentence = Sentence("Effi folgte Graf Instetten nach Kessin.")

# predict NER tags
tagger.predict(sentence)

# print sentence
print(sentence)
# >>> Sentence[7]: "Effi folgte Graf Instetten nach Kessin." → ["Effi"/PER, "Graf Instetten"/PER]

# print predicted NER spans
print('The following NER tags are found:')
# iterate over entities and print
for entity in sentence.get_spans('character'):
    print(entity)
# >>> Span[0:1]: "Effi" → PER (1.0)
# >>> Span[2:4]: "Graf Instetten" → PER (1.0)
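
The spans printed above correspond to runs of B-PER and I-PER token tags. Flair performs this decoding internally; the sketch below only illustrates the idea in plain Python. The helper `bio_to_spans` is hypothetical, and the tag sequence is an assumption inferred from the spans shown in the demo output.

```python
def bio_to_spans(tokens, tags):
    """Group B-PER/I-PER token tags into (start, end, text) spans; end is exclusive."""
    spans = []
    start = None  # index where the currently open span began, if any
    for i, tag in enumerate(tags):
        if tag == "I-PER" and start is not None:
            continue  # the open span simply grows by one token
        if start is not None:
            # any other tag closes the open span
            spans.append((start, i, " ".join(tokens[start:i])))
            start = None
        if tag in ("B-PER", "I-PER"):
            # B-PER opens a span; a stray I-PER is leniently treated the same way
            start = i
    if start is not None:
        spans.append((start, len(tags), " ".join(tokens[start:])))
    return spans


tokens = ["Effi", "folgte", "Graf", "Instetten", "nach", "Kessin", "."]
tags = ["B-PER", "O", "B-PER", "I-PER", "O", "O", "O"]  # assumed tagging
print(bio_to_spans(tokens, tags))
# [(0, 1, 'Effi'), (2, 4, 'Graf Instetten')]
```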

Cite:

Please cite the following paper when using this model.


@inproceedings{ehrmanntraut-et-al-llpro-2023,
    address = {Ingolstadt, Germany},
    title = {{LLpro}: A Literary Language Processing Pipeline for {German} Narrative Text},
    booktitle = {Proceedings of the 19th Conference on Natural Language Processing ({KONVENS} 2023)},
    publisher = {{KONVENS} 2023 Organizers},
    author = {Ehrmanntraut, Anton and Konle, Leonard and Jannidis, Fotis},
    year = {2023},
}