language: de
---

# Tagger for literary character mentions (DROC corpus)
This is the character recognizer model used in [LLpro](https://github.com/cophi-wue/LLpro). It detects character mentions in literary fiction: (a) proper nouns ("Alice", "Effi") and (b) nominal phrases ("Gärtner", "Mutter", "Graf", "Idiot", "Schöne", ...). The model is trained on the [DROC dataset](https://gitlab2.informatik.uni-wuerzburg.de/kallimachos/DROC-Release) by fine-tuning the domain-adapted [lkonle/fiction-gbert-large](https://huggingface.co/lkonle/fiction-gbert-large). ([Training code](https://github.com/cophi-wue/LLpro/blob/main/contrib/train_character_recognizer.py))

F1-Score: **91.85** (on a held-out data split; micro average on the B-PER and I-PER labels)
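To make the metric concrete, here is a small, self-contained sketch of a token-level micro-averaged F1 over the B-PER and I-PER labels. The gold and predicted tag sequences are invented for illustration and are not taken from DROC; the actual evaluation is done by the training code linked above.

```python
# Sketch of micro-averaged F1 over the B-PER and I-PER labels,
# computed from token-level gold and predicted tag sequences.

def micro_f1(gold, pred, labels=("B-PER", "I-PER")):
    # true positives: token tagged correctly with a PER label
    tp = sum(1 for g, p in zip(gold, pred) if g == p and g in labels)
    # false positives: predicted PER label that does not match gold
    fp = sum(1 for g, p in zip(gold, pred) if p in labels and g != p)
    # false negatives: gold PER label that was not matched
    fn = sum(1 for g, p in zip(gold, pred) if g in labels and g != p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# invented example: one gold I-PER token is missed by the prediction
gold = ["B-PER", "O", "B-PER", "I-PER", "O", "O"]
pred = ["B-PER", "O", "B-PER", "O", "O", "O"]
print(round(micro_f1(gold, pred), 4))  # 0.8
```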

---

**Demo Usage:**

```python
from flair.data import Sentence
from flair.models import SequenceTagger

# load tagger
tagger = SequenceTagger.load("aehrm/droc-character-recognizer")

# make example sentence
sentence = Sentence("Effi folgte Graf Instetten nach Kessin.")

# predict NER tags
tagger.predict(sentence)

# print sentence
print(sentence)
# >>> Sentence[7]: "Effi folgte Graf Instetten nach Kessin." → ["Effi"/PER, "Graf Instetten"/PER]

# print predicted NER spans
print('The following NER tags are found:')
# iterate over entities and print
for entity in sentence.get_spans('character'):
    print(entity)
# >>> Span[0:1]: "Effi" → PER (1.0)
# >>> Span[2:4]: "Graf Instetten" → PER (1.0)
```
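Under the hood, the tagger assigns token-level B-PER/I-PER/O labels, which flair decodes into the spans shown above (`Span[start:end]` uses Python slice conventions). A minimal, flair-independent sketch of that BIO decoding; the helper `decode_spans` is hypothetical and not part of the flair API:

```python
# Hedged sketch: how token-level B-PER/I-PER tags decode into spans
# like Span[2:4]. Span boundaries are half-open: [start, end).

def decode_spans(tokens, tags):
    spans, start = [], None
    for i, tag in enumerate(tags + ["O"]):  # "O" sentinel closes a trailing span
        if tag == "I-PER" and start is not None:
            continue  # extend the current span
        if start is not None:
            spans.append((start, i, " ".join(tokens[start:i])))
            start = None
        if tag == "B-PER":
            start = i  # open a new span
    return spans

tokens = ["Effi", "folgte", "Graf", "Instetten", "nach", "Kessin", "."]
tags = ["B-PER", "O", "B-PER", "I-PER", "O", "O", "O"]
print(decode_spans(tokens, tags))
# [(0, 1, 'Effi'), (2, 4, 'Graf Instetten')]
```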

**Cite**:

Please cite the following paper when using this model.

```bibtex
@inproceedings{ehrmanntraut_llpro_2023,
    location = {Ingolstadt, Germany},
    title = {{LLpro}: A Literary Language Processing Pipeline for {German} Narrative Text},
    booktitle = {Proceedings of the 19th Conference on Natural Language Processing ({KONVENS} 2023)},
    publisher = {{KONVENS} 2023 Organizers},
    author = {Ehrmanntraut, Anton and Konle, Leonard and Jannidis, Fotis},
    date = {2023},
}
```