---
tags:
  - flair
  - token-classification
  - sequence-tagger-model
language: de
---
# REDEWIEDERGABE Tagger: free indirect STWR

This model is part of an ensemble of binary taggers that recognize German speech, thought and writing representation (STWR). The ensemble is used in [LLpro](https://github.com/cophi-wue/LLpro). The taggers automatically detect and annotate the following four types of STWR in German texts:

| STWR type                      | Example                                                                 | Translation                                              |
|--------------------------------|-------------------------------------------------------------------------|----------------------------------------------------------|
| direct                         | Dann sagte er: **"Ich habe Hunger."**                                       | Then he said: **"I'm hungry."**                             |
| free indirect ('erlebte Rede',  **this tagger**) | Er war ratlos. **Woher sollte er denn hier bloß ein Mittagessen bekommen?** | He was at a loss. **Where should he ever find lunch here?** |
| indirect                 | Sie fragte, **wo das Essen sei.**                                           | She asked **where the food was.**                            |
| reported                  | **Sie sprachen über das Mittagessen.**                                      | **They talked about lunch.**                                 |

The ensemble is trained on the [REDEWIEDERGABE corpus](https://github.com/redewiedergabe/corpus) ([Annotation guidelines](http://redewiedergabe.de/richtlinien/richtlinien.html)). Each tagger is fine-tuned from the domain-adapted [lkonle/fiction-gbert-large](https://huggingface.co/lkonle/fiction-gbert-large) model. ([Training Code](https://github.com/cophi-wue/LLpro/blob/main/contrib/train_redewiedergabe.py))

**F1-Scores:**

| STWR type |  F1-Score |
|-----------|-----------|
| direct    | 90.76     |
| indirect | 79.16  |
| **free indirect (this tagger)** | **58.00**  |
| reported   | 70.47   |

----

**Demo Usage:**

```python
from flair.data import Sentence
from flair.models import SequenceTagger


# One text containing an example of each STWR type.
sentence = Sentence('Sie sprachen über das Mittagessen. Sie fragte, wo das Essen sei. Woher sollte er das wissen? Dann sagte er: "Ich habe Hunger."')

# Load each binary tagger of the ensemble in turn, run it on the text,
# and print the text of the labeled tokens.
rwtypes = ['direct', 'indirect', 'freeindirect', 'reported']
for rwtype in rwtypes:
    model = SequenceTagger.load(f'aehrm/redewiedergabe-{rwtype}')
    model.predict(sentence)
    print(rwtype, [x.data_point.text for x in sentence.get_labels()])
# >>> direct ['"', 'Ich', 'habe', 'Hunger', '.', '"']
# >>> indirect ['wo', 'das', 'Essen', 'sei', '.']
# >>> freeindirect ['Woher', 'sollte', 'er', 'das', 'wissen', '?']
# >>> reported ['Sie', 'sprachen', 'über', 'das', 'Mittagessen', '.', 'Woher', 'sollte', 'er', 'das', 'wissen', '?']
```
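
The demo prints one entry per labeled token. To read the output as passages, consecutive tokens can be merged into contiguous spans. The following is a minimal sketch of that post-processing step, not part of the model or of LLpro: `merge_tagged_tokens` is a hypothetical helper, and it assumes you have collected `(position, text)` pairs for the tokens one tagger labeled (in Flair the position would presumably come from the token's index, which you should verify against your Flair version).

```python
from typing import List, Tuple


def merge_tagged_tokens(tokens: List[Tuple[int, str]]) -> List[str]:
    """Join tokens at consecutive positions into whitespace-separated passages.

    `tokens` holds (position, text) pairs for the tokens one tagger labeled.
    This helper is hypothetical and only illustrates the grouping step.
    """
    passages: List[List[str]] = []
    previous = None
    for position, text in sorted(tokens):
        if previous is not None and position == previous + 1:
            # Directly follows the previous labeled token: same passage.
            passages[-1].append(text)
        else:
            # Gap in positions (or first token): start a new passage.
            passages.append([text])
        previous = position
    return [" ".join(passage) for passage in passages]


# Illustrative positions; real positions depend on the tokenizer.
tagged = [(13, "Woher"), (14, "sollte"), (15, "er"), (16, "das"), (17, "wissen"), (18, "?")]
print(merge_tagged_tokens(tagged))
```

Because tokens are joined with a single space, punctuation ends up separated from the preceding word; recovering the original spacing would require the character offsets of the tokens.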

**Cite**:

Please cite the following paper when using this model.

```
@inproceedings{ehrmanntraut-et-al-llpro-2023,
	address = {Ingolstadt, Germany},
	title = {{LLpro}: A Literary Language Processing Pipeline for {German} Narrative Text},
	booktitle = {Proceedings of the 19th Conference on Natural Language Processing ({KONVENS} 2023)},
	publisher = {{KONVENS} 2023 Organizers},
	author = {Ehrmanntraut, Anton and Konle, Leonard and Jannidis, Fotis},
	year = {2023},
}
```