---
tags:
- flair
- token-classification
- sequence-tagger-model
language: de
---
# REDEWIEDERGABE Tagger: direct STWR
This model is part of an ensemble of binary taggers that recognize speech, thought and writing representation (STWR) in German text. The ensemble is used in [LLpro](https://github.com/cophi-wue/LLpro). Together, the taggers automatically detect and annotate the following four STWR types:

| STWR type | Example | Translation |
|--------------------------------|-------------------------------------------------------------------------|----------------------------------------------------------|
| direct (**this tagger**) | Dann sagte er: **"Ich habe Hunger."** | Then he said: **"I'm hungry."** |
| free indirect ('erlebte Rede') | Er war ratlos. **Woher sollte er denn hier bloß ein Mittagessen bekommen?** | He was at a loss. **Where on earth was he supposed to get lunch around here?** |
| indirect | Sie fragte, **wo das Essen sei.** | She asked **where the food was.** |
| reported | **Sie sprachen über das Mittagessen.** | **They talked about lunch.** |
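
Each tagger makes an independent binary decision, so a single token can receive several STWR types. In the demo output further down, for instance, the free indirect question is also tagged as reported. The minimal sketch below merges the four outputs into one list of types per token. It assumes each tagger's predictions have already been reduced to a set of 0-based token positions. The helper `merge_stwr_predictions` and this input format are hypothetical and are not part of flair or LLpro.

```python
from typing import Dict, List, Set, Tuple


# Hypothetical helper (not part of flair or LLpro): combines the token
# positions flagged by each binary tagger into per-token lists of STWR types.
def merge_stwr_predictions(
    tokens: List[str], predictions: Dict[str, Set[int]]
) -> List[Tuple[str, List[str]]]:
    merged = []
    for i, token in enumerate(tokens):
        # Collect every STWR type whose tagger flagged token i
        types = sorted(t for t, idxs in predictions.items() if i in idxs)
        merged.append((token, types))
    return merged


# Example input modeled on the demo output (assumed 0-based positions)
tokens = ['Woher', 'sollte', 'er', 'das', 'wissen', '?']
predictions = {
    'direct': set(),
    'freeindirect': {0, 1, 2, 3, 4, 5},
    'reported': {0, 1, 2, 3, 4, 5},
}
for token, types in merge_stwr_predictions(tokens, predictions):
    print(token, types)
```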

The ensemble is trained on the [REDEWIEDERGABE corpus](https://github.com/redewiedergabe/corpus) ([Annotation guidelines](http://redewiedergabe.de/richtlinien/richtlinien.html)). Each tagger is fine-tuned from the domain-adapted [lkonle/fiction-gbert-large](https://huggingface.co/lkonle/fiction-gbert-large) model ([Training Code](https://github.com/cophi-wue/LLpro/blob/main/contrib/train_redewiedergabe.py)).

**F1 scores:**

| STWR type | F1 score |
|-----------|-----------|
| **direct (this tagger)** | **90.76** |
| indirect | 79.16 |
| free indirect | 58.00 |
| reported | 70.47 |
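
The card does not say how these scores were computed, for example per token or per span. The sketch below is purely an illustration and assumes token-level evaluation of one binary tagger. Under that assumption, F1 is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R), multiplied by 100 to match the scale of the table. The function `token_f1` is a hypothetical helper, not LLpro's evaluation code.

```python
from typing import Sequence


# Hypothetical helper (not LLpro's evaluation code): token-level F1 on a
# 0 to 100 scale for one binary tagger; True = token belongs to the STWR type.
def token_f1(gold: Sequence[bool], pred: Sequence[bool]) -> float:
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g and p)      # correctly tagged tokens
    fp = sum(1 for g, p in zip(gold, pred) if p and not g)  # spurious tags
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)  # missed tokens
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 100 * 2 * precision * recall / (precision + recall)


gold = [True, True, True, False, False]
pred = [True, True, False, True, False]
print(round(token_f1(gold, pred), 2))  # 66.67 (P = R = 2/3)
```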
----
**Demo Usage:**
```python
from flair.data import Sentence
from flair.models import SequenceTagger

# One input text containing an example of each STWR type
sentence = Sentence('Sie sprachen über das Mittagessen. Sie fragte, wo das Essen sei. Woher sollte er das wissen? Dann sagte er: "Ich habe Hunger."')

rwtypes = ['direct', 'indirect', 'freeindirect', 'reported']
for rwtype in rwtypes:
    # Load the binary tagger for this STWR type from the Hugging Face Hub
    model = SequenceTagger.load(f'aehrm/redewiedergabe-{rwtype}')
    model.predict(sentence)
    # Print the text of each labeled token in the sentence
    print(rwtype, [x.data_point.text for x in sentence.get_labels()])
# >>> direct ['"', 'Ich', 'habe', 'Hunger', '.', '"']
# >>> indirect ['wo', 'das', 'Essen', 'sei', '.']
# >>> freeindirect ['Woher', 'sollte', 'er', 'das', 'wissen', '?']
# >>> reported ['Sie', 'sprachen', 'über', 'das', 'Mittagessen', '.', 'Woher', 'sollte', 'er', 'das', 'wissen', '?']
```
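The demo prints a flat token list, so separate passages run together. The `reported` output above, for example, joins two different sentences. The minimal sketch below groups consecutive tagged tokens into passages. It assumes you already have the sentence's tokens and the 0-based positions a tagger marked. Extracting those positions from flair's label objects is not shown, and `group_passages` is a hypothetical helper. The example input mirrors the `reported` output above.

```python
from typing import List, Optional, Set


# Hypothetical helper (not part of flair or LLpro): joins runs of consecutive
# tagged token positions into passage strings.
def group_passages(tokens: List[str], tagged: Set[int]) -> List[str]:
    passages: List[List[str]] = []
    prev: Optional[int] = None
    for i in sorted(tagged):
        if prev is not None and i == prev + 1:
            passages[-1].append(tokens[i])  # extend the current run
        else:
            passages.append([tokens[i]])    # start a new run
        prev = i
    # Joining with single spaces is a simplification that ignores original spacing
    return [' '.join(p) for p in passages]


tokens = ['Sie', 'sprachen', 'über', 'das', 'Mittagessen', '.',
          'Sie', 'fragte', ',', 'wo', 'das', 'Essen', 'sei', '.',
          'Woher', 'sollte', 'er', 'das', 'wissen', '?']
reported = {0, 1, 2, 3, 4, 5, 14, 15, 16, 17, 18, 19}
print(group_passages(tokens, reported))
```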
**Citation:**
Please cite the following paper if you use this model:
```
@inproceedings{ehrmanntraut-et-al-llpro-2023,
address = {Ingolstadt, Germany},
title = {{LLpro}: A Literary Language Processing Pipeline for {German} Narrative Text},
  booktitle = {Proceedings of the 19th Conference on Natural Language Processing ({KONVENS} 2023)},
publisher = {{KONVENS} 2023 Organizers},
author = {Ehrmanntraut, Anton and Konle, Leonard and Jannidis, Fotis},
year = {2023},
}
```