---
tags:
- flair
- token-classification
- sequence-tagger-model
language: de
---

# REDEWIEDERGABE Tagger: direct STWR

This model is part of an ensemble of binary taggers that recognize German speech, thought and writing representation (STWR). Together, the taggers automatically detect and annotate the following four types of STWR in German texts:

| STWR type                      | Example                                                                     | Translation                                                 |
|--------------------------------|-----------------------------------------------------------------------------|-------------------------------------------------------------|
| direct (**this tagger**)       | Dann sagte er: **"Ich habe Hunger."**                                       | Then he said: **"I'm hungry."**                             |
| free indirect ('erlebte Rede') | Er war ratlos. **Woher sollte er denn hier bloß ein Mittagessen bekommen?** | He was at a loss. **Where should he ever find lunch here?** |
| indirect                       | Sie fragte, **wo das Essen sei.**                                           | She asked **where the food was.**                           |
| reported                       | **Sie sprachen über das Mittagessen.**                                      | **They talked about lunch.**                                |

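
Because each tagger is binary and runs independently, one token can end up with more than one STWR type; in the demo output in this card, the tokens of "Woher sollte er das wissen?" are listed under both `freeindirect` and `reported`. The following is a minimal sketch of how the separate results could be merged into one label set per token. The function name `merge_binary_tags`, the input format, and the token positions are assumptions made for illustration and are not part of the released code.

```python
from collections import defaultdict


def merge_binary_tags(tagged_positions):
    """Combine per-tagger token positions into one label set per token.

    `tagged_positions` maps an STWR type to the positions of the tokens
    that the corresponding binary tagger marked (hypothetical format).
    """
    labels_per_token = defaultdict(set)
    for stwr_type, positions in tagged_positions.items():
        for pos in positions:
            labels_per_token[pos].add(stwr_type)
    # Return a plain dict ordered by token position.
    return {pos: labels_per_token[pos] for pos in sorted(labels_per_token)}


# Positions mirror the demo sentence under an assumed tokenization;
# they are written by hand, not produced by the models.
example = {
    "freeindirect": [14, 15, 16, 17, 18, 19],
    "reported": [0, 1, 2, 3, 4, 5, 14, 15, 16, 17, 18, 19],
}
merged = merge_binary_tags(example)
print(sorted(merged[0]))   # ['reported']
print(sorted(merged[14]))  # ['freeindirect', 'reported']
```

Keeping a set per token, rather than a single label, preserves overlapping annotations instead of forcing a choice between types.
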
The ensemble is trained on the [REDEWIEDERGABE corpus](https://github.com/redewiedergabe/corpus) ([annotation guidelines](http://redewiedergabe.de/richtlinien/richtlinien.html)). Each tagger is fine-tuned from the domain-adapted [lkonle/fiction-gbert-large](https://huggingface.co/lkonle/fiction-gbert-large) model ([training code](https://github.com/cophi-wue/LLpro/blob/main/contrib/train_redewiedergabe.py)).

**F1-Scores:**

| STWR type                | F1-Score  |
|--------------------------|-----------|
| **direct (this tagger)** | **90.76** |
| indirect                 | 79.16     |
| free indirect            | 58.00     |
| reported                 | 70.47     |

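
For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. This card does not state how the scores above were computed, so the sketch below rests on an assumption: token-level evaluation of a single binary tagger, with the result scaled to a percentage. The function name `binary_f1` and the toy labels are illustrative only.

```python
def binary_f1(gold, predicted):
    """F1 for one binary tagger, given one boolean per token."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    true_pos = sum(g and p for g, p in zip(gold, predicted))
    false_pos = sum((not g) and p for g, p in zip(gold, predicted))
    false_neg = sum(g and (not p) for g, p in zip(gold, predicted))
    if true_pos == 0:
        # No correct positive prediction: define F1 as 0.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Toy labels, not taken from the corpus.
gold = [False, True, True, True, False]
predicted = [False, True, True, False, True]
score = binary_f1(gold, predicted)
print(round(100 * score, 2))  # 66.67
```
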

----

**Demo Usage:**

```python
from flair.data import Sentence
from flair.models import SequenceTagger

sentence = Sentence('Sie sprachen über das Mittagessen. Sie fragte, wo das Essen sei. Woher sollte er das wissen? Dann sagte er: "Ich habe Hunger."')

# Run each binary tagger of the ensemble on the same sentence.
rwtypes = ['direct', 'indirect', 'freeindirect', 'reported']
for rwtype in rwtypes:
    model = SequenceTagger.load(f'aehrm/redewiedergabe-{rwtype}')
    model.predict(sentence)
    # Print the tokens labelled by this tagger.
    print(rwtype, [x.data_point.text for x in sentence.get_labels()])
# >>> direct ['"', 'Ich', 'habe', 'Hunger', '.', '"']
# >>> indirect ['wo', 'das', 'Essen', 'sei', '.']
# >>> freeindirect ['Woher', 'sollte', 'er', 'das', 'wissen', '?']
# >>> reported ['Sie', 'sprachen', 'über', 'das', 'Mittagessen', '.', 'Woher', 'sollte', 'er', 'das', 'wissen', '?']
```
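
The demo prints a flat token list per tagger, so separate passages run together (the `reported` output contains two distinct sentences). A minimal sketch of how consecutive tagged tokens could be grouped back into passages follows. It uses plain Python and assumes you have already collected the zero-based positions of the tagged tokens; the helper `group_into_spans`, the hand-written token list, and the positions are hypothetical and not part of Flair or of this model.

```python
def group_into_spans(tokens, tagged_positions):
    """Join runs of consecutively tagged tokens into text spans."""
    spans = []
    current = []
    previous = None
    for pos in sorted(tagged_positions):
        # A gap in the positions closes the current span.
        if previous is not None and pos != previous + 1:
            spans.append(" ".join(current))
            current = []
        current.append(tokens[pos])
        previous = pos
    if current:
        spans.append(" ".join(current))
    return spans


# Assumed tokenization of the demo sentence, written by hand.
tokens = [
    'Sie', 'sprachen', 'über', 'das', 'Mittagessen', '.',
    'Sie', 'fragte', ',', 'wo', 'das', 'Essen', 'sei', '.',
    'Woher', 'sollte', 'er', 'das', 'wissen', '?',
    'Dann', 'sagte', 'er', ':', '"', 'Ich', 'habe', 'Hunger', '.', '"',
]
# Positions matching the `reported` line of the demo output.
reported_positions = list(range(0, 6)) + list(range(14, 20))
spans = group_into_spans(tokens, reported_positions)
print(spans)
# ['Sie sprachen über das Mittagessen .', 'Woher sollte er das wissen ?']
```

Joining tokens with single spaces is a simplification; restoring the original spacing would require the character offsets of each token.
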

**Cite**:

Please cite the following paper when using this model.

```
@inproceedings{ehrmanntraut-et-al-llpro-2023,
    address = {Ingolstadt, Germany},
    title = {{LLpro}: A Literary Language Processing Pipeline for {German} Narrative Text},
    booktitle = {Proceedings of the 19th Conference on Natural Language Processing ({KONVENS} 2023)},
    publisher = {{KONVENS} 2023 Organizers},
    author = {Ehrmanntraut, Anton and Konle, Leonard and Jannidis, Fotis},
    year = {2023},
}
```