aehrm's picture
Update README.md
efa1a88
|
raw
history blame
3.39 kB
---
tags:
- flair
- token-classification
- sequence-tagger-model
language: de
---
# REDEWIEDERGABE Tagger: free indirect STWR
This model is part of an ensemble of binary taggers that recognize German speech, thought and writing representation, that is being used in [LLpro](https://github.com/cophi-wue/LLpro). They can be used to automatically detect and annotate the following 4 types of speech, thought and writing representation in German texts:
| STWR type | Example | Translation |
|--------------------------------|-------------------------------------------------------------------------|----------------------------------------------------------|
| direct | Dann sagte er: **"Ich habe Hunger."** | Then he said: **"I'm hungry."** |
| free indirect ('erlebte Rede', **this tagger**) | Er war ratlos. **Woher sollte er denn hier bloß ein Mittagessen bekommen?** | He was at a loss. **Where should he ever find lunch here?** |
| indirect | Sie fragte, **wo das Essen sei.** | She asked **where the food was.** |
| reported | **Sie sprachen über das Mittagessen.** | **They talked about lunch.** |
The ensemble is trained on the [REDEWIEDERGABE corpus](https://github.com/redewiedergabe/corpus) ([Annotation guidelines](http://redewiedergabe.de/richtlinien/richtlinien.html)), fine-tuning each tagger on the domain-adapted [lkonle/fiction-gbert-large](https://huggingface.co/lkonle/fiction-gbert-large). ([Training Code](https://github.com/cophi-wue/LLpro/blob/main/contrib/train_redewiedergabe.py))
**F1-Scores:**
| STWR type | F1-Score |
|-----------|-----------|
| direct | 90.76 |
| indirect | 79.16 |
| **free indirect (this tagger)** | **58.00** |
| reported | 70.47 |
----
**Demo Usage:**
```python
from flair.data import Sentence
from flair.models import SequenceTagger
sentence = Sentence('Sie sprachen über das Mittagessen. Sie fragte, wo das Essen sei. Woher sollte er das wissen? Dann sagte er: "Ich habe Hunger."')
rwtypes = ['direct', 'indirect', 'freeindirect', 'reported']
for rwtype in rwtypes:
model = SequenceTagger.load(f'aehrm/redewiedergabe-{rwtype}')
model.predict(sentence)
print(rwtype, [ x.data_point.text for x in sentence.get_labels() ])
# >>> direct ['"', 'Ich', 'habe', 'Hunger', '.', '"']
# >>> indirect ['wo', 'das', 'Essen', 'sei', '.']
# >>> freeindirect ['Woher', 'sollte', 'er', 'das', 'wissen', '?']
# >>> reported ['Sie', 'sprachen', 'über', 'das', 'Mittagessen', '.', 'Woher', 'sollte', 'er', 'das', 'wissen', '?']
```
**Cite**:
Please cite the following paper when using this model.
```
@inproceedings{ehrmanntraut-et-al-llpro-2023,
address = {Ingolstadt, Germany},
title = {{LLpro}: A Literary Language Processing Pipeline for {German} Narrative Text},
booktitle = {Proceedings of the 10th Conference on Natural Language Processing ({KONVENS} 2022)},
publisher = {{KONVENS} 2023 Organizers},
author = {Ehrmanntraut, Anton and Konle, Leonard and Jannidis, Fotis},
year = {2023},
}
```