---
language:
- de
---

# Scene Segmenter for the Shared Task on Scene Segmentation
This is the scene segmenter model used in [LLpro](https://github.com/cophi-wue/LLpro). At each boundary between sentences, it predicts one of the following labels:

- `B-Scene`: the preceding sentence begins a new *Scene*.
- `B-Nonscene`: the preceding sentence begins a new *Non-Scene*.
- `Scene`: the preceding sentence belongs to a *Scene* but does not begin a new one, i.e. the scene continues.
- `Nonscene`: the preceding sentence belongs to a *Non-Scene* but does not begin a new one, i.e. the non-scene continues.

Broadly speaking, the model is used in a token classification setup: a sequence of multiple sentences is represented by interspersing the respective tokenizations with the special `[SEP]` token. On these `[SEP]` tokens, a linear classification layer predicts one of the four classes above.
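To make the setup concrete, here is a minimal sketch of the input construction, using naive whitespace tokenization purely for illustration (the actual model uses its own subword tokenizer):

```python
# Sketch of the input construction described above: sentence tokenizations
# are interspersed with [SEP], and the classifier reads off one label at
# each [SEP] position. Whitespace splitting stands in for the model's
# real subword tokenizer here.
sentences = [
    "Es war ein nebliger Morgen.",
    "Plötzlich klopfte es an der Tür.",
]

tokens = []
sep_positions = []
for sentence in sentences:
    tokens.extend(sentence.split())
    sep_positions.append(len(tokens))  # a [SEP] follows each sentence
    tokens.append("[SEP]")

# The linear layer would predict one of
# {B-Scene, B-Nonscene, Scene, Nonscene} at each index in sep_positions.
print(tokens)
print(sep_positions)  # here: [5, 12]
```

One label per sentence boundary falls out of this construction: with *n* sentences there are exactly *n* `[SEP]` tokens, hence *n* predictions.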
The model was trained on the dataset of the [KONVENS 2021 Shared Task on Scene Segmentation](http://lsx-events.informatik.uni-wuerzburg.de/stss-2021/task.html) [(Zehe et al., 2021)](http://ceur-ws.org/Vol-3001/#paper1) by fine-tuning the domain-adapted [lkonle/fiction-gbert-large](https://huggingface.co/lkonle/fiction-gbert-large). ([Training code](https://github.com/cophi-wue/LLpro/blob/main/contrib/train_scene_segmenter.py))

F1 scores:

- **40.22** on Track 1 (in-domain dime novels)
- **35.09** on Track 2 (out-of-domain highbrow novels)

The respective test datasets are available only to the task organizers, who evaluated this model on their private test set and reported the scores above. See the [KONVENS paper](http://ceur-ws.org/Vol-3001/#paper1) for a description of their metric.

---

**Demo Usage**:

```python
TODO
```

**Cite**:

Please cite the following paper when using this model.

```
@inproceedings{ehrmanntraut-et-al-llpro-2023,
  location = {Ingolstadt, Germany},
  title = {{LLpro}: A Literary Language Processing Pipeline for {German} Narrative Text},
  booktitle = {Proceedings of the 19th Conference on Natural Language Processing ({KONVENS} 2023)},
  publisher = {{KONVENS} 2023 Organizers},
  author = {Ehrmanntraut, Anton and Konle, Leonard and Jannidis, Fotis},
  date = {2023},
}
```