---
language:
- de
---
# Scene Segmenter for the Shared Task on Scene Segmentation

This is the scene segmenter model used in [LLpro](https://github.com/cophi-wue/LLpro). At each boundary between two sentences, it predicts one of the following labels:
- `B-Scene`: the preceding sentence began a new *Scene*.
- `B-Nonscene`: the preceding sentence began a new *Non-Scene*.
- `Scene`: the preceding sentence belongs to a *Scene*, but does not begin a new one, i.e. the scene continues.
- `Nonscene`: the preceding sentence belongs to a *Non-Scene*, but does not begin a new one, i.e. the non-scene continues.

Broadly speaking, the model operates in a token classification setup. A sequence of multiple sentences is represented by interspersing the respective tokenizations with the special `[SEP]` token. On these `[SEP]` tokens, the linear classification layer predicts one of the four classes above.

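As a rough illustration of this input representation, the sketch below builds such a sequence with the tokenizer of the domain-adapted base model named in the next paragraph. It is only a sketch under assumptions (e.g. the leading `[CLS]` token); the exact preprocessing used by LLpro may differ, see the linked training code.

```python
from transformers import AutoTokenizer

# Tokenizer of the base model (see below); the tokenizer shipped with this
# repository should behave equivalently.
tokenizer = AutoTokenizer.from_pretrained("lkonle/fiction-gbert-large")

sentences = ["Es war ein nebliger Morgen.", "Plötzlich klopfte es an der Tür."]

# Intersperse the sentence tokenizations with the special [SEP] token.
# (The leading [CLS] is an assumption of this sketch.)
input_ids = [tokenizer.cls_token_id]
sep_positions = []
for sentence in sentences:
    input_ids += tokenizer(sentence, add_special_tokens=False)["input_ids"]
    sep_positions.append(len(input_ids))
    input_ids.append(tokenizer.sep_token_id)

print(tokenizer.convert_ids_to_tokens(input_ids))
print(sep_positions)  # the classifier predicts one of the four labels at these positions
```
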
The model was trained on the dataset of the [KONVENS 2021 Shared Task on Scene Segmentation](http://lsx-events.informatik.uni-wuerzburg.de/stss-2021/task.html) [(Zehe et al., 2021)](http://ceur-ws.org/Vol-3001/#paper1) by fine-tuning the domain-adapted [lkonle/fiction-gbert-large](https://huggingface.co/lkonle/fiction-gbert-large). ([Training code](https://github.com/cophi-wue/LLpro/blob/main/contrib/train_scene_segmenter.py))

F1 scores:
- **40.22** on Track 1 (in-domain dime novels)
- **35.09** on Track 2 (out-of-domain highbrow novels)

The respective test datasets are available only to the task organizers, who evaluated this model on their private test set and reported the scores above. See the [KONVENS paper](http://ceur-ws.org/Vol-3001/#paper1) for a description of their metric.

---

**Demo Usage**:

A minimal loading sketch with `transformers`; the Hub id below is assumed to be this repository's id.

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer
# assumption: this model's id on the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("aehrm/stss-scene-segmenter")
model = AutoModelForTokenClassification.from_pretrained("aehrm/stss-scene-segmenter")
```

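Continuing from the snippet above, inference could look roughly as follows. This is a sketch under assumptions: the sentences are joined at the text level with `[SEP]` (which the tokenizer keeps as a single special token), and the model's `id2label` config is assumed to map class indices to the four labels listed above; consult the LLpro code for the exact procedure.

```python
import torch

sentences = [
    "Es war ein nebliger Morgen.",
    "Plötzlich klopfte es an der Tür.",
    "Drei Tage später erreichte der Brief die Stadt.",
]

# Build one sequence with a [SEP] token after every sentence.
encoding = tokenizer(" [SEP] ".join(sentences), return_tensors="pt")

with torch.no_grad():
    logits = model(**encoding).logits  # shape: (1, sequence_length, num_labels)

# Each [SEP] carries the prediction for the sentence that precedes it.
sep_positions = (encoding["input_ids"][0] == tokenizer.sep_token_id).nonzero().flatten()
for sentence, pos in zip(sentences, sep_positions):
    label = model.config.id2label[logits[0, pos].argmax().item()]
    print(label, "<-", sentence)
```
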
**Cite**:

Please cite the following paper when using this model.

```bibtex
@inproceedings{ehrmanntraut-et-al-llpro-2023,
    location = {Ingolstadt, Germany},
    title = {{LLpro}: A Literary Language Processing Pipeline for {German} Narrative Text},
    booktitle = {Proceedings of the 19th Conference on Natural Language Processing ({KONVENS} 2023)},
    publisher = {{KONVENS} 2023 Organizers},
    author = {Ehrmanntraut, Anton and Konle, Leonard and Jannidis, Fotis},
    date = {2023},
}
```