metadata

language:
  - de

Scene Segmenter for the Shared Task on Scene Segmentation

This is the scene segmentation model used in LLpro. At each boundary between two sentences, it predicts one of the following labels:

B-Scene: the preceding sentence begins a new scene.
B-Nonscene: the preceding sentence begins a new non-scene.
Scene: the preceding sentence belongs to a scene but does not begin a new one, i.e., the scene continues.
Nonscene: the preceding sentence belongs to a non-scene but does not begin a new one, i.e., the non-scene continues.
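Each label is attached to exactly one sentence, so a sequence of predictions can be grouped into contiguous scene and non-scene segments. The following is a minimal sketch of that grouping, not part of LLpro itself. The function name `labels_to_segments` is hypothetical. The handling of a continuation label that does not match the open segment (treated as an implicit segment start) is an assumption of this sketch.

```python
from typing import List, Tuple

# The four labels listed above; one label is predicted per sentence.
LABELS = {"B-Scene", "B-Nonscene", "Scene", "Nonscene"}


def labels_to_segments(labels: List[str]) -> List[Tuple[str, int, int]]:
    """Group per-sentence labels into (kind, start, end) segments.

    `start` is inclusive and `end` is exclusive, both counted in sentences;
    `kind` is either "Scene" or "Nonscene".
    """
    segments: List[Tuple[str, int, int]] = []
    for i, label in enumerate(labels):
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
        begins_new = label.startswith("B-")
        kind = label[2:] if begins_new else label
        # Assumption (hypothetical policy): a continuation label with no open
        # segment, or whose kind differs from the open one, starts a segment.
        if begins_new or not segments or segments[-1][0] != kind:
            segments.append((kind, i, i + 1))
        else:
            # Extend the currently open segment by one sentence.
            prev_kind, start, _ = segments[-1]
            segments[-1] = (prev_kind, start, i + 1)
    return segments


if __name__ == "__main__":
    predicted = ["B-Scene", "Scene", "Scene", "B-Nonscene", "Nonscene", "B-Scene"]
    print(labels_to_segments(predicted))
    # → [('Scene', 0, 3), ('Nonscene', 3, 5), ('Scene', 5, 6)]
```

Two consecutive `B-Scene` labels yield two separate one-sentence scenes, which is why the segment start is driven by the `B-` prefix rather than by a change of label kind alone.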

The model is used in a token classification setup. A sequence of several sentences is encoded by interspersing their tokenizations with the special [SEP] token, and at each [SEP] token the linear classification layer predicts one of the four labels above.
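The input layout can be sketched in plain Python as follows. This sketch makes several assumptions. A whitespace tokenizer (`str.split`) stands in for the actual subword tokenizer of the underlying BERT model. A [SEP] is placed after every sentence, including the last, as in BERT-style inputs. Any leading [CLS] token and the handling of sequences longer than the model's maximum length are left out, because this card does not specify them. The helper names `build_input` and `labels_at_sep` are hypothetical.

```python
from typing import Callable, List, Tuple

SEP = "[SEP]"


def build_input(
    sentences: List[str],
    tokenize: Callable[[str], List[str]],
) -> Tuple[List[str], List[int]]:
    """Concatenate sentence tokenizations with a [SEP] after each sentence.

    Returns the token sequence and the index of every [SEP]; the prediction
    at sep_positions[i] is the label for sentences[i].
    """
    tokens: List[str] = []
    sep_positions: List[int] = []
    for sentence in sentences:
        tokens.extend(tokenize(sentence))
        sep_positions.append(len(tokens))  # index the [SEP] is about to take
        tokens.append(SEP)
    return tokens, sep_positions


def labels_at_sep(token_predictions: List[str], sep_positions: List[int]) -> List[str]:
    """Keep only the predictions made on [SEP] tokens; all others are ignored."""
    return [token_predictions[p] for p in sep_positions]


if __name__ == "__main__":
    tokens, seps = build_input(["Er kam herein.", "Sie ging."], str.split)
    print(tokens)  # → ['Er', 'kam', 'herein.', '[SEP]', 'Sie', 'ging.', '[SEP]']
    print(seps)    # → [3, 6]
```

Reading labels only at the recorded [SEP] indices gives exactly one prediction per sentence, matching the label definitions above.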

The model was trained on the dataset of the KONVENS 2021 Shared Task on Scene Segmentation (Zehe et al., 2021; http://ceur-ws.org/Vol-3001/#paper1) by fine-tuning the domain-adapted lkonle/fiction-gbert-large. (Training code)

F1-Score:

40.22 on Track 1 (in-domain dime novels)
35.09 on Track 2 (out-of-domain highbrow novels)

The test datasets are available only to the task organizers, who evaluated this model on their private test sets and reported the scores above. See the KONVENS paper for a description of their metric.

Demo Usage:

TODO

Cite:

Please cite the following paper when using this model.

@inproceedings{ehrmanntraut-et-al-llpro-2023,
    location = {Ingolstadt, Germany},
    title = {{LLpro}: A Literary Language Processing Pipeline for {German} Narrative Text},
    booktitle = {Proceedings of the 19th Conference on Natural Language Processing ({KONVENS} 2023)},
    publisher = {{KONVENS} 2023 Organizers},
    author = {Ehrmanntraut, Anton and Konle, Leonard and Jannidis, Fotis},
    date = {2023},
}