This is the scene segmenter model that is used in [LLpro](https://github.com/cophi-wue/LLpro).
Broadly speaking, the model is used in a token classification setup. A sequence of multiple sentences is represented by interspersing the respective tokenizations with the special `[SEP]` token.
On these `[SEP]` tokens, a linear classification layer predicts one of the four classes above.
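The interspersed layout can be sketched without loading the actual model. The token ids below are made up for illustration; only the placement of the `[SEP]` id matters, since one classification decision is read off at every `[SEP]` position, so the number of predictions equals the number of sentences:

```python
SEP_ID = 102  # BERT's conventional [SEP] token id

# Mock token-id sequence for "[CLS] s1 [SEP] s2 [SEP] s3 [SEP]":
# every sentence is followed by exactly one [SEP] token.
input_ids = [101, 7, 8, 102, 9, 102, 11, 12, 102]

# The positions at which the classifier emits one label per sentence.
sep_positions = [i for i, tid in enumerate(input_ids) if tid == SEP_ID]
print(sep_positions)  # >>> [3, 5, 8]
```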
The model was trained on the dataset of the [KONVENS 2021 Shared Task on Scene Segmentation](http://lsx-events.informatik.uni-wuerzburg.de/stss-2021/task.html) ([Zehe et al., 2021](http://ceur-ws.org/Vol-3001/#paper1)) by fine-tuning the domain-adapted [lkonle/fiction-gbert-large](https://huggingface.co/lkonle/fiction-gbert-large). ([Training code](https://github.com/cophi-wue/LLpro/blob/main/contrib/train_scene_segmenter.py))

F1-Score:
- **40.22** on Track 1 (in-domain dime novels)

The respective test datasets are only available to the task organizers.

**Demo Usage**:
```python
import torch
from transformers import BertTokenizer, BertForTokenClassification

tokenizer = BertTokenizer.from_pretrained('aehrm/stss-scene-segmenter')
model = BertForTokenClassification.from_pretrained('aehrm/stss-scene-segmenter', sep_token_id=tokenizer.sep_token_id).eval()

sentences = ['Und so begann unser kleines Abenteuer auf Hoher See...', 'Es war früh am Morgen, als wir in See stechen wollten.', 'Das Wasser war still.']
inputs = tokenizer(' [SEP] '.join(sentences), return_tensors='pt')

# inference on the model
with torch.no_grad():
    logits = model(**inputs).logits

# concentrate on the logits corresponding to the [SEP] tokens
relevant_logits = logits[inputs.input_ids == tokenizer.sep_token_id]

predicted_ids = relevant_logits.argmax(axis=1).numpy()
predicted_labels = [model.config.id2label[x] for x in predicted_ids]

# print the associated prediction for each sentence / [SEP] token
for label, sent in zip(predicted_labels, sentences):
    print(label, sent)
# >>> Scene Und so begann unser kleines Abenteuer auf Hoher See...
# >>> Scene-B Es war früh am Morgen, als wir in See stechen wollten. (This sentence begins a new scene.)
# >>> Scene Das Wasser war still.

# alternatively, decode the respective bridge type
prev = None
for label, sent in zip(predicted_labels, sentences):
    bridge = None
    if prev == 'Scene' and label == 'Scene-B':
        bridge = 'SCENE-TO-SCENE'
    elif prev == 'Scene' and label == 'Nonscene-B':
        bridge = 'SCENE-TO-NONSCENE'
    elif prev == 'Nonscene' and label == 'Scene-B':
        bridge = 'NONSCENE-TO-SCENE'
    else:
        bridge = 'NOBORDER'

    if prev is not None:
        print(bridge)
    print(sent)
    prev = label
# >>> Und so begann unser kleines Abenteuer auf Hoher See...
# >>> SCENE-TO-SCENE
# >>> Es war früh am Morgen, als wir in See stechen wollten.
# >>> NOBORDER
# >>> Das Wasser war still.
```
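Beyond printing labels, one may want contiguous scene spans. The following is a sketch, not part of the model API, under the assumption (consistent with the demo above) that a `-B` suffix marks the first sentence of a new segment; it uses the demo's predictions as hard-coded input:

```python
# Labels and sentences as produced by the demo above (hard-coded here).
predicted_labels = ['Scene', 'Scene-B', 'Scene']
sentences = ['Und so begann unser kleines Abenteuer auf Hoher See...',
             'Es war früh am Morgen, als wir in See stechen wollten.',
             'Das Wasser war still.']

segments = []
for label, sent in zip(predicted_labels, sentences):
    # '-B' labels open a new segment; other labels continue the current one.
    if label.endswith('-B') or not segments:
        segments.append({'type': label.replace('-B', ''), 'sentences': []})
    segments[-1]['sentences'].append(sent)

for seg in segments:
    print(seg['type'], len(seg['sentences']))
# >>> Scene 1
# >>> Scene 2
```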