aehrm committed on
Commit f71ff82
1 Parent(s): d749f81

Update README.md

Files changed (1): README.md +50 -2
README.md CHANGED
@@ -13,7 +13,7 @@ This is the scene segmenter model that is being used in [LLpro](https://github.c
 Broadly speaking, the model is being used in a token classification setup. A sequence of multiple sentences is represented by interspersing the respective tokenizations with the special `[SEP]` token.
 On these `[SEP]` tokens, the linear classification layer predicts one of the four above classes.
 
-The model is trained on the dataset corresponding to the [KONVENS 2021 Shared Task on Scene Segmentation](http://lsx-events.informatik.uni-wuerzburg.de/stss-2021/task.html) [(Zehe et al., 2021)][http://ceur-ws.org/Vol-3001/#paper1] fine-tuning the domain-adapted [lkonle/fiction-gbert-large](https://huggingface.co/lkonle/fiction-gbert-large). ([Training code](https://github.com/cophi-wue/LLpro/blob/main/contrib/train_scene_segmenter.py))
+The model is trained on the dataset corresponding to the [KONVENS 2021 Shared Task on Scene Segmentation](http://lsx-events.informatik.uni-wuerzburg.de/stss-2021/task.html) ([Zehe et al., 2021](http://ceur-ws.org/Vol-3001/#paper1)), fine-tuning the domain-adapted [lkonle/fiction-gbert-large](https://huggingface.co/lkonle/fiction-gbert-large). ([Training code](https://github.com/cophi-wue/LLpro/blob/main/contrib/train_scene_segmenter.py))
 
 F1-Score:
 - **40.22** on Track 1 (in-domain dime novels)
@@ -26,7 +26,55 @@ The respective test datasets are only available to the task organizers; the task
 **Demo Usage**:
 
 ```python
-TODO
+import torch
+from transformers import BertTokenizer, BertForTokenClassification
+
+tokenizer = BertTokenizer.from_pretrained('aehrm/stss-scene-segmenter')
+model = BertForTokenClassification.from_pretrained('aehrm/stss-scene-segmenter', sep_token_id=tokenizer.sep_token_id).eval()
+
+sentences = ['Und so begann unser kleines Abenteuer auf Hoher See...', 'Es war früh am Morgen, als wir in See stechen wollten.', 'Das Wasser war still.']
+inputs = tokenizer(' [SEP] '.join(sentences), return_tensors='pt')
+
+# inference on the model
+with torch.no_grad():
+    logits = model(**inputs).logits
+
+# concentrate on the logits corresponding to the [SEP] tokens
+relevant_logits = logits[inputs.input_ids == tokenizer.sep_token_id]
+
+predicted_ids = relevant_logits.argmax(axis=1).numpy()
+predicted_labels = [model.config.id2label[x] for x in predicted_ids]
+
+# print the associated prediction for each sentence / [SEP] token
+for label, sent in zip(predicted_labels, sentences):
+    print(label, sent)
+# >>> Scene Und so begann unser kleines Abenteuer auf Hoher See...
+# >>> Scene-B Es war früh am Morgen, als wir in See stechen wollten. (This sentence begins a new scene.)
+# >>> Scene Das Wasser war still.
+
+# alternatively, decode the respective bridge type
+prev = None
+for label, sent in zip(predicted_labels, sentences):
+    bridge = None
+    if prev == 'Scene' and label == 'Scene-B':
+        bridge = 'SCENE-TO-SCENE'
+    elif prev == 'Scene' and label == 'Nonscene-B':
+        bridge = 'SCENE-TO-NONSCENE'
+    elif prev == 'Nonscene' and label == 'Scene-B':
+        bridge = 'NONSCENE-TO-SCENE'
+    else:
+        bridge = 'NOBORDER'
+
+    if prev is not None:
+        print(bridge)
+    print(sent)
+    prev = label
+# >>> Und so begann unser kleines Abenteuer auf Hoher See...
+# >>> SCENE-TO-SCENE
+# >>> Es war früh am Morgen, als wir in See stechen wollten.
+# >>> NOBORDER
+# >>> Das Wasser war still.
 ```
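
A note on the input construction in the demo: joining *n* sentences with `' [SEP] '` yields *n − 1* separators, and BERT's standard post-processing appends one trailing `[SEP]`, so the model sees exactly one `[SEP]` token per sentence — which is why the predictions can be zipped with the sentence list. A model-free sketch of that accounting (sentence strings taken from the demo; no tokenizer download needed):

```python
# Count the [SEP] markers that the input construction produces.
sentences = [
    'Und so begann unser kleines Abenteuer auf Hoher See...',
    'Es war früh am Morgen, als wir in See stechen wollten.',
    'Das Wasser war still.',
]

text = ' [SEP] '.join(sentences)
joined_seps = text.count('[SEP]')   # n - 1 separators from the join
model_seps = joined_seps + 1        # BERT appends one final [SEP]

assert model_seps == len(sentences)  # one prediction slot per sentence
print(joined_seps, model_seps)
# >>> 2 3
```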
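
The bridge decoding at the end of the demo can also be factored into a small pure-Python helper. This is just a sketch — `decode_bridges` is a hypothetical name, not part of the model or the `transformers` API — and it mirrors the demo's label comparisons verbatim:

```python
# Hypothetical helper mirroring the demo's bridge-decoding logic.
def decode_bridges(labels):
    """Map per-sentence labels (Scene, Scene-B, Nonscene, Nonscene-B)
    to the bridge type between each pair of adjacent sentences."""
    bridges = []
    for prev, label in zip(labels, labels[1:]):
        if prev == 'Scene' and label == 'Scene-B':
            bridges.append('SCENE-TO-SCENE')
        elif prev == 'Scene' and label == 'Nonscene-B':
            bridges.append('SCENE-TO-NONSCENE')
        elif prev == 'Nonscene' and label == 'Scene-B':
            bridges.append('NONSCENE-TO-SCENE')
        else:
            bridges.append('NOBORDER')
    return bridges

print(decode_bridges(['Scene', 'Scene-B', 'Scene']))
# >>> ['SCENE-TO-SCENE', 'NOBORDER']
```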