---
language: de
tags:
- text-classification
- pytorch
- nli
- de
pipeline_tag: zero-shot-classification
widget:
- text: "Ich habe ein Problem mit meinem Iphone das so schnell wie möglich gelöst werden muss."
  candidate_labels: "Computer, Handy, Tablet, dringend, nicht dringend"
  hypothesis_template: "In diesem Satz geht es um das Thema {}."
---

# SVALabs - GBERT Large Zero-Shot NLI

In this repository, we present our German zero-shot classification model.

This model is based on the German BERT large model from [deepset.ai](https://huggingface.co/deepset/gbert-large) and was finetuned for natural language inference on 847,862 machine-translated NLI sentence pairs drawn from the [mnli](https://huggingface.co/datasets/multi_nli), [anli](https://huggingface.co/datasets/anli), and [snli](https://huggingface.co/datasets/snli) datasets.

For this purpose, we translated the sentence pairs in these datasets into German.
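Zero-shot classification with an NLI model works by turning each candidate label into a hypothesis via a template and scoring how strongly the input sentence entails it. A minimal sketch of this expansion step (the function name is illustrative, not part of the pipeline API):

```python
def build_nli_pairs(sequence, candidate_labels, hypothesis_template):
    """Expand a zero-shot request into (premise, hypothesis) NLI pairs.

    The NLI model then scores the entailment of each hypothesis given
    the premise; the label with the highest entailment score wins.
    """
    return [(sequence, hypothesis_template.format(label))
            for label in candidate_labels]

pairs = build_nli_pairs(
    "Ich habe ein Problem mit meinem Iphone das so schnell wie möglich gelöst werden muss",
    ["Computer", "Handy", "Tablet"],
    "In diesem Satz geht es um das Thema {}.",
)
# pairs[1][1] == "In diesem Satz geht es um das Thema Handy."
```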

### Model Details

| | Description or Link  |
|---|---|
|**Base model**   | [```gbert-large```](https://huggingface.co/deepset/gbert-large) |
|**Finetuning task**| Text Pair Classification / Natural Language Inference  |
|**Source dataset**| [```mnli```](https://huggingface.co/datasets/multi_nli) ; [```anli```](https://huggingface.co/datasets/anli) ; [```snli```](https://huggingface.co/datasets/snli)   |

### Performance

We evaluated our model on the NLI task using the test set of the German part of the [xnli](https://huggingface.co/datasets/xnli) dataset.

Test set accuracy: 86%


## Zero-Shot Text Classification Benchmark

We further tested our model on a zero-shot text classification task using a subset of the [10kGNAD dataset](https://tblock.github.io/10kGNAD/).
Specifically, we used all articles labeled "Kultur", "Sport", "Web", "Wirtschaft", and "Wissenschaft".

The following table shows the results, along with a comparison to other German-language zero-shot options on the same task:

| Model                 | NDCG@1 | NDCG@5 | NDCG@10 | Recall@1 | Recall@5 | Recall@10 |
|:---------------------:|:------:|:------:|:-------:|:--------:|:--------:|:---------:|
| BM25                  | 0.1463 | 0.3451 | 0.4097  | 0.1463   | 0.5424   | 0.7415    |
| BM25 (Top 100) + Ours | 0.6410 | 0.7885 | 0.7943  | 0.6410   | 0.8576   | 0.9024    |
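For reference, NDCG@k rewards placing the correct label high in the ranked list. With binary relevance it can be computed as below (a standard formulation, not code from our evaluation):

```python
import math

def ndcg_at_k(relevances, k):
    """NDCG@k for one query.

    `relevances` holds the 0/1 relevance of each ranked result,
    in rank order.
    """
    dcg = sum(rel / math.log2(rank + 2)
              for rank, rel in enumerate(relevances[:k]))
    ideal = sorted(relevances, reverse=True)
    idcg = sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

# A single relevant label at rank 1 gives a perfect score:
ndcg_at_k([1, 0, 0], 3)  # -> 1.0
# The same label at rank 2 is discounted:
ndcg_at_k([0, 1, 0], 3)  # -> 1/log2(3) ≈ 0.631
```

With exactly one relevant label per article, NDCG@1 coincides with Recall@1, which is why those two columns match in the table.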

## Other Applications



The same pipeline can be applied to other label sets. Two example inputs with mixed topic and urgency labels (in English: "I have a problem with my iPhone that needs to be solved as quickly as possible" / "I have a small problem with my MacBook, and even though the repair is not urgent, I would like to address it"):

Satz 1:
"Ich habe ein Problem mit meinem Iphone das so schnell wie möglich gelöst werden muss"
Satz 2:
"Ich hab ein kleines Problem mit meinem Macbook, und auch wenn die Reparatur nicht eilt, würde ich es gerne adressieren."
Label:
["Computer", "Handy", "Tablet", "dringend", "nicht dringend"]

Emotion classification example (in English: "I am disappointed that I did not get a ticket for my favorite band's concert"):

Text: "Ich bin enttäuscht, dass ich kein Ticket für das Konzert meiner Lieblingsband bekommen habe."
Label: "Furcht, Freude, Wut, Überraschung, Traurigkeit, Ekel, Verachtung"

 
With an adapted hypothesis template, the model can also distinguish questions from keyword queries (in English: "Who is the richest person in the world"):

Text: "Wer ist die reichste Person der Welt"
Candidate labels: "Frage, Schlagwörter"
Hypothesis template: "Hierbei handelt es sich um {}."
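The examples above pass candidate labels either as a Python list or as a single comma-separated string. The pipeline accepts both forms; splitting such a string yourself is straightforward (a small helper for illustration only):

```python
def parse_labels(labels):
    """Turn 'Frage, Schlagwörter' into ['Frage', 'Schlagwörter'];
    pass lists through unchanged."""
    if isinstance(labels, str):
        return [part.strip() for part in labels.split(",") if part.strip()]
    return list(labels)

parse_labels("Furcht, Freude, Wut , Überraschung,  Traurigkeit")
# -> ['Furcht', 'Freude', 'Wut', 'Überraschung', 'Traurigkeit']
```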

""""""""



## Usage

The model can be used with the Transformers `zero-shot-classification` pipeline as follows:

```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="Dehnes/zeroshot_gbert")

sequence = "Ich habe ein Problem mit meinem Iphone das so schnell wie möglich gelöst werden muss"
candidate_labels = ["Computer", "Handy", "Tablet", "dringend", "nicht dringend"]

# Since this is a monolingual model, it is sensitive to the hypothesis
# template; experiment with templates that match your task, e.g.:
hypothesis_template = "In diesem Satz geht es um das Thema {}."
# hypothesis_template = "Dieser Satz drückt ein Gefühl von {} aus."

classifier(sequence, candidate_labels, hypothesis_template=hypothesis_template)
```