---
language: de
tags:
- text-classification
- pytorch
- nli
- de
pipeline_tag: zero-shot-classification
widget:
- text: "Ich habe ein Problem mit meinem iPhone, das so schnell wie möglich gelöst werden muss."
candidate_labels: "Computer, Handy, Tablet, dringend, nicht dringend"
hypothesis_template: "In diesem Satz geht es um das Thema {}."
---
# SVALabs - Gbert Large Zeroshot Nli
In this repository, we present our German zero-shot classification model.
This model was trained on the basis of the German BERT large model from [deepset.ai](https://huggingface.co/deepset/gbert-large) and finetuned for natural language inference on 847,862 machine-translated NLI sentence pairs drawn from the [mnli](https://huggingface.co/datasets/multi_nli), [anli](https://huggingface.co/datasets/anli), and [snli](https://huggingface.co/datasets/snli) datasets.
For this purpose, we translated the sentence pairs in these datasets to German.
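The zero-shot setup builds directly on NLI: each candidate label is inserted into a hypothesis template, and the resulting hypothesis is scored against the input sentence as the premise. A minimal sketch of this pair construction (the helper name is ours; the template is the one used in the examples below):

```python
def build_nli_pairs(sequence, candidate_labels,
                    hypothesis_template="In diesem Satz geht es um das Thema {}."):
    """Turn one input sentence and a list of candidate labels into
    (premise, hypothesis) pairs for an NLI model."""
    return [(sequence, hypothesis_template.format(label))
            for label in candidate_labels]

pairs = build_nli_pairs(
    "Ich habe ein Problem mit meinem iPhone.",
    ["Computer", "Handy", "Tablet"],
)
# pairs[1] == ("Ich habe ein Problem mit meinem iPhone.",
#              "In diesem Satz geht es um das Thema Handy.")
```

The model's entailment score for each pair then serves as the relevance score for the corresponding label.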
### Model Details
| | Description or Link |
|---|---|
|**Base model** | [```gbert-large```](https://huggingface.co/deepset/gbert-large) |
|**Finetuning task**| Text Pair Classification / Natural Language Inference |
|**Source dataset**| [```mnli```](https://huggingface.co/datasets/multi_nli) ; [```anli```](https://huggingface.co/datasets/anli) ; [```snli```](https://huggingface.co/datasets/snli) |
### Performance
We evaluated our model on the NLI task using the test split of the German part of the [xnli](https://huggingface.co/datasets/xnli) dataset.
Test set accuracy: 86%
## Zeroshot Text Classification Task Benchmark
We further tested our model for a zeroshot text classification task using a part of the [10kGNAD Dataset](https://tblock.github.io/10kGNAD/).
Specifically, we used all articles labeled "Kultur", "Sport", "Web", "Wirtschaft", and "Wissenschaft".
The following table shows the results, as well as a comparison with other German-language zero-shot options performing the same task:
| Model | NDCG@1 | NDCG@5 | NDCG@10 | Recall@1 | Recall@5 | Recall@10 |
|:-------------------:|:------:|:------:|:-------:|:--------:|:--------:|:---------:|
| BM25 | 0.1463 | 0.3451 | 0.4097 | 0.1463 | 0.5424 | 0.7415 |
| BM25(Top 100) +Ours | 0.6410 | 0.7885 | 0.7943 | 0.6410 | 0.8576 | 0.9024 |
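For reference, with a single correct label per article, the metrics in the table reduce to simple formulas. The functions below are a generic sketch (names are ours, not from the evaluation code), assuming exactly one relevant label per query:

```python
import math

def ndcg_at_k(ranked_labels, true_label, k):
    """With one relevant item, the ideal DCG is 1, so NDCG@k is
    1/log2(pos + 1) if the true label appears at position pos <= k, else 0."""
    for pos, label in enumerate(ranked_labels[:k], start=1):
        if label == true_label:
            return 1.0 / math.log2(pos + 1)
    return 0.0

def recall_at_k(ranked_labels, true_label, k):
    """1.0 if the true label is in the top k, else 0.0."""
    return float(true_label in ranked_labels[:k])

# One article whose true topic is ranked second:
ranking = ["Sport", "Web", "Wirtschaft"]
print(ndcg_at_k(ranking, "Web", 5))    # 1/log2(3), about 0.6309
print(recall_at_k(ranking, "Web", 1))  # 0.0
```

The reported numbers are these per-article scores averaged over all evaluated articles.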
## Other Applications
With suitable candidate labels and hypothesis templates, the model can also be used for other classification tasks, for example distinguishing topics and urgency, recognizing emotions, or detecting questions:

**Topic and urgency**
- Satz 1: "Ich habe ein Problem mit meinem iPhone, das so schnell wie möglich gelöst werden muss."
- Satz 2: "Ich hab ein kleines Problem mit meinem Macbook, und auch wenn die Reparatur nicht eilt, würde ich es gerne adressieren."
- Labels: ["Computer", "Handy", "Tablet", "dringend", "nicht dringend"]

**Emotion classification**
- Text: "Ich bin enttäuscht, dass ich kein Ticket für das Konzert meiner Lieblingsband bekommen habe."
- Labels: ["Furcht", "Freude", "Wut", "Überraschung", "Traurigkeit", "Ekel", "Verachtung"]
- Hypothesis template: "Dieser Satz drückt ein Gefühl von {} aus."

**Question detection**
- Text: "Wer ist die reichste Person der Welt?"
- Labels: ["Frage", "Schlagwörter"]
- Hypothesis template: "Hierbei handelt es sich um {}."

### Usage
```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="Dehnes/zeroshot_gbert")

sequence = "Ich habe ein Problem mit meinem iPhone, das so schnell wie möglich gelöst werden muss"
candidate_labels = ["Computer", "Handy", "Tablet", "dringend", "nicht dringend"]

# As a monolingual model, it is sensitive to the hypothesis template;
# experimenting with different templates can improve results:
hypothesis_template = "In diesem Satz geht es um das Thema {}."
# hypothesis_template = "Dieser Satz drückt ein Gefühl von {} aus."  # e.g. for emotions

classifier(sequence, candidate_labels, hypothesis_template=hypothesis_template)
```
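When `multi_label` is left at its default (`False`), the pipeline takes the entailment logit for each candidate label and softmaxes these logits against each other, so the returned scores sum to 1. A self-contained sketch of that normalization step (the logit values are invented for illustration):

```python
import math

def scores_from_entailment_logits(entailment_logits):
    """Softmax the per-label entailment logits against each other,
    as the zero-shot pipeline does with multi_label=False."""
    exps = [math.exp(x) for x in entailment_logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical entailment logits for ["Handy", "Computer", "Tablet"]:
scores = scores_from_entailment_logits([2.1, 0.3, -0.5])
# scores sum to 1.0; the label with the largest logit gets the highest score
```

With `multi_label=True`, each label is instead scored independently via a softmax over its own entailment/contradiction logits, which is preferable when several labels can apply at once (as in the topic-plus-urgency example above).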