---
language: German
tags:
- text-classification
- pytorch
- nli
- de
pipeline_tag: zero-shot-classification
widget:
- text: "Ich habe ein Problem mit meinem Iphone das so schnell wie möglich gelöst werden muss."
  candidate_labels: "Computer, Handy, Tablet, dringend, nicht dringend"
  hypothesis_template: "In diesem Satz geht es um das Thema {}."
---

# SVALabs - Gbert Large Zeroshot Nli

In this repository, we present our German zero-shot classification model.

The model is based on the German BERT large model from [deepset.ai](https://huggingface.co/deepset/gbert-large) and was fine-tuned for natural language inference (NLI) on 847,862 machine-translated sentence pairs from the [mnli](https://huggingface.co/datasets/multi_nli), [anli](https://huggingface.co/datasets/anli) and [snli](https://huggingface.co/datasets/snli) datasets.

For this purpose, we translated the sentence pairs in these datasets into German.

### Model Details

| | Description or Link |
|---|---|
|**Base model** | [`gbert-large`](https://huggingface.co/deepset/gbert-large) |
|**Finetuning task**| Text Pair Classification / Natural Language Inference |
|**Source dataset**| [`mnli`](https://huggingface.co/datasets/multi_nli) ; [`anli`](https://huggingface.co/datasets/anli) ; [`snli`](https://huggingface.co/datasets/snli) |

### Performance

We evaluated our model on the NLI task using the test set of the German part of the [xnli](https://huggingface.co/datasets/xnli) dataset.

Test set accuracy: 86%

## Zeroshot Text Classification Task Benchmark

We further tested our model on a zero-shot text classification task using a subset of the [10kGNAD Dataset](https://tblock.github.io/10kGNAD/).
Specifically, we used all articles labeled "Kultur", "Sport", "Web", "Wirtschaft" and "Wissenschaft".

The following table shows the results alongside other German-language zero-shot options performing the same task:

| Model | NDCG@1 | NDCG@5 | NDCG@10 | Recall@1 | Recall@5 | Recall@10 |
|:-------------------:|:------:|:------:|:-------:|:--------:|:--------:|:---------:|
| BM25 | 0.1463 | 0.3451 | 0.4097 | 0.1463 | 0.5424 | 0.7415 |
| BM25(Top 100) +Ours | 0.6410 | 0.7885 | 0.7943 | 0.6410 | 0.8576 | 0.9024 |
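
For readers unfamiliar with the column headers of the table above, the following sketch shows one common way to compute Recall@k and NDCG@k for ranked results. It assumes binary relevance with exactly one relevant item per query, which is an assumption on our side and not a documented detail of this benchmark. The rankings are made up, and the helper names `recall_at_k` and `ndcg_at_k` are hypothetical; this is not the evaluation script behind the reported numbers.

```python
import math


def recall_at_k(ranked_ids, relevant_id, k):
    """Return 1.0 if the single relevant item is within the top k results, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0


def ndcg_at_k(ranked_ids, relevant_id, k):
    """NDCG@k with binary relevance and exactly one relevant item.

    The ideal DCG is 1.0 (relevant item at rank 1), so NDCG equals the DCG:
    1 / log2(rank + 1) if the item is within the top k, else 0.0.
    """
    for rank, item_id in enumerate(ranked_ids[:k], start=1):
        if item_id == relevant_id:
            return 1.0 / math.log2(rank + 1)
    return 0.0


# Made-up rankings: (ranked result ids, id of the relevant item).
toy_queries = [
    (["a", "b", "c"], "a"),  # relevant item at rank 1
    (["b", "a", "c"], "a"),  # relevant item at rank 2
    (["b", "c", "d"], "a"),  # relevant item not retrieved
]

for k in (1, 3):
    mean_recall = sum(recall_at_k(r, rel, k) for r, rel in toy_queries) / len(toy_queries)
    mean_ndcg = sum(ndcg_at_k(r, rel, k) for r, rel in toy_queries) / len(toy_queries)
    print(f"k={k}: Recall@{k}={mean_recall:.4f}, NDCG@{k}={mean_ndcg:.4f}")
```

Under this single-relevant-item assumption, NDCG@1 and Recall@1 are always equal, which is consistent with the identical values in the corresponding columns of the table.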

## Other Applications

The following example inputs illustrate other ways to use the model with different candidate labels and hypothesis templates.

**Topic and urgency**

- Sentence 1: "Ich habe ein Problem mit meinem Iphone das so schnell wie möglich gelöst werden muss"
- Sentence 2: "Ich hab ein kleines Problem mit meinem Macbook, und auch wenn die Reparatur nicht eilt, würde ich es gerne adressieren."
- Labels: `["Computer", "Handy", "Tablet", "dringend", "nicht dringend"]`

**Emotion**

- Sentence: "Ich bin enttäuscht, dass ich kein Ticket für das Konzert meiner Lieblingsband bekommen habe."
- Labels: "Furcht, Freude, Wut, Überraschung, Traurigkeit, Ekel, Verachtung"

**Question vs. keywords**

- Sentence: "Wer ist die reichste Person der Welt"
- Labels: "Frage, Schlagwörter"
- Hypothesis template: "Hierbei handelt es sich um {}."

```python
from transformers import pipeline

# Load the zero-shot classification pipeline with the fine-tuned German NLI model.
classifier = pipeline("zero-shot-classification",
                      model="Dehnes/zeroshot_gbert")

sequence = "Ich habe ein Problem mit meinem Iphone das so schnell wie möglich gelöst werden muss"
candidate_labels = ["Computer", "Handy", "Tablet", "dringend", "nicht dringend"]

# The model is monolingual and therefore sensitive to the hypothesis template,
# so it is worth experimenting with different German templates.
hypothesis_template = "In diesem Satz geht es um das Thema {}."
# Alternative template, e.g. for emotion labels:
# hypothesis_template = "Dieser Satz drückt ein Gefühl von {} aus."

result = classifier(sequence, candidate_labels, hypothesis_template=hypothesis_template)
print(result)
```
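
To make the role of `hypothesis_template` more concrete, the sketch below mimics in plain Python what an NLI-based zero-shot classifier does conceptually. Each candidate label is inserted into the template to form a hypothesis, the model scores each (text, hypothesis) pair, and the entailment scores are normalized across labels. No model is loaded here: the logits are made-up numbers, and `build_hypotheses`, `softmax` and `rank_labels` are hypothetical helper names used only for illustration.

```python
import math


def build_hypotheses(candidate_labels, hypothesis_template):
    """Insert each label into the template to form one NLI hypothesis per label."""
    return [hypothesis_template.format(label) for label in candidate_labels]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def rank_labels(candidate_labels, entailment_logits):
    """Normalize entailment logits across labels and sort labels by score."""
    scores = softmax(entailment_logits)
    return sorted(zip(candidate_labels, scores), key=lambda pair: pair[1], reverse=True)


candidate_labels = ["Computer", "Handy", "Tablet"]
hypothesis_template = "In diesem Satz geht es um das Thema {}."

hypotheses = build_hypotheses(candidate_labels, hypothesis_template)
for hypothesis in hypotheses:
    print(hypothesis)

# Made-up entailment logits, one per hypothesis, standing in for real model output.
example_logits = [0.3, 2.1, -0.5]
ranking = rank_labels(candidate_labels, example_logits)
for label, score in ranking:
    print(f"{label}: {score:.3f}")
```

Because the template wording becomes part of every hypothesis the model sees, a different template can change the ranking even when the text and labels stay the same. For label sets that mix independent aspects, such as device type and urgency, scoring each label on its own instead of normalizing across all labels may be the better fit.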