---
language: de
tags:
- text-classification
- pytorch
- nli
- de
pipeline_tag: zero-shot-classification
widget:
- text: "Ich habe ein Problem mit meinem iPhone, das so schnell wie möglich gelöst werden muss."
  candidate_labels: "Computer, Handy, Tablet, dringend, nicht dringend"
  hypothesis_template: "In diesem Satz geht es um das Thema {}."
---

# SVALabs - Gbert Large Zeroshot Nli

In this repository, we present our German zero-shot classification model. This model was trained on the basis of the German BERT large model from [deepset.ai](https://huggingface.co/deepset/gbert-large) and finetuned for natural language inference on 847,862 machine-translated NLI sentence pairs from the [mnli](https://huggingface.co/datasets/multi_nli), [anli](https://huggingface.co/datasets/anli) and [snli](https://huggingface.co/datasets/snli) datasets. For this purpose, we translated the sentence pairs in these datasets to German. If you are a German speaker, you may also have a look at our blog post about zero-shot classification and our model.

### Model Details

|                     | Description or Link |
|---------------------|---------------------|
| **Base model**      | [```gbert-large```](https://huggingface.co/deepset/gbert-large) |
| **Finetuning task** | Text Pair Classification / Natural Language Inference |
| **Source datasets** | [```mnli```](https://huggingface.co/datasets/multi_nli); [```anli```](https://huggingface.co/datasets/anli); [```snli```](https://huggingface.co/datasets/snli) |

### Performance

We evaluated our model on the NLI task using the test set of the German part of the [xnli](https://huggingface.co/datasets/xnli) dataset.

XNLI test-set accuracy: 86%

### Zeroshot Text Classification Task Benchmark

We further tested our model on a zero-shot text classification task using a part of the [10kGNAD Dataset](https://tblock.github.io/10kGNAD/). Specifically, we used all articles that were labeled "Kultur", "Sport", "Web", "Wirtschaft" and "Wissenschaft". The following table shows the results, as well as a comparison with other German-language zero-shot options performing the same task:

| Model | Accuracy |
|:------|:--------:|
| svalabs/gbert-large-zeroshot-nli | 0.79 |
| Sahajtomar/German_Zeroshot | 0.76 |
| symanto/xlm-roberta-base-snli-mnli-anli-xnli | 0.16 |
| deepset/gbert-base | 0.65 |

### How to use

The simplest way to use the model is the Hugging Face transformers pipeline tool. Just initialize the pipeline with the task "zero-shot-classification" and select "svalabs/gbert-large-zeroshot-nli" as the model. The model requires you to specify labels (ideally labels suited to your task), a sequence (or list of sequences) to classify, and a hypothesis template. In our tests, if the labels comprise only single words, "In diesem Satz geht es um das Thema {}." performed best.

```python
from transformers import pipeline

zeroshot_pipeline = pipeline("zero-shot-classification",
                             model="svalabs/gbert-large-zeroshot-nli")

sequence = "Ich habe ein Problem mit meinem iPhone, das so schnell wie möglich gelöst werden muss"
labels = ["Computer", "Handy", "Tablet", "dringend", "nicht dringend"]

# Since this is a monolingual model, it is sensitive to the hypothesis template;
# it is worth experimenting with alternatives, e.g.:
# hypothesis_template = "Dieser Satz drückt ein Gefühl von {} aus."
hypothesis_template = "In diesem Satz geht es um das Thema {}."

zeroshot_pipeline(sequence, labels, hypothesis_template=hypothesis_template)
```
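The pipeline returns a dictionary containing the input sequence, the candidate labels sorted by descending score, and the corresponding scores. A minimal sketch of how to read the result (the label ordering shown in the comment is illustrative, not actual model output):

```python
result = zeroshot_pipeline(sequence, labels,
                           hypothesis_template=hypothesis_template)

# result is a dict of the form (label order illustrative):
# {'sequence': 'Ich habe ein Problem mit meinem iPhone, ...',
#  'labels': ['Handy', 'dringend', 'Computer', 'Tablet', 'nicht dringend'],
#  'scores': [...]}
print(result["labels"][0], result["scores"][0])  # top label and its score

# The label set above mixes topics ("Handy") and urgency ("dringend").
# With multi_label=True each label is scored independently instead of
# the scores being normalized over all candidates.
result = zeroshot_pipeline(sequence, labels,
                           hypothesis_template=hypothesis_template,
                           multi_label=True)
```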
## Other Applications

Since the hypothesis template is free-form, the model is not limited to topic classification. The examples below classify several sequences at once with mixed topic and urgency labels, detect emotions, and distinguish questions from keyword queries:

```python
# Several sequences at once, with mixed topic and urgency labels:
sequences = [
    "Ich habe ein Problem mit meinem iPhone, das so schnell wie möglich gelöst werden muss",
    "Ich hab ein kleines Problem mit meinem Macbook, und auch wenn die Reparatur nicht eilt, würde ich es gerne adressieren.",
]
labels = ["Computer", "Handy", "Tablet", "dringend", "nicht dringend"]
zeroshot_pipeline(sequences, labels, hypothesis_template=hypothesis_template)

# Emotion classification with an adapted hypothesis template:
emotions = ["Furcht", "Freude", "Wut", "Überraschung", "Traurigkeit", "Ekel", "Verachtung"]
zeroshot_pipeline("Ich bin enttäuscht, dass ich kein Ticket für das Konzert meiner Lieblingsband bekommen habe.",
                  emotions, hypothesis_template="Dieser Satz drückt ein Gefühl von {} aus.")

# Distinguishing questions from keyword queries:
zeroshot_pipeline("Wer ist die reichste Person der Welt", ["Frage", "Schlagwörter"],
                  hypothesis_template="Hierbei handelt es sich um {}.")
```

### Contact
- Daniel Ehnes, daniel.ehnes@sva.de
- Baran Avinc, baran.avinc@sva.de