---
language: de
tags:
  - text-classification
  - pytorch
  - nli
  - de
pipeline_tag: zero-shot-classification
widget:
  - text: >-
      Ich habe ein Problem mit meinem Iphone das so schnell wie möglich gelöst
      werden muss.
    candidate_labels: Computer, Handy, Tablet, dringend, nicht dringend
    hypothesis_template: In diesem Satz geht es um das Thema {}.
---

# SVALabs - Gbert Large Zeroshot Nli

In this repository, we present our German zero-shot classification model.

This model was trained on the basis of the German BERT large model from deepset.ai and finetuned for natural language inference on 847,862 machine-translated NLI sentence pairs drawn from the MNLI, ANLI, and SNLI datasets.

For this purpose, we translated the sentence pairs in these datasets into German.
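The sketch below illustrates this kind of NLI fine-tuning. It assumes the standard three-way entailment/neutral/contradiction label setup and a JSON file of translated premise/hypothesis pairs; the file name, hyperparameters, and training loop are illustrative assumptions, not the exact script used to produce this model.

```python
# Minimal sketch of NLI fine-tuning on translated premise/hypothesis pairs.
# NOTE: dataset path, hyperparameters, and label convention are assumptions
# for illustration, not the original training setup of this model.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "deepset/gbert-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

# Assumed format: one JSON record per pair with "premise", "hypothesis", "label",
# where 0 = entailment, 1 = neutral, 2 = contradiction (MNLI convention).
dataset = load_dataset("json", data_files={"train": "nli_pairs_de.json"})

def tokenize(batch):
    return tokenizer(batch["premise"], batch["hypothesis"],
                     truncation=True, max_length=256)

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(output_dir="gbert-large-nli",
                         per_device_train_batch_size=16,
                         num_train_epochs=3,
                         learning_rate=2e-5)

# Passing the tokenizer lets the Trainer pad batches dynamically.
trainer = Trainer(model=model, args=args,
                  train_dataset=tokenized["train"], tokenizer=tokenizer)
trainer.train()
```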

If you are a German speaker, you may also want to have a look at our blog post about zero-shot classification and our model.

## Model Details

|                     | Description or Link |
|---------------------|---------------------|
| **Base model**      | gbert-large |
| **Finetuning task** | Text Pair Classification / Natural Language Inference |
| **Source datasets** | MNLI; ANLI; SNLI |

## Performance

We evaluated the model on the NLI task using the test split of the German portion of the XNLI dataset.

XNLI test-set accuracy: 86%
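The sketch below shows how such an evaluation can be reproduced with the `datasets` library; the mapping between the model's output label order and the XNLI label ids is an assumption and should be checked against the model's `config.id2label`.

```python
# Sketch: evaluate the NLI model on the German XNLI test split.
# Assumption: the model's argmax class id matches XNLI's label ids
# (0 = entailment, 1 = neutral, 2 = contradiction); verify via config.id2label.
import torch
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "svalabs/gbert-large-zeroshot-nli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

xnli_test = load_dataset("xnli", "de", split="test")

correct = 0
for example in xnli_test:
    inputs = tokenizer(example["premise"], example["hypothesis"],
                       truncation=True, return_tensors="pt")
    with torch.no_grad():
        pred = model(**inputs).logits.argmax(dim=-1).item()
    correct += int(pred == example["label"])

print("XNLI (de) test accuracy:", correct / len(xnli_test))
```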

### Zeroshot Text Classification Task Benchmark

We further tested the model on a zero-shot text classification task using part of the 10kGNAD dataset. Specifically, we used all articles labeled "Kultur", "Sport", "Web", "Wirtschaft", and "Wissenschaft".

The following table shows the results as well as a comparison with other German-language zero-shot options performing the same task; a minimal sketch of this benchmark setup follows the table.

| Model | Accuracy |
|-------|----------|
| svalabs/gbert-large-zeroshot-nli | 0.79 |
| Sahajtomar/German_Zeroshot | 0.76 |
| Symanto/xlm-roberta-base-snli-mnli-anli-xnli | 0.16 |
| deepset/gbert-base | 0.65 |
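The sketch below illustrates the benchmark setup; the 10kGNAD file name, column layout, and the hypothesis template are assumptions for illustration.

```python
# Sketch: zero-shot topic classification on a subset of 10kGNAD.
# Assumptions: 10kGNAD is available as a ';'-separated "articles.csv" with a
# label column and a text column; only the five listed categories are kept.
import pandas as pd
from transformers import pipeline

zeroshot = pipeline("zero-shot-classification",
                    model="svalabs/gbert-large-zeroshot-nli")

labels = ["Kultur", "Sport", "Web", "Wirtschaft", "Wissenschaft"]

articles = pd.read_csv("articles.csv", sep=";", names=["label", "text"],
                       quotechar="'")
articles = articles[articles["label"].isin(labels)]

correct = 0
for _, row in articles.iterrows():
    result = zeroshot(row["text"], labels,
                      hypothesis_template="In diesem Satz geht es um das Thema {}.")
    # The pipeline returns candidate labels sorted by score, best first.
    correct += int(result["labels"][0] == row["label"])

print("Zero-shot accuracy:", correct / len(articles))
```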

## How to use

The simplest way to use the model is through the Hugging Face transformers pipeline. Just initialize the pipeline and specify the task as "zero-shot-classification":


```python
from transformers import pipeline

zeroshot_pipeline = pipeline("zero-shot-classification",
                             model="svalabs/gbert-large-zeroshot-nli")

sequence = "Ich habe ein Problem mit meinem Iphone das so schnell wie möglich gelöst werden muss"
labels = ["Computer", "Handy", "Tablet", "dringend", "nicht dringend"]

# As a monolingual model, it is sensitive to the hypothesis template;
# experimenting with different templates is worthwhile.
hypothesis_template = "In diesem Satz geht es um das Thema {}."
# hypothesis_template = "Dieser Satz drückt ein Gefühl von {} aus."

zeroshot_pipeline(sequence, labels, hypothesis_template=hypothesis_template)
```
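The pipeline returns a dictionary containing the input sequence, the candidate labels sorted from highest to lowest score, and the corresponding scores, so the first entry of `labels` is the predicted class.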

## Other Applications

The same labels also work when classifying several sentences at once, for example:

- Sentence 1: "Ich habe ein Problem mit meinem Iphone das so schnell wie möglich gelöst werden muss"
- Sentence 2: "Ich hab ein kleines Problem mit meinem Macbook, und auch wenn die Reparatur nicht eilt, würde ich es gerne adressieren."
- Labels: ["Computer", "Handy", "Tablet", "dringend", "nicht dringend"]

Emotion example:

- Sentence: "Ich bin enttäuscht, dass ich kein Ticket für das Konzert meiner Lieblingsband bekommen habe."
- Labels: ["Furcht", "Freude", "Wut", "Überraschung", "Traurigkeit", "Ekel", "Verachtung"]

Question vs. keyword example:

- Sentence: "Wer ist die reichste Person der Welt"
- Labels: ["Frage", "Schlagwörter"]
- Hypothesis template: "Hierbei handelt es sich um {}."
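The snippet below shows how these examples could be passed to the pipeline; it re-initializes the pipeline for completeness and uses the hypothesis templates given above.

```python
from transformers import pipeline

zeroshot_pipeline = pipeline("zero-shot-classification",
                             model="svalabs/gbert-large-zeroshot-nli")

# Passing a list of sequences classifies each sentence independently.
sentences = [
    "Ich habe ein Problem mit meinem Iphone das so schnell wie möglich gelöst werden muss",
    "Ich hab ein kleines Problem mit meinem Macbook, und auch wenn die Reparatur nicht eilt, würde ich es gerne adressieren.",
]
support_labels = ["Computer", "Handy", "Tablet", "dringend", "nicht dringend"]
print(zeroshot_pipeline(sentences, support_labels,
                        hypothesis_template="In diesem Satz geht es um das Thema {}."))

# Emotion example with a template tailored to feelings.
emotion_labels = ["Furcht", "Freude", "Wut", "Überraschung",
                  "Traurigkeit", "Ekel", "Verachtung"]
print(zeroshot_pipeline(
    "Ich bin enttäuscht, dass ich kein Ticket für das Konzert meiner Lieblingsband bekommen habe.",
    emotion_labels,
    hypothesis_template="Dieser Satz drückt ein Gefühl von {} aus."))

# Question vs. keyword example.
print(zeroshot_pipeline("Wer ist die reichste Person der Welt",
                        ["Frage", "Schlagwörter"],
                        hypothesis_template="Hierbei handelt es sich um {}."))
```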

""""""""

## Contact