---
language: de
tags:
- text-classification
- pytorch
- nli
- de
pipeline_tag: zero-shot-classification
widget:
- text: "Ich habe ein Problem mit meinem Iphone das so schnell wie möglich gelöst werden muss."
candidate_labels: "Computer, Handy, Tablet, dringend, nicht dringend"
hypothesis_template: "In diesem Satz geht es um das Thema {}."
---
# SVALabs - Gbert Large Zeroshot Nli
In this repository, we present our German zero-shot classification model.
This model was trained on the basis of the German BERT large model from [deepset.ai](https://huggingface.co/deepset/gbert-large) and fine-tuned for natural language inference on 847,862 machine-translated NLI sentence pairs from the [mnli](https://huggingface.co/datasets/multi_nli), [anli](https://huggingface.co/datasets/anli) and [snli](https://huggingface.co/datasets/snli) datasets.
For this purpose, we translated the sentence pairs in these datasets into German.
### Model Details
| | Description or Link |
|---|---|
|**Base model** | [```gbert-large```](https://huggingface.co/deepset/gbert-large) |
|**Finetuning task**| Text Pair Classification / Natural Language Inference |
|**Source dataset**| [```mnli```](https://huggingface.co/datasets/multi_nli) ; [```anli```](https://huggingface.co/datasets/anli) ; [```snli```](https://huggingface.co/datasets/snli) |
### Performance
We evaluated our model on the NLI task using the test set of the German part of the [xnli](https://huggingface.co/datasets/xnli) dataset.

Test-set accuracy: 86%
## Zeroshot Text Classification Task Benchmark
We further tested our model for a zeroshot text classification task using a part of the [10kGNAD Dataset](https://tblock.github.io/10kGNAD/).
Specifically, we used all articles labeled "Kultur", "Sport", "Web", "Wirtschaft", and "Wissenschaft".
The next table shows the results as well as a comparison with other German language zeroshot options performing the same task:
| Model | NDCG@1 | NDCG@5 | NDCG@10 | Recall@1 | Recall@5 | Recall@10 |
|:-------------------:|:------:|:------:|:-------:|:--------:|:--------:|:---------:|
| BM25 | 0.1463 | 0.3451 | 0.4097 | 0.1463 | 0.5424 | 0.7415 |
| BM25(Top 100) +Ours | 0.6410 | 0.7885 | 0.7943 | 0.6410 | 0.8576 | 0.9024 |
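The ranking metrics reported in the table can be reproduced with standard definitions. The sketch below is an illustration (not the authors' evaluation script) and assumes binary relevance, i.e. exactly one correct category per article, which is why NDCG@1 and Recall@1 coincide in the table:

```python
import math

def ndcg_at_k(relevances, k):
    """NDCG@k for a ranked list of binary relevance judgments (1 = relevant)."""
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))
    ideal = sorted(relevances, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

def recall_at_k(relevances, k):
    """Fraction of all relevant items that appear in the top k results."""
    total = sum(relevances)
    return sum(relevances[:k]) / total if total > 0 else 0.0

# Example: the single correct label is ranked second out of five candidates
ranking = [0, 1, 0, 0, 0]
print(ndcg_at_k(ranking, 5))    # 1/log2(3) ≈ 0.6309
print(recall_at_k(ranking, 1))  # 0.0
print(recall_at_k(ranking, 5))  # 1.0
```

With a single relevant item per query, the reported numbers are averages of these per-article scores over the evaluation set.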
## Other Applications
The model can be used for further zero-shot classification tasks. The following example inputs (in German, with English translations for reference) illustrate topic/urgency classification, emotion classification, and question detection:

**Topic and urgency classification**

Sentence 1: "Ich habe ein Problem mit meinem Iphone, das so schnell wie möglich gelöst werden muss." ("I have a problem with my iPhone that needs to be solved as soon as possible.")

Sentence 2: "Ich hab ein kleines Problem mit meinem Macbook, und auch wenn die Reparatur nicht eilt, würde ich es gerne adressieren." ("I have a small problem with my MacBook, and even though the repair is not urgent, I would like to address it.")

Labels: ["Computer", "Handy", "Tablet", "dringend", "nicht dringend"]

**Emotion classification**

Text: "Ich bin enttäuscht, dass ich kein Ticket für das Konzert meiner Lieblingsband bekommen habe." ("I am disappointed that I did not get a ticket for my favorite band's concert.")

Labels: "Furcht, Freude, Wut, Überraschung, Traurigkeit, Ekel, Verachtung"

**Question detection**

Text: "Wer ist die reichste Person der Welt" ("Who is the richest person in the world")

Candidate labels: "Frage, Schlagwörter"

Hypothesis template: "Hierbei handelt es sich um {}."
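Under the hood, zero-shot classification via NLI turns each candidate label into a hypothesis by filling it into the template; the model then scores each premise-hypothesis pair for entailment. A minimal sketch of the hypothesis construction step (the helper name is illustrative, not part of the transformers API):

```python
def build_hypotheses(template, labels):
    """Fill each candidate label into the hypothesis template."""
    return [template.format(label) for label in labels]

template = "Hierbei handelt es sich um {}."
labels = ["Frage", "Schlagwörter"]
print(build_hypotheses(template, labels))
# ['Hierbei handelt es sich um Frage.', 'Hierbei handelt es sich um Schlagwörter.']
```

This is why a well-chosen template matters: the resulting hypotheses must be grammatical German sentences for the NLI model to score them reliably.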
""""""""
```python
from transformers import pipeline

# Load the zero-shot classification pipeline with this model
classifier = pipeline("zero-shot-classification", model="Dehnes/zeroshot_gbert")

sequence = "Ich habe ein Problem mit meinem Iphone das so schnell wie möglich gelöst werden muss"
candidate_labels = ["Computer", "Handy", "Tablet", "dringend", "nicht dringend"]

# Since this is a monolingual model, it is sensitive to the hypothesis template;
# experiment with alternative templates for your task.
hypothesis_template = "In diesem Satz geht es um das Thema {}."
# hypothesis_template = "Dieser Satz drückt ein Gefühl von {} aus."  # e.g. for emotion classification

classifier(sequence, candidate_labels, hypothesis_template=hypothesis_template)
```