MoritzLaurer HF staff committed on
Commit
81a6fdb
·
verified ·
1 Parent(s): 0c57a21

Update README.md

Files changed (1)
  1. README.md +160 -68
README.md CHANGED
@@ -1,72 +1,164 @@
1
  ---
2
- license: mit
 
3
  tags:
4
- - generated_from_trainer
5
- base_model: microsoft/deberta-v3-base
6
- metrics:
7
- - accuracy
8
- model-index:
9
- - name: deberta-v3-base-zeroshot-v2.0-2024-03-21-22-15
10
- results: []
11
  ---
12
 
13
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
14
- should probably proofread and complete it, then remove this comment. -->
15
-
16
- # deberta-v3-base-zeroshot-v2.0-2024-03-21-22-15
17
-
18
- This model is a fine-tuned version of [microsoft/deberta-v3-base](https://huggingface.co/microsoft/deberta-v3-base) on an unknown dataset.
19
- It achieves the following results on the evaluation set:
20
- - Loss: 0.1169
21
- - F1 Macro: 0.5016
22
- - F1 Micro: 0.5474
23
- - Accuracy Balanced: 0.5434
24
- - Accuracy: 0.5474
25
- - Precision Macro: 0.6345
26
- - Recall Macro: 0.5434
27
- - Precision Micro: 0.5474
28
- - Recall Micro: 0.5474
29
-
30
- ## Model description
31
-
32
- More information needed
33
-
34
- ## Intended uses & limitations
35
-
36
- More information needed
37
-
38
- ## Training and evaluation data
39
-
40
- More information needed
41
-
42
- ## Training procedure
43
-
44
- ### Training hyperparameters
45
-
46
- The following hyperparameters were used during training:
47
- - learning_rate: 2e-05
48
- - train_batch_size: 16
49
- - eval_batch_size: 128
50
- - seed: 42
51
- - gradient_accumulation_steps: 2
52
- - total_train_batch_size: 32
53
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
54
- - lr_scheduler_type: linear
55
- - lr_scheduler_warmup_ratio: 0.06
56
- - num_epochs: 2
57
- - mixed_precision_training: Native AMP
58
-
59
- ### Training results
60
-
61
- | Training Loss | Epoch | Step | Validation Loss | F1 Macro | F1 Micro | Accuracy Balanced | Accuracy | Precision Macro | Recall Macro | Precision Micro | Recall Micro |
- |:-------------:|:-----:|:-----:|:---------------:|:--------:|:--------:|:-----------------:|:--------:|:---------------:|:------------:|:---------------:|:------------:|
- | 0.2288 | 1.0 | 27331 | 0.6189 | 0.7688 | 0.7881 | 0.7705 | 0.7881 | 0.7673 | 0.7705 | 0.7881 | 0.7881 |
- | 0.1559 | 2.0 | 54662 | 0.6059 | 0.7896 | 0.8082 | 0.7898 | 0.8082 | 0.7894 | 0.7898 | 0.8082 | 0.8082 |
65
-
66
-
67
- ### Framework versions
68
-
69
- - Transformers 4.37.2
70
- - Pytorch 2.1.2+cu121
71
- - Datasets 2.17.1
72
- - Tokenizers 0.15.2
 
1
  ---
2
+ language:
3
+ - en
4
  tags:
5
+ - text-classification
6
+ - zero-shot-classification
7
+ pipeline_tag: zero-shot-classification
8
+ library_name: transformers
9
+ license: mit
 
 
10
  ---
11
 
12
+ # Model description: deberta-v3-base-zeroshot-v2.0
13
+ The model is designed for zero-shot classification with the Hugging Face pipeline.
14
+
15
+ The model can do one universal classification task: determine whether a hypothesis is "true" or "not true" given a text
16
+ (`entailment` vs. `not_entailment`).
17
+ This task format is based on the Natural Language Inference task (NLI).
18
+ The task is so universal that any classification task can be reformulated into this task.
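As a rough illustration of this reformulation (a minimal sketch, not the author's training code), each candidate class can be inserted into a hypothesis template, producing one premise/hypothesis pair per class. The helper name `build_nli_pairs` is hypothetical; the template string is the one used in the usage example of this README.

```python
def build_nli_pairs(text, candidate_labels, hypothesis_template="This example is about {}"):
    """Turn one classification input into (premise, hypothesis) pairs, one per label."""
    # Hypothetical helper: the text is the premise, each verbalized class fills the template.
    return [(text, hypothesis_template.format(label)) for label in candidate_labels]


pairs = build_nli_pairs(
    "Angela Merkel is a politician in Germany and leader of the CDU",
    ["politics", "economy"],
)
for premise, hypothesis in pairs:
    print(premise, "->", hypothesis)
```

An NLI model then scores each pair for entailment, and the class whose hypothesis is most strongly entailed is taken as the prediction.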
19
+
20
+ ## Training data
21
+ The model was trained on a mixture of __33 datasets and 387 classes__ that have been reformatted into this universal format.
22
+ 1. Five NLI datasets with ~885k texts: "mnli", "anli", "fever", "wanli", "ling"
23
+ 2. 28 classification tasks reformatted into the universal NLI format. ~51k cleaned texts were used to avoid overfitting:
24
+ 'amazonpolarity', 'imdb', 'appreviews', 'yelpreviews', 'rottentomatoes',
25
+ 'emotiondair', 'emocontext', 'empathetic',
26
+ 'financialphrasebank', 'banking77', 'massive',
27
+ 'wikitoxic_toxicaggregated', 'wikitoxic_obscene', 'wikitoxic_threat', 'wikitoxic_insult', 'wikitoxic_identityhate',
28
+ 'hateoffensive', 'hatexplain', 'biasframes_offensive', 'biasframes_sex', 'biasframes_intent',
29
+ 'agnews', 'yahootopics',
30
+ 'trueteacher', 'spam', 'wellformedquery',
31
+ 'manifesto', 'capsotu'.
32
+
33
+ See details on each dataset here: https://github.com/MoritzLaurer/zeroshot-classifier/blob/main/datasets_overview.csv
34
+
35
+ Note that compared to other NLI models, this model predicts two classes (`entailment` vs. `not_entailment`)
36
+ as opposed to three classes (entailment/neutral/contradiction).
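A two-class label set can be derived from a standard three-class NLI dataset by merging the two non-entailment classes. The sketch below assumes that `neutral` and `contradiction` both map to `not_entailment`; the README does not spell out the exact mapping used for training, and the function name is hypothetical.

```python
def to_binary_nli_label(label):
    """Collapse a three-class NLI label into the two-class scheme."""
    # Assumption: neutral and contradiction are both treated as "not_entailment".
    if label == "entailment":
        return "entailment"
    if label in ("neutral", "contradiction"):
        return "not_entailment"
    raise ValueError(f"Unknown NLI label: {label!r}")


three_class_labels = ["entailment", "neutral", "contradiction"]
print([to_binary_nli_label(label) for label in three_class_labels])
```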
37
+
38
+ The model was only trained on English data. For __multilingual use-cases__,
39
+ I recommend machine translating texts to English with libraries like [EasyNMT](https://github.com/UKPLab/EasyNMT).
40
+ English-only models tend to perform better than multilingual models and
41
+ validation with English data can be easier if you don't speak all languages in your corpus.
42
+
43
+ ### How to use the model
44
+ #### Simple zero-shot classification pipeline
45
+ ```python
+ #!pip install transformers[sentencepiece]
+ from transformers import pipeline
+ text = "Angela Merkel is a politician in Germany and leader of the CDU"
+ hypothesis_template = "This example is about {}"
+ classes_verbalized = ["politics", "economy", "entertainment", "environment"]
+ zeroshot_classifier = pipeline("zero-shot-classification", model="MoritzLaurer/deberta-v3-base-zeroshot-v2.0")
+ output = zeroshot_classifier(text, classes_verbalized, hypothesis_template=hypothesis_template, multi_label=False)
+ print(output)
+ ```
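For a single input text, the zero-shot pipeline returns a dictionary with the keys `sequence`, `labels`, and `scores`. The sketch below shows one way to pick the best class from such a dictionary and to abstain below a confidence threshold. The helper `top_prediction`, the `min_score` parameter, and the scores in `example_output` are made up for illustration and are not real model output.

```python
def top_prediction(output, min_score=0.0):
    """Return (label, score) for the highest-scoring class, or (None, score) below the threshold."""
    # Take the maximum explicitly instead of relying on the order of the lists.
    score, label = max(zip(output["scores"], output["labels"]))
    return (label, score) if score >= min_score else (None, score)


# Hypothetical dictionary in the shape the pipeline returns; the scores are invented.
example_output = {
    "sequence": "Angela Merkel is a politician in Germany and leader of the CDU",
    "labels": ["politics", "economy", "environment", "entertainment"],
    "scores": [0.9, 0.05, 0.03, 0.02],
}

print(top_prediction(example_output))
print(top_prediction(example_output, min_score=0.95))
```

With `multi_label=False` the scores are normalized across the candidate classes, so a threshold like this is only a heuristic for flagging uncertain predictions.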
55
+
56
+ ### Details on data and training
57
+
58
+ Reproduction code is available here, in the `v2_synthetic_data` directory: https://github.com/MoritzLaurer/zeroshot-classifier/tree/main
59
+
60
+
61
+ ## Metrics
62
+
63
+ Balanced accuracy is reported for all datasets.
64
+ `deberta-v3-large-zeroshot-v1.1-all-33` was trained on all datasets, with a maximum of 500 texts per class to avoid overfitting.
65
+ The metrics on these datasets are therefore not strictly zeroshot, as the model has seen some data for each task during training.
66
+ `deberta-v3-large-zeroshot-v1.1-heldout` indicates zeroshot performance on the respective dataset.
67
+ To calculate these zeroshot metrics, the pipeline was run 28 times, each time with one dataset held out from training to simulate a zeroshot setup.
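Balanced accuracy is the unweighted mean of per-class recall, so frequent classes cannot dominate the score. The following is a minimal pure-Python sketch of the metric, not the evaluation code used for the table, and the toy labels are invented. It returns a fraction, whereas the table reports percentages.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over all classes that occur in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for true_label, predicted_label in zip(y_true, y_pred):
        total[true_label] += 1
        if true_label == predicted_label:
            correct[true_label] += 1
    # Recall per class, averaged with equal weight per class.
    return sum(correct[c] / total[c] for c in total) / len(total)


# Invented toy data: 8 "spam" texts all correct, 2 "ham" texts with one mistake.
y_true = ["spam"] * 8 + ["ham"] * 2
y_pred = ["spam"] * 8 + ["spam", "ham"]

print(balanced_accuracy(y_true, y_pred))
```

In this toy case plain accuracy would be 0.9, while balanced accuracy is the mean of the recalls 1.0 and 0.5, which penalizes the errors on the rare class more visibly.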
68
+
69
+ ![figure_large_v1.1](https://raw.githubusercontent.com/MoritzLaurer/zeroshot-classifier/main/results/fig_large_v1.1.png)
70
+
71
+
72
+ | | deberta-v3-large-mnli-fever-anli-ling-wanli-binary | deberta-v3-large-zeroshot-v1.1-heldout | deberta-v3-large-zeroshot-v1.1-all-33 |
+ |:---------------------------|----------------------------:|-----------------------------------------:|----------------------------------------:|
+ | datasets mean (w/o nli) | 64.1 | 73.4 | 85.2 |
+ | amazonpolarity (2) | 94.7 | 96.6 | 96.8 |
+ | imdb (2) | 90.3 | 95.2 | 95.5 |
+ | appreviews (2) | 93.6 | 94.3 | 94.7 |
+ | yelpreviews (2) | 98.5 | 98.4 | 98.9 |
+ | rottentomatoes (2) | 83.9 | 90.5 | 90.8 |
+ | emotiondair (6) | 49.2 | 42.1 | 72.1 |
+ | emocontext (4) | 57 | 69.3 | 82.4 |
+ | empathetic (32) | 42 | 34.4 | 58 |
+ | financialphrasebank (3) | 77.4 | 77.5 | 91.9 |
+ | banking77 (72) | 29.1 | 52.8 | 72.2 |
+ | massive (59) | 47.3 | 64.7 | 77.3 |
+ | wikitoxic_toxicaggreg (2) | 81.6 | 86.6 | 91 |
+ | wikitoxic_obscene (2) | 85.9 | 91.9 | 93.1 |
+ | wikitoxic_threat (2) | 77.9 | 93.7 | 97.6 |
+ | wikitoxic_insult (2) | 77.8 | 91.1 | 92.3 |
+ | wikitoxic_identityhate (2) | 86.4 | 89.8 | 95.7 |
+ | hateoffensive (3) | 62.8 | 66.5 | 88.4 |
+ | hatexplain (3) | 46.9 | 61 | 76.9 |
+ | biasframes_offensive (2) | 62.5 | 86.6 | 89 |
+ | biasframes_sex (2) | 87.6 | 89.6 | 92.6 |
+ | biasframes_intent (2) | 54.8 | 88.6 | 89.9 |
+ | agnews (4) | 81.9 | 82.8 | 90.9 |
+ | yahootopics (10) | 37.7 | 65.6 | 74.3 |
+ | trueteacher (2) | 51.2 | 54.9 | 86.6 |
+ | spam (2) | 52.6 | 51.8 | 97.1 |
+ | wellformedquery (2) | 49.9 | 40.4 | 82.7 |
+ | manifesto (56) | 10.6 | 29.4 | 44.1 |
+ | capsotu (21) | 23.2 | 69.4 | 74 |
+ | mnli_m (2) | 93.1 | nan | 93.1 |
+ | mnli_mm (2) | 93.2 | nan | 93.2 |
+ | fevernli (2) | 89.3 | nan | 89.5 |
+ | anli_r1 (2) | 87.9 | nan | 87.3 |
+ | anli_r2 (2) | 76.3 | nan | 78 |
+ | anli_r3 (2) | 73.6 | nan | 74.1 |
+ | wanli (2) | 82.8 | nan | 82.7 |
+ | lingnli (2) | 90.2 | nan | 89.6 |
111
+
112
+
113
+
114
+ ## Limitations and bias
115
+ The model can only do text classification tasks.
116
+
117
+ Please consult the original DeBERTa paper and the papers for the different datasets for potential biases.
118
+
119
+
120
+ ## License
121
+ The base model (DeBERTa-v3) is published under the MIT license.
122
+ The training data is released under different, permissive, commercially-friendly licenses
123
+ ([MNLI](https://huggingface.co/datasets/nyu-mll/multi_nli), [FEVER-NLI](https://huggingface.co/datasets/fever), [synthetic_zeroshot_mixtral_v0.1](https://huggingface.co/datasets/MoritzLaurer/synthetic_zeroshot_mixtral_v0.1)).
124
+
125
+ ## Citation
126
+
127
+ This model is an extension of the research described in this [paper](https://arxiv.org/pdf/2312.17543.pdf).
128
+
129
+ If you use this model academically, please cite:
130
+ ```
+ @misc{laurer_building_2023,
+ title = {Building {Efficient} {Universal} {Classifiers} with {Natural} {Language} {Inference}},
+ url = {http://arxiv.org/abs/2312.17543},
+ doi = {10.48550/arXiv.2312.17543},
+ abstract = {Generative Large Language Models (LLMs) have become the mainstream choice for fewshot and zeroshot learning thanks to the universality of text generation. Many users, however, do not need the broad capabilities of generative LLMs when they only want to automate a classification task. Smaller BERT-like models can also learn universal tasks, which allow them to do any text classification task without requiring fine-tuning (zeroshot classification) or to learn new tasks with only a few examples (fewshot), while being significantly more efficient than generative LLMs. This paper (1) explains how Natural Language Inference (NLI) can be used as a universal classification task that follows similar principles as instruction fine-tuning of generative LLMs, (2) provides a step-by-step guide with reusable Jupyter notebooks for building a universal classifier, and (3) shares the resulting universal classifier that is trained on 33 datasets with 389 diverse classes. Parts of the code we share has been used to train our older zeroshot classifiers that have been downloaded more than 55 million times via the Hugging Face Hub as of December 2023. Our new classifier improves zeroshot performance by 9.4\%.},
+ urldate = {2024-01-05},
+ publisher = {arXiv},
+ author = {Laurer, Moritz and van Atteveldt, Wouter and Casas, Andreu and Welbers, Kasper},
+ month = dec,
+ year = {2023},
+ note = {arXiv:2312.17543 [cs]},
+ keywords = {Computer Science - Artificial Intelligence, Computer Science - Computation and Language},
+ }
+ ```
145
+
146
+ ### Ideas for cooperation or questions?
147
+ If you have questions or ideas for cooperation, contact me at moritz{at}huggingface{dot}co or via [LinkedIn](https://www.linkedin.com/in/moritz-laurer/).
148
+
149
+
150
+
151
+ ### Hypotheses used for classification
152
+ The hypotheses in the tables below were used to fine-tune the model.
153
+ Inspecting them can help users get a feeling for which type of hypotheses and tasks the model was trained on.
154
+ You can formulate your own hypotheses by changing the `hypothesis_template` of the zeroshot pipeline. For example:
155
+
156
+ ```python
+ from transformers import pipeline
+ text = "Angela Merkel is a politician in Germany and leader of the CDU"
+ hypothesis_template = "Merkel is the leader of the party: {}"
+ classes_verbalized = ["CDU", "SPD", "Greens"]
+ zeroshot_classifier = pipeline("zero-shot-classification", model="MoritzLaurer/deberta-v3-large-zeroshot-v1.1-all-33")
+ output = zeroshot_classifier(text, classes_verbalized, hypothesis_template=hypothesis_template, multi_label=False)
+ print(output)
+ ```