MoritzLaurer/deberta-v3-base-zeroshot-v2.0-c

@@ -20,6 +20,8 @@ The model can do one universal classification task: determine whether a hypothes
 (`entailment` vs. `not_entailment`).
 This task format is based on the Natural Language Inference task (NLI).
 The task is so universal that any classification task can be reformulated into this task.
 ## Training data
@@ -29,12 +31,9 @@ I first created a list of 500+ diverse text classification tasks for 25 professi
 I then used this as seed data to generate several hundred thousand texts for the different tasks with Mixtral-8x7B-Instruct-v0.1.
 The final dataset used is available in the [synthetic_zeroshot_mixtral_v0.1](https://huggingface.co/datasets/MoritzLaurer/synthetic_zeroshot_mixtral_v0.1) dataset
 in the subset `mixtral_written_text_for_tasks_v4`. Data curation was done in multiple iterations and I will release more information on this process soon.
-2. Two commercially-friendly NLI datasets: ([MNLI](https://huggingface.co/datasets/nyu-mll/multi_nli), [FEVER-NLI](https://huggingface.co/datasets/fever).
 These datasets were added to increase generalization. Datasets like ANLI were excluded due to their non-commercial license.
-Note that compared to other NLI models, this model predicts two classes (`entailment` vs. `not_entailment`)
-as opposed to three classes (entailment/neutral/contradiction)
 The model was only trained on English data. I will release a multilingual version of this model soon.
 For __multilingual use-cases__,
 I alternatively recommend machine translating texts to English with libraries like [EasyNMT](https://github.com/UKPLab/EasyNMT).
@@ -43,8 +42,7 @@ validation with English data can be easier if you don't speak all languages in y
-### How to use the model
-#### Simple zero-shot classification pipeline
 ```python
 #!pip install transformers[sentencepiece]
 from transformers import pipeline
 # ...
 ```
@@ -58,10 +56,6 @@ print(output)
 `multi_label=False` forces the model to decide on only one class. `multi_label=True` enables the model to choose multiple classes.
-### Details on data and training
-Reproduction code is available here, in the `v2_synthetic_data` directory: https://github.com/MoritzLaurer/zeroshot-classifier/tree/main
 ## Metrics
@@ -69,10 +63,6 @@ The model was evaluated on 28 different text classification tasks with the [bala
 The main reference point is `facebook/bart-large-mnli` which is at the time of writing (27.03.24) the most used commercially-friendly 0-shot classifier.
 The different `...zeroshot-v2.0` models were all trained with the same data and the only difference is the underlying foundation model.
-Note that my `...zeroshot-v1.1` models (e.g. [deberta-v3-base-zeroshot-v1.1-all-33](https://huggingface.co/MoritzLaurer/deberta-v3-base-zeroshot-v1.1-all-33))
- perform better on these 28 datasets, but they are trained on several datasets with non-commercial licenses.
-For commercial users, I therefore recommend using the v2.0 model and non-commercial users might get better performance with the v1.1 models.
 ![results_aggreg_v2.0](https://raw.githubusercontent.com/MoritzLaurer/zeroshot-classifier/e859471dd183ad44b705c047130433301386aab8/v2_synthetic_data/results/zeroshot-v2.0-aggreg.png)
 |                            |   facebook/bart-large-mnli |   roberta-base-zeroshot-v2.0 |   roberta-large-zeroshot-v2.0 |   deberta-v3-base-zeroshot-v2.0 |   deberta-v3-large-zeroshot-v2.0 |
@@ -109,6 +99,21 @@ For commercial users, I therefore recommend using the v2.0 model and non-commerc
 ## Limitations and bias
 The model can only do text classification tasks.
@@ -149,7 +154,8 @@ If you have questions or ideas for cooperation, contact me at moritz{at}huggingf
 ### Flexible usage and "prompting"
-You can formulate your own hypotheses by changing the `hypothesis_template` of the zeroshot pipeline. For example:
 ```python
 from transformers import pipeline
 # ...
 ```

 (`entailment` vs. `not_entailment`).
 This task format is based on the Natural Language Inference task (NLI).
 The task is so universal that any classification task can be reformulated into this task.
+Note that compared to other NLI models, this model predicts two classes (`entailment` vs. `not_entailment`)
+as opposed to three classes (entailment/neutral/contradiction).
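To make the reformulation concrete, here is a minimal sketch of how one classification input can be turned into premise/hypothesis pairs, one per candidate class. It follows the same idea as the `hypothesis_template` argument of the zero-shot pipeline, but the helper `build_nli_pairs` and its default template are hypothetical illustrations, not code from this repository.

```python
from typing import List, Tuple


def build_nli_pairs(
    text: str, classes: List[str], hypothesis_template: str = "This text is about {}"
) -> List[Tuple[str, str]]:
    """Turn one classification input into (premise, hypothesis) pairs.

    The text is the premise; each candidate class is inserted into the
    template to form one hypothesis. The model then judges every pair as
    `entailment` or `not_entailment`.
    """
    # A template without a placeholder would produce identical hypotheses.
    if hypothesis_template.format("x") == hypothesis_template:
        raise ValueError("hypothesis_template must contain a '{}' placeholder")
    return [(text, hypothesis_template.format(label)) for label in classes]


pairs = build_nli_pairs(
    "The central bank raised interest rates again this quarter.",
    ["politics", "economy", "entertainment"],
)
for premise, hypothesis in pairs:
    print(hypothesis)
# This text is about politics
# This text is about economy
# This text is about entertainment
```

The class whose hypothesis receives the highest `entailment` score becomes the prediction.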
 ## Training data
 I then used this as seed data to generate several hundred thousand texts for the different tasks with Mixtral-8x7B-Instruct-v0.1.
 The final dataset used is available in the [synthetic_zeroshot_mixtral_v0.1](https://huggingface.co/datasets/MoritzLaurer/synthetic_zeroshot_mixtral_v0.1) dataset
 in the subset `mixtral_written_text_for_tasks_v4`. Data curation was done in multiple iterations and I will release more information on this process soon.
+2. Two commercially-friendly NLI datasets: ([MNLI](https://huggingface.co/datasets/nyu-mll/multi_nli), [FEVER-NLI](https://huggingface.co/datasets/fever)).
 These datasets were added to increase generalization. Datasets like ANLI were excluded due to their non-commercial license.
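Because this model predicts two classes, three-class NLI data such as MNLI has to be collapsed to `entailment` vs. `not_entailment` before it can be used for training. The sketch below assumes the label convention of the `nyu-mll/multi_nli` dataset on the Hub (0 = entailment, 1 = neutral, 2 = contradiction) and assumes the binary ids 0 = `entailment`, 1 = `not_entailment`; check `model.config.label2id` for the actual mapping. It illustrates the idea and is not necessarily the exact preprocessing used for this model.

```python
from typing import Dict, List

# Assumed 3-class convention (as in the `nyu-mll/multi_nli` dataset).
THREE_CLASS_NAMES = {0: "entailment", 1: "neutral", 2: "contradiction"}

# Assumed binary convention: 0 = entailment, 1 = not_entailment.
BINARY_LABEL2ID = {"entailment": 0, "not_entailment": 1}


def to_binary_label(three_class_label: int) -> int:
    """Merge neutral and contradiction into not_entailment."""
    name = THREE_CLASS_NAMES[three_class_label]
    return BINARY_LABEL2ID["entailment" if name == "entailment" else "not_entailment"]


def binarize(examples: List[Dict]) -> List[Dict]:
    """Return copies of the examples with two-class labels.

    Rows without a known gold label (e.g. -1) are dropped.
    """
    return [
        {**ex, "label": to_binary_label(ex["label"])}
        for ex in examples
        if ex["label"] in THREE_CLASS_NAMES
    ]


mnli_like = [
    {"premise": "A man is playing a guitar.", "hypothesis": "A man is making music.", "label": 0},
    {"premise": "A man is playing a guitar.", "hypothesis": "The man is famous.", "label": 1},
    {"premise": "A man is playing a guitar.", "hypothesis": "Nobody is playing music.", "label": 2},
]
print([ex["label"] for ex in binarize(mnli_like)])  # [0, 1, 1]
```

With 🤗 Datasets, the same `to_binary_label` function could be applied through `Dataset.map`.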
 The model was only trained on English data. I will release a multilingual version of this model soon.
 For __multilingual use-cases__,
 I alternatively recommend machine translating texts to English with libraries like [EasyNMT](https://github.com/UKPLab/EasyNMT).
+## How to use the model
 ```python
 #!pip install transformers[sentencepiece]
 from transformers import pipeline
 # ...
 ```
 `multi_label=False` forces the model to decide on only one class. `multi_label=True` enables the model to choose multiple classes.
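The difference between the two modes comes from how the per-class logits are normalized. The NumPy sketch below follows the approach of the `transformers` zero-shot pipeline's post-processing for a two-class model, assuming column 0 holds the `entailment` logit and column 1 the `not_entailment` logit (check `model.config.label2id` for your checkpoint). The logit values are made up for illustration.

```python
import numpy as np


def zero_shot_scores(logits: np.ndarray, multi_label: bool, entailment_id: int = 0) -> np.ndarray:
    """Turn per-class NLI logits of shape (n_classes, 2) into class scores."""
    if multi_label:
        # Each class is judged on its own: softmax over [not_entailment, entailment]
        # per row, keeping the entailment probability. Scores need not sum to 1.
        not_entailment_id = 1 - entailment_id
        pair = logits[:, [not_entailment_id, entailment_id]]
        exp = np.exp(pair - pair.max(axis=1, keepdims=True))
        return exp[:, 1] / exp.sum(axis=1)
    # Single label: softmax of the entailment logits across all classes,
    # so the classes compete and the scores sum to 1.
    entail = logits[:, entailment_id]
    exp = np.exp(entail - entail.max())
    return exp / exp.sum()


classes = ["politics", "economy", "entertainment"]
# Hypothetical logits, one row per (text, hypothesis) pair: [entailment, not_entailment]
logits = np.array([[3.0, -2.0], [1.0, 0.5], [-2.0, 3.0]])

for multi_label in (False, True):
    scores = zero_shot_scores(logits, multi_label)
    ranked = sorted(
        ((c, round(float(s), 3)) for c, s in zip(classes, scores)),
        key=lambda kv: kv[1],
        reverse=True,
    )
    print(f"multi_label={multi_label}: {ranked}")
```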
 ## Metrics
 The main reference point is `facebook/bart-large-mnli` which is at the time of writing (27.03.24) the most used commercially-friendly 0-shot classifier.
 The different `...zeroshot-v2.0` models were all trained with the same data and the only difference is the underlying foundation model.
 ![results_aggreg_v2.0](https://raw.githubusercontent.com/MoritzLaurer/zeroshot-classifier/e859471dd183ad44b705c047130433301386aab8/v2_synthetic_data/results/zeroshot-v2.0-aggreg.png)
 |                            |   facebook/bart-large-mnli |   roberta-base-zeroshot-v2.0 |   roberta-large-zeroshot-v2.0 |   deberta-v3-base-zeroshot-v2.0 |   deberta-v3-large-zeroshot-v2.0 |
+## When to use which model
+- deberta-v3 vs. roberta: deberta-v3 performs clearly better than roberta, but it is slower.
+roberta is directly compatible with Hugging Face's production inference TEI (Text Embeddings Inference) containers and flash attention.
+These containers are a good choice for production use-cases. tl;dr: For accuracy, use a deberta-v3 model.
+If production inference speed is a concern, you can consider a roberta model (e.g. in a TEI container and [HF Inference Endpoints](https://ui.endpoints.huggingface.co/catalog)).
+- `zeroshot-v1.1` vs. `zeroshot-v2.0` models: My `zeroshot-v1.1` models (see [Zeroshot Classifier Collection](https://huggingface.co/collections/MoritzLaurer/zeroshot-classifiers-6548b4ff407bb19ff5c3ad6f))
+perform better on these 28 datasets, but they are trained on several datasets with non-commercial licenses.
+For commercial users, I therefore recommend using a v2.0 model and non-commercial users might get better performance with a v1.1 model.
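Whether the speed gap between deberta-v3 and roberta matters depends on your hardware and text lengths, so it can be worth timing candidates on a sample of your own data. The sketch below assumes each candidate is a callable with the call signature of a `transformers` zero-shot-classification pipeline (text, candidate labels, `hypothesis_template=`, `multi_label=`). `time_classifiers` is a hypothetical helper, and the model IDs in the commented usage are assumptions to verify on the Hub.

```python
import time
from typing import Callable, Dict, List


def time_classifiers(
    classifiers: Dict[str, Callable],
    texts: List[str],
    classes: List[str],
    hypothesis_template: str = "This text is about {}",
    warmup: int = 1,
) -> Dict[str, float]:
    """Return the mean wall-clock seconds per text for each named classifier."""
    if not texts:
        raise ValueError("texts must not be empty")
    results = {}
    for name, classify in classifiers.items():
        # Warm-up calls so one-off costs (lazy initialization, GPU kernels) don't skew timing.
        for text in texts[:warmup]:
            classify(text, classes, hypothesis_template=hypothesis_template, multi_label=False)
        start = time.perf_counter()
        for text in texts:
            classify(text, classes, hypothesis_template=hypothesis_template, multi_label=False)
        results[name] = (time.perf_counter() - start) / len(texts)
    return results


# Example usage (downloads models; model IDs are assumptions):
# from transformers import pipeline
# candidates = {
#     "deberta-v3-base": pipeline("zero-shot-classification", model="MoritzLaurer/deberta-v3-base-zeroshot-v2.0"),
#     "roberta-base": pipeline("zero-shot-classification", model="MoritzLaurer/roberta-base-zeroshot-v2.0"),
# }
# print(time_classifiers(candidates, my_texts, ["politics", "economy"]))
```

Timing per text keeps the comparison simple; for batched production workloads, measuring throughput on batches would be more representative.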
+## Reproduction
+Reproduction code is available here, in the `v2_synthetic_data` directory: https://github.com/MoritzLaurer/zeroshot-classifier/tree/main
 ## Limitations and bias
 The model can only do text classification tasks.
 ### Flexible usage and "prompting"
+You can formulate your own hypotheses by changing the `hypothesis_template` of the zeroshot pipeline.
+Similar to "prompt engineering" for LLMs, you can test different formulations of your `hypothesis_template` and verbalized classes to improve performance.
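One way to test formulations systematically is to score each candidate template on a small labeled sample. The sketch below assumes `classify` behaves like a `transformers` zero-shot-classification pipeline, returning a dict whose `labels` list is sorted by descending score. `rank_templates`, the example templates, and the model ID in the commented usage are hypothetical.

```python
from typing import Callable, List, Tuple


def rank_templates(
    classify: Callable,
    templates: List[str],
    texts: List[str],
    gold: List[str],
    classes: List[str],
) -> List[Tuple[str, float]]:
    """Return (template, accuracy) pairs, best template first."""
    if not texts or len(texts) != len(gold):
        raise ValueError("texts and gold must be non-empty and of equal length")
    results = []
    for template in templates:
        correct = 0
        for text, label in zip(texts, gold):
            output = classify(text, classes, hypothesis_template=template, multi_label=False)
            # Labels come back sorted by score, so index 0 is the prediction.
            correct += output["labels"][0] == label
        results.append((template, correct / len(texts)))
    return sorted(results, key=lambda kv: kv[1], reverse=True)


# Example usage (downloads the model; verify the model ID on the Hub):
# from transformers import pipeline
# clf = pipeline("zero-shot-classification", model="MoritzLaurer/deberta-v3-base-zeroshot-v2.0-c")
# rank_templates(clf, ["This text is about {}", "The topic of this text is {}"],
#                texts=my_texts, gold=my_labels, classes=["politics", "economy"])
```

The same loop can compare verbalized class names by varying `classes` instead of `templates`.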
 ```python
 from transformers import pipeline
 # ...
 ```