Update README.md
add optimum + onnx
README.md
CHANGED
@@ -17,14 +17,14 @@ datasets:
|
|
17 |
DistilCamemBERT-NLI
|
18 |
===================
|
19 |
|
20 |
-
We present DistilCamemBERT-NLI which is [DistilCamemBERT](https://huggingface.co/cmarkea/distilcamembert-base) fine-tuned for the Natural Language Inference (NLI) task for the french language, also known as recognizing textual entailment (RTE). This model is constructed on the XNLI dataset which
|
21 |
|
22 |
-
This modelization is close to [BaptisteDoyen/camembert-base-xnli](https://huggingface.co/BaptisteDoyen/camembert-base-xnli) based on [CamemBERT](https://huggingface.co/camembert-base) model. The problem of the modelizations based on CamemBERT is at the scaling moment, for the production phase for example. Indeed, inference cost can be a technological issue especially
|
23 |
|
24 |
Dataset
|
25 |
-------
|
26 |
|
27 |
-
The dataset XNLI from [FLUE](https://huggingface.co/datasets/flue)
|
28 |
$$P(premise=c\in\{contradiction, entailment, neutral\}\vert hypothesis)$$
|
29 |
|
30 |
Evaluation results
|
@@ -40,7 +40,7 @@ Evaluation results
|
|
40 |
Benchmark
|
41 |
---------
|
42 |
|
43 |
-
We compare the [DistilCamemBERT](https://huggingface.co/cmarkea/distilcamembert-base) model to 2 other modelizations working on french language. The first one [BaptisteDoyen/camembert-base-xnli](https://huggingface.co/BaptisteDoyen/camembert-base-xnli) is based on well named [CamemBERT](https://huggingface.co/camembert-base), the french RoBERTa model and the second one [MoritzLaurer/mDeBERTa-v3-base-mnli-xnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) based on [mDeBERTav3](https://huggingface.co/microsoft/mdeberta-v3-base) a multilingual model. To compare the performances the metrics of accuracy and [MCC (Matthews Correlation Coefficient)](https://en.wikipedia.org/wiki/Phi_coefficient)
|
44 |
|
45 |
| **model** | **time (ms)** | **accuracy (%)** | **MCC (x100)** |
|
46 |
| :--------------: | :-----------: | :--------------: | :------------: |
|
@@ -54,7 +54,7 @@ Zero-shot classification
|
|
54 |
The main advantage of such a model is that it can serve as a zero-shot classifier, allowing text classification without training. This task can be summarized by:
|
55 |
$$P(hypothesis=i\in\mathcal{C}|premise)=\frac{e^{P(premise=entailment\vert hypothesis=i)}}{\sum_{j\in\mathcal{C}}e^{P(premise=entailment\vert hypothesis=j)}}$$
|
56 |
|
57 |
-
For this part, we use
|
58 |
|
59 |
| **model** | **time (ms)** | **accuracy (%)** | **MCC (x100)** |
|
60 |
| :--------------: | :-----------: | :--------------: | :------------: |
|
@@ -62,7 +62,7 @@ For this part, we use 2 datasets, the first one: [allocine](https://huggingface.
|
|
62 |
| [BaptisteDoyen/camembert-base-xnli](https://huggingface.co/BaptisteDoyen/camembert-base-xnli) | 378.39 | **86.37** | **73.74** |
|
63 |
| [MoritzLaurer/mDeBERTa-v3-base-mnli-xnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 520.58 | 84.97 | 70.05 |
|
64 |
|
65 |
-
The second one: [mlsum](https://huggingface.co/datasets/mlsum) used to train the summarization models.
|
66 |
|
67 |
| **model** | **time (ms)** | **accuracy (%)** | **MCC (x100)** |
|
68 |
| :--------------: | :-----------: | :--------------: | :------------: |
|
@@ -103,6 +103,24 @@ result
|
|
103 |
0.0455702543258667]}
|
104 |
```
|
105 |
|
106 |
Citation
|
107 |
--------
|
108 |
```bibtex
|
|
|
17 |
DistilCamemBERT-NLI
|
18 |
===================
|
19 |
|
20 |
+
We present DistilCamemBERT-NLI, which is [DistilCamemBERT](https://huggingface.co/cmarkea/distilcamembert-base) fine-tuned for the Natural Language Inference (NLI) task, also known as recognizing textual entailment (RTE), for the French language. This model is trained on the XNLI dataset, in which the task is to determine whether a premise entails, contradicts, or neither entails nor contradicts a hypothesis.
|
21 |
|
22 |
+
This model is close to [BaptisteDoyen/camembert-base-xnli](https://huggingface.co/BaptisteDoyen/camembert-base-xnli), which is based on the [CamemBERT](https://huggingface.co/camembert-base) model. The problem with CamemBERT-based models arises when scaling, for example in the production phase. Indeed, inference cost can be a technological issue, especially in a cross-encoding setting like this task. To counteract this effect, we propose this model, which, thanks to DistilCamemBERT, divides the inference time by 2 for the same power consumption.
|
23 |
|
24 |
Dataset
|
25 |
-------
|
26 |
|
27 |
+
The XNLI dataset from [FLUE](https://huggingface.co/datasets/flue) comprises 392,702 premise-hypothesis pairs for training and 5,010 pairs for testing. The goal is to predict textual entailment (does sentence A imply, contradict, or neither imply nor contradict sentence B?), which is a classification task (given two sentences, predict one of three labels). Sentence A is called the *premise* and sentence B the *hypothesis*; the modeling goal is then defined as follows:
|
28 |
$$P(premise=c\in\{contradiction, entailment, neutral\}\vert hypothesis)$$
|
29 |
|
30 |
Evaluation results
|
|
|
40 |
Benchmark
|
41 |
---------
|
42 |
|
43 |
+
We compare the [DistilCamemBERT](https://huggingface.co/cmarkea/distilcamembert-base) model to two other models for the French language. The first one, [BaptisteDoyen/camembert-base-xnli](https://huggingface.co/BaptisteDoyen/camembert-base-xnli), is based on the aptly named [CamemBERT](https://huggingface.co/camembert-base), the French RoBERTa model, and the second one, [MoritzLaurer/mDeBERTa-v3-base-mnli-xnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli), is based on [mDeBERTav3](https://huggingface.co/microsoft/mdeberta-v3-base), a multilingual model. To compare performance, we use accuracy and [MCC (Matthews Correlation Coefficient)](https://en.wikipedia.org/wiki/Phi_coefficient). Mean inference time was measured on an **AMD Ryzen 5 4500U @ 2.3GHz with 6 cores**.
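As a rough illustration of the two metrics named above, the sketch below computes accuracy and a multi-class MCC in plain Python. The function names and the toy labels are hypothetical and are not taken from the benchmark itself; the MCC uses the standard multi-class generalization, which we assume matches what was reported.

```python
import math
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def multiclass_mcc(y_true, y_pred):
    """Multi-class Matthews correlation coefficient (hypothetical helper)."""
    n_samples = len(y_true)
    n_correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_counts = Counter(y_true)   # how often each label really occurs
    pred_counts = Counter(y_pred)   # how often each label is predicted
    labels = set(true_counts) | set(pred_counts)

    numerator = n_correct * n_samples - sum(
        true_counts[k] * pred_counts[k] for k in labels
    )
    true_term = n_samples**2 - sum(v * v for v in true_counts.values())
    pred_term = n_samples**2 - sum(v * v for v in pred_counts.values())
    denominator = math.sqrt(true_term * pred_term)
    # Convention: return 0.0 when the coefficient is undefined.
    return numerator / denominator if denominator else 0.0


# Toy labels, made up for illustration only.
y_true = ["entailment", "neutral", "contradiction",
          "entailment", "neutral", "contradiction"]
y_pred = ["entailment", "neutral", "contradiction",
          "neutral", "neutral", "contradiction"]

acc = accuracy(y_true, y_pred)
mcc = multiclass_mcc(y_true, y_pred)
```

Multiplying `acc` and `mcc` by 100 gives values on the same scale as the "accuracy (%)" and "MCC (x100)" columns.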
|
44 |
|
45 |
| **model** | **time (ms)** | **accuracy (%)** | **MCC (x100)** |
|
46 |
| :--------------: | :-----------: | :--------------: | :------------: |
|
|
|
54 |
The main advantage of such a model is that it can serve as a zero-shot classifier, allowing text classification without training. This task can be summarized by:
|
55 |
$$P(hypothesis=i\in\mathcal{C}|premise)=\frac{e^{P(premise=entailment\vert hypothesis=i)}}{\sum_{j\in\mathcal{C}}e^{P(premise=entailment\vert hypothesis=j)}}$$
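The formula above can be sketched in a few lines of Python: one entailment score per candidate label is normalized with a softmax over the labels. The function name and the score values are hypothetical; in practice the scores would come from the NLI model, one forward pass per candidate hypothesis.

```python
import math


def zero_shot_distribution(entailment_scores):
    """Softmax over per-label entailment scores (hypothetical helper)."""
    # Subtracting the maximum improves numerical stability and leaves
    # the resulting probabilities unchanged.
    top = max(entailment_scores.values())
    exps = {
        label: math.exp(score - top)
        for label, score in entailment_scores.items()
    }
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


# Made-up entailment scores for a single premise, for illustration only.
scores = {"économie": 0.10, "politique": 0.05, "sport": 0.80, "science": 0.05}
probs = zero_shot_distribution(scores)
best = max(probs, key=probs.get)
```

The label with the highest entailment score always receives the highest probability, so `best` is the predicted class.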
|
56 |
|
57 |
+
For this part, we use two datasets. The first one is [allocine](https://huggingface.co/datasets/allocine), which is used to train sentiment analysis models. The dataset comprises two classes of movie reviews: "positif" and "négatif". Here we use "Ce commentaire est {}." as the hypothesis template and "positif" and "négatif" as candidate labels.
|
58 |
|
59 |
| **model** | **time (ms)** | **accuracy (%)** | **MCC (x100)** |
|
60 |
| :--------------: | :-----------: | :--------------: | :------------: |
|
|
|
62 |
| [BaptisteDoyen/camembert-base-xnli](https://huggingface.co/BaptisteDoyen/camembert-base-xnli) | 378.39 | **86.37** | **73.74** |
|
63 |
| [MoritzLaurer/mDeBERTa-v3-base-mnli-xnli](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 520.58 | 84.97 | 70.05 |
|
64 |
|
65 |
+
The second one is [mlsum](https://huggingface.co/datasets/mlsum), which is used to train summarization models. For this purpose, we aggregate sub-topics and select a few of them. We use the summary part of the articles to predict their topics. In this case, the hypothesis template is "C'est un article traitant de {}." and the candidate labels are "économie", "politique", "sport", and "science".
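To make the role of the hypothesis template concrete, here is a minimal sketch of how each candidate label is turned into a hypothesis and paired with the text to classify. The template and labels are the ones quoted above; the helper name and the example premise are hypothetical.

```python
TEMPLATE = "C'est un article traitant de {}."
LABELS = ["économie", "politique", "sport", "science"]


def build_pairs(premise, labels, template):
    """Return one (premise, hypothesis) pair per candidate label."""
    return [(premise, template.format(label)) for label in labels]


# Made-up article summary, for illustration only.
premise = "Le club a remporté le championnat dimanche."
pairs = build_pairs(premise, LABELS, TEMPLATE)
```

Each pair is then scored by the NLI model, and the entailment scores are compared across labels.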
|
66 |
|
67 |
| **model** | **time (ms)** | **accuracy (%)** | **MCC (x100)** |
|
68 |
| :--------------: | :-----------: | :--------------: | :------------: |
|
|
|
103 |
0.0455702543258667]}
|
104 |
```
|
105 |
|
### Optimum + ONNX

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

HUB_MODEL = "cmarkea/distilcamembert-base-nli"

# Load the tokenizer and the ONNX model
tokenizer = AutoTokenizer.from_pretrained(HUB_MODEL)
model = ORTModelForSequenceClassification.from_pretrained(HUB_MODEL)
# Zero-shot classification pipeline running on ONNX Runtime
onnx_qa = pipeline("zero-shot-classification", model=model, tokenizer=tokenizer)

# Quantized onnx model
quantized_model = ORTModelForSequenceClassification.from_pretrained(
    HUB_MODEL, file_name="model_quantized.onnx"
)
```

124 |
Citation
|
125 |
--------
|
126 |
```bibtex
|