Update README.md
README.md CHANGED
@@ -45,18 +45,17 @@ It achieves the following results on the evaluation set:
 - Loss: 0.1116
 - Accuracy: 0.9823
 
-## Model description
+## Base Model description
 
-
+This model is a distilled version of the [RoBERTa-base model](https://huggingface.co/roberta-base). It follows the same training procedure as [DistilBERT](https://huggingface.co/distilbert-base-uncased).
+The code for the distillation process can be found [here](https://github.com/huggingface/transformers/tree/master/examples/distillation).
+This model is case-sensitive: it makes a difference between english and English.
 
-
-
-More information needed
+The model has 6 layers, a hidden size of 768 and 12 attention heads, totaling 82M parameters (compared to 125M parameters for RoBERTa-base).
+On average, DistilRoBERTa is twice as fast as RoBERTa-base.
 
 ## Training and evaluation data
 
-More information needed
-
 ## Training procedure
 
 ### Training hyperparameters
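
The added description quotes concrete architecture figures (6 layers, hidden size 768, 12 attention heads, roughly 82M parameters) and notes that the model is case-sensitive. Below is a minimal sketch of how those claims could be checked with the transformers library; the Hub id `distilroberta-base` is an assumption, since this diff does not name the base checkpoint.

```python
# Minimal sanity-check sketch (not part of the card). Assumes the base checkpoint is
# "distilroberta-base" on the Hugging Face Hub; the diff itself does not name it.
from transformers import AutoConfig, AutoModel, AutoTokenizer

model_name = "distilroberta-base"  # assumed Hub id

# Architecture figures quoted in the description: 6 layers, hidden size 768, 12 heads.
config = AutoConfig.from_pretrained(model_name)
print(config.num_hidden_layers, config.hidden_size, config.num_attention_heads)  # 6 768 12

# Parameter count: roughly 82M, versus about 125M for roberta-base.
model = AutoModel.from_pretrained(model_name)
print(f"~{sum(p.numel() for p in model.parameters()) / 1e6:.0f}M parameters")

# Case sensitivity: "english" and "English" tokenize to different input ids.
tokenizer = AutoTokenizer.from_pretrained(model_name)
print(tokenizer("english")["input_ids"], tokenizer("English")["input_ids"])
```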