Update README.md
README.md (CHANGED)

Model was pretrained using the standard MLM objective on large text corpora including …

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("deepvk/roberta-base")

# The middle of the original snippet is not visible in this diff; loading the model
# and building an example input below is an assumed, illustrative completion.
model = AutoModel.from_pretrained("deepvk/roberta-base")
inputs = tokenizer("Привет, мир!", return_tensors="pt")  # "Hello, world!" in Russian
predictions = model(**inputs)
```
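
If the goal is feature extraction, the contextual token embeddings are available as `predictions.last_hidden_state`, a tensor of shape `[batch_size, sequence_length, 768]`.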

### Training Data

500 GB of raw texts in total, a mix of the following data: Wikipedia, books, Twitter comments, Pikabu, Proza.ru, film subtitles, news websites, and a social corpus.

### Training Procedure

#### Training Hyperparameters

| Argument           | Value                |
|--------------------|----------------------|
| Training regime    | fp16 mixed precision |
| Training framework | Fairseq              |
| Optimizer          | Adam                 |
| Adam betas         | 0.9, 0.98            |
| Adam eps           | 1e-6                 |
| Num training steps | 500k                 |

The model was trained on 8×A100 GPUs for ~22 days.
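
For readers who want to mirror these settings outside Fairseq, the optimizer rows translate roughly into the following plain-PyTorch setup. This is an illustrative sketch only, not the original training script: `model` is a stand-in module, and the learning rate and schedule (not listed in the table) are left at defaults.

```python
import torch

# Illustrative sketch only: the actual pretraining ran inside Fairseq, not this code.
model = torch.nn.Linear(768, 768)  # stand-in for the RoBERTa encoder being trained

# Adam with the betas/eps from the hyperparameter table; the peak learning rate and
# schedule are not given in the model card, so PyTorch defaults are used here.
optimizer = torch.optim.Adam(model.parameters(), betas=(0.9, 0.98), eps=1e-6)

# "fp16 mixed precision" row: in plain PyTorch this is typically handled via AMP.
scaler = torch.cuda.amp.GradScaler(enabled=torch.cuda.is_available())
```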

Standard RoBERTa-base parameters:

| Argument                | Value  |
|-------------------------|--------|
| Activation function     | gelu   |
| Attention dropout       | 0.1    |
| Dropout                 | 0.1    |
| Encoder attention heads | 12     |
| Encoder embed dim       | 768    |
| Encoder FFN embed dim   | 3,072  |
| Encoder layers          | 12     |
| Max positions           | 512    |
| Vocab size              | 50,266 |
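
The same architecture can be expressed as a Hugging Face `RobertaConfig`. This is a minimal sketch assuming the standard `transformers` parameter names; note that `transformers` conventionally uses `max_position_embeddings=514` for a RoBERTa with 512 usable positions because of the padding-offset tokens.

```python
from transformers import RobertaConfig

# Architecture from the table above, written with transformers' RobertaConfig.
config = RobertaConfig(
    vocab_size=50266,                  # Vocab size
    hidden_size=768,                   # Encoder embed dim
    num_hidden_layers=12,              # Encoder layers
    num_attention_heads=12,            # Encoder attention heads
    intermediate_size=3072,            # Encoder ffn embed dim
    hidden_act="gelu",                 # Activation function
    hidden_dropout_prob=0.1,           # Dropout
    attention_probs_dropout_prob=0.1,  # Attention dropout
    max_position_embeddings=514,       # 512 max positions + 2 (transformers convention)
)
```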

## Evaluation

Results on the Russian SuperGLUE dev set. The best result across base-size models is in bold.

| Model | RCB | PARus | MuSeRC | TERRa | RUSSE | RWSD | DaNetQA | Result |
|--------------------------------------------------------------------------|-----------|----------|-----------|-------|-----------|-----------|----------|-----------|
| [vk-roberta-base](https://huggingface.co/deepvk/roberta-base)            | 0.46      | 0.56     | 0.679     | 0.769 | 0.960     | 0.569     | 0.658    | 0.665     |
| [vk-deberta-distill](https://huggingface.co/deepvk/deberta-v1-distill)   | 0.433     | 0.56     | 0.625     | 0.59  | 0.943     | 0.569     | 0.726    | 0.635     |
| [vk-deberta-base](https://huggingface.co/deepvk/deberta-v1-base)         | 0.450     | **0.61** | **0.722** | 0.704 | 0.948     | 0.578     | **0.76** | **0.682** |
| [vk-bert-base](https://huggingface.co/deepvk/bert-base-uncased)          | 0.467     | 0.57     | 0.587     | 0.704 | 0.953     | **0.583** | 0.737    | 0.657     |
| [sber-bert-base](https://huggingface.co/ai-forever/ruBert-base)          | **0.491** | **0.61** | 0.663     | 0.769 | **0.962** | 0.574     | 0.678    | 0.678     |
| [sber-roberta-large](https://huggingface.co/ai-forever/ruRoberta-large)  | 0.463     | 0.61     | 0.775     | 0.886 | 0.946     | 0.564     | 0.761    | 0.715     |