Update README.md
README.md (CHANGED)

Model was pretrained using the standard MLM objective on large text corpora including …

## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("deepvk/roberta-base")

# The middle of the original snippet is not visible in this diff; loading the model
# and building an example input below is an assumed, illustrative completion.
model = AutoModel.from_pretrained("deepvk/roberta-base")
inputs = tokenizer("Привет, мир!", return_tensors="pt")  # "Hello, world!" in Russian
predictions = model(**inputs)
```
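
If the goal is feature extraction, the contextual token embeddings are available as `predictions.last_hidden_state`, a tensor of shape `[batch_size, sequence_length, 768]`.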

### Training Data

500 GB of raw texts in total, a mix of the following data: Wikipedia, books, Twitter comments, Pikabu, Proza.ru, film subtitles, news websites, and a social corpus.

### Training Procedure

#### Training Hyperparameters

| Argument           | Value                |
|--------------------|----------------------|
| Training regime    | fp16 mixed precision |
| Training framework | Fairseq              |
| Optimizer          | Adam                 |
| Adam betas         | 0.9, 0.98            |
| Adam eps           | 1e-6                 |
| Num training steps | 500k                 |

The model was trained on 8×A100 GPUs for ~22 days.
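
For readers who want to mirror these settings outside Fairseq, the optimizer rows translate roughly into the following plain-PyTorch setup. This is an illustrative sketch only, not the original training script: `model` is a stand-in module, and the learning rate and schedule (not listed in the table) are left at defaults.

```python
import torch

# Illustrative sketch only: the actual pretraining ran inside Fairseq, not this code.
model = torch.nn.Linear(768, 768)  # stand-in for the RoBERTa encoder being trained

# Adam with the betas/eps from the hyperparameter table; the peak learning rate and
# schedule are not given in the model card, so PyTorch defaults are used here.
optimizer = torch.optim.Adam(model.parameters(), betas=(0.9, 0.98), eps=1e-6)

# "fp16 mixed precision" row: in plain PyTorch this is typically handled via AMP.
scaler = torch.cuda.amp.GradScaler(enabled=torch.cuda.is_available())
```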

Standard RoBERTa-base parameters:

| Argument                | Value  |
|-------------------------|--------|
| Activation function     | gelu   |
| Attention dropout       | 0.1    |
| Dropout                 | 0.1    |
| Encoder attention heads | 12     |
| Encoder embed dim       | 768    |
| Encoder FFN embed dim   | 3,072  |
| Encoder layers          | 12     |
| Max positions           | 512    |
| Vocab size              | 50,266 |
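
The same architecture can be expressed as a Hugging Face `RobertaConfig`. This is a minimal sketch assuming the standard `transformers` parameter names; note that `transformers` conventionally uses `max_position_embeddings=514` for a RoBERTa with 512 usable positions because of the padding-offset tokens.

```python
from transformers import RobertaConfig

# Architecture from the table above, written with transformers' RobertaConfig.
config = RobertaConfig(
    vocab_size=50266,                  # Vocab size
    hidden_size=768,                   # Encoder embed dim
    num_hidden_layers=12,              # Encoder layers
    num_attention_heads=12,            # Encoder attention heads
    intermediate_size=3072,            # Encoder ffn embed dim
    hidden_act="gelu",                 # Activation function
    hidden_dropout_prob=0.1,           # Dropout
    attention_probs_dropout_prob=0.1,  # Attention dropout
    max_position_embeddings=514,       # 512 max positions + 2 (transformers convention)
)
```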

## Evaluation

Results on the Russian SuperGLUE dev set. The best result across base-size models is in bold.

| Model | RCB | PARus | MuSeRC | TERRa | RUSSE | RWSD | DaNetQA | Result |
|--------------------------------------------------------------------------|-----------|----------|-----------|-------|-----------|-----------|----------|-----------|
| [vk-roberta-base](https://huggingface.co/deepvk/roberta-base)            | 0.46      | 0.56     | 0.679     | 0.769 | 0.960     | 0.569     | 0.658    | 0.665     |
| [vk-deberta-distill](https://huggingface.co/deepvk/deberta-v1-distill)   | 0.433     | 0.56     | 0.625     | 0.59  | 0.943     | 0.569     | 0.726    | 0.635     |
| [vk-deberta-base](https://huggingface.co/deepvk/deberta-v1-base)         | 0.450     | **0.61** | **0.722** | 0.704 | 0.948     | 0.578     | **0.76** | **0.682** |
| [vk-bert-base](https://huggingface.co/deepvk/bert-base-uncased)          | 0.467     | 0.57     | 0.587     | 0.704 | 0.953     | **0.583** | 0.737    | 0.657     |
| [sber-bert-base](https://huggingface.co/ai-forever/ruBert-base)          | **0.491** | **0.61** | 0.663     | 0.769 | **0.962** | 0.574     | 0.678    | 0.678     |
| [sber-roberta-large](https://huggingface.co/ai-forever/ruRoberta-large)  | 0.463     | 0.61     | 0.775     | 0.886 | 0.946     | 0.564     | 0.761    | 0.715     |