deepvk/roberta-base

@@ -4,23 +4,19 @@ language:
 - ru
 - en
 library_name: transformers
 ---
-# RoBERTa-base from deepvk
 <!-- Provide a quick summary of what the model is/does. -->
-Pretrained bidirectional encoder for russian language.
-## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
-Model was pretrained using standard MLM objective on a large text corpora including open social data, books, Wikipedia, webpages etc.
-- **Developed by:** VK Applied Research Team
 - **Model type:** RoBERTa
 - **Languages:** Mostly Russian, with a small fraction of other languages
 - **License:** Apache 2.0
@@ -43,12 +39,10 @@ predictions = model(**inputs)
 ### Training Data
-500gb of raw texts in total. Mix of the following data: Wikipedia, Books, Twitter comments, Pikabu, Proza.ru, Film subtitles,
-News websites, Social corpus.
-### Training Procedure
-#### Training Hyperparameters
 | Argument           | Value                |
 |--------------------|----------------------|
@@ -59,36 +53,35 @@ News websites, Social corpus.
 | Adam eps           | 1e-6                 |
 | Num training steps | 500k                 |
-Model was trained using 8xA100 for ~22 days.
-#### Architecture details
-Standard RoBERTa-base parameters:
 | Argument                | Value          |
 |-------------------------|----------------|
-|Activation function      | gelu           |
-|Attention dropout        | 0.1            |
-|Dropout                  | 0.1            |
 |Encoder attention heads  | 12             |
 |Encoder embed dim        | 768            |
 |Encoder ffn embed dim    | 3,072          |
-|Encoder layers           | 12             |
 |Max positions            | 512            |
 |Vocab size               | 50266          |
 |Tokenizer type           | Byte-level BPE |
 ## Evaluation
-Russian Super Glue dev set.
-Best result across base size models in bold.
 | Model                                                                  | RCB       |  PARus | MuSeRC  | TERRa | RUSSE   | RWSD    | DaNetQA | Result    |
 |------------------------------------------------------------------------|-----------|--------|---------|-------|---------|---------|---------|-----------|
-| [vk-roberta-base](https://huggingface.co/deepvk/roberta-base)          | 0.46      |  0.56  | 0.679   | 0.769 | 0.960   | 0.569   | 0.658   | 0.665     |
 | [vk-deberta-distill](https://huggingface.co/deepvk/deberta-v1-distill) | 0.433     |  0.56  | 0.625   | 0.59  | 0.943   | 0.569   | 0.726   | 0.635     |
 | [vk-deberta-base](https://huggingface.co/deepvk/deberta-v1-base)       | 0.450     |**0.61**|**0.722**| 0.704 | 0.948   | 0.578   |**0.76** |**0.682**  |
 | [vk-bert-base](https://huggingface.co/deepvk/bert-base-uncased)        | 0.467     |  0.57  | 0.587   | 0.704 | 0.953   |**0.583**| 0.737   | 0.657     |
-| [sber-bert-base](https://huggingface.co/ai-forever/ruBert-base)        | **0.491** |**0.61**| 0.663   | 0.769 |**0.962**| 0.574   | 0.678   | 0.678     |
-| [sber-roberta-large](https://huggingface.co/ai-forever/ruRoberta-large)| 0.463     |  0.61  | 0.775   | 0.886 | 0.946   | 0.564   | 0.761   | 0.715     |

 - ru
 - en
 library_name: transformers
+pipeline_tag: fill-mask
 ---
+# RoBERTa-base
 <!-- Provide a quick summary of what the model is/does. -->
+Pretrained bidirectional encoder for the Russian language.
+The model was trained using the standard MLM objective on large text corpora, including open social data.
+See the [`Training Details`](https://huggingface.co/docs/hub/model-cards#training-details) section for more information.
+- **Developed by:** [deepvk](https://vk.com/deepvk)
 - **Model type:** RoBERTa
 - **Languages:** Mostly Russian, with a small fraction of other languages
 - **License:** Apache 2.0
 ### Training Data
+500 GB of raw text in total.
+A mix of the following data: Wikipedia, Books, Twitter comments, Pikabu, Proza.ru, Film subtitles, News websites, and Social corpus.
+### Training Hyperparameters
 | Argument           | Value                |
 |--------------------|----------------------|
 | Adam eps           | 1e-6                 |
 | Num training steps | 500k                 |
+The model was trained on a machine with 8xA100 GPUs for approximately 22 days.
+### Architecture details
 | Argument                | Value          |
 |-------------------------|----------------|
+|Encoder layers           | 12             |
 |Encoder attention heads  | 12             |
 |Encoder embed dim        | 768            |
 |Encoder ffn embed dim    | 3,072          |
+|Activation function      | GELU           |
+|Attention dropout        | 0.1            |
+|Dropout                  | 0.1            |
 |Max positions            | 512            |
 |Vocab size               | 50266          |
 |Tokenizer type           | Byte-level BPE |
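
As a sanity check on the table above, the following minimal sketch estimates the encoder's parameter count from the listed values. It assumes a standard BERT-style layout (learned position embeddings, a single token-type embedding, biases on every linear layer, two LayerNorms per layer) and leaves out the MLM head and pooler. The function name `estimate_encoder_params` is hypothetical and is not part of the model's code.

```python
def estimate_encoder_params(vocab_size, embed_dim, ffn_dim, num_layers, max_positions):
    """Rough parameter count for a BERT-style encoder (no LM head, no pooler)."""
    # Embeddings: token table, learned positions, one token-type vector,
    # plus the embedding LayerNorm (scale and shift).
    embeddings = (vocab_size + max_positions + 1) * embed_dim + 2 * embed_dim
    # Self-attention: Q, K, V and output projections, each with a bias.
    attention = 4 * (embed_dim * embed_dim + embed_dim)
    # Feed-forward block: two linear layers with biases.
    feed_forward = (embed_dim * ffn_dim + ffn_dim) + (ffn_dim * embed_dim + embed_dim)
    # Two LayerNorms per layer, each with scale and shift vectors.
    layer_norms = 2 * (2 * embed_dim)
    per_layer = attention + feed_forward + layer_norms
    return embeddings + num_layers * per_layer


# Values taken from the architecture table.
total = estimate_encoder_params(
    vocab_size=50266, embed_dim=768, ffn_dim=3072, num_layers=12, max_positions=512
)
print(f"{total / 1e6:.1f}M parameters")
```

Under these assumptions the estimate comes to about 124 million parameters, in line with the usual size of a RoBERTa-base encoder. The exact number reported by a loaded checkpoint may differ slightly, for example if the implementation reserves extra position slots or includes a pooler.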
 ## Evaluation
+We evaluated the model on the [Russian SuperGLUE](https://russiansuperglue.com/) dev set.
+The best result in each task is marked in bold.
+All models have the same size except the distilled version of DeBERTa.
 | Model                                                                  | RCB       |  PARus | MuSeRC  | TERRa | RUSSE   | RWSD    | DaNetQA | Result    |
 |------------------------------------------------------------------------|-----------|--------|---------|-------|---------|---------|---------|-----------|
 | [vk-deberta-distill](https://huggingface.co/deepvk/deberta-v1-distill) | 0.433     |  0.56  | 0.625   | 0.59  | 0.943   | 0.569   | 0.726   | 0.635     |
+|                                                                        |           |        |         |       |         |         |         |           |
+| [vk-roberta-base](https://huggingface.co/deepvk/roberta-base)          | 0.46      |  0.56  | 0.679   | 0.769 | 0.960   | 0.569   | 0.658   | 0.665     |
 | [vk-deberta-base](https://huggingface.co/deepvk/deberta-v1-base)       | 0.450     |**0.61**|**0.722**| 0.704 | 0.948   | 0.578   |**0.76** |**0.682**  |
 | [vk-bert-base](https://huggingface.co/deepvk/bert-base-uncased)        | 0.467     |  0.57  | 0.587   | 0.704 | 0.953   |**0.583**| 0.737   | 0.657     |
+| [sber-bert-base](https://huggingface.co/ai-forever/ruBert-base)        | **0.491** |**0.61**| 0.663   | 0.769 |**0.962**| 0.574   | 0.678   | 0.678     |
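
The card does not say how the final column is computed. The values in the table are consistent with an unweighted mean of the seven task scores, and the sketch below checks that assumption against the reported numbers. The dictionary layout and the helper name `mean_score` are illustrative only.

```python
# Per-task dev scores copied from the table, in column order:
# RCB, PARus, MuSeRC, TERRa, RUSSE, RWSD, DaNetQA.
scores = {
    "vk-deberta-distill": [0.433, 0.56, 0.625, 0.59, 0.943, 0.569, 0.726],
    "vk-roberta-base": [0.46, 0.56, 0.679, 0.769, 0.960, 0.569, 0.658],
    "vk-deberta-base": [0.450, 0.61, 0.722, 0.704, 0.948, 0.578, 0.76],
    "vk-bert-base": [0.467, 0.57, 0.587, 0.704, 0.953, 0.583, 0.737],
    "sber-bert-base": [0.491, 0.61, 0.663, 0.769, 0.962, 0.574, 0.678],
}

# Final-column values as reported in the table.
reported = {
    "vk-deberta-distill": 0.635,
    "vk-roberta-base": 0.665,
    "vk-deberta-base": 0.682,
    "vk-bert-base": 0.657,
    "sber-bert-base": 0.678,
}


def mean_score(task_scores):
    """Unweighted mean over tasks (assumed aggregation, not confirmed by the card)."""
    return sum(task_scores) / len(task_scores)


for name, task_scores in scores.items():
    print(f"{name}: computed {mean_score(task_scores):.3f}, reported {reported[name]:.3f}")
```

If the computed and reported values agree to three decimals for every row, the final column can be read as a plain average, so a model that leads on a single task can still trail overall.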