---
license: mit
language:
- ru
metrics:
- f1
- roc_auc
- precision
- recall
pipeline_tag: text-classification
tags:
- rubert
- sentiment
datasets:
- sismetanin/rureviews
- RuSentiment
- LinisCrowd2015
- LinisCrowd2016
- KaggleRussianNews
---

This is a [RuBERT](https://huggingface.co/DeepPavlov/rubert-base-cased) model fine-tuned for __sentiment classification__ of short __Russian__ texts.

The task is __multi-class classification__ with the following labels:

```yaml
0: neutral
1: positive
2: negative
```

## Usage

```python
from transformers import pipeline

# Load the fine-tuned model from the Hugging Face Hub
model = pipeline(model="seara/rubert-base-cased-russian-sentiment")

# Input text is Russian for "Hi, I like you!"
model("Привет, ты мне нравишься!")
# [{'label': 'positive', 'score': 0.9818321466445923}]
```

## Dataset

This model was trained on the union of the following datasets:

- Kaggle Russian News Dataset
- Linis Crowd 2015
- Linis Crowd 2016
- RuReviews
- RuSentiment

An overview of the training data can be found in [S. Smetanin's GitHub repository](https://github.com/sismetanin/sentiment-analysis-in-russian).

__Download links for all Russian sentiment datasets collected by Smetanin can be found in this [repository](https://github.com/searayeah/russian-sentiment-emotions-datasets).__

## Training

Training was done in this [project](https://github.com/searayeah/vkr-bert) with the following parameters:

```
max_length: 512
batch_size: 64
optimizer: adam
lr: 0.00001
weight_decay: 0
num_epochs: 5
```

Train/validation/test splits are 80%/10%/10%.

## Eval results (on test split)

|         |neutral|positive|negative|macro avg|weighted avg|
|---------|-------|--------|--------|---------|------------|
|precision|0.71   |0.84    |0.75    |0.77     |0.76        |
|recall   |0.74   |0.84    |0.71    |0.76     |0.76        |
|f1-score |0.73   |0.84    |0.73    |0.76     |0.76        |
|auc-roc  |0.86   |0.95    |0.91    |0.91     |0.90        |
|support  |5196   |3831    |3599    |12626    |12626       |
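
The macro and weighted averages in the table above can be approximately reproduced from the per-class columns. The following is a minimal sketch, assuming the usual definitions (macro avg is the unweighted mean over classes, weighted avg is the mean weighted by each class's support). The helper names `macro_avg` and `weighted_avg` are illustrative and do not come from the training project. Because the per-class inputs are already rounded to two decimals, the recomputed values may differ from the table by about 0.01.

```python
# Per-class scores and supports copied from the table above (already rounded).
support = {"neutral": 5196, "positive": 3831, "negative": 3599}
per_class = {
    "precision": {"neutral": 0.71, "positive": 0.84, "negative": 0.75},
    "recall": {"neutral": 0.74, "positive": 0.84, "negative": 0.71},
    "f1-score": {"neutral": 0.73, "positive": 0.84, "negative": 0.73},
}


def macro_avg(scores):
    """Unweighted mean over classes (hypothetical helper)."""
    return sum(scores.values()) / len(scores)


def weighted_avg(scores, support):
    """Mean over classes weighted by the number of test examples per class."""
    total = sum(support.values())
    return sum(scores[label] * support[label] for label in scores) / total


# Map each metric to its (macro, weighted) pair.
averages = {
    metric: (macro_avg(scores), weighted_avg(scores, support))
    for metric, scores in per_class.items()
}

for metric, (macro, weighted) in averages.items():
    print(f"{metric}: macro={macro:.3f} weighted={weighted:.3f}")
```

The weighted average leans toward the neutral class because it has the largest support, which is why it sits slightly below the macro average for precision.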