seara committed 2e0e8ec (parent 023cc1c)

Update README.md

Files changed (1): README.md (+55 −1)
datasets:
- RuSentiment
- LinisCrowd2015
- LinisCrowd2016
- KaggleRussianNews
---

This is a [RuBERT](https://huggingface.co/DeepPavlov/rubert-base-cased) model fine-tuned for __sentiment classification__ of short __Russian__ texts.
The task is __multi-class classification__ with the following label mapping: 0 -> neutral, 1 -> positive, 2 -> negative.
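The index-to-label mapping can be sketched as follows. This is not the model's actual inference path (the `pipeline` below handles tokenization and decoding internally); the logits here are illustrative values, not real model outputs.

```python
import math

# Label order used by this model: index 0 -> neutral, 1 -> positive, 2 -> negative
LABELS = ["neutral", "positive", "negative"]

def softmax(logits):
    """Convert raw classifier logits into probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode(logits):
    """Return (label, score) for the highest-probability class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]
```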

## Usage

```python
from transformers import pipeline

model = pipeline(model="seara/rubert-base-cased-russian-sentiment")
model("Привет, ты мне нравишься!")  # "Hi, I like you!"
# [{'label': 'positive', 'score': 0.9818321466445923}]
```

## Dataset

This model was trained on the union of the following datasets:

- Kaggle Russian News Dataset
- Linis Crowd 2015
- Linis Crowd 2016
- RuReviews
- RuSentiment

An overview of the training data can be found in the [article by S. Smetanin](https://www.sciencedirect.com/science/article/abs/pii/S0306457320309730), and a summary of that overview is available in his [GitHub repository](https://github.com/sismetanin/sentiment-analysis-in-russian).

__Download links for all Russian sentiment datasets collected by Smetanin can be found in this [GitHub repository](https://github.com/searayeah/russian-sentiment-emotions-datasets).__

## Training

Training was done in this [project](https://github.com/searayeah/vkr-bert) with the following parameters:

```yaml
tokenizer.max_length: 256
batch_size: 32
optimizer: adam
lr: 0.00001
weight_decay: 0
epochs: 2
```

Train/validation/test splits are 80%/10%/10%.
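As a rough sanity check (assuming the split is by example count), the test support of 12,626 reported in the metrics table corresponds to the 10% test share, which implies a combined dataset of roughly 126k examples:

```python
def split_sizes(n_total, fractions=(0.8, 0.1, 0.1)):
    """Compute train/validation/test sizes for an 80/10/10 split."""
    sizes = [int(n_total * f) for f in fractions]
    sizes[0] += n_total - sum(sizes)  # give any rounding remainder to train
    return sizes

# 12,626 test examples at 10% implies about 126,260 examples in total.
n_total = 126_260
train, val, test = split_sizes(n_total)
```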

## Metrics (on test split)

|         |neutral|positive|negative|macro avg|weighted avg|
|---------|-------|--------|--------|---------|------------|
|precision|0.71   |0.84    |0.75    |0.77     |0.76        |
|recall   |0.74   |0.84    |0.71    |0.76     |0.76        |
|f1-score |0.73   |0.84    |0.73    |0.76     |0.76        |
|auc-roc  |0.86   |0.95    |0.91    |0.91     |0.90        |
|support  |5196   |3831    |3599    |12626    |12626       |
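The macro average is the unweighted mean over the three classes, while the weighted average weights each class by its support. A quick check against the precision row of the table above:

```python
precision = {"neutral": 0.71, "positive": 0.84, "negative": 0.75}
support = {"neutral": 5196, "positive": 3831, "negative": 3599}

# Macro average: unweighted mean over the three classes.
macro = sum(precision.values()) / len(precision)

# Weighted average: each class weighted by its share of the test examples.
total = sum(support.values())
weighted = sum(precision[c] * support[c] for c in precision) / total

print(round(macro, 2), round(weighted, 2))  # 0.77 0.76
```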