seara committed 2e0e8ec (parent 023cc1c)

Update README.md

Files changed (1): README.md (+55 −1)
datasets:
- RuSentiment
- LinisCrowd2015
- LinisCrowd2016
- KaggleRussianNews
---

This is a [RuBERT](https://huggingface.co/DeepPavlov/rubert-base-cased) model fine-tuned for __sentiment classification__ of short __Russian__ texts.
The task is __multi-class classification__ with the following label mapping: 0 -> neutral, 1 -> positive, 2 -> negative.
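The index-to-label mapping can be sketched as follows. This is not the model's actual inference path (the `pipeline` below handles tokenization and decoding internally); the logits here are illustrative values, not real model outputs.

```python
import math

# Label order used by this model: index 0 -> neutral, 1 -> positive, 2 -> negative
LABELS = ["neutral", "positive", "negative"]

def softmax(logits):
    """Convert raw classifier logits into probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decode(logits):
    """Return (label, score) for the highest-probability class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]
```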

## Usage

```python
from transformers import pipeline

model = pipeline(model="seara/rubert-base-cased-russian-sentiment")
model("Привет, ты мне нравишься!")  # "Hi, I like you!"
# [{'label': 'positive', 'score': 0.9818321466445923}]
```

## Dataset

This model was trained on the union of the following datasets:

- Kaggle Russian News Dataset
- Linis Crowd 2015
- Linis Crowd 2016
- RuReviews
- RuSentiment

An overview of the training data can be found in the [article by S. Smetanin](https://www.sciencedirect.com/science/article/abs/pii/S0306457320309730), and a summary of that overview is available in his [GitHub repository](https://github.com/sismetanin/sentiment-analysis-in-russian).

__Download links for all Russian sentiment datasets collected by Smetanin can be found in this [GitHub repository](https://github.com/searayeah/russian-sentiment-emotions-datasets).__

## Training

Training was done in this [project](https://github.com/searayeah/vkr-bert) with the following parameters:

```yaml
tokenizer.max_length: 256
batch_size: 32
optimizer: adam
lr: 0.00001
weight_decay: 0
epochs: 2
```

Train/validation/test splits are 80%/10%/10%.
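As a rough sanity check (assuming the split is by example count), the test support of 12,626 reported in the metrics table corresponds to the 10% test share, which implies a combined dataset of roughly 126k examples:

```python
def split_sizes(n_total, fractions=(0.8, 0.1, 0.1)):
    """Compute train/validation/test sizes for an 80/10/10 split."""
    sizes = [int(n_total * f) for f in fractions]
    sizes[0] += n_total - sum(sizes)  # give any rounding remainder to train
    return sizes

# 12,626 test examples at 10% implies about 126,260 examples in total.
n_total = 126_260
train, val, test = split_sizes(n_total)
```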

## Metrics (on test split)

|         |neutral|positive|negative|macro avg|weighted avg|
|---------|-------|--------|--------|---------|------------|
|precision|0.71   |0.84    |0.75    |0.77     |0.76        |
|recall   |0.74   |0.84    |0.71    |0.76     |0.76        |
|f1-score |0.73   |0.84    |0.73    |0.76     |0.76        |
|auc-roc  |0.86   |0.95    |0.91    |0.91     |0.90        |
|support  |5196   |3831    |3599    |12626    |12626       |
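The macro average is the unweighted mean over the three classes, while the weighted average weights each class by its support. A quick check against the precision row of the table above:

```python
precision = {"neutral": 0.71, "positive": 0.84, "negative": 0.75}
support = {"neutral": 5196, "positive": 3831, "negative": 3599}

# Macro average: unweighted mean over the three classes.
macro = sum(precision.values()) / len(precision)

# Weighted average: each class weighted by its share of the test examples.
total = sum(support.values())
weighted = sum(precision[c] * support[c] for c in precision) / total

print(round(macro, 2), round(weighted, 2))  # 0.77 0.76
```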