---
license: mit
language:
- ru
metrics:
- accuracy
pipeline_tag: text-classification
widget:
- text: "Взрыв газа произошел в 2-этажном доме в поселке под Казанью, пострадали четыре человека, сообщает МЧС"
  example_title: "Новость"
- text: "Сын поздравил меня с днём рождения стихами ❤️"
  example_title: "Не новость"
---

## Model Details

### Model Description

News_classifier is a fine-tuned binary classifier that labels posts from Russian-language Telegram channels as news or not news. It can be integrated into a news aggregation service.

- **Model type:** Sentence RuBERT (Russian, cased, 12-layer, 768-hidden, 12-heads, 180M parameters)
- **Language(s):** Russian (ru)
- **License:** MIT
- **Finetuned from model:** `DeepPavlov/rubert-base-cased-sentence`

## Dataset
- Russian-language Telegram posts
- train/valid/test: 2970/165/165 posts (a 90/5/5 split of 3,300 posts)

## Training Details
- Max token length: 512
- Number of labels: 2
- Batch size: 16
- Learning rate: 2e-5
- Training epochs: 20
- Weight decay: 0.01

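
The hyperparameters above can be collected into a small config and used to estimate the length of the run. This is a minimal sketch, assuming a single device and no gradient accumulation (the card does not say). The key names mirror Hugging Face `TrainingArguments` fields, but whether the original run used `Trainer` is an assumption.

```python
import math

# Values taken from the Training Details list; key names mirror
# TrainingArguments fields (an assumption about the training setup).
HPARAMS = {
    "per_device_train_batch_size": 16,
    "learning_rate": 2e-5,
    "num_train_epochs": 20,
    "weight_decay": 0.01,
}
MAX_LENGTH = 512   # token max length, applied at tokenization time
NUM_LABELS = 2
TRAIN_SIZE = 2970  # training split size from the Dataset section


def optimizer_steps(train_size: int, batch_size: int, epochs: int) -> tuple[int, int]:
    """Return (steps per epoch, total steps), counting the last partial batch."""
    per_epoch = math.ceil(train_size / batch_size)
    return per_epoch, per_epoch * epochs


steps_per_epoch, total_steps = optimizer_steps(
    TRAIN_SIZE,
    HPARAMS["per_device_train_batch_size"],
    HPARAMS["num_train_epochs"],
)
print(steps_per_epoch, total_steps)
```

Under these assumptions the run takes 186 optimizer steps per epoch and 3,720 steps in total.
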
## Metrics
- Matthews correlation coefficient (evaluation metric during training): 0.89
- Accuracy: 0.95

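
For reference, both metrics can be computed from a binary confusion matrix. The sketch below uses the standard definitions; the counts are made up for illustration and are not this model's actual confusion matrix, which the card does not report.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for binary classification, in [-1, 1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is 0 when any marginal sum is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts for illustration only (not this model's results).
tp, tn, fp, fn = 45, 40, 5, 10
print(f"accuracy={accuracy(tp, tn, fp, fn):.3f}")
print(f"mcc={matthews_corrcoef(tp, tn, fp, fn):.3f}")
```

Unlike accuracy, the Matthews coefficient uses all four cells of the confusion matrix, so it stays informative when the two classes are not equally frequent.
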
## Label Scheme
- `LABEL_1`: news
- `LABEL_0`: not news
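
The label scheme can be applied to raw predictions as follows. This is a minimal sketch, assuming the model is loaded through the `transformers` text-classification pipeline. The helper names `decode` and `classify` are hypothetical, and `model_id` must be supplied by the caller because this card does not state the repository id.

```python
LABEL_NAMES = {"LABEL_1": "news", "LABEL_0": "not news"}


def decode(predictions: list[dict]) -> list[tuple[str, float]]:
    """Map raw pipeline outputs ({'label': ..., 'score': ...}) to readable names."""
    return [(LABEL_NAMES[p["label"]], p["score"]) for p in predictions]


def classify(texts: list[str], model_id: str) -> list[tuple[str, float]]:
    """Classify posts; `model_id` is a Hub repo id or local path (not given in this card)."""
    from transformers import pipeline  # imported lazily: heavy optional dependency

    clf = pipeline("text-classification", model=model_id)
    # Truncate long posts to the 512-token limit listed under Training Details.
    return decode(clf(texts, truncation=True, max_length=512))


# Hand-written outputs in the pipeline's format; the scores are made up.
sample = [
    {"label": "LABEL_1", "score": 0.98},
    {"label": "LABEL_0", "score": 0.91},
]
print(decode(sample))
```

Calling `classify(["..."], model_id="<repo id or local path>")` downloads or loads the checkpoint and returns one `(name, score)` pair per input post.
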