---
language:
- ru
library_name: fasttext
pipeline_tag: text-classification
tags:
- news
- media
- russian
- multilingual
---

# FastText Text Classifier

This is a FastText model for text classification, trained on my [news dataset](https://huggingface.co/datasets/data-silence/rus_news_classifier), which is hosted on the Hugging Face Hub. The training data is a well-balanced sample of news from the last five years.

## Model Description

This model uses FastText to classify text into 11 categories. It was trained on about 70,000 examples and achieves an accuracy of 0.8691 on a test set.

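The accuracy figure can be checked against any labeled sample. The following is a minimal sketch, not the evaluation script behind the number above. It assumes a hypothetical `predict_label` callable that maps a text to a category name, and it uses a toy stand-in so that it runs without downloading the model:

```python
from typing import Callable, Iterable, Tuple


def accuracy(predict_label: Callable[[str], str],
             examples: Iterable[Tuple[str, str]]) -> float:
    """Share of (text, gold_label) pairs whose predicted label matches the gold label."""
    total = 0
    correct = 0
    for text, gold in examples:
        total += 1
        if predict_label(text) == gold:
            correct += 1
    if total == 0:
        raise ValueError("no examples to evaluate")
    return correct / total


# Toy stand-in for the real classifier (an assumption for illustration only).
def toy_predict(text: str) -> str:
    return "sports" if "матч" in text else "economy"


# Hand-written sample, not taken from the actual test set.
sample = [
    ("Сборная выиграла матч", "sports"),
    ("Центробанк сохранил ставку", "economy"),
    ("Учёные открыли новую планету", "science"),
]
print(accuracy(toy_predict, sample))  # 2 of 3 correct
```

With the real model, `predict_label` could be a thin wrapper such as `lambda t: classifier(t)[0]["label"]`, where `classifier` is the object built in the Usage section.
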
## Task

The model is designed to classify Russian-language news articles into 11 categories.

## Categories

The classifier assigns each news item to one of 11 categories:

- climate (климат)
- conflicts (конфликты)
- culture (культура)
- economy (экономика)
- gloss (глянец)
- health (здоровье)
- politics (политика)
- science (наука)
- society (общество)
- sports (спорт)
- travel (путешествия)

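For display purposes, the list above can be kept as a small lookup table. This is a sketch that assumes the model's labels match the English names in the list once fastText's `__label__` prefix is stripped; `CATEGORIES` and `describe` are hypothetical names, not part of the model repository:

```python
# English label -> Russian name, copied from the category list above.
CATEGORIES = {
    "climate": "климат",
    "conflicts": "конфликты",
    "culture": "культура",
    "economy": "экономика",
    "gloss": "глянец",
    "health": "здоровье",
    "politics": "политика",
    "science": "наука",
    "society": "общество",
    "sports": "спорт",
    "travel": "путешествия",
}


def describe(label: str) -> str:
    """Render a label as 'english (русский)', dropping the '__label__' prefix if present."""
    name = label.replace("__label__", "")
    if name not in CATEGORIES:
        raise KeyError(f"unknown category: {name}")
    return f"{name} ({CATEGORIES[name]})"


print(describe("__label__sports"))  # sports (спорт)
```
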
## Intended uses & limitations

The "gloss" category is used to single out yellow-press, trashy, and dubious news. The model can confuse the categories politics, society, and conflicts with one another.

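One way to work around the confusable categories is to look at the top two predictions rather than only the first. The sketch below operates on the `(labels, scores)` pair that fastText's `model.predict(text, k=2)` returns. The helper name `needs_review` and the 0.2 margin are assumptions chosen for illustration, not values tuned on this model:

```python
from typing import Sequence

# Categories the model card names as easy to confuse with one another.
CONFUSABLE = {"politics", "society", "conflicts"}


def needs_review(labels: Sequence[str], scores: Sequence[float],
                 margin: float = 0.2) -> bool:
    """True when the top two labels are both confusable and their scores are close."""
    if len(labels) < 2:
        return False
    top = labels[0].replace("__label__", "")
    second = labels[1].replace("__label__", "")
    close = (float(scores[0]) - float(scores[1])) < margin
    return close and top in CONFUSABLE and second in CONFUSABLE


# Hand-written values standing in for model.predict(text, k=2) output.
print(needs_review(("__label__politics", "__label__society"), (0.48, 0.41)))  # True
print(needs_review(("__label__sports", "__label__society"), (0.97, 0.02)))    # False
```

Items flagged this way could be routed to a manual check or labeled with both categories, depending on the application.
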
## Usage

To use this model, you will need the `fasttext` and `huggingface_hub` libraries. Install them using pip:

`pip install fasttext huggingface_hub`

Example of how to use the model:

```python
from huggingface_hub import hf_hub_download
import fasttext


class FastTextClassifierPipeline:
    def __init__(self, model_path):
        self.model = fasttext.load_model(model_path)

    def __call__(self, texts):
        # Accept either a single string or a list of strings
        if isinstance(texts, str):
            texts = [texts]

        results = []
        for text in texts:
            # predict returns (labels, scores); keep only the top prediction
            prediction = self.model.predict(text)
            label = prediction[0][0].replace("__label__", "")
            score = float(prediction[1][0])
            results.append({"label": label, "score": score})

        return results


def pipeline(task="text-classification", model=None):
    # Download the model binary from the Hub (`task` and `model` are not used here)
    repo_id = "data-silence/fasttext-rus-news-classifier"
    model_file = hf_hub_download(repo_id=repo_id, filename="fasttext_news_classifier.bin")
    return FastTextClassifierPipeline(model_file)


# Create the classifier
classifier = pipeline("text-classification")

# Use the classifier
# Input: "The closing ceremony of the Olympic Games has ended in Paris"
text = "В Париже завершилась церемония закрытия Олимпийских игр"
result = classifier(text)
print(result)
# [{'label': 'sports', 'score': 1.0000100135803223}]
```
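
fastText's Python `predict` raises a `ValueError` when the input contains a newline, so multi-paragraph articles need to be flattened before they reach the classifier. A minimal sketch; `to_single_line` is a hypothetical helper, not part of the model repository:

```python
import re


def to_single_line(text: str) -> str:
    """Collapse every run of whitespace (including newlines) into one space."""
    return re.sub(r"\s+", " ", text).strip()


article = "В Париже завершилась\nцеремония закрытия\n\nОлимпийских игр"
print(to_single_line(article))
# В Париже завершилась церемония закрытия Олимпийских игр
```

The cleaned string can then be passed to `classifier(...)` in the same way as the single-line example above.
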

## Contacts

If you have any questions or suggestions for improving the model, please create an issue in this repository or contact me at enjoy@data-silence.com.