---
language:
- ru
library_name: fasttext
pipeline_tag: text-classification
tags:
- news
- media
- russian
- multilingual
---
# FastText Text Classifier
This is a FastText model for text classification, trained on
my [news dataset](https://huggingface.co/datasets/data-silence/rus_news_classifier), a well-balanced sample of news
from the last five years, hosted on Hugging Face Hub.
## Model Description
This model uses FastText to classify text into 11 categories. It was trained on ~70,000 examples and achieves an
accuracy of 0.8691 on a test set.
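The card does not say how the accuracy figure was computed. As a minimal sketch, assuming it is the plain share of test examples whose predicted label matches the reference label, it could be checked like this. The `accuracy` helper and its `predict_label` argument are hypothetical names, not part of the model repository:

```python
def accuracy(predict_label, texts, gold_labels):
    """Share of texts whose predicted label equals the reference label.

    predict_label is any callable mapping one text to one label string
    (hypothetical; wrap the classifier from the Usage section to get one).
    """
    if len(texts) != len(gold_labels):
        raise ValueError("texts and gold_labels must have the same length")
    if not texts:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(predict_label(t) == g for t, g in zip(texts, gold_labels))
    return correct / len(texts)
```

With the classifier from the Usage section, `predict_label` could be `lambda t: classifier(t)[0]["label"]`.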
## Task
The model is designed to classify Russian-language news articles into 11 categories.
## Categories
The classifier assigns each news item to one of 11 categories:
- climate (климат)
- conflicts (конфликты)
- culture (культура)
- economy (экономика)
- gloss (глянец)
- health (здоровье)
- politics (политика)
- science (наука)
- society (общество)
- sports (спорт)
- travel (путешествия)
## Intended uses & limitations
The "gloss" category captures yellow-press, trashy, and dubious news. The model can confuse the politics, society,
and conflicts categories with one another.
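Because politics, society, and conflicts can be confused, it may help to look at several top predictions rather than only the first. The sketch below assumes a loaded fastText model object and relies on the `k` argument of its `predict` method; the helper name `top_k_categories` is hypothetical:

```python
def top_k_categories(model, text, k=3):
    """Return the k most likely (label, score) pairs for a single text."""
    # fastText predicts one line at a time, so collapse newlines and extra spaces
    clean = " ".join(text.split())
    labels, scores = model.predict(clean, k=k)
    return [
        (label.replace("__label__", ""), float(score))
        for label, score in zip(labels, scores)
    ]
```

If the first two scores are close, the text probably sits on the boundary between those categories and may deserve manual review.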
## Usage
To use this model, you will need the `fasttext` and `huggingface_hub` libraries. Install them using pip:
`pip install fasttext huggingface_hub`
Example of how to use the model:
```python
from huggingface_hub import hf_hub_download
import fasttext


class FastTextClassifierPipeline:
    """Minimal callable wrapper around a fastText classification model."""

    def __init__(self, model_path):
        self.model = fasttext.load_model(model_path)

    def __call__(self, texts):
        # Accept either a single string or a list of strings
        if isinstance(texts, str):
            texts = [texts]
        results = []
        for text in texts:
            # fastText predicts one line at a time, so replace newlines with spaces
            prediction = self.model.predict(text.replace("\n", " "))
            label = prediction[0][0].replace("__label__", "")
            score = float(prediction[1][0])
            results.append({"label": label, "score": score})
        return results


def pipeline(task="text-classification", model=None):
    # Download the model file from the Hugging Face Hub
    # (the task and model arguments are accepted but not used)
    repo_id = "data-silence/fasttext-rus-news-classifier"
    model_file = hf_hub_download(repo_id=repo_id, filename="fasttext_news_classifier.bin")
    return FastTextClassifierPipeline(model_file)


# Create the classifier
classifier = pipeline("text-classification")

# Use the classifier
text = "В Париже завершилась церемония закрытия Олимпийских игр"
result = classifier(text)
print(result)
# [{'label': 'sports', 'score': 1.0000100135803223}]
```
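The sample output above shows a score slightly greater than 1.0, which is most likely a floating-point artifact of how fastText computes its scores rather than a real probability. If downstream code expects values in the range [0, 1], the results can be clamped. This is a minimal sketch that assumes the list-of-dicts format returned by the classifier above; `clamp_scores` is a hypothetical helper name:

```python
def clamp_scores(results):
    """Return a copy of the results with every score limited to [0.0, 1.0]."""
    return [
        {"label": r["label"], "score": min(1.0, max(0.0, r["score"]))}
        for r in results
    ]


print(clamp_scores([{"label": "sports", "score": 1.0000100135803223}]))
# [{'label': 'sports', 'score': 1.0}]
```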
## Contacts
If you have any questions or suggestions for improving the model, please create an issue in this repository or contact
me at enjoy@data-silence.com.