# democracy-sentiment-analysis-turkish-roberta
This model is a fine-tuned version of [cardiffnlp/twitter-xlm-roberta-base-sentiment-multilingual](https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base-sentiment-multilingual) on a custom Turkish democracy-sentiment dataset (see Training and evaluation data below). It achieves the following results on the evaluation set:
- Loss: 0.4469
- Accuracy: 0.8184
- F1: 0.8186
- Precision: 0.8224
- Recall: 0.8184
## Model description
This model is fine-tuned from the base model cardiffnlp/twitter-xlm-roberta-base-sentiment-multilingual for sentiment analysis in Turkish, specifically focusing on democracy-related text. The model classifies texts into three sentiment categories:
- Positive
- Neutral
- Negative
## Intended uses & limitations
This model is well-suited for analyzing sentiment in Turkish texts that discuss democracy, governance, and related political discourse. Since it was fine-tuned on this domain, performance on unrelated, general-purpose text may be lower than the figures reported here.
## Training and evaluation data
The training dataset consists of 30,000 rows gathered from various sources, including Kaggle, Hugging Face, Ekşi Sözlük, and synthetic data generated using state-of-the-art LLMs. The dataset is multilingual in origin, with texts in English, Russian, and Turkish; all non-Turkish texts were translated into Turkish. The data represents a broad spectrum of democratic discourse from 30 different sources.
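As an illustration of the translation step, here is a minimal sketch. The translation model (`Helsinki-NLP/opus-mt-tc-big-en-tr`) and the row schema are assumptions chosen for demonstration; the card does not state which tooling was actually used to translate the dataset.

```python
from transformers import pipeline

# Hypothetical preprocessing sketch: the translation model and the row
# schema are assumptions, not the actual tooling used for this dataset.
en_to_tr = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-big-en-tr")

rows = [
    {"text": "Free elections are the backbone of democracy.", "lang": "en"},
    {"text": "Demokrasi katılımı güçlendirir.", "lang": "tr"},
]

for row in rows:
    if row["lang"] == "en":
        # Translate non-Turkish text into Turkish before training
        row["text"] = en_to_tr(row["text"])[0]["translation_text"]

print([row["text"] for row in rows])
```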
## How to Use
To use this model for sentiment analysis, you can leverage the Hugging Face `pipeline` for text classification, as shown below:
```python
from transformers import pipeline

# Load the model from Hugging Face
sentiment_model = pipeline(
    model="yeniguno/democracy-sentiment-analysis-turkish-roberta",
    task="text-classification",
)

# Example text input
response = sentiment_model("En iyisi devletin tüm gücünü tek bir lidere verelim")
print(response)
# [{'label': 'negative', 'score': 0.9617443084716797}]

# Example text input
response = sentiment_model("Birçok farklı sesin çıkması zaman alıcı ve karmaşık görünebilir, ancak demokrasinin getirdiği özgürlük ve çeşitlilik, toplumun gerçek gücüdür.")
print(response)
# [{'label': 'positive', 'score': 0.958978533744812}]

# Example text input
response = sentiment_model("Bugün hava yağmurlu.")
print(response)
# [{'label': 'neutral', 'score': 0.9915837049484253}]
```
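If you prefer to work below the `pipeline` abstraction, the model can also be loaded directly with `AutoTokenizer` and `AutoModelForSequenceClassification`. The snippet below is a standard Transformers inference sketch, not extra functionality of this model; the input sentence is an arbitrary example.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "yeniguno/democracy-sentiment-analysis-turkish-roberta"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Tokenize an arbitrary example sentence and run a forward pass
inputs = tokenizer("Basın özgürlüğü her geçen gün kısıtlanıyor.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Convert logits to probabilities and map the argmax to its label string
probs = logits.softmax(dim=-1).squeeze()
label = model.config.id2label[int(probs.argmax())]
print(label, float(probs.max()))
```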
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 1e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 2
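Here is a sketch of how these hyperparameters map onto the `Trainer` API. Only the values listed above come from this card; the toy dataset, the label ids, and the evaluation cadence are placeholders, since the real 30,000-row dataset is not published.

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

base = "cardiffnlp/twitter-xlm-roberta-base-sentiment-multilingual"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=3)

# Toy stand-in for the real (unpublished) dataset; label ids assume the
# base model's 0=negative, 1=neutral, 2=positive convention.
raw = Dataset.from_dict({
    "text": ["Demokrasi katılımı güçlendirir.", "Bugün hava yağmurlu."],
    "label": [2, 1],
})
ds = raw.map(lambda b: tokenizer(b["text"], truncation=True), batched=True)

args = TrainingArguments(
    output_dir="democracy-sentiment-analysis-turkish-roberta",
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=2,   # effective train batch size of 32
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=2,
    seed=42,
    evaluation_strategy="epoch",     # assumption: cadence not stated on the card
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=ds,
    eval_dataset=ds,   # placeholder; use a held-out split in practice
    tokenizer=tokenizer,
)
trainer.train()
```

The Adam betas `(0.9, 0.999)` and epsilon `1e-08` listed above are the `TrainingArguments` defaults, so they are not set explicitly in the sketch.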
### Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1     | Precision | Recall |
|---------------|-------|------|-----------------|----------|--------|-----------|--------|
| 0.7236        | 1.0   | 802  | 0.4797          | 0.8039   | 0.8031 | 0.8037    | 0.8039 |
| 0.424         | 2.0   | 1604 | 0.4469          | 0.8184   | 0.8186 | 0.8224    | 0.8184 |
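Since recall matches accuracy exactly, these look like weighted averages; a `compute_metrics` function along the following lines would reproduce them. The weighted averaging is an inference from the numbers, not something stated on the card.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    # Weighted averaging is an assumption, inferred from recall == accuracy
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="weighted"
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1,
        "precision": precision,
        "recall": recall,
    }
```

Such a function can be passed to the `Trainer` sketch above via its `compute_metrics` argument.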
### Framework versions
- Transformers 4.44.2
- Pytorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1