democracy-sentiment-analysis-turkish-roberta

This model is a fine-tuned version of cardiffnlp/twitter-xlm-roberta-base-sentiment-multilingual, trained on a custom dataset of Turkish democracy-related text. It achieves the following results on the evaluation set:

Loss: 0.4469
Accuracy: 0.8184
F1: 0.8186
Precision: 0.8224
Recall: 0.8184
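
Accuracy and recall are identical here (0.8184), which is what support-weighted averaging over classes produces. The card does not state the averaging mode, so treating these figures as weighted averages is an assumption. A minimal sketch of how such metrics are computed, using toy labels invented for illustration (not the evaluation set):

```python
from collections import Counter

def weighted_metrics(y_true, y_pred):
    """Return accuracy plus support-weighted precision, recall and F1."""
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    precision = recall = f1 = 0.0
    for label in sorted(support):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / support[label]
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = support[label] / total  # class share of the true labels
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / total
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy data, hypothetical: six texts with gold and predicted labels
y_true = ["positive", "neutral", "negative", "negative", "neutral", "positive"]
y_pred = ["positive", "neutral", "negative", "neutral", "neutral", "negative"]
metrics = weighted_metrics(y_true, y_pred)
print(metrics)
```

With weighted averaging, recall reduces to the fraction of correct predictions, so it always matches accuracy; precision and F1 generally differ from it, as they do in the results above.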

Model description

This model is fine-tuned from the base model cardiffnlp/twitter-xlm-roberta-base-sentiment-multilingual for sentiment analysis in Turkish, specifically focusing on democracy-related text. The model classifies texts into three sentiment categories:

Positive
Neutral
Negative

Intended uses & limitations

This model is well-suited for analyzing sentiments in Turkish texts that discuss democracy, governance, and related political discourse.

Training and evaluation data

The training dataset consists of 30,000 rows gathered from 30 different sources, including Kaggle, Hugging Face, Ekşi Sözlük, and synthetic data generated using state-of-the-art LLMs. The dataset is multilingual in origin, with texts in English, Russian, and Turkish; all non-Turkish texts were translated into Turkish. Together, the data represents a broad spectrum of democratic discourse.

How to Use

To use this model for sentiment analysis, you can leverage the Hugging Face pipeline for text classification as shown below:

from transformers import pipeline

# Load the model from Hugging Face
sentiment_model = pipeline(model="yeniguno/democracy-sentiment-analysis-turkish-roberta", task='text-classification')

# Example text input
response = sentiment_model("En iyisi devletin tüm gücünü tek bir lidere verelim")

# Print the result
print(response)
# [{'label': 'negative', 'score': 0.9617443084716797}]

# Example text input
response = sentiment_model("Birçok farklı sesin çıkması zaman alıcı ve karmaşık görünebilir, ancak demokrasinin getirdiği özgürlük ve çeşitlilik, toplumun gerçek gücüdür.")

# Print the result
print(response)
# [{'label': 'positive', 'score': 0.958978533744812}]

# Example text input
response = sentiment_model("Bugün hava yağmurlu.")

# Print the result
print(response)
# [{'label': 'neutral', 'score': 0.9915837049484253}]
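
When classifying many texts, it can help to tally the predicted labels and set aside low-confidence predictions. The sketch below works on the output format shown in the examples above (a list containing one dict with `label` and `score`) and reuses those three results so it runs without downloading the model. The `summarize` helper and the `min_score` cutoff are hypothetical names and values chosen for illustration, not part of the model or of `transformers`.

```python
from collections import Counter

# Outputs copied from the examples above; each call returns a list with one dict.
responses = [
    [{'label': 'negative', 'score': 0.9617443084716797}],
    [{'label': 'positive', 'score': 0.958978533744812}],
    [{'label': 'neutral', 'score': 0.9915837049484253}],
]

def summarize(responses, min_score=0.9):
    """Count top labels, skipping predictions below a confidence cutoff.

    min_score is an illustrative threshold, not a recommended value.
    """
    counts = Counter()
    skipped = 0
    for response in responses:
        top = response[0]  # the single top prediction for this text
        if top["score"] >= min_score:
            counts[top["label"]] += 1
        else:
            skipped += 1
    return dict(counts), skipped

counts, skipped = summarize(responses)
print(counts, skipped)
```

Raising `min_score` trades coverage for confidence: with a cutoff of 0.96, the second example would be set aside rather than counted as positive.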

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
num_epochs: 2
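
To make the scheduler settings concrete, the following sketch reproduces the shape of a linear schedule with warmup: the learning rate rises linearly from 0 to the peak over the warmup steps, then decays linearly to 0 at the final step. It is a plain-Python approximation, not the training code itself. The total of 1,604 steps is taken from the training results table, and `linear_warmup_lr` is a hypothetical helper name.

```python
def linear_warmup_lr(step, peak_lr=1e-5, warmup_steps=500, total_steps=1604):
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        # Warmup phase: ramp up from 0 to peak_lr
        return peak_lr * step / warmup_steps
    # Decay phase: ramp down from peak_lr to 0 at total_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

# Learning rate at the start, mid-warmup, peak, end of epoch 1, and final step
for step in (0, 250, 500, 802, 1604):
    print(step, linear_warmup_lr(step))
```

Under these settings the warmup covers 500 of the 1,604 steps, so the peak learning rate is reached partway through the first epoch.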

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy	F1	Precision	Recall
0.7236	1.0	802	0.4797	0.8039	0.8031	0.8037	0.8039
0.424	2.0	1604	0.4469	0.8184	0.8186	0.8224	0.8184
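
The step counts in the table can be related to the hyperparameters with a little arithmetic. The sketch below assumes training on a single device and that every optimizer step sees a full effective batch; neither is stated in the card, so the resulting example count is only an estimate.

```python
train_batch_size = 16
gradient_accumulation_steps = 2
steps_per_epoch = 802  # from the training results table

# Examples consumed per optimizer step (matches total_train_batch_size)
effective_batch_size = train_batch_size * gradient_accumulation_steps

# Approximate number of training examples seen in one epoch
approx_examples_per_epoch = steps_per_epoch * effective_batch_size

print(effective_batch_size)       # 32
print(approx_examples_per_epoch)  # 25664
```

An estimate of roughly 25,700 training examples would be consistent with part of the 30,000 rows being held out for evaluation, though the card does not give the split.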

Framework versions

Transformers 4.44.2
Pytorch 2.4.0+cu121
Datasets 2.21.0
Tokenizers 0.19.1
