Greek-Reddit-BERT

A Greek topic classification model based on GREEK-BERT.
The model was fine-tuned on GreekReddit as part of our research article:
Mastrokostas, C., Giarelis, N., & Karacapilidis, N. (2024). Social Media Topic Classification on Greek Reddit. Information, 15(9), 521.
For more information, see the Evaluation section below.

Training dataset

Greek-Reddit-BERT was trained on GreekReddit, a topic classification dataset.
Overall, GreekReddit contains 6,534 user posts collected from Greek subreddits covering various topics (e.g., society, politics, economy, entertainment/culture, sports).

Training configuration

We fine-tuned nlpaueb/bert-base-greek-uncased-v1 (113 million parameters) on the GreekReddit train split using the following parameters:

  • GPU batch size = 16
  • Total training epochs = 4
  • Learning rate = 5e-5
  • Dropout rate = 0.1
  • Number of labels = 10
  • 32-bit floating-point precision
  • Tokenization
    • maximum input token length = 512
    • padding = True
    • truncation = True
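
The settings above could be expressed with the Hugging Face transformers API roughly as follows. This is a minimal sketch, not the authors' training script. Mapping "GPU batch size" to per_device_train_batch_size, mapping "Dropout rate" to hidden_dropout_prob, and the output directory name are all assumptions; anything not listed falls back to library defaults.

```python
# Hyperparameters copied from the list above.
HYPERPARAMS = {
    "per_device_train_batch_size": 16,  # assumption: "GPU batch size" is per device
    "num_train_epochs": 4,
    "learning_rate": 5e-5,
    "fp16": False,  # keep training in 32-bit floating point
}
MODEL_KWARGS = {
    "num_labels": 10,
    "hidden_dropout_prob": 0.1,  # assumption: the card does not say which dropout this is
}
TOKENIZER_KWARGS = {"max_length": 512, "padding": True, "truncation": True}


def build_training_objects(output_dir="greek-reddit-bert-out"):  # hypothetical directory
    """Return (model, tokenizer, training_args) configured as listed above."""
    # Imported lazily so the settings can be inspected without transformers installed.
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        TrainingArguments,
    )

    base = "nlpaueb/bert-base-greek-uncased-v1"
    tokenizer = AutoTokenizer.from_pretrained(base)
    model = AutoModelForSequenceClassification.from_pretrained(base, **MODEL_KWARGS)
    args = TrainingArguments(output_dir=output_dir, **HYPERPARAMS)
    return model, tokenizer, args


def tokenize_batch(tokenizer, texts):
    """Tokenize a list of posts with the tokenization settings listed above."""
    return tokenizer(texts, **TOKENIZER_KWARGS)
```

The returned objects can then be passed to a transformers Trainer together with a tokenized train split.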

Evaluation

Model              Precision  Recall  F1     Hamming Loss
Greek-Reddit-BERT  80.05      81.48   80.61  19.84
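
For readers who want to see how metrics of this kind are derived from predictions, here is a minimal pure-Python sketch. It assumes one label per post and macro averaging across classes; the averaging scheme is not stated in this card, and the function name macro_metrics and the toy labels are illustrative only. With one label per post, Hamming loss is simply the fraction of misclassified posts.

```python
def macro_metrics(y_true, y_pred):
    """Macro precision, recall, F1 and Hamming loss for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
        # Fraction of posts whose predicted label differs from the true one.
        "hamming_loss": sum(t != p for t, p in pairs) / len(pairs),
    }


# Toy data, not taken from GreekReddit.
y_true = ["economy", "economy", "sports", "politics"]
y_pred = ["economy", "sports", "sports", "politics"]
report = macro_metrics(y_true, y_pred)
print({name: round(100 * value, 2) for name, value in report.items()})
```

Multiplying by 100, as in the last line, gives values on the same scale as the table above.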

Example code

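from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub.
model_name = 'IMISLab/Greek-Reddit-BERT'
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Build the classification pipeline; inputs longer than 512 tokens are truncated.
topic_classifier = pipeline(
    'text-classification',
    device='cpu',
    model=model,
    tokenizer=tokenizer,
    truncation=True,
    max_length=512,
)

# Sample Greek post. English gloss: "Other economies, such as China, try to keep
# the value of their currency low in order to make their exports more attractive
# abroad. So why do we consider the decline of the Turkish lira to be Turkey's
# 'Achilles heel'?"
text = 'Άλλες οικονομίες, όπως η Κίνα, προσπαθούν να διατηρούν την αξία του νομίσματος τους χαμηλά ώστε να καταστήσουν τις εξαγωγές τους πιο ελκυστικές στο εξωτερικό. Γιατί όμως θεωρούμε πως η πτωτική πορεία της Τουρκικής λίρας είναι η "αχίλλειος πτέρνα" της Τουρκίας;'
output = topic_classifier(text)
print(output[0]['label'])  # predicted topic label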
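
To inspect the scores of several topics rather than only the top label, a text-classification pipeline can be called with top_k=None. The sketch below is hedged: the nesting of the returned list can differ across transformers versions, so the hypothetical helper rank_topics accepts both a flat and a nested list. Neither helper is part of the model or the library.

```python
def rank_topics(scores, k=3):
    """Return the k highest-scoring {'label', 'score'} dicts."""
    # Some versions wrap the per-label dicts in an extra list; unwrap if so.
    if scores and isinstance(scores[0], list):
        scores = scores[0]
    return sorted(scores, key=lambda item: item["score"], reverse=True)[:k]


def classify_with_scores(classifier, text, k=3):
    """Call a text-classification pipeline and keep its top-k topics."""
    return rank_topics(classifier(text, top_k=None), k=k)
```

With the objects from the example above, classify_with_scores(topic_classifier, text) would return up to three label/score pairs, highest score first.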

Contact

If you have any questions or feedback about the model, please e-mail one of the following authors:

giarelis@ceid.upatras.gr
cmastrokostas@ac.upatras.gr
karacap@upatras.gr

Citation

@article{mastrokostas2024social,
  title={Social Media Topic Classification on Greek Reddit},
  author={Mastrokostas, Charalampos and Giarelis, Nikolaos and Karacapilidis, Nikos},
  journal={Information},
  volume={15},
  number={9},
  pages={521},
  year={2024},
  publisher={Multidisciplinary Digital Publishing Institute}
}