Amharic Hate Speech Detection Model using Fine-Tuned mBERT
Overview
This repository presents a Hate Speech Detection Model for the Amharic language, fine-tuned from the multilingual BERT (mBERT) model. Trained with the Hugging Face Trainer API, the model reaches an F1-score of 0.9172 and 91.59% accuracy on the evaluation set.
Model Details
The base model for this project is Davlan's bert-base-multilingual-cased-finetuned-amharic from the Hugging Face Hub. This pretrained model was further fine-tuned on a custom dataset for the downstream task of hate speech detection in Amharic.
Key Highlights:
- Model Architecture: mBERT (Multilingual BERT)
- Training Framework: Hugging Face Trainer API
- Performance:
  - F1-score: 0.9172
  - Accuracy: 91.59%
- Training Parameters:
  - Epochs: 15
  - Learning rate: 5e-5
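A minimal fine-tuning sketch with the Trainer API is shown below. The epoch count and learning rate are the ones reported above; the batch size, the binary label set, and the output directory name are assumptions not stated in this card.

```python
# Sketch of the fine-tuning setup described above. HYPERPARAMS holds the
# values reported in this card plus one assumed batch size.
BASE_MODEL = "Davlan/bert-base-multilingual-cased-finetuned-amharic"

HYPERPARAMS = {
    "num_train_epochs": 15,
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 16,  # assumption: not stated in the card
}

def build_trainer(train_dataset, eval_dataset, output_dir="amharic-hate-mbert"):
    """Assemble a Trainer for hate-speech classification (assumed binary)."""
    # transformers is imported here so it is only required when training starts.
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)
    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    model = AutoModelForSequenceClassification.from_pretrained(
        BASE_MODEL, num_labels=2)  # assumption: hate vs. non-hate labels
    args = TrainingArguments(output_dir=output_dir, **HYPERPARAMS)
    return Trainer(model=model, args=args, tokenizer=tokenizer,
                   train_dataset=train_dataset, eval_dataset=eval_dataset)

# Usage (datasets must already be tokenized):
# build_trainer(train_ds, eval_ds).train()
```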
Dataset
The model was fine-tuned using a dataset sourced from Mendeley Data. With 30,000 labeled instances, it is one of the larger publicly available datasets for Amharic hate speech detection.
Dataset Overview:
- Total Samples: 30,000
- Source: Mendeley Data Repository
- Language: Amharic
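A loading-and-splitting sketch for a dataset like this one follows. The file name and the column names ("text", "label") are assumptions; adjust them to match the actual Mendeley export.

```python
# Hypothetical dataset preparation: read the labeled CSV and produce a
# stratified train/test split that preserves the label distribution.
import pandas as pd
from sklearn.model_selection import train_test_split

def make_splits(df, test_size=0.2, seed=42):
    """Split a DataFrame with a 'label' column into train and test sets."""
    return train_test_split(df, test_size=test_size, random_state=seed,
                            stratify=df["label"])

# Usage (file name assumed):
# train_df, test_df = make_splits(pd.read_csv("amharic_hate_speech.csv"))
```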
Model Usage
For those interested in using or exploring this model further, the complete Google Colab notebook detailing the training process and performance metrics is available on GitHub. You can access it via the following link:
Google Colab Notebook: Amharic Hate Speech Detection Using mBERT
How to Use
To use this model for Amharic hate speech detection, you can follow the steps in the Google Colab notebook to load and test the model on new data. The notebook includes all necessary instructions for:
- Loading the fine-tuned mBERT model
- Preprocessing Amharic text data
- Making predictions on new instances
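The steps above can be sketched as follows. The checkpoint path and the label names are assumptions; point the function at the fine-tuned model saved by the notebook.

```python
# Hypothetical inference helper for the fine-tuned classifier.
LABELS = {0: "normal", 1: "hate"}  # assumed label order

def logits_to_label(logits):
    # Map the highest-scoring class index to its label name.
    return LABELS[max(range(len(logits)), key=lambda i: logits[i])]

def classify(text, checkpoint="amharic-hate-mbert"):
    """Tokenize one Amharic sentence and return the predicted label."""
    # Heavy imports are deferred so the pure helpers above stay dependency-free.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
    inputs = tokenizer(text, return_tensors="pt",
                       truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits[0]
    return logits_to_label(logits.tolist())
```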
Contact Information
If you have any questions or suggestions, feel free to reach out or contribute via GitHub.