Amharic Hate Speech Detection Model using Fine-Tuned mBERT
Overview
This repository presents a Hate Speech Detection Model for the Amharic language, fine-tuned from the multilingual BERT (mBERT) model. Trained with the Hugging Face Trainer API, the model reaches an F1-score of 0.9172 and 91.59% accuracy on the evaluation set.
Model Details
The base model for this project is Davlan's bert-base-multilingual-cased-finetuned-amharic from the Hugging Face Hub. This pretrained model was further fine-tuned on a custom dataset for the downstream task of hate speech detection in Amharic.
Key Highlights:
- Model Architecture: mBERT (Multilingual BERT)
- Training Framework: Hugging Face Trainer API
- Performance:
  - F1-score: 0.9172
  - Accuracy: 91.59%
- Training Parameters:
  - Epochs: 15
  - Learning rate: 5e-5
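A minimal fine-tuning sketch with the Trainer API is shown below. The epoch count and learning rate are the ones reported above; the batch size, the binary label set, and the output directory name are assumptions not stated in this card.

```python
# Sketch of the fine-tuning setup described above. HYPERPARAMS holds the
# values reported in this card plus one assumed batch size.
BASE_MODEL = "Davlan/bert-base-multilingual-cased-finetuned-amharic"

HYPERPARAMS = {
    "num_train_epochs": 15,
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 16,  # assumption: not stated in the card
}

def build_trainer(train_dataset, eval_dataset, output_dir="amharic-hate-mbert"):
    """Assemble a Trainer for hate-speech classification (assumed binary)."""
    # transformers is imported here so it is only required when training starts.
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)
    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    model = AutoModelForSequenceClassification.from_pretrained(
        BASE_MODEL, num_labels=2)  # assumption: hate vs. non-hate labels
    args = TrainingArguments(output_dir=output_dir, **HYPERPARAMS)
    return Trainer(model=model, args=args, tokenizer=tokenizer,
                   train_dataset=train_dataset, eval_dataset=eval_dataset)

# Usage (datasets must already be tokenized):
# build_trainer(train_ds, eval_ds).train()
```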
Dataset
The model was fine-tuned using a dataset sourced from Mendeley Data. With 30,000 labeled instances, it is one of the larger publicly available datasets for Amharic hate speech detection.
Dataset Overview:
- Total Samples: 30,000
- Source: Mendeley Data Repository
- Language: Amharic
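A loading-and-splitting sketch for a dataset like this one follows. The file name and the column names ("text", "label") are assumptions; adjust them to match the actual Mendeley export.

```python
# Hypothetical dataset preparation: read the labeled CSV and produce a
# stratified train/test split that preserves the label distribution.
import pandas as pd
from sklearn.model_selection import train_test_split

def make_splits(df, test_size=0.2, seed=42):
    """Split a DataFrame with a 'label' column into train and test sets."""
    return train_test_split(df, test_size=test_size, random_state=seed,
                            stratify=df["label"])

# Usage (file name assumed):
# train_df, test_df = make_splits(pd.read_csv("amharic_hate_speech.csv"))
```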
Model Usage
For those interested in using or exploring this model further, the complete Google Colab notebook detailing the training process and performance metrics is available on GitHub. You can access it via the following link:
Google Colab Notebook: Amharic Hate Speech Detection Using mBERT
How to Use
To use this model for Amharic hate speech detection, you can follow the steps in the Google Colab notebook to load and test the model on new data. The notebook includes all necessary instructions for:
- Loading the fine-tuned mBERT model
- Preprocessing Amharic text data
- Making predictions on new instances
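The steps above can be sketched as follows. The checkpoint path and the label names are assumptions; point the function at the fine-tuned model saved by the notebook.

```python
# Hypothetical inference helper for the fine-tuned classifier.
LABELS = {0: "normal", 1: "hate"}  # assumed label order

def logits_to_label(logits):
    # Map the highest-scoring class index to its label name.
    return LABELS[max(range(len(logits)), key=lambda i: logits[i])]

def classify(text, checkpoint="amharic-hate-mbert"):
    """Tokenize one Amharic sentence and return the predicted label."""
    # Heavy imports are deferred so the pure helpers above stay dependency-free.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
    inputs = tokenizer(text, return_tensors="pt",
                       truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits[0]
    return logits_to_label(logits.tolist())
```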
Contact Information
If you have any questions or suggestions, feel free to reach out or contribute via GitHub.