
Amharic Hate Speech Detection Model using Fine-Tuned mBERT

Overview

This repository presents a hate speech detection model for the Amharic language, fine-tuned from multilingual BERT (mBERT) with the Hugging Face Trainer API. It reports an F1-score of 0.9172 and an accuracy of 91.59%.

Model Details

The base model is Davlan/bert-base-multilingual-cased-finetuned-amharic from the Hugging Face Hub, an mBERT checkpoint already adapted to Amharic. It was further fine-tuned on a labeled dataset (see Dataset) for the downstream task of Amharic hate speech detection.

Key Highlights:

  • Model Architecture: mBERT (Multilingual BERT)
  • Training Framework: Hugging Face Trainer API
  • Performance:
    • F1-Score: 0.9172
    • Accuracy: 91.59%
  • Training Parameters:
    • Epochs: 15
    • Learning Rate: 5e-5
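
The highlights above can be sketched in code. This is a minimal illustration, not the notebook's actual training script: it assumes a two-class setup with label 1 as the hate speech class, and it assumes the reported F1 is the binary F1 of that class (the card does not say which averaging was used). The `output_dir` value and the function names are hypothetical; only the base checkpoint, the 15 epochs, and the 5e-5 learning rate come from this card.

```python
# Base checkpoint named in this card; everything else here is illustrative.
BASE_MODEL = "Davlan/bert-base-multilingual-cased-finetuned-amharic"


def compute_metrics(eval_pred):
    """Accuracy and binary F1 from (logits, labels), the shape Trainer passes in.

    Assumes two classes with label 1 as the positive (hate speech) class.
    """
    logits, labels = eval_pred
    # Argmax over each row of logits; works for nested lists or NumPy arrays.
    preds = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    labels = [int(y) for y in labels]

    correct = sum(p == y for p, y in zip(preds, labels))
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))

    accuracy = correct / len(labels) if labels else 0.0
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return {"accuracy": accuracy, "f1": f1}


def build_training_args(output_dir="amharic-hate-speech"):
    """Training arguments using the epochs and learning rate listed above."""
    # Imported lazily so compute_metrics can be used without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(
        output_dir=output_dir,  # hypothetical output directory
        num_train_epochs=15,
        learning_rate=5e-5,
    )
```

Both pieces would typically be handed to `Trainer(model=..., args=build_training_args(), train_dataset=..., eval_dataset=..., compute_metrics=compute_metrics)`, with the model loaded through `AutoModelForSequenceClassification.from_pretrained(BASE_MODEL, num_labels=2)`. The `num_labels=2` value is an assumption about the label scheme.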

Dataset

The model was fine-tuned on a dataset sourced from Mendeley Data. With 30,000 labeled instances, it is a comparatively large resource for Amharic hate speech detection.

Dataset Overview:

  • Total Samples: 30,000
  • Source: Mendeley Data Repository
  • Language: Amharic

Model Usage

The complete Google Colab notebook, which documents the training process and the performance metrics, is available on GitHub:

Google Colab Notebook: Amharic Hate Speech Detection Using mBERT

How to Use

To use this model for Amharic hate speech detection, you can follow the steps in the Google Colab notebook to load and test the model on new data. The notebook includes all necessary instructions for:

  • Loading the fine-tuned mBERT model
  • Preprocessing Amharic text data
  • Making predictions on new instances
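
For a quick test outside the notebook, the three steps above might look like the following sketch. It assumes the model is published on the Hugging Face Hub under the repository id `devaprobs/hate-speech-detection-using-amharic-language` and that the default tokenizer preprocessing is sufficient; the notebook may apply additional Amharic-specific cleaning. The helper names `load_classifier` and `classify` are hypothetical.

```python
# Assumed Hub repository id for this model; adjust if it is hosted elsewhere.
MODEL_ID = "devaprobs/hate-speech-detection-using-amharic-language"


def load_classifier(model_id=MODEL_ID):
    """Load the fine-tuned model and its tokenizer as a text-classification pipeline."""
    # Imported lazily; requires the transformers package and a backend such as PyTorch.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def classify(texts, classifier):
    """Return (text, label, score) for each input string."""
    texts = list(texts)
    # truncation=True is forwarded to the tokenizer so long inputs
    # are cut to the model's maximum sequence length.
    results = classifier(texts, truncation=True)
    return [(t, r["label"], float(r["score"])) for t, r in zip(texts, results)]


# Example usage (downloads the model from the Hub on first run):
# classifier = load_classifier()
# for text, label, score in classify(["ሰላም"], classifier):
#     print(f"{label} ({score:.3f}): {text}")
```

The label strings returned depend on the `id2label` mapping stored in the model's config; if none was set during training, they appear as generic names such as `LABEL_0` and `LABEL_1`, and the mapping to classes has to be taken from the notebook.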

Contact Information

If you have any questions or suggestions, feel free to reach out or contribute via GitHub.

Model size: 178M params (Safetensors, tensor type F32)