---
title: Hate Speech Text Classifier
emoji: 👍
colorFrom: green
colorTo: red
sdk: gradio
sdk_version: 4.21.0
app_file: app.py
pinned: false
---
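
The metadata above declares a Gradio Space (`sdk: gradio`, `app_file: app.py`). The following is a minimal sketch of how `app.py` could expose the classifier through a Gradio interface; the artifact filenames (`model.joblib`, `vectorizer.joblib`) and the use of a separate vectorizer are assumptions for illustration, not the repository's actual layout.

```python
# Hypothetical sketch of app.py; artifact names and the separate
# vectorizer are assumptions, not the actual repository contents.
import gradio as gr
import joblib

model = joblib.load("model.joblib")            # assumed RandomForest artifact
vectorizer = joblib.load("vectorizer.joblib")  # assumed text vectorizer

LABELS = {0: "Neutral or Ambiguous", 1: "Not Hate", 2: "Offensive or Hateful"}

def classify(text: str) -> str:
    """Vectorize the input text and return the predicted label name."""
    features = vectorizer.transform([text])
    prediction = int(model.predict(features)[0])
    return LABELS[prediction]

demo = gr.Interface(
    fn=classify,
    inputs="text",
    outputs="label",
    title="Hate Speech Text Classifier",
)

if __name__ == "__main__":
    demo.launch()
```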

# Monitoring Harmful Text in Online Platforms

## Overview

This repository hosts a RandomForest classifier for detecting harmful text directed against groups. The model assigns each input to one of three categories: "Offensive or Hateful", "Neutral or Ambiguous", or "Not Hate". It achieves an accuracy of 92.5% and was trained on a combination of three distinct datasets to improve robustness across varied contexts. The model was presented at the annual Gulf Coast Conference & Expo on AI.

## Model Details

- **Model Type:** RandomForest Classifier
- **Accuracy:** 92.5%
- **Labels:**
  - 0: Neutral or Ambiguous
  - 1: Not Hate
  - 2: Offensive or Hateful
- **Training Data:** Augmented version of this dataset (279k+ rows)
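
For illustration, a RandomForest text classifier of this kind is typically built as a vectorizer-plus-estimator pipeline. The sketch below is a hypothetical setup consistent with the details above, not the exact training code; the TF-IDF feature extraction and all hyperparameters are assumptions.

```python
# Illustrative text-classification pipeline; the TF-IDF step and the
# hyperparameter values are assumptions, not the model's actual settings.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(lowercase=True, ngram_range=(1, 2), max_features=50_000)),
    ("clf", RandomForestClassifier(n_estimators=300, n_jobs=-1, random_state=42)),
])

# texts: list[str]; labels: list[int] with 0 = Neutral or Ambiguous,
# 1 = Not Hate, 2 = Offensive or Hateful
# pipeline.fit(texts, labels)
# pipeline.predict(["example input"])
```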

## Dataset

The model was trained on a concatenated dataset from three distinct sources, curated to encompass a wide range of linguistic expressions and contexts. The datasets were preprocessed and balanced to ensure fair representation across all categories.
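
As a rough sketch of the concatenation-and-balancing step described above, one way to combine the sources and equalize class counts is shown below; the file names and the `"text"`/`"label"` column names are assumptions, and the actual preprocessing may differ.

```python
# Hypothetical sketch of combining and balancing the three source datasets;
# file names and column names are assumptions.
import pandas as pd

frames = [pd.read_csv(path) for path in ("source_a.csv", "source_b.csv", "source_c.csv")]
combined = pd.concat(frames, ignore_index=True)

# Downsample every class to the size of the smallest one so that all
# three categories are equally represented.
min_count = combined["label"].value_counts().min()
balanced = (
    combined.groupby("label", group_keys=False)
    .apply(lambda g: g.sample(n=min_count, random_state=42))
    .reset_index(drop=True)
)
```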

## Performance

The model achieves an accuracy of 92.5%, evaluated on a held-out test set.
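
A held-out evaluation of this kind is commonly set up as below. This is only a sketch: the 80/20 split, stratification, and the `texts`, `labels`, and `pipeline` names (reusing the hypothetical pipeline sketched under Model Details) are assumptions, not the actual evaluation protocol.

```python
# Sketch of a held-out evaluation; split ratio and stratification are
# assumptions. `texts`, `labels`, and `pipeline` refer to the hypothetical
# objects from the Model Details sketch above.
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)

pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_test)

print(accuracy_score(y_test, predictions))   # reported accuracy: 92.5%
print(classification_report(y_test, predictions))
```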

## Contributing

We welcome contributions to improve the model and extend its applicability.