# emotion-classification-model

This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on the [dair-ai/emotion dataset](https://huggingface.co/datasets/dair-ai/emotion). It is designed to classify text into the following emotional categories: sadness (0), joy (1), love (2), anger (3), fear (4), and surprise (5).

It achieves the following results:

- **Validation Accuracy:** 94.25%
- **Test Accuracy:** 93.2%

## Model Description

This model uses the DistilBERT architecture, a lighter and faster variant of BERT. It has been fine-tuned specifically for emotion classification, making it suitable for tasks such as sentiment analysis, customer feedback analysis, and user emotion detection.

### Key Features

- Efficient and lightweight for deployment.
- High accuracy on emotion detection tasks.
- Pretrained on a diverse corpus and fine-tuned for high specificity to emotions.

## Intended Uses & Limitations

### Intended Uses

- Emotion analysis in text data.
- Sentiment detection in customer reviews, tweets, or user feedback.
- Psychological or behavioral studies that analyze emotional tone in communications.

### Limitations

- May not generalize well to datasets with highly domain-specific language.
- May struggle with sarcasm, irony, or other nuanced forms of language.
- The model is English-specific and may not perform well on non-English text.
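The six class ids listed above can be mapped back to their emotion names with a small helper. This is a minimal sketch using only the id-to-label mapping stated in this card; the names `ID2LABEL` and `label_name` are illustrative, not part of the model's API:

```python
# Id-to-label mapping as listed in this model card
ID2LABEL = {
    0: "sadness",
    1: "joy",
    2: "love",
    3: "anger",
    4: "fear",
    5: "surprise",
}

def label_name(label_id: int) -> str:
    """Return the emotion name for a class id, e.g. 1 -> 'joy'."""
    try:
        return ID2LABEL[label_id]
    except KeyError:
        raise ValueError(f"Unknown label id: {label_id}") from None
```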
## Training and Evaluation Data

### Training Dataset

- **Dataset:** [dair-ai/emotion](https://huggingface.co/datasets/dair-ai/emotion)
- **Training Set Size:** 16,000 examples
- **Dataset Description:** The dataset contains English sentences labeled with six emotional categories: sadness (0), joy (1), love (2), anger (3), fear (4), surprise (5).

### Results

- **Training Time:** ~226 seconds
- **Training Loss:** 0.0520
- **Validation Accuracy:** 94.25%
- **Test Accuracy:** 93.2%

## Training Procedure

### Hyperparameters

- **Learning Rate:** 5e-05
- **Batch Size:** 16 (train and evaluation)
- **Epochs:** 3
- **Seed:** 42
- **Optimizer:** AdamW (betas=(0.9, 0.999), epsilon=1e-08)
- **Learning Rate Scheduler:** Linear
- **Mixed Precision Training:** Native AMP

### Training and Validation Results

| Epoch | Training Loss | Validation Loss | Validation Accuracy |
|-------|---------------|-----------------|---------------------|
| 1     | 0.5383        | 0.1845          | 92.90%              |
| 2     | 0.2254        | 0.1589          | 93.55%              |
| 3     | 0.0520        | 0.1485          | 94.25%              |

### Final Evaluation

- **Validation Loss:** 0.1485
- **Validation Accuracy:** 94.25%
- **Test Loss:** 0.1758
- **Test Accuracy:** 93.2%

### Performance Metrics

- **Training Speed:** ~212 samples/second
- **Evaluation Speed:** ~1,144 samples/second

## Usage Example

```python
from transformers import pipeline

# Load the fine-tuned model from the Hugging Face Hub
classifier = pipeline("text-classification", model="Panda0116/emotion-classification-model")

# Classify a single sentence
text = "I am so happy to see you!"
emotion = classifier(text)
print(emotion)
```
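When the model is used without the `pipeline` helper (e.g. via `AutoModelForSequenceClassification`), its raw logits must be converted to probabilities manually. The sketch below shows that post-processing step in plain Python; the logit values are illustrative, not real model output:

```python
import math

def softmax(logits):
    """Convert a list of raw logits to probabilities that sum to 1."""
    # Subtract the max logit for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative logits for the six classes in this card's label order
labels = ["sadness", "joy", "love", "anger", "fear", "surprise"]
logits = [-1.2, 4.3, 0.8, -0.5, -1.0, -2.1]

probs = softmax(logits)

# The predicted class is the argmax over the probabilities
predicted = labels[probs.index(max(probs))]
print(predicted)  # "joy" for these illustrative logits
```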