---
license: mit
datasets:
- google-research-datasets/go_emotions
language:
- en
library_name: transformers
tags:
- sentiment
---

# Fine-Tuned MiniLM for GoEmotions Sentiment Analysis


This repository contains a fine-tuned version of Microsoft's MiniLM-v2 model, optimized for sentiment analysis on the GoEmotions dataset. At just **90 MB**, the model is well suited to memory-constrained environments.

It classifies text into the following emotional/sentiment categories:

* anger
* approval
* confusion
* disappointment
* disapproval
* gratitude
* joy
* sadness
* neutral

Together, these nine categories cover most of the sentiments commonly expressed in a sentence, which also makes the model useful for validating other sentiment analysis models.

Label mapping when using inference:
```
{
  "LABEL_0": "anger",
  "LABEL_1": "approval",
  "LABEL_2": "confusion",
  "LABEL_3": "disappointment",
  "LABEL_4": "disapproval",
  "LABEL_5": "gratitude",
  "LABEL_6": "joy",
  "LABEL_7": "sadness",
  "LABEL_8": "neutral"
}
```
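
When predictions come back with the generic `LABEL_n` names, a small helper can translate them into sentiment names. This is a minimal sketch that assumes the label order shown above; `LABEL_NAMES`, `to_sentiment`, and the example output are illustrative and not part of the model or of the transformers library.

```python
# Sentiment names in label-index order, taken from the mapping above.
LABEL_NAMES = [
    "anger", "approval", "confusion", "disappointment", "disapproval",
    "gratitude", "joy", "sadness", "neutral",
]


def to_sentiment(label: str) -> str:
    """Translate a generic label such as 'LABEL_5' into its sentiment name."""
    prefix, _, index = label.partition("_")
    if prefix != "LABEL" or not index.isdigit():
        raise ValueError(f"Unexpected label format: {label!r}")
    return LABEL_NAMES[int(index)]


# Hypothetical inference output, shaped like a text-classification result.
example_output = [{"label": "LABEL_0", "score": 0.91}]
print([to_sentiment(item["label"]) for item in example_output])
```

Keeping the names in a single ordered list means the same list can be reused wherever an index needs to be turned back into a sentiment.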

## Why MiniLM?

MiniLM is a distilled version of larger language models like BERT and RoBERTa. It balances performance and efficiency:

* **Reduced Size:** MiniLM is significantly smaller than its parent models, making it faster to load and deploy, especially in resource-constrained environments.
* **Comparable Performance:** Despite its compact size, MiniLM retains high accuracy on a range of natural language processing (NLP) tasks, including sentiment analysis.
* **Distillation Power:** MiniLM is trained to mimic the self-attention behavior of a larger teacher model, so it keeps much of the teacher's knowledge in far fewer parameters.

## GoEmotions Dataset 

google-research-datasets/go_emotions

The GoEmotions dataset is a valuable resource for sentiment analysis. It consists of roughly 58,000 Reddit comments labeled with 27 emotion categories plus neutral; this model is trained on the nine classes listed above. The dataset's wide range of emotional expression makes it a good choice for training a versatile sentiment analysis model.

## Training Procedure

1. **Data Preprocessing:** The GoEmotions dataset was preprocessed to ensure consistency and remove noise.
2. **Tokenizer:** The MiniLM-v2 tokenizer was used to convert text into numerical representations suitable for the model.
3. **Fine-Tuning:** The MiniLM-v2 model was fine-tuned on the GoEmotions dataset using a standard training loop. The model's parameters were adjusted to optimize its performance on sentiment classification.
4. **Evaluation:**  The fine-tuned model was evaluated on a held-out test set to measure its accuracy and generalization capabilities.
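
The card does not spell out the preprocessing in step 1. One plausible version, sketched here purely as an assumption, keeps only the comments that carry exactly one of the nine target emotions and renumbers their labels from 0 to 8. The function `filter_and_remap` and the toy records are hypothetical; with the `datasets` library, the source label names would typically be read from the dataset's features rather than written by hand.

```python
from typing import Dict, List, Optional

# The nine classes this model predicts, in label-index order.
TARGET_SENTIMENTS = [
    "anger", "approval", "confusion", "disappointment", "disapproval",
    "gratitude", "joy", "sadness", "neutral",
]


def filter_and_remap(example: Dict, source_names: List[str]) -> Optional[Dict]:
    """Return the example with one remapped label, or None to drop it.

    Assumption: `example["labels"]` holds indices into `source_names`.
    """
    names = [source_names[i] for i in example["labels"]]
    if len(names) != 1 or names[0] not in TARGET_SENTIMENTS:
        return None
    return {"text": example["text"], "label": TARGET_SENTIMENTS.index(names[0])}


# Toy stand-ins for the dataset's label names and records (not real data).
toy_names = ["admiration", "anger", "joy", "neutral"]
toy_records = [
    {"text": "Thanks a lot!", "labels": [0]},             # dropped: not a target class
    {"text": "This is infuriating", "labels": [1]},       # kept as anger
    {"text": "So happy, yet furious", "labels": [1, 2]},  # dropped: multi-label
]
remapped = (filter_and_remap(e, toy_names) for e in toy_records)
kept = [r for r in remapped if r is not None]
print(kept)
```

Dropping multi-label comments keeps the task single-label, which matches the argmax-based inference shown in this card; whether the original training did the same is not stated.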

## How to Use This Model

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Sentiment names in label-index order (LABEL_0 ... LABEL_8)
required_sentiments = ['anger', 'approval', 'confusion', 'disappointment', 'disapproval', 'gratitude', 'joy', 'sadness', 'neutral']

# Load the model and tokenizer from a local directory containing the saved files
model = AutoModelForSequenceClassification.from_pretrained('./saved_model')
tokenizer = AutoTokenizer.from_pretrained('./saved_model')

text = "How can you be so careless"

# Tokenize the text into PyTorch tensors, truncated or padded to 128 tokens
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding='max_length', max_length=128)

# Run inference without tracking gradients
model.eval()
with torch.no_grad():
    outputs = model(**inputs)

# Index of the highest-scoring class
predicted_index = torch.argmax(outputs.logits, dim=-1).item()

# Map the label index to its sentiment name
predicted_sentiment = required_sentiments[predicted_index]

print(f'Text: {text}')
print(f'Predicted Sentiment: {predicted_sentiment}')