---
language: en
license: apache-2.0
---
# Women's Clothing Reviews Sentiment Analysis with DistilBERT
## Overview
This Hugging Face repository contains a fine-tuned DistilBERT model for sentiment analysis of women's clothing reviews. The model is designed to classify reviews into positive, negative, or neutral sentiment categories, providing valuable insights into customer opinions.
## Model Details
- **Model Architecture**: Fine-tuned DistilBERT
- **Sentiment Categories**: Neutral [0], Negative [1], Positive [2]
- **Input Format**: Text-based clothing reviews
- **Output Format**: Sentiment category labels
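In code, the category mapping above is a simple lookup. A minimal sketch (the lowercase label strings and the helper name `decode_sentiment` are illustrative, not part of the model's config):

```python
# Sentiment category indices as listed above: Neutral [0], Negative [1], Positive [2]
ID2LABEL = {0: "neutral", 1: "negative", 2: "positive"}

def decode_sentiment(class_index: int) -> str:
    """Map a predicted class index to its sentiment label."""
    return ID2LABEL[class_index]

print(decode_sentiment(2))  # positive
```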
## Fine-tuning procedure
This model was fine-tuned on a relatively small dataset of 23,487 rows, split into train/eval/test sets. Nevertheless, the fine-tuned model performs slightly better than the base DistilBERT model on the test set.
## Training results
The model achieved the following results on the evaluation set:
- **Validation Loss**: 1.1677
### Comparison of the base DistilBERT model vs. the fine-tuned model
| Model | Accuracy | Precision | Recall | F1 Score |
|--------------- | -------- | --------- | ------ | -------- |
| DistilBERT base model | 0.79 | 0.77 | 0.79 | 0.77 |
| DistilBERT fine-tuned | 0.85 | 0.86 | 0.85 | 0.85 |
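Metrics like those in the table are typically computed with scikit-learn's standard functions. A minimal sketch with toy labels (the values below are illustrative only; the real numbers come from the test split, and scikit-learn is assumed to be installed):

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Toy labels for illustration only (0 = neutral, 1 = negative, 2 = positive)
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 2, 1, 1, 0]

accuracy = accuracy_score(y_true, y_pred)
# Weighted averaging accounts for class imbalance across the three categories
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```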
## Installation
To use this model, install the Hugging Face Transformers library and PyTorch:
- `pip install transformers`
- `pip install torch`
## Usage
You can load the fine-tuned model for sentiment analysis using Hugging Face's `DistilBertForSequenceClassification` and `DistilBertTokenizerFast`:
```python
from transformers import DistilBertForSequenceClassification, DistilBertTokenizerFast
import torch

model_name = "ongaunjie/distilbert-cloths-sentiment"
tokenizer = DistilBertTokenizerFast.from_pretrained(model_name)
model = DistilBertForSequenceClassification.from_pretrained(model_name)

# Tokenize a single review and run inference without tracking gradients
review = "This dress is amazing, I love it!"
inputs = tokenizer(review, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Highest-scoring class index: 0 = neutral, 1 = negative, 2 = positive
predicted_class = int(torch.argmax(outputs.logits, dim=-1))
```