---
language: en
license: apache-2.0
---

# Women's Clothing Reviews Sentiment Analysis with DistilBERT

## Overview

This Hugging Face repository contains a fine-tuned DistilBERT model for sentiment analysis of women's clothing reviews. The model classifies reviews into positive, negative, or neutral sentiment categories, providing insight into customer opinions.

## Model Details

- **Model Architecture**: Fine-tuned DistilBERT
- **Sentiment Categories**: Neutral [0], Negative [1], Positive [2]
- **Input Format**: Text-based clothing reviews
- **Output Format**: Sentiment category labels

## Fine-tuning procedure

The model was fine-tuned on a relatively small dataset of 23,487 reviews, split into train/validation/test sets. Even so, the fine-tuned model performs better than the base DistilBERT model on the test set (see the comparison table below). An illustrative fine-tuning sketch is included at the end of this card.

## Training result

The model achieved the following result on the evaluation set:

- **Validation Loss**: 1.1677

### Comparison: base DistilBERT model vs. fine-tuned DistilBERT

| Model                  | Accuracy | Precision | Recall | F1 Score |
| ---------------------- | -------- | --------- | ------ | -------- |
| DistilBERT base model  | 0.79     | 0.77      | 0.79   | 0.77     |
| DistilBERT fine-tuned  | 0.85     | 0.86      | 0.85   | 0.85     |

## Installation

To use this model, install the Hugging Face Transformers library and PyTorch:

- `pip install transformers`
- `pip install torch`

## Usage

You can load the pre-trained model for sentiment analysis with Hugging Face's `DistilBertForSequenceClassification` and `DistilBertTokenizerFast`:

```python
from transformers import DistilBertForSequenceClassification, DistilBertTokenizerFast
import torch

model_name = "ongaunjie/distilbert-cloths-sentiment"
tokenizer = DistilBertTokenizerFast.from_pretrained(model_name)
model = DistilBertForSequenceClassification.from_pretrained(model_name)

review = "This dress is amazing, I love it!"

# Tokenize the review and run inference without tracking gradients
inputs = tokenizer(review, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Pick the class with the highest logit and map it to its sentiment label
predicted_class = int(torch.argmax(outputs.logits))
labels = {0: "Neutral", 1: "Negative", 2: "Positive"}
print(labels[predicted_class])
```
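
As a quicker alternative, the same checkpoint can be run through the `pipeline` API. Note that the label names returned by the pipeline depend on the `id2label` mapping stored in the model's config (by default `LABEL_0`/`LABEL_1`/`LABEL_2`, corresponding to the neutral/negative/positive indices listed under Model Details), so verify the mapping before relying on it.

```python
from transformers import pipeline

# Assumes the checkpoint's id2label order matches Model Details:
# 0 -> Neutral, 1 -> Negative, 2 -> Positive.
classifier = pipeline("text-classification", model="ongaunjie/distilbert-cloths-sentiment")

reviews = [
    "This dress is amazing, I love it!",
    "The fabric felt cheap and it ran small.",
]
for review, result in zip(reviews, classifier(reviews)):
    print(f"{review!r} -> {result['label']} (score={result['score']:.3f})")
```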
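
## Fine-tuning sketch (illustrative)

For reference, below is a minimal sketch of the kind of fine-tuning run described in the fine-tuning procedure section, using the `Trainer` API. The CSV file name (`womens_clothing_reviews.csv`), column names (`review`, `label`), split ratio, and hyperparameters are assumptions chosen for illustration; they are not the exact settings used to train this checkpoint.

```python
from datasets import load_dataset
from transformers import (
    DistilBertForSequenceClassification,
    DistilBertTokenizerFast,
    Trainer,
    TrainingArguments,
)

# Hypothetical CSV with a "review" text column and an integer "label" column
# (0 = Neutral, 1 = Negative, 2 = Positive).
dataset = load_dataset("csv", data_files="womens_clothing_reviews.csv")["train"]
dataset = dataset.train_test_split(test_size=0.2, seed=42)  # assumed split, not the original one

tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    # Pad/truncate all reviews to a fixed length so the default collator can batch them
    return tokenizer(batch["review"], truncation=True, padding="max_length", max_length=256)

dataset = dataset.map(tokenize, batched=True)

model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=3
)

args = TrainingArguments(
    output_dir="distilbert-cloths-sentiment",
    num_train_epochs=3,                 # assumed hyperparameters
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)
trainer.train()
print(trainer.evaluate())
```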