
Distilled version of the RoBERTa model fine-tuned on the SST-2 part of the GLUE dataset. It was obtained from the fine-tuned "teacher" RoBERTa model through task-specific knowledge distillation. Since it was fine-tuned on SST-2, the final model can be used directly for sentiment analysis.
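
Task-specific distillation trains the smaller student to match the fine-tuned teacher's output distribution on the task data, in addition to the usual hard-label loss. The snippet below is a minimal sketch of such an objective; the `temperature` and `alpha` values are illustrative assumptions, not the settings actually used to train this model.

```python
# Illustrative sketch of a task-specific distillation loss (not the exact
# training code used for this model).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Standard cross-entropy against the gold SST-2 labels
    ce_loss = F.cross_entropy(student_logits, labels)
    # KL divergence between softened student and teacher distributions
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Weighted combination of the hard-label and distillation terms
    return alpha * ce_loss + (1.0 - alpha) * kd_loss
```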

## Modifications to the original RoBERTa model

The final distilled model achieves 91.6% accuracy on SST-2 with only 85M parameters. Given that the original RoBERTa reaches 92.5% accuracy on the same dataset with considerably more parameters (125M), this is an impressive result.

## Tabular Comparison

| Modifications | Original RoBERTa | distilroberta-sst-2-distilled |
|---|---|---|
| Parameters | 125M | 85M |
| Performance on SST-2 (accuracy, %) | 92.5 | 91.6 |

## Evaluation & Training Results

| Epoch | Training Loss | Validation Loss | Accuracy |
|---|---|---|---|
| 1 | 0.819500 | 0.547877 | 0.904817 |
| 2 | 0.308400 | 0.616938 | 0.900229 |
| 3 | 0.193600 | 0.496516 | 0.912844 |
| 4 | 0.136300 | 0.486479 | 0.917431 |
| 5 | 0.105100 | 0.449959 | 0.917431 |
| 6 | 0.081800 | 0.452210 | 0.916284 |

## Usage

To use the model with the 🤗 Transformers library:

```python
# !pip install transformers

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("azizbarank/distilroberta-base-sst2-distilled")
model = AutoModelForSequenceClassification.from_pretrained("azizbarank/distilroberta-base-sst2-distilled")
```
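
The loaded model can then be used for sentiment classification. The snippet below is a minimal illustrative example, assuming the usual SST-2 label convention (0 = negative, 1 = positive); the input sentence is arbitrary.

```python
import torch

# Hypothetical usage example: classify a single sentence.
inputs = tokenizer("I really enjoyed this movie!", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Map the highest-scoring class index to its label name
# (label names depend on the model's config, e.g. "LABEL_1" or "positive").
predicted_class = logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_class])
```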