
A distilled version of the RoBERTa model, fine-tuned on the SST-2 task of the GLUE benchmark. It was obtained from a "teacher" RoBERTa model through task-specific knowledge distillation. Because it was fine-tuned on SST-2, the final model can be used directly for sentiment analysis.
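For readers unfamiliar with task-specific knowledge distillation, the following is a minimal sketch of one common formulation of the training objective: a weighted sum of the usual cross-entropy on the gold label and a temperature-softened KL divergence between teacher and student predictions. This card does not state the exact loss that was used, so the function name `distillation_loss` and the `alpha` and `temperature` values are illustrative assumptions, not the author's settings.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of a list of logits, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def distillation_loss(student_logits, teacher_logits, label,
                      alpha=0.5, temperature=2.0):
    """Illustrative distillation objective for a single example.

    alpha and temperature are hypothetical defaults, not values from this card.
    """
    # Hard-label term: cross-entropy of the student against the gold label.
    ce = -log_softmax(student_logits)[label]

    # Soft-label term: KL(teacher || student) on temperature-softened outputs.
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lt) * (lt - ls)
             for lt, ls in zip(log_p_teacher, log_p_student))

    # Scaling the soft term by temperature**2 keeps its gradient magnitude
    # comparable to the hard term when the temperature changes.
    return alpha * ce + (1 - alpha) * temperature ** 2 * kl
```

When the student reproduces the teacher's logits exactly, the KL term vanishes and only the cross-entropy term remains; the further the student's softened distribution drifts from the teacher's, the larger the loss.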

Modifications to the original RoBERTa model:

The final distilled model achieves 91.6% accuracy on the SST-2 dataset with only 85M parameters. The original RoBERTa reaches 92.5% accuracy on the same dataset with considerably more parameters (125M), so the distilled model gives up 0.9 percentage points of accuracy while using 32% fewer parameters.

Tabular Comparison:

Metric	Original RoBERTa	distilroberta-base-sst2-distilled
Parameters	125M	85M
Accuracy on SST-2 (%)	92.5	91.6

Evaluation & Training Results

Epoch	Training Loss	Validation Loss	Accuracy
1	0.819500	0.547877	0.904817
2	0.308400	0.616938	0.900229
3	0.193600	0.496516	0.912844
4	0.136300	0.486479	0.917431
5	0.105100	0.449959	0.917431
6	0.081800	0.452210	0.916284

Usage

To load the model with the 🤗 Transformers library:

# !pip install transformers

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("azizbarank/distilroberta-base-sst2-distilled")

model = AutoModelForSequenceClassification.from_pretrained("azizbarank/distilroberta-base-sst2-distilled")
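Once the tokenizer and model are loaded, a prediction can be obtained roughly as sketched below. This is an illustration rather than an official usage recipe: the helper names `top_class`, `classify`, and `main` are hypothetical, PyTorch is assumed as the backend, and the label names are read from `model.config.id2label`. If the configuration only provides generic names such as `LABEL_0` and `LABEL_1`, the GLUE SST-2 convention is 0 for negative and 1 for positive, but whether this checkpoint follows that convention is an assumption worth verifying on a few examples.

```python
import math

MODEL_ID = "azizbarank/distilroberta-base-sst2-distilled"


def top_class(logits):
    """Return (index, probability) of the most likely class from raw logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    idx = max(range(len(probs)), key=lambda i: probs[i])
    return idx, probs[idx]


def classify(text, tokenizer, model):
    """Run one sentence through the model and return (label, probability)."""
    import torch  # imported lazily; assumes the PyTorch backend is installed

    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():  # inference only, no gradients needed
        logits = model(**inputs).logits[0].tolist()
    idx, prob = top_class(logits)
    # Label names come from the checkpoint's config, not from this sketch.
    return model.config.id2label[idx], prob


def main():
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    model.eval()  # disable dropout for deterministic predictions
    print(classify("A touching and beautifully acted film.", tokenizer, model))
```

Calling `main()` downloads the checkpoint on first use and prints the predicted label together with its probability.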