metadata

license: apache-2.0
language: en
tags:
  - generated_from_trainer
metrics:
  - accuracy
  - f1
  - precision
  - recall
model-index:
  - name: distilroberta-base-finetuned-fake-news-english
    results: []
widget:
  - text: >-
      Wisconsin has not counted more votes than it has registered voters. This
      tweet is comparing the vote count from 2020 with the number of registered
      voters from 2018. When we take a look at Wisconsin’s current total of
      registered voters, we see that there is nothing fraudulent about the
      state’s count.
    example_title: fake
  - text: >-
      Barack Hussein Obama II is an American politician who served as the 44th
      president of the United States from 2009 to 2017. A member of the
      Democratic Party, Obama was the first African-American president of the
      United States.
    example_title: real

distilroberta-base-finetuned-fake-news-english

This model is a fine-tuned version of distilroberta-base on the fake-and-real news dataset. It achieves the following results on the evaluation set:

Loss: 0.0020
Accuracy: 0.9997
F1: 0.9997
Precision: 0.9994
Recall: 1.0
Auc: 0.9997
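
As a quick consistency check on the figures above, F1 is the harmonic mean of precision and recall. The snippet below is a minimal sketch: the helper name `f1_from_precision_recall` is hypothetical, and the inputs are the rounded values reported in this card, not the raw predictions.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (hypothetical helper)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values taken from the evaluation results listed above.
reported_precision = 0.9994
reported_recall = 1.0

f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(round(f1, 4))  # 0.9997
```

The recomputed value agrees with the reported F1 to four decimal places.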

Intended uses & limitations

The model's context window is limited to 512 tokens per sequence, so it may not work well on articles that exceed 512 tokens after preprocessing.
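
One possible way to handle longer articles, which this card does not describe or evaluate, is to split the token sequence into overlapping windows that each fit the 512-token limit and classify each window separately. The sketch below works on a plain list of token IDs. The function name `split_into_windows`, the 64-token overlap, and the reservation of 2 positions for the start and end special tokens are all assumptions made for illustration.

```python
def split_into_windows(token_ids, max_len=512, num_special=2, overlap=64):
    """Split token IDs into overlapping windows that fit the model context.

    Assumptions (not from the model card): `num_special` positions are
    reserved for special tokens, and consecutive windows share `overlap`
    tokens so that text near a boundary appears in both windows.
    """
    content_len = max_len - num_special
    if content_len <= 0 or not 0 <= overlap < content_len:
        raise ValueError("overlap must be in [0, max_len - num_special)")

    step = content_len - overlap
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + content_len])
        if start + content_len >= len(token_ids):
            break  # the last window reached the end of the article
        start += step
    return windows


# Example with a stand-in article of 1,200 token IDs.
article_ids = list(range(1200))
windows = split_into_windows(article_ids)
print([len(w) for w in windows])  # [510, 510, 308]
```

How the per-window predictions are combined (for example, by averaging scores) is a separate design choice and would need its own evaluation.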

Training and evaluation data

The fake-and-real news dataset contains 44,898 annotated articles: 21,417 real and 23,481 fake. The dataset was split with stratification into train, validation, and test subsets in a proportion of roughly 40:30:30, as the counts in the table below show. The model was fine-tuned on the train subset and evaluated on the validation and test subsets.

Split	# examples
train	17959
validation	13469
test	13470
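
The counts in the table work out to roughly 40%, 30%, and 30% of the 44,898 articles. The card does not say which tool produced the split, so the following is only a minimal standard-library sketch of a stratified split. The function name `stratified_split`, the per-class rounding, and the seed are assumptions.

```python
import random


def stratified_split(labels, fractions=(0.4, 0.3, 0.3), seed=42):
    """Return (train, validation, test) index lists with per-class proportions.

    Hypothetical helper: each class is shuffled and cut independently, so
    every subset keeps approximately the same real/fake ratio.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train, validation, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        train += indices[:n_train]
        validation += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to test
    return train, validation, test


# Class counts taken from the dataset description above.
labels = ["real"] * 21417 + ["fake"] * 23481
train, validation, test = stratified_split(labels)
print(len(train), len(validation), len(test))  # 17959 13469 13470
```

With these fractions and rounding, the subset sizes coincide with the table, though the original procedure may have differed.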

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 224
num_epochs: 2
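
The relationship between these values can be worked through in a few lines. The sketch below assumes a single device, the 17,959 training examples listed in the split table, and a linear schedule that warms up from zero and then decays to zero. The step-count arithmetic and the `linear_lr` helper are an illustration under those assumptions, not a copy of the training code.

```python
import math

# Values from the hyperparameter list and the split table.
train_examples = 17959
train_batch_size = 16
gradient_accumulation_steps = 2
num_epochs = 2
warmup_steps = 224
learning_rate = 2e-5

# One optimizer update consumes batch_size * accumulation_steps examples.
total_train_batch_size = train_batch_size * gradient_accumulation_steps

# Assumed step counting: batches per epoch, grouped into accumulated updates.
batches_per_epoch = math.ceil(train_examples / train_batch_size)
updates_per_epoch = batches_per_epoch // gradient_accumulation_steps
total_steps = updates_per_epoch * num_epochs


def linear_lr(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    remaining = max(0, total_steps - step)
    return learning_rate * remaining / (total_steps - warmup_steps)


print(total_train_batch_size)  # 32
print(updates_per_epoch, total_steps)  # 561 1122
print(linear_lr(112), linear_lr(224), linear_lr(total_steps))
```

Under these assumptions, step 1000 falls at about epoch 1.78 (1000 / 561), and the 224 warmup steps cover about 20% of the run.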

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy	F1	Precision	Recall	Auc
0.251	0.36	200	0.0030	0.9996	0.9995	0.9995	0.9995	0.9996
0.0022	0.71	400	0.0012	0.9998	0.9998	0.9995	1.0	0.9998
0.0013	1.07	600	0.0001	1.0	1.0	1.0	1.0	1.0
0.0004	1.43	800	0.0015	0.9997	0.9997	0.9994	1.0	0.9997
0.0013	1.78	1000	0.0020	0.9997	0.9997	0.9994	1.0	0.9997

Framework versions

Transformers 4.17.0
Pytorch 1.10.0+cu111
Datasets 2.0.0
Tokenizers 0.12.0