Commit bfef5f3: Update README.md
Giyaseddin committed (1 parent: 4ff201c)

Files changed (1): README.md (+115 −0)
---
license: gpl-3.0
language: en
library_name: transformers
tags:
- distilbert
datasets:
- Fake and real news dataset
---

# DistilBERT base cased model for Fake News Classification

## Model description

DistilBERT is a transformers model, smaller and faster than BERT, which was pretrained on the same corpus in a
self-supervised fashion, using the BERT base model as a teacher. This means it was pretrained on raw texts only,
with no humans labelling them in any way (which is why it can use lots of publicly available data), with an automatic
process to generate inputs and labels from those texts using the BERT base model.

This is a fake news classification model, fine-tuned from the [pretrained DistilBERT model](https://huggingface.co/distilbert-base-cased) on the
[Fake and real news dataset](https://www.kaggle.com/datasets/clmentbisaillon/fake-and-real-news-dataset).

## Intended uses & limitations

This model should only be used on news similar to those in the training dataset;
please visit the [dataset's Kaggle page](https://www.kaggle.com/datasets/clmentbisaillon/fake-and-real-news-dataset) to see the data.

### How to use

You can use this model directly with a text-classification pipeline:

```python
>>> from transformers import pipeline
>>> # return_all_scores=True returns the score of every label, not just the top one
>>> # (newer versions of transformers deprecate it in favor of top_k=None)
>>> classifier = pipeline("text-classification", model="Giyaseddin/distilbert-base-cased-finetuned-fake-and-real-news-dataset", return_all_scores=True)
>>> examples = ["Yesterday, Speaker Paul Ryan tweeted a video of himself on the Mexican border flying in a helicopter and traveling on horseback with US border agents. RT if you agree It is time for The Wall. pic.twitter.com/s5MO8SG7SL Paul Ryan (@SpeakerRyan) August 1, 2017It makes for great theater to see Republican Speaker Ryan pleading the case for a border wall, but how sincere are the GOP about building the border wall? Even after posting a video that appears to show Ryan s support for the wall, he still seems unsure of himself. It s almost as though he s testing the political winds when he asks Twitter users to retweet if they agree that we need to start building the wall. How committed is the (formerly?) anti-Trump Paul Ryan to building the border wall that would fulfill one of President Trump s most popular campaign promises to the American people? Does he have the what it takes to defy the wishes of corporate donors and the US Chamber of Commerce, and do the right thing for the national security and well-being of our nation?The Last Refuge- Republicans are in control of the House of Representatives, Republicans are in control of the Senate, a Republican President is in the White House, and somehow there s negotiations on how to fund the #1 campaign promise of President Donald Trump, the border wall.Here s the rub.Here s what pundits never discuss.The Republican party doesn t need a single Democrat to fund the border wall.A single spending bill could come from the House of Representatives that fully funds 100% of the border wall. The spending bill then goes to the senate, where again, it doesn t need a single Democrat vote because spending legislation is specifically what reconciliation was designed to facilitate. That House bill can pass the Senate with 51 votes and proceed directly to the President s desk for signature.So, ask yourself: why is this even a point of discussion?The honest answer, for those who are no longer suffering from Battered Conservative Syndrome, is that Republicans don t want to fund or build an actual physical barrier known as the Southern Border Wall.It really is that simple.If one didn t know better, they d almost think Speaker Ryan was attempting to emulate the man he clearly despised during the 2016 presidential campaign."]
>>> classifier(examples)
[[{'label': 'LABEL_0', 'score': 1.0},
  {'label': 'LABEL_1', 'score': 1.0119109106199176e-08}]]
```

Here `LABEL_0` corresponds to fake news and `LABEL_1` to real news, following the label mapping `{fake: 0, true: 1}` described under Preprocessing below.

### Limitations and bias

Even if the training data used for this model could be characterized as fairly neutral, the model can still make biased
predictions. It also inherits some of
[the bias of its teacher model](https://huggingface.co/bert-base-cased#limitations-and-bias).

This bias will also affect all fine-tuned versions of this model.

## Pre-training data

DistilBERT was pretrained on the same data as BERT, namely [BookCorpus](https://yknzhu.wixsite.com/mbweb), a dataset
consisting of 11,038 unpublished books, and [English Wikipedia](https://en.wikipedia.org/wiki/English_Wikipedia)
(excluding lists, tables and headers).

## Fine-tuning data

[Fake and real news dataset](https://www.kaggle.com/datasets/clmentbisaillon/fake-and-real-news-dataset)

## Training procedure

### Preprocessing

In the preprocessing phase, the title and the body text of each news item are concatenated with the `[SEP]` separator,
so the full input takes the form:

```
[CLS] Title Sentence [SEP] News text body [SEP]
```

The data are split according to the following ratio:
- Training set: 60%.
- Validation set: 20%.
- Test set: 20%.

Labels are mapped as `{fake: 0, true: 1}`, as shown in the sketch below.
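As an illustration, here is a minimal sketch of this preprocessing step. It is not the author's original script: the file names `Fake.csv`/`True.csv` and the `title`/`text` columns are assumptions based on the dataset's Kaggle page.

```python
# Minimal preprocessing sketch (assumed file and column names, not the original script).
import pandas as pd
from sklearn.model_selection import train_test_split

fake = pd.read_csv("Fake.csv")  # assumed file names from the Kaggle dataset
true = pd.read_csv("True.csv")
fake["label"] = 0  # label mapping: {fake: 0, true: 1}
true["label"] = 1
df = pd.concat([fake, true], ignore_index=True)

# Passing (title, body) as a text pair makes the tokenizer emit
# [CLS] title [SEP] body [SEP], matching the format described above.
pairs = list(zip(df["title"], df["text"]))
labels = df["label"].tolist()

# 60/20/20 split: carve off 40%, then halve it into validation and test.
X_train, X_tmp, y_train, y_tmp = train_test_split(pairs, labels, test_size=0.4, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=42)
```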

### Fine-tuning

The model was fine-tuned on a GeForce GTX 960M for 5 hours; an equivalent `Trainer` setup is sketched after the table. The hyperparameters are:

| Parameter           | Value |
|:-------------------:|:-----:|
| Learning rate       | 5e-5  |
| Weight decay        | 0.01  |
| Training batch size | 4     |
| Epochs              | 3     |

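The repository does not include the training script; the following is a minimal sketch of an equivalent `Trainer` setup using the hyperparameters above. The dataset variables come from the preprocessing sketch, and `NewsDataset` is a hypothetical helper introduced here for illustration.

```python
# Minimal fine-tuning sketch with the hyperparameters from the table (not the original script).
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-cased", num_labels=2)

class NewsDataset(torch.utils.data.Dataset):
    """Hypothetical wrapper: tokenizes (title, body) pairs into [CLS] title [SEP] body [SEP]."""
    def __init__(self, pairs, labels):
        self.pairs, self.labels = pairs, labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, idx):
        title, body = self.pairs[idx]
        enc = tokenizer(title, body, truncation=True, padding="max_length", max_length=512)
        item = {k: torch.tensor(v) for k, v in enc.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

args = TrainingArguments(
    output_dir="checkpoints",
    learning_rate=5e-5,             # from the table above
    weight_decay=0.01,
    per_device_train_batch_size=4,
    num_train_epochs=3,
    evaluation_strategy="epoch",    # report validation metrics once per epoch
)
trainer = Trainer(model=model, args=args,
                  train_dataset=NewsDataset(X_train, y_train),
                  eval_dataset=NewsDataset(X_val, y_val))
trainer.train()
```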

Here are the scores during training:

| Epoch | Training Loss | Validation Loss | Accuracy | F1       | Precision | Recall   |
|:-----:|:-------------:|:---------------:|:--------:|:--------:|:---------:|:--------:|
| 1     | 0.008300      | 0.005783        | 0.998330 | 0.998252 | 0.996511  | 1.000000 |
| 2     | 0.000000      | 0.000161        | 0.999889 | 0.999883 | 0.999767  | 1.000000 |
| 3     | 0.000000      | 0.000122        | 0.999889 | 0.999883 | 0.999767  | 1.000000 |

## Evaluation results

When fine-tuned on the downstream task of fake news binary classification, this model achieved the following results
(scores rounded to two decimal places):

|              | precision | recall | f1-score | support |
|:------------:|:---------:|:------:|:--------:|:-------:|
| Fake         | 1.00      | 1.00   | 1.00     | 4697    |
| True         | 1.00      | 1.00   | 1.00     | 4283    |
| accuracy     | -         | -      | 1.00     | 8980    |
| macro avg    | 1.00      | 1.00   | 1.00     | 8980    |
| weighted avg | 1.00      | 1.00   | 1.00     | 8980    |

Confusion matrix:

| Actual \ Predicted | Fake | True |
|:------------------:|:----:|:----:|
| Fake               | 4696 | 1    |
| True               | 1    | 4282 |

The AUC score is 0.9997.

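For reference, a report in this format can be produced with scikit-learn. This is a sketch under the assumption that `y_test` holds the true labels and `probs` holds the model's predicted probabilities for the "true" class; neither name comes from the repository.

```python
# Sketch: reproducing the report, confusion matrix and AUC with scikit-learn.
# Assumes y_test (true labels) and probs (predicted probability of class 1).
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

preds = [int(p >= 0.5) for p in probs]          # threshold probabilities at 0.5
print(classification_report(y_test, preds, target_names=["Fake", "True"]))
print(confusion_matrix(y_test, preds))          # rows: actual, columns: predicted
print("AUC:", roc_auc_score(y_test, probs))     # 0.9997 as reported above
```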