# BERT FINETUNED ON PHISHING DETECTION

This model is a fine-tuned version of [bert-large-uncased](https://huggingface.co/bert-large-uncased) on a [phishing dataset](https://huggingface.co/datasets/ealvaradob/phishing-dataset).
It can detect phishing in its four most common forms: URLs, emails, SMS messages, and websites.

It achieves the following results on the evaluation set:

- Loss: 0.1953
- Accuracy: 0.9717
- Precision: 0.9658
- Recall: 0.9670
- False Positive Rate: 0.0249

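The classification metrics reported for the evaluation set follow the standard confusion-matrix definitions. As a minimal sketch, assuming phishing is treated as the positive class (the function name `binary_metrics` and the example counts are illustrative assumptions, not values from this model's evaluation):

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard binary-classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero, i.e. both classes occur and
    the classifier predicts at least one positive.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,          # share of all samples classified correctly
        "precision": tp / (tp + fp),            # share of flagged samples that are truly phishing
        "recall": tp / (tp + fn),               # share of phishing samples that were flagged
        "false_positive_rate": fp / (fp + tn),  # share of benign samples wrongly flagged
    }


# Hypothetical counts, chosen only to illustrate the formulas.
example = binary_metrics(tp=90, fp=5, tn=95, fn=10)
print(example)
```

A low false positive rate matters in practice because every false positive is a legitimate message or site that gets flagged as phishing.
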
## Model description

BERT is a transformers model pretrained on a large corpus of English data in a self-supervised fashion.
This means it was pretrained on raw text only, with no human labelling (which is why it can use lots of publicly available data), using an automatic process to generate inputs and labels from that text.

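One such automatic process is masked language modeling, one of BERT's pretraining objectives: some tokens are hidden and the model learns to predict them, so the labels come from the text itself. The sketch below is a simplified illustration, assuming whitespace tokenization and a hypothetical `make_mlm_example` helper; actual BERT pretraining uses WordPiece tokens and a more elaborate masking scheme.

```python
import random

MASK_TOKEN = "[MASK]"


def make_mlm_example(text, mask_prob=0.15, seed=0):
    """Build (inputs, labels) from raw text by randomly masking tokens.

    labels[i] holds the original token where inputs[i] was masked,
    and None everywhere else (no prediction target at that position).
    """
    rng = random.Random(seed)
    tokens = text.split()  # simplification: real BERT uses WordPiece subword tokens
    inputs, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK_TOKEN)  # hide the token from the model
            labels.append(token)       # the hidden token becomes the label
        else:
            inputs.append(token)
            labels.append(None)
    return inputs, labels


# Illustrative sentence; a higher mask_prob is used so the short example is likely to mask something.
inputs, labels = make_mlm_example("verify your account at the link below", mask_prob=0.3, seed=1)
print(inputs)
print(labels)
```

No human annotation is involved at any point: the text supplies both the input and the target.
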
## Motivation and Purpose

Phishing is one of the most frequent and most expensive cyber-attacks according to several security reports.
This model aims to help prevent phishing attacks against individuals and organizations by detecting them efficiently and accurately.
To achieve this, BERT was trained on a diverse and robust dataset containing URLs, SMS messages, emails, and websites, which extends the model's detection capability beyond a single message type and allows it to be used in various contexts.

### Training results

| Training Loss | Epoch | Step  | Validation Loss | Accuracy | Precision | Recall | False Positive Rate |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|:---------:|:------:|:-------------------:|
| 0.1487        | 1.0   | 3866  | 0.1454          | 0.9596   | 0.9709    | 0.9320 | 0.0203              |
| 0.0805        | 2.0   | 7732  | 0.1389          | 0.9691   | 0.9663    | 0.9601 | 0.0243              |
| 0.0389        | 3.0   | 11598 | 0.1779          | 0.9683   | 0.9778    | 0.9461 | 0.0156              |
| 0.0091        | 4.0   | 15464 | 0.1953          | 0.9717   | 0.9658    | 0.9670 | 0.0249              |

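The per-epoch results can also be compared programmatically. The sketch below copies the training results table into a list and picks the epoch with the lowest validation loss and the one with the highest accuracy; the names `ROWS` and `best_epoch` are illustrative, and which selection criterion to prefer is a judgment call that this card does not state.

```python
# Columns: (epoch, training_loss, validation_loss, accuracy, precision, recall, false_positive_rate),
# copied from the training results table.
ROWS = [
    (1, 0.1487, 0.1454, 0.9596, 0.9709, 0.9320, 0.0203),
    (2, 0.0805, 0.1389, 0.9691, 0.9663, 0.9601, 0.0243),
    (3, 0.0389, 0.1779, 0.9683, 0.9778, 0.9461, 0.0156),
    (4, 0.0091, 0.1953, 0.9717, 0.9658, 0.9670, 0.0249),
]


def best_epoch(rows, column, lowest=True):
    """Return the epoch whose value in `column` is lowest (or highest)."""
    chooser = min if lowest else max
    return chooser(rows, key=lambda row: row[column])[0]


print(best_epoch(ROWS, column=2, lowest=True))   # lowest validation loss -> 2
print(best_epoch(ROWS, column=3, lowest=False))  # highest accuracy -> 4
```

Training loss keeps falling while validation loss rises after epoch 2, and the evaluation results reported for this model match the epoch 4 row, which has the highest accuracy and recall.
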