---
license: apache-2.0
base_model: bert-base-cased
tags:
- PII
- NER
- Bert
- Token Classification
datasets:
- generator
metrics:
- precision
- recall
- f1
- accuracy
model-index:
- name: pii_model
results:
- task:
name: Token Classification
type: token-classification
dataset:
name: generator
type: generator
config: default
split: train
args: default
metrics:
- name: Precision
type: precision
value: 0.954751
- name: Recall
type: recall
value: 0.965233
- name: F1
type: f1
value: 0.959964
- name: Accuracy
type: accuracy
value: 0.991199
pipeline_tag: token-classification
language:
- en
---

# Personally Identifiable Information (PII) Model
This model is a fine-tuned version of [bert-base-cased](https://huggingface.co/bert-base-cased) on the generator dataset. It achieves the following results on the evaluation set:

- Training Loss: 0.003900
- Validation Loss: 0.051071
- Precision: 95.53%
- Recall: 96.60%
- F1: 96.00%
- Accuracy: 99.11%
## Model description

This is a token classification model for detecting personally identifiable information (PII) in text. Built on bert-base-cased and fine-tuned on a custom dataset, it labels tokens that form sensitive entities such as names, addresses, dates of birth, account numbers, and credit card details, so that this information can be flagged or redacted before text is stored or shared.
## Detected entity groups

The model can detect the following entity groups (a usage sketch follows the list):
- ACCOUNTNUMBER
- FIRSTNAME
- ACCOUNTNAME
- PHONENUMBER
- CREDITCARDCVV
- CREDITCARDISSUER
- PREFIX
- LASTNAME
- AMOUNT
- DATE
- DOB
- COMPANYNAME
- BUILDINGNUMBER
- STREET
- SECONDARYADDRESS
- STATE
- CITY
- CREDITCARDNUMBER
- SSN
- URL
- USERNAME
- PASSWORD
- COUNTY
- PIN
- MIDDLENAME
- IBAN
- GENDER
- AGE
- ZIPCODE
- SEX
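
As a quick illustration, here is a minimal inference sketch using the `transformers` pipeline API. The model id `pii_model` is a placeholder for this repository's id on the Hub, and the example text is invented:

```python
from transformers import pipeline

# "pii_model" is a placeholder; substitute the actual Hub repo id.
detector = pipeline(
    "token-classification",
    model="pii_model",
    aggregation_strategy="simple",  # merge word-piece tokens into whole entities
)

text = "Hi, I'm John Doe. I live at 123 Main Street, Springfield, and my phone number is 555-0123."
for entity in detector(text):
    print(entity["entity_group"], "->", entity["word"], f"({entity['score']:.3f})")
```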
## Training hyperparameters
The following hyperparameters were used during training:
| Hyperparameter            | Value |
|---------------------------|-------|
| Learning Rate             | 5e-5  |
| Train Batch Size          | 16    |
| Eval Batch Size           | 16    |
| Number of Training Epochs | 7     |
| Weight Decay              | 0.01  |
| Save Strategy             | Epoch |
| Load Best Model at End    | True  |
| Metric for Best Model     | F1    |
| Push to Hub               | True  |
| Evaluation Strategy       | Epoch |
| Early Stopping Patience   | 3     |
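
For reference, here is a minimal sketch of how these hyperparameters map onto the `transformers` Trainer API. The variables `num_labels`, `tokenized_train`, `tokenized_eval`, and the `compute_metrics` helper (sketched under Training results below) are assumed to be prepared as in the standard token-classification recipe:

```python
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    DataCollatorForTokenClassification,
    EarlyStoppingCallback,
    Trainer,
    TrainingArguments,
)

# num_labels, tokenized_train, tokenized_eval, and compute_metrics are
# assumed to be defined elsewhere; they are placeholders here.
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased", num_labels=num_labels
)

training_args = TrainingArguments(
    output_dir="pii_model",
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=7,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1",
    push_to_hub=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
    tokenizer=tokenizer,
    data_collator=DataCollatorForTokenClassification(tokenizer),
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```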
## Training results
| Epoch | Training Loss | Validation Loss | Precision (%) | Recall (%) | F1 Score (%) | Accuracy (%) |
|-------|---------------|-----------------|---------------|------------|--------------|--------------|
| 1     | 0.0443        | 0.038108        | 91.88         | 95.17      | 93.50        | 98.80        |
| 2     | 0.0318        | 0.035728        | 94.13         | 96.15      | 95.13        | 98.90        |
| 3     | 0.0209        | 0.032016        | 94.81         | 96.42      | 95.61        | 99.01        |
| 4     | 0.0154        | 0.040221        | 93.87         | 95.80      | 94.82        | 98.88        |
| 5     | 0.0084        | 0.048183        | 94.21         | 96.06      | 95.13        | 98.93        |
| 6     | 0.0037        | 0.052281        | 94.49         | 96.60      | 95.53        | 99.07        |
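
The per-epoch scores above are entity-level precision, recall, and F1 plus token accuracy, as produced by the usual seqeval-based `compute_metrics` helper. A minimal sketch, assuming `label_list` is a mapping from label ids to tag strings:

```python
import numpy as np
import evaluate

seqeval = evaluate.load("seqeval")  # entity-level precision/recall/F1/accuracy

def compute_metrics(eval_preds):
    logits, labels = eval_preds
    predictions = np.argmax(logits, axis=-1)

    # Drop special/padded tokens (label == -100) and map ids back to tag strings.
    true_labels = [
        [label_list[l] for l in label_row if l != -100]
        for label_row in labels
    ]
    true_preds = [
        [label_list[p] for p, l in zip(pred_row, label_row) if l != -100]
        for pred_row, label_row in zip(predictions, labels)
    ]

    results = seqeval.compute(predictions=true_preds, references=true_labels)
    return {
        "precision": results["overall_precision"],
        "recall": results["overall_recall"],
        "f1": results["overall_f1"],
        "accuracy": results["overall_accuracy"],
    }
```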
## Author
## Framework versions
- Transformers 4.38.2
- Pytorch 2.1.0+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2