---
license: apache-2.0
base_model: bert-base-cased
tags:
- PII
- NER
- Bert
- Token Classification
datasets:
- generator
metrics:
- precision
- recall
- f1
- accuracy
model-index:
- name: pii_model
results:
- task:
name: Token Classification
type: token-classification
dataset:
name: generator
type: generator
config: default
split: train
args: default
metrics:
- name: Precision
type: precision
value: 0.954751
- name: Recall
type: recall
value: 0.965233
- name: F1
type: f1
value: 0.959964
- name: Accuracy
type: accuracy
value: 0.991199
pipeline_tag: token-classification
language:
- en
---

# Personally Identifiable Information (PII) Model
This model is a fine-tuned version of [bert-base-cased](https://huggingface.co/bert-base-cased) on the generator dataset. It achieves the following results on the evaluation set:

- Training Loss: 0.003900
- Validation Loss: 0.051071
- Precision: 95.53%
- Recall: 96.60%
- F1: 96.00%
- Accuracy: 99.11%
## Model description

This is a token classification model for detecting personally identifiable information (PII) in text. Built on bert-base-cased and fine-tuned on a custom dataset, it labels tokens that form sensitive entities such as names, addresses, dates of birth, account numbers, and credit card details, so that this information can be flagged or redacted before text is stored or shared.
## Detected entity groups

The model can detect the following entity groups (a usage sketch follows the list):
- ACCOUNTNUMBER
- FIRSTNAME
- ACCOUNTNAME
- PHONENUMBER
- CREDITCARDCVV
- CREDITCARDISSUER
- PREFIX
- LASTNAME
- AMOUNT
- DATE
- DOB
- COMPANYNAME
- BUILDINGNUMBER
- STREET
- SECONDARYADDRESS
- STATE
- CITY
- CREDITCARDNUMBER
- SSN
- URL
- USERNAME
- PASSWORD
- COUNTY
- PIN
- MIDDLENAME
- IBAN
- GENDER
- AGE
- ZIPCODE
- SEX
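
As a quick illustration, here is a minimal inference sketch using the `transformers` pipeline API. The model id `pii_model` is a placeholder for this repository's id on the Hub, and the example text is invented:

```python
from transformers import pipeline

# "pii_model" is a placeholder; substitute the actual Hub repo id.
detector = pipeline(
    "token-classification",
    model="pii_model",
    aggregation_strategy="simple",  # merge word-piece tokens into whole entities
)

text = "Hi, I'm John Doe. I live at 123 Main Street, Springfield, and my phone number is 555-0123."
for entity in detector(text):
    print(entity["entity_group"], "->", entity["word"], f"({entity['score']:.3f})")
```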
## Training hyperparameters
The following hyperparameters were used during training:
| Hyperparameter            | Value |
|---------------------------|-------|
| Learning Rate             | 5e-5  |
| Train Batch Size          | 16    |
| Eval Batch Size           | 16    |
| Number of Training Epochs | 7     |
| Weight Decay              | 0.01  |
| Save Strategy             | Epoch |
| Load Best Model at End    | True  |
| Metric for Best Model     | F1    |
| Push to Hub               | True  |
| Evaluation Strategy       | Epoch |
| Early Stopping Patience   | 3     |
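
For reference, here is a minimal sketch of how these hyperparameters map onto the `transformers` Trainer API. The variables `num_labels`, `tokenized_train`, `tokenized_eval`, and the `compute_metrics` helper (sketched under Training results below) are assumed to be prepared as in the standard token-classification recipe:

```python
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    DataCollatorForTokenClassification,
    EarlyStoppingCallback,
    Trainer,
    TrainingArguments,
)

# num_labels, tokenized_train, tokenized_eval, and compute_metrics are
# assumed to be defined elsewhere; they are placeholders here.
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased", num_labels=num_labels
)

training_args = TrainingArguments(
    output_dir="pii_model",
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=7,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1",
    push_to_hub=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_eval,
    tokenizer=tokenizer,
    data_collator=DataCollatorForTokenClassification(tokenizer),
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```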
## Training results
| Epoch | Training Loss | Validation Loss | Precision (%) | Recall (%) | F1 Score (%) | Accuracy (%) |
|-------|---------------|-----------------|---------------|------------|--------------|--------------|
| 1     | 0.0443        | 0.038108        | 91.88         | 95.17      | 93.50        | 98.80        |
| 2     | 0.0318        | 0.035728        | 94.13         | 96.15      | 95.13        | 98.90        |
| 3     | 0.0209        | 0.032016        | 94.81         | 96.42      | 95.61        | 99.01        |
| 4     | 0.0154        | 0.040221        | 93.87         | 95.80      | 94.82        | 98.88        |
| 5     | 0.0084        | 0.048183        | 94.21         | 96.06      | 95.13        | 98.93        |
| 6     | 0.0037        | 0.052281        | 94.49         | 96.60      | 95.53        | 99.07        |
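
The per-epoch scores above are entity-level precision, recall, and F1 plus token accuracy, as produced by the usual seqeval-based `compute_metrics` helper. A minimal sketch, assuming `label_list` is a mapping from label ids to tag strings:

```python
import numpy as np
import evaluate

seqeval = evaluate.load("seqeval")  # entity-level precision/recall/F1/accuracy

def compute_metrics(eval_preds):
    logits, labels = eval_preds
    predictions = np.argmax(logits, axis=-1)

    # Drop special/padded tokens (label == -100) and map ids back to tag strings.
    true_labels = [
        [label_list[l] for l in label_row if l != -100]
        for label_row in labels
    ]
    true_preds = [
        [label_list[p] for p, l in zip(pred_row, label_row) if l != -100]
        for pred_row, label_row in zip(predictions, labels)
    ]

    results = seqeval.compute(predictions=true_preds, references=true_labels)
    return {
        "precision": results["overall_precision"],
        "recall": results["overall_recall"],
        "f1": results["overall_f1"],
        "accuracy": results["overall_accuracy"],
    }
```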
## Author
## Framework versions
- Transformers 4.38.2
- Pytorch 2.1.0+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2