iiiorg/piiranha-v1-detect-personal-information

Token Classification · Generated from Trainer · Inference Endpoints

gaodrew committed on Sep 13

Commit 6afda1a • 1 parent: 87e2e03

Update README.md

Files changed (1)

README.md +6 -5

README.md CHANGED

@@ -36,22 +36,23 @@ pipeline_tag: token-classification
 Piiranha is trained to **detect 17 types** of Personally Identifiable Information (PII) across six languages. It successfully **catches 98.27% of PII** tokens, with an overall classification **accuracy of 99.44%**.
 Piiranha is especially accurate at detecting passwords, emails (100%), phone numbers, and usernames.
-Supported languages: English, Spanish, French, German, Italian, Dutch
-Supported PII types: Account Number, Building Number, City, Credit Card Number, Date of Birth, Driver's License, Email, First Name, Last Name, ID Card, Password, Social Security Number, Street Address, Tax Number, Phone Number, Username, Zipcode.
 Performance on PII vs. Non PII classification task:
 - **Precision: 98.48%** (98.48% of tokens classified as PII are actually PII)
 - **Recall: 98.27%** (correctly identifies 98.27% of PII tokens)
 - **Specificity: 99.84%** (correctly identifies 99.84% of Non PII tokens)
-<img src="https://cloud-3i4ld6u5y-hack-club-bot.vercel.app/0home.png" alt="Akash Network logo" width="400"/>
+<img src="https://cloud-3i4ld6u5y-hack-club-bot.vercel.app/0home.png" alt="Akash Network logo" width="250"/>
 Piiranha was trained on an H100 GPU rented through the [Akash Network](https://akash.network/).
 ## Model Description
 Piiranha is a fine-tuned version of [microsoft/mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base).
 The context length is 256 Deberta tokens. If your text is longer than that, just split it up.
+Supported languages: English, Spanish, French, German, Italian, Dutch
+Supported PII types: Account Number, Building Number, City, Credit Card Number, Date of Birth, Driver's License, Email, First Name, Last Name, ID Card, Password, Social Security Number, Street Address, Tax Number, Phone Number, Username, Zipcode.
 It achieves the following results on a test set of ~73,000 sentences containing PII:
 - Accuracy: 99.44%
 - Loss: 0.0173
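The Precision, Recall, and Specificity bullets in the README are the standard confusion-matrix ratios for the binary PII vs. non-PII task. The sketch below shows how each is computed from token-level counts. The class name `BinaryCounts` and the counts themselves are hypothetical, chosen only for illustration; they are not Piiranha's actual test results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryCounts:
    """Token-level confusion counts for the PII vs. non-PII task (hypothetical helper)."""
    tp: int  # PII tokens predicted as PII
    fp: int  # non-PII tokens predicted as PII
    tn: int  # non-PII tokens predicted as non-PII
    fn: int  # PII tokens predicted as non-PII

    def precision(self) -> float:
        # Of the tokens flagged as PII, the share that really are PII.
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        # Of the true PII tokens, the share the model flags ("catches").
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Of the true non-PII tokens, the share correctly left unflagged.
        return self.tn / (self.tn + self.fp)


# Hypothetical counts for illustration only, not Piiranha's test results.
counts = BinaryCounts(tp=95, fp=5, tn=895, fn=5)
print(f"precision={counts.precision():.4f}")      # precision=0.9500
print(f"recall={counts.recall():.4f}")            # recall=0.9500
print(f"specificity={counts.specificity():.4f}")  # specificity=0.9944
```

Specificity is worth reporting alongside precision because non-PII tokens vastly outnumber PII tokens in typical text, so even a small false-positive rate can produce many spurious redactions.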
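The README notes a 256-token context length and suggests splitting longer text. A minimal sketch of one way to do that, operating on token IDs that have already been produced by the model's tokenizer, is shown below. It assumes the 256-token limit includes the two special tokens ([CLS] and [SEP]) that a DeBERTa-style tokenizer adds to each sequence. The function name `chunk_token_ids` and the default overlap of 32 tokens are illustrative choices, not part of the model card. Overlapping windows are used so that an entity falling on a window boundary is still seen whole in a neighbouring window.

```python
from typing import List, Sequence

# Assumption: the 256-token limit includes the two special tokens
# ([CLS] and [SEP]) that a DeBERTa-style tokenizer adds per sequence.
MAX_LEN = 256
NUM_SPECIAL = 2


def chunk_token_ids(
    token_ids: Sequence[int],
    max_len: int = MAX_LEN,
    num_special: int = NUM_SPECIAL,
    overlap: int = 32,
) -> List[List[int]]:
    """Split content token IDs into overlapping windows that fit the context.

    Any span of at most `overlap` tokens lies entirely inside at least one
    window, so an entity of that size cut at one window's edge is seen whole
    in another window.
    """
    window = max_len - num_special  # room left for content tokens
    if window <= 0:
        raise ValueError("max_len must be larger than num_special")
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    if not token_ids:
        return []
    step = window - overlap
    chunks: List[List[int]] = []
    start = 0
    while True:
        chunks.append(list(token_ids[start:start + window]))
        if start + window >= len(token_ids):
            break  # this window already reaches the end of the input
        start += step
    return chunks


# Usage with stand-in IDs (a real pipeline would take them from the tokenizer).
chunks = chunk_token_ids(list(range(600)))
print([len(c) for c in chunks])  # [254, 254, 156]
```

Before running a window through the model, the special tokens would need to be added back, for example with the tokenizer's `build_inputs_with_special_tokens`. Hugging Face fast tokenizers can also produce overlapping windows directly by passing `truncation=True`, `max_length=256`, `stride=...`, and `return_overflowing_tokens=True`. Predictions from the overlapping regions then have to be merged, and how to resolve disagreements there is left open by this sketch.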