iiiorg/piiranha-v1-detect-personal-information

gaodrew committed on Sep 13, 2024

Commit 506726d · verified · 1 Parent(s): 398498e

Update README.md

Files changed (1)

README.md +3 -0

README.md CHANGED

@@ -33,6 +33,7 @@ pipeline_tag: token-classification
 <a target="_blank" href="https://colab.research.google.com/github/williamgao1729/piiranha-quickstart/blob/main/piiranha_quickstart.ipynb">
   <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
 </a>
 Piiranha is trained to **detect 17 types** of Personally Identifiable Information (PII) across six languages. It successfully **catches 98.27% of PII** tokens, with an overall classification **accuracy of 99.44%**.
 Piiranha is especially accurate at detecting passwords, emails (100%), phone numbers, and usernames.
@@ -42,7 +43,9 @@ Performance on PII vs. Non PII classification task:
 - **Specificity: 99.84%** (correctly identifies 99.84% of Non PII tokens)
 <img src="https://cloud-3i4ld6u5y-hack-club-bot.vercel.app/0home.png" alt="Akash Network logo" width="250"/>
 Piiranha was trained on an H100 GPU rented through the Akash Network (https://akash.network)
 ## Model Description
 Piiranha is a fine-tuned version of [microsoft/mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base).
 The context length is 256 Deberta tokens. If your text is longer than that, just split it up.
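
The README's advice to "just split it up" can be sketched as below. This is a minimal illustration, not the author's method: `split_into_chunks` and its `count_tokens` parameter are hypothetical names, and the greedy word-packing strategy is an assumption. The function packs whitespace-separated words into chunks that each stay within the 256-token limit, as measured by whatever token counter you pass in.

```python
from typing import Callable, List


def split_into_chunks(
    text: str,
    count_tokens: Callable[[str], int],  # hypothetical hook: returns the token count of a string
    max_tokens: int = 256,               # context length stated in the README
) -> List[str]:
    """Greedily pack whitespace-separated words into chunks of at most max_tokens tokens."""
    chunks: List[str] = []
    current: List[str] = []
    for word in text.split():
        candidate = " ".join(current + [word])
        if current and count_tokens(candidate) > max_tokens:
            # Adding this word would overflow the limit: close the chunk and start a new one.
            chunks.append(" ".join(current))
            current = [word]
        else:
            # Note: a single word longer than max_tokens still becomes its own (oversized) chunk.
            current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

With the `transformers` library, `count_tokens` could plausibly be `lambda s: len(tokenizer(s)["input_ids"])`, where `tokenizer` comes from `AutoTokenizer.from_pretrained("iiiorg/piiranha-v1-detect-personal-information")`; that count includes special tokens. Because the sketch splits on whitespace, original spacing is normalized, so character offsets in each chunk will not map directly back to the source text. It also re-tokenizes the growing chunk for every word, which is fine for short inputs but slow for very long ones.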
