Update README.md
README.md
@@ -36,22 +36,23 @@ pipeline_tag: token-classification
Piiranha is trained to **detect 17 types** of Personally Identifiable Information (PII) across six languages. It successfully **catches 98.27% of PII** tokens, with an overall classification **accuracy of 99.44%**.
Piiranha is especially accurate at detecting passwords, emails (100%), phone numbers, and usernames.

Performance on the PII vs. non-PII classification task:
- **Precision: 98.48%** (98.48% of tokens classified as PII are actually PII)
- **Recall: 98.27%** (correctly identifies 98.27% of PII tokens)
- **Specificity: 99.84%** (correctly identifies 99.84% of non-PII tokens)
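All three figures come from a token-level confusion matrix. As a minimal sketch, assuming every token carrying any of the 17 PII labels counts as "PII" and every other token as "non-PII" (the exact evaluation code is not part of this model card), the metrics can be computed as follows. `BinaryCounts` and `count` are hypothetical helpers introduced only for illustration, and the counts in the demo are made up.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BinaryCounts:
    """Token-level confusion counts for the PII vs. non-PII task."""

    tp: int  # PII tokens predicted as PII
    fp: int  # non-PII tokens predicted as PII
    fn: int  # PII tokens predicted as non-PII
    tn: int  # non-PII tokens predicted as non-PII


def count(gold: list[bool], pred: list[bool]) -> BinaryCounts:
    """Tally counts from per-token flags (True = token has some PII label)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs = list(zip(gold, pred))
    return BinaryCounts(
        tp=sum(g and p for g, p in pairs),
        fp=sum(not g and p for g, p in pairs),
        fn=sum(g and not p for g, p in pairs),
        tn=sum(not g and not p for g, p in pairs),
    )


def _ratio(num: int, den: int) -> float:
    # Return 0.0 instead of raising when a denominator is empty.
    return num / den if den else 0.0


def precision(c: BinaryCounts) -> float:
    # Of the tokens flagged as PII, how many really are PII?
    return _ratio(c.tp, c.tp + c.fp)


def recall(c: BinaryCounts) -> float:
    # Of the real PII tokens, how many were flagged?
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c: BinaryCounts) -> float:
    # Of the real non-PII tokens, how many were left unflagged?
    return _ratio(c.tn, c.tn + c.fp)


if __name__ == "__main__":
    # Hypothetical counts, not Piiranha's actual evaluation data.
    c = BinaryCounts(tp=90, fp=10, fn=5, tn=895)
    print(f"precision={precision(c):.2%} recall={recall(c):.2%} "
          f"specificity={specificity(c):.2%}")
```

Precision and specificity both penalize false positives, but against different denominators: precision divides by everything the model flagged, specificity by all genuinely non-PII tokens.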

<img src="https://cloud-3i4ld6u5y-hack-club-bot.vercel.app/0home.png" alt="Akash Network logo" width="250"/>
Piiranha was trained on an H100 GPU rented through the [Akash Network](https://akash.network/).

## Model Description
Piiranha is a fine-tuned version of [microsoft/mdeberta-v3-base](https://huggingface.co/microsoft/mdeberta-v3-base).
The context length is 256 DeBERTa tokens. If your text is longer than that, split it into chunks of at most 256 tokens.

Supported languages: English, Spanish, French, German, Italian, Dutch.

Supported PII types: Account Number, Building Number, City, Credit Card Number, Date of Birth, Driver's License, Email, First Name, Last Name, ID Card, Password, Social Security Number, Street Address, Tax Number, Phone Number, Username, Zipcode.
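Because the context window is 256 tokens, longer inputs have to be split before inference. The sketch below works on token ids you have already produced with the model's tokenizer. The `reserved=2` default assumes one start and one end special token per window, and the `overlap` parameter is an illustrative design choice, not something this model card prescribes; `chunk_token_ids` is a hypothetical helper.

```python
from __future__ import annotations


def chunk_token_ids(
    ids: list[int],
    max_len: int = 256,
    reserved: int = 2,
    overlap: int = 32,
) -> list[tuple[int, list[int]]]:
    """Split a token-id sequence into windows that fit a max_len context.

    Returns (start_offset, window_ids) pairs; start_offset is the index of
    the window's first token in `ids`, so predictions can be mapped back.
    """
    window = max_len - reserved  # room left for content after special tokens
    if window <= 0:
        raise ValueError("max_len must be larger than reserved")
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len - reserved")
    if not ids:
        return []

    step = window - overlap  # how far each new window advances
    chunks: list[tuple[int, list[int]]] = []
    start = 0
    while True:
        chunks.append((start, ids[start:start + window]))
        if start + window >= len(ids):  # this window reaches the end
            break
        start += step
    return chunks


if __name__ == "__main__":
    fake_ids = list(range(600))  # stand-in for real tokenizer output
    for offset, window_ids in chunk_token_ids(fake_ids):
        print(offset, len(window_ids))
```

Each window would then be wrapped with the special tokens and passed to the model separately; the returned offsets let you map per-token predictions back onto the original sequence. Overlapping windows give tokens near an edge some context from the neighbouring window, at the cost of running a few extra tokens through the model.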

It achieves the following results on a test set of ~73,000 sentences containing PII:
- Accuracy: 99.44%
- Loss: 0.0173
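The model card does not state how these two numbers were computed. Assuming accuracy is measured per token over the label set and the loss is the mean token-level cross-entropy (the usual setup for fine-tuned token classifiers), a minimal sketch looks like this. The `-100` ignore label mirrors the common PyTorch/Hugging Face convention for positions excluded from scoring, such as padding or special tokens; `token_accuracy` and `mean_cross_entropy` are hypothetical helpers.

```python
from __future__ import annotations

import math

IGNORE_INDEX = -100  # assumed label id for positions excluded from scoring


def token_accuracy(gold: list[int], pred: list[int]) -> float:
    """Fraction of scored tokens whose predicted label id equals the gold id."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs = [(g, p) for g, p in zip(gold, pred) if g != IGNORE_INDEX]
    if not pairs:
        raise ValueError("no scored tokens")
    return sum(g == p for g, p in pairs) / len(pairs)


def mean_cross_entropy(gold: list[int], probs: list[list[float]]) -> float:
    """Mean negative natural-log probability assigned to each gold label.

    probs[i] is the model's softmax distribution over label ids for token i.
    """
    if len(gold) != len(probs):
        raise ValueError("gold and probs must have the same length")
    losses = [-math.log(row[g]) for g, row in zip(gold, probs) if g != IGNORE_INDEX]
    if not losses:
        raise ValueError("no scored tokens")
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Toy example with two label ids; -100 marks an unscored position.
    gold = [0, 1, IGNORE_INDEX]
    probs = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
    # Predicted label = argmax of each probability row.
    pred = [max(range(len(row)), key=row.__getitem__) for row in probs]
    print(token_accuracy(gold, pred), mean_cross_entropy(gold, probs))
```

Under the cross-entropy assumption, exp(-0.0173) ≈ 0.983 is the geometric mean of the probability the model assigns to the correct label across scored test tokens.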