Is there any example code for running inference with starpii on the-stack dataset, like the web API?
Hello, you can directly use the token classification pipeline:
from transformers import pipeline
classifier = pipeline("token-classification", model="bigcode/starpii", aggregation_strategy="simple")
classifier("Hello I'm John and my IP address is 196.780.89.78")
[{'entity_group': 'NAME', 'score': 0.9997844, 'word': ' John', 'start': 9, 'end': 14}, {'entity_group': 'IP_ADDRESS', 'score': 0.99203795, 'word': '196.780.89.', 'start': 52, 'end': 63}]
Check the token-classification documentation and the TokenClassificationPipeline docs for more details.
We also released the inference code we used to run PII detection at large scale here: https://github.com/bigcode-project/bigcode-dataset/tree/pii-ner/pii/ner
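If it helps, here is a minimal sketch of wiring the pipeline output into a masking step. The helpers `redact` and `detect_pii` are illustrative names of my own, not functions from the bigcode repo; the entity dicts follow the format shown in the pipeline output above.

```python
def redact(text, entities, mask="<{}>"):
    # Replace each detected span with a placeholder such as <NAME>.
    # `entities` follows the pipeline output format: dicts carrying
    # 'entity_group' plus 'start'/'end' character offsets.
    # Spans are applied right-to-left so earlier offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[:ent["start"]] + mask.format(ent["entity_group"]) + text[ent["end"]:]
    return text

def detect_pii(texts):
    # Runs the StarPII classifier (downloads the model, so it needs
    # network access); the import is deferred so `redact` stays usable
    # without transformers installed.
    from transformers import pipeline
    classifier = pipeline("token-classification",
                          model="bigcode/starpii",
                          aggregation_strategy="simple")
    return classifier(texts)
```

With the pipeline loaded, `redact(text, detect_pii(text))` masks every detected span in one pass.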
Note: I suggest that you delete the HF bearer token that you included in your message and create a new one since it's supposed to be a secret. (I took the liberty of hiding your post)
Thanks a lot
Hey, how can I get an auth_token to use your model? I'm getting the following error:
OSError: bigcode/starpii is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo with use_auth_token or log in with huggingface-cli login and pass use_auth_token=True.
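In case a sketch helps while waiting for an answer: an access token can be created on huggingface.co under Settings → Access Tokens, and recent transformers versions accept it via the `token` argument (or read it from the HF_TOKEN environment variable). The `auth_kwargs` helper below is a hypothetical convenience of mine, not part of transformers.

```python
import os

def auth_kwargs(token=None):
    # Build the authentication kwargs for `transformers.pipeline`.
    # Falls back to the HF_TOKEN environment variable, which
    # huggingface_hub also reads automatically.
    token = token or os.environ.get("HF_TOKEN")
    return {"token": token} if token else {}

# Usage (downloads the model, so it needs network access and a valid token):
# from transformers import pipeline
# classifier = pipeline("token-classification", model="bigcode/starpii",
#                       aggregation_strategy="simple",
#                       **auth_kwargs("hf_xxx"))  # "hf_xxx" is a placeholder
```

Running `huggingface-cli login` once also stores the token locally, in which case no extra argument is needed.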