Is there any example code for running inference with starpii on the-stack dataset, like the web API?
Hello, you can directly use the token classification pipeline:
from transformers import pipeline
classifier = pipeline("token-classification", model="bigcode/starpii", aggregation_strategy="simple")
classifier("Hello I'm John and my IP address is 196.780.89.78")
[{'entity_group': 'NAME', 'score': 0.9997844, 'word': ' John', 'start': 9, 'end': 14}, {'entity_group': 'IP_ADDRESS', 'score': 0.99203795, 'word': '196.780.89.', 'start': 52, 'end': 63}]
Check the token-classification documentation and the TokenClassificationPipeline docs for more details.
We also released the inference code we used to run PII detection at large scale here: https://github.com/bigcode-project/bigcode-dataset/tree/pii-ner/pii/ner
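If it helps, here is a minimal sketch of wiring the pipeline output into a masking step. The helpers `redact` and `detect_pii` are illustrative names of my own, not functions from the bigcode repo; the entity dicts follow the format shown in the pipeline output above.

```python
def redact(text, entities, mask="<{}>"):
    # Replace each detected span with a placeholder such as <NAME>.
    # `entities` follows the pipeline output format: dicts carrying
    # 'entity_group' plus 'start'/'end' character offsets.
    # Spans are applied right-to-left so earlier offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[:ent["start"]] + mask.format(ent["entity_group"]) + text[ent["end"]:]
    return text

def detect_pii(texts):
    # Runs the StarPII classifier (downloads the model, so it needs
    # network access); the import is deferred so `redact` stays usable
    # without transformers installed.
    from transformers import pipeline
    classifier = pipeline("token-classification",
                          model="bigcode/starpii",
                          aggregation_strategy="simple")
    return classifier(texts)
```

With the pipeline loaded, `redact(text, detect_pii(text))` masks every detected span in one pass.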
Note: I suggest that you delete the HF bearer token that you included in your message and create a new one since it's supposed to be a secret. (I took the liberty of hiding your post)
Thanks a lot
Hey, how can I get an auth_token to use your model? I'm getting the following error:
OSError: bigcode/starpii is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo with use_auth_token or log in with huggingface-cli login and pass use_auth_token=True.
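In case a sketch helps while waiting for an answer: an access token can be created on huggingface.co under Settings → Access Tokens, and recent transformers versions accept it via the `token` argument (or read it from the HF_TOKEN environment variable). The `auth_kwargs` helper below is a hypothetical convenience of mine, not part of transformers.

```python
import os

def auth_kwargs(token=None):
    # Build the authentication kwargs for `transformers.pipeline`.
    # Falls back to the HF_TOKEN environment variable, which
    # huggingface_hub also reads automatically.
    token = token or os.environ.get("HF_TOKEN")
    return {"token": token} if token else {}

# Usage (downloads the model, so it needs network access and a valid token):
# from transformers import pipeline
# classifier = pipeline("token-classification", model="bigcode/starpii",
#                       aggregation_strategy="simple",
#                       **auth_kwargs("hf_xxx"))  # "hf_xxx" is a placeholder
```

Running `huggingface-cli login` once also stores the token locally, in which case no extra argument is needed.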