starpii doesn't have any meaningful output
#7
by
ruochenwang
- opened
Hi,
I tried calling starpii to detect personal information in code, such as names and email addresses.
My code is shown below:
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starpii"
device = "cuda"
model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
data = "Python\nuser_name = 'wrc'\nemail='iuewfn@gmail.com'\ndata=abcdefg\n"
inputs = tokenizer.encode(data, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_length=100)
print(tokenizer.decode(outputs[0], clean_up_tokenization_spaces=False))
I simply called this model without any complex processing.
The output is
Python
user_name = 'wrc'
email='iuewfn@gmail.com'
data=abcdefg
gressgressgressgressgressgressgressgressgressgressgressgressgressgressgressgressgressgressgressgressgressgressgressgressgressgress
It looks as if this model hasn't undergone any training, or I used the wrong token.
May I ask what I did wrong?
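You should use a 'ner' (token-classification) pipeline instead of the causal LM. StarPII is a token-classification model that labels PII spans, not a text generator, so loading it with AutoModelForCausalLM and calling generate() is likely what produces the repeated garbage tokens.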
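The suggestion above can be sketched roughly as follows. This is a minimal sketch, not an official recipe: it assumes a recent transformers release, and that you have access to the bigcode/starpii checkpoint (it may require accepting the model's terms and logging in to the Hub). The `redact` helper and the `main` function are hypothetical names added for illustration; only `pipeline` and its `aggregation_strategy` argument come from transformers.

```python
from typing import Dict, List


def detect_pii(text: str) -> List[Dict]:
    """Run StarPII as a token-classification (NER) pipeline."""
    # Imported lazily so the redact() helper below works without transformers.
    from transformers import pipeline

    ner = pipeline(
        "token-classification",         # "ner" is an alias for this task
        model="bigcode/starpii",
        aggregation_strategy="simple",  # merge sub-word tokens into whole spans
    )
    return ner(text)


def redact(text: str, entities: List[Dict]) -> str:
    """Replace each detected span with a <LABEL> placeholder (hypothetical helper)."""
    # Work right to left so earlier character offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        label = f"<{ent['entity_group']}>"
        text = text[: ent["start"]] + label + text[ent["end"]:]
    return text


def main() -> None:
    # Same input as in the question above.
    data = "Python\nuser_name = 'wrc'\nemail='iuewfn@gmail.com'\ndata=abcdefg\n"
    entities = detect_pii(data)
    for ent in entities:
        # Each aggregated entity carries a label, the matched text, and a score.
        print(ent["entity_group"], repr(ent["word"]), round(float(ent["score"]), 3))
    print(redact(data, entities))
```

Calling `main()` downloads the checkpoint and prints whatever spans the model detects. With `aggregation_strategy="simple"`, each result is a dict with `entity_group`, `score`, `word`, `start`, and `end` keys, which is what `redact` relies on.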