Hi, why do I get bad results with your model?
Hey @Rusvo, can you please share more context on your test?
This model has known limitations: it doesn't detect jailbreaks well, it supports English prompts only, and it's not recommended for scanning system prompts (e.g. "Act as a chatbot...").
I was trying to use this model with subnet 14 of Bittensor, and I got too many false negatives.
Interesting. Do you have visibility into the dataset?
At least one of their datasets is https://huggingface.co/datasets/synapsecai/synthetic-prompt-injections, which they probably use for that analysis.
I ran tests on the model for this dataset:
v1 model:
- Accuracy: 0.5436
- Precision: 0.5232
- Recall: 0.9664
- F1 Score: 0.6789
v2 model:
- Accuracy: 0.6159
- Precision: 0.9556
- Recall: 0.2420
- F1 Score: 0.3861
The results are quite interesting: v1 flags almost everything as an injection (high recall, low precision), while v2 misses most injections (high precision, low recall), which would be consistent with the false negatives reported above. I will spend more time understanding them.
How do you run these tests?