giskardai/giskard-evaluator · Report for cardiffnlp/twitter-roberta-base-sentiment-latest

Ethical issues (1)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation	Description
Ethical	medium	—	Fail rate = 0.065	Switch Religion	28/433 tested samples (6.47%) changed prediction after perturbation	When feature “text” is perturbed with the transformation “Switch Religion”, the model changes its prediction in 6.47% of the cases. We expected the predictions not to be affected by this transformation.
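
The fail rate in the row above is the share of tested samples whose predicted label changed after the perturbation. A minimal sketch of that arithmetic follows; the function name `fail_rate` and the placeholder labels are assumptions made for illustration and are not taken from the scanner's code.

```python
def fail_rate(original_preds, perturbed_preds):
    """Fraction of samples whose predicted label changed after perturbation."""
    if len(original_preds) != len(perturbed_preds):
        raise ValueError("prediction lists must have the same length")
    if not original_preds:
        raise ValueError("no samples to compare")
    changed = sum(o != p for o, p in zip(original_preds, perturbed_preds))
    return changed / len(original_preds)


# Reproduce the counts reported above: 28 of 433 predictions changed.
# The labels are placeholders (assumption); only the mismatch count matters.
original = ["neutral"] * 433
perturbed = ["negative"] * 28 + ["neutral"] * 405

rate = fail_rate(original, perturbed)
print(f"Fail rate = {rate:.3f}")  # Fail rate = 0.065
print(f"{rate:.2%}")              # 6.47%
```

The same calculation applies to every row in this report: the Metric column is the rounded ratio, and the Deviation column gives the raw counts.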

Robustness issues (5)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation	Description
Robustness	major	—	Fail rate = 0.213	Transform to uppercase	213/1000 tested samples (21.3%) changed prediction after perturbation	When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 21.3% of the cases. We expected the predictions not to be affected by this transformation.
Robustness	major	—	Fail rate = 0.132	Add typos	132/1000 tested samples (13.2%) changed prediction after perturbation	When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 13.2% of the cases. We expected the predictions not to be affected by this transformation.
Robustness	major	—	Fail rate = 0.122	Transform to title case	122/1000 tested samples (12.2%) changed prediction after perturbation	When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 12.2% of the cases. We expected the predictions not to be affected by this transformation.
Robustness	medium	—	Fail rate = 0.095	Punctuation Removal	95/1000 tested samples (9.5%) changed prediction after perturbation	When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 9.5% of the cases. We expected the predictions not to be affected by this transformation.
Robustness	medium	—	Fail rate = 0.073	Transform to lowercase	73/1000 tested samples (7.3%) changed prediction after perturbation	When feature “text” is perturbed with the transformation “Transform to lowercase”, the model changes its prediction in 7.3% of the cases. We expected the predictions not to be affected by this transformation.
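
The robustness rows above all follow one pattern: apply a text transformation, predict again, and count how many labels changed. The sketch below illustrates that pattern for four of the five transformations, under stated assumptions. `toy_predict` is a hypothetical, deliberately case-sensitive stand-in for the real model, the transformations are plain Python string operations that may differ from the scanner's own implementations, and "Add typos" is omitted because it needs a random perturbation.

```python
import string


def remove_punctuation(text):
    """Strip ASCII punctuation characters from the text."""
    return text.translate(str.maketrans("", "", string.punctuation))


# Approximations of the transformations named in the report (assumption).
TRANSFORMATIONS = {
    "Transform to uppercase": str.upper,
    "Transform to lowercase": str.lower,
    "Transform to title case": str.title,
    "Punctuation Removal": remove_punctuation,
}


def toy_predict(text):
    """Hypothetical case-sensitive classifier standing in for the real model."""
    if "great" in text:
        return "positive"
    if "awful" in text:
        return "negative"
    return "neutral"


def robustness_report(predict, texts):
    """Return the fail rate per transformation for the given texts."""
    report = {}
    for name, transform in TRANSFORMATIONS.items():
        changed = sum(predict(t) != predict(transform(t)) for t in texts)
        report[name] = changed / len(texts)
    return report


samples = ["great movie!", "awful service.", "It was Great", "see you at 5"]
report = robustness_report(toy_predict, samples)
for name, rate in report.items():
    print(f"{name}: fail rate = {rate:.2f}")
```

To check a real model, `toy_predict` could be replaced by any function that maps a string to a label, for example a wrapper around the model under test.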