metadata

license: apache-2.0
language:
  - en
pipeline_tag: text-classification

DeTexD-RoBERTa-base delicate text detection

This is a baseline RoBERTa-base model for the delicate text detection task.

Classification example code

Here's a short usage example for binary classification, using the transformers pipeline API (with torch installed as the backend):

from transformers import pipeline

classifier = pipeline("text-classification", model="grammarly/detexd-roberta-base")

def predict_binary_score(text: str):
    # get multiclass probability scores
    scores = classifier(text, top_k=None)

    # convert to a single score by summing the probability scores
    # for the higher-index classes
    return sum(score['score']
               for score in scores
               if score['label'] in ('LABEL_3', 'LABEL_4', 'LABEL_5'))

def predict_delicate(text: str, threshold=0.72496545):
    return predict_binary_score(text) > threshold

print(predict_delicate("Time flies like an arrow. Fruit flies like a banana."))

Expected output:

False
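
The aggregation step can also be checked without downloading the model. The sketch below applies the same rule to a hand-written score list. The scores are made-up illustrative values, not model output, and the helper name `binary_score_from_scores` is hypothetical. It assumes the classifier returns labels named `LABEL_0` through `LABEL_5`, as the example above suggests.

```python
from typing import Dict, List, Union

# Labels counted as "delicate", copied from the usage example above.
DELICATE_LABELS = ("LABEL_3", "LABEL_4", "LABEL_5")
# Decision threshold, copied from the usage example above.
THRESHOLD = 0.72496545


def binary_score_from_scores(scores: List[Dict[str, Union[str, float]]]) -> float:
    """Sum the probabilities of the higher-index classes (hypothetical helper)."""
    return sum(s["score"] for s in scores if s["label"] in DELICATE_LABELS)


# Made-up scores for illustration only (NOT real model output).
mock_scores = [
    {"label": "LABEL_0", "score": 0.50},
    {"label": "LABEL_1", "score": 0.20},
    {"label": "LABEL_2", "score": 0.10},
    {"label": "LABEL_3", "score": 0.10},
    {"label": "LABEL_4", "score": 0.05},
    {"label": "LABEL_5", "score": 0.05},
]

score = binary_score_from_scores(mock_scores)
# Only LABEL_3..LABEL_5 contribute, so the score is about 0.2,
# which is below the threshold.
print(score > THRESHOLD)
```

With these mock values the three higher-index classes sum to roughly 0.2, so the text would not be flagged as delicate.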

Citation Information

DeTexD: A Benchmark Dataset for Delicate Text Detection. Serhii Yavnyi, Oleksii Sliusarenko, Jade Razzaghi, Yichen Mo, Knar Hovakimyan, Artem Chernodub. Accepted for publication at The 7th Workshop on Online Abuse and Harms (WOAH) at ACL 2023 in Toronto.